Robots.txt, Creating, Editing, Need, Use and Carefulness needed

Robots.txt is a very small text file containing only a few characters and phrases, but its effect is crucial: it instructs search engines how to crawl your blog, and a wrong robots.txt file can totally spoil it. Using a robots.txt file you can allow or disallow crawling of your entire site, of a specific section or item, or by a specific search engine. Before adding a robots.txt file to your blog you should be extremely careful, because a wrong robots.txt may kill your blog's traffic completely. If you don't edit your blog's robots.txt file, the default file is used, which is fine - the search engines will simply crawl your entire blog. Please note carefully that robots.txt is used only to block search engines from crawling something on your blog.
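For reference, the default robots.txt that Blogger serves usually looks roughly like the sketch below (the sitemap URL is a placeholder for your own blog's address). It blocks only the internal /search pages and lets everything else be crawled:

User-agent: *
Disallow: /search
Allow: /

Sitemap: https://example.blogspot.com/sitemap.xml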
Before starting, we must learn about the following directives -
User-agent: - This declares which search engine(s) the rules that follow apply to. Here you have to use the technical names of the crawlers, such as 'Googlebot' for the Google search engine, 'Bingbot' for the Bing search engine, and '*' (an asterisk) for all of them. Understand it with examples - (i) User-agent: Googlebot (ii) User-agent: Bingbot (iii) User-agent: *. If you don't want to block any search engine, there is no need to edit the file at all; otherwise use '*' (the asterisk) to address every crawler at once.
Disallow: - This instructs search engines not to crawl the specified path (a page blocked this way will normally stay out of search results too). It is placed below User-agent:. For example, Disallow: /search blocks crawling of your blog's internal search and label pages only, while Disallow: / blocks crawling of your entire blog.
Allow: - This explicitly permits crawling of a path. For example, Allow: / instructs the search engines that they may crawl all of your pages.
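Putting the three directives together, here is a minimal sketch of a complete file, assuming a hypothetical folder named /private/ that you want hidden from Google only:

# These rules apply to Google's crawler only
User-agent: Googlebot
Disallow: /private/

# These rules apply to every other crawler
User-agent: *
Allow: /

A crawler obeys only the most specific User-agent group that matches it, so Googlebot here follows its own group and ignores the '*' group.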
How to use robots.txt to block search engines from crawling
To block crawling of a particular blog post - Disallow: /"paste the post's URL path here".html
To block crawling of a particular blog page - Disallow: /"paste the page's URL path here".html
In both cases use only the path that follows your domain name (with a single leading slash), not the full URL.
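On Blogger, post URLs usually take the form /year/month/title.html and static pages live under /p/. Assuming that layout, a sketch of such a file could look like this (both paths are hypothetical examples, not real URLs):

User-agent: *
# hypothetical post path - replace with your own post's path
Disallow: /2024/05/my-example-post.html
# hypothetical static page path - replace with your own page's path
Disallow: /p/my-example-page.html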
If you want your entire contents to be crawled and indexed by all search engines, your robots.txt file must be like one of the two files below. I have given two files, but the effect of both is the same, because both address all user agents: one uses 'Disallow' with nothing after it (blocking none of your pages), and the other uses 'Allow: /' (allowing all of them) -
(1) User-agent: *
Disallow:
OR
(2) User-agent: *
Allow: /
If you don't want anything (your entire contents) to be crawled or indexed by any search engine, your robots.txt file must be like this.
User-agent: *
Disallow: /
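Be careful with this file: it tells every search engine to stay away from your entire blog, and your search traffic will dry up - exactly the danger warned about at the start of this post.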
If you don't want a specific post or page to be crawled and indexed, your robots.txt file must be like this.
User-agent: *
Disallow: /URL path here
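As a quick sketch, assuming a hypothetical Blogger static page at the path /p/contact.html, the file would read:

User-agent: *
Disallow: /p/contact.html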
If you have a folder containing 5 files and you want only one file to be crawled while the other 4 must not be crawled and indexed, your robots.txt file must be like this.
User-agent: Googlebot
Disallow: /folder/
Allow: /folder/file.html
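This works because, when an Allow rule and a Disallow rule both match a URL, Googlebot follows the more specific (longer) rule, so Allow: /folder/file.html wins for that single file while Disallow: /folder/ keeps the other 4 files blocked. Note that this example addresses Googlebot only; write User-agent: * instead if the rule should apply to every search engine.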