Today we’ll talk about the robots.txt file and why your website needs one.
What is robots.txt?
robots.txt is a plain text file containing simple directives that tell search engine bots which of your web pages they may crawl and which they may not. It also helps search engine bots index your pages more effectively.
Why is robots.txt needed?
robots.txt is very important if your website has pages you do not want to show to the world. If you don’t use a robots.txt file, search engine bots will assume there is no restriction on crawling any of your pages, so they may end up indexing pages such as your admin panel in search results.
Now let’s talk about some basic robots.txt directives.
Create a robots.txt file and place it in your website’s root folder, for example: http://www.example.com/robots.txt
If you don’t want search engine bots to crawl any of your website’s pages, add the following directives to your robots.txt file:
User-agent: *
Disallow: /
The “User-agent: *” line means this section applies to all robots. The “Disallow: /” line tells the robot that it should not visit any page on the site.
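You can check how a crawler would read these rules with Python’s built-in urllib.robotparser module; this is a small sketch using the example.com address from above.

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# parse() accepts the robots.txt content as a list of lines,
# so no network request is needed for this check.
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# With "Disallow: /" every path is off limits to every bot.
print(rp.can_fetch("*", "http://www.example.com/"))              # False
print(rp.can_fetch("Googlebot", "http://www.example.com/page"))  # False
```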
But if you want a rule to apply only to a specific bot, for example to stop MSN’s bot from crawling your pages, use the following directives:
User-agent: msnbot
Disallow: /
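The same parser can confirm that this rule blocks only msnbot; any bot without a matching section falls back to “no restrictions”.

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: msnbot",
    "Disallow: /",
])

# Only msnbot is blocked; there is no "User-agent: *" section,
# so other bots are allowed by default.
print(rp.can_fetch("msnbot", "http://www.example.com/page"))     # False
print(rp.can_fetch("Googlebot", "http://www.example.com/page"))  # True
```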
You can also disallow specific pages and folders by adding directives like these to your robots.txt file:
User-agent: *
Disallow: /wp-admin
The directives above tell search engines not to crawl my WordPress admin panel.
Suppose you have disallowed crawling of your whole website but want certain pages to still be crawled; then use the following directives:
User-agent: *
Allow: /docs
Allow: /reports
Disallow: /
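The combination can again be sketched with urllib.robotparser, assuming the full rule set includes the site-wide Disallow from the scenario above. Note that parsers differ in how they resolve conflicts: Python’s parser applies the first matching rule, while some crawlers (such as Google’s) use the longest matching rule, so putting the Allow lines first keeps the behavior consistent.

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /docs",
    "Allow: /reports",
    "Disallow: /",
])

# /docs and /reports are explicitly allowed; everything else is blocked.
print(rp.can_fetch("*", "http://www.example.com/docs/guide.html"))  # True
print(rp.can_fetch("*", "http://www.example.com/private.html"))     # False
```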
Add your XML sitemap path to your robots.txt file so that search engine bots can understand the whole architecture of your website.
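The sitemap is declared with a single Sitemap line, which can appear anywhere in the file. A minimal sketch, using a placeholder sitemap URL you would replace with your own:

```text
User-agent: *
Disallow: /wp-admin

# hypothetical sitemap location on your own domain
Sitemap: http://www.example.com/sitemap.xml
```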
Hope this simple tutorial helps you understand the basic use of the robots.txt file to manage your search engine presence. You can read more about the robots.txt file here: http://www.robotstxt.org/