Websites consist of many elements which all combine to build the page you are shown in your browser.  In addition, there are often pages which provide online services or access to information that is private, or that requires certain additional details before a site visitor is allowed in.  The elements that make up a site, such as images or code, are usually stored in directories.  The same is true for pages that are supposed to be private or secret.

When a search engine spider first hits your site, it will go in search of a file at the location

www.yoursite.com/robots.txt

A robots.txt file is a simple text file which instructs a search bot not to index certain directories found on a domain. Keep in mind that every element of a page which goes onto a server has an address where it can be found individually (for example: www.yoursite.com/images/welcomebanner.jpeg). Directories which contain images can be blocked to prevent those images showing up in SERPs on their own.  This ensures that when a search relevant to your site is made, actual pages are shown as opposed to random lone images or external code files.  Furthermore, by disallowing directories containing pages you do not want to appear in SERPs, you can keep private pages private.
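
For example, a minimal file blocking every bot from a hypothetical /images/ directory (substitute your own folder name) would look like this:

User-Agent: *
Disallow: /images/

Both lines are explained step by step below.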

How to Create and Use a Robots.txt File

To create a robots.txt file you simply need to open your preferred text editor and save a blank file with the name robots.txt. Sorted!

Now we can start looking at what you should write in the file to instruct the searching bots.

The first piece of text you will need to master is the “User-Agent:” directive.

This allows the site owner to single out particular bots, or to provide instructions for all bots which read and take notice of the file.  To indicate that you want all bots to follow the instructions that come after it, you simply add the * symbol.  Your code should look as follows:

User-Agent: *

To specify an individual bot you will need to know its user-agent name. Below is an example for Google’s crawler.

User-Agent: Googlebot
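
Disallow rules apply to the user-agent named above them, so a block like the following (using a hypothetical /drafts/ directory) would stop Googlebot, and only Googlebot, from indexing that folder:

User-Agent: Googlebot
Disallow: /drafts/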

Next, you need to know how to instruct a search bot not to index certain directories.  To start, in order to block all pages of your site (which may be preferable while your site is still in development) simply enter:

Disallow: /

It is important to remove this entry once you are up and running, as it will otherwise prevent search bots from ever being able to index your site.
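
Put together with the User-Agent line, the complete file for a site still in development is just two lines:

User-Agent: *
Disallow: /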

From here on out it is relatively simple to disallow further directories.  Simply add the directory’s path, relative to the site’s main URL, after the Disallow: instruction.  Below are a few more examples.
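
Assuming hypothetical directories named /images/, /scripts/ and /private/ (substitute your own folder names), the entries would look like this:

Disallow: /images/
Disallow: /scripts/
Disallow: /private/
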
It is worth keeping in mind that, if you wanted, you could simply gather everything you do not want indexed into a single directory and provide one instruction not to index anything found in that folder.  Disallow rules match by prefix, so a single line covers the folder and everything inside it:

Disallow: /donotindex

So now all you have to do is save the file and upload it onto the server hosting your website.  The new file should sit in the root directory (the same area where your homepage sits), so that it can be found at www.yoursite.com/robots.txt.

Sitemap: One Last Thing

Just to score maximum brownie points, it is worth pointing to your site’s sitemap at the end of your file with the command:

Sitemap: http://www.yoursite.com/sitemap.xml

This tells spidering bots that there is a sitemap at the location specified, which in turn helps ensure that all pages of your site are discovered and indexed.
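
Pulling it all together, a complete robots.txt file using the hypothetical directory names from the examples above might read:

User-Agent: *
Disallow: /images/
Disallow: /donotindex
Sitemap: http://www.yoursite.com/sitemap.xml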

Lastly, Time to Test

To test that your robots.txt file is working properly, run it through Google Webmaster Tools, which can show you which URLs on your site the file blocks.