
The "robots.txt" file

The robots.txt file is addressed to "spiders": programs that explore the Web so that search engines can discover your site and analyze its contents.

By leaving instructions for these spiders, you can:
- bar some spiders (also called "agents" or "bots") from exploring your site,
- bar all spiders from exploring certain pages of your site,
- bar some spiders from exploring certain pages.

Note that a "robots" meta tag can also be added to each of your pages to prohibit indexing.

Syntax

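The syntax accepted by robots offers a little flexibility:
- spaces are optional,
- field names such as User-Agent and Disallow may be written in upper or lower case (they are not case sensitive); file and directory names should still be written exactly as they appear on the server.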

Every line must start with one of the following 3 options:

#

Comment. This will be ignored by robots.

User-Agent:

This field can be followed by * or by the exact name of an existing spider.

Disallow:

This field can be followed by only ONE directory or file name.
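The three kinds of line described above can be told apart with a few lines of Python. This is only a minimal sketch: the function name classify_line and the sample lines are made up for illustration, and the sketch covers only the comment, User-Agent and Disallow lines described in this section.

```python
def classify_line(line: str) -> str:
    """Return 'blank', 'comment', 'user-agent', 'disallow' or 'unknown'."""
    stripped = line.strip()
    if not stripped:
        return "blank"
    if stripped.startswith("#"):
        return "comment"
    field, sep, _value = stripped.partition(":")
    if not sep:
        return "unknown"
    name = field.strip().lower()  # field names are not case sensitive
    if name in ("user-agent", "disallow"):
        return name
    return "unknown"


# Hypothetical sample lines, including one field this section does not define.
sample = ["# keep bots out of /tmp/", "User-Agent: *", "disallow:/tmp/", "Crawl: no"]
kinds = [classify_line(text) for text in sample]
print(kinds)  # ['comment', 'user-agent', 'disallow', 'unknown']
```

Note that "disallow:/tmp/" is accepted even though it has no space and is written in lower case, which matches the flexibility described earlier.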

The typical syntax is as follows:

User-Agent: AAAAAAA
Disallow: BBBBBBB
Disallow: CCCCCCC

User-Agent: AAAAAAA'
Disallow: BBBBBBB'
Disallow: CCCCCCC'

etc.

where AAAAAAA and AAAAAAA' are the names of the robots, and BBBBBBB, BBBBBBB', CCCCCCC and CCCCCCC' are the names of the files and/or directories that you wish to hide from these robots.

If you use * instead of the name of a robot, the lines that follow are treated as prohibitions for ALL robots.

If you use "/" instead of a file name, NO file on the site will be indexed.
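To see the effect of * and "/" in practice, the rules can be fed to the urllib.robotparser module of the Python standard library. The sketch below is only an illustration: the robot names ExampleBot, OtherBot and AnyBot and the address www.example.com are placeholders, not real spiders or sites.

```python
from urllib.robotparser import RobotFileParser


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt rules given as a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


PAGE = "http://www.example.com/index.html"  # placeholder address

# "*" addresses every robot, "/" covers the whole site.
block_everyone = build_parser("User-Agent: *\nDisallow: /\n")
# Same prohibition, but addressed to a single (hypothetical) robot.
block_one = build_parser("User-Agent: ExampleBot\nDisallow: /\n")

print(block_everyone.can_fetch("AnyBot", PAGE))  # False
print(block_one.can_fetch("ExampleBot", PAGE))   # False
print(block_one.can_fetch("OtherBot", PAGE))     # True
```

The last line shows that a section naming one robot places no restriction on the others.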

Building the robots.txt file

LinkSpirit is a free utility, downloadable from this site, that lets you easily create or edit "robots" meta tags and the "robots.txt" file.

This utility checks the syntax of your robots.txt file against the rules published at: http://www.robotstxt.org/wc/norobots.html and the robots listed at: http://www.robotstxt.org/wc/active/html/index.html.

If you prefer to proceed manually, all you need is a text editor (Wordpad, for example) to create a text file (with the .txt extension).
Here is a typical example of the contents of a robots.txt file.

User-Agent: *
Disallow: /download/dwnld.php
Disallow: /sources/
Disallow: /admin/perso/

a) User-agent: * tells the spider of any search engine that access to the site is subject to the following restrictions:
b) Disallow: /download/dwnld.php means that the page "dwnld.php" in the "download" directory cannot be indexed.
c) Disallow: /sources/ means that none of the files in the "sources" directory can be indexed.
d) Disallow: /admin/perso/ means that none of the files in the "admin/perso" directory can be indexed.
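The example can be checked against a few addresses with urllib.robotparser from the Python standard library. This is a sketch under assumptions: the site address, the robot name AnyBot and the test paths other than dwnld.php are invented for the demonstration, and the last rule is written with a trailing slash, as in explanation d).

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-Agent: *
Disallow: /download/dwnld.php
Disallow: /sources/
Disallow: /admin/perso/
"""

SITE = "http://www.example.com"  # placeholder site

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(path: str) -> bool:
    """True if a robot (here the made-up 'AnyBot') may fetch the path."""
    return parser.can_fetch("AnyBot", SITE + path)


print(is_allowed("/download/dwnld.php"))     # False: the page is listed
print(is_allowed("/download/other.php"))     # True: only dwnld.php is listed
print(is_allowed("/sources/main.c"))         # False: inside /sources/
print(is_allowed("/admin/perso/notes.htm"))  # False: inside /admin/perso/
print(is_allowed("/index.html"))             # True: no rule applies
```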

Note: When transferring this file to your server by FTP, be sure to use ASCII transfer mode.

General rules

a) Only one robots.txt file may exist for your whole site. It must be located at the root of the site.
b) If you wish to impose different rules on different search engines, you can (and must) create a separate User-agent section for each.
c) The file name (robots.txt) must be written in lowercase.
d) Put only one directory or one file name after each Disallow field. Neither "Disallow: file1.htm, file2.htm" nor "Disallow: Dir1/, Dir2/" is allowed.
e) Transfer your robots.txt file in ASCII mode. Many FTP clients alter the contents of .txt files when they are not transferred in ASCII mode. This is the cause of the most frequently encountered problems with the robots.txt file.
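Rules a), c) and d) lend themselves to a quick automated check. The sketch below is a rough illustration only: the helper names robots_url and disallow_value_ok are invented here, and www.example.com is a placeholder address.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Rules a) and c): the file sits at the site root, name in lowercase."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def disallow_value_ok(line: str) -> bool:
    """Rule d): a Disallow line may name only one file or directory."""
    _field, _sep, value = line.partition(":")
    value = value.split("#", 1)[0].strip()  # ignore a trailing comment
    return "," not in value and " " not in value


print(robots_url("http://www.example.com/docs/page.htm"))
# http://www.example.com/robots.txt
print(disallow_value_ok("Disallow: /sources/"))             # True
print(disallow_value_ok("Disallow: file1.htm, file2.htm"))  # False
print(disallow_value_ok("Disallow: Dir1/, Dir2/"))          # False
```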

Standard rules

a) The asterisk (*) is accepted only in the User-agent field. "Disallow: *", "Disallow: *.*" and "Disallow: *.gif" are not allowed.
b) The "Allow" field does not exist.

Google rules

a) The asterisk (*) and the dollar sign ($) can be used in the Disallow field. They make it possible to hide all files of a particular type: "Disallow: /*.gif$" hides all GIF files.
b) The "Allow" field exists and makes it possible to define exceptions to a general prohibition.

CAUTION: "Google rules" can make your robots.txt file incomprehensible to other robots if they appear in a "User-agent: *" section. Always put these particular instructions after "User-agent: Googlebot".
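To get a feel for what a pattern such as "/*.gif$" hides, the two special characters can be translated into a regular expression. This is a simplified sketch of the matching described above, not Google's actual implementation: it assumes that * stands for any run of characters, that a final $ anchors the end of the address, and that a pattern without $ matches as a prefix. The function names and test paths are invented for illustration.

```python
import re


def google_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Disallow pattern using * and $ into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and let each * match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def matches(pattern: str, path: str) -> bool:
    """True if the path falls under the Disallow pattern."""
    return google_pattern_to_regex(pattern).match(path) is not None


print(matches("/*.gif$", "/images/logo.gif"))      # True
print(matches("/*.gif$", "/images/logo.png"))      # False
print(matches("/*.gif$", "/images/logo.gif?x=1"))  # False: $ requires the end
print(matches("/*.gif", "/images/logo.gif?x=1"))   # True: prefix match
```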

 





If you copy the content of this article onto your own web site, please be fair and add this link to your page:

Original article by: <a href="http://www.rankspirit.com">RankSpirit, creating your web site</a>. Discover other articles from this site!
