- Allow
- Cache-delay
- Clean-param
- Comment
- Crawl-delay
- Disallow
- Host
- NoIndex
- Request-rate
- Robot-version
- Sitemap
- User-agent
- Visit-time
The Allow directive specifies paths that may be accessed by the designated crawlers. When no path is specified, the directive is ignored.
robots.txt:
allow: [path]
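For illustration, Allow is typically combined with Disallow to re-open a path inside an otherwise blocked directory (the paths below are hypothetical):
user-agent: *
disallow: /private/
allow: /private/overview.html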
References:
- Google robots.txt specifications
- Yandex robots.txt specifications
- Sean Conner: "An Extended Standard for Robot Exclusion"
- Martijn Koster: "A Method for Web Robots Control"
The Cache-delay directive specifies the minimum interval (in seconds) that a robot should wait after caching one page before starting to cache another.
robots.txt:
cache-delay: [seconds]
Note: This is an unofficial directive.
Library specific: When the value is requested but not found, the value of Crawl-delay is returned to maintain compatibility.
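As an illustration, the directive might appear in a group record like this (the robot name and the 30-second value are made up):
user-agent: ExampleBot
cache-delay: 30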
If page addresses contain dynamic parameters that do not affect the content (e.g. session identifiers, user identifiers, referrers, etc.), they can be described using the Clean-param directive.
robots.txt:
clean-param: [parameter]
clean-param: [parameter] [path]
clean-param: [parameter1]&[parameter2]&[...]
clean-param: [parameter1]&[parameter2]&[...] [path]
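For illustration, a record stripping a session identifier and a referrer parameter from a forum path might look like this (the parameter and path names are hypothetical):
clean-param: sid&ref /forum/showthread.php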
Comments that are intended to be sent back to the author/user of the robot. They can be used to, for example, provide contact information for white-listing requests, or to explain the robot policy of a site.
robots.txt:
comment: [text]
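A sketch of a possible use, with a hypothetical contact address:
user-agent: *
comment: Contact webmaster@example.com for white-listing requests.
disallow: /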
The Crawl-delay directive specifies the minimum interval (in seconds) that a robot should wait after loading one page before starting to load another.
robots.txt:
crawl-delay: [seconds]
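An illustrative record asking a crawler to wait at least 10 seconds between page loads (the robot name and the value are made up):
user-agent: ExampleBot
crawl-delay: 10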
The Disallow directive specifies paths that must not be accessed by the designated crawlers. When no path is specified, the directive is ignored.
robots.txt:
disallow: [path]
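For illustration, a record blocking two hypothetical directories for all crawlers:
user-agent: *
disallow: /admin/
disallow: /tmp/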
References:
- Google robots.txt specifications
- Yandex robots.txt specifications
- W3C Recommendation HTML 4.01 specification
- Sean Conner: "An Extended Standard for Robot Exclusion"
- Martijn Koster: "A Method for Web Robots Control"
- Martijn Koster: "A Standard for Robot Exclusion"
If a site has mirrors, the Host directive is used to indicate which site is the main one.
robots.txt:
host: [host]
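An illustrative entry marking the preferred mirror (the host name is hypothetical):
host: www.example.com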
The NoIndex directive is used to completely remove all traces of matching site URLs from the search engines.
robots.txt:
noindex: [path]
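A hedged example for a hypothetical path:
noindex: /drafts/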
The Request-rate directive specifies the maximum rate at which a robot may request pages, optionally restricted to a time window given as UTC timestamps.
robots.txt:
request-rate: [rate]
request-rate: [rate] [time]-[time]
Library specific: When the value is requested but not found, the value of Crawl-delay is returned to maintain compatibility.
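As an illustration, a rate of one page per five seconds, plus a rate of one page per ten seconds limited to a UTC time window (all values are hypothetical):
request-rate: 1/5
request-rate: 1/10 0600-0845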
Specifies which version of the Robot Exclusion Standard to use for parsing.
robots.txt:
robot-version: [version]
Note: Due to the different interpretations and robot-specific extensions of the Robot Exclusion Standard, it has been suggested that the version number is present more for documentation purposes than for content negotiation.
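For illustration, a file written against version 2.0 of the extended standard might declare (the version value is illustrative):
robot-version: 2.0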
The Sitemap directive is used to list URLs which describe the site structure.
robots.txt:
sitemap: [url]
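An illustrative entry pointing to a hypothetical sitemap location:
sitemap: http://example.com/sitemap.xml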
The User-agent directive is used as a start-of-group record, and specifies which user agent(s) the following rules apply to.
robots.txt:
user-agent: [name]
user-agent: [name]/[version]
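For illustration, a group addressing one specific crawler followed by a catch-all group (the robot name and paths are hypothetical):
user-agent: ExampleBot
disallow: /not-for-examplebot/

user-agent: *
disallow: /private/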
References:
- Google robots.txt specifications
- Yandex robots.txt specifications
- W3C Recommendation HTML 4.01 specification
- Sean Conner: "An Extended Standard for Robot Exclusion"
- Martijn Koster: "A Method for Web Robots Control"
- Martijn Koster: "A Standard for Robot Exclusion"
The robot is requested to visit the site only within the given Visit-time window.
robots.txt:
visit-time: [time]-[time]
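An illustrative window restricting visits to between 06:00 and 08:45 UTC:
visit-time: 0600-0845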