Directives

Allow

The allow directive specifies paths that may be accessed by the designated crawlers. When no path is specified, the directive is ignored.

robots.txt:

allow: [path]
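
For example, the following illustrative rules (the paths are hypothetical) block an entire directory but carve out a single page as an exception:

# Keep all robots out of /private/, except for the public statistics page
user-agent: *
disallow: /private/
allow: /private/stats.html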

See also:

References:

Cache-delay

The Cache-delay directive specifies the minimum interval (in seconds) for a robot to wait after caching one page, before starting to cache another.

robots.txt:

cache-delay: [seconds]
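
For example, an illustrative group asking a hypothetical robot named FooBot to wait 10 seconds between caching requests (assuming the same seconds format as Crawl-delay):

# FooBot should wait at least 10 seconds between caching requests
user-agent: FooBot
cache-delay: 10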

Note: This is an unofficial directive.

Library specific: When the value is requested but not found, the value of Crawl-delay is returned, to maintain compatibility.

See also:

Clean-param

If page addresses contain dynamic parameters that do not affect the content (e.g. session identifiers, user identifiers, referrers, etc.), they can be described using the Clean-param directive.

robots.txt:

clean-param: [parameter]
clean-param: [parameter] [path]
clean-param: [parameter1]&[parameter2]&[...]
clean-param: [parameter1]&[parameter2]&[...] [path]
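
For example, assuming a hypothetical site whose article pages take a session parameter sid and a referrer parameter ref that do not change the content:

# Ignore the sid and ref parameters on all URLs under /articles/
clean-param: sid&ref /articles/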

References:

Comment

The comment directive contains text that is meant to be relayed back to the author/user of the robot. It can be used to e.g. provide contact information for whitelisting requests, or to explain the robot policy of a site.

robots.txt:

comment: [text]
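
For example (the contact address is made up):

comment: Blocked robots may request whitelisting via webmaster@example.com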

References:

Crawl-delay

The Crawl-delay directive specifies the minimum interval (in seconds) for a robot to wait after loading one page, before starting to load another.

robots.txt:

crawl-delay: [seconds]
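
For example, an illustrative group asking a hypothetical robot named FooBot to wait 5 seconds between page loads:

# FooBot should wait at least 5 seconds between requests
user-agent: FooBot
crawl-delay: 5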

See also:

References:

Disallow

The disallow directive specifies paths that must not be accessed by the designated crawlers. When no path is specified, the directive is ignored.

robots.txt:

disallow: [path]
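
For example, illustrative rules that keep all robots out of a hypothetical admin area and block one specific robot entirely:

# No robot may access /admin/
user-agent: *
disallow: /admin/

# BadBot (hypothetical) may not access anything
user-agent: BadBot
disallow: /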

See also:

References:

Host

If a site has mirrors, the host directive is used to indicate which of them is the main one.

robots.txt:

host: [host]
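
For example, assuming a site that is served from several mirrors and prefers www.example.com as the main one:

# www.example.com is the main mirror
host: www.example.com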

NoIndex

The noindex directive is used to completely remove all traces of matching URLs from the search engines.

robots.txt:

noindex: [path]
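
For example, an illustrative rule asking for a hypothetical drafts section to be dropped from the index:

# Remove everything under /drafts/ from the index
noindex: /drafts/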

See also:

Request-rate

The request-rate directive specifies the maximum frequency at which a robot may request pages, given as a rate (documents per time interval), optionally restricted to a time window given as UTC timestamps.

robots.txt:

request-rate: [rate]
request-rate: [rate] [time]-[time]
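
Illustrative values, assuming the common documents-per-seconds rate notation and HHMM-HHMM UTC time windows:

user-agent: *
# At most 1 document per 5 seconds
request-rate: 1/5
# At most 1 document per 10 seconds between 02:00 and 06:00 UTC
request-rate: 1/10 0200-0600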

Library specific: When the value is requested but not found, the value of Crawl-delay is returned, to maintain compatibility.

See also:

References:

Robot-version

Specifies which version of the Robot exclusion standard to use for parsing.

robots.txt:

robot-version: [version]
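
For example (the version number is illustrative):

robot-version: 2.0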

Note: Due to the different interpretations and robot-specific extensions of the Robot exclusion standard, it has been suggested that the version number present is more for documentation purposes than for content negotiation.

References:

Sitemap

The sitemap directive is used to list sitemap URLs, which describe the site structure.

robots.txt:

sitemap: [url]
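
For example (the URL is made up); sitemap locations are given as absolute URLs:

# XML sitemap describing the site structure
sitemap: https://www.example.com/sitemap.xml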

References:

User-agent

The user-agent directive is used as a start-of-group record, and specifies which user agent(s) the following rules apply to.

robots.txt:

user-agent: [name]
user-agent: [name]/[version]
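
For example, two illustrative groups: the first applies to every robot, the second only to a hypothetical robot named FooBot:

# Rules for all robots
user-agent: *
disallow: /tmp/

# Rules for FooBot only
user-agent: FooBot
disallow: /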

References:

Visit-time

The robot is requested to visit the site only within the given visit-time window.

robots.txt:

visit-time: [time]-[time]
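
For example, an illustrative window (assuming HHMM-HHMM notation in UTC) asking robots to visit only between 01:00 and 05:45:

user-agent: *
# Please visit only between 01:00 and 05:45 UTC
visit-time: 0100-0545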

See also:

References: