Docs: Add information about the robots.txt file
arthurvr authored and alrra committed Sep 19, 2014
1 parent b784b61 commit d7682c2
Showing 2 changed files with 52 additions and 0 deletions.
dist/doc/misc.md: 26 additions & 0 deletions
@@ -7,6 +7,7 @@ table of contents](TOC.md)
* [.editorconfig](#editorconfig)
* [Server Configuration](#server-configuration)
* [crossdomain.xml](#crossdomainxml)
* [robots.txt](#robotstxt)

--

@@ -149,3 +150,28 @@ to the source domain and allow the client to continue with the transaction.

For more in-depth information, please see Adobe's [cross-domain policy file
specification](http://www.adobe.com/devnet/articles/crossdomain_policy_file_spec.html).


## robots.txt

The `robots.txt` file is used to give instructions to web robots about which
parts of the website can be crawled.

By default, the file provided by this project includes the following two
lines:

* `User-agent: *` - the following rules apply to all web robots
* `Disallow:` - everything on the website is allowed to be crawled

If you want to disallow crawling of certain pages, you will need to specify
the path in a `Disallow` directive (e.g.: `Disallow: /path`) or, if you want
to disallow crawling of all content, use `Disallow: /`.
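
As a purely illustrative sketch (the `/private/` path is hypothetical and not
part of this project's default file), a `robots.txt` that keeps the default
open policy for everything except one section might look like this:

```
# The rules below apply to all web robots
User-agent: *
# Hypothetical example: prevent crawling of the /private/ section
# (the default file instead uses an empty "Disallow:", which allows
# everything to be crawled)
Disallow: /private/

# To disallow crawling of the entire website, you would use:
# Disallow: /
```

Keep in mind that this only asks well-behaved robots not to crawl those URLs;
as noted below, it does not restrict access to the content itself.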

The `robots.txt` file is not intended for access control, so don't try to
use it as such. Think of it as a "No Entry" sign, rather than a locked door.
URLs disallowed by the `robots.txt` file might still be indexed without being
crawled, and the content of the `robots.txt` file itself can be viewed by
anyone, potentially disclosing the location of your private content! So, if
you want to block access to private content, use proper authentication
instead.

For more information about `robots.txt`, please see
[robotstxt.org](http://www.robotstxt.org/).
src/doc/misc.md: 26 additions & 0 deletions
@@ -7,6 +7,7 @@ table of contents](TOC.md)
* [.editorconfig](#editorconfig)
* [Server Configuration](#server-configuration)
* [crossdomain.xml](#crossdomainxml)
* [robots.txt](#robotstxt)

--

@@ -149,3 +150,28 @@ to the source domain and allow the client to continue with the transaction.

For more in-depth information, please see Adobe's [cross-domain policy file
specification](http://www.adobe.com/devnet/articles/crossdomain_policy_file_spec.html).


## robots.txt

The `robots.txt` file is used to give instructions to web robots about which
parts of the website can be crawled.

By default, the file provided by this project includes the following two
lines:

* `User-agent: *` - the following rules apply to all web robots
* `Disallow:` - everything on the website is allowed to be crawled

If you want to disallow crawling of certain pages, you will need to specify
the path in a `Disallow` directive (e.g.: `Disallow: /path`) or, if you want
to disallow crawling of all content, use `Disallow: /`.
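
As a purely illustrative sketch (the `/private/` path is hypothetical and not
part of this project's default file), a `robots.txt` that keeps the default
open policy for everything except one section might look like this:

```
# The rules below apply to all web robots
User-agent: *
# Hypothetical example: prevent crawling of the /private/ section
# (the default file instead uses an empty "Disallow:", which allows
# everything to be crawled)
Disallow: /private/

# To disallow crawling of the entire website, you would use:
# Disallow: /
```

Keep in mind that this only asks well-behaved robots not to crawl those URLs;
as noted below, it does not restrict access to the content itself.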

The `robots.txt` file is not intended for access control, so don't try to
use it as such. Think of it as a "No Entry" sign, rather than a locked door.
URLs disallowed by the `robots.txt` file might still be indexed without being
crawled, and the content of the `robots.txt` file itself can be viewed by
anyone, potentially disclosing the location of your private content! So, if
you want to block access to private content, use proper authentication
instead.

For more information about `robots.txt`, please see
[robotstxt.org](http://www.robotstxt.org/).
