Sitemap Generation
The original documentation is in two blog posts: http://gio.blog.archive.org/2015/02/02/ol-sitemap-generation/ and http://gio.blog.archive.org/2015/02/04/ol-how-to-generate-the-sitemaps/
Sitemaps are generated on ol-home, using the latest data dump, ol_dump.txt.gz, as the source file. See Generating Data Dumps for more on generating ol_dump.
The latest dump is available at http://openlibrary.org/data/ol_dump_works_latest.txt.gz. For more details, see https://openlibrary.org/developers/dumps.
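If you want to inspect the dump before generating sitemaps, the sketch below streams the work keys out of it. It assumes each line is tab-separated as type, key, revision, last_modified, JSON (check https://openlibrary.org/developers/dumps for the current layout). The function names are hypothetical, and this is not part of sitemap.py.

```python
import gzip
from typing import Iterable, Iterator


def iter_work_keys(lines: Iterable[str]) -> Iterator[str]:
    """Yield work keys such as '/works/OL45804W' from dump lines.

    Assumed line layout (tab-separated): type, key, revision, last_modified, JSON.
    """
    for line in lines:
        # maxsplit=4 keeps the JSON column intact even if it contained a tab
        fields = line.rstrip("\n").split("\t", 4)
        if len(fields) < 5:
            continue  # skip malformed lines rather than failing the whole run
        record_type, key = fields[0], fields[1]
        if record_type == "/type/work":
            yield key


def iter_work_keys_from_file(path: str) -> Iterator[str]:
    """Stream work keys from a gzipped dump without loading it into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        yield from iter_work_keys(f)
```

For example, `sum(1 for _ in iter_work_keys_from_file("ol_dump_works_latest.txt.gz"))` counts the works in the downloaded dump.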
To generate the sitemaps, run the following on ol-home:
```bash
ssh ol-home
cd /1/var/tmp  # output here can later be fetched via rsync://ol-home/var_1/tmp/
python /1/var/lib/openlibrary/deploy/openlibrary/openlibrary/data/sitemap.py ol_dump_works_latest.txt.gz
# Unconfirmed: this older script may be the correct one instead:
# /opt/openlibrary/openlibrary/scripts/2009/01/sitemaps/sitemap.py
```
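The internals of sitemap.py are not described here. As a rough illustration of what a generator like it has to do, the sketch below splits URLs into gzipped sitemap files of at most 50,000 URLs each (the per-file limit in the sitemaps.org protocol; the protocol's size limit is not enforced here) and writes a siteindex.xml.gz that points at them. The `sitemap_NNNN.xml.gz` naming scheme, `base_url`, and all function names are assumptions, not the script's actual behavior.

```python
import gzip
import os
from typing import Iterable, Iterator, List, TypeVar
from xml.sax.saxutils import escape

T = TypeVar("T")

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit in the sitemaps.org protocol


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    chunk: List[T] = []
    for item in items:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def render_urlset(urls: List[str]) -> str:
    """Build the body of one sitemap file (<urlset>)."""
    entries = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in urls)
    return f'<?xml version="1.0" encoding="UTF-8"?>\n<urlset xmlns="{SITEMAP_NS}">{entries}</urlset>\n'


def render_index(sitemap_urls: List[str]) -> str:
    """Build the body of the sitemap index (<sitemapindex>)."""
    entries = "".join(f"<sitemap><loc>{escape(u)}</loc></sitemap>" for u in sitemap_urls)
    return f'<?xml version="1.0" encoding="UTF-8"?>\n<sitemapindex xmlns="{SITEMAP_NS}">{entries}</sitemapindex>\n'


def write_sitemaps(urls: Iterable[str], out_dir: str, base_url: str,
                   chunk_size: int = MAX_URLS_PER_SITEMAP) -> List[str]:
    """Write gzipped sitemap files plus siteindex.xml.gz; return the URLs listed in the index."""
    os.makedirs(out_dir, exist_ok=True)
    sitemap_urls: List[str] = []
    for i, chunk in enumerate(chunked(urls, chunk_size)):
        name = f"sitemap_{i:04d}.xml.gz"  # hypothetical naming scheme
        with gzip.open(os.path.join(out_dir, name), "wt", encoding="utf-8") as f:
            f.write(render_urlset(chunk))
        sitemap_urls.append(f"{base_url}/{name}")
    with gzip.open(os.path.join(out_dir, "siteindex.xml.gz"), "wt", encoding="utf-8") as f:
        f.write(render_index(sitemap_urls))
    return sitemap_urls
```

Streaming the URLs through `chunked` keeps memory use bounded by one chunk, which matters for a dump with millions of works.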
After the sitemaps are generated, place them in /1/var/lib/openlibrary/sitemaps, the nginx root for /static/sitemaps defined in /olsystem/etc/nginx/sites-available/openlibrary.conf on ol-www1:
```nginx
location ~ ^/static/(docs|tour|sitemaps|jsondumps|images/shelfview|sampledump.txt.gz)(/.*)?$ {
    root /1/var/lib/openlibrary/sitemaps;
    autoindex on;
    rewrite ^/static/(.*)$ /$1 break;
}
```
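To see where a request ends up on disk, here is a simplified Python model of the location match and rewrite above. It handles only this one block (no other locations, URL decoding, or autoindex) and is an illustration, not a substitute for `nginx -t` or a real request. The function name is hypothetical.

```python
import re
from typing import Optional

# Patterns copied from the nginx location block above.
LOCATION_RE = re.compile(
    r"^/static/(docs|tour|sitemaps|jsondumps|images/shelfview|sampledump.txt.gz)(/.*)?$"
)
REWRITE_RE = re.compile(r"^/static/(.*)$")
ROOT = "/1/var/lib/openlibrary/sitemaps"


def resolve(uri: str) -> Optional[str]:
    """Return the file path nginx would serve for `uri`, or None if the location does not match."""
    if not LOCATION_RE.match(uri):
        return None
    # rewrite ^/static/(.*)$ /$1 break;  strips the /static prefix
    rewritten = REWRITE_RE.sub(r"/\1", uri)
    # root is prepended to the rewritten URI
    return ROOT + rewritten
```

`resolve("/static/sitemaps/siteindex.xml.gz")` gives `/1/var/lib/openlibrary/sitemaps/sitemaps/siteindex.xml.gz`, which is why the sitemap files sit one `sitemaps/` level below the configured root.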
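You can do this using rsync:
```bash
sudo rsync -av rsync://ol-home/var_1/tmp/sitemaps/sitemaps /1/var/lib/openlibrary/sitemaps/
```
Because the source path has no trailing slash, rsync copies the sitemaps directory itself, so the files land in /1/var/lib/openlibrary/sitemaps/sitemaps/ and are served under /static/sitemaps/.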
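Then confirm that the sitemap index is served:
```bash
curl -I https://openlibrary.org/static/sitemaps/siteindex.xml.gz
```
A successful response looks like this (captured on 4 February 2015):
```text
HTTP/1.1 200 OK
Server: nginx/1.1.19
Date: Wed, 04 Feb 2015 16:48:50 GMT
Content-Type: text/plain
Content-Length: 14689
Last-Modified: Wed, 04 Feb 2015 12:48:06 GMT
Connection: keep-alive
Accept-Ranges: bytes
```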
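Beyond the status code, you can check that the index actually decompresses and lists sitemaps. This is a minimal sketch assuming the index follows the sitemaps.org schema (a `<sitemapindex>` of `<sitemap><loc>` entries); the function names are hypothetical.

```python
import gzip
import urllib.request
import xml.etree.ElementTree as ET
from typing import List

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
SITEINDEX_URL = "https://openlibrary.org/static/sitemaps/siteindex.xml.gz"


def parse_siteindex(gz_bytes: bytes) -> List[str]:
    """Decompress a gzipped sitemap index and return the <loc> of each listed sitemap."""
    root = ET.fromstring(gzip.decompress(gz_bytes))
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS) if loc.text]


def fetch_siteindex(url: str = SITEINDEX_URL, timeout: float = 30.0) -> List[str]:
    """Download the index and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # nginx serves the .gz file as-is (Content-Type: text/plain in the
        # response above), so the body is still gzip-compressed.
        return parse_siteindex(resp.read())
```

Calling `fetch_siteindex()` returns the list of sitemap URLs; an empty list or a parse error suggests the upload did not complete.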
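To check when the sitemaps were last updated on each server, run this script:
```bash
#!/bin/bash
# Report the modification date of the sitemap index on each server.
SITEINDEX="/1/var/lib/openlibrary/sitemaps/sitemaps/siteindex.xml.gz"
SERVERS="ol-covers0 ol-www0"
for SERVER in $SERVERS; do
    # Field 6 of `ls -l --time-style=long-iso` is the date (YYYY-MM-DD);
    # awk splits on runs of whitespace, which is more robust than cut -d' '.
    LAST_UPDATED=$(ssh "$SERVER" ls -l --time-style=long-iso "$SITEINDEX" | awk '{print $6}')
    echo "Sitemaps on $SERVER were last updated on ${LAST_UPDATED}."
done
echo "Ensure that the file dates on servers ($SERVERS) are the first day of the current month."
```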
Example output:
```text
Sitemaps on ol-covers0 were last updated on 2023-07-01.
Sitemaps on ol-www0 were last updated on 2023-07-01.
Ensure that the file dates on servers (ol-covers0 ol-www0) are the first day of the current month.
```
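The final date check can also be automated. The sketch below parses the script's output and reports servers whose date is not the first day of the current month. The function names are hypothetical, and the parsing assumes the exact message format the script prints.

```python
import re
from datetime import date
from typing import Dict, List

# Matches the script's per-server line; the date is empty if ssh or ls failed.
LINE_RE = re.compile(r"^Sitemaps on (\S+) were last updated on (\S*)\.$")


def parse_script_output(text: str) -> Dict[str, str]:
    """Map server name -> date string from the check script's output."""
    result: Dict[str, str] = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            result[m.group(1)] = m.group(2)
    return result


def stale_servers(last_updated: Dict[str, str], today: date) -> List[str]:
    """Return servers whose sitemap date is not the first day of today's month."""
    expected = today.replace(day=1)
    stale: List[str] = []
    for server, stamp in sorted(last_updated.items()):
        try:
            updated = date.fromisoformat(stamp)
        except ValueError:
            stale.append(server)  # empty or unparseable date counts as stale
            continue
        if updated != expected:
            stale.append(server)
    return stale
```

For example, `stale_servers(parse_script_output(output), date.today())` returns an empty list when every server was refreshed on the first of the month.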