Crawl a microformats2 site to find things like canonical URLs for h-entry
s
Note: this module does not really handle pages with more than one top-level microformats2 nodes.
npm install crawl-mf2
Start a crawl and log canonical h-entry URLs found on https://strugee.net/blog/
:
var crawl = require('crawl-mf2');
var crawler = crawl('https://strugee.net/blog/');
crawler.on('h-entry', function(url, mf2node) {
console.log(url);
});
The module exports a single function, crawlMf2
, which takes a single argument, the base URL to crawl from.
It returns an EventEmitter
.
Emitted when an error occurs. Currently this means either the microformats2 parser failed or an HTTP error occurred.
Note: treated specially by Node.js.
String
The URL being discovered
Emitted when a new URL is discovered, including the initial base URL.
String
The URL being parsedObject
The parsed microformats2 node, returned bymicroformat-node
's.get()
Emitted when a URL is parsed for microformats2 markup.
String
The URL containing theh-feed
Object
The parsed microformats2 node, returned bymicroformat-node
's.get()
Emitted when an h-feed
page is discovered.
String
The URL containing theh-entry
Object
The parsed microformats2 node, returned bymicroformat-node
's.get()
Emitted when an h-entry
page is discovered.
LGPL 3.0+
AJ Jordan alex@strugee.net