tl;dr: Support WebSub and Webmention on a static site
lazymention is a daemon that you run on a server. When you publish to your static IndieWeb site, you ping lazymention, and it will poll your site, parse the microformats2 markup, and send Webmentions and WebSub notifications for content it hasn't seen before.
Webmention functionality is currently done. WebSub is planned for the future.
$ npm install -g --production lazymention
lazymention makes some maintainability tradeoffs that may result in denial-of-service problems if you allow input from sites you don't trust. Specifically:
- lazymention buffers all HTML pages in memory before parsing microformats2. In the future it may also retrieve multiple HTML pages per job. This may lead to high memory usage.
- lazymention's handling of microformats2 structures within pages is rather brittle. For example, if you mark an element as both an
h-entry
and anh-feed
, lazymention will probably crash. I will take PRs to fix these problems, especially those caused by nonmalicious markup, but in the meantime your process will be very dead. - lazymention does not bother to check if it's sending Webmentions to
localhost
or loopback addresses. See the Security Considerations section of the Webmention spec.
Please be aware of these constraints before running lazymention in production. The domains
option should help with this.
Also, lazymention has no idea how to recurse into subpages of indexes. In the real world, I expect that this will not actually be a real problem.
lazymention will install a command called, fittingly, lazymention
. You need to provide, at minimum, a storage location (you can do this on the CLI by passing -s
), but not configuring anything else will give you a daemon running on port 8000, which is probably no good.
You can provide configuration options in three ways: CLI flags, environment variables, and JSON configuration files - CLI flags override environment variables and environment variables override JSON configs. The default config file location is /etc/lazymention.json
, but you can override this by passing -c <path_to_config.json>
. There's a sample file called lazymention.json.sample
but be sure to actually look at it; lazymention won't work for you if you don't adjust it a little bit.
If you configure with environment variables, just prefix the configuration values with LAZYMENTION_
. CLI flags are just prefixed with --
, as you'd expect - just run lazymention --help
if this is confusing. You can pass more than one value for --domains
(e.g. --domains example.com example.net
), as well as passing --domains
more than once (e.g. --domains example.com --domains example.net
). If you need to specify more than one domain via environment variables, separate them using ,
.
You should also set NODE_ENV=production
in the environment, regardless of how you're configuring lazymention.
Here's what you can configure:
Name | Description | Type | Default |
---|---|---|---|
storage |
Where to put data. Must be read/writable by the server user. | String | None |
port |
Port for the HTTP server to bind to. | Number | 8000 |
address |
Address for the HTTP server to bind to. | String | 0.0.0.0 |
logger |
Whether to have the logger write logs. | Boolean | True |
logfile |
Where to write logs to. | String | stdout |
logLevel |
Bunyan loglevel for the logger. | String | info |
domains |
Whitelist of domains to allow in job submissions. | Array | None |
The storage
and domains
options are required. If you don't want to use a whitelist you can set domains
to []
, but you're strongly discouraged from doing so because of the security implications. In fact, the reason you have to set this manually is to make sure you know what you're doing.
There's exactly one API endpoint, /jobs/submit
. To trigger lazymention, POST to this endpoint with a JSON payload in the body. The JSON should contain a url
key whose value is the URL to crawl.
If everything goes well, you'll get back 202 Accepted. If something went wrong, you'll get back 400 Bad Request, unless the domain of the URL you provided isn't in the whitelist in which case you'll get 403 Forbidden.
In the future the reponse will contain a URL that lets you check the status of your job, but in the meantime all you get back is a lame 2xx status code.
There's currently one API client, ping-lazymention
. It's nice for gulp tasks.
Have you written another client? Please send a Pull Request updating this section!
lazymention has an embedding API, in case you want to reuse its functionality in another Node app. You can access both the overall Express application or just the job submission route.
It is safe to concurrently embed unlimited instances of either in the same process so long as they don't share the same data directory.
Note: your app would have to be AGPL 3.0 or later to comply with the licensing terms, but if this is a big problem I might consider relaxing the requirements so just ping me.
Available as require('lazymention').makeApp()
. You need to pass three arguments. In order:
- A configurations object that contains, at minimum, a
domain
key with the appropriate value - A Bunyan
Logger
object (or compatible) - A persistence factory function (see below)
Not providing these options as specified will raise AssertionError
.
The Express app is configured to parse JSON bodies and to compress response bodies. You can't disable the former because it's required for anything useful to happen and you can't disable the latter because that's a bad idea.
req.log
is configured to be a Bunyan child log and req.config
is set to the app's config. The API is mounted at /jobs/submit
, just like the standalone daemon.
Available as require('lazymention').makeRouter()
. You need to pass a single argument, a persistence factory function (see below). Not providing this argument will raise AssertionError
.
The Router expects certain things:
- A
req.body
with something useful in it (how to populate this is up to you) - A Bunyan
Logger
(or compatible) available asreq.log
- A configuration object available as
req.config
and containing, at minimum, adomains
key
Everything else is up to you.
To work with the embedding API you are required to provide some layer for persistence. You can do whatever you want in this layer; the default implementation reads and writes flat files. You can either reuse the default or implement your own.
You can get to the default implementation with require('lazymention').persistence(<path>)
. Calling this will return a factory function configured with the filesystem <path>
you provided.
If you don't want to use the default implementation, you need to write a factory function. This function will be passed a single parameter, a child of the Bunyan Logger
object you passed. Keep in mind that it will be invoked with a different Logger
for each request, so connecting out to a database or whatever each and every time this is invoked is probably a terrible idea.
When invoked, your factory function should return an object with two keys, get
and set
. set
must be a function that takes a namespace (string), key (string), an object to set as the value of that key, and a callback with signature (err<Error>)
. get
must be a function that takes a namespace, a lookup key and a callback with signature (err<Error>, data<Object>)
, data
being the object that was set with set
.
Values are guaranteed to be JSON-serializable but otherwise there are no restrictions on what they can be.
AJ Jordan alex@strugee.net
AGPL 3.0 or later