Clean HTML content for reading. Pass in your content as HTML and get back a readability object.
$ yarn add clean-html-js
import cleanHtml from "clean-html-js";

const url = "https://www.a11ywatch.com";

// Fetch the page yourself and pass the html along with its url.
async function grabReaderData() {
  const source = await fetch(url);
  const html = await source.text();
  return await cleanHtml(html, url);
}

// Pass an empty html string and let the url be used to fetch the page.
async function grabReaderDataSimple() {
  return await cleanHtml("", url);
}

grabReaderData().then((data) => {
  console.log(data);
});

// or just the url
grabReaderDataSimple().then((data) => {
  console.log(data);
});
param | default | type | description |
---|---|---|---|
html | "" | string | Required: html string to parse |
sourceUrl | "" | string | Optional: url of the html source to prevent fetching extra resources |
config | {} | Config | Optional: config object |
If html is not provided and sourceUrl is set, an attempt is made to fetch the html from sourceUrl.
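The fallback described above can be pictured with a minimal sketch. The helper name `resolveHtml` and the injectable `fetchImpl` parameter are assumptions made for illustration; they are not part of clean-html-js, and the library's internal logic may differ.

```javascript
// Hypothetical helper, not part of clean-html-js: picks the html to parse.
// fetchImpl is injectable so the sketch can run without network access.
async function resolveHtml(html = "", sourceUrl = "", fetchImpl = globalThis.fetch) {
  // Use the caller's html whenever it is non-empty.
  if (html) {
    return html;
  }
  // No html and no url: there is nothing to fetch.
  if (!sourceUrl) {
    return "";
  }
  // Fall back to fetching the page from sourceUrl.
  const response = await fetchImpl(sourceUrl);
  return response.text();
}
```

Injecting the fetch function keeps the sketch easy to try with a stub; by default it uses the global `fetch` available in browsers and recent Node versions.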
The props below merge with the default config:
prop | default | type | description |
---|---|---|---|
allowedTags | null | array of strings | html elements allowed (note: svgs must be inlined) |
nonTextTags | null | array of strings | html elements that should not be treated as text |
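As a rough illustration of how these props might combine with the defaults, here is a sketch that assumes a shallow merge in which caller-supplied values override the defaults. `defaultConfig` and `mergeConfig` are hypothetical names, and the tag list is only an example; the library's actual merge may behave differently.

```javascript
// Defaults as listed in the config table.
const defaultConfig = { allowedTags: null, nonTextTags: null };

// Hypothetical shallow merge: caller-supplied props override the defaults.
function mergeConfig(overrides = {}) {
  return { ...defaultConfig, ...overrides };
}

// Example override; the tag names are illustrative only.
const config = mergeConfig({ allowedTags: ["p", "h1", "h2", "a", "img"] });
console.log(config);

// The config is the third parameter in the params table:
// cleanHtml(html, url, config)
```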
To test custom pages, pass your params separated by commas into the jest test, for example yarn jest '-params=mozilla,https://www.mozilla.com' or yarn jest '-params=a11ywatch,https://www.a11ywatch.com'. The first param is the html file pulled from the examples folder and the second is an optional uri for the resources.
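To make the shape of the params argument concrete, the sketch below splits it into its two parts. `parseParams` is a hypothetical function written for illustration; how the project's jest test actually reads the argument is not shown here.

```javascript
// Hypothetical parser for the '-params=<name>,<uri>' argument.
function parseParams(arg) {
  const prefix = "-params=";
  if (!arg.startsWith(prefix)) {
    return null;
  }
  const value = arg.slice(prefix.length);
  const comma = value.indexOf(",");
  // Split on the first comma only, so a uri containing commas stays intact.
  const name = comma === -1 ? value : value.slice(0, comma);
  // The uri is optional; default to an empty string when it is missing.
  const uri = comma === -1 ? "" : value.slice(comma + 1);
  return { name, uri };
}

console.log(parseParams("-params=mozilla,https://www.mozilla.com"));
```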
npm test