Temme is a concise and convenient selector to extract JSON from HTML documents.
<!-- html used below -->
<ul>
<li data-fruit-id="1">
<span data-color="red">apple</span>
</li>
<li data-fruit-id="2">
<span data-color="white">pear</span>
</li>
<li data-fruit-id="3">
<span data-color="purple">grape</span>
</li>
</ul>
We could use the following temme selector to extract an array of fruit colors and names from the above HTML. (Online Version)
import temme from 'temme'
// Note the `.default` if you are using commonjs
// const temme = require('temme').default
const selector = `li@fruits {
span[data-color=$color]{$name};
}`
// `html` is a string containing the markup shown above
temme(html, selector)
//=>
// {
// "fruits": [
// { "color": "red", "name": "apple" },
// { "color": "white", "name": "pear" },
// { "color": "purple", "name": "grape" }
// ]
// }
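Because the result is plain JSON-like data, it can be post-processed with ordinary JavaScript. The sketch below copies the output shown above as a literal instead of calling temme, and the helper name `colorByName` is hypothetical, used only for illustration.

```javascript
// Literal copy of the result shown above; temme is not called here.
const result = {
  fruits: [
    { color: 'red', name: 'apple' },
    { color: 'white', name: 'pear' },
    { color: 'purple', name: 'grape' },
  ],
}

// Hypothetical helper (not part of temme): index each fruit's color by its name.
function colorByName(fruits) {
  return Object.fromEntries(fruits.map(({ name, color }) => [name, color]))
}

const colors = colorByName(result.fruits)
console.log(colors.pear)
```

The same approach applies to any captured array: each item is a regular object whose keys are the capture names used in the selector.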
If you are not familiar with temme, you could start with this Stack Overflow example. There are some short examples in the online playground. This example extracts commit information from the GitHub commits page, including time, author, commit message and links. This example extracts issue information from the GitHub issues page, including title, assignee and number of comments.
- 01-introduction
- 02-value-capture
- 03-array-capture
- 04-multiple-selector
- 05-assignments
- 06-javascript
- 07-filters
- 08-modifiers
- 09-procedures
- 10-snippets
Version 0.8 introduces some breaking changes, mainly to introduce the modifier feature and to replace content with procedures. The CaptureResult class has also been greatly simplified; please see the documentation of CaptureResult.
If you still need the documentation for the old version, you can find it here.
content/procedure no longer supports multiple parts. You need to write the selector once for each part:
const prev = `div{ $text; find('foo', $bar); }`
const current = `
div{ $text };
div{ find('foo', $bar) };
`
In procedures, temme does not provide the special filters. Instead, temme provides some built-in procedures for similar tasks:
const prev = `
div{ $t|text };
div{ $h|html };
div{ $n|node };
div{ $o|outerHTML };
`
const current = `
div{ text($t) };
div{ html($h) };
div{ node($n) };
// no outerHTML procedure yet
`
Note: because the outerHTML API itself is somewhat special, temme does not provide an outerHTML procedure for now. If you need it, please use the JavaScript API manually.
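For the three special filters that do have a built-in procedure, the rewrite shown above is mechanical. The following is a minimal sketch of a helper that applies it to a selector string. The function `migrateSpecialFilters` is hypothetical and not part of temme; it is a plain regular-expression rewrite that assumes capture names consist of word characters, and it does not check that the match sits inside a content block.

```javascript
// Hypothetical migration helper, not part of temme.
// Rewrites `$name|text`, `$name|html` and `$name|node` into
// `text($name)`, `html($name)` and `node($name)`.
// `$name|outerHTML` is left untouched because there is no outerHTML procedure.
function migrateSpecialFilters(selector) {
  // `\b` keeps longer filter names that merely start with text/html/node from matching.
  const pattern = /\$(\w+)\s*\|\s*(text|html|node)\b/g
  return selector.replace(pattern, (_match, name, proc) => `${proc}($${name})`)
}

const prev = `
div{ $t|text };
div{ $h|html };
div{ $n|node };
div{ $o|outerHTML };
`

const migrated = migrateSpecialFilters(prev)
console.log(migrated)
```

Any `|outerHTML` usage that remains after the rewrite has to be handled by hand, as described in the note above.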
Please use the get filter instead.