# html-extraction

Here are 15 public repositories matching this topic...

html-extract / hext

Domain-specific language for extracting structured data from HTML documents

ruby python html php node cpp dsl scraping data-extraction html-extraction

Updated Nov 3, 2024
C++

zanachka / number-parser

Parse numbers written in natural language

text-extraction html-extraction

Updated Oct 25, 2024
Python

zanachka / python-readability

Fast Python port of arc90's readability tool, updated to match the latest readability.js!

text-extraction html-extraction

Updated Oct 15, 2024
Python

Whomrx666 / Xtract-htmlV2

Xtract-htmlV2 is a tool for fetching the HTML code of a website of your choice; it is the successor to the earlier version.

linux extract termux kali-linux html-extraction html-extractor termux-tool xtract-htmlv2

Updated Oct 15, 2024
Python

9dl / HTML-Dumper

Extracts and saves HTML, CSS, and JavaScript files from a specified URL.

web-scraping html-extraction

Updated Oct 14, 2024
C#

Whomrx666 / Xtract-html

Xtract-html is a tool for extracting the HTML display code from a website, which you can then reuse on your own website.

linux html termux kali-linux html-extraction html-extractor termux-tool xtract-html

Updated Aug 31, 2024
Python

zanachka / extruct

Extract embedded metadata from HTML markup

text-extraction html-extraction

Updated May 30, 2024
Python

miso-belica / sumy

Module for automatic summarization of text documents and HTML pages.

python nlp pagerank-algorithm text-extraction reduction summarization html-page summary lsa sumy textteaser summarizer html-extraction html-extractor

Updated May 16, 2024
Python

bookieio / breadability

Reworked https://www.readability.com/ parsing library (https://mercury.postlight.com/ is now the living alternative)

python text-mining text-extraction html-parsing html-extraction html-extractor

Updated May 9, 2024
HTML

zanachka / dateparser

Python parser for human-readable dates

text-extraction html-extraction

Updated Apr 12, 2024
Python

zanachka / price-parser

Extract price amount and currency symbol from a raw text string

text-extraction html-extraction

Updated Oct 18, 2023
Python

zanachka / article-extraction-benchmark

Article extraction benchmark: dataset and evaluation scripts

text-extraction html-extraction

Updated Jul 22, 2021
Python

zanachka / html-text

Extract text from HTML

text-extraction html-extraction

Updated Oct 21, 2020
HTML

zanachka / jusText

Heuristic-based boilerplate removal tool

text-extraction html-extraction

Updated Oct 21, 2020
Python

shmdoc / unit-parser

Script for extracting units from http://vocab.nerc.ac.uk/collection/P06/current/ to easily add units to the database (this should only be temporary, to demonstrate how units can work)

linked-open-data html-extraction

Updated Jul 27, 2020
HTML
