A one-stop, open-source, high-quality data extraction tool for converting PDF to Markdown and JSON.
Updated Dec 13, 2024 (Python)
A parser turns its input (often text in the form of a file) into a more useful representation (usually a specific data structure in memory) so that a particular task can be performed on it.
Common examples include:
Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
File parser optimised for lossless LLM ingestion 🧠 Parses PDF, DOCX, and PPTX files into a format that is ideal for LLMs.
oletools - Python tools to analyze MS OLE2 files (Structured Storage, Compound File Binary Format) and MS Office documents, for malware analysis, forensics, and debugging.
RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
Type-safe YAML parser and validator.
A Python module to repair invalid JSON, commonly used to parse the output of LLMs.
Cinemagoer is a Python package for retrieving and managing data from the IMDb movie database (with which we are not affiliated in any way) about movies, people, characters, and companies.
High-accuracy NLP parser with models for 11 languages.
Wiktionary dump file parser and multilingual data extractor
Uses the tokenized query returned by python-sqlparse to generate query metadata.
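To make the definition of a parser concrete, here is a minimal hand-written recursive-descent parser. It is an illustrative sketch only: the arithmetic grammar, the `Parser` class, and the `parse`/`evaluate` helpers are hypothetical and not taken from any of the projects listed above. It turns an input string into an in-memory tree of nested tuples, then uses that tree for a specific task (evaluation).

```python
import operator
import re

# Hypothetical grammar for this sketch:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'
# A tree node is either an int or a tuple (operator, left, right).

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


class Parser:
    def __init__(self, text):
        # Tokens are runs of digits or any single non-whitespace character.
        self.tokens = re.findall(r"\d+|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())  # loop makes operators left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok.isdigit():
            return int(tok)
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    """Turn an arithmetic string into a nested-tuple parse tree."""
    return Parser(text).parse()


def evaluate(node):
    """Walk the parse tree and compute its value."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))


if __name__ == "__main__":
    tree = parse("2 * (3 + 4) - 5")
    print(tree)            # ('-', ('*', 2, ('+', 3, 4)), 5)
    print(evaluate(tree))  # 9
```

Each grammar rule maps to one method, which is what makes recursive descent easy to write by hand. Toolkits such as Lark instead generate the parser from a grammar file, which scales better once the grammar grows beyond a few rules.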