This repository contains the code for a tree-sitter parser for wikitext, the markup language used by MediaWiki.
According to tree-sitter's website:
Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. Tree-sitter aims to be:
- General enough to parse any programming language
- Fast enough to parse on every keystroke in a text editor
- Robust enough to provide useful results even in the presence of syntax errors
- Dependency-free so that the runtime library (which is written in pure C) can be embedded in any application
Further documentation:
- Writing parsers in the `grammar.js` file: Tree-sitter "Creating parsers" documentation
- Using the resulting parser in Python with the `tree-sitter` Python module: Tree-sitter "Using parsers" documentation (Python)
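Once the parser has produced a tree in Python, converting wikitext usually starts with a walk over the concrete syntax tree. The sketch below is illustrative and not part of this repository. It relies only on the `type`, `children`, `start_byte` and `end_byte` attributes that py-tree-sitter `Node` objects expose. The `FakeNode` class and the node type names (`document`, `heading`, `text`) are hypothetical stand-ins so the helpers can run without a compiled grammar. The real node names are whatever this grammar's `grammar.js` defines.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


def walk(root) -> Iterator[Tuple[int, object]]:
    """Yield (depth, node) pairs in pre-order, without recursion.

    An explicit stack avoids Python's recursion limit on deeply nested pages.
    """
    stack = [(0, root)]
    while stack:
        depth, node = stack.pop()
        yield depth, node
        # Push children in reverse so the leftmost child is visited first.
        for child in reversed(node.children):
            stack.append((depth + 1, child))


def node_text(source: bytes, node) -> str:
    """Return the source text a node covers, using its byte offsets."""
    return source[node.start_byte:node.end_byte].decode("utf-8")


def dump_tree(source: bytes, root) -> List[str]:
    """Render the tree as indented lines; leaf nodes also show their text."""
    lines = []
    for depth, node in walk(root):
        line = "  " * depth + node.type
        if not node.children:
            line += " " + repr(node_text(source, node))
        lines.append(line)
    return lines


# Hypothetical stand-in for a py-tree-sitter Node (same attribute names),
# used only to exercise the helpers without a compiled wikitext grammar.
@dataclass
class FakeNode:
    type: str
    start_byte: int
    end_byte: int
    children: List["FakeNode"] = field(default_factory=list)


if __name__ == "__main__":
    src = b"== Title =="
    root = FakeNode("document", 0, 11, [
        FakeNode("heading", 0, 11, [FakeNode("text", 3, 8)]),
    ])
    print("\n".join(dump_tree(src, root)))
```

With a real parse, `root` would be `tree.root_node` for the `tree` returned by a py-tree-sitter `Parser`'s `parse(source_bytes)` call. How the `Parser` is given the wikitext language differs between py-tree-sitter versions, so that setup is left to the Python documentation. Real trees also contain anonymous token nodes (such as the `==` delimiters); iterating over `node.named_children` instead of `node.children` is one way to skip them.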
The parser was written by Miel Peeters at GhentCDH to parse and convert the content of a MediaWiki website.