Merge pull request #35 from drkane/add-plugin-support

Add plugin support
kanedata · Jul 9, 2023 · 1961244 · 1961244
2 parents b1457a6 + 36b8100
commit 1961244

Showing 27 changed files with 872 additions and 190 deletions.
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
@@ -0,0 +1,28 @@
+name: docs 
+
+on:
+  push:
+    branches:
+      - main
+permissions:
+  contents: write
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    strategy:
+      max-parallel: 4
+      matrix:
+        python-version: ["3.10"]
+    steps:
+    - uses: actions/checkout@v3
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v4
+      with:
+        python-version: ${{ matrix.python-version }}
+        cache: pip
+    - name: Install dependencies
+      run: |
+        pip install -e .
+        pip install hatch
+    - name: Deploy docs
+      run: hatch run docs:deploy
diff --git a/README.md b/README.md
@@ -8,8 +8,12 @@
 
 A python module for getting useful data out of ixbrl files. The library is at an early stage - feedback and improvements are very welcome.
 
+Full documentation is available at [dkane.net/ixbrl-parse/](https://dkane.net/ixbrl-parse/)
+
 ## Changelog
 
+**New in version 0.7.0**: Add plugin support. Add documentation.
+
 **New in version 0.6.0**: Switch to use the [hatch](https://hatch.pypa.io/latest/) build and development system.
 
 **New in version 0.5.4**: Added backreferences to BeautifulSoup objects - thanks to @avyfain for PR.
@@ -62,8 +66,6 @@ python -m ixbrlparse -h
 
 ### Use as a python module
 
-An example of usage is shown in [`test.py`](test.py).
-
 #### Import the `IXBRL` class which parses the file.
 
 ```python
@@ -159,7 +161,7 @@ Note that the error catching is only available for parsing of `.nonnumeric`
 and `.numeric` items in the document. Any other errors with parsing will be
 thrown as normal no matter what `raise_on_error` is set to.
 
-## Code checks
+## Development
 
 The module is set up for development using [hatch](https://hatch.pypa.io/latest/).
 

diff --git a/docs/changelog.md b/docs/changelog.md
@@ -0,0 +1,13 @@
+# Changelog
+
+**New in version 0.7.0**: Add plugin support. Add documentation.
+
+**New in version 0.6.0**: Switch to use the [hatch](https://hatch.pypa.io/latest/) build and development system.
+
+**New in version 0.5.4**: Added backreferences to BeautifulSoup objects - thanks to @avyfain for PR.
+
+**New in version 0.5.3**: Support for `exclude` and `continuation` elements within XBRL documents. Thanks to @wcollinscw for adding support for exclude elements.
+
+**New in version 0.5**: Support for Python 3.11 has been added. I've had some problems with Python 3.11 and Windows as lxml binaries aren't yet available. Also new in version 0.5 is type checking - the whole library now has types added. 
+
+**New in version 0.4**: I've added initial support for pure XBRL files as well as tagged HTML iXBRL files. Feedback on this feature is welcome - particularly around getting values out of numeric items.
diff --git a/docs/command-line.md b/docs/command-line.md
@@ -0,0 +1,22 @@
+# Command line
+
+You can run the module directly to extract data from an IXBRL file.
+
+```bash
+ixbrlparse example_file.html
+# or
+python -m ixbrlparse example_file.html
+```
+
+The available options can be listed with the `-h` flag:
+
+```bash
+python -m ixbrlparse -h
+# optional arguments:
+#   -h, --help            show this help message and exit
+#   --outfile OUTFILE     Where to output the file
+#   --format {csv,json,jsonlines,jsonl}
+#                         format of the output
+#   --fields {numeric,nonnumeric,all}
+#                         Which fields to output
+```
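If you want to drive the command line from a script, one option is to assemble the argument list in Python. The sketch below uses only the flags listed in the help output above; the helper name `build_ixbrlparse_command` and the file names are hypothetical, and the command is only printed, not executed.

```python
import shlex


def build_ixbrlparse_command(infile, outfile=None, fmt=None, fields=None):
    """Assemble the argument list for `python -m ixbrlparse` (hypothetical helper)."""
    command = ["python", "-m", "ixbrlparse"]
    if outfile is not None:
        command += ["--outfile", outfile]
    if fmt is not None:
        command += ["--format", fmt]
    if fields is not None:
        command += ["--fields", fields]
    # The input file is the positional argument
    command.append(infile)
    return command


command = build_ixbrlparse_command(
    "example_file.html", outfile="facts.csv", fmt="csv", fields="numeric"
)
# Print the shell-quoted form of the command
print(shlex.join(command))
```

To actually run it, the list could be handed to `subprocess.run(command, check=True)`, which requires ixbrlparse to be installed in the active environment.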
diff --git a/docs/development.md b/docs/development.md
@@ -0,0 +1,62 @@
+# Development
+
+The module is set up for development using [hatch](https://hatch.pypa.io/latest/).
+
+## Run tests
+
+Tests can be run with `pytest`:
+
+```bash
+hatch run test
+```
+
+## Test coverage
+
+Run tests, then report on coverage:
+
+```bash
+hatch run cov
+```
+
+Run tests, then run a server showing where coverage is missing:
+
+```bash
+hatch run cov-html
+```
+
+## Run typing checks
+
+```bash
+hatch run lint:typing
+```
+
+## Linting
+
+Black and ruff should be run before committing any changes.
+
+To check for any changes needed:
+
+```bash
+hatch run lint:style
+```
+
+To run any autoformatting possible:
+
+```bash
+hatch run lint:fmt
+```
+
+## Run all checks at once
+
+```bash
+hatch run lint:all
+```
+
+## Publish to PyPI
+
+```bash
+hatch build
+hatch publish
+git tag v<VERSION_NUMBER>
+git push origin v<VERSION_NUMBER>
+```
diff --git a/docs/index.md b/docs/index.md
@@ -0,0 +1,29 @@
+# ixbrlParse
+
+![Test status](https://github.com/drkane/ixbrl-parse/workflows/tests/badge.svg)
+[![PyPI version](https://img.shields.io/pypi/v/ixbrlparse)](https://pypi.org/project/ixbrlparse/)
+![PyPI - Python Version](https://img.shields.io/pypi/pyversions/ixbrlparse)
+![PyPI - License](https://img.shields.io/pypi/l/ixbrlparse)
+
+A python module for getting useful data out of ixbrl files. The library is at an early stage - feedback and improvements are very welcome.
+
+## Requirements
+
+The module requires [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) and [lxml](https://lxml.de/) to parse the documents.
+
+[word2number](https://github.com/akshaynagpal/w2n) is used to process the
+numeric items with the `numsenwords` format.
+
+## How to install
+
+You can install from PyPI using pip:
+
+```bash
+pip install ixbrlparse
+```
+
+## Acknowledgements
+
+Originally developed for a project with 
+[Power to Change](https://www.powertochange.org.uk/) looking at how to extract data from 
+financial documents of community businesses.
diff --git a/docs/plugins.md b/docs/plugins.md
@@ -0,0 +1,59 @@
+# Plugins
+
+The module allows for plugins to customize functionality, using the [pluggy](https://pluggy.readthedocs.io/en/stable/) framework.
+
+The only current plugin endpoint is to add more formatters. A formatter takes a value from an ixbrl item and converts it into the appropriate Python value. For example, the `ixtNumWordsEn` formatter would take a value like "eighty-five" and turn it into 85.
+
+The formats used within ixbrl files can vary between schemas and countries. Rather than try to cover everything in this module, you can write a plugin to support the format that you need.
+
+## Creating a plugin
+
+### Create a custom format class
+
+To create a plugin, you first need to create a new format class that subclasses `ixbrlparse.ixbrlFormat`. This has two key components:
+
+- a `format_names` attribute which consists of a tuple of possible names for the format. These are the values that will be checked against the ixbrl items. These names must not clash with other formats that have already been defined.
+- a `parse_value` function which takes the original text value and returns the processed value.
+
+An example class might look like this (in the file `ixbrlparse-dateplugin/ixbrlparse_dateplugin.py`):
+
+```python
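+import datetime
+
+import ixbrlparse
+
+
+class ixtParseIsoDate(ixbrlparse.ixbrlFormat):
+    # Trailing comma makes this a one-element tuple rather than a string
+    format_names = ("isodateformat",)
+
+    def parse_value(self, value):
+        # Convert a string such as "2018-03-31" into a date object
+        return datetime.datetime.strptime(value, "%Y-%m-%d").astimezone().date()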
+```
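One Python detail matters for `format_names`: parentheses alone do not make a tuple, so a single name needs a trailing comma. The sketch below shows the difference, alongside the date-parsing step as a plain function so it runs without ixbrlparse installed. The function name `parse_iso_date` is hypothetical.

```python
import datetime


def parse_iso_date(value):
    """Parse a YYYY-MM-DD string into a date (hypothetical standalone helper)."""
    return datetime.datetime.strptime(value.strip(), "%Y-%m-%d").date()


# Without the trailing comma the parentheses only group the expression,
# so the result is a plain string.
not_a_tuple = ("isodateformat")
a_tuple = ("isodateformat",)

print(type(not_a_tuple).__name__)  # str
print(type(a_tuple).__name__)  # tuple
print(parse_iso_date("2018-03-31"))  # 2018-03-31
```

If `format_names` were accidentally a string, any code that iterates over it would see individual characters rather than one format name.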
+
+### Hook into ixbrlparse
+
+Next you need to add a function which will hook into ixbrlparse at the right point. This function must be called `ixbrl_add_formats` and must return a list of new format classes (add it to the bottom of `ixbrlparse-dateplugin/ixbrlparse_dateplugin.py`):
+
+```python
+@ixbrlparse.hookimpl
+def ixbrl_add_formats():
+    return [ixtParseIsoDate]
+```
+
+You then need to add an entry point to `setup.py` or to `pyproject.toml`, which will be activated when your project is installed. This should look something like the following (using an example `ixbrlparse-dateplugin/setup.py`):
+
+```python
+from setuptools import setup
+
+setup(
+    name="ixbrlparse-dateplugin",
+    install_requires="ixbrlparse",
+    entry_points={"ixbrlparse": ["dateplugin = ixbrlparse_dateplugin"]},
+    py_modules=["ixbrlparse_dateplugin"],
+)
+```
+
+### Install the plugin
+
+Once the plugin is installed, ixbrlparse should pick it up automatically and include the additional formats when checking the format of each item.
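To check whether an installed plugin is visible, you can list the entry points registered under the `ixbrlparse` group using the standard library. This is a minimal sketch that assumes ixbrlparse discovers plugins through that entry-point group, as the `setup.py` example suggests. The helper name `plugins_in_group` is hypothetical and not part of ixbrlparse.

```python
from importlib.metadata import entry_points


def plugins_in_group(group):
    """Return 'name = module' strings for entry points in a group (hypothetical helper)."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns an object with a select() method
        selected = eps.select(group=group)
    else:
        # Older versions return a dict keyed by group name
        selected = eps.get(group, [])
    return sorted(f"{ep.name} = {ep.value}" for ep in selected)


# An empty list means no plugin is registered in the current environment
print(plugins_in_group("ixbrlparse"))
```

With the example plugin installed, the list would be expected to contain an entry for `dateplugin`.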
+
+## Acknowledgements
+
+The implementation of pluggy used here draws heavily on [pluggy's own tutorial](https://pluggy.readthedocs.io/en/stable/#a-complete-example) and @simonw's [implementation of plugins for datasette](https://docs.datasette.io/en/stable/plugins.html).
diff --git a/docs/python-module.md b/docs/python-module.md
@@ -0,0 +1,96 @@
+# Python module
+
+## Import the `IXBRL` class which parses the file.
+
+```python
+from ixbrlparse import IXBRL
+```
+
+## Initialise an object and parse the file
+
+You need to pass a file handle or other object with a `.read()` method.
+
+```python
+with open('sample_ixbrl.html', encoding="utf8") as a:
+  x = IXBRL(a)
+```
+
+If your IXBRL data comes as a string then use an `io.StringIO` wrapper to
+pass it to the class:
+
+```python
+import io
+from ixbrlparse import IXBRL
+
+content = '''<some ixbrl content>'''
+x = IXBRL(io.StringIO(content))
+```
+
+
+## Get the contexts and units used in the data
+
+These are held in the object. The contexts are stored as a dictionary with the context
+id as the key, and an `ixbrlContext` object as the value.
+
+```python
+print(x.contexts)
+# {
+#    "cfwd_2018_03_31": ixbrlContext(
+#       id="cfwd_2018_03_31",
+#       entity="0123456", # company number
+#       segments=[], # used for hypercubes
+#       instant="2018-03-31",
+#       startdate=None, # used for periods
+#       enddate=None, # used for periods
+#    ),
+#    ....
+# }
+```
+
+The units are stored as key:value dictionary entries:
+
+```python
+print(x.units)
+# {
+#    "GBP": "ISO4217:GBP",
+#    "shares": "shares"
+# }
+```
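Unit values may carry a prefix before the code, separated by a colon, while others (such as `shares`) have none. The sketch below separates the two parts. It assumes unit strings follow the `prefix:code` shape shown in the sample output, and `split_unit` is a hypothetical helper rather than part of ixbrlparse.

```python
def split_unit(unit):
    """Split a unit string into (prefix, code); prefix is None if there is no colon."""
    prefix, separator, code = unit.partition(":")
    if not separator:
        return None, unit
    return prefix, code


# Hypothetical stand-in for `x.units`
example_units = {"GBP": "ISO4217:GBP", "shares": "shares"}
for key, unit in example_units.items():
    print(key, split_unit(unit))
```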
+
+## Get financial facts
+
+Numeric facts are stored in `x.numeric` as a list of `ixbrlNumeric` objects.
+The `ixbrlNumeric.value` object contains the value as a parsed python number
+(after the sign and scale formatting values have been applied).
+
+`ixbrlNumeric.context` holds the context object relating to this value.
+The `.name` and `.schema` values give the key of this value, according to
+the applied schema.
+
+Non-numeric facts are stored in `x.nonnumeric` as a list of `ixbrlNonNumeric`
+objects, with similar `.value`, `.context`, `.name` and `.schema` values. 
+The value of `.value` will be a string for non-numeric facts.
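As an illustration, facts can be flattened into plain dictionaries for export or further analysis. This sketch relies only on the `.schema`, `.name`, `.value` and `.context` attributes described above, and assumes the context exposes an `instant` attribute matching the keyword shown in the contexts example. The sample objects are hypothetical stand-ins so the code runs without a real file.

```python
from types import SimpleNamespace


def facts_to_rows(facts):
    """Flatten fact objects into a list of plain dictionaries."""
    rows = []
    for fact in facts:
        rows.append(
            {
                "schema": fact.schema,
                "name": fact.name,
                "value": fact.value,
                # Assumed attribute: set for point-in-time contexts, else None
                "instant": getattr(fact.context, "instant", None),
            }
        )
    return rows


# Hypothetical stand-ins for ixbrlNumeric objects; with a parsed
# document you would pass `x.numeric` instead.
example_context = SimpleNamespace(instant="2018-03-31", startdate=None, enddate=None)
example_facts = [
    SimpleNamespace(
        schema="uk-gaap", name="NetAssets", value=1500.0, context=example_context
    ),
]
rows = facts_to_rows(example_facts)
print(rows[0]["name"], rows[0]["value"], rows[0]["instant"])
```

The same function should work for `x.nonnumeric`, since those objects are described as having similar attributes.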
+
+## Check for any parsing errors
+
+By default, the parser will throw an exception if it encounters an error
+when processing the document.
+
+You can pass `raise_on_error=False` to the initial object to suppress
+these exceptions. You can then access a list of the errors (and the elements
+that caused them) through the `.errors` attribute. For example:
+
+```python
+with open('sample_ixbrl.html', encoding="utf8") as a:
+  x = IXBRL(a, raise_on_error=False)
+  print(x.errors) # populated with any exceptions found
+  # [ eg...
+  #   {
+  #     "error": <NotImplementedError>,
+  #     "element": <BeautifulSoupElement>
+  #   }
+  # ]
+```
+
+Note that the error catching is only available for parsing of `.nonnumeric`
+and `.numeric` items in the document. Any other errors with parsing will be
+thrown as normal no matter what `raise_on_error` is set to.
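When a document produces many errors, a quick count by exception type can show what is going wrong. The sketch below assumes each entry in `.errors` is a dictionary with `"error"` and `"element"` keys, as in the example above. The sample data and the `summarise_errors` helper are hypothetical.

```python
from collections import Counter


def summarise_errors(errors):
    """Count parsing errors by exception class name."""
    return Counter(type(entry["error"]).__name__ for entry in errors)


# Hypothetical stand-in for `x.errors`
example_errors = [
    {"error": NotImplementedError("unknown format"), "element": "<element 1>"},
    {"error": ValueError("bad number"), "element": "<element 2>"},
    {"error": NotImplementedError("unknown format"), "element": "<element 3>"},
]
summary = summarise_errors(example_errors)
print(summary.most_common())
```

A large number of `NotImplementedError` entries could indicate a format that a plugin might need to handle, though that depends on how the library reports unsupported formats.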
diff --git a/docs/reference.md b/docs/reference.md
@@ -0,0 +1,21 @@
+# API documentation
+
+## ixbrlparse.IXBRL
+
+::: src.ixbrlparse.core.IXBRL
+
+## ixbrlparse.ixbrlFormat
+
+::: src.ixbrlparse.components._base.ixbrlFormat
+
+## ixbrlparse.ixbrlContext
+
+::: src.ixbrlparse.components.context.ixbrlContext
+
+## ixbrlparse.ixbrlNonNumeric
+
+::: src.ixbrlparse.components.nonnumeric.ixbrlNonNumeric
+
+## ixbrlparse.ixbrlNumeric
+
+::: src.ixbrlparse.components.numeric.ixbrlNumeric