This module interfaces distance sampling analysis engines from the Distance software (and possibly others in the future); it has been designed to make it easier:
- to run (in parallel) numerous Distance Sampling analyses with many (many) parameter variants on many field observation samples (possibly using some optimisation techniques for automated computation of right and left distance truncations),
- to select the best analysis variant results through a mostly automated process, based on customisable statistical quality indicators,
- to produce partly customisable reports in spreadsheet (numerical results only) and HTML formats (more complete, with full-featured plots like in Distance, and more).
For now, only the Windows MCDS.exe engine, versions 6.x (Distance 6 to 7.3) and 7.4 (Distance 7.4 and 7.5 at least), and Point Transect analyses are supported; as a consequence, pyaudisam runs only under Windows.
The module itself has been tested extensively with:
- python 3.12.3
- numpy 1.26.4
- pandas 2.2.2
- openpyxl 3.1.2
- xlrd 2.0.1 (only for .xls format support)
- odfpy 1.4.1
- jinja2 3.1.4
- matplotlib 3.8.4
- packaging 24.0
- zoopt 0.4.2
It probably works as is with somewhat earlier versions, but not below python 3.9 (as specified in setup.py) or pandas 2.1; in any case, run the whole test suite first to make sure (a quick version check is sketched below).
If you need Python 3.8 compatibility, you can:
- use the 1.1.0 release (but you'll be limited to pandas 1.x),
- tweak this (source) release, but at your own risk: you will certainly have some fixes to make (hint: run the whole test suite to see what breaks).
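If you want to quickly check that your current environment meets these minima before diving into the test suite, here is a minimal sketch (it only uses the standard library and the packaging dependency already listed above):

# Minimal sketch: check the interpreter and pandas versions against the minima
# stated above (python >= 3.9, pandas >= 2.1); still, run the whole test suite.
import sys
from importlib.metadata import version
from packaging.version import Version

assert sys.version_info >= (3, 9), 'pyaudisam requires python 3.9 or later'
assert Version(version('pandas')) >= Version('2.1'), 'pyaudisam requires pandas 2.1 or later'
print('Version minima OK; now run the test suite to make sure.')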
As for testing dependencies:
- pytest, pytest-cov,
- plotly (sometimes, in old notebooks).
You can install pyaudisam from PyPI in your current python environment (conda or venv, whatever):
pip install pyaudisam
Or from a downloaded source package:
pip install pyaudisam-1.1.0.tar.gz
Or from a downloaded wheel package:
pip install pyaudisam-1.1.0-py3-none-any.whl
Or even directly from GitHub:
pip install git+https://github.com/denmedius/pyaudisam.git@1.1.0
pip install git+https://github.com/denmedius/pyaudisam.git@main
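Whichever installation method you choose, a quick way to check that pyaudisam is actually visible in your environment (a minimal sketch, standard library only):

# Minimal sketch: check that pyaudisam is installed, and print its version.
from importlib.metadata import version

print(version('pyaudisam'))  # should print the installed version, e.g. 1.1.0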
As a python package, pyaudisam can be used through its python API.
But there's also a command-line interface: try and run it with the -h/--help option.
python -m pyaudisam --help
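If you prefer staying in python, the same help text can also be obtained programmatically; a minimal sketch (nothing pyaudisam-specific here, just the standard library):

# Minimal sketch: invoke the pyaudisam command-line interface from python,
# equivalent to running "python -m pyaudisam --help" in a terminal.
import subprocess
import sys

subprocess.run([sys.executable, '-m', 'pyaudisam', '--help'], check=True)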
Whichever method you choose, the best way to get going is to read the concrete quick-start guide (see the documentation below); be aware that you'll need to install an external Distance Sampling engine, like MCDS, to run analyses with pyaudisam. The available documentation:
- a short "how it works" guide to understand the basics (also in French),
- a concrete "quick-start" guide with a real life use case and relevant field data to play with,
- another similar but shorter concrete "quick-start" guide (in French, command-line only), with the full field data set of the "ACDC 2019" birding study.
Note: You can also get a detailed idea of how to use the pyaudisam python API by playing with the fully functional jupyter notebook tests/valtests.ipynb (see Running tests below for how to obtain and run it).
You first need to clone the source tree, or download and extract a source package: once done, look in the tests sub-folder, everything's inside.
Then, you need to install test dependencies:
pip install pyaudisam[test]
Some tests are fully automated; simply run:
pytest
For code coverage during tests, simply run:
pytest --cov
Or even, if you want an HTML report with annotated code coverage:
pytest --cov --cov-report html
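By default, pytest-cov writes this HTML report to the htmlcov folder; a minimal sketch to open it from python once the run is done (assuming the default output location was not changed):

# Minimal sketch: open the annotated coverage report produced by
# 'pytest --cov --cov-report html' (htmlcov/index.html by default).
import webbrowser
from pathlib import Path

webbrowser.open(Path('htmlcov/index.html').resolve().as_uri())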
Notes:
- With pandas 2.2, you'll face the following warning a few times when running some tests; as the future behaviour is not known yet, we don't know how to fix this, so we left things as they are. When using later pandas versions, once the current behaviour is actually removed, you may have to fix things yourself (a way to simply silence the warning is sketched after these notes):
FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
- The whole test suite has been fully automated (even if code coverage is not 100%); the old test suite implemented as jupyter notebooks is still available though (see tests/unintests.ipynb and tests/valtests.ipynb).
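As hinted above, if the pandas 2.2 FutureWarning gets too noisy in your own runs, a possible workaround is simply to filter it out; a minimal sketch (it silences the warning, it does not address the underlying behaviour change):

# Minimal sketch: silence (not fix) the pandas 2.2 FutureWarning quoted above.
import warnings

warnings.filterwarnings(
    'ignore',
    message='The behavior of DataFrame concatenation with empty or all-NA entries',
    category=FutureWarning,
)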
To build pyaudisam PyPI source and binary packages, you need:
- a source tree (clone the source tree or download and extract a source package),
- a python environment where pyaudisam works,
- the build module (to be installed through pip, for example).
Then, it's as simple as:
python -m build
You'll get 2 files in the dist folder (e.g. for version 1.1.0):
- the wheel package:
pyaudisam-1.1.0-py3-none-any.whl
- the source package:
pyaudisam-1.1.0.tar.gz
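A quick way to check what the build actually produced (a minimal sketch, to be run from the source tree root):

# Minimal sketch: list the packages produced by 'python -m build' in the dist folder.
from pathlib import Path

for pkg in sorted(Path('dist').glob('pyaudisam-*')):
    print(pkg.name)  # expect the .whl and .tar.gz files listed above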
Merge requests are very welcome! And if you are short of ideas, here are some good ones below ;-)
- documentation:
- complement the quick start guides above with other small and focused articles explaining some mandatory details:
- how to build a sample or analysis specification workbook (see a short draft in analyser.py:273),
- ...
- write technical documentation for the whole module and its sub-modules,
- write a guide for building the module API documentation (sphinx should work out of the box, as reStructuredText is used in the docstrings),
- code quality and tests:
- add more tests for improving code coverage (thanks to HTML coverage report),
- configure and run pylint, and follow its useful advice,
- main: split _Application._run into feature sub-functions for clarity,
- features:
- add support for line transects (only point transects for the moment),
- add support for the co-variates feature of MCDS,
- integrate the notebook prototype of "final reports" (workbook, HTML, and OpenDoc text formats) to automate most of the work of producing a publication-grade "full results appendix" for a Distance Sampling study (based on the auto-filtered report, but with semi-automated diagnosis at sample and analysis level in order to help in the final choice for each sample),
- add more features for selecting sample data before running analyses (to avoid having to create multiple data sets, run multiple analysis sessions, and then re-aggregate results and reports manually): excluding some specific transects, pre-truncating data above some fixed distance, ...
- packaging:
- also publish pyaudisam on conda-forge, probably following this recipe,
- platform support:
- make pyaudisam work under Linux / macOS (everything python-side is OK, but ... it calls MCDS.exe, which runs exclusively under Windows):
- or: through some kind of external client-server interface to MCDS.exe (that runs only under Windows),
- or: by porting MCDS to Linux (closed Fortran source, but old, so it might be obtained through a polite request on this Distance Sampling forum; but you'll need an IMSL license, which is horribly expensive),
- or: by rewriting MCDS from scratch, or by porting the MRDS Distance package to Python,
- or: by rewriting MCDS using the MRDS Distance package, meaning some kind of interface to R,
- user interface:
- build a GUI for the pyaudisam command-line (with some kind of "project" concept, parameter set templates, and ...),
- ...
Some known bugs:
- AnalysisResultsSet.toOpenDoc sometimes produces broken header rows (the 3-row multi-index header is not rendered as expected on the right side),
- The new undocumented MCDS 7.4 result column names are not translated correctly (their French and English translations are switched) (minor, as they are not actually used for the moment),
- The "Details" table header is not translated in auto-filtered reports (whereas the "Synthesis" one is),
- Too many decimals rendered for the Max/Min dist figures in HTML reports when the distance unit is "meter",
- The colorisation of HTML and workbook auto-filtered reports is of little use as it is now (it needs a full rework).
You can read them here :-)
Some formal things that I don't plan to change (let's concentrate on substantive content) :-)
- this code is not black-formatted or isort-ed, nor fully PEP 8 conformant (but it's clean, commented, and it works),
- the identifier naming scheme used is old-fashioned: camel case everywhere.