
Proposal to use pre-commit for continuous integration
dachengx committed Sep 30, 2023
1 parent 5aba752 commit 9f02265
Showing 87 changed files with 6,680 additions and 6,215 deletions.
1 change: 0 additions & 1 deletion .gitattributes
@@ -6,4 +6,3 @@
# Isolate binary files in case the auto-detection algorithm fails and
# marks them as text files (which could brick them).
*.{png,jpg,jpeg,gif,webp,woff,woff2} binary

2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug_report.md
@@ -23,4 +23,4 @@ A clear and concise description of what you expected to happen.
If applicable, add screenshots to help explain your problem.

**Versions**
Please add the version of strax and any related package
Please add the version of strax and any related package
41 changes: 41 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,41 @@
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- id: check-added-large-files

- repo: https://github.com/psf/black
rev: 23.7.0
hooks:
- id: black
args: [--safe, --line-length=100, --preview]
- id: black-jupyter
args: [--safe, --line-length=100, --preview]
language_version: python3.9

- repo: https://github.com/pycqa/docformatter
rev: v1.7.5
hooks:
- id: docformatter

- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.5.1
hooks:
- id: mypy
additional_dependencies: [
types-PyYAML, types-tqdm, types-pytz,
types-requests, types-setuptools,
]

- repo: https://github.com/pycqa/flake8
rev: 6.1.0
hooks:
- id: flake8

ci:
autoupdate_schedule: weekly
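
With this configuration in place, contributors would typically install the git hook once and can then run all checks over the whole tree on demand. A minimal sketch of that workflow (assuming the standard `pre-commit` CLI is installed, e.g. via `pip install pre-commit`; shown here as a small Python helper rather than raw shell commands):

```python
import subprocess

# Install the git pre-commit hook defined by .pre-commit-config.yaml
# (assumes `pip install pre-commit` has already been run in this environment).
subprocess.run(["pre-commit", "install"], check=True)

# Run every configured hook (black, docformatter, mypy, flake8, ...)
# against all files in the repository, not just the staged ones.
subprocess.run(["pre-commit", "run", "--all-files"], check=True)
```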
10 changes: 5 additions & 5 deletions CONTRIBUTING.md
@@ -1,26 +1,26 @@
## Contribution guidelines

You're welcome to contribute to strax!
You're welcome to contribute to strax!

Currently, many features are still in significant flux, and the documentation is still very basic. Until more people start getting involved in development, we're probably not even following our own advice below...

### Please fork
Please work in a fork, then submit pull requests.
Please work in a fork, then submit pull requests.
Only maintainers sometimes work in branches, and only if there is a good reason for it.

### No large files
Avoid committing large (> 100 kB) files. We'd like to keep the repository to no more than a few MB.

For example, do not commit jupyter notebooks with high-resolution plots (clear the output first), or long configuration files, or binary test data.
For example, do not commit jupyter notebooks with high-resolution plots (clear the output first), or long configuration files, or binary test data.

While it's possible to rewrite history to remove large files, this is a bit of work and messes with the repository's consistency. Once data has gone to master it's especially difficult: there's a risk of others merging the files back in later unless they cooperate in the history rewriting.

This is one reason to prefer forks over branches; if you commit a huge file by mistake it's just in your fork.
This is one reason to prefer forks over branches; if you commit a huge file by mistake it's just in your fork.

### Code style
Of course, please write nice and clean code :-)

PEP8-compatibility is great (you can test with flake8) but not as important as other good coding habits such as avoiding duplication. See e.g. the [famous beyond PEP8 talk](https://www.youtube.com/watch?v=wf-BqAjZb8M).
PEP8-compatibility is great (you can test with flake8) but not as important as other good coding habits such as avoiding duplication. See e.g. the [famous beyond PEP8 talk](https://www.youtube.com/watch?v=wf-BqAjZb8M).

In particular, don't go into code someone else is maintaining to "PEP8-ify" it (or worse, use some automatic styling tool).

42 changes: 21 additions & 21 deletions HISTORY.md
@@ -113,10 +113,10 @@ New Contributors

1.2.2 / 2022-05-11
---------------------
- Add option to ignore errors in multirun loading (#653)
- Add option to ignore errors in multirun loading (#653)
- Auto version, fix #217 (#689)
- Add basics documentation - split Config and Plugin docs (#691)
- Add n_hits comment in code (#692)
- Add n_hits comment in code (#692)
- Rechunker script (#686)


@@ -129,7 +129,7 @@ New Contributors

1.2.0 / 2022-03-09
---------------------
- Added lone hit area to area per channel (#649)
- Added lone hit area to area per channel (#649)

1.1.8 / 2022-03-08
---------------------
@@ -159,7 +159,7 @@ New Contributors
- deprecate py3.6 py3.7 (#636)
- remove deprecated function (#632)
- Numba 0.55 (#634)


1.1.5 / 2022-01-10
---------------------
@@ -172,17 +172,17 @@ New Contributors
1.1.4 / 2021-12-16
---------------------
- Make truly HDR (#613)
- Remove tight coincidence channel from data_type (#614)
- Remove tight coincidence channel from data_type (#614)


1.1.3 / 2021-12-13
---------------------
- Add mode and tags to superrun. (#593)
- cache deps (#595)
- Fix online monitor bug for only md stored (#596)
- Add mode and tags to superrun. (#593)
- cache deps (#595)
- Fix online monitor bug for only md stored (#596)
- speedup get_source with lookupdict (#599)
- remove config warning and infer_dtype=False (#600)
- Require pymongo 3.* (#611)
- Require pymongo 3.* (#611)


1.1.2 / 2021-11-19
@@ -198,7 +198,7 @@ New Contributors

Notes:
- PRs #569, #586, #587 may cause a lot of warnings for options


1.1.1 / 2021-10-27
---------------------
@@ -213,23 +213,23 @@ Notes:
major / minor:

- Fix hitlet splitting (#549)
- Add tight channel (#551)
- Add tight channel (#551)

patch:

- Add read by index plus some extra checks (#529)
- Add drop column option (#530)
- Remove context.apply_selection (#531)
- Add option to support superruns for storage frontends. Adds test (#532)
- Fix issue #536 (#537)
- Fix issue #536 (#537)
- Two pbar patches (#538)
- Add get_zarr method to context (#540)
- Add get_zarr method to context (#540)
- Broken metadata error propagation (#541)
- few tests for MongoStorage frontend (#542)
- Fix caching (#545)
- Fix caching (#545)
- dds information about failing chunk (#548)
- remove rucio (#552)
- Allow loading SaveWhen.EXPLICIT time range selection (#553)
- remove rucio (#552)
- Allow loading SaveWhen.EXPLICIT time range selection (#553)
- Changes to savewhen behavior (#554)


@@ -275,7 +275,7 @@ patch:
- Remove outdated files/configs (#462)
- Remove overwrite from options (#467)


0.15.3 / 2021-06-03
---------------------
- Match cached buffer chunk start times OverlapWindowPlugin (#450)
@@ -297,7 +297,7 @@ patch:
---------------------
- Refactor hitlets (#430, #436)
- Update classifiers for pipy #437
- Allow Py39 in travis tests (#427)
- Allow Py39 in travis tests (#427)

0.15.0 / 2021-04-16
---------------------
@@ -310,7 +310,7 @@ patch:

0.14.0 / 2021-04-09
---------------------
- Check data availability for single run (#416)
- Check data availability for single run (#416)

0.13.11 / 2021-04-02
---------------------
@@ -346,7 +346,7 @@ patch:

0.13.4 / 2021-01-22
---------------------
- Nveto changes + highest density regions (#384)
- Nveto changes + highest density regions (#384)
- Parse requirements for testing (#383)
- Added keep_columns into docstring (#382)
- remove slow operators from mongo storage (#382)
@@ -522,7 +522,7 @@ patch:
------------------
- Small bugfixes:
- Fixes for multi-output plugins
- Use frozendict for Plugin.takes_config
- Use frozendict for Plugin.takes_config

0.8.6 / 2020-01-17
-------------------
1 change: 0 additions & 1 deletion README.md
@@ -16,4 +16,3 @@ Streaming analysis for xenon experiments
Strax is an analysis framework for pulse-only digitization data, specialized for live data reduction at speeds of 50-100 MB(raw) / core / sec. For more information, please see the [strax documentation](https://strax.readthedocs.io).

Strax's primary aim is to support noble liquid TPC dark matter searches, such as XENONnT. The XENON-specific algorithms live in the separate package [straxen](https://github.com/XENONnT/straxen). If you want to try out strax, you probably want to start there. This package only contains the core framework and basic algorithms any TPC would want to use.

2 changes: 1 addition & 1 deletion docs/make_docs.sh
@@ -3,4 +3,4 @@ make clean
rm -r source/reference
sphinx-apidoc -o source/reference ../strax
rm source/reference/modules.rst
make html
make html
1 change: 0 additions & 1 deletion docs/source/advanced/config.rst
@@ -326,4 +326,3 @@ URL style configuration (used in `straxen <https://github.com/XENONnT/straxen>`_
kwargs[k] = v
return self.dispatch(url, **kwargs)
18 changes: 9 additions & 9 deletions docs/source/advanced/out_of_core.rst
@@ -12,18 +12,18 @@ Out-of-core algorithms usually involve a few repeating steps:
2. load the data chunk by chunk
3. perform some computation on each chunk
4. save a summary of the results for each chunk
5. combine the per-chunk results into a final result.
5. combine the per-chunk results into a final result.

While it is of course possible to implement these operations yourself, doing so is tedious and repetitive, and the code becomes tightly coupled to the specific calculations being performed.
A better approach is to use abstractions of commonly performed operations that use out-of-core algorithms under the hood to get the same result as if the operations were performed on the entire dataset.
Code written using these abstractions can then run on in-memory and out-of-core datasets alike.
More importantly, the implementations of these algorithms can be written once, packaged, and then used by all.
More importantly, the implementations of these algorithms can be written once, packaged, and then used by all.
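
To make the pattern concrete, the sketch below spells out steps 2-5 with a stand-in chunk iterator; ``iter_chunks`` and the numbers involved are purely illustrative and not part of strax.

.. code-block:: python

    import numpy as np

    def iter_chunks(n_chunks=10, chunk_size=1_000):
        """Stand-in for a real chunked data source (illustrative only)."""
        for _ in range(n_chunks):
            yield np.random.random(chunk_size)

    # Per-chunk summaries (a running sum and a count) are combined into
    # the final result, so the full dataset is never held in memory at once.
    total, n = 0.0, 0
    for chunk in iter_chunks():   # step 2: load the data chunk by chunk
        total += chunk.sum()      # steps 3-4: compute and keep a small summary
        n += chunk.size
    mean_value = total / n        # step 5: combine per-chunk results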

Data chunking
-------------
The zarr package provides an abstraction of the data-access API of numpy arrays for chunked and compressed data stored in memory or on disk.
zarr provides an array abstraction that behaves identically to a numpy array when accessing data, but where the underlying data is actually a collection of (optionally compressed) chunks.
The strax context provides a convenience method for loading data directly into zarr arrays.
The strax context provides a convenience method for loading data directly into zarr arrays.

.. code-block:: python
@@ -35,35 +35,35 @@ the strax context provides a convenience method for loading data directly into z
zgrp = context.get_zarr(RUN_IDs, DATA_TYPES, **GET_ARRAY_KWARGS)
# the zarr group contains multiple arrays, one for each data type
z = zgrp.data_type
z = zgrp.data_type
# individual arrays are also accessible via the __getitem__ interface
z = zgrp['data_type']
# numpy-like data access, abstracting away the underlying
# data reading, which may include reading multiple chunks from disk/memory
# and decompression then concatenation to return an in memory numpy array
# and decompression then concatenation to return an in memory numpy array
z[:100]
Data processing
---------------
The dask package provides abstractions for most of the numpy and pandas APIs.
The dask.Array and dask.DataFrame objects implement their respective APIs
The dask.Array and dask.DataFrame objects implement their respective APIs
using fully distributed algorithms, only loading a fraction of the total data into memory
at any given moment for a given computing partition (thread/process/HPC-job).

.. code-block:: python
import dask.array as da
# easily convert to dask.Array abstraction for processing
darr = da.from_zarr(z)
darr = da.from_zarr(z)
# it's recommended to rechunk to sizes more appropriate for processing
# see dask documentation for details
darr.rechunk(CHUNK_SIZE)
# you can also convert the dask.Array abstraction
# to a dask.DataFrame abstraction if you need the pandas api
ddf = darr.to_dask_dataframe()
ddf = darr.to_dask_dataframe()
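
Dask evaluates these objects lazily: the calls above only build a task graph, and no data is read or computed until a result is explicitly requested. A minimal sketch of triggering the actual out-of-core computation (the ``area`` column is illustrative and depends on the data type that was loaded):

.. code-block:: python

    # Still lazy: this only extends the task graph
    mean_area = ddf["area"].mean()

    # .compute() runs the chunked, out-of-core evaluation and returns
    # an ordinary in-memory result
    print(mean_area.compute())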
4 changes: 2 additions & 2 deletions docs/source/advanced/superrun.rst
@@ -108,7 +108,7 @@ If you wish to make/store a superrun you have to specify the context option:
st.set_context_config({'write_superruns': True})
Superruns follow the same saving rules (SaveWhen.TARGET, SaveWhen.EXPLICIT or SaveWhen.ALWAYS) as regular runs.
Superruns follow the same saving rules (SaveWhen.TARGET, SaveWhen.EXPLICIT or SaveWhen.ALWAYS) as regular runs.
How superruns work
--------------------
@@ -128,4 +128,4 @@ but which gains from the file are actually used is dependent on the runid.
Thus, superruns won't help build data faster, but they will speed up loading data after it has been
built. This is important, because strax's overhead for loading a run is larger than hax's, due to its
version and option tracking (this is only true if per-run-default options are allowed).
version and option tracking (this is only true if per-run-default options are allowed).
2 changes: 1 addition & 1 deletion docs/source/basics/context.svg
2 changes: 1 addition & 1 deletion docs/source/basics/overview.rst
@@ -199,4 +199,4 @@ You can check the lineage e.g. by using the ``context.key_for`` method (which co
some_run-peaks-3g2rc4f3jg
some_run-peaks-vqo4oamp76
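
Both keys above come from calls of the following form (a minimal sketch, assuming a configured context ``st``; the trailing hash changes whenever the plugin lineage or options change):

.. code-block:: python

    # The key encodes the run id, the data type and a hash of the full lineage
    print(st.key_for('some_run', 'peaks'))
    # e.g. some_run-peaks-3g2rc4f3jg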
For more examples, check out the developer and advanced documentation.
For more examples, check out the developer and advanced documentation.
31 changes: 16 additions & 15 deletions docs/source/build_release_notes.py
@@ -9,17 +9,17 @@


def convert_release_notes():
"""Convert the release notes to an RST page with links to PRs"""
"""Convert the release notes to an RST page with links to PRs."""
this_dir = os.path.dirname(os.path.realpath(__file__))
notes = os.path.join(this_dir, '..', '..', 'HISTORY.md')
with open(notes, 'r') as f:
notes = os.path.join(this_dir, "..", "..", "HISTORY.md")
with open(notes, "r") as f:
notes = f.read()
rst = convert(notes)
with_ref = ''
for line in rst.split('\n'):
with_ref = ""
for line in rst.split("\n"):
# Get URL for PR
if '#' in line:
pr_number = line.split('#')[1]
if "#" in line:
pr_number = line.split("#")[1]
while len(pr_number):
try:
pr_number = int(pr_number)
@@ -28,15 +28,15 @@ def convert_release_notes():
# Too many trailing characters to be an int
pr_number = pr_number[:-1]
if pr_number:
line = line.replace(f'#{pr_number}',
f'`#{pr_number} <https://github.com/AxFoundation/strax/pull/{pr_number}>`_'
)
with_ref += line + '\n'
target = os.path.join(this_dir, 'reference', 'release_notes.rst')
line = line.replace(
f"#{pr_number}",
f"`#{pr_number} <https://github.com/AxFoundation/strax/pull/{pr_number}>`_",
)
with_ref += line + "\n"
target = os.path.join(this_dir, "reference", "release_notes.rst")

with open(target, 'w') as f:
f.write(header+with_ref)
with open(target, "w") as f:
f.write(header + with_ref)


if __name__ == '__main__':
if __name__ == "__main__":
convert_release_notes()
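
For reference, the substitution performed inside that loop turns a plain PR reference from HISTORY.md into an RST hyperlink; a minimal standalone illustration of the same replacement:

```python
# Illustrative example of the replacement applied by the loop above
line = "- Rechunker script (#686)"
pr_number = 686
line = line.replace(
    f"#{pr_number}",
    f"`#{pr_number} <https://github.com/AxFoundation/strax/pull/{pr_number}>`_",
)
# line is now:
# "- Rechunker script (`#686 <https://github.com/AxFoundation/strax/pull/686>`_)"
```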