
Proposal to use pre-commit for continuous integration
dachengx committed Sep 30, 2023
1 parent 5aba752 commit 9f02265
Showing 87 changed files with 6,680 additions and 6,215 deletions.
1 change: 0 additions & 1 deletion .gitattributes
@@ -6,4 +6,3 @@
# Isolate binary files in case the auto-detection algorithm fails and
# marks them as text files (which could brick them).
*.{png,jpg,jpeg,gif,webp,woff,woff2} binary

2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/bug_report.md
@@ -23,4 +23,4 @@ A clear and concise description of what you expected to happen.
If applicable, add screenshots to help explain your problem.

**Versions**
Please add the version of strax and any related package
Please add the version of strax and any related package
41 changes: 41 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,41 @@
# See https://pre-commit.com for more information
# See https://pre-commit.com/hooks.html for more hooks
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- id: check-added-large-files

- repo: https://github.com/psf/black
rev: 23.7.0
hooks:
- id: black
args: [--safe, --line-length=100, --preview]
- id: black-jupyter
args: [--safe, --line-length=100, --preview]
language_version: python3.9

- repo: https://github.com/pycqa/docformatter
rev: v1.7.5
hooks:
- id: docformatter

- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.5.1
hooks:
- id: mypy
additional_dependencies: [
types-PyYAML, types-tqdm, types-pytz,
types-requests, types-setuptools,
]

- repo: https://github.com/pycqa/flake8
rev: 6.1.0
hooks:
- id: flake8

ci:
autoupdate_schedule: weekly
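
With this configuration in place, contributors would typically install the git hook once and can then run all checks over the whole tree on demand. A minimal sketch of that workflow (assuming the standard `pre-commit` CLI is installed, e.g. via `pip install pre-commit`; shown here as a small Python helper rather than raw shell commands):

```python
import subprocess

# Install the git pre-commit hook defined by .pre-commit-config.yaml
# (assumes `pip install pre-commit` has already been run in this environment).
subprocess.run(["pre-commit", "install"], check=True)

# Run every configured hook (black, docformatter, mypy, flake8, ...)
# against all files in the repository, not just the staged ones.
subprocess.run(["pre-commit", "run", "--all-files"], check=True)
```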
10 changes: 5 additions & 5 deletions CONTRIBUTING.md
@@ -1,26 +1,26 @@
## Contribution guidelines

You're welcome to contribute to strax!
You're welcome to contribute to strax!

Currently, many features are still in significant flux, and the documentation is still very basic. Until more people start getting involved in development, we're probably not even following our own advice below...

### Please fork
Please work in a fork, then submit pull requests.
Please work in a fork, then submit pull requests.
Only maintainers sometimes work in branches, and only if there is a good reason for it.

### No large files
Avoid committing large (> 100 kB) files. We'd like to keep the repository to no more than a few MB.

For example, do not commit jupyter notebooks with high-resolution plots (clear the output first), or long configuration files, or binary test data.
For example, do not commit jupyter notebooks with high-resolution plots (clear the output first), or long configuration files, or binary test data.

While it's possible to rewrite history to remove large files, this is a bit of work and messes with the repository's consistency. Once data has gone to master it's especially difficult: there's a risk of others merging the files back in later unless they cooperate in the history rewriting.

This is one reason to prefer forks over branches; if you commit a huge file by mistake it's just in your fork.
This is one reason to prefer forks over branches; if you commit a huge file by mistake it's just in your fork.

### Code style
Of course, please write nice and clean code :-)

PEP8-compatibility is great (you can test with flake8) but not as important as other good coding habits such as avoiding duplication. See e.g. the [famous beyond PEP8 talk](https://www.youtube.com/watch?v=wf-BqAjZb8M).
PEP8-compatibility is great (you can test with flake8) but not as important as other good coding habits such as avoiding duplication. See e.g. the [famous beyond PEP8 talk](https://www.youtube.com/watch?v=wf-BqAjZb8M).

In particular, don't go into code someone else is maintaining to "PEP8-ify" it (or worse, use some automatic styling tool).

42 changes: 21 additions & 21 deletions HISTORY.md
@@ -113,10 +113,10 @@ New Contributors

1.2.2 / 2022-05-11
---------------------
- Add option to ignore errors in multirun loading (#653)
- Add option to ignore errors in multirun loading (#653)
- Auto version, fix #217 (#689)
- Add basics documentation - split Config and Plugin docs (#691)
- Add n_hits comment in code (#692)
- Add n_hits comment in code (#692)
- Rechunker script (#686)


@@ -129,7 +129,7 @@ New Contributors

1.2.0 / 2022-03-09
---------------------
- Added lone hit area to area per channel (#649)
- Added lone hit area to area per channel (#649)

1.1.8 / 2022-03-08
---------------------
@@ -159,7 +159,7 @@ New Contributors
- deprecate py3.6 py3.7 (#636)
- remove deprecated function (#632)
- Numba 0.55 (#634)


1.1.5 / 2022-01-10
---------------------
@@ -172,17 +172,17 @@ New Contributors
1.1.4 / 2021-12-16
---------------------
- Make truly HDR (#613)
- Remove tight coincidence channel from data_type (#614)
- Remove tight coincidence channel from data_type (#614)


1.1.3 / 2021-12-13
---------------------
- Add mode and tags to superrun. (#593)
- cache deps (#595)
- Fix online monitor bug for only md stored (#596)
- Add mode and tags to superrun. (#593)
- cache deps (#595)
- Fix online monitor bug for only md stored (#596)
- speedup get_source with lookupdict (#599)
- remove config warning and infer_dtype=False (#600)
- Require pymongo 3.* (#611)
- Require pymongo 3.* (#611)


1.1.2 / 2021-11-19
@@ -198,7 +198,7 @@ New Contributors

Notes:
- PRs #569, #586, #587 may cause a lot of warnings for options


1.1.1 / 2021-10-27
---------------------
@@ -213,23 +213,23 @@ Notes:
major / minor:

- Fix hitlet splitting (#549)
- Add tight channel (#551)
- Add tight channel (#551)

patch:

- Add read by index plus some extra checks (#529)
- Add drop column option (#530)
- Remove context.apply_selection (#531)
- Add option to support superruns for storage frontends. Adds test (#532)
- Fix issue #536 (#537)
- Fix issue #536 (#537)
- Two pbar patches (#538)
- Add get_zarr method to context (#540)
- Add get_zarr method to context (#540)
- Broken metadata error propagation (#541)
- few tests for MongoStorage frontend (#542)
- Fix caching (#545)
- Fix caching (#545)
- dds information about failing chunk (#548)
- remove rucio (#552)
- Allow loading SaveWhen.EXPLICIT time range selection (#553)
- remove rucio (#552)
- Allow loading SaveWhen.EXPLICIT time range selection (#553)
- Changes to savewhen behavior (#554)


@@ -275,7 +275,7 @@ patch:
- Remove outdated files/configs (#462)
- Remove overwrite from options (#467)


0.15.3 / 2021-06-03
---------------------
- Match cached buffer chunk start times OverlapWindowPlugin (#450)
@@ -297,7 +297,7 @@ patch:
---------------------
- Refactor hitlets (#430, #436)
- Update classifiers for pipy #437
- Allow Py39 in travis tests (#427)
- Allow Py39 in travis tests (#427)

0.15.0 / 2021-04-16
---------------------
@@ -310,7 +310,7 @@ patch:

0.14.0 / 2021-04-09
---------------------
- Check data availability for single run (#416)
- Check data availability for single run (#416)

0.13.11 / 2021-04-02
---------------------
@@ -346,7 +346,7 @@ patch:

0.13.4 / 2021-01-22
---------------------
- Nveto changes + highest density regions (#384)
- Nveto changes + highest density regions (#384)
- Parse requirements for testing (#383)
- Added keep_columns into docstring (#382)
- remove slow operators from mongo storage (#382)
@@ -522,7 +522,7 @@ patch:
------------------
- Small bugfixes:
- Fixes for multi-output plugins
- Use frozendict for Plugin.takes_config
- Use frozendict for Plugin.takes_config

0.8.6 / 2020-01-17
-------------------
1 change: 0 additions & 1 deletion README.md
@@ -16,4 +16,3 @@ Streaming analysis for xenon experiments
Strax is an analysis framework for pulse-only digitization data, specialized for live data reduction at speeds of 50-100 MB(raw) / core / sec. For more information, please see the [strax documentation](https://strax.readthedocs.io).

Strax's primary aim is to support noble liquid TPC dark matter searches, such as XENONnT. The XENON-specific algorithms live in the separate package [straxen](https://github.com/XENONnT/straxen). If you want to try out strax, you probably want to start there. This package only contains the core framework and basic algorithms any TPC would want to use.

2 changes: 1 addition & 1 deletion docs/make_docs.sh
@@ -3,4 +3,4 @@ make clean
rm -r source/reference
sphinx-apidoc -o source/reference ../strax
rm source/reference/modules.rst
make html
make html
1 change: 0 additions & 1 deletion docs/source/advanced/config.rst
@@ -326,4 +326,3 @@ URL style configuration (used in `straxen <https://github.com/XENONnT/straxen>`_
kwargs[k] = v
return self.dispatch(url, **kwargs)
18 changes: 9 additions & 9 deletions docs/source/advanced/out_of_core.rst
@@ -12,18 +12,18 @@ Out-of-core algorithms usually involve a few repeating steps:
2. load the data chunk by chunk
3. perform some computation on each chunk
4. save a summary of the results for each chunk
5. combine the per-chunk results into a final result.
5. combine the per-chunk results into a final result.

While it is of course possible to implement these operations yourself, doing so is tedious and repetitive, and the code becomes tightly coupled to the specific calculations being performed.
A better approach is to use abstractions of commonly performed operations that use out-of-core algorithms under the hood to get the same result as if the operations were performed on the entire dataset.
Code written using these abstractions can then run on in-memory and out-of-core datasets alike.
More importantly, the implementations of these algorithms can be written once, packaged, and then used by all.
More importantly, the implementations of these algorithms can be written once, packaged, and then used by all.
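
To make the pattern concrete, the sketch below spells out steps 2-5 with a stand-in chunk iterator; ``iter_chunks`` and the numbers involved are purely illustrative and not part of strax.

.. code-block:: python

    import numpy as np

    def iter_chunks(n_chunks=10, chunk_size=1_000):
        """Stand-in for a real chunked data source (illustrative only)."""
        for _ in range(n_chunks):
            yield np.random.random(chunk_size)

    # Per-chunk summaries (a running sum and a count) are combined into
    # the final result, so the full dataset is never held in memory at once.
    total, n = 0.0, 0
    for chunk in iter_chunks():   # step 2: load the data chunk by chunk
        total += chunk.sum()      # steps 3-4: compute and keep a small summary
        n += chunk.size
    mean_value = total / n        # step 5: combine per-chunk results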

Data chunking
-------------
The zarr package provides an abstraction of the data-access API of numpy arrays for chunked and compressed data stored in memory or on disk.
zarr provides an array abstraction that behaves identically to a numpy array when accessing data, but where the underlying data is actually a collection of (optionally compressed) chunks.
The strax context provides a convenience method for loading data directly into zarr arrays.
The strax context provides a convenience method for loading data directly into zarr arrays.

.. code-block:: python
@@ -35,35 +35,35 @@ the strax context provides a convenience method for loading data directly into z
zgrp = context.get_zarr(RUN_IDs, DATA_TYPES, **GET_ARRAY_KWARGS)
# the zarr group contains multiple arrays, one for each data type
z = zgrp.data_type
z = zgrp.data_type
# individual arrays are also accessible via the __getitem__ interface
z = zgrp['data_type']
# numpy-like data access, abstracting away the underlying
# data reading, which may include reading multiple chunks from disk/memory
# and decompression then concatenation to return an in memory numpy array
# and decompression then concatenation to return an in memory numpy array
z[:100]
Data processing
---------------
The dask package provides abstractions for most of the numpy and pandas APIs.
The dask.Array and dask.DataFrame objects implement their respective APIs
The dask.Array and dask.DataFrame objects implement their respective APIs
using fully distributed algorithms, only loading a fraction of the total data into memory
at any given moment for a given computing partition (thread/process/HPC-job).

.. code-block:: python
import dask.array as da
# easily convert to dask.Array abstraction for processing
darr = da.from_zarr(z)
darr = da.from_zarr(z)
# it's recommended to rechunk to sizes more appropriate for processing
# see dask documentation for details
darr.rechunk(CHUNK_SIZE)
# you can also convert the dask.Array abstraction
# to a dask.DataFrame abstraction if you need the pandas api
ddf = darr.to_dask_dataframe()
ddf = darr.to_dask_dataframe()
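
Dask evaluates these objects lazily: the calls above only build a task graph, and no data is read or computed until a result is explicitly requested. A minimal sketch of triggering the actual out-of-core computation (the ``area`` column is illustrative and depends on the data type that was loaded):

.. code-block:: python

    # Still lazy: this only extends the task graph
    mean_area = ddf["area"].mean()

    # .compute() runs the chunked, out-of-core evaluation and returns
    # an ordinary in-memory result
    print(mean_area.compute())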
4 changes: 2 additions & 2 deletions docs/source/advanced/superrun.rst
@@ -108,7 +108,7 @@ If you wish to make/store a superrun you have to specify the context option:
st.set_context_config({'write_superruns': True})
Superruns follow the same saving rules (SaveWhen.TARGET, SaveWhen.EXPLICIT or SaveWhen.ALWAYS) as regular runs.
Superruns follow the same saving rules (SaveWhen.TARGET, SaveWhen.EXPLICIT or SaveWhen.ALWAYS) as regular runs.
How superruns work
--------------------
@@ -128,4 +128,4 @@ but which gains from the file are actually used is dependent on the runid.
Thus, superruns won't help build data faster, but they will speed up loading data after it has been
built. This is important, because strax's overhead for loading a run is larger than hax's, due to its
version and option tracking (this is only true if per-run-default options are allowed).
version and option tracking (this is only true if per-run-default options are allowed).
2 changes: 1 addition & 1 deletion docs/source/basics/context.svg
2 changes: 1 addition & 1 deletion docs/source/basics/overview.rst
@@ -199,4 +199,4 @@ You can check the lineage e.g. by using the ``context.key_for`` method (which co
some_run-peaks-3g2rc4f3jg
some_run-peaks-vqo4oamp76
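
Both keys above come from calls of the following form (a minimal sketch, assuming a configured context ``st``; the trailing hash changes whenever the plugin lineage or options change):

.. code-block:: python

    # The key encodes the run id, the data type and a hash of the full lineage
    print(st.key_for('some_run', 'peaks'))
    # e.g. some_run-peaks-3g2rc4f3jg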
For more examples, check out the developer and advanced documentation.
For more examples, check out the developer and advanced documentation.
31 changes: 16 additions & 15 deletions docs/source/build_release_notes.py
@@ -9,17 +9,17 @@


def convert_release_notes():
"""Convert the release notes to an RST page with links to PRs"""
"""Convert the release notes to an RST page with links to PRs."""
this_dir = os.path.dirname(os.path.realpath(__file__))
notes = os.path.join(this_dir, '..', '..', 'HISTORY.md')
with open(notes, 'r') as f:
notes = os.path.join(this_dir, "..", "..", "HISTORY.md")
with open(notes, "r") as f:
notes = f.read()
rst = convert(notes)
with_ref = ''
for line in rst.split('\n'):
with_ref = ""
for line in rst.split("\n"):
# Get URL for PR
if '#' in line:
pr_number = line.split('#')[1]
if "#" in line:
pr_number = line.split("#")[1]
while len(pr_number):
try:
pr_number = int(pr_number)
@@ -28,15 +28,15 @@ def convert_release_notes():
# Too many trailing characters to be an int
pr_number = pr_number[:-1]
if pr_number:
line = line.replace(f'#{pr_number}',
f'`#{pr_number} <https://github.com/AxFoundation/strax/pull/{pr_number}>`_'
)
with_ref += line + '\n'
target = os.path.join(this_dir, 'reference', 'release_notes.rst')
line = line.replace(
f"#{pr_number}",
f"`#{pr_number} <https://github.com/AxFoundation/strax/pull/{pr_number}>`_",
)
with_ref += line + "\n"
target = os.path.join(this_dir, "reference", "release_notes.rst")

with open(target, 'w') as f:
f.write(header+with_ref)
with open(target, "w") as f:
f.write(header + with_ref)


if __name__ == '__main__':
if __name__ == "__main__":
convert_release_notes()
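
For reference, the substitution performed inside that loop turns a plain PR reference from HISTORY.md into an RST hyperlink; a minimal standalone illustration of the same replacement:

```python
# Illustrative example of the replacement applied by the loop above
line = "- Rechunker script (#686)"
pr_number = 686
line = line.replace(
    f"#{pr_number}",
    f"`#{pr_number} <https://github.com/AxFoundation/strax/pull/{pr_number}>`_",
)
# line is now:
# "- Rechunker script (`#686 <https://github.com/AxFoundation/strax/pull/686>`_)"
```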