Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add DataSet.select_mtzdtype() to subset DataSet by MTZ column type #150

Merged
merged 6 commits into from
May 23, 2022

Conversation

JBGreisman
Copy link
Member

This PR adds a new function to DataSet, select_mtzdtype():

In [1]: rs.DataSet.select_mtzdtype?
Signature: rs.DataSet.select_mtzdtype(self, dtype)
Docstring:
Return the subset of the DataSets columns that are of the given dtype.

Parameters
----------
dtype : str or instance of MTZDtype class
    Single-letter MTZ code, name, or MTZDtype class to return

Returns
-------
DataSet
    Subset of the DataSet with columns matching the requested dtype. If
    no columns of the requested dtype are found, an empty DataSet is
    returned.

Raises
------
ValueError
    If `dtype` is not a string nor a MTZDtype class
File:      ~/Documents/Hekstra_Lab/github/reciprocalspaceship/reciprocalspaceship/dataset.py
Type:      function

This PR fixes #104, making it easier to subset a DataSet to columns of a desired MTZ column type.

@JBGreisman JBGreisman added API Interface Decisions enhancement Improvement to existing feature MTZDtypes Issues related to custom dtypes labels May 20, 2022
@JBGreisman JBGreisman requested a review from kmdalton May 20, 2022 14:28
@codecov-commenter
Copy link

codecov-commenter commented May 20, 2022

Codecov Report

Merging #150 (7889081) into main (f7d2a2c) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #150      +/-   ##
==========================================
- Coverage   98.46%   98.46%   -0.01%     
==========================================
  Files          43       43              
  Lines        1696     1690       -6     
==========================================
- Hits         1670     1664       -6     
  Misses         26       26              
Flag Coverage Δ
unittests 98.46% <100.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
reciprocalspaceship/dataset.py 98.04% <100.00%> (-0.03%) ⬇️
reciprocalspaceship/concat.py 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f7d2a2c...7889081. Read the comment docs.

Copy link
Member

@kmdalton kmdalton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks great. i made one small suggestion which might improve the test a tad.

tests/test_dataset.py Outdated Show resolved Hide resolved
@JBGreisman
Copy link
Member Author

Small bug:

In [1]: mtz.head()
Out[1]: 
            BATCH         I      SIGI  PARTIAL
H   K   L                                     
19  -11 2    1424 2066.5464 32.449394    False
3   18  -1    328 2354.8054 29.814323    False
-17 21  17    576  20.76507 6.5568223    False
21  26  -7     70 23.690462 6.6323786    False
-9  17  11    798 459.89502 17.939333    False

In [2]: mtz.select_mtzdtype("B") # Select BatchDtype
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/Documents/Hekstra_Lab/github/reciprocalspaceship/reciprocalspaceship/commandline/mtzdump.py in <module>
----> 1 mtz.select_mtzdtype("B") # Select BatchDtype

~/Documents/Hekstra_Lab/github/reciprocalspaceship/reciprocalspaceship/dataset.py in select_mtzdtype(self, dtype)
    565             # One-letter code
    566             if len(dtype) == 1:
--> 567                 return self[[k for k in self if self[k].dtype.mtztype == dtype]]
    568             else:
    569                 return self[[k for k in self if self[k].dtype.name == dtype]]

~/Documents/Hekstra_Lab/github/reciprocalspaceship/reciprocalspaceship/dataset.py in <listcomp>(.0)
    565             # One-letter code
    566             if len(dtype) == 1:
--> 567                 return self[[k for k in self if self[k].dtype.mtztype == dtype]]
    568             else:
    569                 return self[[k for k in self if self[k].dtype.name == dtype]]

AttributeError: 'numpy.dtype[bool_]' object has no attribute 'mtztype'

This error occurs because only the custom MTZDtypes have the mtztype attribute, so this line fails for non-custom dtypes, such as the numpy bool dtype here.

# One-letter code
if len(dtype) == 1:
    return self[[k for k in self if self[k].dtype.mtztype == dtype]]

This can be fixed by first checking whether the mtztype attribute exists before checking its value:

# One-letter code
if len(dtype) == 1:
    return self[[k for k in self if hasattr(self[k].dtype, "mtztype") and self[k].dtype.mtztype == dtype]]

@JBGreisman JBGreisman requested a review from kmdalton May 23, 2022 14:48
Copy link
Member

@kmdalton kmdalton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only lingering question here is whether to explicitly test HKL dtypes outside of the index. I'm going to approve it, because I think that call is up to you.

reciprocalspaceship/dataset.py Show resolved Hide resolved
tests/test_dataset.py Outdated Show resolved Hide resolved
@JBGreisman
Copy link
Member Author

I updated the test to parametrize calls to reset_index(). This allows for HKLIndexDtypes to be caught by the test. I agree this makes the test more thorough.

Once CI passes, I'll merge this PR in.

@JBGreisman JBGreisman merged commit 118be80 into main May 23, 2022
@JBGreisman JBGreisman deleted the select_mtzdtypes branch May 23, 2022 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Interface Decisions enhancement Improvement to existing feature MTZDtypes Issues related to custom dtypes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataSet.select_dtypes does not support custom dtypes
3 participants