Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add crystfel streams to rs/io module #50

Merged
merged 6 commits into from
Feb 12, 2021

Conversation

marinegor
Copy link
Contributor

This PR adds:

  • reading CrystFEL's stream files into DataSet format without any metadata
  • computing columns ewald_offset, s1{x,y,z} upon reading
  • accompanying tests
  • example of CrystFEL's stream to test on.

@JBGreisman JBGreisman added API Interface Decisions enhancement Improvement to existing feature labels Feb 10, 2021
@JBGreisman JBGreisman self-assigned this Feb 10, 2021
@JBGreisman
Copy link
Member

Overall, this looks mostly good to me. I'm gonna review and request a few minor revisions.

High-level questions:

  • is a CrystFEL stream file only applicable to unmerged data? If so, we should explicitly set DataSet.merged = False
  • as is, DataSet.spacegroup and DataSet.cell are left unset. Is the spacegroup or cell parameters accessible in the CrystFEL stream? It may be a good idea to parse them out if that's possible -- I'd imagine cell parameters may not be reasonable to get this way for a serial experiment, but perhaps they could be averaged in a meaningful way?
  • It looks like read_crystfel() was given an optional logfile=None argument. I'm assuming this is a vestige from read_precognition(), but just wanted to check whether there are any additional arguments that this should take -- this could be a good place to add optional, user-specified spacegroup/cell parameters in case they aren't available in the stream itself.

@JBGreisman
Copy link
Member

JBGreisman commented Feb 10, 2021

I just noticed the test data is 98MB -- it looks like it's a raw text with many "crystal" entries, that I'm assuming belong to a serial experiment. We try to keep this test data fairly minimal to what's needed to assess the core functionality.

Do you mind reducing the data by only including something like 2 crystals? Whatever seems like a reasonable minimum number to test the parser.

Copy link
Member

@JBGreisman JBGreisman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! On the whole it all looks great. All my comments were very minor. The only main things in my mind to change are:

  1. Let's reduce the test file size
  2. I think it would make sense to set the spacegroup, cell, and merged attributes, if at all possible

I'm happy to make most of the above minor revisions that I commented on, but let me know how you want to handle the above two things

@marinegor
Copy link
Contributor Author

marinegor commented Feb 12, 2021

Hi @JBGreisman , thanks for your detailed reply. I resolved most of your comments, here is the detailed reply:

is a CrystFEL stream file only applicable to unmerged data? If so, we should explicitly set DataSet.merged = False

Only unmerged, right. The flag is now set.

Is the spacegroup or cell parameters accessible in the CrystFEL stream?

During stream stage of processing, the symmetry is considered to be unknown. The only things that stream contains are cell parameters, lattice type and centering (you can see the notation yourself in the stream header, search for 'Begin unit cellline. If you know how to specify them in theDataset.spacegroup`, I'm open for your suggestions.

It may be a good idea to parse them out if that's possible -- I'd imagine cell parameters may not be reasonable to get this way for a serial experiment, but perhaps they could be averaged in a meaningful way?

Sure, I added parsing of cell parameters from the header and added respective test for equality (np.allclose-ity, to be precise).

Do you mind reducing the data by only including something like 2 crystals? Whatever seems like a reasonable minimum number to test the parser.

I choose 3 -- like odd numbers. My only with was for the stream to have crystals with both positive and negative ewald_offset for one of the tests -- I overdid this, for sure.

@codecov-io
Copy link

codecov-io commented Feb 12, 2021

Codecov Report

Merging #50 (6ba9d5e) into master (fea9493) will decrease coverage by 0.40%.
The diff coverage is 95.10%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #50      +/-   ##
==========================================
- Coverage   99.17%   98.77%   -0.41%     
==========================================
  Files          34       35       +1     
  Lines        1340     1470     +130     
==========================================
+ Hits         1329     1452     +123     
- Misses         11       18       +7     
Flag Coverage Δ
unittests 98.77% <95.10%> (-0.41%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
reciprocalspaceship/io/crystfel.py 94.57% <94.57%> (ø)
...alspaceship/algorithms/scale_merged_intensities.py 100.00% <100.00%> (ø)
reciprocalspaceship/io/__init__.py 100.00% <100.00%> (ø)
reciprocalspaceship/io/precognition.py 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6a5ef47...6ba9d5e. Read the comment docs.

@JBGreisman
Copy link
Member

good catch with the BatchDtype

Copy link
Member

@JBGreisman JBGreisman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I'll merge this in shortly.

@JBGreisman JBGreisman merged commit a6b6fc6 into rs-station:master Feb 12, 2021
@JBGreisman
Copy link
Member

Thanks @marinegor!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Interface Decisions enhancement Improvement to existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants