Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix setup and add new schema #1

Open
wants to merge 13 commits into
base: develop
Choose a base branch
from

Conversation

JosePizarro3
Copy link
Collaborator

@JosePizarro3 JosePizarro3 commented Jul 24, 2024

@jkshenton Sathya (sorry, I cannot tag him as he is not part of this repo),

I've been working on some changes in this repo. Maybe you can drop me a review so I can merge and add this new plugin into NOMAD? You can also add it then as part of the NOMAD OASIS and use it in your CCP-NC database.

I still need to:

  • Add testing for the parser under tests/test_parser.py
  • Fix workflow recognition with CASTEP output files

Changes 1 - new structure

I've adopted the new structure for plugins from NOMAD. This means a lot of changes, but the most important are the ones in pyproject.toml and the new folder structure under src. Some extra stuff:

  • You can release this project in PyPI and maintain it with versioning in the way you want.
  • I might need to do minor changes from time to time to reflect changes in nomad.
  • I added instructions in the README.md, so you can test if these work or not for you.
  • I added a MkDocs page with a NOMAD style, that you can of course delete or restructure at will.

Changes 2 - New schema nomad-simulations

I've changed the parsing to the new schema, nomad-simulations. Some points here:

  • You are welcome to adopt or not this schema, as we already talked. If you need more quantities or a different way of parsing, I suggest you work from something up. The example here would be how I used Outputs from the nomad-simulations schema to extend it with NMR specific properties, and how I used PhysicalProperty from this package to define such NMR properties.
  • Now, I think the properties are much nicer parsed, as they contained references to the specific atomic states, more similar to what you have in the magres file.
  • In FAIRmat we are developing right now a taxonomy for materials properties, so anyways these properties might end up defined in the nomad or nomad-simulations repo, as some other users might be using these anyways.
  • As you can see, these properties have some functionalities to extract information (like the isotropic part of the MagneticShielding tensor), and these could be extended. I left it for you to decide whether you want to do this here or in some other repo you have.
  • Ideally, and to keep a nice collaboration, these properties could be perfectly defined in the nomad-simulations repo. I will be happy if you want to contribute there and update these definitions in there. Other users might need them and it connects nicely with the taxonomy effort I comment on above.

@jkshenton
Copy link
Member

Thanks a lot @JosePizarro3 - really appreciate your continued help with this! The restructuring looks fine to me. The new schema (atoms state thing) will be much easier for us to work with I suspect and in general makes a lot of sense.

In terms of contributing to nomad-simulations, we are absolutely keen. The magres format spec document was already some effort to defining a standard schema for computed magres data for the community. I would be very happy to see it adapted for nomad generally. I count >13 periodic electronic structure codes that can calculate magnetic shielding tensors (one of the key properties) so having a general schema that other parsers can work with makes sense to me.

I've manually added @Sathya-S3 now and I think he's in a better position to review the specific code changes having tested out the initial version.

@JosePizarro3 JosePizarro3 self-assigned this Jul 24, 2024
@JosePizarro3
Copy link
Collaborator Author

Great, thanks @jkshenton !

In terms of contributing to nomad-simulations, we are absolutely keen. The magres format spec document was already some effort to defining a standard schema for computed magres data for the community. I would be very happy to see it adapted for nomad generally. I count >13 periodic electronic structure codes that can calculate magnetic shielding tensors (one of the key properties) so having a general schema that other parsers can work with makes sense to me.

Indeed, the magres format is a very easy case where, whatever you decided for the properties should be incorporated directly, while for the symmetry open questions, we should work a bit more on it.

Copy link
Collaborator

@Sathya-S3 Sathya-S3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work on presenting the magres parser in such a polished state, Jose. My main comments are related to understanding the workflow links between CASTEP and magres files, which I have highlighted with the relevant code. I'm happy to approve the pull request after we have had a chance to chat with you about the workflow linking and how we can help with further magres parser development (Quantum Espresso workflows etc.).

I have integrated the magres parser as guided by the README instructions with a NOMAD Oasis instance and have verified that the file parsing works flawlessly. The test was performed with an ethanol CASTEP-magres workflow files archive.

pyproject.toml Show resolved Hide resolved
"Documentation" = "https://nomad-coe.github.io/nomad-parser-magres/"
"Homepage" = "https://github.com/CCP-NC/nomad-parser-magres"
"Bug Tracker" = "https://github.com/CCP-NC/nomad-parser-magres/issues"
"Documentation" = "https://CCP-NC.github.io/nomad-parser-magres/"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation page "https://CCP-NC.github.io/nomad-parser-magres/" does not exist at the time of review. I assume this will be auto-rectified when the changes are merged into "master branch". Will run a quick verification after merge.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we solved this last time we talk, the docu page has to be deployed in the settings of the repo. Check this: https://www.mkdocs.org/user-guide/deploying-your-docs/#github-pages

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkshenton for your attention

Comment on lines +613 to +663
self, nmr_first_principles_archive: 'EntryArchive'
):
"""
Automatically parses the NMR Magres workflow. Here, `self.archive` is the
NMR magres archive in which we will link the original NMR first principles (CASTEP
or QuantumESPRESSO) entry.

Args:
nmr_first_principles_archive (EntryArchive): the NMR (first principles) CASTEP or QuantumESPRESSO archive.
"""
workflow = NMRMagRes(method=NMRMagResMethod(), results=NMRMagResResults())
workflow.name = 'NMR Magres'

# ! Fix this once CASTEP and QuantumESPRESSO use the new `nomad-simulations` schema under 'data'
# Method
# method_nmr = extract_section(nmr_first_principles_archive, ['run', 'method'])
# workflow.method.nmr_method_ref = method_nmr

# Inputs and Outputs
# ! Fix this to extract `input_structure` from `nmr_first_principles_archive` once
# ! CASTEP and QuantumESPRESSO use the new `nomad-simulations` schema under 'data'
input_structure = extract_section(self.archive, ['data', 'model_system'])
nmr_magres_calculation = extract_section(self.archive, ['data', 'outputs'])
if input_structure:
workflow.m_add_sub_section(
NMRMagRes.inputs, Link(name='Input structure', section=input_structure)
)
if nmr_magres_calculation:
workflow.m_add_sub_section(
NMRMagRes.outputs,
Link(name='Output NMR calculation', section=nmr_magres_calculation),
)

# NMR (first principles) task
# ! Fix this once CASTEP and QuantumESPRESSO use the new `nomad-simulations` schema under 'data'
program_name = nmr_first_principles_archive.run[-1].program.name
if nmr_first_principles_archive.workflow2:
task = TaskReference(task=nmr_first_principles_archive.workflow2)
task.name = f'NMR FirstPrinciples {program_name}'
if input_structure:
task.inputs = [Link(name='Input structure', section=input_structure)]
if nmr_magres_calculation:
task.outputs = [
Link(
name='Output NMR calculation',
section=nmr_magres_calculation,
)
]
workflow.m_add_sub_section(NMRMagRes.tasks, task)

self.archive.workflow2 = workflow
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, the workflow separates the .castep (NMR first principles calculation) file and .magres file resulting from the calculation as two individual entries, with both having display names " CASTEP NMR SinglePoint Simulation". Ideally, since the magres file is a direct output of the CASTEP calculation, it might be beneficial to include it directly to the outputs of the CASTEP first principles calculation in the workflow.

Will the "fix" mentioned in the code comments correspond to fixing how the workflow is processed currently? Or is it related only to the adoption of the new 'nomad-simulations' schema?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, after we talked: we agreed that the separation is somewhat an artifact, but we can keep it just in case someone uploads simply a magres file. In any case, I have to fix this workflow part; it is not working right now but I know how to handle it.

if outputs is not None:
simulation.outputs.append(outputs)

# Try to resolve the `entry_id` and `mainfile` of other entries in the upload to connect the magres entry with the CASTEP or QuantumESPRESSO entry
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the metadata section of file upload, there is a files section that lists all files included in the upload. Is this the only intended result? There is mention of linking the NMR workflow directly in the magres entry, could you please clarify how it is done? As per my previous comments, we see two separate file entries and workflows for the .castep and .magres files.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These lines are more to connect the castep with the magres entries

#
from nomad.metainfo import SubSection, Quantity, Reference
from nomad.datamodel.metainfo.simulation.method import Method
from nomad.metainfo import Quantity, Reference, SubSection
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This NMRMagRes class is defined to generate an extra EntryArchive if both NMR first principles and NMR magres SinglePoint EntryArchives are present. Could you please clarify how this works in linking the workflows as mentioned in the parser.py script?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea's that the magres file creates an entry which connects with the CASTEP or QE one. In this entry, you will find all the details of the original DFT calculation + the output nmr properties.

The creation of such a workflow in the magres entry is just an organizational tool. This is actually what workflows should be about, just organizing in one way without cluttering entries which could be reuse for something else (in this case, the CASTEP or QE ones).

In any case, this can be changed later. We could decide to integrate the workflow better, hence just adding a few lines in the CASTEP and QE parsers to handle the possibility of a magres output file.

README.md Show resolved Hide resolved
@JosePizarro3
Copy link
Collaborator Author

@Sathya-S3 I will do a final clean up next week and push again. I think I can integrate your comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants