Propagate section information from new Reach implementation #1399

Merged: 7 commits into sorgerlab:master, Nov 25, 2022

bgyori (Member) commented on Nov 24, 2022

This PR adapts INDRA to recent changes in how Reach extracts section information when reading nxml files. An older implementation existed, but Reach stopped producing section names at some point, and the newly reinstated implementation differs from it, so the INDRA-side code had to be adapted as well. I also collected empirical statistics on the (unnormalized) section names that occur and improved their normalization.
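
To illustrate the kind of normalization described above, here is a minimal sketch in Python. It is not the code in this PR: the function names (`normalize_section`, `count_sections`), the canonical labels, and the keyword patterns are all hypothetical, chosen only to show how raw section titles could be mapped to a small set of labels and then counted.

```python
import re
from collections import Counter

# Hypothetical canonical labels and keyword patterns (assumptions for
# illustration; not the mapping used by INDRA or Reach).
SECTION_PATTERNS = [
    ("abstract", re.compile(r"\babstract\b")),
    ("introduction", re.compile(r"\b(introduction|background)\b")),
    ("methods", re.compile(r"\b(methods?|materials|experimental procedures)\b")),
    ("results", re.compile(r"\bresults?\b")),
    ("discussion", re.compile(r"\b(discussion|conclusions?)\b")),
]


def normalize_section(raw):
    """Map a raw section title to a canonical label, or None if unrecognized."""
    if not raw:
        return None
    text = raw.strip().lower()
    # Drop leading numbering such as "2." or "3.1"
    text = re.sub(r"^\d+(\.\d+)*[.)]?\s*", "", text)
    # Collapse internal whitespace
    text = re.sub(r"\s+", " ", text)
    # The first matching pattern wins, so list order encodes priority
    for label, pattern in SECTION_PATTERNS:
        if pattern.search(text):
            return label
    return None


def count_sections(raw_names):
    """Count normalized labels over an iterable of raw section titles."""
    return Counter(normalize_section(name) for name in raw_names)
```

Counting the titles that map to `None` is one way to spot frequent unnormalized names that may deserve their own pattern.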

Independently, PubMed appears to have changed its search API to return at most 10k IDs per search instead of 100k, which required updating tests. I also improved how we get MeSH IDs from the non-standard MeSH URNs produced by MedScan.

bgyori merged commit 1e0eda4 into sorgerlab:master on Nov 25, 2022
bgyori deleted the reach_sections branch on November 25, 2022 at 03:50