DailyMed NDC->Image File - Initial Work #318

Open
wants to merge 9 commits into
base: main
from

Conversation

jrlegrand
Member

Resolves #309

Explanation

While #309 isn't 100% solved, I wanted to start here and treat future enhancements as separate issues so we can start cleaning up long-standing branches.

This PR provides a way to extract one of the five parts of the zipped DailyMed full prescription SPL data. You can change the part number to download any of the five. We need to create an enhancement issue for automating parts 1-5 as sequential tasks. Honestly, the reason I haven't prioritized this is that my hard drive space is horribly low and I can only download one part at a time anyway.
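For the follow-up enhancement, here is a rough sketch of how parts 1-5 could be processed one at a time so that only a single part is on disk at any moment. This is not what the PR implements: `process_parts_sequentially`, `download`, and `handle` are hypothetical names, and the actual download logic is left to a caller-supplied function because the real URL and task layout are not shown here.

```python
import os
import tempfile
import zipfile
from typing import Callable, Iterable


def process_parts_sequentially(
    part_numbers: Iterable[int],
    download: Callable[[int, str], str],
    handle: Callable[[int, str], None],
) -> None:
    """Download, extract, and handle each part in turn.

    `download(part, workdir)` is a hypothetical callable that saves the zip
    for one part into `workdir` and returns its path. `handle(part, dir)` is
    a hypothetical callable that processes the extracted files.
    """
    for part in part_numbers:
        # A fresh temporary directory per part keeps disk usage bounded:
        # it is deleted before the next part is downloaded.
        with tempfile.TemporaryDirectory() as workdir:
            zip_path = download(part, workdir)
            extracted_dir = os.path.join(workdir, "extracted")
            with zipfile.ZipFile(zip_path) as zf:
                zf.extractall(extracted_dir)
            handle(part, extracted_dir)
```

Anything `handle` wants to keep (for example, the NDC->image mappings) has to be written somewhere outside the temporary directory before it returns.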

This will extract all the files and run XSLT against them to do the following:

  1. Map each SPL to metadata and lists of things contained in the SPL, like image names, NDCs, and components
  2. Try to RegEx match NDCs in image names and compare the matches to the valid NDCs for that SPL
  3. Try to RegEx match NDCs in PRINCIPAL DISPLAY PANEL component sections of the SPL and try to map them to the image files contained within that same component
  4. OCR - not fully built out or tested yet
  5. Barcode scanning - not fully built out or tested yet
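To make step 2 concrete, here is a minimal sketch of matching NDC-like strings in image file names and keeping only those that are valid for the SPL. It is an illustration, not the code in this PR: the pattern, the function name `match_image_ndcs`, and the choice to compare on digits only are all assumptions.

```python
import re

# Assumed pattern: a 10-digit NDC in 4-4-2, 5-3-2, or 5-4-1 layout, with or
# without hyphens, not embedded in a longer run of digits.
NDC_CANDIDATE = re.compile(r"(?<!\d)(\d{4,5}-?\d{3,4}-?\d{1,2})(?!\d)")


def digits_only(value: str) -> str:
    """Strip everything except digits so hyphenated and plain forms compare equal."""
    return re.sub(r"\D", "", value)


def match_image_ndcs(image_names, valid_ndcs):
    """Map each image name to the valid NDCs found in it.

    Comparing on digits only is a simplification: two NDCs with different
    hyphen positions but the same digits would be treated as the same key.
    """
    valid_by_digits = {digits_only(ndc): ndc for ndc in valid_ndcs}
    matches = {}
    for name in image_names:
        found = []
        for candidate in NDC_CANDIDATE.finditer(name):
            ndc = valid_by_digits.get(digits_only(candidate.group(1)))
            if ndc is not None and ndc not in found:
                found.append(ndc)
        if found:
            matches[name] = found
    return matches
```

For example, with `valid_ndcs = ["12345-678-90"]`, the names `12345-678-90-carton.jpg` and `label_1234567890.jpg` would both map to `12345-678-90`, while `99999-999-99.jpg` would be dropped because that NDC is not listed for the SPL. The same candidate pattern could in principle be run over the text of a PRINCIPAL DISPLAY PANEL section for step 3.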

Rationale

This gets us to around 50k NDC->image mappings. There is still work to be done to understand the true denominator of label images that are out there, so we can gauge the gap between where we are now and what it would take to get to 100%.

Tests

Development

Successfully merging this pull request may close these issues.

DailyMed XML processing for NDC -> image
2 participants