include OAI harvest details in Job details #374
Comments
My main concern is that the code I'm using to do this seems inefficient, in a way that may or may not matter and, if it does matter, may or may not be improvable. See #422
Oooooo, this is awesome! I think this is exactly what some people have been asking for. I do think you're right, though: as it stands now, looping through all the records may ultimately be inefficient for large Jobs. Thankfully, I think we could lean on the Django/Mongo ORM to pull these counts pretty quickly. Don't have a one-liner handy, but something like the following might work:
Then, to avoid calculating these each time a Job is loaded, one option may be to store in a Job's But it's looking awesome. Happy to keep spitballing, but in short, I would think leaning on the ORM might be a good option. And if necessary, if that ends up being costly for huge jobs, could count with Spark and write to
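For anyone following along, here is a rough sketch of the idea of letting Mongo group and count instead of looping over every record in Python. It is not the snippet from the comment above, which did not survive in this thread. The field names `job_id` and `oai_set` and both function names are hypothetical, chosen only for illustration. The pure-Python function is there as a reference to check the counts against.

```python
from collections import Counter


def set_count_pipeline(job_id, set_field="oai_set"):
    """Build a MongoDB aggregation pipeline that counts records per OAI set.

    `job_id` and `set_field` are hypothetical field names; adjust them to
    whatever the record documents actually use.
    """
    return [
        # Keep only the records that belong to this Job.
        {"$match": {"job_id": job_id}},
        # Group by set name and count the records in each group.
        {"$group": {"_id": f"${set_field}", "count": {"$sum": 1}}},
        # Largest sets first.
        {"$sort": {"count": -1}},
    ]


def count_sets_in_python(records, job_id, set_field="oai_set"):
    """Reference version: the same counts, computed by looping over records."""
    counts = Counter(
        r.get(set_field) for r in records if r.get("job_id") == job_id
    )
    return dict(counts)


if __name__ == "__main__":
    sample = [
        {"job_id": 1, "oai_set": "maps"},
        {"job_id": 1, "oai_set": "maps"},
        {"job_id": 1, "oai_set": "photos"},
        {"job_id": 2, "oai_set": "maps"},
    ]
    print(count_sets_in_python(sample, job_id=1))
    # With pymongo, the pipeline would be run server-side, e.g.:
    #   results = collection.aggregate(set_count_pipeline(1))
    print(set_count_pipeline(1))
```

The pipeline version pushes the grouping to the database, so only one small document per set comes back, rather than every record in the Job.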
It turns out that mongo totally has a function to do this:
Brilliant! 😎
Per a recommendation, include some statistics / insight into an OAI harvest. Specifically -- low hanging fruit -- what sets were harvested, and distribution of records across sets.