
include OAI harvest details in Job details #374

Closed
ghukill opened this issue Jan 10, 2019 · 4 comments

@ghukill
Contributor

ghukill commented Jan 10, 2019

Per a recommendation, include some statistics / insight into an OAI harvest. Specifically -- low hanging fruit -- what sets were harvested, and distribution of records across sets.

ghukill changed the title from "OAI harvest details" to "include OAI harvest details in Job details" on Jan 10, 2019
antmoth self-assigned this on Jun 19, 2019
@antmoth
Collaborator

antmoth commented Jun 20, 2019

So, I made it do this:
[screenshot: OAI set counts displayed in the Job details page]

My main concern is that the code I'm using to do this seems inefficient in a way that may or may not matter, and, if it does matter, may or may not be improvable. See #422
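
For illustration, the per-record approach being described is roughly this shape (hypothetical sketch, not the actual code from the commit; Job.get_records() and the oai_set field are the same ones used in the ORM example later in this thread) -- every record gets pulled out of Mongo just to tally one field:

from collections import Counter

def oai_set_counts_by_loop(job):
    # tally records per OAI set by iterating the full record QuerySet
    counts = Counter()
    for record in job.get_records():
        counts[record.oai_set] += 1
    return dict(counts)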

@ghukill
Contributor Author

ghukill commented Jun 21, 2019

Oooooo, this is awesome! I think this is exactly what some people have been asking for.

I do think you're right, though: as it stands now, looping through all the records may ultimately be inefficient for large Jobs. Thankfully, I think we could lean on the Django/Mongo ORM to pull these counts pretty quickly.

Don't have a one-liner handy, but something like the following might work:

# get Job's records as QuerySet (MongoEngine, but very similar to native Django SQL ORM)
job_records = Job.objects.get(pk=224).get_records()

# get OAI sets from Job records
job_records.values_list('oai_set').distinct('oai_set')
Out[21]: 
['wayne:collectioncfai',
 'wayne:collectionhermanmiller',
 'wayne:collectionrencen',
 'wayne:collectionmim']

# count records within a single set
job_records.filter(oai_set='wayne:collectioncfai').count()
Out[22]: 2292

Then, to avoid recalculating these each time a Job is loaded, one option may be to store them in a Job's job_details, which is a JSON object that already stores exactly these kinds of things (field mapping metrics, etc.). They could be written on Job finish, or the first time the Job is loaded.
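
As a rough, untested sketch of that (the update_job_details() call here is just a stand-in for however job_details actually gets persisted):

def cache_oai_set_counts(job):
    # compute per-set counts via the ORM once, then stash them in job_details
    records = job.get_records()
    counts = {
        oai_set: records.filter(oai_set=oai_set).count()
        for oai_set in records.distinct('oai_set')
    }
    job.update_job_details({'oai_set_counts': counts})  # hypothetical persistence helper
    return counts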

But it's looking awesome. Happy to keep spitballing, but in short, I think leaning on the ORM might be a good option. And if that ends up being costly for huge Jobs, we could count with Spark and write to job_details that way.
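
That Spark fallback might look something like this (very rough sketch, assuming the Job's records are available as a DataFrame with an oai_set column):

from pyspark.sql import DataFrame

def oai_set_counts_spark(records_df: DataFrame) -> dict:
    # group the Job's records by oai_set and collect the counts back to the driver
    return {
        row['oai_set']: row['count']
        for row in records_df.groupBy('oai_set').count().collect()
    }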

@antmoth
Collaborator

antmoth commented Jun 21, 2019

It turns out that mongo totally has a function to do this: item_frequencies. New commit pushed!
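
For reference, using the same Job/record names as the earlier example, that looks roughly like this (item_frequencies() is the MongoEngine QuerySet aggregation that maps each distinct field value to its count):

job_records = Job.objects.get(pk=224).get_records()

# dict mapping each distinct oai_set value to its record count
set_counts = job_records.item_frequencies('oai_set')
# e.g. {'wayne:collectioncfai': 2292, ...}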

@ghukill
Contributor Author

ghukill commented Jun 21, 2019

Brilliant! 😎
