7429/feature/add trending score to solr #9878

@benbdeitch benbdeitch commented Sep 13, 2024

Closes #7429

This PR adds trending scores to Solr so that we can better track which works are seeing a statistically notable increase in popularity. It adds several new fields and two scripts, one run daily and the other hourly, that keep this information up to date.

It is still in draft mode because there is not yet any code to run the scripts automatically.

Technical

This implementation uses Solr's ability to update documents in place, which requires the new trending fields to be neither stored nor indexed and to be kept as docValues instead. Essentially, they are left out of Solr's inverted index and held in a more conventional document-to-value mapping.

This approach A) is more performant than regular atomic updates, and B) avoids the issues that atomic updates can have with copyField values.
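
As a rough illustration of what such an update could look like, here is a minimal sketch that builds the JSON body for Solr's update handler. It assumes the unique key field is `key` and uses `trending_z_score` as the example field; the helper name `build_in_place_update` is hypothetical and is not taken from this PR.

```python
import json


def build_in_place_update(key: str, fields: dict) -> dict:
    """Build one update document that sets docValues-only fields.

    Hypothetical helper for illustration; not the code in this PR.
    """
    # Assumption: "key" is the unique key field of the Solr schema.
    doc: dict = {"key": key}
    for name, value in fields.items():
        # {"set": value} is Solr's atomic-update modifier. Solr can apply it
        # in place only when the target field is single-valued, numeric,
        # docValues=true, indexed=false and stored=false.
        doc[name] = {"set": value}
    return doc


# The update handler accepts a JSON list of such documents.
payload = json.dumps(
    [build_in_place_update("/works/OL54120W", {"trending_z_score": 1.5})]
)
```

If any field in the document does not meet those conditions, Solr falls back to a regular atomic update for that document, which is the slower path this PR is trying to avoid.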

The relevant cron commands are in a newly added file, docker/cron.local.

  1. Delete your Solr container and all related volumes.
  2. Run docker compose up.
  3. In your local Solr instance, search for a work (e.g. key:"/works/OL54120W") and check that the new fields are present.
  4. Save a work to your 'want-to-read' list.
  5. Set up a docker/cron.local file containing the cron jobs, along with a new container to run them. Change the schedule on the cron tasks so they run more frequently; (* * * * *) makes them run every minute.
  6. Make sure the container has access to both the dbnet and webnet networks, and declares depends_on: db.
  7. After a minute or so, run the Solr search again and check whether the appropriate trending fields have updated. You can also check the logs of the cron-jobs container in Docker to see whether the jobs are running correctly.
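
For steps 3 and 7, the query can also be built as a URL and opened in a browser or with curl. The sketch below only constructs the URL; the base address `http://localhost:8983/solr/openlibrary` and the helper name `solr_select_url` are assumptions, so adjust them to match your local setup.

```python
from urllib.parse import urlencode


def solr_select_url(base_url: str, work_key: str, fields: list) -> str:
    """Build a Solr /select URL that fetches the given fields for one work.

    Hypothetical helper for illustration; the base URL is an assumption.
    """
    params = {
        "q": f'key:"{work_key}"',  # same query as in step 3
        "fl": ",".join(fields),    # request the trending fields explicitly
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


url = solr_select_url(
    "http://localhost:8983/solr/openlibrary",  # assumed local Solr core
    "/works/OL54120W",
    ["key", "trending_z_score"],
)
```

Running the same URL before and after the cron job fires makes it easy to compare the field values.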

Screenshot

Stakeholders

@cdrini

@benbdeitch benbdeitch marked this pull request as draft September 13, 2024 22:13
@github-actions github-actions bot added the Priority: 2 Important, as time permits. [managed] label Sep 13, 2024
@benbdeitch benbdeitch force-pushed the 7429/feature/add-trending-score-to-solr branch from 3673c5e to 8fb48c7 Compare September 14, 2024 19:22
@mekarpeles mekarpeles added this to the Sprint 2024-09 milestone Sep 15, 2024
@benbdeitch benbdeitch force-pushed the 7429/feature/add-trending-score-to-solr branch from 74fdc7a to ff3dbf3 Compare October 1, 2024 22:51
@benbdeitch benbdeitch force-pushed the 7429/feature/add-trending-score-to-solr branch from ff3dbf3 to 6d56b8b Compare October 1, 2024 22:51
@benbdeitch benbdeitch marked this pull request as ready for review October 1, 2024 22:55
@benbdeitch benbdeitch force-pushed the 7429/feature/add-trending-score-to-solr branch from ebfa080 to 6aef600 Compare October 1, 2024 22:59
@mekarpeles mekarpeles modified the milestones: Sprint 2024-09, 2024-11 Oct 25, 2024
@codecov-commenter

Codecov Report

Attention: Patch coverage is 21.27660% with 37 lines in your changes missing coverage. Please review.

Project coverage is 17.12%. Comparing base (ce16a79) to head (a5545e4).
Report is 537 commits behind head on master.

Files with missing lines Patch % Lines
openlibrary/plugins/openlibrary/js/signup.js 40.00% 13 Missing and 2 partials ⚠️
...ary/plugins/openlibrary/js/editions-table/index.js 0.00% 9 Missing and 3 partials ⚠️
...y/plugins/openlibrary/js/bulk-tagger/BulkTagger.js 0.00% 9 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9878      +/-   ##
==========================================
+ Coverage   16.06%   17.12%   +1.06%     
==========================================
  Files          90       89       -1     
  Lines        4769     4752      -17     
  Branches      832      831       -1     
==========================================
+ Hits          766      814      +48     
+ Misses       3480     3428      -52     
+ Partials      523      510      -13     


@benbdeitch benbdeitch force-pushed the 7429/feature/add-trending-score-to-solr branch from eb038d6 to 649ae9b Compare November 14, 2024 16:52
@@ -0,0 +1,17 @@


@benbdeitch notes this should be deleted; but maybe copy over the comments to the other file!

@@ -53,6 +53,10 @@ def can_update_key(key: str) -> bool:
    return any(updater.key_test(key) for updater in get_solr_updaters())


async def in_place_update():

Todo: investigate this one 😂

def build_subjects(self) -> dict:
@property
def trending_z_score(self) -> float:
    return self._work.get("trending_z_score", 0)

I think this might not work, since _work is a database work, not a solr work.

I think the way we might have to solve this is to make a network request to fetch the solr work data (I don't think we currently use the solr work data anywhere). We do something like this in the author updater, since that has to make a network request to fetch all the author's books in solr. See if we can copy that rough pattern for when/how to make the network request. This is slightly uncharted territory, so it might require a unique approach.

The way to test this is:

  1. Add the book to reading logs/etc. so that its z-score gets computed and saved in solr
  2. Confirm the trending score is non-zero in solr
  3. Edit the work in question; this will trigger a reindex of the work record

Expected: The trending score should be copied over from the previous solr record
Actual: The trending score gets zeroed by this line.
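
One possible shape for the fix, as a sketch only: once the previous solr record has been fetched, copy its trending fields onto the freshly built document before it is sent to solr. The function name `carry_over_trending` and the `TRENDING_FIELDS` tuple are hypothetical, and the network request that fetches the previous record is deliberately left out.

```python
from typing import Optional

# Hypothetical list of fields to preserve; the real PR adds several.
TRENDING_FIELDS = ("trending_z_score",)


def carry_over_trending(new_doc: dict, previous_solr_doc: Optional[dict]) -> dict:
    """Return a copy of new_doc with trending fields taken from the old solr doc.

    Falls back to 0 only when there is no previous value to preserve.
    """
    merged = dict(new_doc)
    for field in TRENDING_FIELDS:
        if previous_solr_doc is not None and field in previous_solr_doc:
            # Preserve the score computed by the cron scripts.
            merged[field] = previous_solr_doc[field]
        else:
            # New work, or the field has never been computed.
            merged.setdefault(field, 0)
    return merged
```

With this, re-indexing a work whose previous solr record has a score of 2.5 keeps 2.5 rather than resetting it to 0.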

@mekarpeles

mekarpeles commented Dec 2, 2024

Can this one be closed in favor of #10057? And if so @benbdeitch do you mind doing that?

Labels
Can it be closed? Priority: 2 Important, as time permits. [managed]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add trending score to solr
6 participants