-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
7429/feature/add trending score to solr #9878
7429/feature/add trending score to solr #9878
Conversation
3673c5e
to
8fb48c7
Compare
74fdc7a
to
ff3dbf3
Compare
ff3dbf3
to
6d56b8b
Compare
ebfa080
to
6aef600
Compare
…benbdeitch/openlibraryfork into 7429/feature/add-trending-score-to-solr
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #9878 +/- ##
==========================================
+ Coverage 16.06% 17.12% +1.06%
==========================================
Files 90 89 -1
Lines 4769 4752 -17
Branches 832 831 -1
==========================================
+ Hits 766 814 +48
+ Misses 3480 3428 -52
+ Partials 523 510 -13 ☔ View full report in Codecov by Sentry. |
…and line, rather than through an environment variable.
eb038d6
to
649ae9b
Compare
for more information, see https://pre-commit.ci
@@ -0,0 +1,17 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benbdeitch notes this should be deleted; but maybe copy over the comments to the other file!
@@ -53,6 +53,10 @@ def can_update_key(key: str) -> bool: | |||
return any(updater.key_test(key) for updater in get_solr_updaters()) | |||
|
|||
|
|||
async def in_place_update(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Todo: investigate this one 😂
def build_subjects(self) -> dict: | ||
@property | ||
def trending_z_score(self) -> float: | ||
return self._work.get("trending_z_score", 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might not work, since _work
is a database work, not a solr work.
I think the way we might have to solve this is to make a network requests to fetch solr work data (I don't think we currently use the solr work data anywhere). We do something sort of like this in the author updater; since that has to make a network request to fetch all the author's books in solr. See if we can copy that rough pattern about when/how to make the network request. This is slightly uncharted territory, so might require a unique approach.
The way to test this is:
- Add the book to reading logs/etc so that it's z-score gets computed and saved in solr
- Confirm the trending score is non-zero in solr
- Edit the the work in question ; this will trigger a new reindex of the work record
Expected: The trending score should be copied over from the previous solr record
Actual: The trending score gets zero-d by this line.
Can this one be closed in favor of #10057? And if so @benbdeitch do you mind doing that? |
Closes #7429
This PR adds support for trending scores to Solr, allowing us to better track which works are achieving a statistically notable increase in popularity. It adds several new fields, and comes with two scripts to be run-- one daily, the other hourly, to keep this information constantly up to date.
Currently, it's still in draft mode, as there is currently no code to automatically run the scripts.
Technical
This implementation uses Solr's ability to update documents in place, which requires the new trending fields to not be stored or indexed, and instead treated as a
docValue
. Essentially, they are left out of Solr's inverted index, and instead treated as a more usual document-to-value mapping.This is both A) more performant than atomic updates, and B) avoids the issues that atomic updates can have with copyfield values.
The relevant cron commands are located in an added file,
docker/cron.local
docker compose up
.key:"/works/OL54120W"
), and check to ensure that the new fields are present.docker/cron.local
file to run the cron jobs in, along with a new container. Change the times on the cron tasks to run more frequently; (* * * * *) will make them run every minute.dbnet
andwebnet
networks, and hasdepends on: db
.Screenshot
Stakeholders
@cdrini