Search analytics #6019
Conversation
This looks great! I haven't heavily reviewed the graphing. I'd like to keep to the same libraries we're already using, so if you can use the same stuff as the ad code, that would be great. I don't feel strongly though.
readthedocs/search/models.py
Outdated
    _('Query'),
    max_length=4092,
)
count = models.PositiveIntegerField(
I'm wondering if we want more data here. Should we be storing an object each time a search happens? That way we can show the frequency of a search over time. Currently, this only tells us how many times a search has happened.
I think if we plan to delete the data every 3 months, we can probably store every search query with its own timestamp. I'm fine with shipping this initially though, before we start storing a lot more data.
I am not sure what you meant by "more data".
Storing a search object every time a search is made is a better idea. I realised the graphs were wrong before; going this way makes them correct and easier.
That way we can show the frequency of a search over time.
Can you expand on this more?
- Do we want this to be selectable by the user? e.g. the user picks a date and we show them the frequency of searches over time for that day.
- Or do we just show this for today/yesterday?
Just that we will be able to see when a specific search was done each time it was done. The current modeling only shows the number of times a search was done, but no time data about each search.
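A minimal illustration of the difference in plain Python (hypothetical data, not the actual Read the Docs models): with one row per search carrying its own timestamp, time-bucketed frequency falls out of a simple group-by, which a bare per-query counter can never recover.

```python
from collections import Counter
from datetime import datetime

# One record per search, each with its own timestamp (illustrative data).
searches = [
    ("install", datetime(2019, 7, 1, 9, 30)),
    ("install", datetime(2019, 7, 1, 14, 5)),
    ("api", datetime(2019, 7, 2, 11, 0)),
    ("install", datetime(2019, 7, 2, 16, 45)),
]

# Aggregate-only modeling: total counts, no time information.
totals = Counter(query for query, _ in searches)

# Per-search modeling: frequency of each query over time (bucketed by day).
per_day = Counter((query, ts.date()) for query, ts in searches)
```

From `totals` alone there is no way back to `per_day`, which is why the per-search rows matter.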
@ericholscher Screenshot - https://ibb.co/xzX54Ty
Looks good with a few small nits. I'll go ahead and merge this to get the modeling shipped, but we should clean up some of these tidbits.
    project_slug
)
# data for plotting the doughnut-chart
distribution_of_top_queries = SearchQuery.generate_distribution_of_top_queries(
I'm a little worried this will be slow in production after we have a lot of data, but we can deal with it then.
response = HttpResponse(content_type='text/csv')
response['Content-Disposition'] = f'attachment; filename="{file_name}"'
template = loader.get_template('projects/search_analytics/csv_data_template.txt')
Why are we writing this with a template instead of a CSV library?
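For reference, the `csv` module writes rows to any file-like object, so in the view it could target the `HttpResponse` directly instead of rendering a template. A sketch with `io.StringIO` standing in for the response (the `data` shape and names are illustrative, not the actual view code):

```python
import csv
import io
from datetime import datetime

# Hypothetical (created, query) pairs, as the view might pass them.
data = [
    (datetime(2019, 7, 1, 9, 30), 'django "models"'),
    (datetime(2019, 7, 2, 11, 0), "celery task"),
]

# In the real view this would be the HttpResponse, which is also file-like.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["serial_no", "date_time", "query"])
for serial_no, (created, query) in enumerate(data, start=1):
    writer.writerow([serial_no, created.isoformat(), query])

csv_body = buffer.getvalue()
```

The writer handles quoting and escaping automatically, so the manual `addslashes` filtering in the template becomes unnecessary.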
    verbose_name=_('Version'),
    related_name='search_queries',
    on_delete=models.CASCADE,
)
Not sure if we really even want to cascade these deletes. Is there a reason we don't want to store Version here as a string, so we can keep them forever even if a version is deleted?
    .order_by('created_date')
    .annotate(count=Count('id'))
    .values_list('created_date', 'count')
)
This looks really slow. We will see in prod, hopefully it won't be an issue.
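Conceptually the queryset above is a group-by on `created_date`; in plain Python terms (illustrative dates, not the real schema) it amounts to:

```python
from collections import Counter
from datetime import date

# Hypothetical created_date values for stored SearchQuery rows.
created_dates = [
    date(2019, 7, 1),
    date(2019, 7, 1),
    date(2019, 7, 2),
    date(2019, 7, 3),
    date(2019, 7, 3),
    date(2019, 7, 3),
]

# Equivalent of .order_by('created_date').annotate(count=Count('id'))
#              .values_list('created_date', 'count')
counts = sorted(Counter(created_dates).items())
```

If this does prove slow in production, a database index on `created_date` (`db_index=True`) is the usual first mitigation, since the grouping and ordering both key on that column.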
@app.task(queue='web')
def record_search_query(project_slug, version_slug, query, total_results):
Hrm yea, that seems less than ideal. We should probably think more about the right approach for "search as you type" -- probably adding Autocomplete vs. search as you type in some cases.
project_qs = Project.objects.filter(slug=project_slug)

if not project_qs.exists():
This should probably log a warning.
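A hedged sketch of the suggested guard. `find_project` and the list-of-dicts stand-in for the queryset are illustrative, not the actual task code; the point is just that the missing-project branch emits a warning instead of returning silently:

```python
import logging

log = logging.getLogger(__name__)

def find_project(projects, slug):
    """Return the project with the given slug, or None with a warning.

    ``projects`` is a stand-in for the Project queryset (a list of dicts here).
    """
    matches = [p for p in projects if p["slug"] == slug]
    if not matches:
        # Surface the anomaly rather than failing silently, per the review.
        log.warning("Project does not exist. project_slug=%s", slug)
        return None
    return matches[0]
```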
@@ -0,0 +1,3 @@
serial_no,date_time,query
{% for row in data %}{{ forloop.counter }},"{{ row.0|addslashes }}","{{ row.1|addslashes }}"
{% endfor %}
Definitely, we should use the csv library for this.
Closes #5967
WIP