Search analytics #6019

Merged
merged 52 commits into from
Aug 7, 2019

Conversation

@dojutsu-user (Member) commented Jul 31, 2019

Closes #5967
WIP

@dojutsu-user dojutsu-user added the PR: work in progress Pull request is not ready for full review label Jul 31, 2019
@dojutsu-user dojutsu-user requested a review from a team July 31, 2019 14:47
@ericholscher (Member) left a comment:

This looks great! I haven't heavily reviewed the graphing. I'd like to keep to the same libraries we're already using, so if you can use the same stuff as the ad code, that would be great. I don't feel strongly though.

readthedocs/search/utils.py (outdated, resolved)
readthedocs/search/utils.py (outdated, resolved)
readthedocs/settings/base.py (outdated, resolved)
_('Query'),
max_length=4092,
)
count = models.PositiveIntegerField(
Member:

I'm wondering if we want more data here. Should we be storing an object each time a search happens? That way we can show the frequency of a search over time. Currently, this only tells us how many times a search has happened.

I think if we plan to delete the data every 3 months, we can probably store every search query with its own timestamp. I'm fine with shipping this initially though, before we start storing a lot more data.
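The trade-off being discussed can be illustrated in plain Python (a sketch with made-up data, not the PR's actual code): storing one record per search, each with its own timestamp, still lets you recover the running total, and additionally gives you frequency over time, which a single counter field cannot.

```python
from collections import Counter
from datetime import date

# Hypothetical per-event records: one (query, timestamp) row per search,
# instead of a single running counter per query string.
events = [
    ("search analytics", date(2019, 8, 1)),
    ("search analytics", date(2019, 8, 1)),
    ("search analytics", date(2019, 8, 2)),
]

# The total count is still recoverable...
total = sum(1 for query, _ in events if query == "search analytics")

# ...and so is frequency over time, which a bare counter cannot provide.
per_day = Counter(day for query, day in events if query == "search analytics")
```

With a periodic purge (e.g. every 3 months, as suggested above) the table stays bounded while keeping the time dimension.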

Member Author:

I am not sure what you meant by more data.

Storing a search object every time a search is made is a better idea. I realised that the graphs were wrong before, and going this way makes them correct and easier to build.

That way we can show the frequency of a search over time.

Can you expand on this more?

  • Do we want this to be selected by the user? For example, the user selects a date and we show them the frequency of searches over time for that day.
  • Or do we just show this for today/yesterday?

Member:

Just that we will be able to see when each search was done, every time it was done. The current modeling only shows the number of times a search was done, with no time data about individual searches.

@dojutsu-user dojutsu-user requested review from ericholscher and a team August 2, 2019 18:11
@dojutsu-user dojutsu-user self-assigned this Aug 2, 2019
@dojutsu-user (Member Author) commented Aug 3, 2019

@ericholscher
I have updated the PR.
I have also added a download button which allows project admins to download all the data in CSV format.

Screenshot - https://ibb.co/xzX54Ty

@dojutsu-user dojutsu-user changed the title [WIP] Search analytics Search analytics Aug 5, 2019
@dojutsu-user dojutsu-user removed the PR: work in progress Pull request is not ready for full review label Aug 5, 2019
@ericholscher (Member) left a comment:

Looks good with a few small nits. I'll go ahead and merge this to get the modeling shipped, but we should clean up some of these tidbits.

project_slug
)
# data for plotting the doughnut-chart
distribution_of_top_queries = SearchQuery.generate_distribution_of_top_queries(
Member:

I'm a little worried this will be slow in production after we have a lot of data, but we can deal with it then.


response = HttpResponse(content_type='text/csv')
response['Content-Disposition'] = f'attachment; filename="{file_name}"'
template = loader.get_template('projects/search_analytics/csv_data_template.txt')
Member:

Why are we writing this with a template instead of a CSV library?

verbose_name=_('Version'),
related_name='search_queries',
on_delete=models.CASCADE,
)
Member:

Not sure if we really even want to cascade these deletes. Is there a reason we don't want to store Version here as a string, so we can keep them forever even if a version is deleted?

.order_by('created_date')
.annotate(count=Count('id'))
.values_list('created_date', 'count')
)
Member:

This looks really slow. We will see in prod, hopefully it won't be an issue.



@app.task(queue='web')
def record_search_query(project_slug, version_slug, query, total_results):
Member:

Hrm yea, that seems less than ideal. We should probably think more about the right approach for "search as you type" -- probably adding Autocomplete vs. search as you type in some cases.


project_qs = Project.objects.filter(slug=project_slug)

if not project_qs.exists():
Member:

This should probably log a warning.
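A minimal sketch of the suggested guard (the function signature and messages are illustrative, not the PR's code): rather than silently dropping the record when the slug doesn't match a project, emit a warning so the failure is visible in logs.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("readthedocs.search.tasks")


def record_search_query(project_exists: bool, project_slug: str) -> bool:
    """Hypothetical guard: warn instead of failing silently on a bad slug."""
    if not project_exists:
        log.warning(
            "Not recording search query for unknown project: %s", project_slug
        )
        return False
    # ... create the SearchQuery row here ...
    return True
```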

@@ -0,0 +1,3 @@
serial_no,date_time,query
{% for row in data %}{{ forloop.counter }},"{{ row.0|addslashes }}","{{ row.1|addslashes }}"
{% endfor %}
Member:

Definitely we should use the csv library for this.
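For illustration, a stdlib-based sketch of the same output (the helper name is hypothetical): the `csv` module handles quoting and escaping itself, so the `addslashes` workaround in the template becomes unnecessary. The resulting string can be written into Django's `HttpResponse` unchanged.

```python
import csv
import io


def render_search_csv(rows):
    """Render (date_time, query) pairs as CSV; serial numbers are generated."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["serial_no", "date_time", "query"])
    for serial_no, (date_time, query) in enumerate(rows, start=1):
        # csv.writer quotes and escapes fields as needed, unlike the
        # hand-rolled template with addslashes.
        writer.writerow([serial_no, date_time, query])
    return buf.getvalue()
```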

@ericholscher ericholscher merged commit f9f6c53 into readthedocs:master Aug 7, 2019
@dojutsu-user dojutsu-user deleted the search-analytics branch August 7, 2019 19:03
@dojutsu-user dojutsu-user restored the search-analytics branch August 7, 2019 19:04
@dojutsu-user dojutsu-user deleted the search-analytics branch August 8, 2019 08:00

Successfully merging this pull request may close these issues.

Provide search analytics to users.
2 participants