Beyond word clouds #95912

Closed
markharwood opened this issue Mar 31, 2021 · 4 comments
Labels
enhancement New value added to drive a business result Feature:Lens impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@markharwood
Contributor

markharwood commented Mar 31, 2021

Word cloud visualizations are nice eye candy but for practical use have a number of issues. They have been called "the pie chart of text data" and I find it hard to disagree.

Making sense of popular/significant terms can be greatly improved if we also provide a degree of clustering in the visualization.
As a real-world and topical example, here are the significant words generated from today's news headlines and rendered as a typical word cloud (using this):

[Image: Word_Cloud_Generator]

The user is left wondering if Joe Biden's dog has anything to do with the Suez Canal and if Deliveroo drivers have been involved in a biting incident. If we use the adjacency matrix aggregation we can cluster these same terms by their co-occurrence and use a Graph visualization to give a much more useful summary of today's news:
[Image: Kibana-52]
We can clearly see that it was Biden's dog in the biting incident and that it was the Ever Given megaship stuck in the Suez Canal. In my prototype, the relationship lines that connect terms can also be clicked, and a highlighter can be used to show where the connected terms were used in the original text:
[Image: Kibana]
This style of interaction helps users quickly remove the mystery by providing the missing context.
Even if we don't adopt a graph visualization, the clusters produced by the adjacency matrix aggregation can be of use in colouring words based on the clusters they sit in.
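
As a rough illustration of the clustering idea, here is a minimal sketch (not the prototype's code). It assumes one named filter per term in an `adjacency_matrix` aggregation, so that response buckets have keys like `biden` or `biden&dog` (`&` being the default separator). The field name `headline`, the aggregation name `cooccurrence`, the helper names and the sample counts are all made up for the example. Clusters are taken to be connected components of the co-occurrence graph, which is the simplest possible choice and not necessarily what a real visualization would use.

```python
from collections import defaultdict


def adjacency_matrix_request(terms, field="headline"):
    """Build a search body with one named filter per term.

    The field name and aggregation name are hypothetical.
    """
    return {
        "size": 0,
        "aggs": {
            "cooccurrence": {
                "adjacency_matrix": {
                    "filters": {t: {"match": {field: t}} for t in terms}
                }
            }
        },
    }


def cluster_terms(buckets, separator="&", min_docs=1):
    """Group terms into connected components of the co-occurrence graph."""
    parent = {}

    def find(term):
        # Union-find lookup with path halving.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for bucket in buckets:
        # Assumes term names never contain the separator themselves.
        terms = bucket["key"].split(separator)
        for term in terms:
            find(term)  # register every term, even isolated ones
        if len(terms) == 2 and bucket["doc_count"] >= min_docs:
            root_a, root_b = find(terms[0]), find(terms[1])
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = defaultdict(list)
    for term in list(parent):
        clusters[find(term)].append(term)
    return sorted(sorted(members) for members in clusters.values())


def colour_index(clusters):
    """Map each term to the index of its cluster, usable as a palette index."""
    return {term: i for i, members in enumerate(clusters) for term in members}


# Invented counts, shaped like adjacency_matrix response buckets.
SAMPLE_BUCKETS = [
    {"key": "biden", "doc_count": 12},
    {"key": "biden&dog", "doc_count": 5},
    {"key": "dog", "doc_count": 6},
    {"key": "canal", "doc_count": 9},
    {"key": "canal&suez", "doc_count": 8},
    {"key": "suez", "doc_count": 8},
    {"key": "deliveroo", "doc_count": 4},
]
```

Calling `colour_index(cluster_terms(SAMPLE_BUCKETS))` gives each word a cluster number that a word cloud could map to a colour, so "biden" and "dog" share one colour and "suez" and "canal" another. Raising `min_docs` drops weak links that would otherwise merge unrelated clusters.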

It's also worth mentioning again that text fields are currently not supported in word cloud visualizations. The significant_text aggregation was specifically designed for producing these sorts of word discoveries from text fields, with special support for eliminating junk words from noisy text.
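
For reference, a request along these lines might look like the sketch below. The field name `headline`, the aggregation names and the default sizes are assumptions for illustration. `filter_duplicate_text` is the option aimed at noisy, repeated text, and nesting under a `sampler` aggregation is the usual way to keep the cost of analysing text on the fly under control.

```python
import json


def significant_text_request(query_text, field="headline", size=20, shard_size=200):
    """Build a search body that asks for significant words in a text field.

    Field and aggregation names are hypothetical; adjust to the real mapping.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"match": {field: query_text}},
        "aggs": {
            "sample": {
                # Analyse only the top-matching docs on each shard.
                "sampler": {"shard_size": shard_size},
                "aggs": {
                    "keywords": {
                        "significant_text": {
                            "field": field,
                            "size": size,
                            # Ignore repeated passages such as boilerplate or retweets.
                            "filter_duplicate_text": True,
                        }
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(significant_text_request("suez canal"), indent=2))
```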

@markharwood markharwood added Feature:Tagcloud Tag cloud visualization feature enhancement New value added to drive a business result labels Mar 31, 2021
@botelastic botelastic bot added the needs-team Issues missing a team label label Mar 31, 2021
@stratoula stratoula added the Team:Visualizations Visualization editors, elastic-charts and infrastructure label Apr 5, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-app (Team:KibanaApp)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Apr 5, 2021
@monfera
Contributor

monfera commented Apr 6, 2021

Agree, esp. if there's a need to show relations and not just allude to term frequencies, and there's enough space for the links and proximity layout to work. Word cloud is indeed a bit like the pie chart, can even be circular :-) elastic/elastic-charts#1038

@markharwood
Contributor Author

markharwood commented Apr 7, 2021

> Agree, esp. if there's a need to show relations and not just allude to term frequencies,

Even for the simpler case of plain lists a bar chart can be clearer, as noted in that article.

However, we can do better than plain bar charts when it comes to lists of significant terms found in query results.
They are significant because they have seen an uptick in popularity for the selected query (e.g. trending in today's news).
When it comes to conveying popularity, there are different scales at play between terms. Evergreen topics like "Meghan Markle" are often in the news, but on the day of the Oprah Winfrey interview there's an uplift. A very minor celebrity would have significantly fewer mentions but would trend on the occasion of their death or when caught saying something racist. Perhaps the important measure is the significance score, e.g. the percentage of their mentions that occur in the search results.
This can be shown in one scalable bar chart: the green bar represents the number of matches in the search results and the grey bar the number of matches outside of the search results (the background popularity):

[Animated GIF, 912x564]

Everything is drawn to scale and the zoom bar can be used to reveal details of minor celebrities and the percentage of their mentions that occur in the search results.
This style of interface shows all the possible stats of interest:

  1. Meyers Leonard is more popular than Oprah in today's news (his green bar is bigger than hers)
  2. Oprah is normally more popular than Meyers Leonard (her grey bar is bigger than his)
  3. 80% of all Meyers Leonard's mentions in the news happened today (lots of green vs grey visible when zoomed in)
  4. Expanding your query with Meghan (as an OR) will drastically increase the number of matching results (lots of grey)
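
The numbers behind each bar could be derived roughly as follows. This is a sketch under assumptions: it takes `doc_count` (matches in the search results) and `bg_count` (matches in the background set) as a significant terms style bucket reports them, and assumes the background set includes the search results, so the grey portion is `bg_count - doc_count`. The counts are invented purely to mirror the four observations above.

```python
def bar_stats(term, doc_count, bg_count):
    """Split a term's mentions into green (in results) and grey (outside results).

    Assumes bg_count counts the whole background set, search results included.
    """
    green = doc_count
    grey = max(bg_count - doc_count, 0)
    share = green / bg_count if bg_count else 0.0
    return {"term": term, "green": green, "grey": grey, "share_in_results": share}


# Invented counts for illustration only.
SAMPLE_TERMS = [
    bar_stats("Meyers Leonard", doc_count=80, bg_count=100),
    bar_stats("Oprah", doc_count=60, bg_count=1000),
    bar_stats("Meghan", doc_count=50, bg_count=5000),
]


def by_share(rows):
    """Order rows by the fraction of all mentions that fall inside the results."""
    return sorted(rows, key=lambda r: r["share_in_results"], reverse=True)
```

Drawing `green` and `grey` on one shared axis keeps everything to scale, while `share_in_results` is the ratio that becomes visible when zooming in on a small term.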

Currently, word clouds use one stat to size words, and any comparisons are hard because long words take up more space than short words.

@timroes timroes added Feature:Lens and removed Feature:Tagcloud Tag cloud visualization feature labels Apr 19, 2021
@stratoula stratoula added the impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. label Feb 15, 2023
@stratoula stratoula added impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. and removed impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. labels Nov 29, 2023
@stratoula
Contributor

Thank you for contributing to this issue, however, we are closing this issue due to inactivity as part of a backlog grooming effort. If you believe this feature/bug should still be considered, please reopen with a comment.

@stratoula stratoula closed this as not planned Won't fix, can't repro, duplicate, stale Jan 30, 2024
5 participants