Beyond word clouds #95912

markharwood · 2021-03-31T11:00:27Z

Word cloud visualizations are nice eye candy but for practical use have a number of issues. They have been called "the pie chart of text data" and I find it hard to disagree.

Making sense of popular/significant terms can be greatly improved if we also provide a degree of clustering in the visualization.
As a real-world and topical example, here are the significant words generated from today's news headlines and rendered as a typical word cloud (using this):

The user is left wondering if Joe Biden's dog has anything to do with the Suez Canal and if Deliveroo drivers have been involved in a biting incident. If we use the adjacency matrix aggregation we can cluster these same terms by their co-occurrence and use a Graph visualization to give a much more useful summary of today's news:

We can clearly see that it was Biden's dog in the biting incident and that it was the Ever Given megaship stuck in the Suez Canal. In my prototype the relationship lines that connect terms can also be clicked, and a highlighter can be used to show where the connected terms were used in the original text:

This style of interaction helps users quickly remove the mystery by providing the missing context.
Even if we don't adopt a graph visualization, the clusters produced by the adjacency matrix aggregation can be of use in colouring words based on the clusters they sit in.
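To make the clustering idea concrete, here is a minimal sketch in Python. It assumes the terms have already been chosen and uses the `adjacency_matrix` aggregation with one named filter per term. The field name `headline`, the helper names, and the sample buckets are all hypothetical, and grouping by connected components is just one simple way to derive clusters, not necessarily what the prototype does.

```python
def build_adjacency_request(field, terms):
    """Build a search body with one named filter per term.

    Filters are named t0, t1, ... so the names never contain the
    bucket-key separator ("&" by default).
    """
    names = {f"t{i}": term for i, term in enumerate(terms)}
    filters = {name: {"match_phrase": {field: term}} for name, term in names.items()}
    body = {
        "size": 0,
        "aggs": {"cooccurrence": {"adjacency_matrix": {"filters": filters}}},
    }
    return body, names


def edges_from_buckets(buckets, names, separator="&"):
    """Keep only the pairwise buckets (keys like "t0&t1") as weighted edges."""
    edges = []
    for bucket in buckets:
        parts = bucket["key"].split(separator)
        if len(parts) == 2 and bucket["doc_count"] > 0:
            edges.append((names[parts[0]], names[parts[1]], bucket["doc_count"]))
    return edges


def cluster_terms(terms, edges):
    """Assign a cluster number to each term using connected components."""
    parent = {term: term for term in terms}

    def find(term):
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for a, b, _count in edges:
        parent[find(a)] = find(b)

    cluster_ids = {}
    return {t: cluster_ids.setdefault(find(t), len(cluster_ids)) for t in terms}


# Hypothetical terms and a hand-written response fragment for illustration.
terms = ["biden", "dog", "bite", "suez", "ever given"]
body, names = build_adjacency_request("headline", terms)
sample_buckets = [
    {"key": "t0", "doc_count": 5},
    {"key": "t0&t1", "doc_count": 4},
    {"key": "t1&t2", "doc_count": 3},
    {"key": "t3&t4", "doc_count": 6},
]
edges = edges_from_buckets(sample_buckets, names)
clusters = cluster_terms(terms, edges)
```

The cluster number per term could then index into a colour palette, which covers the "colour words by cluster" case without needing a graph layout.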

It's also worth mentioning again that text fields are currently not supported in word cloud visualizations. The significant_text aggregation was specifically designed for producing these sorts of word discoveries from text fields, with special support for eliminating junk words from noisy text.
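For reference, a request using `significant_text` might look roughly like the sketch below. It only builds the request body as a Python dict. The field names `headline` and `published`, the helper name, and the sample size are assumptions for illustration. Wrapping the aggregation in a `sampler` and setting `filter_duplicate_text` is the usual way to keep the analysis cheap and to suppress repeated boilerplate text.

```python
def build_significant_text_request(query, field="headline", sample_size=200, max_terms=20):
    """Build a search body that asks for significant words in a text field.

    `query` is any Elasticsearch query clause that selects the foreground
    set (for example, today's headlines).
    """
    return {
        "size": 0,
        "query": query,
        "aggs": {
            "sample": {
                # Analyse only the top-matching docs on each shard.
                "sampler": {"shard_size": sample_size},
                "aggs": {
                    "keywords": {
                        "significant_text": {
                            "field": field,
                            "size": max_terms,
                            # Skip repeated passages such as copied boilerplate.
                            "filter_duplicate_text": True,
                        }
                    }
                },
            }
        },
    }


# Hypothetical foreground query: documents published since midnight.
todays_news = {"range": {"published": {"gte": "now/d"}}}
request_body = build_significant_text_request(todays_news)
```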


elasticmachine · 2021-04-05T05:38:06Z

Pinging @elastic/kibana-app (Team:KibanaApp)

monfera · 2021-04-06T23:04:59Z

Agree, esp. if there's a need to show relations and not just allude to term frequencies, and there's enough space for the links and proximity layout to work. Word cloud is indeed a bit like the pie chart, can even be circular :-) elastic/elastic-charts#1038

markharwood · 2021-04-07T09:29:38Z

> Agree, esp. if there's a need to show relations and not just allude to term frequencies,

Even for the simpler case of plain lists a bar chart can be clearer, as noted in that article.

However we can do better than plain bar charts when it comes to lists of significant terms found in query results.
They are significant because they have seen an uptick in popularity for the selected query (e.g. trending in today's news).
When it comes to conveying popularity, there are different scales at play between terms. Evergreen topics like "Meghan Markle" are often in the news, but on the day of the Oprah Winfrey interview there's an uplift. A very minor celebrity would have significantly fewer mentions but would trend on the occasion of their death or when caught saying something racist. Perhaps the important measure is the significance score, e.g. the percentage of their mentions that occur in the search results.
This can be shown in one scalable bar chart, where the green bar represents the number of matches in the search results and the grey bar the number of matches outside of the search results (the background popularity):

Everything is drawn to scale and the zoom bar can be used to reveal details of minor celebrities and the percentage of their mentions that occur in the search results.
This style of interface shows all the possible stats of interest:

Meyers Leonard is more popular than Oprah in today's news (one green bar bigger than the other)
Oprah is normally more popular than Meyers (one grey bar bigger than the other grey bar)
80% of all Meyers' mentions in the news happened today (lots of green vs grey visible when zoomed in)
Expanding your query with Meghan (as an OR) will drastically increase the number of matching results (lots of grey)
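The stats behind the two bars can be derived from the bucket counts that the significant terms/text aggregations return. The sketch below assumes each bucket carries `doc_count` (matches inside the search results) and `bg_count` (matches in the whole background set, which includes the foreground matches). The helper name and all the numbers are invented for illustration and only chosen to mirror the points above.

```python
def bar_stats(buckets):
    """Turn significance buckets into green/grey bar lengths and a share."""
    rows = []
    for bucket in buckets:
        green = bucket["doc_count"]              # mentions in the search results
        grey = bucket["bg_count"] - green        # mentions outside the results
        share = green / bucket["bg_count"] if bucket["bg_count"] else 0.0
        rows.append({"term": bucket["key"], "green": green, "grey": grey, "share": share})
    return rows


# Made-up counts, not real data.
sample_buckets = [
    {"key": "meyers leonard", "doc_count": 80, "bg_count": 100},
    {"key": "oprah", "doc_count": 60, "bg_count": 5000},
    {"key": "meghan", "doc_count": 50, "bg_count": 20000},
]
rows = bar_stats(sample_buckets)
for row in rows:
    print(f'{row["term"]}: green={row["green"]} grey={row["grey"]} share={row["share"]:.1%}')
```

Drawing both bars on one shared scale, with zoom, keeps all three numbers readable at once, which a single font size in a word cloud cannot do.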

Currently word clouds use one stat to size words and any comparisons are hard because long words use more space than short words.

stratoula · 2024-01-30T08:15:30Z

Thank you for contributing to this issue, however, we are closing this issue due to inactivity as part of a backlog grooming effort. If you believe this feature/bug should still be considered, please reopen with a comment.

markharwood added Feature:Tagcloud Tag cloud visualization feature enhancement New value added to drive a business result labels Mar 31, 2021

botelastic bot added the needs-team Issues missing a team label label Mar 31, 2021

stratoula added the Team:Visualizations Visualization editors, elastic-charts and infrastructure label Apr 5, 2021

botelastic bot removed the needs-team Issues missing a team label label Apr 5, 2021

monfera mentioned this issue Apr 6, 2021

[Visualize] Shaped Word Clouds #42262

Closed

timroes added Feature:Lens and removed Feature:Tagcloud Tag cloud visualization feature labels Apr 19, 2021

stratoula added the impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. label Feb 15, 2023

stratoula added impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. and removed impact:needs-assessment Product and/or Engineering needs to evaluate the impact of the change. labels Nov 29, 2023

stratoula closed this as not planned Won't fix, can't repro, duplicate, stale Jan 30, 2024
