Awesome Plotly with code series (Part 8): How to balance dominant bar chart categories
Discover the #1 strategy to handle skyscraper bars in your charts
Welcome to the eighth post in my “Plotly with code” series! If you missed the first one, you can check it out in the link below, or browse through my “one post to rule them all” to follow along with the entire series or other topics I have previously written about.
A short summary of why I am writing this series
My go-to tool for creating visualisations is Plotly. It’s incredibly intuitive, from layering traces to adding interactivity. However, whilst Plotly excels at functionality, it doesn’t come with a “data journalism” template that offers polished charts right out of the box.
That’s where this series comes in: I’ll be sharing how to transform Plotly’s default output into sleek, professional-grade charts that meet data journalism standards.
Intro - No chart is ready by default to deal with a dominating category.
When someone thinks about bar charts, they often picture clear, easily distinguishable bars that present data effectively. But what happens if one bar is so big that it dwarfs all the others? After reading my earlier post, Awesome Plotly with code series (Part 3): Highlighting bars in the long tails, you might say: “Jose, you already showed how to deal with small, hard-to-detect bars: just highlight them”. That is true enough, except that my previous post highlighted 2 bars, not every bar except one!
In this post I’ll show you how to present a chart where both the tall “skyscraper” bar and the smaller “one-story houses” are represented equally.
What will we cover in this blog?
Scenario 1: The skyscraper - Plotting a default chart
Scenario 2: The broken bar - Who invented this aberration?
Scenario 3: The separation - Providing space for both stories
PS: As always, code and links to my GitHub repository will be provided along the way. Let’s get started!
Scenario 1: Skyscraper bar charts and default plotting issues
Imagine presenting a study about the languages spoken in England and trying to create a bar chart that effectively showcases these insights. The idea is to focus on the non-English languages, as it is obvious that English is the dominant language. However, you still want to show how far English dominates the rest. Here is the data you might be working with:
You can see that English is spoken by 52 million people… and the next closest language is Polish with 600k! That is an 86x+ difference! If you plotted this information as a bar chart without considering what would happen to the non-English languages, you would find yourself staring at the plot below.
Where do I think this plot has issues?
It is clear that the smaller bars are so tiny that it is virtually impossible to tell the difference in heights between them.
Even if you added the data labels, you cannot visually see that Panjabi, Urdu, Portuguese and Spanish are close, with Polish being 3x these languages.
In fact, because you want to show the data labels with “k” to represent thousands, what happens with the 52 million English speakers? Do you also plot the data label with “k”?
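One possible way to handle the label question is to let the unit follow the magnitude of the value. The helper below is a sketch under that assumption, and `format_speakers` is a hypothetical name; it is not necessarily the approach taken later in this series.

```python
def format_speakers(value: float) -> str:
    """Format a count with 'M' for millions and 'k' for thousands."""
    if value >= 1_000_000:
        return f"{value / 1_000_000:.0f}M"
    if value >= 1_000:
        return f"{value / 1_000:.0f}k"
    return f"{value:.0f}"


# The two rounded figures quoted in the text.
labels = [format_speakers(v) for v in (52_000_000, 600_000)]
print(labels)
```

The resulting strings could then be passed to the `text` argument of `go.Bar` so that each bar carries a label in its own unit.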
Scenario 2: The broken bar - Who invented this aberration?