Misleading axis

Let's discover how bad actors are using their tooling to manipulate or mislead us. Disclaimer: it's not only on them.

A sense of objectivity

The authors of a study (the perceptual power of data visualisations) found that a data visualisation increases the sense of objectivity, because the presented numbers carry evidence. People are also more likely to change their mind on a subject if they see a data visualisation and do not hold a strong opinion on that subject. There is a caveat: the researchers cannot tell whether the visual representation or the numbers themselves are the reason for the viewers' impressions.

Because of this sense of objectivity, people are prone to believe data visualisations are more accurate and more evidence-based than written text or spoken words. Scientific backing for what I believe most of us expected.

Two attack vectors per coin

This presumed evidence is also the reason why some bad actors can apply tricks like:

  1. Displaying wrong data
  2. Showing an inappropriate amount of data
  3. Bad design
  4. Textual context bias
  5. Statistical confusion

But readers also play a role in this game, because they have their own attack vectors like:

  1. Not being used to read charts (graphicacy/numeracy)
  2. Reading with their own confirmation bias
  3. Consumption patterns (like short attention spans)

It goes both ways: imho we need an informed society, but we also need to acknowledge that there is a level of effort you have to invest to get to the bottom of what is shown to you. For most people it's impossible to look up the data in their daily lives. People are flooded on social media, so they can either live with it and take what they are served (in times of AI the content we get is becoming average at best) or reduce their media consumption to a level where they can actually parse the information and think about it.

We should avoid the impression that every single one of us has to have an opinion on every topic. Imho this is one cause why we don't actually look things up: we're missing the time and skills to do so.

Bad design

Designing charts badly may have different reasons. Maybe the designers don't know better. Proving bad intention can be a very challenging task; we will see an example of just how hard it is to prove. Here is a checklist of what not to do:

  1. Classic axis trickery
  2. Choosing an inappropriate symbol
  3. Misleading visual guidance
  4. Introducing confusion through statistics (there are whole book shelves on this topic alone)
  5. Dropping information
  6. Inadequate conclusions

A bad visualisation design is usually one where you're given a false impression through the use of data. There are classics most people have experienced themselves, like the cherry picker, the base stealer or the time-gap ticks used to mislead people. There is an extensive list with fun names for bad chart design patterns collected by flowingdata.com.

Axis manipulation

There are many ways you can adjust an axis to give a false impression. Let me pull up an example from the German statistics on politically motivated crime. The Federal Criminal Police Office (BKA) publishes a yearly report on these statistics. Let me try to mislead you by comparing right- and left-wing motivated crimes:

Misleading representation of politically motivated criminal cases in Germany

The chart above is one that could cause trouble when you lazily gaze over it, or if the case labels were dropped entirely. Sure, this is an extreme case, and we all know that without an axis a chart says basically nothing. But for inexperienced or naive readers this might become dangerous. The example above was actually a little challenge for me. I wanted to build it quickly using Observable Plot, and I found no way to make independent facets of the y axis. It always wants to link the left and right axes, so you cannot separate them. Now I know why.
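The misleading effect of independent y axes can be sketched without any plotting library: each series gets normalized to its own extent, so both fill the full chart height. A minimal sketch with made-up numbers (not the actual BKA data):

```javascript
// Each series is mapped to its own [min, max] range, which is what
// independent y axes effectively do: both curves span the full height.
function normalize(series) {
  const min = Math.min(...series);
  const max = Math.max(...series);
  return series.map((v) => (v - min) / (max - min));
}

// Hypothetical case counts, differing by a factor of ~5:
const right = [20000, 21000, 23000, 22500];
const left = [4000, 4100, 4600, 4500];

// Both normalized series now run from 0 to 1, so the plotted curves
// look comparable in scale even though the absolute numbers are not.
console.log(normalize(right));
console.log(normalize(left));
```

Once both series are stretched to the same visual extents, the reader has no chance to recover the real magnitudes without reading the axis labels.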

This is the original chart published by the federal office, with one exception: I excluded the total cases, because I think it's a bad habit to publish the individual series and their sum in the same plot. Summing everything up adds another line that skews the y scale and makes the lines less comparable, because all lines except the total line get squeezed together.

When to start y at 0?

This was something that confused me for a while. I felt somewhat confident that I wasn't doing any harm, but I wasn't exactly sure when to start at 0 and when not to. For bar charts, sure, start at 0, no problem there. But do I have to do it for lines too?

In short: no, start where reasonable, but make sure it really makes sense.

The long answer, at least as far as Alberto Cairo (who wrote How Charts Lie) takes the topic, is a bit more complex than just one side or the other. He gives an example comparable to the following.

The above example is a bit extreme, but it illustrates the whole problem with these it-should-always-be-this-way rules: mostly it's situational. Sometimes what's good practice can turn out to be bad behaviour. If you're not quite sure, take the middle way and give the axis a bit more value room.
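How much a truncated baseline inflates a difference is easy to quantify, borrowing the "lie factor" idea from the literature. A minimal sketch with made-up numbers:

```javascript
// Visual length of a bar above a (possibly truncated) baseline.
function barHeight(value, axisStart) {
  return value - axisStart;
}

// How many times taller does bar a *look* compared to bar b?
function visualRatio(a, b, axisStart) {
  return barHeight(a, axisStart) / barHeight(b, axisStart);
}

// Two values that differ by only 4%:
const a = 104;
const b = 100;

console.log(visualRatio(a, b, 0));  // honest baseline: bars differ by ~4%
console.log(visualRatio(a, b, 98)); // truncated at 98: bar a looks 3x taller
```

The same 4% difference reads as a threefold difference once the baseline moves close to the data, which is exactly why bar charts should keep the zero baseline.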

No axis is right if we see nothing

Beyond these simple axis topics, there are cases where the data itself diverges strongly. Let's put up an example before making the point: here is a life expectancy vs GDP per capita chart.

Sometimes we need to change the axis so we can show certain things, but we need to be careful about what that suggests. With the logarithm toggle above the chart turned on, we can see much better how the data behaves for Africa; select it as the region and you'll see what I mean when you toggle the logarithm on and off. On the other hand, this chart then visually suggests a linear trend (turn on the trend line). But that's not true. Even though the logarithm is useful to see countries like Uganda, Zimbabwe, Niger or Togo, it can suggest something that isn't actually true.
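Why a log axis suggests a linear trend can be shown in a few lines: under log10, a series that doubles every step plots at equally spaced positions, so the eye sees a straight line. A sketch with a hypothetical GDP-like series (not the real chart data):

```javascript
// A series that doubles every step: clearly exponential growth.
const years = [0, 1, 2, 3, 4];
const gdp = years.map((t) => 1000 * Math.pow(2, t));

// On a log10 axis, each value is plotted at position log10(value).
const logPositions = gdp.map((v) => Math.log10(v));

// The differences between consecutive plotted positions are constant,
// so the curve renders as a straight line despite exponential growth.
const steps = logPositions.slice(1).map((v, i) => v - logPositions[i]);
console.log(steps); // every step ≈ log10(2) ≈ 0.301
```

The transformation is useful for inspecting the small values, but the straight-line impression it creates is a property of the axis, not of the data.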

What I want to say is that the visual appearance can trick people with less experience in statistics or math in general, which makes chart designers responsible for choosing the right tool at the right time, depending on the audience.

"Like this we cannot show what we want to show" might be one of the most frequent reasons why designers perform data digging voodoo. In reality, data often looks odd or is nearly uninspectable because of a very few outliers.

For example, in the GDP vs life expectancy chart above I excluded entries with a life expectancy lower than 50, because for whatever reason the Central African Republic has a life expectancy of 18 in 2022. In 2023 it's back to 57. With 2022 included, this chart would be so heavily skewed we would not be able to read it.
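The pre-filtering described above boils down to a one-liner, and note that it is exactly the "dropping information" item from the checklist, so it belongs in the chart's caption. A sketch with hypothetical records:

```javascript
// Hypothetical records standing in for the real dataset:
const countries = [
  { name: "Central African Republic", lifeExpectancy: 18 },
  { name: "Uganda", lifeExpectancy: 62 },
  { name: "Germany", lifeExpectancy: 81 },
];

// Drop the extreme outlier that would make the axis unreadable —
// and disclose the cutoff, otherwise this is just dropping information.
const plotted = countries.filter((c) => c.lifeExpectancy >= 50);
console.log(plotted.map((c) => c.name)); // ["Uganda", "Germany"]
```

Whether this is a reasonable cleanup or a misleading omission depends entirely on whether the reader is told about it.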

Color gaze

We can get in trouble with discrete color schemes as well as with other scales. But this one is particularly easy to cheat with. Colors are very gaze-friendly: people tend to quickly scan them and move on, especially with filled shapes like the map we look at now. In the following we explore the possibilities of cheating with colors and giving a false impression by using them to some advantage.

Quick note: these color scales are easy to get wrong, I believe. Starting at 0 can make some data incomparable, and always fitting 6 colours to a scale spanning only 7 or 8 percentage points of at-risk-of-poverty rates over state populations won't make much sense either.

The following chart takes a look at the micro-census (a yearly survey updating the population statistics) in Germany. I calculated the percentage of each state's population that is at risk of poverty and mapped it onto this little state map. Again, see how the impression changes when you select different color scale presets.

See for yourself how the chart changes if you set the custom preset and move the scale start; the colors will be recalculated. When turning the scale start down to 0, the whole chart ends up in dark colours. Likewise, if we turn it the other way, the chart looks like everything is fine and a low risk of poverty is expected. Since the top end is fixed, some states will still look darker red; if we also increased the scale end, this would change too. This is just like the population chart in When to start y at 0.
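What the custom scale start does can be approximated with a simple threshold binning function. A sketch, assuming six color bins and hypothetical at-risk-of-poverty rates (not the actual micro-census numbers):

```javascript
// Build a classifier that maps a value to one of `count` color bins
// between `start` and `end` (bin 0 = lightest, bin count-1 = darkest).
function makeBins(start, end, count) {
  const step = (end - start) / count;
  return (value) => {
    if (value <= start) return 0;
    if (value >= end) return count - 1;
    return Math.min(count - 1, Math.floor((value - start) / step));
  };
}

// Hypothetical state rates in percent:
const rates = [14, 16, 19, 25, 33];

const narrow = makeBins(13, 34, 6);   // scale fitted to the data range
const fromZero = makeBins(0, 34, 6);  // scale starting at 0

console.log(rates.map(narrow));   // spread across light and dark bins
console.log(rates.map(fromZero)); // everything pushed into the darker half
```

Fitting the scale to the data spreads the states across all bins and makes them comparable; starting at 0 shifts every state into the darker half of the palette, which is the all-dark effect described above.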

Personally I'd go with something like 25 to 30, even though it's a hard call to see that 33% of unemployed people are at risk of poverty in the state of Schleswig-Holstein while it's labeled light red. For the sake of comparison I'd still choose this.

Conclusion

If we go back to the list in Bad design, we only touched on the first item, and to be honest we're not even close to having covered all the common axis tricks that might be used to suggest false conclusions. As soon as I get to it, I'll add more examples on new pages to add more perspective.