Colour in Data Visualization – Part 1: Theory
How to use colour effectively in data visualizations seems to be one of those topics that everyone knows something about, but few know enough. Although I’m probably in the same boat, over the last few years my work has required me to learn a lot more about it. I thought it would be helpful to try to summarize what I’ve found out, both for myself and others.
There are a ton of articles and tools on the web discussing all aspects of colour. The rabbit hole goes deep, all the way to the philosophy of colour and questioning whether it even exists. To make this useful and concise, I won’t go that far and as much as possible I’ll try to link to others’ writings, rather than trying to describe everything in excruciating detail here. I’ll also try to highlight the important takeaway points whenever possible, but follow the links for lots of useful info. All that said, colour is a deep and broad topic, so there’s lots of ground to cover and even the highlights will keep us busy for a while.
I’ve split this topic into two posts. This first one will cover the basic colour theory needed to approach data visualization applications. The second, upcoming post will look at how we apply this knowledge to choose colours in our visualizations.
Update: I found an article that does a wonderful job of explaining a lot of these details: A Beginner’s Guide to Colorimetry
Look up colour spaces on Wikipedia and you immediately get into subtle distinctions between colour spaces and colour models, absolute and non-absolute, gamma, gamut… Yes, the rabbit hole goes deep… Regardless, what we need to know here is that there are many ways of describing colours. The best known is RGB (typically sRGB), which uses three numbers to describe the intensity of the red, green and blue light components of a colour. This model is convenient for digital displays, since it maps directly to how the displays operate. Unfortunately it also has some drawbacks. It typically has a limited gamut (it can’t represent many of the colours we are able to see) and isn’t very intuitive for us as a way to describe colours (quick, what colour is [23, 57, 8]?). RGB can basically be thought of as a cube with each axis one of the component colours. Every colour we can represent is located somewhere in that cube.
Somewhat more intuitive is HSV/HSL (the two are similar but not identical), which replace the RGB components with hue (what colour is it?), saturation (how rich is it?) and lightness/value (how much white or black is mixed in; this isn’t quite accurate, so look at the 3D models to see how this works). These colour spaces map all the available colours to cylinders where the angle is hue, the radius is saturation and the value or lightness is the height. Sometimes you might see these spaces represented as cones or double cones coming to a point at black (and white for HSL), but really they’re cylinders, since the space defines black and white for any saturation or hue when the lightness or value is 0 or 1.
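As a quick illustration, here is a minimal sketch that answers the [23, 57, 8] question from above using Python’s standard colorsys module. The choice of Python and colorsys is my assumption for the example, not something the tools linked in this post rely on. Note that colorsys works on a 0..1 scale and returns hue as a fraction of a full turn.

```python
import colorsys

# The RGB triple from the text, rescaled from 0..255 to 0..1.
r, g, b = (channel / 255 for channel in (23, 57, 8))

# HSV: hue (fraction of a turn), saturation, value.
h, s, v = colorsys.rgb_to_hsv(r, g, b)
hue_degrees = h * 360
print(f"HSV: hue={hue_degrees:.1f} deg, saturation={s:.2f}, value={v:.2f}")

# HSL: note that colorsys orders the result as hue, lightness, saturation.
h2, l2, s2 = colorsys.rgb_to_hls(r, g, b)
print(f"HSL: hue={h2 * 360:.1f} deg, saturation={s2:.2f}, lightness={l2:.2f}")

# The cylinder point: at value 0 every hue and saturation is black.
black_a = colorsys.hsv_to_rgb(0.0, 1.0, 0.0)
black_b = colorsys.hsv_to_rgb(0.6, 0.3, 0.0)
print(black_a, black_b)
```

The hue lands between yellow (60 degrees) and green (120 degrees) with a low value, so the triple is a dark green. That is much easier to read off the HSV numbers than off the RGB ones.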
HSV/HSL improves on how understandable colour triples are compared to RGB, but it still gives us surprises when we try to use it. The problem lies in how we perceive colours, compared to what the numbers describe. Intuitively, if we increase the saturation by 0.2 (assuming a 0..1 scale), we expect the same perceived change regardless of the hue and regardless of the lightness/value. Unfortunately, this isn’t the case. Our perception of colour isn’t linear. For example, we respond much more strongly to greens than blues, so adding 0.1 to the green channel of a colour will make it look much brighter than adding 0.1 to the blue channel.
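To put a rough number on the green versus blue claim, the sketch below computes sRGB relative luminance, using the standard sRGB transfer curve and the usual luminance weights of 0.2126, 0.7152 and 0.0722. Luminance is a physical measure rather than perceived lightness, so treat this as an approximation of the effect. The grey starting colour and the function names are arbitrary choices of mine.

```python
def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel on a 0..1 scale."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Weighted sum of the linear-light channels (1.0 is white)."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


# An arbitrary mid-grey, then the same colour with 0.1 added to one channel.
base = (0.4, 0.4, 0.4)
more_green = (0.4, 0.5, 0.4)
more_blue = (0.4, 0.4, 0.5)

gain_green = relative_luminance(more_green) - relative_luminance(base)
gain_blue = relative_luminance(more_blue) - relative_luminance(base)
print(f"luminance gain from green: {gain_green:.4f}")
print(f"luminance gain from blue:  {gain_blue:.4f}")
```

Because both changes start from the same channel value, the ratio of the two gains is just the ratio of the weights, so the green step adds roughly ten times as much luminance as the blue one.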
To compensate for our perception, ‘perceptually linear’ colour spaces have been devised. There are a number of these, with names that usually begin with ‘CIE’: CIELab, CIELUV and CIELCh (also called HCL). From what I’ve been able to surmise, the ones you want to know about are CIELab and CIELUV, which both have cylindrical representations called CIELCh or HCL. If I understand this properly, CIELab and CIELUV are very similar in intent (the Wikipedia page states they couldn’t agree on just one, but this online course on colourimetry suggests LUV for displays and Lab for surfaces and dyes). All of these create a space with a greyscale luminance axis that transitions from black to white. Colours then radiate out from this axis, with the hue determined by the angle and the saturation (chroma) by the radial distance.
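To make the cylindrical picture concrete, here is a hedged sketch of converting an sRGB colour to CIELab and then to its LCh form. It assumes a D65 white point and the commonly published sRGB-to-XYZ matrix. The function names are mine, and a colour library would normally do this for you.

```python
import math

# Assumed reference white: D65, with Y normalised to 1.
D65_WHITE = (0.95047, 1.0, 1.08883)


def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel on a 0..1 scale."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def srgb_to_lab(rgb):
    """Convert an sRGB triple (0..1) to CIELab (L, a, b)."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        delta = 6 / 29
        if t > delta ** 3:
            return t ** (1 / 3)
        return t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = (f(v / n) for v, n in zip((x, y, z), D65_WHITE))
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def lab_to_lch(lab):
    """Cylindrical form: lightness, chroma (radius), hue angle in degrees."""
    lightness, a, b = lab
    return lightness, math.hypot(a, b), math.degrees(math.atan2(b, a)) % 360


for name, rgb in [("red", (1, 0, 0)), ("grey", (0.5, 0.5, 0.5)), ("white", (1, 1, 1))]:
    lightness, chroma, hue = lab_to_lch(srgb_to_lab(rgb))
    print(f"{name}: L={lightness:.1f} C={chroma:.1f} h={hue:.1f}")
```

Greys and white come out with a chroma of essentially zero, which is the greyscale axis described above, while saturated colours sit far from it. For a chroma that small the hue angle is numerical noise and carries no meaning.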
It seems that even CIELab isn’t perfect, since now there is also CIEDE2000, a colour-difference formula that is basically a bunch of corrections applied to it in an attempt to more accurately represent perceptual colour differences. An interesting sidenote about CIELab space is that it is capable of representing imaginary colours that lie totally outside the gamut of human vision. Colours that can never be seen!
Phew, so that’s a lot of technical jargon. What are the takeaways? Basically these perceptually linear spaces give us a means of understanding how we perceive colours. By examining the 3D colour spaces we can see what colours exist and how they relate. Since the spaces are perceptually linear, we can expect that the distance between colours will correspond to their perceived difference.
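The simplest way to turn “distance corresponds to perceived difference” into a number is the plain Euclidean distance between two CIELab colours, usually called the CIE76 colour difference. The sketch below uses that simple formula, not the more elaborate CIEDE2000 corrections. The Lab triples are made-up values for illustration only.

```python
import math


def delta_e_cie76(lab1, lab2):
    """CIE76 colour difference: straight-line distance in CIELab."""
    return math.dist(lab1, lab2)


# Hypothetical Lab colours (L, a, b), chosen only to show the arithmetic.
lab_a = (50.0, 20.0, -10.0)
lab_b = (50.0, 23.0, -6.0)   # same lightness, slightly different a and b
lab_c = (70.0, 20.0, -10.0)  # same a and b, noticeably lighter

print(delta_e_cie76(lab_a, lab_b))
print(delta_e_cie76(lab_a, lab_c))
```

Under this measure the second pair is four times as far apart as the first, so we would expect it to look correspondingly more different. How well that holds in practice is exactly what the later corrections try to improve.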
To help with this I created a little web app that shows the RGB colour gamut in the HCL colour space. You can drag to rotate the 3D surface around its axis and use the buttons and sliders to examine different slices through the space.
Looking at the overall shape we can notice that it comes to points at white and black and is widest in the middle. This matches our intuition that the closer we are to white or black, the less differentiable the hues are. We can also look at slices of the space for a given hue and see what luminance-chroma values are possible. Similarly we can take a slice at a given luminance and see what hue-chroma combinations are possible. Understanding what colours are available and where they lie in the space becomes a huge help when we want to create palettes to encode data values in our visualizations.
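The same slicing can be approximated in a few lines of code. The sketch below is not how the web app works; it is my own illustration, assuming the CIELab-based LCh form, a D65 white point and the commonly published XYZ-to-sRGB matrix. It checks whether a lightness, chroma and hue combination falls inside the sRGB gamut, then scans outward from the grey axis to find the largest whole-number chroma available. The scan assumes the in-gamut chromas form one unbroken run starting at zero.

```python
import math

# Assumed reference white: D65, with Y normalised to 1.
D65_WHITE = (0.95047, 1.0, 1.08883)


def lch_to_linear_srgb(lightness, chroma, hue_degrees):
    """Convert CIELab-based LCh to linear-light sRGB (may fall outside 0..1)."""
    a = chroma * math.cos(math.radians(hue_degrees))
    b = chroma * math.sin(math.radians(hue_degrees))
    fy = (lightness + 16) / 116
    fx = fy + a / 500
    fz = fy - b / 200

    def f_inv(t):
        delta = 6 / 29
        if t > delta:
            return t ** 3
        return 3 * delta ** 2 * (t - 4 / 29)

    x, y, z = (n * f_inv(t) for t, n in zip((fx, fy, fz), D65_WHITE))
    red = 3.2404542 * x - 1.5371385 * y - 0.4985314 * z
    green = -0.9692660 * x + 1.8760108 * y + 0.0415560 * z
    blue = 0.0556434 * x - 0.2040259 * y + 1.0572252 * z
    return red, green, blue


def in_srgb_gamut(lightness, chroma, hue_degrees, tolerance=1e-4):
    """True if every linear channel lies in 0..1 (within a small tolerance)."""
    channels = lch_to_linear_srgb(lightness, chroma, hue_degrees)
    return all(-tolerance <= c <= 1 + tolerance for c in channels)


def max_chroma(lightness, hue_degrees, limit=200):
    """Largest whole-number chroma still in gamut, scanning out from grey."""
    chroma = 0
    while chroma < limit and in_srgb_gamut(lightness, chroma + 1, hue_degrees):
        chroma += 1
    return chroma


# One hue slice (300 degrees, a purple-blue): available chroma per lightness.
for lightness in (10, 30, 50, 70, 90, 98):
    print(lightness, max_chroma(lightness, 300))
```

Checking the linear-light channels is enough here, because the sRGB transfer curve maps the 0..1 range onto itself. Near white the available chroma shrinks to almost nothing, which is the pointed shape described above.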
There are other things to keep in mind when working with colour. Colours have connotations:
Hot/Cold – red/blue
Stop/Slow/Go – red/yellow/green
Good/Bad – green/red
Conservative/Liberal/etc. – blue/red (in Canada)
Healthy – green
How these colours are interpreted depends on the culture of the viewer and the context the colours are shown in. For example, in the U.S., those political party colours are reversed.
Language also affects how we perceive colours. We are faster at telling colours apart if they have different names. This post discusses some examples of how these effects can show up.
Colour discrimination can be tricky. In general, the smaller the items are, the harder it will be to distinguish their colours. Be careful what you ask of your viewers! Then there are the tricks our eyes play on us. Background colours affect how we perceive foreground colours. We can interpret the same colour as completely different based on context, as seen in this illusion by Edward Adelson.
The takeaway here is to keep the context the same for all the visual comparisons you want your viewers to make. For example, show your chart legend on the same background as the graphics it refers to.
There’s also the issue of how the colours will actually be represented on the display devices your users have. I’ve seen beautiful graphics with subtle contrasts rendered invisible when projected in an office conference room. Colours can change quite a lot even moving between different monitors. In general we can’t control the environment our graphics will be viewed in, so as much as possible we want to build in a margin of safety so that even if it isn’t ideal, viewers will at least understand what we’re trying to show.
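One way to quantify a margin of safety is a contrast ratio between a foreground and a background colour. The sketch below uses the contrast ratio defined in the WCAG accessibility guidelines, which is my choice for the example rather than anything specific to data visualization. The sample colours are arbitrary.

```python
def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel on a 0..1 scale."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Weighted sum of the linear-light channels (1.0 is white)."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(rgb1, rgb2):
    """WCAG contrast ratio: 1 means identical luminance, 21 is black on white."""
    lum1, lum2 = relative_luminance(rgb1), relative_luminance(rgb2)
    lighter, darker = max(lum1, lum2), min(lum1, lum2)
    return (lighter + 0.05) / (darker + 0.05)


white = (1.0, 1.0, 1.0)
black = (0.0, 0.0, 0.0)
pale_grey = (0.9, 0.9, 0.9)  # arbitrary example of a "subtle" colour

print(f"black on white:     {contrast_ratio(black, white):.2f}")
print(f"pale grey on white: {contrast_ratio(pale_grey, white):.2f}")
```

A pair with a ratio close to 1 is the kind of subtle contrast that is likely to vanish on a washed-out projector, so it is worth checking the pairs that carry meaning in a chart.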
Finally, none of that matters if your viewers are colourblind and can’t even distinguish the colours you’ve used. In that case you’ll want to rely on differences in brightness, and to check your work with colour blindness tools that show how your graphics will look to people with these vision deficiencies.
Phew! I think that’s most of the basics. This XKCD comic puts all this theory in perspective. The next step is figuring out how to actually use colours in the visualizations we want to make, but I’ll save that for another post.