Colour in Data Visualization – Part 2: Applications
This post is the conclusion of my two-part look into the use of colour in data visualization. In the first post we covered the prerequisite theory. In this post we will use that theory to effectively select colours in a range of data visualization situations.
Attention Management and Visual Hierarchy
So now we know that colours are complicated. Colour spaces, cultural connotations, colour blindness, perceptual models… After learning all that, the first rule of colour in data visualization should probably be “don’t use colour in data visualization!” Seriously, if your graphics work in greyscale, you’re already ahead. But even then, greys are ‘colours’, and your clients are unlikely to be happy with completely grey visuals, so let’s dig in.
The first thing to do is determine the importance hierarchy of what you want to show. Not all information is equally important. Some is crucial; some is merely ancillary. So you want to direct users’ attention appropriately. With colour, we do this with luminance contrast.
Luminance contrast is the difference in brightness (luminance) between an area and its background. The eye will naturally be drawn to areas with high contrast, so your most important information should have the highest contrast. An additional benefit of structuring your colour choices this way is that your graphics will continue to mostly work when rendered in greyscale. The Color Usage page does a great job of explaining luminance contrast with lots of helpful images. (The rest of the pages on that site are also great and definitely worth reading!) Also keep in mind that the smaller an item is, the more contrast it needs to be legible. Tiny grey text on a white background will disappear while black text the same size might still be noticeable.
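If you want to put a number on luminance contrast, one common option is the WCAG contrast ratio. That formula is my assumption here (nothing above depends on it), and the function names are just illustrative. A minimal sketch:

```python
def srgb_to_linear(channel):
    """Undo the sRGB transfer curve for one channel given in 0..1."""
    if channel <= 0.04045:
        return channel / 12.92
    return ((channel + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance (0 = black, 1 = white) of an 8-bit sRGB triple."""
    r, g, b = (srgb_to_linear(v / 255) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """WCAG-style contrast ratio, from 1 (identical) up to 21 (black on white)."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


# Compare candidate foreground colours against a white background.
white = (255, 255, 255)
for name, colour in [("black", (0, 0, 0)), ("mid grey", (150, 150, 150))]:
    print(name, round(contrast_ratio(colour, white), 2))
```

Ranking your layers by a ratio like this is a quick sanity check that the most important marks really do carry the most contrast.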
A nice example I found recently showing the use of luminance contrast to direct attention is the trade data (times and prices) on the cryptowat.ch bitcoin market website. Looking at the order book, notice how different luminances are used for price digits before and after the decimal point: the digits are bright only when they have changed from the next most inside price, which draws your attention there. Similarly, the order times use brightness to highlight when the seconds, minutes and hours change from the previous order. In this way the eye is naturally drawn to more important information showing differences and ignores the repeated digits that don’t add value.
Thinking about your information hierarchy in terms of luminance contrast has some interesting effects on how we choose colours. Most visualizations have a hierarchy similar to the following:
- Annotations, selections and highlights
- Data representations (bars, lines, etc.)
- Axes & Guides
Once we allocate some contrast to each of those layers, we quickly realize that each layer might only get a small range of luminance values to play with. Let’s look at each of these layers in order.
For starters we need to choose a background colour. Leaving aside hue, our basic choices are white, black or grey. White and black maximize the available contrast ratios to work with, while grey cuts the range into two separate ratio ranges. Black on white (or white on black) will always stand out more than any colour on a grey background. That said, you may still want to tone down the contrast a bit by darkening a white background or lightening a black one. Just know that the available contrast ratios will be shrinking. Middle grey is a problematic choice, but still workable by using both white and pastels, and black and dark hues, as foreground colours.
The background colour also helps set the tone for your work, so emotional connotations factor in. White tends to feel stark, analytical or clinical, while black can imply passion or emotional richness. These are poor cover words for the felt experience of the colours, but at least point in the right direction. Consider how the visualization The Fallen of WWII would feel with a white background or how GapMinder would feel with a black one. Neither would work nearly as well.
One final word about blacks and greys. It can sometimes help to blend in a little colour to your greys to add a little feeling to them. Red for warmth or blue for coolness, etc. Whatever your visuals need to get the emotional tone right.
Axes and guides are the marks that orient us on the page. They also tend to be the ‘chart junk’ that Edward Tufte has so famously railed against. They are necessary, but once we understand what we are looking at, we want them to get out of the way. In terms of luminance contrast, this means that they should be fairly close in value to the background: legible but not distracting.
The data representations (visuals) are where most articles on colour in data vis tend to start, jumping right into topics like gradients, categorical colours, etc. We’ll get to all that, but for now, just realize that in order to work well with the rest of the application/visualization, you may only have a limited range of luminance values to work with.
If your labels can appear on top of the visuals, they will need additional luminance contrast in order to be legible. If the labels only appear alongside the visuals, then their luminance can be similar to the visuals they are labelling.
Finally there is the story you are trying to tell and the specific items you are interested in. This information is often highlighted with additional luminance contrast. So the selection state is brighter/darker or the story items need to stand out from the surrounding context.
The key takeaway from all this is that by the time you’ve divvied up the available luminance values into the various layers, each layer might not have much range to play with. For some layers this is fine: e.g. make all the axes and guides a single grey colour that is legible, but close to the background colour. But remember that not all colours exist at all luminance levels, so sometimes this might start to feel constraining.
With all that out of the way, let’s (finally!) look at how we can select colours for various data visualization situations.
Below I’m going to go through all the considerations that go into creating a colour palette to use in a visualization. It’s worth mentioning that there are a number of good premade palettes out there already. Cynthia Brewer’s ColorBrewer palettes have been the de facto standard for years. Many visualization libraries include a number of premade palettes (likely including some of the ColorBrewer ones). Often these are well-designed and can work well in a variety of situations and in a hurry. Still it’s worth knowing what to look for and the basic strategies for creating colour palettes for when you need to.
Colour palettes split into different types based on the data they are being used to display. The first question to ask is whether the data is ordered. If ordered, then we can convey that order in the sequence of colours. If the data is unordered (called categorical or nominal) then we want all the categories to attract roughly equal attention.
Categorical Colour Palettes
For a small number of categories creating a categorical palette is easy. Three, four, even five colours is no problem and even less-than-ideal colour choices will usually work. However, once you start needing more colours, things can become tricky, so it’s worth knowing what to look for.
Criteria for a good categorical colour palette are:
1. We want colours that are as visually distinct as possible so that we can easily tell them apart.
2. We want colours that do not have any sort of natural ordering to them. We are sorting by type, not quantity, and don’t want to be misled.
3. We want colours that all equally attract our attention.
We can measure colour differences by the distance between two colours in a perceptually linear colour space. Ideally we want them as spread out (visually distinct) as possible.
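As a rough sketch of that measurement, here is a Euclidean distance in CIELAB (the CIE76 delta E). Using CIELAB with a D65 white point is my assumption: it is only approximately perceptually uniform, and the helper names are made up for illustration.

```python
import math
from itertools import combinations

D65_WHITE = (0.95047, 1.0, 1.08883)  # reference white, assumed


def _linear(channel):
    """Undo the sRGB transfer curve for one channel in 0..1."""
    if channel <= 0.04045:
        return channel / 12.92
    return ((channel + 0.055) / 1.055) ** 2.4


def _f(t):
    """CIELAB companding function."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIELAB (L, a, b)."""
    r, g, b = (_linear(v / 255) for v in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = (_f(v / n) for v, n in zip((x, y, z), D65_WHITE))
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def delta_e(rgb1, rgb2):
    """CIE76 colour difference: straight-line distance in CIELAB."""
    return math.dist(srgb_to_lab(rgb1), srgb_to_lab(rgb2))


def min_pairwise_distance(palette):
    """Smallest distance between any two palette colours (bigger is better)."""
    return min(delta_e(p, q) for p, q in combinations(palette, 2))
```

The smallest pairwise distance is a handy single score for a candidate palette: the pair that is hardest to tell apart is the one that will cause trouble.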
Colours attracting equal amounts of attention implies them having the same luminance contrast from the background.
We’re probably also going to want colours with high chroma (saturation) since low saturation colours all fade to grey and become hard to tell apart.
When we take all these constraints together, what we’re left with is a palette of colours evenly spread out in hue with equal luminance and high chroma.
These three colours seem to go well together.
Adding a bright red screws up the palette: it’s much too saturated compared to the other colours and so demands too much attention.
The first and second colours are too close in hue to be easily distinguishable and much too different from the third hue.
Even though these colours are well separated in hue, they lack sufficient saturation (chroma) to be easily distinguishable.
Imagining the HCL colour space, these criteria translate into taking a horizontal slice at a given luminance (or small range of luminances) and then choosing colours within this disk. Since we want highly saturated colours, we’ll probably want to choose colours around the edges of the disk. Since we want the hues as different as possible, we want them in equal arc angle increments around the full disk. In the HCL viewer you can see this by choosing the HC mode and then adjusting the luminance slider to see slices of the space at varying brightnesses.
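The "equal slices around the disk" idea can be sketched in a few lines. I am assuming the CIELAB-based polar form (LCh) with a D65 white point as the HCL space; the default luminance and chroma values are purely illustrative, and colours that fall outside the sRGB gamut are simply clipped here.

```python
import math

D65_WHITE = (0.95047, 1.0, 1.08883)  # reference white, assumed


def _f_inv(t):
    """Inverse of the CIELAB companding function."""
    delta = 6 / 29
    return t ** 3 if t > delta else 3 * delta ** 2 * (t - 4 / 29)


def _encode(v):
    """Apply the sRGB transfer curve; negative inputs stay negative."""
    return 12.92 * v if v <= 0.0031308 else 1.055 * v ** (1 / 2.4) - 0.055


def lch_to_srgb(lum, chroma, hue_deg):
    """LCh to sRGB floats; results outside 0..1 are out of gamut."""
    a = chroma * math.cos(math.radians(hue_deg))
    b = chroma * math.sin(math.radians(hue_deg))
    fy = (lum + 16) / 116
    fx, fz = fy + a / 500, fy - b / 200
    x, y, z = (n * _f_inv(t) for n, t in zip(D65_WHITE, (fx, fy, fz)))
    linear = (
        3.2404542 * x - 1.5371385 * y - 0.4985314 * z,
        -0.9692660 * x + 1.8760108 * y + 0.0415560 * z,
        0.0556434 * x - 0.2040259 * y + 1.0572252 * z,
    )
    return tuple(_encode(v) for v in linear)


def to_hex(rgb):
    """Clip to the displayable range and format as #rrggbb."""
    return "#{:02x}{:02x}{:02x}".format(
        *(round(min(1.0, max(0.0, v)) * 255) for v in rgb)
    )


def equal_luminance_palette(n, lum=70, chroma=40, start_hue=0):
    """n hues in equal arc increments at one luminance and chroma."""
    return [to_hex(lch_to_srgb(lum, chroma, start_hue + 360 * i / n))
            for i in range(n)]


print(equal_luminance_palette(5))
```

Clipping changes the colour slightly, so if you push the chroma up it is worth checking which hues still fit in gamut at your chosen luminance.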
This approach is good for a limited number of categories. At the same chroma and luminance we can distinguish up to about 7-9 colours. Five colours work great; we can even push it to nine; after that it becomes difficult to distinguish the hues. At that point we have to vary the luminance more. What often ends up happening is that we can have sets of colours such as pastels, bright rich colours, dark rich colours, with a limited number of colours in each. We sacrifice the equal luminance contrast from the background and risk misleading viewers into believing there are multiple levels to our data hierarchy (instead of a flat set of categories), but gain a larger number of differentiable colours. This is the approach taken in, for example, D3’s category 20 colour palette.
Finally, there’s what to do when you don’t know in advance how many categories there are. This often happens when creating visualization software that can be used with arbitrary datasets. In this case there’s a little trick I’ve devised that seems to work well. Basically we choose hues by rotating around the colour space in increments of the golden angle. This ensures that no two hues will end up exactly the same and every new hue will tend to be spaced far apart from the existing ones. Still, after about 9 or 10 hues, it becomes hard to differentiate the hues regardless of how well spaced out they are, so you also need to vary the chroma and luminance. To do this you can choose a range of reasonable values and advance through it using a sine wave or step functions that create sets of bolds, pastels, etc. However, no matter how ingenious you are, after a certain point, you will just have too many colours. At that point you need to look at other representations for your data (shape, icons, etc.).
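Here is a minimal sketch of the golden-angle trick. The hue step is the golden angle itself; the sine and cosine wobble on luminance and chroma, and every default value, are hypothetical choices for illustration rather than tuned numbers. The function returns (hue, chroma, luminance) triples that you would then convert to RGB with whatever HCL conversion you use.

```python
import math

# The golden angle in degrees, roughly 137.5.
GOLDEN_ANGLE = 180 * (3 - math.sqrt(5))


def nth_category_colour(index, lum_mid=65, lum_amp=15,
                        chroma_mid=50, chroma_amp=15, period=9):
    """Return (hue, chroma, luminance) for the index-th category.

    The hue advances by the golden angle each step. Luminance and chroma
    drift along a slow wave with the given period so that, once the hues
    start crowding, neighbouring hues still differ in brightness.
    """
    hue = (index * GOLDEN_ANGLE) % 360
    phase = 2 * math.pi * index / period
    lum = lum_mid + lum_amp * math.sin(phase)
    chroma = chroma_mid + chroma_amp * math.cos(phase)
    return hue, chroma, lum


# Works for any number of categories without knowing the count up front.
for i in range(12):
    h, c, l = nth_category_colour(i)
    print(f"{i:2d}  hue={h:6.1f}  chroma={c:5.1f}  lum={l:5.1f}")
```

Because each colour depends only on its index, adding a new category never changes the colours already assigned to the existing ones.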
Finally, there are a number of tools out there that can help you create categorical colour palettes. One that I like is I Want Hue (but use the force-directed setting, rather than k-means, since the whole point is to have colours as distinct as possible). Another tool is Categorical, but play around with the sliders to make it generate a pleasing palette.
Quantitative Colour Palettes
If your data is quantitative (as opposed to categorical), then the next question to ask is whether it’s sequential or diverging. With diverging data we are interested in the difference between the values and some specified transition point. With sequential data we are only interested in the relative sizes of the values. Examples of diverging data are temperatures (degrees above/below freezing) and financial market performance (positive/negative returns). Examples of sequential data are population density or test marks.
For sequential data we want a list of colours with a naturally perceived order where the perceived difference between the colours reflects the actual differences in the values represented. For example, assuming a linear scale, we want a colour that appears half-way between two other colours to represent a value that is halfway between the values of those other colours.
The best way to achieve this result is to use a colour scale ordered in luminance. The most obvious way to do this is to use a greyscale ramp from black to white. Similarly a simple blend from a pure colour to white or to black would work. In the Colorpicker for data tool, this would be a straight line in the C-L mode. The key point is that the luminance of the colours should increase (roughly) linearly with the represented value.
To get the maximum possible range in luminances we need to go from black to white. Often, though, we want the colour scale to be, well, colourful. We can do this with a single (constant) hue by varying both the chroma and luminance: e.g. from black to dark red to bright red to pink to white. We can also use multiple hues, blending from one colour to another as we also increase the luminance. Typically this means using blues (saturated at low luminance) at the dark end of the scale and/or yellows (saturated at high luminance) towards the top (e.g. this example). Other common colour combinations are green to yellow, orange to yellow or blue to green.
Colours also have emotional connotations, so a palette that goes from blue to yellow will likely have a cold-warm connotation. Similarly green can mean positive, red negative, etc. The most appropriate range of hues to use might have little to do with the range of data we have to display.
In terms of specific methods for generating palettes there are a few interesting approaches I’ve found. Wijffelaars et al. suggest using Bézier curves to trace paths through the HCL colour space. Interestingly, they use the ColorBrewer palettes as benchmarks and devise their own luminance function to choose colours along the curve that more closely match them. Their final tweak is to twist the curve around the luminance axis, so it starts more blue and ends more yellow, again duplicating the ColorBrewer palettes. Overall, this was one of the best thought out approaches I’ve found, but unfortunately their implementation seems to no longer be available. Implementing their Bézier curve math probably isn’t necessary, though, since other approaches have ready implementations.
Cube Helix is an approach that creates a palette of monotonically increasing luminance by carefully spiralling through the RGB colour cube. It has a number of parameters that can be used to generate a range of palettes that vary in colour and intensity range. Best of all, there are numerous implementations of this approach in many languages and libraries. Finally, Chroma.js has added some nice tools for creating multi-hued colour ramps that let you specify the colours you want to use and then apply corrections to account for colour discontinuities and luminance levels.
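To give a feel for how little code Cube Helix needs, here is a sketch based on Dave Green's published formula as I understand it. The parameter names and defaults are my own labels for his start colour, number of rotations, hue (saturation) and gamma settings, so treat them as assumptions and check them against whichever library implementation you end up using.

```python
import math


def cubehelix(n, start=0.5, rotations=-1.5, hue=1.0, gamma=1.0):
    """Return n (r, g, b) float triples along a cubehelix ramp (n >= 2).

    Values can stray slightly outside 0..1 for strong hue settings,
    so clip them before display.
    """
    colours = []
    for i in range(n):
        fract = i / (n - 1)
        angle = 2 * math.pi * (start / 3 + 1 + rotations * fract)
        fract = fract ** gamma
        # Spiral radius: zero at both ends, largest in the middle.
        amp = hue * fract * (1 - fract) / 2
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        r = fract + amp * (-0.14861 * cos_a + 1.78277 * sin_a)
        g = fract + amp * (-0.29227 * cos_a - 0.90649 * sin_a)
        b = fract + amp * (1.97294 * cos_a)
        colours.append((r, g, b))
    return colours


def perceived_intensity(rgb):
    """Weighted brightness used by the scheme (0.30 R + 0.59 G + 0.11 B)."""
    r, g, b = rgb
    return 0.30 * r + 0.59 * g + 0.11 * b


def clip(rgb):
    """Clamp each channel to the displayable 0..1 range."""
    return tuple(min(1.0, max(0.0, v)) for v in rgb)


for colour in cubehelix(8):
    print([round(v, 3) for v in clip(colour)],
          round(perceived_intensity(colour), 3))
```

The constants are chosen so the hue spiral adds nothing to the weighted brightness, which is why the printed intensity climbs in even steps from 0 to 1 even while the hue rotates.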
Related to the sequential colour ramps are diverging ramps. Here again we have quantitative data (as opposed to categorical), but there is also an important transition point. In many cases this is zero. In financial scenarios we care whether a stock or portfolio has increased or decreased in value. With heat we care if the value is above or below freezing. The amount the portfolio or temperature is above/below this value does matter, but it’s secondary to the direction. In these cases we want to use a diverging colour scale. In a sense we have a quantitative scale superimposed over a categorical scale of degree 2.
To make this work we typically choose two appropriate colours (e.g. red and green for up/down in the financial markets or red/blue for hot/cold temperatures) and then blend them with a desaturated middle point (somewhere on the black-grey-white range). The luminance of the middle point should tie in with the colours on either side.
In terms of the HCL colour space, what we’re doing is selecting two hues and tracing a path through their chroma x luminance planes that meet at a common point. That common point is typically a fully desaturated grey value. The Escaping RGBLand paper gives some nice examples of how this looks in section 4.3. In general for any two values equally far from the midpoint value on both sides we would prefer to have different hues but the same chroma and luminance values. In other words, the two colours should be equally colourful and bright.
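The symmetry requirement is easy to express directly in hue, chroma and luminance terms. In this sketch the two hue angles, the chroma limit and the luminance range are hypothetical values picked only for illustration; the output is (hue, chroma, luminance) triples that still need converting to RGB.

```python
def diverging_colour(t, hue_neg=260, hue_pos=30,
                     chroma_max=60, lum_mid=95, lum_end=40):
    """Map t in -1..1 to a (hue, chroma, luminance) triple.

    t = 0 is the neutral midpoint (zero chroma, so the hue is irrelevant).
    Moving away from 0 in either direction raises chroma and shifts
    luminance by the same amount, so +t and -t differ only in hue.
    """
    t = max(-1.0, min(1.0, t))
    distance = abs(t)
    hue = hue_neg if t < 0 else hue_pos
    chroma = chroma_max * distance
    lum = lum_mid + (lum_end - lum_mid) * distance
    return hue, chroma, lum


def diverging_ramp(steps_per_side=4):
    """Evenly spaced ramp from t = -1 through 0 to t = +1."""
    n = steps_per_side
    return [diverging_colour(i / n) for i in range(-n, n + 1)]


for colour in diverging_ramp():
    print(colour)
```

Dropping the middle entry from the ramp gives the discontinuous variant, where values on either side of the transition point never share a colour.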
Note that we don’t necessarily need to have a completely desaturated midpoint. It’s possible to twist the paths of both lines to meet at an arbitrary colour such as a light yellow instead of a light grey. This may help to warm up the overall palette, but care must be taken not to introduce any confusion. It is also possible to not have the palettes meet at a common midpoint and instead omit this point and have a discontinuity in the ramps. This will tend to emphasize which side of the midpoint a value lies on at the expense of not showing as clearly how far from the midpoint it is.
More generally we can also combine the palettes we’ve covered so far, to convey additional levels of information. However, once you start getting this fancy you should be very careful that the resulting visuals easily convey what you want them to. Cynthia Brewer has created a wonderful little diagram showing how some colour schemes can relate to each other. It’s worth checking out the image on her site where each combination links to a page discussing that type.
Categorical x Quantitative
One combination palette is to take a categorical palette and then vary the intensity of the colours to convey a second dimension within the categories. For a small number of hues vary the luminance to convey this secondary value, while adjusting chroma so that the colours stay within gamut. The trick is to avoid letting the ramps get too desaturated so the hues all remain distinguishable.
A nice use for this type of colour scheme is when you want to emphasize transitions along a quantitative scale. Say the underlying values are scalar (e.g. 0..1), but you have well defined ranges such as 0..0.2 is GOOD, 0.2..0.7 is OK and 0.7..1 is BAD. In this case you can use a linearly increasing luminance while changing the hue in steps. Why-HCL has a good example of this at the bottom of the page.
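For the GOOD/OK/BAD example, a stepped-hue ramp might look like the sketch below. The band boundaries come from the example above; the hue angles, chroma and luminance range are hypothetical, and I have arbitrarily assigned a value sitting exactly on a boundary to the higher band.

```python
from bisect import bisect_right

BAND_BOUNDS = (0.2, 0.7)        # GOOD | OK | BAD boundaries from the example
BAND_HUES = (130, 80, 20)       # hypothetical green, yellow, red hue angles


def banded_colour(value, lum_min=30, lum_max=90, chroma=50):
    """Map a value in 0..1 to (hue, chroma, luminance).

    Luminance rises linearly with the value, while the hue jumps
    whenever the value crosses a band boundary.
    """
    value = min(1.0, max(0.0, value))
    hue = BAND_HUES[bisect_right(BAND_BOUNDS, value)]
    lum = lum_min + (lum_max - lum_min) * value
    return hue, chroma, lum


for v in (0.0, 0.1, 0.3, 0.6, 0.8, 1.0):
    print(v, banded_colour(v))
```

The luminance still carries the quantity, so the scale reads correctly in greyscale, while the hue jumps make the band a value falls in obvious at a glance.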
Quantitative x Quantitative
Similarly we can also combine two quantitative ramps that have sufficiently different hues. The Bivariate Choropleth Maps article does a good job showing how this is done. Basically the trick here is to blend the two ramps in such a way that the combination values are easily interpretable. I’ll leave it up to you to decide how well the examples in that article satisfy this requirement.
As I said, at this point you’re really pushing the limits of what can be easily perceived and interpreted by your viewers. If they need to continually refer back to the legend to figure out what things mean, you’ve gone too far and should find other representations that work better.
I think that’s all I wanted to say about using colour in data visualizations. We covered some basic theory and then looked at how to apply that theory in a number of scenarios. Hopefully this is sufficient information to guide you (and me!) in whatever situation you face. Please let me know if there are any important points I missed, or other good examples/tools worth mentioning. As I said before, this seemingly simple topic is actually quite deep and the internet is full of interesting work on it.