All right, thanks, good afternoon, and I appreciate you coming back from the incredible exhibits to hear, those of you who did, those of you who didn't, I will see you later on the video. So today I'm gonna talk super briefly about this idea of encoding and decoding visualizations. Jennifer asked me to actually sort of directly address this topic of how people both read and create visualizations. So, let's say you're looking at visualizations. Here's a screen full of 20 of them. Many of which I was involved in making, some of which some of my co-authors were involved in making. So here's 20 different screens, how can we systematically think about what in heck is going on here? If we just look at pixels alone, our first response is, "um, well, they're all different," and yes they are, but how can we get to the next level of analysis past that? How can we think about what is in common between them, even though the pixels are different? What were some of the design decisions made? How can we actually reason about the process of constructing visualizations? And so what we want is some sort of systematic way to actually think about them. And so let's go, I'm gonna steal a phrase from Ron Rensink, and talk about some of the fruit flies of visualization; namely, bar charts and scatter plots, and say okay so what is actually going on here? How could we analyze these in a somewhat systematic way? And so, it's really useful to have a vocabulary of thinking about these in terms of marks and channels, where a mark is some flavor of geometric primitive, it could be a point in zero dimensions, or a line in one dimension, or areas in two dimensions, or even volumes in three dimensions, so think of these as a geometric primitive that can stand for some item of data, and then the way we're communicating information is to have some idea of controlling its visual appearance to then communicate something. 
We could do that with spatial position, horizontal or vertical or both put together, it could be color, it could be shape, it could be the amount of tilt, it could be size coding by length or area or volume, and so the question is well, how would we make decisions? How do we actually do this? How do we use this to think? And so, for one thing, going back to these fruit flies, we can think all right, what is happening? We have heard of the word bar chart and scatter plot, what's happening? Well here we are vertically coding with spatial position and our mark is a line. If we go to two dimensions, where we've actually got vertical and horizontal separately, we've switched to a point mark. If we say what else could we-- oh, ah we could use color to color code things, we could size code things with area, we still have these point marks. So this gives us a way to deconstruct and analyze, of course I picked simple examples, we could get much much more complex, but a natural question to ask then is well, how do you choose? How do you pick which of these to do? And it turns out that a lot of people have spent a long time doing a whole lot of experimental work to try to actually get us closer to answers to that question. And so we wanna take into account some things like well, what are the characteristics of your data? At a really basic level, one of the things that matters a lot is what kind of, I'm gonna call these attributes, some people call them variables, or fields, or records, any number of things. So, is this field of data, is this attribute, is it categorical? Is it this thing versus that thing? Or has it somehow got an implicit ordering? Particularly is it quantitative? Things like the size of these floorboards, can you actually take a length and another length and subtract them and do full-on arithmetic with them? And so, the characteristics of the data actually matter quite a bit. It could even matter things like are you going from min to max sequentially? 
Or is there semantically a zero point and you go down from that and up from that? Or does it actually spin around cyclically so it comes back to where it came from? Now, data is super important but it's not the only thing that's important. What about things like the task? What is a person trying to understand from that data? And so, the task matters just as much as the data matters. The human perceptual system has a lot of characteristics, some of them are convenient, some of them are deeply inconvenient and they're not what you would like, so the human eye is, in fact, not a camera. The human brain is not a hard disk. And we have to deal with some of those realities. So, there has been a lot of work in trying to characterize, for a lot of these different channels, well, how could we think about using them to visually encode data? One of the most crucial things gets back to these characteristics of data that I just talked about, which is some of these channels really are intrinsically perceived as things that communicate magnitude. Think of these as the how much ones. So you can say things like how much bigger is this than that? How much curvier is this than that? And so there's a lot of quantitative information that is well-carried by these magnitude how much channels. And conversely, there's some channels that really come down to identity. What is this? Is it in this region or that region? Is it this color or that color? What is the motion? What is the shape? And so those are a good match for the categorical data. So this idea of getting your expressivity right where you match it so that you don't either communicate what's not there, or throw information on the floor and fail to encode it, is one of the really fundamental principles that's good to keep in mind. It's remarkable, sadly, how often people don't keep this in mind, so it's one of the pitfalls that is in some sense one of the easier ones to think about. Another one is they're not all created equal. 
Some of them are more able to be perceived accurately than others. Now, it's true that accuracy of perception's not the only factor. There's also things like whether or not you're actually engaging people, but it is a good place to start to think about that. The order I picked for this list of channels was not arbitrary, I actually deliberately put these in order based on my own interpretation of the current literature about how able we are to accurately perceive these. This is still a subject of active study, but we at least do know some things based on the last, actually not just decades, but even over a hundred years of work on the psychophysics of how actually human perception works. And another really important one is distinguishability. Because they're also not all created equal in terms of how many bins are perceivable. So it matters that you get your impedance match: if you need to communicate 37 levels in your data, what if your perceptual channel can really only tell apart four of them? Well, then, you don't have such a great matching. As opposed to sometimes if you wanna show 10 things and this channel has got about at least 10 bins, then you're in really great shape. So you do wanna consider all of these different factors that are involved in trying to decide well, which of these channels should I pick? Now, if we have lots of time, we would talk at length about a lot of different design choices for visualization. About how do you spatially position the data? How do you use all these other variables? How do you actually interact with it? But we are not going to do that. I'm instead going to pick one topic, which is a fun one, which is color, and talk about it just a little bit more in the context of this idea of marks and channels. So what's going on with color? Well let's see. On the top, we've got something where we sort of clearly have four different color codings. They happen to be years. 
But then on the bottom, we're actually looking at something that somehow seems to communicate order a bit more and it's really emphasizing well these years had a sequence. And just to really hit you over the head with it, the picture on the right is one where we got a choropleth map, where we're actually taking geographic regions and color coding them, and then you can see that there's a bar chart, and we are redundantly coding with this sort of shades of green as well as with horizontal spatial position. So, what's going on? I stole this slide from Maureen Stone, this is part of a great slide deck from her that is talking about what's happening with color. So I just told you that wait, it really matters if it's ordered or categorical and here I'm showing you something where it can't seem to make up its mind. So what is going on here? So with decomposing color, the first rule of color is don't use the word color. Don't talk about color. It's confusing if you treat it as a monolithic thing. So what can you talk about? Well I'm not gonna make you get in a fight, we are gonna decompose color into three channels. And two of them are ordered channels that show magnitude, and that is luminance, think of it as sort of a grayscale thing, about gray in between white and black, how dark is something? And there's also another one of these ordered channels, saturation, which is sort of light pink is in between gray and hot pink. And the thing you probably think about as color, colloquially, like what color shirt are you wearing? Right, you're wearing a blue shirt, and you're wearing a red shirt, these are hue. So hue is a very categorical kind of thing, where you tend to actually do a great job of perceiving that things are different from each other without necessarily having an implicit ordering. So, that's great, I've given you a way to think, and now my caveat is this is a good guide to thinking, but it is not well supported by current tools. 
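As a minimal sketch of the three-channel decomposition just described (this code was not part of the talk), the Python below splits an sRGB color into hue and saturation with the standard library's `colorsys` module and computes relative luminance separately, using the sRGB transfer curve and the Rec. 709 weights. The function names `srgb_to_linear`, `relative_luminance`, and `decompose` are illustrative assumptions, and relative luminance is only one approximation of perceived brightness.

```python
import colorsys


def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an sRGB color, using Rec. 709 weights."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def decompose(rgb):
    """Split an sRGB triple (floats in [0, 1]) into the three channels."""
    h, l, s = colorsys.rgb_to_hls(*rgb)  # note the order: hue, lightness, saturation
    return {
        "hue_deg": round(h * 360, 1),       # identity channel
        "saturation": round(s, 3),          # ordered channel
        "hls_lightness": round(l, 3),       # what many color pickers report
        "luminance": round(relative_luminance(rgb), 4),  # ordered channel
    }


# Pure blue and pure yellow: same HLS lightness, very different luminance.
for name, rgb in [("blue", (0.0, 0.0, 1.0)), ("yellow", (1.0, 1.0, 0.0))]:
    print(name, decompose(rgb))
```

Both colors report the same HLS lightness and full saturation, yet yellow's relative luminance is far higher than blue's, which is the gap between what a typical color picker reports and how bright the color actually looks.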
A lot of tools will show you hue and saturation, they don't typically show you true luminance in terms of how bright something is, the human eye responds to different wavelengths of light differently, so this is not something that is super well supported by common tools. But it is definitely a guide to thinking at a visualization level. So, going back and thinking about these now as channels with properties, some of those questions I asked, like both what do they convey in terms of ordering versus not ordering and how much can you encode? How many discriminable bins can we use? Spoiler, never as many as you want. So, let's understand a little bit about why. So what's going on with categorical color? So if you have the idea that the human visual system is perfect, that's not quite right. The human visual system is not so well designed to do absolute things, but it is utterly glorious at doing relative comparisons, and to keep us alive in the world it has served us well for a very long time. So, if two things that are color coded are right next to each other, we are really great at super fine grain discrimination. So let's take this example of somebody color coding the genes in a mouse. We can tell the difference between really subtle shades of green that are right next to each other. So, subtle, but we can definitely tell what's going on, even on any projector, including this one. So, what if, though, you have small regions that are not right next to each other? So we could sort of start counting, well, we got that tan stuff, and then okay well, maybe brown, I think there's some green, there's some red, oh, there's another green, wait, was that the same as the first green? Uh, well, in any case there's a purple, okay, here's a cyan, looks like maybe there's a royal blue, and I'm starting to not be able to tell which of these colors, because they are small regions that are not contiguous to each other. 
In fact, this is showing how the various genes on the mouse have migrated in the human chromosome. This is something biologists like to do a lot of. And they really wish they could color code 22 things. Unfortunately, we're not very good at that. We are surprisingly bad at that. And so if you've got these non-contiguous small regions of color, it's always fewer bins than you would really want, everyone is super tempted to color code more than they can really get away with. A good rule of thumb is it's pretty much safe to do six to 12 in many cases, but remember that includes your background color, it includes your default color, maybe a highlight color. So, well what else could we do? If we had more time we'd talk about the plethora of other alternatives, color coding is not your only choice. There's a lot of other ways to visually encode data. But what I wanna do is move on and just talk for another moment about ordered color. The other big category. People are very tempted to use rainbows, in part because, I think, of the physics. We have many demos on the show floor there that show that light through a prism makes a rainbow, and it's physics and it must be right. But physics and perception are not, in fact, the same, and there's this trickiness about rainbows, which is for one thing as I mentioned, they are not intrinsically perceptually ordered. If I locked Jennifer in a room and I gave her four paint chips of green and red and purple and orange and I said "put them in the same order," and then I locked Steve in another room and I gave him the same paint chips and I took their phones away, would they come in the same order? I wouldn't bet a million dollars, unless they colluded in advance. But if I gave them these two shades of gray and a white and a black, I would bet a lot of money on that. And so there's this issue of the ordering, and there's also a really unfortunate nonlinearity. 
If I grab a little box around sort of orange to yellow to green, I can maybe see three different colors. Same size box in green, I can see green and green and green. So, the way we respond to these different parts of the spectrum is not actually linear. Now, sometimes rainbows are interesting. You can actually name things. I can talk about the red regions and the yellow regions and the blue regions in this particular simulation of fluid flow, but there's some things that are not so great. So, one thing you could do is say well maybe you don't wanna try rainbow if you wanna emphasize large-scale structure, there's actually some subtlety you get in here that you're not getting so well with these much more small regions of color. So fewer hues, fewer nameable regions. This can be particularly tricky, if you haven't seen this one before it might not be super obvious what I'm showing, besides a big blobby thing. A really well chosen color map, though, might actually show you things like Florida, if that's a good thing to see or not, and by carefully segmenting and actually having a really visible demarcation, you can see a lot of the structure, and carefully chosen going down into the sea, going up into the brighter ones is giving you something. So what if, of course, you wanna have your cake and eat it too and you wanna have it both ways? A nice way to have both nameability and order is actually to think about the nameability from the hue, and if you want an order you can think about monotonically increasing luminance from one to the other. There's actually some nice tools out there now. The Viridis and Magma color maps, those are the two ones on the very bottom, do quite a nice job of trying to maximize nameability and have this increasing luminance order, especially as compared to you can see on the right some of these others that definitely are not doing monotonically increasing luminance. So, lots more reading, if you're interested in that. 
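One way to test the "monotonically increasing luminance" property on a color ramp is sketched below in Python; it was not shown in the talk. The five hex values are approximate samples along Viridis, included here as an assumption rather than the official color map data, and the "rainbow" ramp is simply the six pure hues in spectrum order. The helper names are hypothetical.

```python
def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an sRGB color, using Rec. 709 weights."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to a tuple of floats in [0, 1]."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) / 255 for i in (0, 2, 4))


def is_monotonic_increasing(values):
    """True if every value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


# Approximate samples along Viridis, dark end to light end (assumed values).
viridis_like = ["#440154", "#3b528b", "#21918c", "#5ec962", "#fde725"]
# Pure hues in spectrum order: red, yellow, green, cyan, blue, magenta.
rainbow = ["#ff0000", "#ffff00", "#00ff00", "#00ffff", "#0000ff", "#ff00ff"]

viridis_lum = [relative_luminance(hex_to_rgb(c)) for c in viridis_like]
rainbow_lum = [relative_luminance(hex_to_rgb(c)) for c in rainbow]

print("viridis-like monotonic:", is_monotonic_increasing(viridis_lum))
print("rainbow monotonic:", is_monotonic_increasing(rainbow_lum))
```

The Viridis-like samples get steadily lighter from one end to the other, while the rainbow jumps from a dark red to a very light yellow and later drops to a dark blue, so its luminance carries no consistent order.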
These slides are actually posted online if you wanna have more resources to read. Some of this was, in fact, from a chapter in my book, which is another reference you might be interested in. I actually didn't talk at all about the research my own group does. There are many many many many papers and videos and open source software and talks and, in fact, full courses available online if you're interested. And those talk slides are down there. So, thanks very much for your post-lunch attention.
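The two matching rules from earlier in the talk, expressiveness (magnitude channels for ordered data, identity channels for categorical data) and distinguishability (no more levels than the channel has perceivable bins), can be sketched as a small checker. This Python is an illustration only and was not part of the talk: the channel sets follow the examples the talk names, the function `check_encoding` is hypothetical, and `usable_bins` must be supplied by the caller because the talk gives only one rule of thumb (roughly six to 12 for small non-contiguous color regions).

```python
# Channels the talk describes as "how much" (magnitude) versus "what" (identity).
MAGNITUDE_CHANNELS = {"position", "length", "tilt", "area", "volume",
                      "luminance", "saturation", "curvature"}
IDENTITY_CHANNELS = {"region", "hue", "motion", "shape"}


def check_encoding(attribute_type, levels, channel, usable_bins):
    """Return a list of mismatches between a data attribute and a channel.

    attribute_type: "categorical", "ordered", or "quantitative"
    levels:         number of distinct values that must be told apart
    channel:        name of the visual channel
    usable_bins:    caller's estimate of how many levels the channel can show
    """
    if channel in MAGNITUDE_CHANNELS:
        kind = "magnitude"
    elif channel in IDENTITY_CHANNELS:
        kind = "identity"
    else:
        raise ValueError(f"unknown channel: {channel}")

    problems = []
    if attribute_type == "categorical" and kind == "magnitude":
        problems.append("channel implies an order the data does not have")
    if attribute_type in ("ordered", "quantitative") and kind == "identity":
        problems.append("channel drops the order present in the data")
    if levels > usable_bins:
        problems.append(f"{levels} levels exceed {usable_bins} usable bins")
    return problems


# The 22 color-coded regions from the talk, against the 12-bin upper rule of thumb.
print(check_encoding("categorical", 22, "hue", usable_bins=12))
# The talk's "37 levels, four distinguishable" thought experiment.
print(check_encoding("quantitative", 37, "luminance", usable_bins=4))
# A well-matched case: ten categories on a channel with at least ten bins.
print(check_encoding("categorical", 10, "hue", usable_bins=10))
```

An empty list means neither rule is violated; it says nothing about accuracy of perception or about the task, which the talk treats as separate considerations.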
University of British Columbia computer scientist Tamara Munzner describes how research into the individual elements of data visualizations can help guide design choices for creating more intuitive visualizations and avoiding unnecessary confusion.
This talk was part of the Visualization for Informal Science Education conference held at the Exploratorium, which explored themes of interpretation, narration, broadening participation, applying research to practice, collaboration, and the affordances of technology.
VISUALISE was made possible thanks to generous support from the Gordon and Betty Moore Foundation and the National Science Foundation under Grant No. 1811163. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.