What Is a Histogram?
Do you know how to make a histogram? When to make one? And what you can learn from it once you do?
Jason Marshall, PhD
The word “histogram” seems kind of weird. After all, unlike the sensibly named bar graph that we talked about last time, it’s not at all clear from the word what a histogram should look like. But, as we’ll see today, the idea behind the histogram isn’t nearly as weird as the word. In fact, the diagram makes a ton of sense for organizing certain kinds of information.
But where does the word come from? What does it mean? In truth, we’re not entirely sure. But the best idea I’ve come across is that the word histogram is a combination of the Greek words histos, which describes something standing upright (like the mast of a ship), and gramma meaning drawing or writing.
Why do I mention this? Because, as we’ll soon find out, this means that the word histogram isn’t actually weird at all. In fact, it’s sort of a perfect name for this type of diagram. So, what is a histogram? How do you make one? And what can you learn from it? Let’s find out.
What Is a Histogram?
Histograms are a lot like bar graphs. In fact, if you squint your eyes a bit while looking at side-by-side bar graphs and histograms, you might not be able to tell the difference between them. But although they look similar, they are definitely different—mostly in terms of the type of data they are used to talk about.
As you’ll recall, bar graphs (aka bar charts) are used to compare the relative sizes of different categories of data. For example, if you want to display data from a survey in which people choose between options A, B, C, and D, you can make a bar graph with four vertical bars (one for each category), where the height of each bar represents the fraction of responses that went to that option.
But what about data that doesn’t break down neatly into categories? For example, what if we want to somehow create a diagram showing the distribution of widths of the fall leaves my daughter has been collecting? I suppose we could categorize her leaves as “narrow,” “medium,” or “wide,” and then create a three-column bar graph. But that feels like a rather arbitrary thing to do. After all, one person’s narrow might be another person’s wide. And if you ask somebody else, you’ll probably get a whole different categorization.
Which won’t do at all for our purposes … we can do better.
How to Make a Histogram
So instead of arbitrarily creating categories about things that don’t want to be categorized, let’s give the data an opportunity to shine by creating a histogram. To do so, we won’t categorize our leaf widths; we’ll sort them into bins instead. To see how this works, imagine we grab 25 boxes (our bins) and label them from 1 cm up through 25 cm. Then, we measure the width of each of our leaves, round it to the nearest centimeter, and put the leaf into the box labeled with that width. After we finish, we just count up the leaves in each bin and make our histogram.
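If you’d like to try the box-sorting idea on a computer, here is a minimal Python sketch of it. The leaf widths are made-up numbers, and the names leaf_widths_cm and bin_to_nearest_cm are just illustrative choices, so treat this as one possible way to do the counting rather than the only one.

```python
from collections import Counter

# Hypothetical leaf widths in centimeters (made-up numbers, not real measurements).
leaf_widths_cm = [3.6, 4.2, 4.4, 12.8, 13.1, 13.4, 20.7, 21.2]


def bin_to_nearest_cm(widths):
    """Count how many widths round to each whole centimeter from 1 to 25."""
    # int(w + 0.5) rounds a positive width to the nearest centimeter (halves round up).
    counts = Counter(int(w + 0.5) for w in widths)
    # One entry per box, including empty ones; widths outside the 25 boxes are ignored.
    return {label: counts.get(label, 0) for label in range(1, 26)}


bin_counts = bin_to_nearest_cm(leaf_widths_cm)
print(bin_counts[4], bin_counts[13], bin_counts[21])  # 3 3 2
```

Each key of bin_counts plays the role of one labeled box, and each value is the number of leaves that ended up in it. Those counts are the bar heights of the histogram.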
As I mentioned earlier, making a histogram is a lot like making a bar graph, except that instead of drawing vertical bars to represent the relative sizes of categories, we draw vertical bars to represent the relative sizes of data bins. And to indicate that the data in a histogram is continuous and not categorized (in our case, this means that the leaf widths really range continuously between 0 and 25 cm), we usually draw the vertical columns of a histogram without gaps between them. This isn’t strictly necessary, but it is very common.
Imagine that the histogram we make from our leaf measurements tells us there are between 1 and 19 leaves in each of the 25 bins. As you might expect, our histogram also shows that the leaves are not spread evenly across the bins, which means that some leaf widths are more common than others. In particular, our imaginary histogram shows what appears to be three different peaks around 4, 13, and 21 cm. What does this tell us?
What Can You Learn From a Histogram?
To answer this question, let’s think about why we actually make histograms in the first place. As with bar graphs, the advantage of histograms over simple tables is that histograms allow us to see at a glance how data is distributed and to quickly identify interesting trends.
Case in point, the three peaks of our histogram tell us that there are three common leaf widths in our sample. What does that tell us? We can’t say for sure without doing a little more investigating, but it seems likely to me that it’s telling us we have leaves from three different trees in our sample. Which is a pretty cool thing to be able to see!
How Do You Know How to Bin Data?
There’s one other detail to worry about: How do you decide how to bin your data? How big or small should your bins be? In our example, by using 1 cm wide bins we found that our leaf widths peaked around three different values. But what if we decided to use 5 cm wide bins instead? In other words, what if we put all of the leaves from 1-5 cm in one bin, from 6-10 cm in the next, and so on with the fifth bin containing all the 21-25 cm wide leaves?
That certainly is a perfectly reasonable and mathematically legitimate thing to do, but you should know that the histogram you get might not tell exactly the same story. Instead of three peaks, you might now only see a single peak … perhaps in the 11-15 cm range. And you might be led to think that this means there is only a single most common leaf width. In a certain sense, that’s not an incorrect conclusion—the average leaf width for the entire sample does fall in that range. But using a smaller bin size with higher resolution allows you to see more detail in the data. Which could, as in our example, change your interpretation.
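To see how bin size can change the story, here is a small Python sketch that takes one set of 1 cm bin counts and merges them into 5 cm bins. The counts are invented to match our imaginary histogram (between 1 and 19 leaves per bin, with peaks at 4, 13, and 21 cm), and the helpers merge_bins and peak_positions are hypothetical names. Calling a bin a “peak” when it is taller than both neighbors is a simplifying assumption that would be too jumpy for noisy real data.

```python
# Hypothetical counts for the 1 cm bins labeled 1 through 25 (made-up data).
counts_1cm = [1, 1, 2, 6, 2, 3, 4, 5, 7, 9, 12, 16, 19,
              15, 11, 8, 6, 5, 4, 4, 9, 3, 2, 1, 1]


def merge_bins(counts, factor):
    """Add up each run of `factor` neighboring bins to make wider bins."""
    return [sum(counts[i:i + factor]) for i in range(0, len(counts), factor)]


def peak_positions(counts):
    """Return 0-based positions of bins strictly taller than both neighbors."""
    padded = [0] + list(counts) + [0]  # treat the space beyond each end as empty
    return [i - 1 for i in range(1, len(padded) - 1)
            if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]]


counts_5cm = merge_bins(counts_1cm, 5)

# Convert positions back into the bin labels used in the article.
peaks_1cm = [p + 1 for p in peak_positions(counts_1cm)]
peaks_5cm = [f"{5 * p + 1}-{5 * p + 5} cm" for p in peak_positions(counts_5cm)]

print(peaks_1cm)   # [4, 13, 21]
print(counts_5cm)  # [12, 28, 73, 27, 16]
print(peaks_5cm)   # ['11-15 cm']
```

With these particular numbers, the wide bins swallow the two smaller peaks and leave only the 11-15 cm bump, even though not a single leaf has changed.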
So how do you know how big to make your bins? The best answer is: trial-and-error, practice, and perseverance. There is no “right” or “wrong” way to do your binning. How you do it depends entirely on what you’re looking for with your analysis. Although this can at first seem strange and even daunting, rest assured that as you gain experience, you’ll begin to gain a better understanding of this artful side of the world of data analysis.
Wrap Up
Until next time, this is Jason Marshall with The Math Dude’s Quick and Dirty Tips to Make Math Easier. Thanks for reading, math fans!
Histogram image from Shutterstock.