Have you seen an NSLG lately? Chances are you have. NSLG stands for NonStandard Linear Graph, and it sucks. Let’s look a little more closely at what we mean by an NSLG. Then we can address why we think they suck.
Why we use/need graphs
At Chartlytics we love data graphics! So many different kinds of data graphics exist that it would take a very large book to properly describe them all. One class of graphic displays we see quite often falls under the category of “time series.”
A time series graphic shows data changing across an interval of time. How many pounds did a person lose across days? What number of speeding tickets did police issue during holiday weekends? How many hurricanes took place each month last year? The data on a time series graphic, frequently displayed on a “line chart,” tells a story of change.
Two main actors in the data story are “trend” and “variability” (sometimes the weird actor “outlier” joins the cast). Trend refers to how a measured quantity increases, decreases, or stays the same. Variability displays the stability, irregularity, or volatility of the measured change. An outlier refers to exceptional performance of the positive (“Wow, look how awesome she did!”) or negative persuasion (“Ugh, what the heck happened there!”).
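One hedged way to put numbers on these three actors: treat trend as the slope of a least-squares line, variability as the spread of the points around that line, and an outlier as a point sitting unusually far from it. The sketch below is only an illustration. The helper name `describe_series`, the sample weights, and the two-standard-deviation cutoff are all assumptions made for this example.

```python
from statistics import mean, pstdev

def describe_series(values, outlier_cutoff=2.0):
    """Summarize an evenly spaced time series (hypothetical helper).

    Returns (trend, variability, outliers):
      trend       - least-squares slope, in units per time step
      variability - standard deviation of the residuals around the fitted line
      outliers    - indices whose residual exceeds outlier_cutoff * variability
    """
    n = len(values)
    times = range(n)
    t_bar, v_bar = mean(times), mean(values)
    slope = (sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
             / sum((t - t_bar) ** 2 for t in times))
    intercept = v_bar - slope * t_bar
    residuals = [v - (intercept + slope * t) for t, v in zip(times, values)]
    variability = pstdev(residuals)
    outliers = [i for i, r in enumerate(residuals)
                if variability > 0 and abs(r) > outlier_cutoff * variability]
    return slope, variability, outliers

# Made-up daily weights (pounds) with one odd day at index 4.
weights = [200, 199, 198, 197, 204, 195, 194, 193]
trend, variability, outliers = describe_series(weights)
print(f"trend: {trend:.2f} lb/day, variability: {variability:.2f} lb, outliers: {outliers}")
```

With these invented weights the trend comes out negative (the person is losing weight) and the odd day at index 4 is the only point flagged as an outlier.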
A line chart from the Gallup (of the famous Gallup poll) website serves up a nice example of a time series graph. United States Presidential job approval statistics began in the late 1930s and have become an important piece of information. The approval rating acts as a barometer of public support for the President of the United States. The line graph below shows the job approval rating of George W. Bush. The line graph has the approval rating percentage on the vertical axis and time, in months and years, on the horizontal axis.
Figure 1: A line graph showing President Bush’s approval ratings. Source: http://www.gallup.com/poll/103798/bushs-yearly-approval-average-fourth-worst-gallup-annals.aspx
What story of change does the line graph tell? The spike in 2001 comes after 9/11. Following the terrible terroristic act, America united behind its President. As time went on, the trend of the data tells a story of a country whose approval rating for the President declined across the years. Why it happened requires more information on the graph, and that forms another story.
With an immediate examination of the line graph, any chart reader can discern important information with the quick and enlightening power of the time series graph. Look at the table below and compare the effect of looking at numbers to looking at a line graph.
Figure 2: A table with President Bush’s approval ratings. Source: http://partisanid.blogspot.com/2013/07/title.html
Making judgements with numbers alone does not inform the reader as well as the graphic form of the data. Numbers matter, but visualization reveals interesting information, along with subtle and not-so-subtle patterns, that gets lost in a numerical table or a single number.
Line graphs’ imperative
The world needs time series visual displays, and you see them everywhere. Line graphs show change across time, preferably in an “in your face” manner. People use the knowledge to make important, high-stakes decisions (decisions not well answered by numbers or just a statistic).
As an example, teachers measure student performance in reading and then graphically display the data on a line chart. If the student’s data across time shows the student has not learned fast enough, the teacher will make a change. If the student continues to exhibit no change or a lack of significant change, the teacher will take more measures to provide help. A worst case scenario occurs when the student doesn’t improve even though the teacher tried a number of interventions. The student may then receive a referral to a specialist (e.g., reading specialist, special education teacher, school psychologist).
What if the graph the teacher used didn’t tell the data story correctly? In other words, the graph could indicate a lack of progress when the student actually has learned at an acceptable rate (we call this situation a false negative). The graph below makes the false negative obvious.
Figure 3: An animated graph showing data on an NSLG with a very low growth rate.
On the other hand, a graph could show the student making very rapid growth when the student made progress at a lower rate of change (a false positive). In other cases, the false positive could show a rapid rate of growth when the student actually has made very little growth.
Figure 4: An animated graph showing data on an NSLG with a very high growth rate.
False positives and false negatives give us a headache.
As a savvy data visualist, you have probably deduced why the line graphs above gave false impressions. On the first graph the vertical axis ran from 0 to 200. On the second graph the vertical axis started at 14 and ended at 22. Do you claim shenanigans? If yes, welcome to the world of NSLGs!
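To see how much the axis choice alone changes the picture, the sketch below computes what fraction of the chart’s height the same data climbs under each of the two scalings just described. The scores are made up for illustration, and `fraction_of_axis_used` is a hypothetical helper, not part of any charting library.

```python
def fraction_of_axis_used(values, axis_min, axis_max):
    """Share of the vertical axis covered between the first and last value."""
    return (values[-1] - values[0]) / (axis_max - axis_min)

# Hypothetical scores: the same student, the same data, on both graphs.
scores = [15, 16, 16, 17, 18, 19, 20, 21]

wide = fraction_of_axis_used(scores, 0, 200)    # vertical axis from 0 to 200
narrow = fraction_of_axis_used(scores, 14, 22)  # vertical axis from 14 to 22

print(f"0 to 200 axis: the line climbs {wide:.0%} of the chart height")
print(f"14 to 22 axis: the line climbs {narrow:.0%} of the chart height")
```

An identical six-point gain looks nearly flat on one graph and steep on the other: 3% versus 75% of the available height.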
Why do so many people scale the graph to whatever number they feel like? Because they can. Because we don’t have rules to say otherwise. Because NSLGs, by their very nature, thrive in a miasma of nonstandardization.
People make decisions from line graphs that range from low stakes (e.g., how many wins have I had in my Pokemon card game league) to high stakes (e.g., will a student enter special education). All information has value (Rick lost his Pikachu card to Dave last week because he didn’t pay attention to his graph).
You may have time series data. And if you place it on a line graph, it rises to a level of importance. The display you pick to view your data will make a world of difference. If you choose an NSLG you may fall prey to many of its weaknesses. As we move forward with our blog we will lay out in painstaking detail the many limitations, and sometimes outright deceptions, NSLGs contain. But we do not mean to imply all NSLGs are evil. NSLGs have value if they meet the following criteria:
1. Follow proper construction rules
2. Graph readers need to see/focus in on absolute change
3. The nonstandardized graph does not distort the data and tell a different story to the graph reader
The problem we have with NSLGs, and why they suck (most of the time): graph makers frequently violate one or all of the NSLG usage guidelines. For example:
Problem with rule #1. A crack team of data scientists from Penn State, Pitt, and Vanderbilt recently conducted an expansive study of NSLGs (time series line graphs) in 11 journals. After closely examining 4,313 NSLGs the researchers discovered many graphs contained construction and labeling errors. For example, graph makers violated the proportional construction rule 85% of the time. Furthermore, for multiple graphs within the same figure, 69% of the reviewed graphs did not scale the vertical axis to the same terminal value (meaning comparisons of trend and variability between graphs fall out of whack - kind of like comparing a race where people run different lengths). Many other problems came to light which we won’t go into here. But the study’s findings make sense of a perverse graphing policy - when everyone practices nonstandardization, no one is wrong.
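The mismatched terminal values are simple to guard against. As a rough sketch (the panel names and numbers are invented, `shared_axis_max` is a hypothetical helper, and starting every axis at 0 is an assumption), one terminal value can be picked for every graph in a figure:

```python
import math

def shared_axis_max(panels, step=10):
    """Pick one terminal value for the vertical axis of every panel.

    panels maps a panel name to its list of values; the result is the
    largest value across all panels, rounded up to a multiple of step.
    """
    largest = max(max(values) for values in panels.values())
    return math.ceil(largest / step) * step

# Invented data for three graphs that sit side by side in one figure.
panels = {
    "Student 1": [4, 6, 9, 12],
    "Student 2": [20, 24, 31, 38],
    "Student 3": [2, 3, 3, 5],
}

axis_max = shared_axis_max(panels)
for name in panels:
    # Assumption: every panel's vertical axis starts at 0.
    print(f"{name}: vertical axis runs 0 to {axis_max}")
```

With every panel ending at the same value, a steep line in one graph and a shallow line in another can be compared directly.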
Problem with rule #2. We want to view absolute change when we *only* care about the sheer amount of differences in quantities and nothing more. Let’s compare two companies.
The table above shows Company A had $200,000.00 in revenue in the second quarter. Compared against first quarter revenue, we see a $100,000.00 increase. Company B posted $20,000,000.00 in second quarter revenue. Again comparing against the first quarter, we observe an increase for Company B, this time of $500,000.00. We can clearly compare the results of both companies and conclude Company B did better than Company A in absolute amount of revenue (Company A had a $100,000.00 increase and Company B a $500,000.00 gain).
Can we then conclude we should invest our money in the stock of Company B instead of Company A because Company B gained way more revenue than Company A (to the tune of $400,000.00 more)? If we let absolute amount guide our decision making we will invest in Company B.
But relative change gives us different information from absolute amount of change. Absolute amount of change looks at sheer differences while relative change focuses on relative differences. With the additional information in the table below we see the percentage increases in revenue for Company A and Company B. When we compare first quarter (Q1) and second quarter (Q2) changes we now see very clearly that Company A has an insane 100% growth rate while Company B grew by a slight 2.6%. In light of relative change, who would you now invest in?
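The arithmetic behind those percentages is a single division: the increase over the starting value. Here is a minimal sketch using the increases quoted above. The first-quarter figures are assumptions back-calculated from the stated growth rates, and `relative_change` is a hypothetical helper.

```python
def relative_change(start, increase):
    """Increase expressed as a percentage of the starting value."""
    return increase / start * 100

# (first-quarter revenue, increase from Q1 to Q2) in dollars.
# The Q1 values are assumptions back-calculated from the quoted growth rates.
companies = {
    "Company A": (100_000, 100_000),
    "Company B": (19_500_000, 500_000),
}

for name, (q1, increase) in companies.items():
    pct = relative_change(q1, increase)
    print(f"{name}: absolute change ${increase:,}, relative change {pct:.1f}%")
```

Company B wins on absolute change, yet Company A doubled while Company B barely moved.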
In business, science, and life we better judge the significance of differences with relative change. Relative change always shows how much one quantity changes relative to another, which is critical information because we can now judge changes against one another. Take a moment and look at scientific journals for psychology and education. Guess what kind of time series graphs you find? Almost 100% nonstandard linear graphs presenting us with the crude gift of absolute amount of change.
Problem with rule #3. As shown above in the two animated figures, we have two stories based on the scaling of the vertical axis. One shows a slow increase while the other shows a rapid increase. The same data telling two vastly different stories should cause us to pause and say “What the h e double hockey sticks?”
What to do
Related idea: I find it disconcerting when people try to dismiss correlations by uttering the phrase “Correlation does not imply causation.” True, but a correlation by itself provides information in exact concordance with its design. Likewise, a linear graph will visually depict the information in direct relation to graph construction and the linear graph’s mission - showing absolute amounts of change.
No need to get angry at a correlation. Likewise, let’s not hate on linear graphs. But we must recognize a multiplicity of problems associated with nonstandardization. We must further acknowledge the various limitations of a linear graph (for example, they focus only on absolute amount of change when we need to see the world through the lens of relative change). Linear graphs have their place in the world, but so many of them distort the information they present due to nonstandardization and its insidious effects. We can do better.
At Chartlytics we do not offer opinion, false claims, or visual dishonesty. Instead, Chartlytics leverages the power of a standard ratio graphic, the Standard Celeration Chart. The ordered, rational geometry of celeration lines, bounce, and clearly detectable outliers means chart readers make better, faster decisions. And superior decisions lead to high-quality outcomes.
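A ratio graphic spaces its vertical axis by multiplication rather than addition, which in practice means a logarithmic scale. As a minimal sketch of that idea only (it does not reproduce the Standard Celeration Chart’s specific layout, and `ratio_axis_distance` is a hypothetical helper), equal proportional changes cover equal vertical distance:

```python
import math

def ratio_axis_distance(start, end):
    """Vertical distance between two values on a log10 (ratio) axis."""
    return math.log10(end) - math.log10(start)

# Two doublings that differ hugely in absolute size.
small = ratio_axis_distance(5, 10)      # absolute gain of 5
large = ratio_axis_distance(500, 1000)  # absolute gain of 500

print(f"5 -> 10 climbs {small:.3f} axis units")
print(f"500 -> 1000 climbs {large:.3f} axis units")
```

Both doublings climb the same distance, so the steepness of a line on such a chart reflects relative change no matter how large or small the raw numbers are.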