How can data be misleading? The purpose of data visualization is to present data in a format that is accessible to others. Because a visualization is meant to tell a story, it is vulnerable to manipulation that leads the viewer to believe false information. What can we do? Here are some ways data is misrepresented and what to do about them.
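1. Omitting data -- removing or leaving out data points that do not support the point being made, so that the remaining data appears to make the case more strongly than the full data set would.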
2. X-axis (horizontal), Y-axis (vertical) -- In a graph (scatterplot, line, bar), there are two perpendicular lines (lines that form a right, or 90 degree, angle). Those two lines are the scales used to measure and describe the data within that space, and the point where the lines intersect is known as the point of origin or zero (0). Data in these types of graphs can be misleading when one or both axes are left undefined or are altered to exaggerate the data. For example, some people will shorten (truncate) the y-axis so that it does not start at zero, which makes the difference between two data points seem larger than it is.
3. Correlation vs. causation -- Correlation does not imply causation. Data visualizations can be misleading when the creator displays two or more social/scientific phenomena (i.e., trends) that are occurring at the same time and at similar rates to imply that they influence one another. Those representations do not account for variables that weren't considered or were intentionally left out.
4. Data fishing (data dredging - a data mining technique) -- Imagine conducting an experiment without a hypothesis, or finding sources for a research paper without a thesis. Data fishing is essentially collecting random samples of data, drawing a conclusion, and not conducting more tests to verify that conclusion. This technique is extremely vulnerable to bias and cherry-picking.
5. Bias -- Bias plays a role in every decision one makes. While there is no cure for unconscious bias, there are ways to minimize it, such as having a diverse research team, training to increase bias-awareness, and/or hiring an outside party to audit organizational practices. Be aware that there are those who will manipulate data to fit their goals/interests. Examining the authority (author, publisher, content creator) is good practice when evaluating information sources, including charts and graphs.
6. Simpson's Paradox -- "...a statistical phenomenon in which an observed association between two variables at the population level (e.g., positive, negative, or independent) can surprisingly change, disappear, or reverse when one examines the data further at the level of subpopulations." (1) The video "How statistics can be misleading" in the second tab of this section provides a deeper understanding of the concept.
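To put numbers on the truncated y-axis described in item 2, here is a minimal Python sketch. The function name apparent_ratio, the parameter axis_start, and the values 50 and 55 are all made up for illustration. It compares how tall two bars are drawn relative to each other when the y-axis starts at zero and when it starts higher.

```python
def apparent_ratio(smaller, larger, axis_start=0.0):
    """Return how many times taller the larger bar is drawn than the smaller one.

    axis_start is the value at which the y-axis begins (0 for an honest bar chart).
    All names and numbers here are hypothetical, chosen only for illustration.
    """
    if axis_start >= smaller:
        raise ValueError("axis_start must be below the smaller value")
    # A bar's drawn height is its value minus the point where the axis starts.
    return (larger - axis_start) / (smaller - axis_start)


# Two made-up data points that differ by 10 percent.
print(apparent_ratio(50, 55))                 # y-axis starts at 0
print(apparent_ratio(50, 55, axis_start=45))  # y-axis truncated to start at 45
```

With the axis starting at zero, the second bar is drawn 1.1 times as tall as the first, which matches the data. With the axis starting at 45, the same two values are drawn with one bar twice as tall as the other. When reading a chart, checking where the y-axis begins is a quick way to catch this.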
(1) Bonovas, S., & Piovani, D. (2023). Simpson's Paradox in Clinical Research: A Cautionary Tale. Journal of Clinical Medicine, 12(4), 1633. https://doi.org/10.3390/jcm12041633
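Simpson's Paradox (item 6) is easier to see with numbers. The Python sketch below uses counts invented for illustration; they are not taken from the article cited above or from this guide. The treatment labels "A" and "B", the subgroup labels "mild" and "severe", and the function names are likewise hypothetical. It compares success rates within each subgroup and then for everyone combined.

```python
# Invented counts for illustration only: (successes, total patients)
# for two hypothetical treatments, split into two hypothetical subgroups.
counts = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}


def subgroup_rates(treatment):
    """Success rate within each subgroup for one treatment."""
    return {group: s / n for group, (s, n) in counts[treatment].items()}


def overall_rate(treatment):
    """Success rate with the subgroups pooled together."""
    successes = sum(s for s, _ in counts[treatment].values())
    total = sum(n for _, n in counts[treatment].values())
    return successes / total


for treatment in counts:
    rates = subgroup_rates(treatment)
    print(
        treatment,
        {group: round(r, 3) for group, r in rates.items()},
        "overall:", round(overall_rate(treatment), 3),
    )
```

In these made-up numbers, treatment A has the higher success rate in both the mild and the severe subgroup, yet treatment B has the higher rate once the subgroups are pooled. The reversal happens because A was mostly given to severe cases and B mostly to mild ones, so a chart that shows only the pooled rates would point to the opposite conclusion from a chart broken out by subgroup.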
Goucher College Library, 1021 Dulaney Valley Road, Baltimore, MD 21204 • 410-337-6360 • © 2013-2017
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.