
Interrogating Data: A Science Writer’s Guide to Data Journalism

A graphic showing vertical bands of color that shift from hues of blue, at left, to red, at right.
The “Warming Stripes” graphic represents, in a simple visual form, the change in average annual temperature (in this case, globally) over the past 100+ years. Ed Hawkins/University of Reading (CC BY 4.0)
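The caption describes the idea behind the graphic: each year becomes one band, colored by how warm or cool that year was. The Python sketch below is only a rough illustration of that idea, not Ed Hawkins's actual method. It assumes the mean of the whole series as the baseline and a simple blend from white to blue (cooler years) or white to red (warmer years). The temperatures are made-up placeholders, and the published graphic's baseline period and palette may differ.

```python
# Sketch of the idea behind a "warming stripes"-style graphic: one color per year,
# blue for cooler-than-average years and red for warmer ones.
# Assumptions: the baseline is the mean of the whole series, and the palette is a
# simple linear blend from white; the published graphic may use different choices.

def stripe_colors(temps):
    """Map each annual temperature to an (r, g, b) tuple, 0-255 per channel."""
    baseline = sum(temps) / len(temps)
    anomalies = [t - baseline for t in temps]
    # Scale symmetrically so the largest departure, warm or cool, gets the strongest color.
    limit = max(abs(a) for a in anomalies) or 1.0  # avoid dividing by zero for flat data
    colors = []
    for a in anomalies:
        x = a / limit  # between -1 (coolest) and +1 (warmest)
        if x < 0:
            level = round(255 * (1 + x))  # fade from white toward pure blue
            colors.append((level, level, 255))
        else:
            level = round(255 * (1 - x))  # fade from white toward pure red
            colors.append((255, level, level))
    return colors

# Hypothetical annual means in degrees Celsius, not real observations.
years = [2000, 2001, 2002, 2003]
temps = [14.0, 14.2, 14.4, 14.6]
for year, color in zip(years, stripe_colors(temps)):
    print(year, color)
```

The point is how little data the graphic needs: a single number per year, expressed relative to a baseline.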

 

In many ways, all science writers are already data journalists. What do science writers do when they report on a newly published study? They dive into the details of the paper’s results; they ask experts for opinions on potential flaws in the methodology; they seek to connect the conclusion to their readers’ lives. Such investigation is driven by a desire to find evidence from the most authoritative sources and present it as clearly as possible. Reporting on data requires the same skill set.

Both data journalism and science writing boil down to “taking something complicated and trying to make it understandable,” explains Sara Chodosh, an assistant editor and graphics producer at Popular Science. Once a data journalist has answered their own questions, they go through the same process with an imagined reader: What can the data tell the audience that will help them grasp a larger pattern or concept?

Data journalism, whether it takes the form of a static visualization, an interactive feature, or simply a bit of additional analysis to add context to a breaking news piece, can bring scientific results to the foreground of a story and make them accessible for readers. Take, for example, the maps in a Reuters article that put Australia’s bushfires into perspective. Marvel at an interactive astronomy chart published by National Geographic that allows readers to explore our solar system’s moons. Explore the Climate Central report on shifting snowfall levels, which invites local journalists and meteorologists to repurpose the data in order to connect changing weather patterns directly to their audiences.

In its simplest definition, data journalism is the practice of using numbers and trends to tell a story. It requires a variety of skills: research to find the right dataset, analysis to determine what kind of story that dataset may tell, and presentation to share that story with readers. And these skills are within reach for many science writers, even without any programming background: Simply ask questions, and the answers will lead you to the heart of a story.

 

Research: Choose Your Data

The first step with any data story is finding a dataset to analyze. For science writers, one natural source is the results section of any paper that you believe tells a compelling story. Many scientists release their unanalyzed data on open-access platforms such as Dryad and GitHub, a practice that allows others, whether scientists or journalists, to explore and build upon published results. And even data that are not shared through open-access channels are often available on request.

Either way, the choice to use the results of one particular study in a data story requires careful vetting; consider the authors’ credentials and pore over their methods section before diving in.

Priyanka Runwal, a science writer and data reporter at Climate Central, points out that the process of finding a dataset may depend on the assignment. In some instances, one may have a question in mind (say, “How many Americans have been tested for COVID-19?”) and search for a specific dataset that answers this question. In others, one may come upon an intriguing dataset (say, the Global Health Security Index) and seek to formulate a question from it.

In examining a potential dataset for use in a project, consider whether the data tell a compelling story. Are there evident trends or interesting outliers? Would readers want to explore a figure, or would they prefer to jump ahead to the conclusion? A story explaining a review of biodiversity hotspots, for example, may benefit from a map or chart showing where these habitats are located around the world and how they are threatened by humans. In contrast, focusing heavily on numerical results from different trials in a story about testing for a new medical treatment may distract readers from understanding the qualitative conclusions about what the treatment so far seems to accomplish and the necessary steps to come.

Besides these questions of reader value, consider logistical concerns. Are the data downloadable? Have they been released under a license, such as a Creative Commons license, that permits reuse? What do all of the data labels represent? Do you understand the study methods, caveats, and implications, or will you need to ask a scientist or press officer for clarification?
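One quick way to start on the logistical questions above is to take an inventory of a downloaded file: which columns it has, how many rows, and where cells are blank. Below is a minimal Python sketch using only the standard library. It assumes the data arrived as a CSV file. The sample text and its column names (`site`, `species_count`, and so on) are hypothetical stand-ins for a real download.

```python
import csv
import io

# Hypothetical stand-in for a downloaded file; in practice you would
# open the CSV you saved from the data portal instead.
SAMPLE = """site,year,species_count,notes
A,2019,12,
B,2019,,sensor offline
C,2019,9,
"""

def inventory(csv_text):
    """Return the column labels, the number of data rows, and blank cells per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames
    blanks = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            # Short rows give None for missing fields; treat those as blank too.
            if not (row[name] or "").strip():
                blanks[name] += 1
    return columns, rows, blanks

columns, rows, blanks = inventory(SAMPLE)
print("Columns:", columns)
print("Rows:", rows)
print("Blank cells per column:", blanks)
```

A column full of blanks, or a label you can't decode, is exactly the kind of thing to raise with a scientist or press officer before going further.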

 

 

Analysis: Rely on Your Curiosity to Turn a Spreadsheet into a Story

Once you have a dataset, the next step is to find patterns in the numbers. Data analysis can often feel like chipping away at a stone to make a sculpture; you may start with a massive spreadsheet and spend days isolating the specific variables or data points that will illustrate a trend to your readers.

You can make this process more targeted by asking questions of your dataset as though it were an interview subject, suggests Peter Aldhous, a science reporter at BuzzFeed News and data journalism professor at the University of California, Santa Cruz, and the University of California, Berkeley. As he puts it: “What can the data tell me that I want to know?”

Common questions to consider may include: How do you need to clean the data (through standardizing names, changing labels, geocoding, and so on) to ensure that categories match up and all necessary information is present? What role does each variable play in the source study or in other similar datasets? Which variable may be used as an indicator of a larger trend? What analysis is necessary to show that trend—for example, what other variable might you compare to the first, or what groups of data points might you compare to each other?
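Cleaning often starts with making sure each category is always spelled the same way. If it isn't, "NY" and "New York" get counted as two different places. Below is a minimal Python sketch. It assumes a hand-built lookup table; `STATE_NAMES` is hypothetical and deliberately tiny. Any unrecognized name is left unchanged so it stands out for manual review.

```python
# Hypothetical lookup table: every spelling seen in the data, mapped to one standard name.
STATE_NAMES = {
    "ny": "New York",
    "n.y.": "New York",
    "new york": "New York",
    "ca": "California",
    "calif.": "California",
    "california": "California",
}

def standardize(name):
    """Normalize case and spacing, then look up the standard spelling."""
    key = " ".join(name.strip().lower().split())
    # Unknown names pass through unchanged so they show up as separate categories.
    return STATE_NAMES.get(key, name.strip())

def totals_by_state(records):
    """Add up (state, count) pairs after standardizing the state names."""
    totals = {}
    for state, count in records:
        clean = standardize(state)
        totals[clean] = totals.get(clean, 0) + count
    return totals

raw = [("NY", 120), ("New York ", 30), ("Calif.", 45), ("california", 5)]
print(totals_by_state(raw))
```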

Don’t let your curiosity run too wild, though. Aldhous cautions that, like other sources, data can mislead you if you aren’t careful.

 

A graph showing the COVID-19 cases over different regions of the US, between March and July of 2020.
As COVID-19 cases spike in Florida, Texas, and other southern states, the data-visualization volunteers at The COVID Tracking Project at The Atlantic often use regional charts to show how these current outbreaks compare to the U.S.’s most infamous outbreak thus far (in the northeastern U.S. in March and April 2020). Charts on the project’s website are automatically updated daily, along with the project’s database. The COVID Tracking Project (CC BY-NC 4.0)
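Regional charts like these depend on adding state-level numbers into larger groups. The Python sketch below shows one way to do that, assuming rows of (date, state, new cases). The `REGION` lookup is a hypothetical, abbreviated mapping, not the COVID Tracking Project's own region definition. States missing from the lookup go into an "Other" bucket instead of silently disappearing.

```python
from collections import defaultdict

# Hypothetical, abbreviated region lookup; a real analysis would use a
# documented definition, such as the U.S. Census Bureau's regions.
REGION = {"NY": "Northeast", "NJ": "Northeast", "FL": "South", "TX": "South"}

def cases_by_region(daily_rows):
    """Sum (date, state, new_cases) rows into {region: {date: total}}."""
    totals = defaultdict(lambda: defaultdict(int))
    for date, state, new_cases in daily_rows:
        # Unmapped states are grouped under "Other" so they remain visible.
        totals[REGION.get(state, "Other")][date] += new_cases
    return {region: dict(by_date) for region, by_date in totals.items()}

# Hypothetical daily rows, not real case counts.
rows = [
    ("2020-07-01", "FL", 100),
    ("2020-07-01", "TX", 50),
    ("2020-07-01", "NY", 10),
    ("2020-07-01", "WA", 5),
]
print(cases_by_region(rows))
```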

 

Duncan Geere, a freelance data journalist and former editor at Information Is Beautiful, puts his warning this way: “Figure out what the data is showing, but also what it’s not showing.” What are the limitations in this dataset, due either to flaws in the methods used to compile the data or to discrepancies between what the data reveal and the story you want to tell? How might you want to filter the data to account for limitations, outliers, or missing pieces? What biases may have been present in the compilation? Closely examining data-collection methods is especially crucial when the data are describing people.
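One way to act on those questions is to set aside missing values and unusually large or small ones before analysis, instead of deleting them silently. Below is a minimal Python sketch (Python 3.8 or later, for `statistics.quantiles`). It assumes a list of numeric readings in which `None` marks a gap. It uses Tukey's fences: anything more than 1.5 times the interquartile range beyond the first or third quartile is flagged. This is a common rule of thumb, not the only defensible choice.

```python
import statistics

def split_values(values, k=1.5):
    """Separate missing entries, then flag values outside the Tukey fences."""
    missing = [i for i, v in enumerate(values) if v is None]  # positions of gaps
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)  # first and third quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in present if v < low or v > high]
    kept = [v for v in present if low <= v <= high]
    return kept, outliers, missing

# Hypothetical sensor readings; 250 looks suspicious and one reading is missing.
readings = [12, 14, 13, 15, 250, None, 14, 13]
kept, outliers, missing = split_values(readings)
print("Kept:", kept)
print("Flagged as outliers:", outliers)
print("Missing at positions:", missing)
```

Flagged values are leads, not automatically errors. Check each one against the source, or ask the researchers, before deciding whether it belongs in the story.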

Geere suggests writing down aspects of a dataset that you find interesting, as well as questions that come up, as you explore the data. “I reason that, if I find this particular aspect of the data interesting, then my audience will as well,” he says.

It may take some time to home in on what variable or trend from a dataset tells the most compelling story. Runwal leans into this exploration, she says. “For me, it requires patience, and eyeballing numbers for a while to actually make sense of them.” To this end, you may test several different methods of filtering or analyzing your data before deciding which focus will be most informative for your readers.

Patience is also key in the analysis process because code (even a supposedly simple Excel formula) often breaks. When that happens, online resources abound: forums such as the National Institute for Computer-Assisted Reporting (NICAR) listserv, Stack Overflow, and even social media sites can help you solicit advice from more experienced data reporters. Geere recommends the Data Visualization Society, which has an active Slack workspace whose members include both journalists and visualization experts from other fields.

Finally, just as responsible writers record interviews and save their notes, responsible data journalists keep careful track of every step in their analysis. You want your work to be reproducible, both by other people in your newsroom—data journalists aren’t safe from copy-editing and fact-checking—and by readers.
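A lightweight way to keep that record is to write each cleaning or filtering step as a named function and log what it did. The Python sketch below is one hypothetical arrangement; the field names `date` and `cases` were invented for the example. The resulting log can go straight into your notes or a methodology section. A colleague can then rerun the same steps to check your numbers.

```python
def run_pipeline(rows, steps):
    """Apply each (description, function) step in order and record what it did."""
    log = []
    for description, step in steps:
        before = len(rows)
        rows = step(rows)
        log.append(f"{description}: {before} -> {len(rows)} rows")
    return rows, log

# Hypothetical steps and records, for illustration only.
steps = [
    ("Drop rows with no case count", lambda rs: [r for r in rs if r["cases"] is not None]),
    ("Keep 2020 only", lambda rs: [r for r in rs if r["date"].startswith("2020")]),
]
data = [
    {"date": "2020-03-01", "cases": 10},
    {"date": "2020-03-02", "cases": None},
    {"date": "2019-12-31", "cases": 3},
]
result, log = run_pipeline(data, steps)
for line in log:
    print(line)
```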

As Sam Leon, data-investigations lead at the international NGO Global Witness, explains in a chapter of the Data Journalism Handbook on methodologies, data can easily be “distorted and mis-represented” through errors at analysis stages. Such errors can range from a typo introduced while cleaning data to an analytic choice that misrepresents correlation as causation. (See “Resources for Data Journalism Novices” for a list of popular programs, ranging from programming platforms to free online services for building graphics.)
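To see why correlation alone proves little, consider the Pearson correlation coefficient. It is a standard measure, ranging from -1 to 1, of how closely two variables move together. The Python sketch below computes it from scratch. The figures are hypothetical, and the confounder in this classic pairing is hot weather, which drives both series. A high r is a reason to keep reporting, not evidence that one variable causes the other.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical monthly figures for one beach town. Both rise in hot weather,
# so they correlate strongly even though neither causes the other.
ice_cream_sales = [20, 35, 50, 65, 80]
drownings = [1, 2, 3, 4, 6]
print(f"r = {pearson_r(ice_cream_sales, drownings):.2f}")
```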

 

JPL mission history infographic with colorful lines arcing from illustrations of planets.
To advertise a new website inviting NASA fans to make their own space-themed infographics, the Jet Propulsion Laboratory (JPL) showcased a graphic of its own. The chart explores JPL mission history with a colorful time series. NASA/JPL-Caltech

 

Presentation: What Should Readers Take Away from a Data Story?

Just as your questions can drive your data analysis, potential questions from your audience can drive your presentation. “Good science communication thinks about its audience,” Geere says. Good data visualizations do the same; they tell a story that the audience will be able to follow, whether that audience consists of highly science-literate readers of a trade publication or young readers of an educational site.

Geere outlines the basics of storytelling through data in a blog post: Like any other story, he explains, visualizations need a beginning (an entry point), a middle (answers to readers’ key questions), and an end (a final takeaway for the reader, whether this is a better understanding of a scientific issue or a connection to their own life).

Different visualization formats can highlight different aspects of a dataset. Kaiser Fung, data-science expert and founder of the blog Junk Charts (which highlights errors in data visualization in the media), lays out some ground rules in an article for the Data Journalism Handbook site. Pie charts (if they can’t be avoided) should be designed with careful attention to the color and order of sections, as readers’ eyes will be drawn to the largest sections. Bar charts and dot plots allow for easy comparison between groups. Scatter plots call attention to trends, and regression lines may be added to guide readers’ interpretation.
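Charting tools will usually draw a regression line for you, but knowing what that line represents helps you describe it accurately. The most common choice is an ordinary least-squares fit. The Python sketch below computes its slope and intercept for hypothetical data. Whether a straight line suits the data at all is a separate judgment.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return slope, intercept

# Hypothetical data: years since a survey began vs. a measured value.
years = [0, 1, 2, 3, 4]
values = [2.0, 2.9, 4.1, 5.0, 6.1]
slope, intercept = fit_line(years, values)
print(f"slope = {slope:.2f} per year, intercept = {intercept:.2f}")
```

The slope is the number a caption can translate for readers, for example "an increase of roughly this much per year," provided the fit is reasonable.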

But there are more ways to present data than in static charts. In recent years, data reporters have increasingly sought out new ways of making their stories interactive, from Johns Hopkins’s COVID-19 tracker, which shows the virus’s global spread, to Stacker’s data-based slideshows, which add photos and context to each figure in the datasets upon which they rely. (Disclosure: Stacker is my employer.) Interactive features can help readers narrow or broaden the scope of a story according to their interest, to see how the data directly apply to them. And such features do not necessarily require extensive coding, either; searchable databases, for example, which are essentially public spreadsheets hosted by journalistic organizations, are a useful tool for readers to find specific information and do their own research.
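At its core, the search box on a searchable database does something like the following: it keeps only the rows whose chosen field contains what the reader typed. This Python sketch is a hypothetical minimal version with invented school records, not how any particular newsroom's tool works. A real tool would also need to handle much larger datasets and messier queries.

```python
def search(rows, field, query):
    """Return rows whose `field` contains the reader's query, ignoring case."""
    q = query.strip().lower()
    return [row for row in rows if q in str(row.get(field, "")).lower()]

# Hypothetical records standing in for a published spreadsheet.
schools = [
    {"name": "Lincoln High School", "county": "Kings"},
    {"name": "Washington Middle School", "county": "Queens"},
    {"name": "Lincoln Elementary", "county": "Queens"},
]
for row in search(schools, "name", "lincoln"):
    print(row["name"], "-", row["county"])
```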

However you present your data work, though, one guideline is always relevant: Make it simple. “I am always striving to make things that you can look at and immediately, as clearly as possible, understand what you’re being shown,” Aldhous says.

Chodosh agrees, noting that the more data are pared down, the easier it is to follow a story. Ensure that readers can follow one variable or one group of values at a time, and test your visualization by showing it to colleagues who aren’t familiar with the data. Simple color schemes, large text, and clear captions can also help make visualizations more accessible to readers who may otherwise have trouble following them.

In addition to considering your audience in the presentation of your data itself, consider your audience in writing a methodology section. A methodology section can be a direct link to your code, a precise series of steps, or simply a paragraph at the end of your article. Its complexity and location should depend on your audience: How much do you anticipate that this audience will want to understand precisely how you arrived at your conclusions?

In its most basic form, a methodology section should include a clear link to your data source and the major steps you took to analyze the data, written in simple language without jargon, as well as any caveats or major exceptions.

Several free resources can help you build data visualizations without coding; see the resource list at the end of the article. (But fair warning: If you travel down the path into the world of data reporting, you may find yourself seeing coding as a means of accomplishing more complex and more customizable presentations. For more coding resources, check out NICAR, as well as data journalism courses on Coursera, Codecademy, and the Northeastern University School of Journalism’s Storybench publication.)

 

 

 

Betsy Ladyzhets

Betsy Ladyzhets is a data journalist and science writer based in Brooklyn, New York. She is a research associate at Stacker, where she manages the publication’s Science and Lifestyle verticals. She’s also a member of the National Association of Science Writers and a volunteer for the COVID Tracking Project. Find her on Twitter @betsyladyzhets, and check out her newly minted newsletter, the COVID-19 Data Dispatch.
