How the Data Got Their Dots

Helping Students Understand Where Data Come From

The Science Teacher—January/February 2022 (Volume 89, Issue 3)

By Lisa Hardy, Colin Dixon, Seth Van Doren, and Sherry Hsi

A scientist stares at her computer screen as small blue dots appear, one after another, along a coordinate grid. Her experimental apparatus hums in the background. Once per second, a new dot appears on her screen. Sometimes it’s a little higher than the last, but usually it’s a little lower. Interesting. What will these dots tell her about the secrets of the natural world? Much less than she would like them to.

In professional science, these dots rarely tell you anything right away about the natural world. They more likely mean something about that experimental apparatus humming away in the background—It’s running hot again. We need a new vacuum pump. Are the tubes holding steady? It’s only after lots of work—years of work, possibly—that those dots will begin to tell a reliable story about the natural world. Yet this is how scientific progress happens. As scientists tinker with and improve their instruments and methods, the data they produce begin to reveal more about the world.

To scientists, this means that not all data is evidence. Before data can be used as evidence of something in the natural world, a lot of questioning must happen first: What else could have caused this? Is it an effect of my instruments? Some interference from the world? Scientists never just trust that data are good, or meaningful. Puzzling over data—trying to understand why it is the way it is—is central to scientific practice.

Yet in science classrooms, students usually see and work with data that’s intended to tell them right away about the natural world. Students then often treat the data we provide to them as factual, rather than as a source of evidence (Duschl 2008; Sandoval and Millwood 2005; Berland and Reiser 2009; McNeill and Berland 2017; Hancock, Kaput, and Goldsmith 1992; Manz 2016), and struggle to identify sources of error or uncertainty in evidence (Masnick and Klahr 2003). Students rarely have opportunities to critique the methods by which data were produced, to consider data integrity or sufficiency, or to otherwise engage in the scientific practices by which data become evidence (Duncan, Chinn, and Barzilai 2018; Samarapungavan 2018; Duschl 2000). Nor do they get to do the authentic work of figuring out how to improve their data (Chinn and Malhotra 2002; Lehrer, Schauble, and Lucas 2008; Hardy, Dixon, and Hsi 2020).

Rather than always aim for certainty, we need to let scientific work with data be more uncertain. We can then allow students to do more of the “puzzling over data” themselves, whether it’s deciding what sorts of data to collect and how, or developing criteria for what counts as “good data” (Ko and Krist 2018; Manz and Suárez 2018). To get students to do this sort of scientific thinking, and to prepare them for future work with messy data, we need to break data—occasionally, data must fail to tell students about the natural world.

Let’s get messy: three ways to tweak your existing labs

Sensor-based science labs are a great context for students to understand where data come from. When students use sensors to produce data, they can begin to see data not as “fact,” but as created by sensors, computers, scientists, and the natural world (Hardy, Dixon, and Hsi 2020). To develop new types of sensor-based labs for high school biology, we designed many variations on traditional lab experiments. Our labs used “Do-it-Yourself” probeware (Tinker and Krajcik 2001), including low-cost commercial sensors and internet-connected Raspberry Pi computers (Hsi, Hardy, and Farmer 2017). These sensors present no more safety risk than other low-voltage electronics, though the sensors themselves can be damaged (e.g., by water). They allow students to view the sensors as designed technologies rather than incomprehensible black boxes (Hardy, Dixon, and Hsi 2020).

Here are three effective ways to get students puzzling over their sensor data.

Loosen up the procedures

In a traditional photosynthesis experiment, we give students instructions to follow to create an experimental setup: twenty spinach leaves laid out flat at the bottom of a closed container, with a carbon dioxide sensor inserted into the container to measure changes in the carbon dioxide level. Students then record data for a prescribed amount of time when the leaves are in the dark, then again in the light. Instead of giving his students detailed instructions, Mr. B simply showed his class a demo setup, describing it as just one way to do it. Students then built setups of their own.

This led to more variation in their setups: number of spinach leaves, distance to the lamp, how well sealed the containers were. This variation became an important talking point in a discussion of the lab results. For example, Fernando’s group held its lamp to the side instead of directly above the spinach container. As a result, the intensity of light was much lower than for other groups. Even in their “light” condition, their CO2 levels rose slightly instead of dropping as other groups’ did. During the class discussion after the lab, most students thought that plants photosynthesize in the light but release CO2 in the dark. The data from Fernando’s group led them to wonder whether photosynthesis and cellular respiration might be happening simultaneously and at different rates. This variation in the data led to a more sophisticated, and more accurate, understanding of the biological processes.
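One way to make the idea of two simultaneous processes concrete is a toy model in which respiration releases CO2 at a constant rate while photosynthesis removes CO2 at a rate that grows with light intensity. The Python sketch below is illustrative only: the function name, the linear light response, and every number are assumptions, not measurements from the classroom.

```python
def net_co2_rate(light, respiration_rate=2.0, uptake_per_light=0.05):
    """Net change in container CO2 (arbitrary units per minute).

    Assumed toy model: respiration adds CO2 at a constant rate, and
    photosynthesis removes CO2 in proportion to light intensity.
    """
    photosynthesis_rate = uptake_per_light * light
    return respiration_rate - photosynthesis_rate


# Hypothetical light levels (arbitrary units) for three setups.
setups = [("dark", 0), ("lamp held to the side", 20), ("lamp directly above", 100)]

for label, light in setups:
    rate = net_co2_rate(light)
    if rate > 0:
        trend = "rises"
    elif rate < 0:
        trend = "falls"
    else:
        trend = "holds steady"
    print(f"{label}: CO2 {trend}")
```

Under these assumed numbers, CO2 still rises in dim light, only more slowly than in the dark, which matches the pattern Fernando’s group saw.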

CO2 data.

“The Three Scientists”: Support multiple methods

In a ninth-grade biology class with Ms. T, we described the ways that three different scientists might approach the same experimental question. We called the three scientists “The Coder,” “The Planner,” and “The Tinkerer.” The Coder would use our project’s software to program the lamp to turn on and off while collecting CO2 data. The Planner would create three separate data sets, varying the light levels for each by manually adjusting the lamp. Last, the Tinkerer would collect one long data set and vary the light levels by removing strips of tin foil placed between the light and the spinach.

On lab handouts and in small groups, students described their methods and explained the benefits and drawbacks of each approach. Responses reflected both students’ preferences for ways of doing science and their understanding of the limitations of the tools and materials. Most groups chose the Planner’s method, citing reasons like, “I like planning things” and “We can get more data and because we can check it against our predictions.” Groups that chose the Tinkerer’s method liked that they could “do something physical,” and thought that this method would yield more accurate data. Last, many students who chose the Coder’s method enjoyed programming and thought that minimizing the potential for human error would lead to better data with less effort.

When students across a class use different methods, the data are sure to vary from group to group. This variation can be productive (as in the example of Fernando’s data), and it also highlights that there are many ways of approaching scientific problems: each approach has different benefits and drawbacks, and the data you make always depend on the way you make them.

Re-frame the goals of the lab

The goal of a lab is usually to answer a question such as, “How does the rate of photosynthesis depend on the light level?” But a lot of scientific lab work involves figuring out how to make good measurements in the first place, using the particular tools, materials, or technologies available.

To engage students in this type of scientific thinking, we re-wrote a cellular respiration lab as a design task. Instead of asking students to answer a question, we asked them to use available materials (e.g., balloons, straws, tape, sealable bags) to figure out ways to measure the CO2 in their breath. The catch is that the concentration of CO2 in exhaled breath is much higher than the sensor can measure. Across the classroom, students tried many ways of creating useful data. Some students used a straw or inflated a balloon and blew air directly over the sensor. Others diluted the sample by filling a small sealable bag with their breath and inserting it into a larger one filled with air. They then popped the smaller bag and made their measurements. This way of framing the lab activity opened up space for multiple methods, and made the sensors and their limitations central to the scientific activity. The classroom became a place where students could tinker, troubleshoot, try out, and share different techniques.
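The bag-in-a-bag dilution can be worked through with a little arithmetic. The Python sketch below assumes the gases mix completely and that the final volume equals the large bag’s volume, since the small bag sits inside it. All of the numbers are assumptions for illustration: roughly 4% CO2 in exhaled breath, about 400 ppm in room air, a hypothetical sensor ceiling of 10,000 ppm, and made-up bag volumes.

```python
def diluted_ppm(breath_ppm, air_ppm, small_bag_l, large_bag_l):
    """CO2 concentration (ppm) after the small bag is popped inside the large one.

    Assumes complete mixing and a total volume equal to the large bag's
    volume, because the small bag sits inside it.
    """
    if not 0 < small_bag_l <= large_bag_l:
        raise ValueError("the small bag must fit inside the large bag")
    air_volume_l = large_bag_l - small_bag_l
    total_co2 = breath_ppm * small_bag_l + air_ppm * air_volume_l
    return total_co2 / large_bag_l


# Every value below is an assumption chosen for illustration.
BREATH_PPM = 40_000      # roughly 4% CO2 in exhaled breath
AIR_PPM = 400            # approximate room air
SENSOR_MAX_PPM = 10_000  # hypothetical sensor ceiling

mixed = diluted_ppm(BREATH_PPM, AIR_PPM, small_bag_l=0.5, large_bag_l=4.0)
print(f"Diluted sample: {mixed:.0f} ppm")
print("Within sensor range" if mixed <= SENSOR_MAX_PPM else "Still too concentrated")

# Working backward gives an estimate of the undiluted breath concentration.
estimated_breath = (mixed * 4.0 - AIR_PPM * 3.5) / 0.5
print(f"Estimated breath concentration: {estimated_breath:.0f} ppm")
```

With these assumed values, the diluted sample falls inside the hypothetical sensor range, and the same formula run in reverse recovers the original breath concentration.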

In a re-design of our photosynthesis lab, we asked students to “create a data set that tells you as much as possible about the relationship between plants, light, and CO2.” To collect data, many students held the lamp by hand, watching their data as it came in and adjusting the height of the lamp when they decided they had enough data at each height. Rather than “set it and forget it,” the students became active participants in an ongoing experiment. We saw this again in a follow-up lab activity in which students had the goal of stabilizing the carbon dioxide levels. Many students again manually adjusted the height of the lamp to find just the right level of light to balance the rates of photosynthesis and cellular respiration. Others added sheets of wax paper between the lamp and the container to gradually adjust the light levels. Others used the programming capabilities of the software to turn the lamp on when the CO2 rose above a threshold level.
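The last strategy, switching the lamp on whenever CO2 rises above a threshold, is a simple on/off feedback rule. The Python sketch below simulates that rule on its own. It does not use the project’s software, and the function names, the threshold, and the rise and drop rates are all hypothetical.

```python
def lamp_should_be_on(co2_ppm, threshold_ppm):
    """On/off rule: light the lamp only when CO2 is above the threshold."""
    return co2_ppm > threshold_ppm


def simulate(start_ppm, threshold_ppm, steps, rise_per_step=5.0, drop_per_step=8.0):
    """Simulate CO2 readings under the on/off rule.

    Assumed toy dynamics: CO2 rises by a fixed amount per step with the
    lamp off (respiration dominates) and drops by a fixed amount per step
    with the lamp on (photosynthesis dominates).
    """
    co2 = start_ppm
    history = []
    for _ in range(steps):
        lamp_on = lamp_should_be_on(co2, threshold_ppm)
        history.append((co2, lamp_on))
        co2 += -drop_per_step if lamp_on else rise_per_step
    return history


history = simulate(start_ppm=480.0, threshold_ppm=500.0, steps=60)
for co2, lamp_on in history[:8]:
    print(f"{co2:.0f} ppm, lamp {'on' if lamp_on else 'off'}")
```

In this simulation the readings climb to the threshold and then hover in a narrow band around it, rather than settling at a single value. That kind of wiggle is itself a feature of the data that students could trace back to their own control rule.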

When students truly create data themselves, either by devising the methods or by taking on an active role in the ongoing experiment, their data begin to show a history of their own decisions and interactions. For example, a student might notice a pressure spike from when they bumped into the container, or what looks like noise if they didn’t take care to isolate the sensors. Further, noticing these features of their data can lead students to wonder how these sensors really work. They begin to ask more questions about the experimental setups and the technology: How exactly does a CO2 sensor create a number? Why does the sensor read a different value if we apply pressure to the container? And what is perhaps the central question in scientific reasoning about data: What is this data really telling us?

Helping students make sense of messy data

When students make their own data, it becomes more difficult to take a single, straightforward approach to analyzing it. So instead of stepping students through an analytic procedure, we’ve focused on pre-analysis and discussion as places to support the sort of “puzzling over data” that scientists do.

Data (back)stories: The where, why and how of data

To open space for puzzling and wondering about data, the sort of work scientists might do before a rigorous analysis, we created an activity type called “Data (Back)Stories.” A Data Story is a backstory told with and about a data set, describing how it came to be: What might have been happening at each moment in time to produce this data? We have incorporated Data Stories into student worksheets (see Figure 2), as well as into routines for class discussions about data.

Data story.

A good way to introduce Data Stories is with graphs of something familiar or personal to the students, like a week or month in their classroom. In this case, Data Stories can be told by linking features of the graphs to prior knowledge they have about their school or class schedule, numbers of students, teacher free periods, fire drills, weather, local air conditions, etc.

For example, Ms. T had students look at graphs of CO2 and light level data that she had collected in the classroom over the prior week. She projected the graphs at the front of the room and showed students the Internet-of-Things sensor kit that she used to create them. On accompanying worksheets, students were asked to locate three events in the data: when students left the classroom, when the internet connection was disrupted, and when a janitor entered the classroom. Then students were asked to figure out which days of the week the data was being collected. Students combined their knowledge of their school’s bell schedule with features in the graph to make guesses.

Ms. T ran her hand along the horizontal time axis of the projected graph, asking the class to stop her when she reached each event. Students then explained their reasoning using features of the CO2 data, light data, or both. During this activity there was a clear sense of excitement and amusement about the classroom data. A pair of students joked about wishing they had blown on the sensor kit to mess up the data. One class jokingly accused the “shady” janitor of disrupting the sensor connection, and wondered aloud why he might come to the classroom so late at night, and stay for such a long time.

This activity was engaging, and it highlighted how Data Stories make room for thinking about how data are really produced: by human actors like the janitor or the teacher herself, by the students in the classroom exhaling carbon dioxide throughout the day, and by the internet connection across which the data are transmitted. The activity also opened scientific questions about respiration that began a storyline for the unit.

When students can see the sensors and students behind the data—in addition to “the science”—they can then engage in more sophisticated scientific reasoning about good experimental design and methods, and about the trustworthiness of the data as evidence.

Whole-class data discussions

Discussing data from labs as a whole class is a great way to get students thinking about sources of variation in data. For example, after the students in Ms. K’s classroom completed the cellular respiration “design” lab described above, Ms. K facilitated a whole-class discussion about their data. When one student described his data as “stair steps,” Ms. K asked the class why it might look like that. Some students attributed the steps to the data-sampling rate on the sensors, or the data-saving rate of the computers. Another student suggested that it was as if someone was breathing in and out. Ms. K brought these together, saying, “OK, it could be this, it could be that. Don’t tell us yet—first tell us what you did to make this data (materials)? What could have been going on?”

With this “maybe” move, Ms. K supported the possibility of multiple influences on data, inviting students to wonder about what is real and what is due to the setup or sensors. Here the data don’t just tell you about a scientific phenomenon, but are an artifact of the technologies and of students’ choices about how to make their measurements. In such discussions, students not only offer explanations of features of their data, but also suggestions on how to improve their measurements. For example, when noting that one student’s data had hit the maximum value that the sensor could measure, another asked him why he had not diluted his CO2 sample first.
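One of the students’ hypotheses for the “stair steps,” a mismatch between how often the sensor produces a new reading and how often the computer logs one, can be explored with a short simulation. The Python sketch below is only an illustration of that hypothesis: the function name, the rates, and the values are assumptions, and it makes no claim about how the classroom sensors actually behave.

```python
def logged_readings(true_signal, sensor_period):
    """Log one value per time step, but refresh it only when the sensor updates.

    Assumes the sensor produces a new reading every `sensor_period` steps
    and the computer repeats the last reading in between.
    """
    logged = []
    held = None
    for step, value in enumerate(true_signal):
        if step % sensor_period == 0:
            held = value  # the sensor reports a fresh reading
        logged.append(held)  # the computer logs whatever it last received
    return logged


# A smoothly rising CO2 level (hypothetical values, in ppm).
true_signal = [400 + 2 * step for step in range(12)]

print(logged_readings(true_signal, sensor_period=4))
# → [400, 400, 400, 400, 408, 408, 408, 408, 416, 416, 416, 416]
```

The underlying signal rises smoothly, yet the logged data form flat runs followed by jumps. Comparing this output with a breathing-based explanation gives students a way to ask which features of a graph come from the phenomenon and which come from the instrument.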

Discussing results of a lab can be a bit more difficult when students have very different data sets. To begin to make sense of the varied experimental results, Ms. K’s strategy was to layer students’ data from each condition on top of each other. Standing at the whiteboard, she first asked her students to describe their data to her. As they did, using their own language (“bumpy” or “up, fast”), Ms. K sketched out what they told her. They’d correct her: “No, spikier, more like stair steps,” or “No, it was steeper than that.” She’d rephrase: “OK, a greater slope?” and resketch until they were satisfied. She repeated this for multiple students’ data sets, one on top of the other. Students could see that, despite their data being bumpy and wiggly and overall quite different, they had all seen the same overall effect: CO2 levels rose in the dark and dropped in the light. And even though the students had all used different methods to create their data, they were able (with Ms. K’s facilitation) to arrive at a consensus about what had happened overall, which was sufficient to move the conceptual discussion forward.

Looking forward

For scientists, data is not the same thing as evidence. But for most students, the data they see in science classrooms—in their textbooks or from their own labs—is always used as evidence. This means that students lose out on the chance to practice critiquing the methods used to produce data, or thinking about how to collect better and more useful data.

While the resulting problems students have in reasoning about data have been well-documented, research on the effective design of sensor-based labs to promote a more sophisticated understanding of data is still in an early, exploratory stage. This early work may pave the way to later develop summative assessments for students’ understanding of data that can both help us see what individual students understand and gauge whether laboratory experiences are effective for all students. Yet at this stage, we use multiple methods, including informal and formative assessments, to elicit students’ reasoning so that we can characterize the opportunities that labs provide for students to think about their data in new ways. For instance, lab worksheets can prompt students to notice effects of the sensors on the data (e.g., the response time, or range) and ask students to describe possible causes. This can both probe their understanding of the sensors and prime them to consider the sensors as active in producing the data. Further, written data stories can show whether students can identify features of data due to the sensors themselves, or to human activity, in addition to those due to the phenomenon under study. By creating opportunities for students to see their data in new ways, we can help them begin to think in sophisticated ways about data and evidence.

Lisa Hardy is a Research Associate at the Concord Consortium in Emeryville, CA. Colin Dixon is a Research Scientist, Seth Van Doren is Research Project Manager, and Sherry Hsi is a Principal Scientist at BSCS Science Learning, in Colorado Springs, CO.


Berland, L.K., and B.J. Reiser. 2009. Making sense of argumentation and explanation. Science Education 93 (1): 26–55.

Chinn, C.A., and B.A. Malhotra. 2002. Epistemologically authentic inquiry in schools: A theoretical framework for evaluating inquiry tasks. Science Education 86 (2): 175–218.

Duncan, R.G., C.A. Chinn, and S. Barzilai. 2018. Grasp of evidence: Problematizing and expanding the Next Generation Science Standards’ conceptualization of evidence. Journal of Research in Science Teaching 55 (7): 907–937.

Duschl, R. 2000. Making the nature of science explicit. In Improving science education: The contribution of research, eds. R. Millar, J. Leach, and J. Osborne, 187–206. Philadelphia: Open University Press.

Duschl, R. 2008. Science education in three-part harmony: Balancing conceptual, epistemic, and social learning goals. Review of Research in Education 32 (1): 268–291.

Hancock, C., J.J. Kaput, and L.T. Goldsmith. 1992. Authentic inquiry with data: Critical barriers to classroom implementation. Educational Psychologist 27 (3): 337–364.

Hardy, L., C. Dixon, and S. Hsi. 2020. From data collectors to data producers: Shifting students’ relationship to data. Journal of the Learning Sciences 29 (1): 104–126.

Hsi, S., L. Hardy, and T. Farmer. 2017. Science thinking for tomorrow today. @Concord 21 (2): 10–11.

Ko, M., and C. Krist. 2018. Redistributing epistemic agency: How teachers open up space for meaningful participation in science. In Rethinking learning in the digital age: Making the learning sciences count, 13th International Conference of the Learning Sciences (ICLS), eds. J. Kay, and R. Luckin, 232–239. London: International Society of the Learning Sciences.

Lehrer, R., L. Schauble, and D. Lucas. 2008. Supporting development of the epistemology of inquiry. Cognitive Development 23 (4): 512–529.

Manz, E. 2016. Examining evidence construction as the transformation of the material world into community knowledge. Journal of Research in Science Teaching 53 (7): 1113–1140.

Manz, E., and E. Suárez. 2018. Supporting teachers to negotiate uncertainty for science, students, and teaching. Science Education 102 (4): 771–795.

Masnick, A., and D. Klahr. 2003. Error matters: An initial exploration of elementary school children’s understanding of experimental error. Journal of Cognition and Development 4 (1): 67–98. DOI: 10.1207/S15327647JCD4,1-03

McNeill, K.L., and L. Berland. 2017. What is (or should be) scientific evidence use in K–12 classrooms? Journal of Research in Science Teaching 54 (5): 672–689.

Samarapungavan, A. 2018. Construing scientific evidence: The role of disciplinary knowledge in reasoning with and about evidence in scientific practice. In Scientific reasoning and argumentation, eds. F. Fischer, C.A. Chinn, K. Engelmann, and J. Osborne, 56–76. New York: Routledge.

Sandoval, W.A., and K.A. Millwood. 2005. The quality of students’ use of evidence in written scientific explanations. Cognition and Instruction 23 (1): 23–55.

Tinker, R.F., and J.S. Krajcik. 2001. Portable technologies: Science learning in context. New York: Springer.
