Using computational thinking for data practices in high school science
By Erin Peters Burton, Peter Rich, Timothy Cleary, Stephen Burton, Anastasia Kitsantas, Garrett Egan, and Jordan Ellsworth
Students often need to obtain, organize, clean, and analyze data in order to draw conclusions about a particular phenomenon (e.g., why tidal heights change). In a science investigation in biology, chemistry, physics, or Earth science, students can collect the data themselves or work with secondary data sets provided to them. This article demonstrates how computational thinking and data practices can be merged to develop more effective science investigation lesson plans.
One of the science and engineering practices from the Next Generation Science Standards (NGSS), Using Mathematics and Computational Thinking, provides opportunities to help guide data practices in high school science investigations. Data practices are a suite of undertakings that students perform in order to make scientific claims from the evidence found in data. Weintrop and colleagues (2016) have identified five data practices that scientists engage in while doing investigations:
Creating data: Generating data from tools or observation
Collecting data: Gathering and recording data
Manipulating data: Sorting, filtering, cleaning, normalizing, and combining data sets
Visualizing data: Communicating results with a representation such as a graph or chart
Analyzing data: Extracting meaning from a data set for the purpose of drawing conclusions
As teachers, we may recognize these practices as crucial processes that students must undertake in order to gather valid and reliable data and make strong conclusions about scientific phenomena. A challenge for many students, and particularly those with less experience with scientific practices, is making effective decisions regarding how to use data practices in an integrated way during science investigations.
Knowing about these data practices is typically not enough for students to be able to use them. Students need guided practice and feedback, and they can also benefit from learning how computational thinking practices make the detailed steps within each data practice explicit.
Computational thinking (CT) is an approach to solving problems and designing systems that requires students to think recursively, reformulate problems to see them in a different light, model relevant aspects of problems, and use abstraction and decomposition in tackling complex problems. Wing (2006) defined CT as a “thought process involved in formulating problems and their solutions so that the solutions are represented in a form that can be effectively carried out by an information-processing agent.” CT can be a useful addition to instruction because it is a suite of complex processes that students can use to become skillful in data analysis in scientific investigations (Weintrop et al. 2016).
Because science and mathematics increasingly emphasize computation, introducing computational tools in the classroom can give a more authentic view of these disciplines (Augustine et al. 2005). Computational thinking practices can help students understand what to do during data analysis in a way that reflects the scientific discipline.
There are several CT practices that are naturally linked with data analysis, such as decomposition, pattern recognition, abstraction, algorithm building, and automation. Decomposition is defined as the breaking down of a complex problem into less complex subproblems. The specific approach to problem decomposition can vary, but the purpose is the same—to reduce the main problem into manageable steps or subproblems.
For example, in order to better understand what causes tides, we might identify some variables that may affect tide height, such as the position of the Moon or shape of the coast. To gather tide data more systematically, we might consider ways to measure the height of a tide, the format in which we would want that data, how often we want to measure tidal data, and where to place our instruments to get reliable information.
Another computational thinking practice is pattern recognition. Pattern recognition is defined as the identifying, clustering, and modularizing of steps, parts, or correlations that repeat or can be repeated. The primary purpose of identifying patterns is to cluster related parts of the problem by their recurring feature(s).
For example, to understand how a specific factor affects tide height, we might look for a correlation (a pattern of behavior or relationship) between the cycle of the Moon and tide height. While manipulating collected data, we might notice an additional relationship (i.e., the nature of an observed pattern) between the shape of the water body and tide height, such as an increased correlation when controlling for time.
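One way students might quantify such a pattern is with a correlation coefficient. The sketch below is only an illustration: the function name `pearson_r`, the variable names, and all of the numbers are invented for this example and are not taken from NOAA or from the investigation described in this article. It assumes two lists of paired measurements, here the number of days from the nearest new or full Moon and the daily tidal range.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance, up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical values, invented for illustration only (not NOAA data).
days_from_new_or_full_moon = [0, 1, 2, 3, 4, 5, 6, 7]
tidal_range_m = [2.0, 1.9, 1.8, 1.6, 1.4, 1.3, 1.1, 1.0]

r = pearson_r(days_from_new_or_full_moon, tidal_range_m)
print(f"r = {r:.2f}")
```

A value of r near -1 or 1 suggests a strong linear pattern, and a value near 0 suggests little linear relationship. A correlation alone does not show that one variable causes the other.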
Abstraction is conceptualized as a process of identifying and organizing relevant information and removing unnecessary information. The purpose of abstraction is to clarify the problem and generate generalizable solutions, which is an essential skill for constructing models in science and engineering (Gilbert 2004).
To better understand the relationship between two variables such as tide height and position of the Moon, we might remove outliers so that we can focus on the data points that are most coherent. When testing correlation with many different factors in our tides example, such as latitude or temperature, we might notice that some factors have minimal correlation with resulting tide height. We might then choose to omit those factors from consideration, given that they do not appear to have a meaningful impact on our observations or predictions.
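Removing outliers can be made explicit and repeatable with a written rule. The sketch below uses the common convention of dropping values more than 1.5 times the interquartile range beyond the quartiles. That rule is our assumption for illustration, not a method prescribed in this article, and the function name `drop_outliers` and the heights are hypothetical.

```python
from statistics import quantiles

def drop_outliers(values, k=1.5):
    """Keep values within k * IQR of the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical high-tide heights in meters; 9.7 mimics a sensor glitch.
heights_m = [1.8, 1.9, 2.0, 2.1, 1.7, 9.7, 1.9, 2.0]

cleaned = drop_outliers(heights_m)
print(cleaned)
```

Students should still be asked to justify each removal, because an unusual reading can be a real event (such as a storm surge) rather than an error.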
Algorithm building is defined as the creation of a series of precisely defined steps or rules that leads to successful solutions to a problem. An algorithm, in simple terms, is an unambiguously defined process to address an initial question. It may involve the steps to collect certain data, the steps to analyze that data, or any other defined process. The steps of the algorithm should, if built correctly, lead to a correct solution of the problem every time, or to a solution within a known margin of error. To determine the factors that affect tide height, we might create a process of data collection and analysis to determine the extent to which location of the Moon and shape of the water body impact overall tide height. In order to predict future tide heights, we might create a formula for calculating tide heights given specific patterns or measures identified.
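As a rough illustration of what such a formula might look like, the sketch below models tide height as a single cosine wave with a period of about 12.42 hours, the approximate period of the main lunar semidiurnal tide. This is a deliberately simplified assumption: real tide predictions combine many such terms fitted to each station. The function name `predicted_height_m` and the parameter values are hypothetical.

```python
from math import cos, pi

# Approximate period of the main lunar semidiurnal tide, in hours.
M2_PERIOD_HOURS = 12.42

def predicted_height_m(t_hours, mean_level_m, amplitude_m, phase_hours=0.0):
    """One-term cosine model; high tide occurs at t_hours == phase_hours."""
    angle = 2 * pi * (t_hours - phase_hours) / M2_PERIOD_HOURS
    return mean_level_m + amplitude_m * cos(angle)

# Hypothetical station: mean water level 1.0 m, amplitude 0.8 m.
for t in (0, 3, 6, 9, 12):
    print(t, round(predicted_height_m(t, 1.0, 0.8), 2))
```

Students could compare the output of a model like this against observed data to judge how much of the pattern a single term captures.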
Automation involves performing a procedure with little or no direct human interaction. This term typically refers to the use of machinery or computers to perform the procedure. At this level of computational thinking, the goal is to outsource work so that it reduces or removes the requirement for direct human action to achieve the desired outcome. For example, with a defined formula for determining future tide height, we may automate the process of calculating the precise predicted tide height for the next 24 hours by programming the formula into a computer. Rather than collecting data by hand, we might use a machine or computer to measure a specific variable at predefined time intervals. Automation can occur even at small scales, allowing scientists to focus on data analysis rather than the minutiae of repeated processes.
For example, to reduce tedium in data manipulation and analysis (e.g., converting units of measure from imperial to metric), we may use a computer to store and manipulate our data more quickly.
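A unit conversion of this kind takes only a few lines to automate. The sketch below converts a list of readings from feet to meters. The function name `feet_to_meters` and the readings are hypothetical; the conversion factor of 0.3048 meters per foot is exact by definition.

```python
FEET_TO_METERS = 0.3048  # exact by definition of the international foot

def feet_to_meters(readings_ft):
    """Convert a list of readings in feet to meters, rounded to 3 places."""
    return [round(ft * FEET_TO_METERS, 3) for ft in readings_ft]

# Hypothetical tide heights in feet, invented for illustration.
readings_ft = [4.2, 5.1, 3.8, 0.9]

readings_m = feet_to_meters(readings_ft)
print(readings_m)
```

Once written, the same function can be applied to a data set of any size, which is the point of automation: the rule is decided once and then applied consistently.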
Blending CT practices with data practices can provide students with a metacognitive guide for making decisions while engaged in scientific investigations. Based on an analysis of CT and data practices, we hypothesize that different combinations of CT practices are well-aligned or fit naturally with different data practices. Table 1 (see “On the web”) displays the matrix we developed to help teachers focus CT on the most relevant data practices. In our judgment, not all CT practices and data practices align, and those combinations are represented on the crosswalk by blank spaces.
From this matrix, we have developed an extensive set of questions that a teacher can use in the lesson planning process or that students can use while engaged in creating data, collecting data, manipulating data, or visualizing data. The questions are organized by data practice and then by computational thinking for each data practice. After we present the questions, we explain how this set of questions can be used.
The entire set of CT questions for data practices is not intended to be used all at once. Instead, teachers could use these questions in two ways. First, teachers could take a lesson that is already written and choose one or two data practices to focus on. For example, if the lesson calls for secondary data sets to be selected from the NOAA website, questions about creating data may not be as relevant, but questions about visualizing data may be more important. Alternatively, teachers could focus on one CT practice, such as decomposition, throughout the lesson and apply each decomposition question in the sequence of data practices from creating to visualization.
Students can benefit from having these questions embedded into lessons. For example, we asked students to conduct an investigation to find factors that influence the height of tides. In this investigation, we asked students to collect data from a NOAA website and to plot the height of tides for four locations. The data provided by NOAA can be downloaded at different time intervals. Many of the students downloaded the data using the same measure per day over the course of a year.
When students plotted the data, it appeared as thick colored lines, and students could not interpret the patterns (see Figure 1). We asked the students questions from the manipulating data section: “What information is most necessary to answer your research question?” and “Can you filter the data to only show the most important data?” Prompted by the questions, students decided to use a smaller amount of data, every hour for a month, which was more appropriate for interpreting the phenomenon.
With the manipulated data, students were able to see the patterns of the highs and lows in tides (see Figure 2). Further, students were able to discern that tide heights differed by location, which then led them to consider that more than one variable affects the height of tides.
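The filtering decision the students made can also be written as a short procedure. The sketch below is one possible version, not the steps the students actually followed. It assumes the data are a list of (timestamp, height) pairs recorded every 6 minutes; that interval, the function name `hourly_for_month`, the dates, and the placeholder heights are all our assumptions for illustration.

```python
from datetime import datetime, timedelta

def hourly_for_month(readings, year, month):
    """Keep only on-the-hour readings that fall in the given month.

    readings is a list of (timestamp, height_m) tuples.
    """
    return [
        (ts, height)
        for ts, height in readings
        if ts.year == year and ts.month == month
        and ts.minute == 0 and ts.second == 0
    ]

# Synthetic stand-in for a downloaded data set: one reading every
# 6 minutes for 60 days, with placeholder heights (not real tide data).
start = datetime(2020, 1, 1)
readings = [(start + timedelta(minutes=6 * i), 0.0) for i in range(60 * 24 * 10)]

january = hourly_for_month(readings, 2020, 1)
print(len(readings), "->", len(january))
```

Reducing thousands of points to a few hundred is often enough to turn a solid band of color on a plot into a readable curve.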
Answering these CT questions about data practices can help students be more metacognitive about why they make the choices they make. Using computational thinking to surface the assumptions made in data practices could help teachers be more explicit in teaching data analysis and could help students think about why they are carrying out each data practice, so that they make valid and rational decisions.
Augustine N.R., et al. 2005. Rising above the gathering storm: Energizing and employing America for a brighter economic future. National Academies Press, Washington, DC.
Gilbert J.K. 2004. Models and modelling: Routes to more authentic science education. International Journal of Science and Mathematics Education 2 (2): 115–130.
Foster I. 2006. 2020 computing: A two-way street to science’s future. Nature 440 (7083): 419.
NGSS Lead States. 2013. Next Generation Science Standards: For states, by states. Washington, DC: National Academies Press.
Weintrop D., et al. 2016. Defining computational thinking for mathematics and science classrooms. Journal of Science Education and Technology 25: 127–147. doi: 10.1007/s10956-015-9581-5.
Wing J.M. 2006. Computational thinking. Communications of the ACM 49 (3): 33–35. doi: 10.1145/1118178.111821.