Making Large-Scale Science Assessment Meaningful

Next Gen Navigator

By Sandy Student

Posted on 2022-05-26

Disclaimer: The views expressed in this blog post are those of the author(s) and do not necessarily reflect the official position of the National Science Teaching Association (NSTA).

A key point in Call to Action is that there is no place for one-size-fits-all science instruction in equitable science education. Recent work on equity in assessment has highlighted that American K–12 test writing can often reflect the dominant cultural perspective, for example by including test items that are provided without context (Mislevy 2018; Randall 2021). New science assessments should instead be constructed to reflect the context that makes science meaningful and engaging, and designed for the many ways that students engage with scientific phenomena. Because assessment often drives instruction, reconceptualizing large-scale science assessment is a prerequisite to any effort to use assessment and accountability to advance science learning for all students.

Frankly, we are still figuring out how to design large-scale science assessment systems that positively reinforce the classroom practices outlined in A Framework for K–12 Science Education. Accomplishing that will require us to focus on the following areas of research:

Assessment instruments that reflect and capture how science happens in diverse classrooms. Classroom-focused teaching, learning, and assessment projects help us understand how science assessments can better serve diverse learners. For example, the SAIL Project offers a model of high-quality, equity-focused science learning for emergent bilingual students. Students make sense of local scientific phenomena using their linguistic resources across multiple languages, with connections to traditional scientific language made late in tasks (as opposed to trying to “replace” students’ language initially). Recent large-scale assessment innovations include a greater role for open-ended responses, rubrics and scoring guidance that focus on the use of evidence to support conclusions over “right” answers, and the use of multiple performance tasks as part of a through-year model. These innovations represent promising first steps toward supporting and reinforcing this kind of high-quality local instruction and assessment at larger scale, because they expand the ability to capture diverse sensemaking repertoires in ways that honor linguistic and cultural range. States must prioritize building on these approaches, because only large-scale instruments that reinforce classroom equity and value rich sensemaking using a variety of cultural and linguistic assets can help states move toward accountability systems that support the learning of all students.
Assessment and accountability systems that incentivize better teaching and learning for all. Campbell’s law—that indicators used for decision-making are inevitably distorted by that very use—gives us reason to pause before diving wholesale into a new generation of assessment. While high-stakes accountability has generally had a negative impact on the classroom, there are select accountability policies that, in a meaningful minority of cases, have led to positive classroom practices, including student-centered instruction and curricular improvements (Au 2007). What does this mean for science assessment and accountability systems? States might consider the following:

Incorporating a variety of assessment task formats across multiple assessment occasions to prevent any one type from becoming overemphasized in instruction. Doing so could lessen the distortions that Campbell’s law predicts.
Using open-ended, phenomenon-based performance assessment tasks that reflect a deeper conception of science. These reinforce a classroom focus on the kinds of tasks that reflect the cognitive complexity and integration of practices with content described by the Framework (Tekkumru-Kisa et al. 2015).
Allowing for some local flexibility in accountability assessments. Tasks that reflect local phenomena and cultural context are likely to be genuinely engaging for more students (Fine and Furtak 2020), and flexibility in assessment tasks appears to be the only way to achieve this at large scale.

The vision for science in Call to Action portends a bright future for all students’ science learning. The research and subsequent innovations in large-scale science assessment required to ensure that assessment and accountability systems support this vision will be vital to achieving success.

References

Au, W. 2007. High-stakes testing and curricular control: A qualitative metasynthesis. Educational Researcher 36 (5): 258–267. https://doi.org/10.3102/0013189X07306523.

Fine, C. G. McC., and E. M. Furtak. 2020. A framework for science classroom assessment task design for emergent bilingual learners. Science Education 104 (3): 393–420. https://doi.org/10.1002/sce.21565.

Llosa, L. 2021. Expanding the evidence of learning to promote equity through formative classroom assessment. National Council on Measurement in Education Classroom Assessment Conference; conference held remotely.

Mislevy, R. J. 2018. Sociocognitive foundations of educational measurement. London, UK: Routledge, Taylor & Francis Group.

National Research Council. 2012. A framework for K–12 science education: Practices, crosscutting concepts, and core ideas. Washington, DC: National Academies Press. https://doi.org/10.17226/13165.

Noble, T., C. Suarez, A. Rosebery, M. C. O’Connor, B. Warren, and J. Hudicourt-Barnes. 2012. “I never thought of it as freezing”: How students answer questions on large-scale science tests and what they know about science. Journal of Research in Science Teaching 49 (6): 778–803. https://doi.org/10.1002/tea.21026.

Randall, J. 2021. “Color‐neutral” is not a thing: Redefining construct definition and representation through a justice‐oriented critical antiracist lens. Educational Measurement: Issues and Practice, emip.12429. https://doi.org/10.1111/emip.12429.

Solano-Flores, G., and S. Nelson-Barber. 2001. On the cultural validity of science assessments. Journal of Research in Science Teaching 38 (5): 553–573. https://doi.org/10.1002/tea.1018.

Tekkumru-Kisa, M., M. K. Stein, and C. Schunn. 2015. A framework for analyzing cognitive demand and content-practices integration: Task analysis guide in science. Journal of Research in Science Teaching 52 (5): 659–685. https://doi.org/10.1002/tea.21208.

Sandy Student

Sandy Student is a doctoral candidate in Research and Evaluation Methodology at the University of Colorado Boulder School of Education and a part-time research associate with Lyons Assessment Consulting. His interests and research center on connecting technical issues in large- and small-scale educational assessment with their practical implications for students and teachers.

Note: This article is featured in the May 2022 issue of Next Gen Navigator, an e-newsletter from NSTA delivering information, insights, resources, and professional learning opportunities for science educators by science educators focusing on the themes highlighted in Call to Action for Science Education and on the Next Generation Science Standards and three-dimensional instruction.
