Research and Teaching
Journal of College Science Teaching—November/December 2020 (Volume 50, Issue 2)
By Lauren Shea, Chantale Bégin, Christopher Osovitz, and Luanna Prevost
Lectures have traditionally been the focus of higher education courses, particularly in STEM (Freeman et al., 2014; Haak et al., 2011). In recent years, however, there has been a push for active-learning approaches that shift away from traditional lecture pedagogy to improve student success and retention in STEM programs (Freeman et al., 2014; Hake, 1998; Heyborne & Perrett, 2016; Mayer et al., 2009). Lectures traditionally emphasize content and aim to maximize the amount of material covered, rather than focusing on the learning process and basic inquiry, which is the objective of scientific research (Armbruster et al., 2009). In contrast, active-learning pedagogies include any method of instruction that engages students in the classroom (Bonwell & Eison, 1991; Prince, 2004); this may involve the use of classroom response systems (i.e., “clickers”). Active-learning methods have been shown to improve student success and problem-solving skills compared to classrooms with no student interaction (Freeman et al., 2011; Haag, 2016; Hake, 1998; Heyborne & Perrett, 2016; Marbach-Ad et al., 2016; Mayer et al., 2009; Smith et al., 2011; Wittrock, 1989).
Clickers have been widely promoted as an active-learning pedagogy. This technology supports a learner-centered approach, allowing for an immediate question-and-answer format and for in-class discussion among students and the instructor (Blasco-Arcas et al., 2013; Marin, 2013; Mayer et al., 2009). When clicker questions are incorporated into lectures, students must come to class prepared and actively engage in the lesson to answer them. Additionally, clickers allow instructors to gauge student comprehension throughout the lesson and spend more time on material with which students struggle (Caldwell, 2007; Gok, 2011; Yourstone et al., 2008).
However, there is contradictory evidence on how students respond to the use of clickers in the classroom. While several studies have found that students generally view in-class clicker use positively (Caldwell, 2007; Eschenbach et al., 2013; Gok, 2011; Liu et al., 2016; Roush & Song, 2013; Stevens et al., 2017), others have found that students hold negative views (Eddy et al., 2015; Machemer & Crawford, 2007; Seidel & Tanner, 2013; Welsh, 2012). In fact, students experiencing the same clicker intervention may have markedly mixed views. In a study of over 400 science majors, Welsh (2012) found that students held strongly positive and strongly negative perceptions of clicker use, ranging from praise for its effectiveness in engaging the class and revealing misunderstandings to complaints that it encouraged off-topic student discussion and the copying of other students' responses.
Clicker-integrated instruction has been gaining popularity, but its effectiveness and best-use practices are still being explored (Chien et al., 2016; Hunsu et al., 2016). Clickers are often used to engage students in large classrooms (e.g., more than 200 students), since many other forms of active learning are difficult to manage with such large groups. Yet it is not clear which type of clicker use (question type, question order, number of questions per class) optimizes student success.
Clicker question type and frequency, in particular, may give instructors the ability to shift student focus from recall to reasoning (DeBourgh, 2008). Additionally, the format used for each question (e.g., multiple choice or true/false) and the cognitive level of thinking required for each question could influence student learning (Hodges et al., 2017; Hubbard & Hubbard, 2016). Bloom’s Taxonomy is a common way for educators to rank questions and assess whether a question requires low-order cognition (recall and understanding) or high-order cognition (application, analysis, evaluation, and synthesis) (Anderson & Krathwohl, 2001; Prevost & Lemons, 2016).
In this study, we contrasted two strategies for clicker use in a large-enrollment introductory biology course. We compared learning gains and student success between a high-clicker frequency section and a low-clicker frequency section taught by the same instructor. We also examined whether clicker frequency and Bloom’s ranking affected student performance on exams weighted toward higher- or lower-order questions, and tested whether clicker frequency influenced student interest and perceived learning.
This study took place during the spring 2017 semester at a large university (more than 40,000 students) in southwestern Florida. Two sections of BSC 2011 Biological Diversity were taught by the same instructor, one with high-clicker frequency (14–25 questions per class) and the other with low-clicker frequency (4–6 questions per class). BSC 2011 is required for majors in biology, cellular and molecular microbiology, and most health sciences, but it has no university-level prerequisite and can be taken by students outside of these majors. The study was conducted in accordance with Institutional Review Board requirements for human subjects research. At the beginning of the semester, all students were given the option to participate in the study and were notified that, if they consented, their gradebook data would be used anonymously.
Both sections met two days a week (Tuesdays and Thursdays) in large lecture halls for 75-minute sessions. The high-clicker frequency section had 320 students (252 included in the study) and began at 9:30 a.m.; the low-clicker frequency section had 294 students (232 included in the study) and began at 11:00 a.m. The demographics of the two sections were similar (Tables 1 and 2). Both sections were highly structured inside and outside of the classroom and required the same online homework, textbook readings, and exams. Student grades were based on four unit exams and a final exam (75% of the grade), Mastering Biology (Pearson) online homework (one preclass [5%] and one postclass [10%] assignment for each chapter), clicker participation (10%), and limited extra credit (up to 1% of the grade).
Table 1. Student demographics for the high-clicker frequency section.

Table 2. Student demographics for the low-clicker frequency section.
Unit exams were based solely on material covered during that unit, and the final exam was comprehensive. The course units were organized as follows: Principles of Evolution (Unit 1), Diversity of Prokaryotes, Protists, and Plants (Unit 2), Diversity of Fungi and Animals (Unit 3), and Principles of Ecology (Unit 4). Each unit exam had 40 multiple-choice questions, and the final exam had 60 multiple-choice questions. Each exam question was ranked on the Bloom’s scale (1 to 6) by two researchers who were familiar with the course structure and had access to course content. The rankings refer to the following levels: 1: Recall, 2: Understand, 3: Application, 4: Analyze, 5: Evaluate, 6: Create. The ranking method was adapted from two other studies on biology education (Momsen et al., 2010; Prevost & Lemons, 2016). If the two researchers ranked a question differently, the question was discussed until a consensus score was reached.
The high-clicker frequency section had an average of 17.27 ± 3.64 (mean ± standard deviation) clicker questions per lecture, and the low-clicker frequency section had 4.36 ± 0.95. These questions were ranked on the Bloom’s scale using the same method applied to the exam questions. The clicker questions in the high-clicker frequency section covered both material already presented in class and new material that students had seen only in readings and online assignments, while the low-clicker frequency section had clicker questions only on material already covered in class. Students were required to bring their clicker remote to every class and to answer each question presented. Each question typically remained on the screen for 45 seconds, with more complicated questions given 15–30 extra seconds. During this time, students were encouraged to discuss the question with their neighbors. Once the allotted time ended, the results were presented in an onscreen histogram, and the instructor then announced the correct answer. The breakdown of correct and incorrect answers determined how much time the instructor spent discussing the question and reviewing the related material, with more time spent when fewer students chose the correct answer. Students were graded on their performance on clicker questions: incorrectly answered questions were worth one point, and correctly answered questions were worth two points. Students also received five additional points per class if they were present for at least 75% of the clicker questions that day.
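As a concrete illustration of this grading rubric, here is a minimal sketch in R (the language used for the analyses below). The function and variable names are ours and hypothetical, not the authors' actual gradebook code, and "present for" a question is operationalized here as having submitted a response.

```r
# Hypothetical sketch of the daily clicker scoring rubric described above.
# answers: logical vector with one entry per clicker question that day --
#          TRUE = correct, FALSE = incorrect, NA = no response submitted.
daily_clicker_points <- function(answers) {
  answered <- !is.na(answers)
  # 1 point per incorrect answer, 2 points per correct answer
  question_points <- sum(ifelse(answers[answered], 2, 1))
  # 5 attendance points if the student responded to >= 75% of the questions
  attendance_points <- if (mean(answered) >= 0.75) 5 else 0
  question_points + attendance_points
}

# Example: 5 questions -- 3 correct, 1 incorrect, 1 missed
daily_clicker_points(c(TRUE, TRUE, FALSE, NA, TRUE))  # 3*2 + 1 + 5 = 12
```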
On the first day of the semester, students in both sections took a 20-question preassessment (not counted toward the final grade). The preassessment included five questions from each of the four units that would be covered throughout the course, and the same 20 questions recurred on the final exam (postassessment). The assessment questions were ranked using Bloom’s Taxonomy to determine which level of thinking each question required. The Bloom’s ranking for each question was determined by the instructor and two other evaluators, and any discrepancy spurred a discussion until agreement was reached. The preassessment was not returned to students after grading, and students were not notified that the questions would be included on the final exam. A total learning gain, also known as Hake’s ⟨g⟩, an index of improvement, was calculated for each student with the following equation: learning gain = (postassessment score − preassessment score) / (100 − preassessment score), with scores expressed as percentages.
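For instance (with hypothetical numbers, not data from the study), a student who scored 40% on the preassessment and 70% on the postassessment would have a learning gain of 0.5:

```latex
\langle g \rangle = \frac{\text{post} - \text{pre}}{100 - \text{pre}}
                  = \frac{70 - 40}{100 - 40} = 0.5
```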
At the end of the semester, students were asked to complete an online evaluation that included the following questions: (1) How well did lectures in this class stimulate your interest in the course? and (2) How well do you think lectures in this class helped you understand the material? Answers were given on a scale from 1 to 5 (poor, fair, average, good, excellent).
The primary measure of student success in this study was the learning gain. Linear mixed-effects models were constructed in R with the lmer function from the lme4 package, using the lmerTest package to obtain p values. Student major, gender, and level were included as random effects, and clicker frequency was a fixed effect. Linear models were also constructed for learning gains with gender, major, level, and clicker frequency as fixed effects to test for interactions between student demographics and learning gain. Declared majors were categorized into six groups: biology, cell biology, medical science, other science, nonscience, and nondegree seeking. Student level ranged from first year to fifth year. Additional analyses were carried out with unit exam scores to determine whether the material type (and the cognitive level of the material presented) influenced exam scores. These linear mixed-effects models were constructed as described for learning gain, except that one more factor (material type: application or recall) was included. Akaike’s information criterion (AIC) was used to determine the best-fit model for the dataset (the lower the AIC value, the better the fit). A linear model was run on the survey results to test whether clicker frequency, gender, major, and/or student level influenced ratings.
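A minimal sketch of this analysis in R, assuming a data frame df with one row per student and columns learning_gain, clicker_freq, major, gender, and level (all names here are ours and hypothetical; the authors' actual code is not published):

```r
library(lme4)      # lmer() for linear mixed-effects models
library(lmerTest)  # redefines lmer() so summaries include p values

# Clicker frequency as a fixed effect; major, gender, and level as
# random intercepts. REML = FALSE so models can be compared by AIC.
m1 <- lmer(learning_gain ~ clicker_freq +
             (1 | major) + (1 | gender) + (1 | level),
           data = df, REML = FALSE)
summary(m1)

# Reduced model without the clicker effect; the lower AIC is the better fit
m0 <- update(m1, . ~ . - clicker_freq)
AIC(m0, m1)
```

For the exam-score analyses, material type (conceptual or memorization) would enter as an additional fixed effect in the same formula.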
Unit 1 and 4 exams (Evolution and Ecology) incorporated more conceptual material and higher-order thinking than unit 2 and 3 exams (Diversity) and had a greater number of questions at higher Bloom’s rankings (Figure 1). There was a slightly greater proportion of Bloom’s ranking 3 and 4 clicker questions in the low-clicker section (0.48; Figure 2) than in the high-clicker section (0.41). However, questions of all levels were used more frequently in the high-clicker section (Figure 3).
Figure 1. The proportion of different Bloom’s rankings on each exam, with Bloom’s ranking represented by different colors. Only levels 1–4 on Bloom’s Taxonomy were included since there were no questions ranked higher.
Figure 2. The proportion of clicker questions at different Bloom’s rankings in each course section, with Bloom’s ranking represented by different colors. Only levels 1–4 on Bloom’s Taxonomy were included since there were no questions ranked higher.
Figure 3. The frequency of clicker questions at different Bloom’s rankings, with Bloom’s ranking represented by different colors. Only levels 1–4 on Bloom’s Taxonomy were included since there were no questions ranked higher.
The average learning gain was 15.2% higher in the high-clicker frequency section (0.598 ± 0.018, mean ± standard error) than in the low-clicker frequency section (0.519 ± 0.017; linear mixed-effects model, p < 0.01; Figure 4). There were no significant interactions between demographic variables (student level, major, and gender) and learning gain.
Figure 4. Mean learning gain for both sections of Introductory Biology (high- and low-clicker frequency) (p < 0.01). Error bars represent the standard error of the mean.
Linear mixed-effects models showed that exam scores were affected by material type, with higher scores on exams whose questions had a higher average Bloom’s ranking (exams 1 and 4, referred to henceforth as conceptual) than on exams with a lower average Bloom’s ranking (exams 2 and 3, referred to henceforth as memorization) (p < 0.01; Figure 5). There was a significant interaction between clicker frequency and material type (p < 0.05): on conceptual exams, scores were higher in the high-clicker frequency section than in the low-clicker frequency section, while there was no difference in scores on memorization exams.
Figure 5. Mean exam scores according to clicker frequency and exam material type (conceptual versus memorization; p < 0.01 for material type and p < 0.05 for the interaction; error bars represent the standard error of the mean).
Students in the low-clicker frequency section gave higher average ratings (4.00 ± 0.063) for how well the course stimulated their interest than did students in the high-clicker frequency section (3.84 ± 0.067) (p < 0.01; Cohen’s d = 0.16; Figure 6). However, both sections gave comparable ratings for how well the lectures helped with overall course comprehension (high-clicker frequency = 4.046 ± 0.066; low-clicker frequency = 4.068 ± 0.063).
Figure 6. Mean postsemester course evaluations completed by students in both sections (error bars represent the standard error of the mean). Question 1 (Q1): How well did lectures in this class stimulate your interest in the course? (linear model, p < 0.01; Cohen’s d = 0.16). Question 2 (Q2): How well do you think lectures in this class helped you understand the material?
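For context, Cohen's d is the standardized mean difference (the difference in group means divided by the pooled standard deviation); under that standard definition, the reported means and d = 0.16 imply a pooled standard deviation of about 1 rating point, a small effect:

```latex
d = \frac{\bar{x}_{\text{low}} - \bar{x}_{\text{high}}}{s_{\text{pooled}}}
  \approx \frac{4.00 - 3.84}{1.0} = 0.16
```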
In this study, we found that the way clickers are used in the classroom can significantly affect student learning gains. Clickers were used in both sections of BSC 2011 we studied, but the high-clicker frequency section had 4–6 times more clicker questions than the low-clicker frequency section on any given day. We found that the high-clicker frequency section had a 15.2% higher overall learning gain than the low-clicker frequency section. There may be several non-mutually exclusive explanations for this finding. First, using a high number of clicker questions during class increases the overall amount of testing (here in a low-stakes, formative context) and retrieval, which has been shown in many contexts to increase student learning (Brame & Biel, 2015; Hernick, 2015; Karpicke & Roediger III, 2008). Second, the greater learning gains could be linked to the rapid feedback students received, many times during each class, between a clicker question and its answer. Indeed, rapid feedback has been linked to greater learning (Caldwell, 2007; Hernick, 2015; Liu et al., 2016; Yourstone et al., 2008). Third, clicker questions can incorporate all cognitive levels of Bloom’s Taxonomy, which may improve critical-thinking skills (Haak et al., 2011; Liu et al., 2016); the greater number of clicker questions in the high-clicker frequency section allowed for more questions per cognitive level, including about three times more questions requiring application or analysis (Figure 3), which likely improved overall learning. Fourth, the greater number of clicker questions allowed more opportunities for peer discussion, since students could discuss questions with each other during the allotted time for each question. This discussion time may aid understanding and classroom engagement (Gauci et al., 2009; Liu et al., 2016; Smith et al., 2009), although we did not measure it in our study. Finally, it could be that students in the high-clicker frequency section came to class better prepared (with textbook readings and online assignments completed), since many clicker questions in that section were asked before the material had been covered in class by the instructor.
Four noncomprehensive unit exams were administered to both sections and allowed us to examine the possible interaction between material type and clicker frequency. Exams 2 and 3 (on the diversity of prokaryotes, protists, plants, fungi, and animals) required mostly recall of the material, with most questions receiving a Bloom’s ranking of only one or two. On the other hand, exams 1 and 4 (on evolution and ecology) had a higher proportion of questions ranked two or three. Because active-learning classroom designs have been suggested to be especially effective in improving problem-solving skills and higher cognition (Gauci et al., 2009; Liu et al., 2016), we predicted that any positive impact of high clicker frequency should be especially important for the more conceptual unit exams (1 and 4) because they included a greater number of higher-ranked questions on the Bloom’s scale. When analyzing the effect of clicker frequency on unit exam grades, we found that both sections performed better on conceptual units (1 and 4) than on memorization units (2 and 3). We also found that the high-clicker section outperformed the low-clicker section more markedly on conceptual exams than on memorization exams. The positive effect of clicker questions was indeed more substantial for material that requires thinking at higher levels of Bloom’s Taxonomy.
Interestingly, while student learning was higher in the high-clicker frequency section, students expressed greater interest in class material in the low-clicker frequency section. This difference was small (3.98%) but significant. Student responses to clickers and other active-learning approaches have been shown to vary widely. Several reasons may explain why students in our study preferred the low-clicker frequency section. First, the low-clicker frequency section required less in-class work; as a result, the class may have been perceived as less challenging and stressful, and therefore more pleasant. It is difficult to say whether this suggested phenomenon is widespread. While several studies have found significant relationships between class difficulty and student evaluations, these studies have focused on final grades as the measure of difficulty, not the amount of work required of students during class periods (Brockx et al., 2011; Centra, 2003; Stroebe, 2016). However, at least two studies reported anxiety or lower class enjoyment associated with active learning in STEM courses (England et al., 2017; Deslauriers et al., 2019). Second, when there were fewer clicker questions, there was more time in class to share stories about interesting organisms, provide several examples of a given concept, and show videos related to the material. It is likely that students enjoyed these stories, examples, and videos, and that they helped boost the evaluations of how well lectures stimulated interest in the material. Third, students may prefer lecture-style classrooms with low clicker frequency because they are accustomed to and comfortable with this learning style (Trees & Jackson, 2007). Further research would be needed to discriminate among these scenarios. It is important to note that our study was based on just two sections of one course, taught by the same instructor. Preliminary results from a related study (Bégin & Osovitz, 2014) suggest that there may be an instructor effect in these preferences, such that for classes taught by other instructors, student evaluations may be more positive in high-clicker frequency sections than in low-clicker frequency sections. While our study shows that greater use of clickers can lower student interest in class, this result may vary with course, instructor experience, student population, and students’ prior experience with clickers and other forms of active learning. Several studies have found that once students accept active-learning techniques, are trained in how to engage, and/or instructors revise their approaches based on student feedback, student motivation and reviews may rise (Seidel & Tanner, 2013; Welsh, 2012).
There was no significant difference in how students perceived that classes helped improve their learning. This is rather interesting, because our results show that students in the high-clicker frequency section in fact learned more than those in the low-clicker frequency section (as measured by learning gains). This suggests that many students, at least in this context, are not very successful at identifying conditions that increase their learning (i.e., their metacognition is limited). Many recent studies have identified weaknesses in metacognition as one element limiting success in introductory biology courses (Hoskins et al., 2017; Sebesta & Speth, 2017; Tanner, 2012) and our results support the idea that improving metacognitive skills could be critical to improving student success.
The disconnect we found between student learning (greater in the high-clicker frequency section) and student assessment of the course (greater enjoyment of the low-clicker frequency section, and no difference in perceived learning between sections) is of concern for two reasons. First, because student evaluations are a critical part of faculty assessments and promotions, instructors may be reluctant to use pedagogies that reduce their scores on student evaluations, even if these pedagogies enhance student learning and success. It is increasingly clear that student evaluations of teaching are often not correlated with student learning (Uttl et al., 2017; Deslauriers et al., 2019). Therefore, by emphasizing student evaluations of teaching in faculty assessments and promotions, universities may in fact discourage the use of effective pedagogies and thereby limit student success. Second, if higher levels of in-class questioning and problem solving (e.g., high clicker frequency) lead to greater anxiety and lower enjoyment of class, students are more likely to leave a major where high levels of clicker interaction (or other comparable active-learning activities) are common (England et al., 2017). Improving student success and stimulating student interest are both important, especially in STEM courses, where failure rates can be high and retention rates low (Freeman et al., 2011). Additional research is needed to create and implement strategies that make active-learning courses more appealing, so that students both do well in and enjoy these courses.
Our study shows greater learning gains when a large-enrollment, introductory-level biology course is taught using a high frequency of clicker questions rather than a low frequency, particularly for material requiring higher-order cognition. However, students in the high-clicker section did not enjoy the class as much as those in the low-clicker section and did not recognize the learning value of the higher frequency of clicker questions. It would be worthwhile to repeat this experiment with other courses and other instructors to test how the results vary with teaching style and material type. Nevertheless, this study adds to a growing body of work showing that clicker-integrated courses benefit student success, and it suggests that varying clicker frequency can affect how much students benefit from, and even whether they prefer, the course.