RESEARCH AND TEACHING
It has been well established that having students work in cooperative groups in active engagement classes is more effective than traditional lectures (Hake, 1998) and is also preferred by the students themselves (Andre, 1999). A common practice employed in active learning classrooms is to assign students to academically heterogeneous groups (Johnson, 2014; Andre, 1999; Hake, 1998). The thought behind this is that the stronger students will help the weaker ones, the embodiment of peer learning.
After more than 10 years of teaching active learning classes, usually with highly heterogeneous groups, I became skeptical about how much the weaker students were benefitting from being in a group with significantly stronger students. Based on daily interactions with the groups, it seemed that the weaker students had a high tendency to become passive participants.
The natural inclination is to assume that such students are not engaged because they are unmotivated or uninterested. Another possible explanation, though, is that they are passive because they feel self-conscious about not understanding something their group members understand, and they do not want to hold the group back.
When examining the literature for comparisons of homogeneous and heterogeneous groups in active learning, there is surprisingly little published work, particularly for college-level science classes.
A 2011 study found that grouping students homogeneously rather than heterogeneously in the lab component of a traditional college biology class yielded greater gains in scientific reasoning (Jensen, 2011).
Similarly, a 1985 study found that in a traditional general physics course for elementary education majors, grouping students homogeneously during lab yielded modestly better gains in scientific achievement and general reasoning compared to heterogeneous grouping (Lawrenz, 1985).
A 2016 study at the University of Toronto looked at the impact of group heterogeneity during lab in a traditional lecture-style college physics course (Harlow, 2016), finding no statistically significant impact on academic performance.
All of these studies focused on how students were grouped during the lab portion of a traditional lecture course. In such classes, students spend significantly less time working in groups than they do in an active learning science classroom, which would likely reduce any potential impact of group structure.
Due to the limitations of prior research on the subject, I began a study in the fall of 2017 to compare the effects of academically homogeneous and heterogeneous grouping on the level of interaction among students, on student comfort level, and on academic performance.
The course was a second-semester, calculus-based physics course for engineering and physical science majors covering fluid mechanics, waves, electricity, and magnetism (with most of the semester dedicated to electricity and magnetism). The workbook used was developed in-house, containing a mix of conceptual and complex problem-solving activities interspersed with experiments.
The class met three days a week for two hours, beginning with a brief 5–20-minute lecture in a large lecture hall. The 83 students then split up between two adjacent 45-seat rooms to work in groups of three for the remainder of the class (plus one group of two, since 83 is not divisible by three). Having the class divided between two neighboring rooms presented some pedagogical challenges, but it was helpful for the study: it allowed one room to be used for heterogeneous grouping and the other for homogeneous grouping.
The rooms each had five round tables that seat nine students. Most tables had three groups of three students, with one table in each room only having two groups. Each room had one graduate Teaching Assistant (TA) with at least two years of experience (the TA in the heterogeneous room had three years). Each room also had two undergraduate Learning Assistants (LAs), one with two years of experience and one with one year.
I went back and forth between the two rooms, tracking my time to avoid spending more time in one room than the other. The assistants and I met after class each day to review how the class went, to prepare for the next class, and to discuss best practices for an active learning classroom.
Various approaches have been developed to improve group work, such as Aronson’s jigsaw method (Aronson, 1997), Slavin’s Student Teams-Achievement Divisions (Slavin, 1978), and “collaborative learning” (Johnson, 1994; Schulte, 1999). Some of these can require considerable group management from the teacher. Collaborative learning generally requires assigning interrelated and complementary roles and tasks to each student in a group, providing team-building activities, and group processing (Johnson, 1990), in which students reflect on their group work and discuss how to make it more effective (Springer, 1999).
These approaches certainly have merit. But my goal was to learn more about what contributes to students not interacting in their groups, particularly the weaker students. To better see how much of an impact group heterogeneity has on how willingly students interact, I chose not to use any group management strategy that involved assigning specific tasks to individual students, as that might obscure the impact of group structure.
Half of the students were assigned to academically heterogeneous groups in one room, and the other half to academically homogeneous groups in the other room. Students were identified as low, middle, or high performers using their grades in prerequisite courses (calculus and first-semester physics). Roughly, students with prerequisite GPAs above 3.4 were classified as high performers, those between 2.5 and 3.4 as middle performers, and those below 2.5 as low performers. Each heterogeneous group had one student from each level (high, middle, low), while each homogeneous group had all three students at the same level. Some students close to the GPA boundaries were moved between brackets as necessary to help divide the class into groups of three.
Whether students were put in homogeneous or heterogeneous groups was determined primarily alphabetically (the first half of the class was heterogeneous, the second homogeneous). Some adjustments were needed to ensure that the heterogeneous room contained only complete heterogeneous sets, and likewise for the homogeneous room. SAT math scores were unavailable for some students because they had taken the ACT instead; the data for the students who did take the SAT are shown in Table 1, along with the gender and ethnic breakdown of the two rooms. The gender and ethnic breakdown covers every student in each room, not just those for whom SAT scores are available. All of the students were traditional sophomores or juniors except for five nontraditional students who had served in the Navy for a few years before beginning college; two of them were in the heterogeneous room and three in the homogeneous room.
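The tiering and group-formation rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the `Student` record, the function names, and the treatment of GPAs of exactly 2.5 and 3.4 as “middle” are all hypothetical, and the manual moving of near-boundary students is not modeled (surplus students are simply left out of heterogeneous sets).

```python
from dataclasses import dataclass


@dataclass
class Student:
    name: str
    prereq_gpa: float  # GPA in prerequisite courses (calculus, first-semester physics)


def tier(gpa: float) -> str:
    """Classify a student with the approximate cutoffs from the text.

    Assumption: GPAs of exactly 3.4 and 2.5 count as "middle"; the text
    gives only rough brackets.
    """
    if gpa > 3.4:
        return "high"
    if gpa >= 2.5:
        return "middle"
    return "low"


def split_by_tier(students):
    """Bucket students into high/middle/low, strongest first within each tier."""
    buckets = {"high": [], "middle": [], "low": []}
    for st in sorted(students, key=lambda s: s.prereq_gpa, reverse=True):
        buckets[tier(st.prereq_gpa)].append(st)
    return buckets


def heterogeneous_groups(students):
    """One high, one middle, and one low performer per group.

    zip() stops at the shortest tier, so surplus students are left unplaced;
    in the study, near-boundary students were instead moved by hand.
    """
    b = split_by_tier(students)
    return [list(trio) for trio in zip(b["high"], b["middle"], b["low"])]


def homogeneous_groups(students):
    """Consecutive runs of up to three students drawn from the same tier."""
    groups = []
    for members in split_by_tier(students).values():
        groups.extend(members[i:i + 3] for i in range(0, len(members), 3))
    return groups


if __name__ == "__main__":
    # Hypothetical roster, NOT the study's data.
    roster = [Student("A", 3.8), Student("B", 3.0), Student("C", 2.1),
              Student("D", 3.6), Student("E", 2.7), Student("F", 1.9)]
    for g in heterogeneous_groups(roster):
        print([(s.name, tier(s.prereq_gpa)) for s in g])
```

Sorting by GPA before bucketing is only one possible way to order students within a tier; any order satisfies the one-per-level and same-level rules described above.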
Table 1. SAT math scores and gender and ethnicity breakdown for the two rooms.  


The two rooms showed no statistically significant difference in prior math achievement: a single-factor ANOVA yielded an F value of 0.486, substantially lower than the critical F value of 3.984.
Students assigned to homogeneous groups remained in homogeneous groups all semester, and students assigned to heterogeneous groups likewise remained in heterogeneous groups. Halfway through the course, however, the groups were reshuffled so that students had different partners.
In the room with the homogeneous groups, while each group was homogeneous, each table was not. A low-performing group could be positioned next to a medium-performing group, for example. Students were unaware of any methodology behind the way groups were formed.
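For readers who want to reproduce this kind of check, the single-factor ANOVA F statistic can be computed directly. The sketch below is a hypothetical helper using only Python's standard library; the scores in the example are made up, since the study's SAT data are not reproduced here. If SciPy is available, `scipy.stats.f_oneway` returns the same F statistic along with a p-value, and `scipy.stats.f.ppf(0.95, dfn, dfd)` gives the critical value at the 0.05 level.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Single-factor ANOVA: return (F, df_between, df_within).

    F is the between-group mean square divided by the within-group mean
    square. It is then compared with the critical F value for these degrees
    of freedom at the chosen significance level.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    group_means = [mean(g) for g in groups]

    # Spread of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Spread of individual scores around their own group's mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical SAT math scores, NOT the study's data.
    room_a = [620, 680, 710, 590, 650]
    room_b = [640, 600, 700, 660, 630]
    f_stat, dfb, dfw = one_way_anova_f(room_a, room_b)
    print(f"F = {f_stat:.3f} with df = ({dfb}, {dfw})")
```

An F value well below the critical value, as reported for the two rooms, means the difference between room means is small relative to the spread within each room.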
A few methods were used to encourage students to collaborate within their groups. First, the grading mechanism held students accountable for other members of their group. Each student would submit their own work for the daily activities, but the TA would only grade one submission for each group. All of the students in the group would get that grade. Students were told that this grading system was being used to encourage them to work together and help each other out.
Students were also advised to view “group work” itself as a skill they need to build for their future careers. A few times throughout the semester, we had discussions about how students should be working together as a group. Students were asked to discuss how strongly they agreed with the following statements:
In addition to pre- and posttesting, students completed an online free-response survey during the semester to provide feedback on how their group was functioning.
The original intent was to continue this study for three years to gather a substantial amount of data from pre- and posttests. Recording of student interactions, to provide a quantitative measure of collaboration, was to begin in the second year of the study.
However, as will be explained, the study was terminated halfway through the second semester as it had become sufficiently apparent that placing students in academically heterogeneous groups was negatively impacting both their learning and their overall experience to the point that it seemed unethical to continue. At that point, all students were shifted to academically homogeneous groups.
This paper will present the data from the first semester and illustrate the reasons for terminating the data collection earlier than planned.
To assess the impact of grouping on student learning, students were asked to take the Conceptual Survey of Electricity and Magnetism (CSEM; Maloney, 2001) online through LASSO as both a pretest and a posttest. Because some students did not take both tests, some submissions were incomplete, and some students did not consent to their data being used, complete data are available for only about half of the class. These data are shown in Table 2.
Table 2. Scores on the CSEM for students in the heterogeneous and homogeneous groups. The errors given are the standard deviations.  


A Cohen effect size of 0.324 was calculated by taking the difference between the two normalized gains and dividing it by the average of the two standard deviations. By the convention established by Cohen (1998), an effect size greater than 0.2 but less than 0.5 is generally viewed as a small statistical difference (a large effect is one with an effect size greater than 0.8). A single-factor ANOVA yielded a p-value of 0.269, so with this small number of data points the null hypothesis of no difference between the two grouping styles cannot be rejected.
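The effect-size calculation described above can be sketched as follows. This assumes Hake's normalized gain, (post - pre)/(100 - pre) on percentage scores, which the text does not define explicitly; the function names are hypothetical, and the denominator is the simple average of the two standard deviations, as described, rather than the pooled standard deviation many references use.

```python
from statistics import mean, stdev


def normalized_gain(pre: float, post: float, max_score: float = 100.0) -> float:
    """Fraction of the possible improvement actually achieved (Hake-style gain).

    Assumes percentage scores with pre < max_score.
    """
    return (post - pre) / (max_score - pre)


def effect_size_avg_sd(gains_a, gains_b) -> float:
    """Difference in mean gain divided by the simple average of the two SDs.

    Mirrors the procedure described in the text; a pooled SD is the more
    common denominator for Cohen's d.
    """
    avg_sd = (stdev(gains_a) + stdev(gains_b)) / 2
    return (mean(gains_a) - mean(gains_b)) / avg_sd


def cohen_label(d: float) -> str:
    """Cohen's conventional thresholds: 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


if __name__ == "__main__":
    # Hypothetical per-student gains, NOT the study's data.
    homogeneous = [normalized_gain(30, 60), normalized_gain(25, 55), normalized_gain(35, 50)]
    heterogeneous = [normalized_gain(30, 50), normalized_gain(28, 52), normalized_gain(33, 45)]
    d = effect_size_avg_sd(homogeneous, heterogeneous)
    print(f"d = {d:.3f} ({cohen_label(d)})")
```

With two groups of similar size and spread, the average-SD and pooled-SD versions give close values; they diverge more as the two standard deviations differ.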
The motivation for aborting the study came from digging deeper into the data, as well as feedback from students on surveys and direct observation of students.
When the CSEM results are broken down by each student’s category, a more important difference begins to emerge. These data are shown in Table 3. Note that most of the students had little prior understanding of the content (electricity and magnetism), which shows up in the similar pretest scores (there is not much separation between high- and low-performing students).
Table 3. CSEM results broken down by which category the responding students were in: high, middle, or low performers.  


The calculated Cohen effect sizes for the high-, medium-, and low-performing groups are 0.138, 0.433, and 0.885, respectively, with p-values of 0.763, 0.412, and 0.264. When the low- and medium-performing students are pooled, the p-value drops to 0.17. Although the small number of participants keeps these results from being statistically significant, the reasons behind them began to become clear from the students’ survey feedback.
It is worth noting that under both grouping styles the “low” students were less likely to complete the noncompulsory CSEM and to consent to their data being used. Additionally, the higher proportion of high-performing students among the respondents in the heterogeneous room likely reduced the effect size that appears when comparing the average performance of the two grouping styles in Table 2.
In the heterogeneous room, the students labeled as high performers had substantially greater gains than the other students in that room. Our observations and their feedback indicated that in most groups the high-performing student was likely doing most of the thinking for the group. The other students had a strong tendency to defer to that student to let him or her figure things out, and then explain it to them (or just copy down their work).
Most interactions between heterogeneous groups and me or my assistants were dominated by the high-performing student in each group. The assistants were trained to draw in the other students specifically by directing questions at them, but the low- and middle-performing students would still often just look to the high-performing student to answer for them.
In the homogeneously grouped room, no group had a clear leader in terms of academic background. I sometimes had LAs switch the room they were helping in, and each time the LA remarked on how different the two rooms were. The homogeneously grouped room was significantly noisier because most of the students were actively discussing the activities. Interactions between a group and me or my assistants were not dominated by any single group member.
This striking difference also showed up in the free-response online feedback from the students. The comments from each student who responded were classified as positive, negative, or neutral (which includes no response) with regard to how they felt about their own group. The results of the midsemester group evaluation for the Physics II course are shown in Table 4.
Table 4. Categorization of open-ended comments from students on their satisfaction with their group work, roughly halfway through the semester.


There were 14 groups in each room. In the heterogeneous room, only three of the groups had a positive comment from a student without also having a negative comment from another student in the same group. In the homogeneous room, that was true of 10 of the groups (the other four groups had only neutral comments regarding their own group).
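The counting rule used here (a group counts only if some member commented positively and no member commented negatively) can be made explicit with a short function. The function name, group IDs, and labels in the example are hypothetical, not the study's data.

```python
def count_clearly_positive_groups(labels_by_group):
    """Count groups with at least one positive comment and no negative comment.

    labels_by_group maps a group ID to the category assigned to each member's
    comment: "positive", "negative", or "neutral" (neutral includes no response).
    """
    return sum(
        1
        for labels in labels_by_group.values()
        if "positive" in labels and "negative" not in labels
    )


if __name__ == "__main__":
    # Hypothetical labels for three groups, NOT the study's data.
    room = {
        "G1": ["positive", "neutral", "positive"],
        "G2": ["positive", "negative", "neutral"],
        "G3": ["neutral", "neutral", "neutral"],
    }
    print(count_clearly_positive_groups(room))  # 1
```

A group like G3, with only neutral comments, is excluded from the count but is not counted as negative either, matching the parenthetical note about the remaining homogeneous groups.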
While the data provide a simple quantitative comparison of the rooms, they do not explain the source of the discrepancy. Reading the actual student comments helps clarify why the homogeneous groups were working better.
For example, from one heterogeneous group, a low-performing student wrote, “I feel as if I am not smart enough for my group. Both of my groupmates are highly intelligent and I feel as if I slow them down. Therefore, I am afraid to ask them questions and feel less adept.”
From the same group, the high-performing student wrote, “I work in a group of three and it seems as though its always just 2 of us working on the problems while the third partner does it on his own. We encourage him to listen in on our discussions of the problem and participate in it, yet it seems as though he still falls behind and does it on his own.”
A middle-performing student in another heterogeneous group observed, “I’m not sure groups of three work out very well. In each group I see two people take the lead and the third tries to catch up the whole time and seems to get it less than the others.”
A high-performing student in a heterogeneous group wrote, “I would say that I definitely like the group work style of learning, the only thing I’d say is sometimes when someone in the group doesn’t understand something they might not speak up and they just put the answer down which doesn’t help them learn it.”
The middle-performing student in the same group commented, “With my group, one of us usually completely understands the assignment but both of us, we don’t fully understand everything. I do end up understanding everything after a few questions have past but I feel like I’m a bother to the group mate who does understand everything & I don’t help out as much as I would like to.”
The low-performing student in that group, who was very quiet during activity time, did not provide any comments.
By contrast, negative comments like these did not appear among the homogeneously grouped students. The only negative comment in that room appears to have been a consequence of the group not being as homogeneous as intended; a student in the group wrote, “One of my group mates tends to speed ahead and is almost always a few questions ahead. So I always feel like I’m behind and constantly playing catch up.” This was a high-performing group, and all three students in it did very well. But two of them preferred to take their time and make sure they really understood things before proceeding, while the third wanted to move ahead.
Feedback like the following, from a student in a different high-performing group, was typical of the homogeneous room: “So far I feel that my group has worked pretty well. Typically when we are stuck on a problem, one of us is able to grasp the concept and explains and helps the others through it. We also have been working at a reasonable pace and handing in activities on time which is good.”
A student in a low-performing group wrote, “My group is working well. We are pretty slow which works for me and helps me understand the material more clearly.” Because we knew which groups were more likely to have trouble, my assistants and I were able to check in on them more frequently.
Ten of the students in the homogeneous room, compared to only four in the heterogeneous room, specifically mentioned in their feedback that they are comfortable speaking up to ask for help from their group members when they do not understand something. For example, from a high-performing homogeneous group: “They often assume that I’m fine and know what I’m doing, but I don’t hesitate to say that I’m stuck.”
A low-performing student who added the class on the first day of the semester was initially misplaced in a homogeneous medium-performing group because I did not yet have his grades in prerequisite classes. He was shifted to a low-performing group within a couple of weeks, and his response is very telling: “I am satisfied with my current allocation since my previous group was always far ahead of me, this one, I am satisfied that it’s making me work more. As opposed to copying.”
There seems to be a degree of comfort that comes from not being the only person in a group who does not understand something, which makes it easier for students to speak up and seek assistance. The students in the homogeneous low-performing groups were more active participants within their groups than corresponding students in heterogeneous groups, and were also more comfortable asking for guidance from me and my assistants, as well as students in neighboring groups. For example, a student in a homogeneous low-performing group wrote, “My table is great. We all get along really well while working through the problems. I am never afraid to seek help from my partners or the instructors.”
Altogether, the feedback points to academically heterogeneous groups often putting the weaker students in a position where they are more likely to become passive observers rather than active participants. Based on student comments, the weaker students in heterogeneous groups often are concerned about holding their group back and in general are self-conscious about not understanding something as well as their group members.
The data from pretesting and posttesting suggest that the primary beneficiaries of homogeneous grouping would be the students on the lower end. However, due to only collecting data for one semester, the number of data points is too small to show a statistically significant advantage to one grouping style over the other.
Despite that, the direct observations of student interactions and the feedback from students on surveys provided enough compelling evidence that I no longer felt ethically comfortable placing half of the students in academically heterogeneous groups just to gather more data.
A concern I have heard from other teachers with regard to homogeneous grouping is that the stronger groups will speed ahead of the other groups. That is a valid concern, and something that needs to be addressed.
In my classes I have dealt with this by providing additional “challenge” activities for students to work on when they get ahead. These are either puzzle-style problems or activities on topics beyond the scope of the course that are likely to be particularly interesting to the students (e.g., an activity introducing them to special relativity). All students are also free to work on these activities outside of class, but having such a suite of activities available for groups that get ahead has helped reduce this concern about homogeneous grouping.
There are some limitations on how broadly the results of this study can be interpreted. It is important to note that both rooms had multiple well-trained assistants in addition to me, so help was available whenever students needed it. If significantly less help had been available, the results might have been very different. In fact, that is what prior studies found when investigating the effect of group heterogeneity on learning among middle school students engaged in computer-based instruction with no additional teacher support (Hooper & Hannafin, 1988; Hooper, 1992). Those studies found that when no additional support is given, weaker students do better in heterogeneous groups than in homogeneous groups.
How strong this effect is may also be heavily dependent on the type of class and student makeup. In a physics course intended for engineering and physical science majors, such as the one in this study, most of the students have strong academic backgrounds and are likely not accustomed to being seen as a “weaker student,” especially in a mathematically oriented class. This may make them particularly susceptible to becoming self-conscious when they are the only member of a group who does not understand something.
These results are, however, consistent with those of Jensen and Lawson (2011), who looked at the effect of ability-level grouping in a biology class. They also match the results of a small study of the effect of group heterogeneity on general skill training not specific to any domain (Beane, 1971).
It is possible that group management methods such as assigning roles may help mitigate the reluctance of weaker students in heterogeneous groups to ask questions. Further research in that area would be enlightening.
Michael Briggs (msbriggs@unh.edu) is an instructional manager in the Department of Physics at the University of New Hampshire in Durham, New Hampshire.