PMC Articles

Higher trait working memory capacity may benefit standardized test performance under race-related stereotype threat

PMCID: PMC12695912

PMID: 40389703


Abstract

Stereotype threat (ST) occurs when individuals primed with negative stereotypes underperform relative to a control group. Activating ST increases anxiety and worries about being negatively perceived and introduces mental distraction that impairs performance. We consider racial/ethnic ST effects on standardized test performance (SDTP) on the verbal and quantitative reasoning sections of the Graduate Record Exam (GRE). Across two experiments, working memory capacity (WMC) is investigated as a mediator and/or moderator of ST for race/ethnicity (Experiment 1: final n = 447, 19% Black, 81% White, 59% female; Experiment 2: n = 166, 41% Black, 59% White, 73% female). We find a lack of strong evidence for the classic ST effect of a Race × Condition interaction. However, we show evidence that for Black students, higher trait WMC moderates racial/ethnic ST such that higher WMC is associated with higher scores on standardized tests under conditions of race-related ST. Our findings suggest the importance of higher WMC for racial minority students in remaining mentally resilient and maintaining performance during ST. Future work should address diversity and inclusion concerns regarding research on ST effects for racial/ethnic minorities, examine racial/ethnic ST further in light of replication issues and statistical power, and examine more closely the importance of WMC for performance under racial/ethnic ST. Future work should also consider the roles of protective factors, such as mindfulness and self-regulation practices, in the context of racial/ethnic ST, as WMC and SDTP have been shown to generally improve through implementing these practices.

Supplementary Information: The online version contains supplementary material available at 10.3758/s13421-025-01723-y.


Full Text

For people who have stigmatized social or group identities, being primed with a negative stereotype has been shown to cause underperformance on standardized tests relative to a control group, an effect known as stereotype threat (ST; Steele & Aronson, 1995). As one of the most researched topics in psychology, the effect of ST on performance has been shown for ethnic/racial identity (Steele & Aronson, 1995), sex/gender (Regner et al., 2010; Schmader & Johns, 2003), socioeconomic status (Flores et al., 2018; Tine & Gotlieb, 2013), and age (Levy, 1996). Outside of the U.S. context, ST research has focused a great deal on gender threat—examining its impact on women’s performance in math or STEM domains (see Flore & Wicherts, 2014; also Huguet & Regner, 2007; Regner et al., 2010). However, if a negative stereotype exists regarding cognitive performance of certain racial/ethnic groups in other countries, then in theory, ST effects could be observed in those cases, too. In the U.S. context compared with outside the U.S., the impact racial/ethnic ST effects have on performance is complicated by differences in attributions made based on having a racialized minority status. While these are interesting questions, they remain largely outside the scope of the current work, as we will focus on investigating racial/ethnic ST effects in U.S. student samples.
Although ST has been investigated widely, we focus on the effect of ST on cognitive tasks that are important for racialized minority students’ test performance and achievement in the U.S. Using data from the National Assessment of Educational Progress (NAEP), Stanford’s Center for Education Policy Analysis suggests that gaps in achievement between minority students and White students have narrowed since the 1970s (Reardon, 2015; also, National Center for Education Statistics, 2013). However, gaps remain in White and Black students’ standardized test performance (SDTP) for reading and math in elementary through high school (National Center for Education Statistics, 2017). In the U.S. educational system, standardized tests such as the Scholastic Assessment Test (SAT) and the Graduate Record Exam (GRE) are important for demonstrating preparation for admittance to higher education institutions. Many undergraduate and doctoral programs have now made these tests optional, but some still require them. These tests are administered to millions of students in and outside of the U.S. (ETS, 2018), and achievement gaps remain between White and Black students in the U.S. Previous work suggests that worries about being negatively stereotyped based on racial/ethnic group are enough to impact Black students’ performance on standardized tests and other academic assessments (see Steele, 1997; Steele & Aronson, 1995).
Although there are continuing debates about their utility, standardized tests still have the power to open and close doors for students, and understanding the influence(s) of the testing environment for students is essential. The pressure of performing well, coupled with other anxieties, can cause some students to “choke” and underperform. Ironically, those among the most capable of performing well, deemed the academic “vanguard” (Steele, 1997), tend to underperform in high-pressure situations (cf. Beilock & Carr, 2005; Beilock & DeCaro, 2007).
These findings beg the question of what factors are most important in understanding differences in performance on assessments of cognitive ability. Suggested factors include situational or contextual influences (Massey & Owens, 2014; Mullainathan & Shafir, 2013; Steele & Aronson, 1995), practice or experience (Ericsson et al., 1993; Hambrick et al., 2016), one’s intelligence (Jensen, 1969; Kovacs & Conway, 2016; Spearman, 1904; also see Holden & Tanenbaum, 2023), working memory capacity (WMC; Baddeley & Hitch, 1974; Conway et al., 2003; Regner et al., 2010; Schmader & Johns, 2003), as well as personality and attitudinal factors (Aronson et al., 2002; Blackwell et al., 2007; Dweck & Leggett, 1988; also Duckworth & Yeager, 2015; Durlak et al., 2011; Yeager & Walton, 2011). Based on Steele and Aronson (1995), having to contend with others viewing you based on a stereotype is enough to negatively shift your performance; however, the phenomenological process of ST is quite complicated and involves many integrated cognitive, affective, and physiological processes (Schmader et al., 2008). In terms of the cognitive mechanisms involved in ST, previous research suggests the importance of cognitive control or our ability to focus our attention and mental resources on completing a task goal (see Spencer et al., 2016).
When handling the cognitive demands of ST, people can be impacted in their ability to focus and control their mental resources. When a negative stereotype is made salient for a certain group, those identified with that group may have concerns about performing in a way that would “prove” the stereotype. Considering this, people are motivated to avoid “confirming” the stereotype (see Steele, 1997), which could involve putting forth extra effort. Work by Jamieson and Harkins (2007) showed that participants under ST had difficulty inhibiting automatic responses on the antisaccade task but were able to quickly correct their responses. These findings suggest that although the antisaccade task was challenging under ST, participants were not overloaded cognitively in that they were still able to update their responses from incorrect to correct.
Additional work has suggested that when we are faced with challenging cognitive tasks and experiencing ST, we are contending with the stress and worries regarding ST as well as regulating ourselves to focus our attention and mental resources on completing the task at hand (see Beilock et al., 2007; Jamieson & Harkins, 2007; Regner et al., 2010; Schmader et al., 2008; Schmader & Johns, 2003; Steele, 1997). This means that ST can introduce an additional form of mental load on top of the challenges associated with solving difficult problems alone. Overall, much of previous research examining the cognitive mechanisms involved in ST suggests that cognitive control abilities, like the capacity of working memory, are of great importance. Activating ST may cause people to have difficulty with inhibiting incorrect responses, and especially so when the task is very challenging. This means the impact of differences in cognitive control and capacity of working memory on SDTP in the context of ST is important to understand. Next, we will consider the consequences of ST in terms of SDTP, then we will discuss the role of cognitive control through differences in working memory.
A key component of ST theory is about the consequences of being the target of stereotyping, including effects on assessments of cognitive ability, like standardized tests. Although the emphasis of standardized tests for placement and acceptance into academic programs has been criticized and has decreased some over time (Ramirez, 2008; Serrano, 2015), generally, such tests are still utilized and often weighted heavily in admissions decisions (Lauryn, 2017; National Association for College Admission Counseling, 2016). For these reasons, additional research is needed to clarify influences during high-stakes testing for targets of stereotyping and ST. Because our focus is on the cognitive performance responsible for differences in standardized test performance, we discuss two perspectives from the literature that involve the ways ST impacts cognitive resources through WMC.
One perspective is that experiencing threats to group identity causes a reduction in the ability to focus on the task at hand. Working memory is an established construct in the literature that taps our ability to focus attention on a task, while simultaneously storing, retrieving, and updating other key information (Baddeley, 1996, 2000; Baddeley & Hitch, 1974). Working memory (WM) is thought to have a capacity component (called working memory capacity, or WMC) that varies across individuals and constrains the parameters by which people activate and utilize the cognitive resources at their disposal (Cowan, 1988; Daneman & Carpenter, 1980; Turner & Engle, 1989). When considering the role of cognitive resources during identity-threatening situations, Schmader and Johns (2003) found that ST causes a depletion in the form of lower WMC. Further, they characterized this finding as a decrease in the ability to regulate one’s behavior in a goal-oriented way (also see Schmader et al., 2008). Others have suggested a state of mental depletion spills over and disrupts later task performance (Inzlicht & Schmeichel, 2012). For example, in several studies, under ST for either race or gender, WMC was shown to decrease compared with a control condition, and WMC mediated ST for gender on SDTP in math (Schmader & Johns, 2003). Thus, ST for gender was at least partially explained by differences in WMC.
Another perspective is that threats to group identity are moderated by the cognitive resources available at baseline. This view does not posit that threat is negated altogether; instead, it suggests that when trait cognitive resources are higher, they help combat identity-threatening situations. When considering baseline WMC during gender ST, for those with higher trait WMC, there was no difference in women’s and men’s fluid reasoning performance, whether under threat or in a control condition (Regner et al., 2010). However, Regner et al. (2010) found that when baseline WMC was lower, female students performed worse under threat compared with females under no-threat and worse than male students under threat. These results underscore the importance of higher WMC when navigating identity-threatening situations, specifically in contexts where the implications of demonstrating one’s cognitive ability are of great consequence, as with standardized tests.
The extent to which ST effects observed in the lab generalize to actual testing situations is difficult to assess (see Steele, 1997). Although a large body of work replicates ST effects, there have been issues with researcher degrees of freedom (see Simmons et al., 2011; Wicherts et al., 2016, on topics of researcher degrees of freedom and replication issues, generally) and replication in the ST literature (Inzlicht, 2016; Schimmack, 2017; also see Flore & Wicherts, 2014; Ganley et al., 2013; Sackett et al., 2004; Shewach et al., 2019; Wicherts, 2005). Nevertheless, the fact that threat effects have been shown to negatively impact some of the most capable students is alarming. Such issues necessitate further research to clarify boundary conditions or moderators of ST as well as the mechanisms underlying the effect (for review, see Beilock et al., 2007; Spencer et al., 2016; Wheeler & Petty, 2001). To date, many investigations of ST for race have focused on group-level effects (e.g., Black vs. White students), neglecting important individual differences that may mediate and/or moderate the effect. Also, to our knowledge, most studies designed to investigate the role of WMC focused on ST effects for gender, not race (Regner et al., 2010; Schmader & Johns, 2003). Work on the role of WMC for ST for race/ethnicity has been limited to White and Latino students’ performance and has explored the role of WMC as a mediator, not a moderator (cf. Schmader & Johns, 2003). We aim to address the extent to which individual differences in WMC mediate and/or moderate ST effects for race focused on Black students’ performance.
The threat manipulation procedure was based on Schmader and Johns (2003), with slight alterations: We included a measure of WMC before and after the threat manipulation. To make the present study amenable to threat effects for both operation (OSPAN; Turner & Engle, 1989; Unsworth et al., 2005) and reading (RSPAN; Daneman & Carpenter, 1980) spans, we induced the race prime before having participants complete the working memory tasks. For OSPAN and RSPAN, participants were informed that the task was indicative of quantitative or verbal capacity, respectively, and was highly related to measures of intelligence. Moreover, they were informed that performance is attributable to group membership. Following this, they completed an ethnicity survey, which was one item to indicate one’s race (see Supplemental Methodology). Participants in the control condition did not receive any directly threatening instructions and were instead told they would only complete a working memory task.
Following Schmader and Johns (2003), we used an assessment of WMC to replicate the effect of threat on WMC and to test the hypothesis that state WMC mediates the effect of threat on standardized tests. To detect these effects, WMC must be measured post threat manipulation. However, we also wanted to test the hypothesis that trait WMC moderates the effect of threat on standardized testing. To test this, WMC must be measured before the threat manipulation. Thus, half the participants performed RSPAN (reading) before the threat manipulation and OSPAN (math) after the threat, and the other half performed the tasks in the reverse order. The quantitative capacity and intelligence threat group is analogous to the design of Schmader and Johns (2003), in which OSPAN was administered post threat manipulation (see Fig. 1B).
To assess trait and state WMC, participants completed the automated OSPAN and RSPAN tasks (see Unsworth et al., 2005). OSPAN requires completing a series of arithmetic problems while remembering a list of letters. Participants solve a math problem for accuracy and then remember a letter for later recall. At the end of a list of trials, participants recall the letters in serial order (the total score was calculated using the partial unit method; see Conway et al., 2005). The RSPAN replaces math problems with making veridical judgments for sentences while remembering lists of letters. Participants received three to seven letters per trial and three sets of each trial length, totaling 15 trials and yielding a maximum score of 75.
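The set structure described above determines the maximum span score. The following is a minimal sketch of that arithmetic, not the authors' scoring code: the function names are hypothetical, and the credit rule shown (one point per letter recalled in its correct serial position) is an assumption made only for illustration.

```python
# Hypothetical sketch of a span-task score; names and credit rule are assumptions.
SET_SIZES = [3, 4, 5, 6, 7]  # letters per trial, as described in the text
SETS_PER_SIZE = 3            # three trials at each length


def max_span_score(set_sizes=SET_SIZES, sets_per_size=SETS_PER_SIZE):
    """Total number of letters presented across all trials."""
    return sets_per_size * sum(set_sizes)


def trial_credit(presented, recalled):
    """Assumed rule: one point per letter recalled in its correct serial position."""
    return sum(1 for p, r in zip(presented, recalled) if p == r)


def span_score(trials):
    """Sum the credit over (presented, recalled) pairs."""
    return sum(trial_credit(p, r) for p, r in trials)


n_trials = len(SET_SIZES) * SETS_PER_SIZE
print(n_trials, max_span_score())  # 15 75

# Two made-up trials: one perfect, one with the last two letters swapped.
example = [("FKT", "FKT"), ("QLNR", "QLRN")]
print(span_score(example))  # 5
```

With five trial lengths (3 through 7) and three trials at each length, 15 trials present 75 letters in total, which matches the stated maximum score.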
Participants completed Math and Verbal sections of the Graduate Record Exam (GRE). The GRE mathematics subsection consisted of 25 multiple-choice or short-answer questions, each requiring mathematical reasoning and quantitative comparison skills. The GRE verbal subsection was the same length but contained verbal questions requiring the abilities to analyze, evaluate, and synthesize written material, in addition to recognizing relationships among words and concepts. Both sections were timed at 20 min and taken from free online practice materials provided by the Educational Testing Service (2018). The final score was the proportion of questions correct out of 25.
The Spielberger State-Trait Anxiety Inventory (STAI) short form (Marteau & Bekker, 1992) assessed state anxiety. The short form consisted of six questions, where participants indicated their present feelings using a 4-point scale ranging from 1 (not at all) to 4 (very much) in response to items such as “I feel calm.” Items indicating low levels of anxiety (e.g., “I am relaxed”) were reverse scored. A total score was obtained, with higher scores indicating greater anxiety.
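The reverse-scoring step can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, the labels `item_3` through `item_6` are placeholders, and only the two low-anxiety items named in the text are treated as reverse keyed (the text does not list the full keying).

```python
# Hypothetical sketch of short-form STAI scoring; item labels are placeholders.
REVERSE_KEYED = {"calm", "relaxed"}  # low-anxiety items named in the text


def stai_total(responses, reverse_keyed=REVERSE_KEYED, scale_min=1, scale_max=4):
    """Sum item ratings after flipping reverse-keyed items on the 1-4 scale."""
    total = 0
    for item, rating in responses.items():
        if not scale_min <= rating <= scale_max:
            raise ValueError(f"rating for {item!r} is outside the scale")
        if item in reverse_keyed:
            rating = scale_min + scale_max - rating  # 1<->4, 2<->3
        total += rating
    return total


# Made-up responses from one participant.
responses = {"calm": 4, "relaxed": 3, "item_3": 1, "item_4": 1, "item_5": 2, "item_6": 2}
print(stai_total(responses))  # 9
```

Under this scheme, six items on a 1 to 4 scale give totals between 6 and 24, with higher totals indicating greater state anxiety.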
Participants were tested in groups of up to six people. Participants completed the trait WMC span task, then received either threat or control instructions followed by a second, state WMC task. Participants who performed RSPAN before the threat manipulation completed OSPAN after the threat manipulation and vice versa. In the threat condition, participants received instructions similar to those of Schmader and Johns (2003) and completed the “ethnicity survey,” which served to prime race and induce ST. In the control condition, participants received similar instructions, which were modified to exclude the race prime.
Following the second WMC span task, all participants completed two sections of the GRE: one verbal and one quantitative section (ordered by task condition). After the GRE sections, participants completed STAI (Marteau & Bekker, 1992). Then, participants completed the postexperiment survey (this included a race prime question for those in the control condition). Lastly, participants were debriefed and thanked for their participation.
We aimed for adequate statistical power of at least 80%, based on effect sizes demonstrated in previous ST research. This was not straightforward, as researchers have computed effect-size estimates in different ways, some based on adjusted or unadjusted means (see Sackett et al., 2004; Shewach et al., 2019; Spencer et al., 2016; Wicherts, 2005). Additional work also suggests that the effect sizes in the literature may be inflated due to publication bias (Flore & Wicherts, 2014). We focused on recent reports on ethnic/racial ST for cognitive ability (Spencer et al., 2016). We then used the largest (d = 0.52) and smallest (d = 0.46) average effect sizes reported to estimate the sample size needed for a minimum of 80% statistical power. Using the powerInteract function in the powerMediation package in R (Qiu & Qiu, 2018), we estimated the sample size required for adequate statistical power based on a Race (White vs. Black) × Condition (threat vs. control) interaction effect, with Factor A = 2 levels and Factor B = 2 levels. For the larger effect size of Cohen’s d = 0.52, a total n = 160 (at least 40 cases per cell) with alpha = 0.05 for a two-tailed test gave statistical power of 0.824. For the smaller effect size of Cohen’s d = 0.46, a total n = 120 (at least 30 cases per cell) with alpha = 0.05 for a two-tailed test gave statistical power of 0.806. Based on the suggestion that the average effect sizes are inflated, we used the smaller average effect size reported in the literature to motivate the decision of recruiting at least 30 cases per cell in both Experiments 1 and 2. We strived to obtain statistical power based on the aforementioned calculation.
Our efforts were limited by both resource and time constraints, and thus the final sample sizes obtained for Experiments 1 and 2 are reported in each Results section below. To address power concerns, we will also present results of combining samples from Experiments 1 and 2 in addition to running Bayesian regressions.
Below, we present two sets of results, one for each task order condition. Because we manipulated ST for either verbal or quantitative capacity in the threat condition, we separate the threat effects based on these task orders (see Fig. 1B).
Descriptive statistics and correlations are reported in Table 1. The measures of WMC (i.e., OSPAN and RSPAN) were strongly positively correlated with each other and moderately positively correlated with measures of SDTP (i.e., math and verbal GREs). Higher anxiety scores were also negatively correlated with most of the performance measures.

An independent-samples t test indicated a significant difference in baseline RSPAN, t(197) = −2.03, p = 0.044, Cohen’s d = 0.34, revealing that Black students (M = 56.27, SD = 12.37) scored significantly lower on RSPAN at baseline than White students (M = 60.10, SD = 10.78).
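As a rough check on the reported effect size, Cohen's d can be approximated from the means and standard deviations alone. The sketch below is illustrative and the function name is hypothetical. Because the group sizes are not given in this passage, it pools the two standard deviations with equal weight, which is an assumption; a sample-size-weighted pooled SD would give a slightly different value.

```python
import math


def cohens_d_equal_weight(m1, sd1, m2, sd2):
    """Cohen's d using an equally weighted pooled SD (assumes similar group sizes)."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd


# Baseline RSPAN: White students (M = 60.10, SD = 10.78) vs. Black students (M = 56.27, SD = 12.37)
d = cohens_d_equal_weight(60.10, 10.78, 56.27, 12.37)
print(round(d, 2))  # 0.33
```

The equal-weight approximation gives roughly 0.33, close to the reported d = 0.34; the small gap is consistent with unequal group sizes and rounding of the summary statistics.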
To test the effect of ST on state WMC and SDTP, we conducted a series of 2 (race) × 2 (condition) ANOVAs. The result of interest is the interaction effect of the race and condition variables (which would indicate ST) on both students’ WMC and standardized test performance. We found none of the Race × Condition interaction effects to be significant in the quantitative capacity and intelligence condition (see Table 2). The main effects for WMC and SDTP are reported next. For OSPAN, the effects of condition, F(1, 195) = 0.66, p = 0.42, ηp² = 0.0034, b = −1.11, t(195) = −0.63, CI95% [−4.58, 2.36], p > 0.05, and race, F(1, 195) = 2.3, p = 0.13, ηp² = 0.012, b = −2.61, t(195) = −1.04, CI95% [−7.58, 2.36], p > 0.05, were nonsignificant.
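The partial eta squared values reported with each F test can be recovered from the F statistic and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the two OSPAN main effects above; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# OSPAN main effects reported above
eta_condition = partial_eta_squared(0.66, 1, 195)
eta_race = partial_eta_squared(2.3, 1, 195)
print(round(eta_condition, 4))  # 0.0034
print(round(eta_race, 3))       # 0.012
```

Both values agree with the reported effect sizes to the precision given.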

In Step 2, the effect of RSPAN, b = 0.00046, t(41) = 0.18, p = 0.86, and the effect of condition, b = −0.35, t(41) = −1.54, p = 0.13, were not significant. The interaction, b = 0.0075, t(41) = 1.90, p = 0.065, approached significance. The change in model variance between Steps 1 and 2 also approached significance, F(1, 41) = 3.59, p = 0.065. Simple slopes (SS) tests revealed that under threat, higher-WMC participants had higher predicted scores on the math GRE than those with lower WMC, b = 0.0080, p = 0.012, whereas in the control condition there was no significant change in GRE based on higher or lower WMC, b = 0.0005, p = 0.86 (see Fig. 2). The overall model was significant, F(3, 41) = 3.51, Multiple R² = 0.20, p = 0.024.
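The simple slopes follow directly from the Step 2 coefficients if condition is dummy coded with control = 0 and threat = 1. That coding is an assumption here, although it is consistent with the reported values. Under it, the slope of RSPAN in the control condition is the RSPAN coefficient itself, and the slope under threat is the RSPAN coefficient plus the interaction coefficient. A minimal sketch with a hypothetical function name:

```python
def simple_slopes(b_predictor, b_interaction):
    """Slopes of the predictor at each level of a 0/1 dummy-coded moderator.

    Assumes control is coded 0 and threat is coded 1.
    """
    return {
        "control": b_predictor,
        "threat": b_predictor + b_interaction,
    }


# Step 2 coefficients for the math GRE: RSPAN b = 0.00046, interaction b = 0.0075
slopes = simple_slopes(0.00046, 0.0075)
print(round(slopes["control"], 4))  # 0.0005
print(round(slopes["threat"], 4))   # 0.008
```

These match the reported simple slopes of b = 0.0005 (control) and b = 0.0080 (threat).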
For verbal GRE, in Step 1, the effect of RSPAN, b = 0.0018, t(42) = 0.90, p = 0.38, the effect of condition, b = 0.054, t(42) = 1.081, p = 0.29, and the model, F(2, 42) = 1.21, Multiple R² = 0.054, p = 0.31, were all nonsignificant. In Step 2, the effects of RSPAN, b = −0.0013, t(41) = −0.52, p = 0.61, and condition, b = −0.38, t(41) = −1.63, p = 0.11, were nonsignificant. However, their interaction, b = 0.0077, t(41) = 1.91, p = 0.063, approached significance. The change in variance accounted for between models also approached significance at the 0.06 level, F(1, 41) = 3.63, p = 0.064. Simple slopes tests indicated that under threat, higher WMC participants had higher predicted scores on the verbal GRE compared with those with lower WMC, b = 0.0063, p = 0.046. There was no significant change in predicted GRE scores in the control condition, b = −0.0013, p = 0.61 (see Fig. 3), and the model was not significant, F(3, 41) = 2.06, Multiple R² = 0.13, p = 0.12.
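The test of the change between Steps 1 and 2 can be approximated from the two reported R² values with the standard F-change formula for nested regression models, F = ((R²_full − R²_reduced) / k) / ((1 − R²_full) / df_resid), where k is the number of added predictors. This is a sketch using the rounded values above, with a hypothetical function name, not a reanalysis of the data.

```python
def f_change(r2_reduced, r2_full, n_added, df_resid_full):
    """F statistic for the R-squared change between nested regression models."""
    return ((r2_full - r2_reduced) / n_added) / ((1 - r2_full) / df_resid_full)


# Verbal GRE: Step 1 R² = 0.054, Step 2 R² = 0.13, one added term, residual df = 41
f = f_change(0.054, 0.13, 1, 41)
print(round(f, 2))  # 3.58
```

The result is close to the reported F(1, 41) = 3.63; the difference is what one would expect from working with R² values rounded to two or three decimals.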
An independent-samples t test indicated a significant difference in OSPAN based on race, t(246) = −2.49, CI95% [−7.35, −0.85], p = 0.014, Cohen’s d = 0.42, with White students (M = 63.34, SD = 9.58) significantly outperforming Black students (M = 59.24, SD = 10.50) by almost half a standard deviation.
We tested the effect of ST on state WMC and SDTP and found none of the interaction effects to be significant in the verbal capacity and intelligence condition (see Table 2). On the RSPAN, we found a main effect of race, F(1, 244) = 10.20, p = 0.0016, ηp² = 0.041, b = −5.87, t(244) = −2.07, CI95% [−11.44, −0.28], p < 0.05, such that White students scored higher (M = 59.28, SD = 11.89) than Black students (M = 52.81, SD = 12.13). The effect of condition, F(1, 244) = 0.17, p = 0.68, ηp² = 0.00070, b = 0.87, t(244) = 0.51, CI95% [−2.46, 4.19], p > 0.05, was nonsignificant.
For the verbal GRE, in Step 1, the effects of OSPAN, b = 0.0032, t(39) = 1.31, p = 0.198, condition, b = 0.059, t(39) = 1.16, p = 0.26, and the model were nonsignificant, F(2, 39) = 1.45, Multiple R² = 0.069, p = 0.25. In Step 2, the effect of OSPAN trended toward significance at the 0.06 level, b = 0.0061, t(38) = 1.93, p = 0.06, indicating that there was only an effect of trait WMC (at the 0.06 level) such that higher OSPAN scores were associated with higher predicted scores on the verbal GRE. The effects of condition, b = 0.031, t(38) = 0.582, p = 0.56, the interaction, b = −0.0070, t(38) = −1.42, p = 0.16, and the model, F(3, 38) = 1.67, Multiple R² = 0.116, were not significant (see Fig. 4).
For the math GRE, in Step 1, the effects of OSPAN, b = 0.0032, t(39) = 1.08, p = 0.29, condition, b = 0.017, t(39) = 0.27, p = 0.79, and the model were nonsignificant, F(2, 39) = 0.61, Multiple R² = 0.030, p = 0.55. In Step 2, none of the effects of OSPAN, b = 0.0059, t(38) = 1.5, p = 0.14, condition, b = −0.0088, t(38) = −0.13, p = 0.89, their interaction, b = −0.0065, t(38) = −1.08, p = 0.29, or the model, F(3, 38) = 0.80, Multiple R² = 0.059, were significant (see Fig. 5).
Our sample may have higher trait WMC compared with samples of students at other universities—allowing students to remain resilient in the face of ST. Previous research suggests that higher WMC is associated with higher intelligence (Conway et al., 2003) and SDTP (Daneman & Carpenter, 1980). Experiment 1 provides potential explanations for why and how people succumb and are resilient to the effects of ST from a cognitive perspective. Because ST is viewed as a social-affective environmental phenomenon that disrupts cognitive performance, our data highlight the potential benefit of having more cognitive resources at baseline in order to combat its deleterious effects.
Experiment 2 implemented the same experimental design in a new sample to further examine the role of WMC during ST for race/ethnicity. We wondered whether Experiment 1 found no ST effect because the private university sample was more motivated and/or experienced with high-stakes testing situations and, in turn, better able to use cognitive resources of WMC to perform competitively in spite of ST. Based on recent admissions and enrollment data, possible differences in preparation, experience, and motivation in the Experiment 1 sample are especially relevant when considering boundary conditions and the effect of ST. Experiment 2 recruited highly motivated students from a large state university, where overall GPA and previous standardized test scores were slightly less competitive.
A total of 166 undergraduates from a public state university were recruited. Participants were invited if they were at least 18 years of age, native English speakers, and self-identified as White (98 students, 66 women) or Black (68 students, 55 women). All received credit toward a course requirement for participating.
The same experimental design as Experiment 1 was employed. ST was tested through a series of ANOVAs, and the role of WMC was tested through mediation and moderation analyses for Black students based on math or verbal ST. Materials and procedures were generally the same as Experiment 1.
We recruited participants to achieve adequate statistical power based on the same criteria outlined in Experiment 1. Again, we present two “sets” of results, one for each task order condition (see Fig. 1). As in Experiment 1, the same data preparation and analytic approaches were implemented.
Descriptive statistics revealed the measures of WMC (i.e., OSPAN and RSPAN) were strongly positively correlated with each other as well as moderately positively correlated with measures of SDTP (i.e., math and verbal GREs). These trends are reported in Table 3 below.

Only the interaction effect on the second standardized test, the verbal GRE, was trending toward significance at the 0.11 level (see Table 4). Pairwise comparisons indicated that Black students scored lower on the verbal GRE under threat than in the control condition, but not significantly so (M diff = −0.00047, p = 0.99). White students tended to score lower under threat compared with control (M diff = −0.098, p = 0.10). All other interaction effects for the quantitative capacity and intelligence threat type were nonsignificant (see Table 4).

For verbal GRE in Step 1, the effects of RSPAN, b = 0.00083, t(34) = 0.81, p = 0.43, condition, b = −0.0042, t(34) = −0.13, p = 0.89, and the model, F(2, 34) = 0.326, Multiple R² = 0.0188, p = 0.72, were nonsignificant. In Step 2, the effects of RSPAN, b = −0.0011, t(33) = −0.83, p = 0.42, condition, b = 0.0052, t(33) = 0.165, p = 0.87, and the model, F(3, 33) = 1.67, Multiple R² = 0.132, p = 0.19, were nonsignificant. The interaction of RSPAN and condition was significant, b = 0.0041, t(33) = 2.07, p = 0.046. Simple slopes tests indicated that under threat, higher WMC participants had higher predicted scores on the verbal GRE compared with those with lower WMC, b = 0.0030, p = 0.045. There was no significant change in predicted GRE scores in the control condition, b = −0.0011, p = 0.42. Based on the significant interaction, the models in Steps 1 and 2 were tested for a significant change in their variances. Analysis of change in model variances indicated there was a significant difference, F(1, 33) = 4.3, p = 0.046, providing further evidence for moderation of trait WMC on ST for Black students’ verbal GRE scores (see Fig. 4).
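When Step 2 adds a single term (here, the interaction), the F test for the change between models is equivalent to the t test on that term, with F = t² on the same residual degrees of freedom. This explains why the interaction and the model-change test share the same p value. A brief sketch using the rounded t value reported above; the function name is hypothetical.

```python
def f_from_t(t_stat):
    """F statistic for a one-term model change, given the t statistic of the added term."""
    return t_stat * t_stat


# Interaction of RSPAN and condition: t(33) = 2.07
f_value = f_from_t(2.07)
print(round(f_value, 2))  # 4.28
```

The value of about 4.28 agrees with the reported F(1, 33) = 4.3 once rounding is taken into account.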
An independent-samples t test found that although White students scored about 4 points higher on OSPAN on average, this difference was not significant, t(83) = −1.38, p = 0.17, Cohen’s d = 0.31.
A significant Race × Condition interaction was found only for WMC as measured by the RSPAN task (see Table 4). Follow-up tests revealed that White students in the threat condition did not differ from White students in the control condition (M diff = 2.17, p = 0.95). In contrast, Black students under threat tended to show a performance decrease on RSPAN relative to Black students in the control condition; however, this difference was not significant (M diff = −11.4, p = 0.17). All other interaction effects were nonsignificant (see Table 4).
For math GRE, results indicated the effect of race was nonsignificant, F(1, 80) = 2.5, p = 0.12, ηp² = 0.035, b = −0.071, t(80) = −1.49, CI95% [−0.17, 0.024], p > 0.05. The effect of condition was also nonsignificant, F(1, 80) = 1.06, p = 0.31, ηp² = 0.013, b = 0.018, t(80) = 0.52, CI95% [−0.051, 0.088], p > 0.05.
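The reported confidence intervals can be approximately reconstructed from each coefficient and its t statistic, since the standard error is b / t and the interval is b ± t_crit × SE. In the sketch below, the function name is hypothetical and the critical value of 1.990 for df = 80 is an assumed table value for a two-tailed test at alpha = 0.05, entered by hand rather than computed.

```python
def ci_from_b_and_t(b, t_stat, t_crit):
    """Confidence interval for a coefficient, recovering SE as |b / t|."""
    se = abs(b / t_stat)
    half_width = t_crit * se
    return (b - half_width, b + half_width)


T_CRIT_DF80 = 1.990  # assumed two-tailed critical t for df = 80, alpha = 0.05

# Effect of race on math GRE: b = -0.071, t(80) = -1.49
low, high = ci_from_b_and_t(-0.071, -1.49, T_CRIT_DF80)
print(round(low, 2), round(high, 3))  # -0.17 0.024
```

The reconstructed interval matches the reported CI95% for race; because it spans zero, it is consistent with the nonsignificant result.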
In Step 2, predicting math GRE, the effect of OSPAN, b = −0.0018, t(26) = −0.79, p = 0.43, was nonsignificant. The effect of condition, b = 0.064, t(26) = 1.9, p = 0.076, approached significance, indicating that the threat effect predicted higher scores on the math GRE. The interaction, b = 0.0054, t(26) = 2.15, p = 0.041, and the model, F(3, 26) = 4.10, Multiple R² = 0.32, p = 0.017, were significant. Simple slopes tests indicated that under threat, higher WMC participants had higher predicted scores on the math GRE compared with those with lower WMC, b = 0.0036, p = 0.0051. There was no significant change in predicted GRE scores in the control condition, b = −0.0018, p = 0.43 (see Fig. 9). Based on the significant interaction term, an ANOVA was conducted in order to determine whether there was a significant difference in the variance accounted for between these models. Results revealed a significant difference in the variance, F(1, 26) = 4.62, p = 0.041, providing additional support for moderation (see Fig. 5).
There was limited evidence for ST effects on the outcome measures of WMC and standardized test performance. Only in the case of ST for verbal capacity and intelligence on the RSPAN did these data reveal evidence of a threat effect. Overall, students appeared to be resilient to the effects of ST. Because the student sample came from a less “selective” population of undergraduate students, we expected to find ST effects in addition to evidence that ST is moderated by high trait WMC and mediated by state WMC. Instead, there was no evidence that state WMC mediates ST; in most cases no threat effect was revealed. There was, however, a pattern of some evidence supporting our second hypothesis that trait WMC moderates the effect of ST on both math and verbal SDTP, providing a performance benefit for Black students. We believe this could be the case because a higher WMC span corresponds with more domain-general resources, which impose fewer limits on performance (see Kovacs et al., 2019; Kovacs & Conway, 2016). These greater domain-general resources mean that performance is less affected for higher WMC span students than for lower WMC span students when they are subjected to the cognitive demands of identity-threatening situations such as racial/ethnic ST. Moreover, as in Experiment 1, higher trait WMC students may be able to remain resilient and take on ST as a challenge, depending on how ST is activated and on the performance domain. In Experiment 2, however, our students showed resilience to ST effects on the second GRE task, when the performance domain and the domain of ST were different (see Figs. 1, 4, and 5).
In Step 2, predicting math GRE, the effects of RSPAN, b = 0.0025, t(78) = 1.39, p = 0.17, condition, b = 0.058, t(78) = 1.44, p = 0.16, and the interaction, b = 0.0026, t(78) = 0.957, p = 0.34, were nonsignificant. The model, F(3, 78) = 3.78, Multiple R² = 0.127, p = 0.014, was significant. We also tested the change in model variance and the simple slopes. Although there was no significant change in the variance accounted for, F(1, 78) = 0.92, p = 0.34, under threat, higher WMC participants had higher predicted scores on the math GRE compared with those with lower WMC, b = 0.0051, p = 0.013; there was no significant change in predicted scores for those in the control condition, b = 0.0025, p = 0.17. Taken together, these results reveal weak support for trait WMC on the RSPAN moderating the effect of threat on the math GRE (see Fig. 6).
In Step 2, predicting verbal GRE, the effects of RSPAN, b = 0.0014, t(78) = 0.69, p = 0.49, and condition, b = 0.031, t(78) = 0.71, p = 0.48, were nonsignificant. The interaction, b = 0.0057, t(78) = 1.94, p = 0.056, was marginal, and the model, F(3, 78) = 3.83, Multiple R² = 0.1283, p = 0.013, was significant. The change in model variance was also marginally significant, F(1, 78) = 3.76, p = 0.056. To unpack this further, we examined the simple slopes, which revealed that under threat, higher WMC participants had higher predicted scores on the verbal GRE compared with those with lower WMC, b = 0.0070, p = 0.0018. There was no significant change in predicted GRE scores in the control condition, b = 0.0014, p = 0.49. Taken together, these results also reveal weak support for trait WMC on the RSPAN moderating the effect of threat on the verbal GRE (see Fig. 7).
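When a single predictor is added to a regression model, the F statistic for the change in variance accounted for equals the square of that predictor's t statistic. A quick check, using the interaction t values reported for the three Step 2 models, reproduces the reported F values up to rounding. This is only an arithmetic illustration, and the helper name is hypothetical.

```python
def f_change_from_t(t_value: float) -> float:
    """F for adding one predictor (1 numerator df) equals t squared."""
    return t_value ** 2


# (model, interaction t, reported F change), all taken from the text
reported = [
    ("OSPAN, math GRE", 2.15, 4.62),
    ("RSPAN, math GRE", 0.957, 0.92),
    ("RSPAN, verbal GRE", 1.94, 3.76),
]

for label, t_value, f_reported in reported:
    f_implied = f_change_from_t(t_value)
    print(f"{label}: t^2 = {f_implied:.2f}, reported F = {f_reported:.2f}")
```

Because the two tests are equivalent in this setting, the model-comparison F test and the interaction term's t test necessarily share the same p value.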
We used the BayesFactor package in R (Morey & Rouder, 2015) to compute Bayes factors for main-effects-only models and the main effects plus the interaction terms models. Each of these was run compared with a null or intercept only model. To test for moderation, we computed the Bayes factors for the main-effects-only model and compared those to the main effects plus interaction term models. We include the full results of the Bayesian regression moderation analyses for the quantitative capacity condition in Table 5 and discuss the relevant results of these analyses in more detail below.

We used the BayesFactor package in R (Morey & Rouder, 2015) to compute Bayes factors for the main-effects-only models and the main-effects-plus-interaction models. Each of these was compared with a null (intercept-only) model. To test for moderation, we compared the Bayes factor for each main-effects-only model with that for the corresponding main-effects-plus-interaction model. We include the full results of the Bayesian regression moderation analyses for the verbal capacity condition in Table 6 and discuss the results in greater detail below.
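The moderation test described above amounts to a ratio of Bayes factors. Because both models are compared with the same intercept-only model, dividing the Bayes factor of the interaction model by that of the main-effects-only model gives the Bayes factor for the interaction term itself. The sketch below shows only this arithmetic in Python; it does not use the BayesFactor package, the function name is hypothetical, and the input values are made-up placeholders rather than results from the tables.

```python
def interaction_bayes_factor(bf_full_vs_null: float, bf_main_vs_null: float) -> float:
    """Bayes factor for (main effects + interaction) against (main effects only).

    Both inputs must be Bayes factors computed against the same null model.
    """
    if bf_full_vs_null <= 0 or bf_main_vs_null <= 0:
        raise ValueError("Bayes factors must be positive")
    return bf_full_vs_null / bf_main_vs_null


# Placeholder values for illustration only (not taken from the paper)
bf_10 = interaction_bayes_factor(bf_full_vs_null=12.0, bf_main_vs_null=4.0)
print(bf_10)  # values above 1 favor including the interaction term
```

A ratio above 1 indicates that the data are more likely under the model with the interaction, and a ratio below 1 favors the main-effects-only model.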

The goal of the present work was to investigate WMC as a cognitive factor in relation to performance under ST. We explored WMC as both state and trait variables as a mediator and/or moderator of ST for race/ethnicity. Although WMC has been previously explored as a state and trait variable (Ilkowska & Engle, 2010) and previous research has suggested the importance of considering individual differences in WMC with regard to ST (Regner et al., 2010; Schmader & Johns, 2003; Schmader et al., 2008), to our knowledge, exploring ST with both state and trait measures of WMC is novel and has not been investigated for ST for race/ethnicity.
Based on the results of two experiments, we found that students appeared to be resilient to the effect of ST on standardized test performance: the ST manipulation did not consistently produce a performance decrement for Black students. We found evidence of ST only in Experiment 2 for the verbal capacity and intelligence condition and only on the state WMC reading span (RSPAN) measure. Because participants appeared to be resilient to ST, we wondered whether this was due to differences in these students’ prior experience with standardized tests and beliefs about their abilities. Experiment 1 contained an especially highly motivated, high-achieving student sample, so we wondered whether there was an overarching implicit belief among these students about high ability that attenuated the effect of ST on performance. This was suspected post hoc based on how ST was induced: stating that the task was “highly correlated with measures of intelligence” could activate a broader identity among these participants (see Brannon et al., 2015, for more on the shifting of the self-schema in different contexts; also, Logel et al., 2009, for shifting self-schema during ST). For example, the identity of being a student at a selective private university could protect students from feeling threatened and incidentally provide performance enhancement rather than decrement. However, given the similar patterns in Experiment 2, this explanation seems less plausible, although it is worth mentioning here because ST theory asserts the importance of domain identification (see Steele, 1997). In terms of resilience and motivation, previous research has shown that higher trait resilience as measured by grit is moderately but not significantly associated with higher WMC (Dale et al., 2018). At present, these data are unable to disentangle whether differences in students’ beliefs about ability impact WMC or ST and performance, but future work should investigate this further.
Another perspective on these results is that having higher trait WMC resources could allow participants experiencing ST to respond by taking on the standardized test as a challenge. Some previous work suggests that racial minority students are able to perform well on standardized tests when the tests are viewed or characterized as a challenge (Steele & Aronson, 1995). However, if students have more cognitive resources in WMC, they might differ in the ways they focus those resources and plan to complete the task at hand. Previous research on individual differences in WMC shows that higher and lower span individuals differ in their strategies for approaching, performing, and solving different cognitive tasks (see Ilkowska & Engle, 2010; Shipstead et al., 2016; Unsworth et al., 2013; also see Delaney & Sahakyan, 2007). However, our results suggest that the ability to do this in the context of ST could also differ depending on whether the test is in the quantitative or verbal domain. Additional research is needed to better understand individual differences in WMC and how students respond to racial/ethnic ST in different task domains.
Another consideration is that we found more cases of moderation when baseline WMC was measured with the RSPAN. This suggests that the choice of OSPAN or RSPAN as the measure of trait WMC may produce dissociations in the effects observed. In fact, previous research has found that because the OSPAN and RSPAN have different processing components (i.e., solving math problems for accuracy in OSPAN and reading sentences for accuracy in RSPAN), the tasks can dissociate in how consistently they predict performance on different outcomes (see Chow et al., 2016; Holden et al., 2020; Macnamara et al., 2011; Oberauer, 2009). This interpretation is speculative but worth further exploration.
Overall, across two experiments, we only found ST in Experiment 2 on the state WMC RSPAN measure—highlighting replication issues. It was unclear whether replication issues were due to the language used in our manipulation (based on Schmader & Johns, 2003, stating “this task is a measure of quantitative/verbal capacity and is highly related to measures of intelligence”) or based on issues with sample size.
We aimed to recruit adequate sample sizes, but due to reported replicability issues of ST (Schimmack, 2016, 2017; also Stricker & Ward, 2004), and potential effect-size inflation due to publication bias (see Flore & Wicherts, 2014), the sample size needed is much higher than originally anticipated (unknown to us at the time of planning the current study). For these reasons, we recommend recruiting samples that exceed 80% power based on effect sizes reported in the literature. To help clarify replication issues, we also recommend that future investigations use more direct language as in the manipulation based on the Steele and Aronson (1995) study (i.e., “this task is diagnostic/non-diagnostic of ability”).
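To make the power recommendation concrete, the sketch below estimates the sample size per group for a simple two-group comparison using the standard normal approximation, n = 2(z for 1 − α/2 + z for power)² / d². This is a simplification for illustration: it is not the interaction-based calculation used to plan the present experiments, so its output should not be expected to match those figures. The effect sizes are the d = 0.52 and d = 0.46 averages considered during planning, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per group for a two-tailed, two-group comparison."""
    if d <= 0:
        raise ValueError("effect size d must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for target power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


for d in (0.52, 0.46):
    print(d, n_per_group(d))
```

Smaller assumed effects require larger samples, which is why planning around inflated published effect sizes tends to leave replications underpowered.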
The verbal section largely includes vocabulary (i.e., text completion and sentence equivalence) and reading comprehension items. Based on previous research, the reading comprehension items have been linked with working memory and fluid intelligence and thus are likely to be more cognitively demanding than the text completion and sentence equivalence items (Daneman & Carpenter, 1980; De Jonge & De Jong, 1996; also see Vernucci et al., 2021). The quantitative reasoning section includes arithmetic, algebra, geometry, and data analysis problems. Previous research suggests that geometry may be the most complex or cognitively demanding among these question types, and differences in gender threat have been shown (see Huguet & Regner, 2007).
In general, more complex questions, which involve more steps to arrive at a solution, are also more cognitively demanding, so there is probably more to consider in understanding the cognitive load among the items and question domains (see Beilock & Carr, 2005; Beilock & DeCaro, 2007). For both the verbal and quantitative test sections, several steps were involved in arriving at correct answers, and some questions had multiple correct answers, making it even more difficult to know which items might be more cognitively demanding than others. Although this question is interesting, it is difficult to answer and remains outside the scope of the present work; it could be investigated in future work.
Our current work emphasizes the need to explore and challenge a fundamental assumption of universality of human information processing by factoring in the racial sociocultural context of individuals performing cognitive tasks related to academic achievement (Holden et al., 2023; Thomas et al., 2023). Although inequality and gaps in education and achievement remain, the current work helps provide a better understanding of individual differences in cognitive resources that are available for performance on competitive standardized tests. We employed a comprehensive approach, combining experimental and differential methods for investigating ST—helping the field continue to uncover how, when, and why ST operates and how to better combat it. Based on our results, future work should focus on ways to conserve precious mental resources, especially for minority students. As recent work finds that WMC is enhanced through training in general (see Jaeggi et al., 2008, 2011, 2013; Redick et al., 2013) and in more diverse and at-risk samples (see Wong et al., 2024), future work should consider additional forms of cognitive intervention that are helpful for students who are vulnerable to ST for race/ethnicity. For example, mindfulness practices might be an important avenue for future research related to ST as these practices as forms of self-regulation have been shown to improve cognitive functioning, WMC, and potentially improve SDTP (Morrison & Jha, 2015; Mrazek et al., 2013).


Sections

"[{\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR72\", \"CR72\", \"CR59\", \"CR63\", \"CR24\", \"CR75\", \"CR42\", \"CR23\", \"CR30\", \"CR59\"], \"section\": \"Stereotype threat\", \"text\": \"For people who have stigmatized social or group identities, being primed with a negative stereotype has been shown to cause underperformance on standardized tests relative to a control group, an effect known as stereotype threat (ST; Steele & Aronson, 1995). As one of the most researched topics in psychology, the effect of ST on performance has been shown for ethnic/racial identity (Steele & Aronson, 1995), sex/gender (Regner et al., 2010; Schmader & Johns, 2003), socioeconomic status (Flores et al., 2018; Tine & Gotlieb, 2013), and age (Levy, 1996). Outside of the U.S. context, ST research has focused a great deal on gender threat\\u2014examining its impact on women\\u2019s performance in math or STEM domains (see Flore & Wicherts, 2014; also Huguet & Regner, 2007; Regner et al., 2010). However, if a negative stereotype exists regarding cognitive performance of certain racial/ethnic groups in other countries, then in theory, ST effects could be observed in those cases, too. In the U.S. context compared with outside the U.S., the impact racial/ethnic ST effects have on performance is complicated by differences in attributions made based on having a racialized minority status. While these are interesting questions, they remain largely outside the scope of the current work, as we will focus on investigating racial/ethnic ST effects in U.S. student samples.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR57\", \"CR52\", \"CR53\", \"Fn1\", \"CR21\", \"CR71\", \"CR72\"], \"section\": \"Stereotype threat\", \"text\": \"Although ST has been investigated widely, we focus on the effect of ST on cognitive tasks that are important for racialized minority students\\u2019 test performance and achievement in the U.S. 
Using data from the National Assessment of Educational Progress (NAEP), Stanford\\u2019s Center for Education Policy Analysis suggests that gaps in achievement between minority students and White students have narrowed since the 1970 s (Reardon, 2015; also, National Center for Education Statistics, 2013). However, gaps remain in White and Black students\\u2019 standardized test performance (SDTP) for reading and math in elementary through high school (National Center for Education Statistics, 2017). In the U.S. educational system, standardized tests such as the Scholastic Assessment Test (SAT) and the Graduate Record Exam (GRE)1 are important for demonstrating preparation for admittance to higher education institutions. Now, many undergraduate and doctoral programs have made these tests optional, but they are still required in some places. These tests are administered to millions of students in and outside of the U.S. (ETS, 2018), and achievement gaps remain between White and Black students in the U.S. Previous work suggests that worries about being negatively stereotyped based on racial/ethnic group is enough to impact Black students\\u2019 performance on standardized tests and other academic assessments (see Steele, 1997; Steele & Aronson, 1995).\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR71\", \"CR5\", \"CR6\"], \"section\": \"Stereotype threat\", \"text\": \"Although there are continuing debates about their utility, standardized tests still have the power to open and close doors for students, and understanding the influence(s) of the testing environment for students is essential. The pressure of performing well, coupled with other anxieties, can cause some students to \\u201cchoke\\u201d and underperform. Ironically, those who are among those most capable of performing well\\u2014deemed the academic \\u201cvanguard\\u201d (Steele, 1997)\\u2014tend to underperform in high-pressure situations (cf. 
Beilock & Carr, 2005; Beilock & DeCaro, 2007).\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR46\", \"CR50\", \"CR72\", \"CR22\", \"CR26\", \"CR38\", \"CR39\", \"CR69\", \"CR28\", \"CR4\", \"CR12\", \"CR59\", \"CR63\", \"CR1\", \"CR8\", \"CR20\", \"CR18\", \"CR19\", \"CR84\", \"CR72\", \"CR64\", \"CR70\"], \"section\": \"Stereotype threat\", \"text\": \"These findings beg the question of what factors are most important in understanding differences in performance on assessments of cognitive ability. Suggested factors include situational or contextual influences (Massey & Owens, 2014; Mullainathan & Shafir, 2013; Steele & Aronson, 1995), practice or experience (Ericsson et al., 1993; Hambrick et al., 2016), one\\u2019s intelligence (Jensen, 1969; Kovacs & Conway, 2016; Spearman, 1904; also see Holden & Tanenbaum, 2023), working memory capacity (WMC; Baddeley & Hitch, 1974; Conway et al., 2003; Regner et al., 2010; Schmader & Johns, 2003), as well as personality and attitudinal factors (Aronson et al., 2002; Blackwell et al., 2007; Dweck & Leggett, 1988; also Duckworth & Yeager, 2015; Durlak et al., 2011; Yeager & Walton, 2011). Based on Steele and Aronson (1995), having to contend with others viewing you based on a stereotype is enough to negatively shift your performance; however, the phenomenological process of ST is quite complicated and involves many integrated cognitive, affective, and physiological processes (Schmader et al., 2008). In terms of the cognitive mechanisms involved in ST, previous research suggests the importance of cognitive control or our ability to focus our attention and mental resources on completing a task goal (see Spencer et al., 2016).\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR71\", \"CR37\"], \"section\": \"Stereotype threat\", \"text\": \"When handling the cognitive demands of ST, people can be impacted in their ability to focus and control their mental resources. 
When a negative stereotype is made salient for a certain group, those identified with that group may have concerns about performing in way that would \\u201cprove\\u201d the stereotype. Considering this, people are motivated to avoid \\u201cconfirming\\u201d the stereotype (see Steele, 1997) which could involve putting forth extra effort. Work by Jamieson and Harkins (2007) showed that participants under ST had difficulty inhibiting automatic responses on the antisaccade task but were able to quickly correct their responses. These findings suggest that although the antisaccade task was challenging under ST, participants were not overloaded cognitively in that they were still able to update their responses from incorrect to correct.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR7\", \"CR37\", \"CR59\", \"CR64\", \"CR63\", \"CR71\"], \"section\": \"Stereotype threat\", \"text\": \"Additional work has suggested that when we are faced with challenging cognitive tasks and experiencing ST, we are contending with the stress and worries regarding ST as well as regulating ourselves to focus our attention and mental resources on completing the task at hand (see Beilock et al., 2007; Jamieson & Harkins, 2007; Regner et al., 2010; Schmader et al., 2008; Schmader & Johns, 2003; Steele, 1997). This means that ST can introduce an additional form of mental load on top of the challenges associated with solving difficult problems alone. Overall, much of previous research examining the cognitive mechanisms involved in ST suggests that cognitive control abilities, like the capacity of working memory, are of great importance. Activating ST may cause people to have difficulty with inhibiting incorrect responses, and especially so when the task is very challenging. This means the impact of differences in cognitive control and capacity of working memory on SDTP in the context of ST is important to understand. 
Next, we will consider the consequences of ST in terms of SDTP, then we will discuss the role of cognitive control through differences in working memory.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR56\", \"CR65\", \"CR41\", \"CR51\"], \"section\": \"Consequences of stereotype threat\", \"text\": \"A key component of ST theory is about the consequences of being the target of stereotyping, including effects on assessments of cognitive ability, like standardized tests. Although the emphasis of standardized tests for placement and acceptance into academic programs has been criticized and has decreased some over time (Ramirez, 2008; Serrano, 2015), generally, such tests are still utilized and often weighted heavily in admissions decisions (Lauryn, 2017; National Association for College Admission Counseling, 2016). For these reasons, additional research is needed to clarify influences during high-stakes testing for targets of stereotyping and ST. Because our focus is on the cognitive performance responsible for differences in standardized test performance, we discuss two perspectives from the literature that involve the ways ST impacts cognitive resources through WMC.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR2\", \"CR3\", \"CR4\", \"CR13\", \"CR15\", \"CR76\", \"CR63\", \"CR64\", \"CR33\", \"CR63\"], \"section\": \"Stereotype threat and working memory capacity\", \"text\": \"One perspective is that experiencing threats to group identity causes a reduction in the ability to focus on the task at hand. Working memory is an established construct in the literature that taps our ability to focus attention on a task, while simultaneously storing, retrieving, and updating other key information (Baddeley, 1996, 2000; Baddeley & Hitch, 1974). 
Working memory (WM) is thought to have a capacity component (called working memory capacity, or WMC) that varies across individuals and constrains the parameters by which people activate and utilize the cognitive resources at their disposal (Cowan, 1988; Daneman & Carpenter, 1980; Turner & Engle, 1989). When considering the role of cognitive resources during identity-threatening situations, Schmader and Johns (2003) found that ST causes a depletion in the form of lower WMC. Further, they characterized this finding as a decrease in the ability to regulate one\\u2019s behavior in a goal-oriented way (also see Schmader et al., 2008). Others have suggested a state of mental depletion spills over and disrupts later task performance (Inzlicht & Schmeichel, 2012). For example, in several studies, under ST for either race or gender, WMC was shown to decrease compared with a control condition, and WMC mediated ST for gender on SDTP in math (Schmader & Johns, 2003). Thus, ST for gender was at least partially explained by differences in WMC.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR59\", \"CR59\"], \"section\": \"Stereotype threat and working memory capacity\", \"text\": \"Another perspective is that threats to group identity are moderated by the cognitive resources available at baseline. This view does not posit that threat is negated altogether; instead, it suggests when differences in trait cognitive resources are higher, they help combat identity-threatening situations. When considering baseline WMC during gender ST, for those with higher trait WMC, there was no difference in women\\u2019s and men\\u2019s fluid reasoning performance\\u2014under threat or in a control condition (Regner et al., 2010). However, Regner et al. (2010) found that when baseline WMC was lower, female students performed worse under threat compared with females under no-threat and worse than male students under threat. 
These results underscore the importance of higher WMC when navigating identity-threatening situations, specifically during contexts when the implications for demonstrating one\\u2019s cognitive ability is of great consequence, like with standardized tests.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR71\", \"CR68\", \"CR82\", \"CR32\", \"CR61\", \"CR23\", \"CR25\", \"CR60\", \"CR66\", \"CR81\", \"CR7\", \"CR70\", \"CR80\", \"CR59\", \"CR63\", \"CR63\"], \"section\": \"The current work\", \"text\": \"The extent to which ST effects observed in the lab generalize to actual testing situations is difficult to assess (see Steele, 1997). Although a large body of work replicates ST effects, there have been issues with researcher degrees of freedom (see Simmons et al., 2011; Wicherts et al., 2016, on topics of researcher degrees of freedom and replication issues, generally) and replication in the ST literature (Inzlicht, 2016; Schimmack, 2017; also see Flore & Wicherts, 2014; Ganley et al., 2013; Sackett et al., 2004; Shewach et al., 2019; Wicherts, 2005). Nevertheless, the fact that threat effects have been shown to negatively impact some of the most capable students is alarming. Such issues necessitate further research to clarify boundary conditions or moderators of ST as well as the mechanisms underlying the effect (for review, see Beilock et al., 2007; Spencer et al., 2016; Wheeler & Petty, 2001). To date, many investigations of ST for race have focused on group-level effects (e.g., Black vs. White students), neglecting important individual differences that may mediate and/or moderate the effect. Also, to our knowledge, most studies designed to investigate the role of WMC focused on ST effects for gender, not race (Regner et al., 2010; Schmader & Johns, 2003). 
Work on the role of WMC for ST for race/ethnicity has been limited to White and Latino students\\u2019 performance and has explored the role of WMC as a mediator, not a moderator (cf. Schmader & Johns, 2003). We aim to address the extent to which individual differences in WMC mediate and/or moderate ST effects for race focused on Black students\\u2019 performance.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR63\", \"Fn2\", \"CR76\", \"CR78\", \"CR15\"], \"section\": \"Stereotype threat manipulation\", \"text\": \"The threat manipulation procedure was based on Schmader and Johns (2003), with slight alterations: We included a measure of WMC before and after the threat manipulation.2 To make the present study amenable to threat effects for both operation (OSPAN; Turner & Engle, 1989; Unsworth et al., 2005) and reading (RSPAN; Daneman & Carpenter, 1980) spans, we induced the race prime before having participants complete the working memory tasks. For OSPAN and RSPAN, participants were informed that the task was indicative of quantitative or verbal capacity, respectively, and was highly related to measures of intelligence. Moreover, they were informed that performance is attributable to group membership. Following this, they completed an ethnicity survey, which was one item to indicate one\\u2019s race (see Supplemental Methodology). The control condition did not receive any directly threatening instructions and instead were told they would only complete a working memory task.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR63\", \"Fig1\"], \"section\": \"Order manipulation\", \"text\": \"\\u00a0Following Schmader and Johns (2003), we used an assessment of WMC to replicate the effect of threat on WMC and to test the hypothesis that state WMC mediates the effect of threat on standardized tests. To detect these effects, WMC must be measured post threat manipulation. 
However, we wanted to test the hypothesis that trait WMC moderates the effect of threat on standardized testing. To test this, WMC must be measured before the threat manipulation. Thus, half the participants performed RSPAN (reading) before the threat manipulation and OSPAN (math) after the threat and for the other half, vice versa. The quantitative capacity threat and intelligence group is analogous to Schmader and Johns, where OSPAN was administered post threat manipulation (see Fig.\\u00a01B).\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR78\", \"CR11\"], \"section\": \"Working memory capacity\", \"text\": \"To assess trait and state WMC, participants completed the automated OSPAN and RSPAN tasks (see Unsworth et al., 2005). OSPAN requires completing a series of arithmetic problems while remembering a list of letters. Participants solve a math problem for accuracy and then remember a letter for later recall. At the end of a list of trials, participants recall the letters in serial order (the total score was calculated using the partial unit method; see Conway et al., 2005). The RPSAN replaces math problems with making veridical judgments for sentences while remembering lists of letters. Participants received three to seven letters per trial and three sets of each trial length, totaling 15 trials\\u2014yielding a maximum score of 75.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR21\"], \"section\": \"Standardized test performance\", \"text\": \"Participants completed Math and Verbal sections of the Graduate Record Exam (GRE). The GRE mathematics subsection consisted of 25 multiple-choice or short-answer questions, each requiring mathematical reasoning and quantitative comparison skills. 
The GRE verbal subsection was the same length but contained verbal questions requiring the abilities to analyze, evaluate, and synthesize written material, in addition to recognizing relationships among words and concepts. Both sections were timed at 20 min and taken from free online practice materials provided by the Educational Testing Service (2018). The final score was the proportion of questions correct out of 25.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR45\"], \"section\": \"STAI\", \"text\": \"The Speilberger State-Trait Anxiety Inventory (STAI) short form (Marteau & Bekker, 1992) assessed state anxiety. The short form consisted of six questions, where participants indicated their present feelings using a 4-point scale ranging from 1 (not at all) to 4 (very much) to items such as \\u201cI feel calm.\\u201d Low levels of anxiety (e.g., \\u201cI am relaxed\\u201d) were reverse scored. A total score was obtained, with higher scores indicating greater anxiety.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR63\"], \"section\": \"Procedure\", \"text\": \"Participants were tested in groups of up to six people. Participants completed the trait WMC span task, then received either threat or control instructions followed by a second, state WMC task. Participants who performed RSPAN before the threat manipulation completed OSPAN after the threat manipulation and vice versa. In the threat condition, participants received instructions similar to those of Schmader and Johns (2003) and completed the \\u201cethnicity survey,\\u201d which served to prime race and induce ST. 
In the control condition, participants received similar instructions, which were modified to exclude the race prime.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR45\"], \"section\": \"Procedure\", \"text\": \"Following the second WMC span task, all participants completed two sections of the GRE: one verbal and one quantitative section (ordered by task condition). After the GRE sections, participants completed STAI (Marteau & Bekker, 1992). Then, participants completed the postexperiment survey (this included a race prime question for those in the control condition). Lastly, participants were debriefed and thanked for their participation.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR60\", \"CR66\", \"CR70\", \"CR81\", \"CR23\", \"CR70\", \"CR55\", \"Fn3\", \"Fn4\"], \"section\": \"Statistical power\", \"text\": \"We aimed for adequate statistical power of at least 80%, based on effect sizes, as demonstrated in previous ST research. This was not straightforward, as researchers have computed effect-size estimates in different ways\\u2014some based on adjusted or unadjusted means (see Sackett et al., 2004; Shewach et al., 2019; Spencer et al., 2016; Wicherts, 2005). Additional work also suggests that the effects sizes in the literature may be inflated due to publication bias (Flore & Wicherts, 2014). We focused on recent reports on ethnic/racial ST for cognitive ability (Spencer et al., 2016). We then used the largest (d\\u2009= 0.52) and smallest (d\\u2009= 0.46) average effect sizes reported to estimate the sample size needed for a minimum of 80% statistical power. Using the powerInteract function in the powerMediation package in R (Qiu & Qiu, 2018), we estimated the sample size required for adequate statistical power based on a Race (White vs. Black) \\u00d7\\u2009Condition (threat vs. 
control) interaction effect,3 with Factor A\\u2009= 2 levels and Factor B\\u2009= 2 levels. For the larger effect size of Cohen\\u2019s d\\u2009= 0.52, a total n\\u2009= 160 (at least 40 cases per cell) was required; with alpha =\\u20090.05 for a two-tailed test, we would have statistical power of 0.824. For the smaller average effect size of Cohen\\u2019s d\\u2009= 0.46, we would have statistical power of 0.806, requiring a total n\\u2009= 120 (at least 30 cases per cell), with alpha =\\u20090.05 for a two-tailed test. Based on the suggestion that the average effect sizes are inflated, we used the smaller average effect size reported in the literature to motivate the decision of recruiting at least 30 cases per cell in both Experiments 1 and 2. We strove to obtain statistical power based on the aforementioned calculation. Our efforts were limited by both resource and time constraints, and thus the final sample sizes obtained for Experiments 1 and 2 are reported in each Results section below.4 To address power concerns, we will also present results of combining samples from Experiments 1 and 2 in addition to running Bayesian regressions.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Fig1\"], \"section\": \"Data preparation and analytic approach\", \"text\": \"Below, we present two sets of results, one for each task order condition. Because we manipulated ST for either verbal or quantitative capacity in the threat condition, we separate the threat effects based on these task orders (see Fig.\\u00a01B).\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Tab1\"], \"section\": \"Summary statistics and correlations\", \"text\": \"Descriptive statistics and correlations are reported in Table 1. The measures of WMC (i.e., OSPAN and RSPAN) were strongly positively correlated with each other as well as moderately positively correlated with measures of SDTP (i.e., math and verbal GREs). 
Higher anxiety scores were also negatively correlated with most of the performance measures.\\n\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Fn5\"], \"section\": \"Baseline comparison\", \"text\": \"An independent-samples t test indicated a significant difference in baseline RSPAN, t(197) = \\u2212 2.03, p\\u2009= 0.044, Cohen\\u2019s d\\u2009= 0.34, revealing that Black students (M\\u2009= 56.27, SD\\u2009= 12.37) scored significantly lower on RSPAN at baseline relative to White students5 (M\\u2009= 60.10, SD\\u2009= 10.78).\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Tab2\", \"Fn7\"], \"section\": \"Threat effects6\", \"text\": \"To test the effect of ST on state WMC and SDTP, we conducted a series of 2 (race) \\u00d7\\u20092 (condition) ANOVAs. The result of interest is the interaction effect of the race and condition variables (which would indicate ST) on both students\\u2019 WMC and standardized test performance. We found none of the Race \\u00d7\\u2009Condition interaction effects to be significant in the quantitative capacity and intelligence condition (see Table 2). The main effects for WMC and SDTP are reported next. The analysis of OSPAN revealed nonsignificant effects of condition, F(1, 195) =\\u20090.66, p\\u2009= 0.42, \\u03b7p2\\u2009= 0.0034, b\\u2009= \\u2212\\u20091.11, t(195) = \\u2212 0.63, CI95% [\\u2212 4.58, 2.36], p\\u2009> 0.05,7 and race, F(1, 195) =\\u20092.3, p\\u2009= 0.13, \\u03b7p2\\u2009= 0.012, b\\u2009= \\u2212\\u20092.61, t(195) = \\u2212 1.04, CI95% [\\u2212 7.58, 2.36], p\\u2009> 0.05.\\n\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Fig2\"], \"section\": \"Mediation and moderation\", \"text\": \"In Step 2, the effect of RSPAN, b\\u2009= 0.00046, t(41) =\\u20090.18, p\\u2009= 0.86, and the effect of condition, b\\u2009= \\u2212\\u20090.35, t(41) = \\u2212 1.54, p\\u2009= 0.13, were not significant. 
The interaction, b\\u2009= 0.0075, t(41) =\\u20091.90, p\\u2009= 0.065, approached significance. The change in model variance between Steps 1 and 2 also approached significance at the 0.065 level, F(1, 41) =\\u20093.59, p\\u2009= 0.065. Simple slopes (SS) tests revealed that when participants are under threat higher-WMC participants had higher predicted scores on the math GRE compared with those with lower WMC, b\\u2009= 0.0080, p\\u2009= 0.012, whereas, there was no significant change in GRE based on higher or lower WMC in the control condition, b\\u2009= 0.0005, p\\u2009= 0.86 (see Fig.\\u00a02). The model was also significant, F(3, 41) =\\u20093.51, Multiple R2 = 0.20, p\\u2009= 0.024.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Fig3\"], \"section\": \"Mediation and moderation\", \"text\": \"For verbal GRE, in Step 1, neither the effect of RSPAN, b\\u2009= 0.0018, t(42) =\\u20090.90, p\\u2009= 0.38, the effect of condition, b\\u2009= 0.054, t(42) =\\u20091.081, p\\u2009= 0.29, nor the model were significant, F(2, 42) =\\u20091.21, Multiple R2 = 0.054, p\\u2009= 0.31. In Step 2, the effects of RSPAN, b\\u2009= \\u2212\\u20090.0013, t(41) = \\u2212 0.52, p\\u2009= 0.61, and condition, b\\u2009= \\u2212\\u20090.38, t(41) = \\u2212 1.63, p\\u2009= 0.11, were nonsignificant. However, their interaction, b\\u2009= 0.0077, t(41) =\\u20091.91, p\\u2009= 0.063, approached significance. The change in variance accounted for between models also approached significance at the 0.06 level, F(1, 41) =\\u20093.63, p\\u2009= 0.064. Simple slopes tests indicated that under threat, higher WMC participants had higher predicted scores on the verbal GRE compared with those with lower WMC, b\\u2009= 0.0063, p\\u2009= 0.046. There was no significant change in predicted GRE scores in the control condition, b\\u2009= \\u2212\\u20090.0013, p\\u2009= 0.61. 
(See Fig.\\u00a03.) The model was not significant, F(3, 41) =\\u20092.06, Multiple R2\\u2009= 0.13, p\\u2009= 0.12.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Fn8\"], \"section\": \"Baseline comparison\", \"text\": \"An independent-samples t test indicated a significant difference in OSPAN based on race, t(246) = \\u2212 2.49, CI95% [\\u2212 7.35, \\u2212\\u20090.85], p\\u2009= 0.014, Cohen\\u2019s d\\u2009= 0.42, with White students (M\\u2009= 63.34, SD\\u2009= 9.58) significantly outperforming Black students (M\\u2009= 59.24, SD\\u2009= 10.50) by almost half a standard deviation.8\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Tab2\", \"Fn10\"], \"section\": \"Threat effects9\", \"text\": \"We tested the effect of ST on state WMC and SDTP and found none of the interaction effects to be significant in the verbal capacity and intelligence condition (see Table 2). On the RSPAN we found a main effect for race, F(1, 244) =\\u200910.20, p\\u2009= 0.0016, \\u03b7p2\\u2009= 0.041, b\\u2009= \\u2212\\u20095.87, t(244) = \\u2212 2.07, CI95% [\\u2212 11.44, \\u2212\\u20090.28], p\\u2009< 0.05,10 such that White students scored higher (M\\u2009= 59.28, SD\\u2009= 11.89) than Black students (M\\u2009= 52.81, SD\\u2009= 12.13). The effect of condition, F(1, 244) =\\u20090.17, p\\u2009= 0.68, \\u03b7p2\\u2009= 0.00070, b\\u2009= 0.87, t(244) =\\u20090.51, CI95% [\\u2212 2.46, 4.19], p\\u2009> 0.05, was nonsignificant.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Fig4\"], \"section\": \"Mediation and moderation\", \"text\": \"For the verbal GRE, in Step 1, the effects of OSPAN, b\\u2009= 0.0032, t(39) =\\u20091.31, p\\u2009= 0.198, condition, b\\u2009= 0.059, t(39) =\\u20091.16, p\\u2009= 0.26, and the model were nonsignificant, F(2, 39) =\\u20091.45, Multiple R2 = 0.069, p\\u2009= 0.25. 
In Step 2, the effect of OSPAN trended toward significance at the 0.06 level, b\\u2009= 0.0061, t(38) =\\u20091.93, p\\u2009= 0.06, indicating only a marginal effect of trait WMC, such that higher OSPAN scores were associated with higher predicted scores on the verbal GRE. The effects of condition, b\\u2009= 0.031, t(38) =\\u20090.582, p\\u2009= 0.56, the interaction, b\\u2009= \\u2212 0.0070, t(38) = \\u2212 1.42, p\\u2009= 0.16, and the model, F(3, 38) =\\u20091.67, Multiple R2 = 0.116, were not significant (see Fig.\\u00a04).\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Fig5\"], \"section\": \"Mediation and moderation\", \"text\": \"For the math GRE, in Step 1, the effects of OSPAN, b\\u2009= 0.0032, t(39) =\\u20091.08, p\\u2009= 0.29, condition, b\\u2009= 0.017, t(39) =\\u20090.27, p\\u2009= 0.79, and the model were nonsignificant, F(2, 39) =\\u20090.61, Multiple R2 = 0.030, p\\u2009= 0.55. In Step 2, none of the effects of OSPAN, b\\u2009= 0.0059, t(38) =\\u20091.5, p\\u2009= 0.14, condition, b\\u2009= \\u2212\\u20090.0088, t(38) = \\u2212 0.13, p\\u2009= 0.89, their interaction, b\\u2009= \\u2212\\u20090.0065, t(38) = \\u2212 1.08, p\\u2009= 0.29, or the model, F(3, 38) =\\u20090.80, Multiple R2 = 0.059, were significant (see Fig.\\u00a05).\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR12\", \"CR15\"], \"section\": \"Discussion\", \"text\": \"Our sample may have higher trait WMC compared with samples of students at other universities, allowing students to remain resilient in the face of ST. Previous research suggests that higher WMC is associated with higher intelligence (Conway et al., 2003) and SDTP (Daneman & Carpenter, 1980). Experiment 1 provides potential explanations, from a cognitive perspective, for why and how people succumb to or are resilient to the effects of ST. 
Because ST is viewed as a social-affective environmental phenomenon that disrupts cognitive performance, our data highlight the potential benefit of having more cognitive resources at baseline in order to combat its deleterious effects.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Fn11\", \"Fn12\"], \"section\": \"Experiment 2\", \"text\": \"Experiment 2 implemented the same experimental design in a new sample to further examine the role of WMC during ST for race/ethnicity. We wondered whether Experiment 1 found an absence of a ST effect due to the private university sample being more motivated and/or experienced with high-stakes testing situations, and in turn, better able to use cognitive resources of WMC to perform competitively in spite of ST. Based on recent admissions and enrollment data,11 possible differences in preparation, experience, and motivation in the Experiment 1 sample are especially relevant when considering boundary conditions and the effect of ST. Experiment 2 recruited highly motivated students from a large state university, where overall GPA and previous standardized test scores were slightly less competitive.12\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Fn13\"], \"section\": \"Participants\", \"text\": \"A total of 166 undergraduates from a public state university were recruited. Participants were invited if they were at least 18 years of age,13 native English speakers, and self-identified as White (98 students, 66 women) or Black (68 students, 55 women). All received credit toward a course requirement for participating.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Fn14\"], \"section\": \"Design and procedure\", \"text\": \"The same experimental design as Experiment 1 was employed. ST was tested through a series of ANOVAs, and the role of WMC was tested through mediation and moderation analyses for Black students based on math or verbal ST. 
Materials and procedures were generally the same as Experiment 1.14\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Fig1\"], \"section\": \"Results\", \"text\": \"We recruited participants to achieve adequate statistical power based on the same criteria outlined in Experiment 1. Again, we present two \\u201csets\\u201d of results, one for each task order condition (see Fig.\\u00a01). As in Experiment 1, the same data preparation and analytic approaches were implemented.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Tab3\"], \"section\": \"Summary statistics and correlations\", \"text\": \"Descriptive statistics revealed the measures of WMC (i.e., OSPAN and RSPAN) were strongly positively correlated with each other as well as moderately positively correlated with measures of SDTP (i.e., math and verbal GREs). These trends are reported in Table 3 below.\\n\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Tab4\", \"Tab4\"], \"section\": \"Threat effects15\", \"text\": \"Only the interaction effect on the second standardized test, the verbal GRE, was trending toward significance at the 0.11 level (see Table 4). Pairwise comparisons indicated that Black students scored lower on the verbal GRE but not significantly so under threat compared with the control (M diff = \\u2212 0.00047, p\\u2009= 0.99). White students tended to score lower under threat compared with control (M diff = \\u2212 0.098, p\\u2009= 0.10). 
All other interaction effects for the quantitative capacity and intelligence threat type were non-significant (see Table 4).\\n\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Fig4\"], \"section\": \"Mediation and moderation\", \"text\": \"For verbal GRE in Step 1, the effects of RSPAN, b\\u2009= 0.00083, t(34) =\\u20090.81, p\\u2009= 0.43, condition, b\\u2009= \\u2212\\u20090.0042, t(34) = \\u2212 0.13, p\\u2009= 0.89 and the model, F(2, 34) =\\u20090.326, Multiple R2\\u2009= 0.0188, p\\u2009= 0.72, were nonsignificant. In Step 2, the effects of RSPAN, b\\u2009= \\u2212\\u20090.0011, t(33) = \\u2212 0.83, p\\u2009= 0.42, condition, b\\u2009= 0.0052, t(33) =\\u20090.165, p\\u2009= 0.87, and the model, F(3, 33) =\\u20091.67, Multiple R2\\u2009= 0.132, p\\u2009= 0.19, were nonsignificant. The interaction of RSPAN and condition was significant, b\\u2009= 0.0041, t(33) =\\u20092.07, p\\u2009= 0.046. Simple slopes tests indicated that under threat, higher WMC participants had higher predicted scores on the verbal GRE compared with those with lower WMC, b\\u2009= 0.0030, p\\u2009= 0.045. There was no significant change in predicted GRE scores in the control condition, b\\u2009= \\u2212\\u20090.0011, p\\u2009= 0.42. Based on the significant interaction, the models in Steps 1 and 2 were tested for a significant change in their variances. 
Analysis of change in model variances indicated there was a significant difference, F(1, 33) =\\u20094.3, p\\u2009= 0.046, providing further evidence for moderation of trait WMC on ST for Black students\\u2019 verbal GRE scores (see Fig.\\u00a04).\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Fn16\"], \"section\": \"Baseline comparison\", \"text\": \"Independent-samples t test found that although White students performed about 4 points higher on average OSPAN, this difference was not significant,16t(83) = \\u2212 1.38, p\\u2009= 0.17, Cohen\\u2019s d\\u2009= 0.31.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Tab4\", \"Tab4\"], \"section\": \"Threat effects17\", \"text\": \"\\u00a0Only a significant race by condition interaction was found on WMC via the RSPAN task (see Table 4). Follow-up tests revealed White students in the threat condition did not differ from White students in the control (M diff =\\u20092.17, p\\u2009= 0.95). In contrast, Black students under threat tended to experience a performance decrease on RSPAN relative to Black students in the control condition; however, this finding was not significant (M diff = \\u2212 11.4, p\\u2009= 0.17). All other interaction effects were nonsignificant (see Table 4).\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Fn18\"], \"section\": \"Threat effects17\", \"text\": \"For math, GRE18 results indicated the effect of race was nonsignificant, F(1, 80) =\\u20092.5, p\\u2009= 0.12, \\u03b7p2\\u2009= 0.035, b\\u2009= \\u2212\\u20090.071, t(80) = \\u2212 1.49, CI95% [\\u2212 0.17, 0.024], p\\u2009> 0.05. 
The effect of condition was also nonsignificant, F(1, 80) =\\u20091.06, p\\u2009= 0.31, \\u03b7p2\\u2009= 0.013, b\\u2009= 0.018, t(80) =\\u20090.52, CI95% [\\u2212 0.051, 0.088], p\\u2009> 0.05.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Fig5\"], \"section\": \"Mediation and moderation\", \"text\": \"In Step 2, predicting math GRE, the effect of OSPAN, b\\u2009= \\u2212\\u20090.0018, t(26) = \\u2212 0.79, p\\u2009= 0.43, was nonsignificant. The effect of condition, b\\u2009= 0.064, t(26) =\\u20091.9, p\\u2009= 0.076, approached significance, indicating that the threat effect predicted higher scores on the math GRE. The interaction, b\\u2009= 0.0054, t(26) =\\u20092.15, p\\u2009= 0.041, and the model, F(3, 26) =\\u20094.10, Multiple R2\\u2009= 0.32, p\\u2009= 0.017, were significant. Simple slopes tests indicated that under threat, higher WMC participants had higher predicted scores on the math GRE compared with those with lower WMC, b\\u2009= 0.0036, p\\u2009= 0.0051. There was no significant change in predicted GRE scores in the control condition, b\\u2009= \\u2212\\u20090.0018, p\\u2009= 0.43 (see Fig. 9). Based on the significant interaction term, an ANOVA was conducted in order to determine whether there was a significant difference in the variance accounted for between these models. Results revealed a significant difference in the variance, F(1, 26) =\\u20094.62, p\\u2009= 0.041, providing additional support for moderation (see Fig.\\u00a05).\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Fn19\", \"CR40\", \"CR39\", \"Fig1\", \"Fig4\", \"Fig5\"], \"section\": \"Discussion\", \"text\": \"There was limited evidence for ST effects impacting performance on outcome measures of WMC and standardized test performance. Only in the case of ST for verbal capacity and intelligence on the RSPAN did these data reveal evidence of a threat effect. Overall, students appeared to be resilient to the effects of ST. 
Here, it was expected that because the student sample came from a less \\u201cselective\\u201d19 population of undergraduate students, ST effects would be found in addition to evidence that ST is moderated by high trait WMC and mediated by state WMC. Instead, there was no evidence to support the notion that state WMC mediates ST; in most cases, no threat effect was revealed. However, there was some evidence supporting our second hypothesis that trait WMC moderates the effect of ST on both math and verbal SDTP, providing a performance benefit for Black students. We believe this could be the case because having higher WMC span corresponds with more domain-general resources, which is less limiting for students with higher WMC span (see Kovacs et al., 2019; Kovacs & Conway, 2016). These greater domain-general resources mean that performance is less impacted for higher WMC span than lower WMC span students when subjected to the cognitive demands of identity-threatening situations like racial/ethnic ST effects. Moreover, as in Experiment 1, higher trait WMC students may be able to remain resilient and take on ST as a challenge depending on how ST is activated and the performance domain. In Experiment 2, however, our students showed resilience to ST effects on the second GRE task, when the performance domain and domain of ST were different (see Figs. 1, 4, and 5).\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Fig6\"], \"section\": \"Mediation and moderation\", \"text\": \"In Step 2, predicting math GRE, the effects of RSPAN, b\\u2009= 0.0025, t(78) =\\u20091.39, p\\u2009= 0.17, condition, b\\u2009= 0.058, t(78) =\\u20091.44, p\\u2009= 0.16, and the interaction, b\\u2009= 0.0026, t(78) =\\u20090.957, p\\u2009= 0.34, were nonsignificant. The model, F(3, 78) =\\u20093.78, Multiple R2\\u2009= 0.127, p\\u2009= 0.014, was significant. We also tested the change in model variances and simple slopes. 
Although there was not a significant change in the model variance accounted for, F(1, 78) =\\u20090.92, p\\u2009= 0.34, under threat, higher WMC participants had higher predicted scores on the math GRE compared with those with lower WMC, b\\u2009= 0.0051, p\\u2009= 0.013; however, there was no significant difference for those in the control condition, b\\u2009= 0.0025, p\\u2009= 0.17. Taken together, these results reveal weak support for trait WMC on the RSPAN moderating the effect of threat on the math GRE (see Fig.\\u00a06).\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"Fig7\"], \"section\": \"Mediation and moderation\", \"text\": \"In Step 2, predicting verbal GRE, the effect of RSPAN, b\\u2009= 0.0014, t(78) =\\u20090.69, p\\u2009= 0.049, was significant; however, condition, b\\u2009= 0.031, t(78) =\\u20090.71, p\\u2009= 0.48, was nonsignificant. The interaction, b\\u2009= 0.0057, t(78) =\\u20091.94, p\\u2009= 0.056, was marginal at the 0.056 level, and the model, F(3, 78) =\\u20093.83, Multiple R2\\u2009= 0.1283, p\\u2009= 0.013, was significant. We also tested the change in model variance, which was also marginally significant at the 0.056 level, F(1, 78) =\\u20093.76, p\\u2009= 0.056. To further unpack this, we examined the simple slopes, which revealed that under threat, higher WMC participants had higher predicted scores on the verbal GRE compared with those with lower WMC, b\\u2009= 0.0070, p\\u2009= 0.0018. There was no significant change in predicted GRE scores in the control condition, b\\u2009= 0.0014, p\\u2009= 0.49. 
Taken together, these results also reveal weak support for trait WMC on the RSPAN moderating the effect of threat on the verbal GRE (see Fig.\\u00a07).\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR47\", \"Tab5\"], \"section\": \"Bayesian regression analyses\", \"text\": \"We used the BayesFactor package in R (Morey & Rouder, 2015) to compute Bayes factors for main-effects-only models and the main effects plus the interaction terms models. Each of these was run compared with a null or intercept only model. To test for moderation, we computed the Bayes factors for the main-effects-only model and compared those to the main effects plus interaction term models. We include the full results of the Bayesian regression moderation analyses for the quantitative capacity condition in Table 5 and discuss the relevant results of these analyses in more detail below.\\n\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR47\", \"Tab6\"], \"section\": \"Bayesian regression analyses\", \"text\": \"We used the Bayes Factor package in R (Morey & Rouder, 2015) to compute Bayes Factors for main-effects-only models and the main effects plus the interaction terms models. Each of these was run compared to a null or intercept only model. To test for moderation, we computed the Bayes Factors for the main-effects-only model and compared those to the main effects plus interaction term models. We include the full results of the Bayesian regression moderation analyses for the verbal capacity condition in Table 6 and discuss the results in greater detail below.\\n\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR31\", \"CR59\", \"CR63\", \"CR64\"], \"section\": \"General discussion\", \"text\": \"The goal of the present work was to investigate WMC as a cognitive factor in relation to performance under ST. 
We explored WMC as both state and trait variables as a mediator and/or moderator of ST for race/ethnicity. Although WMC has been previously explored as a state and trait variable (Ilkowska & Engle, 2010) and previous research has suggested the importance of considering individual differences in WMC with regard to ST (Regner et al., 2010; Schmader & Johns, 2003; Schmader et al., 2008), to our knowledge, exploring ST with both state and trait measures of WMC is novel and has not been investigated for ST for race/ethnicity.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR9\", \"CR43\", \"CR71\", \"CR14\"], \"section\": \"General discussion\", \"text\": \"Based on the results of two experiments, we found that students appeared to be resilient to the effect of ST on standardized test performance\\u2014the ST manipulation did not consistently produce a performance decrement for Black students. We found evidence of ST only in Experiment 2 for the verbal capacity and intelligence condition and only on the state WMC reading span (RSPAN) measure. Because participants appeared to be resilient to ST, we wondered whether this was due to differences in these students\\u2019 prior experience with standardized tests and beliefs about their abilities. Experiment 1 contained an especially highly motivated, high achieving student sample, so we wondered whether there was an overarching implicit belief among these students about high ability that attenuated the effect of ST on performance. This was suspected post hoc based on the method of how ST was induced\\u2014stating that the task was \\u201chighly correlated with measures of intelligence,\\u201d could activate a broader identity among these participants (see Brannon et al., 2015, for more on the shifting of the self-schema in different contexts; also, Logel et al., 2009, for shifting self-schema during ST). 
For example, the identity of being a student at a selective private university could protect students from feeling threatened and incidentally provide performance enhancement rather than decrement. However, based on similar patterns in Experiment 2, this did not seem as plausible but was worth mentioning here, as ST theory asserts the importance of domain identification (see Steele, 1997). In terms of resilience and motivation, previous research found that higher trait resilience, as measured by grit, was moderately but not significantly associated with higher WMC (Dale et al., 2018). At present, these data are unable to disentangle whether differences in students\\u2019 beliefs about ability impact WMC or ST and performance, but future work should investigate this further.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR72\", \"CR31\", \"CR67\", \"CR77\", \"CR17\"], \"section\": \"General discussion\", \"text\": \"Another perspective on these results is that having higher trait WMC resources could allow participants experiencing ST to respond by taking on the standardized test as a challenge. Some previous work suggests that racial minority students are able to perform well on standardized tests when the tests are viewed or characterized as a challenge (Steele & Aronson, 1995). However, if students have more cognitive resources in WMC, they might differ in the ways they focus those resources and plan to complete the task at hand. Previous research on individual differences in WMC shows that higher and lower span individuals differ in their strategies for approaching, performing, and solving different cognitive tasks (see Ilkowska & Engle, 2010; Shipstead et al., 2016; Unsworth et al., 2013; also see Delaney & Sahakyan, 2007). However, our results suggest the ability to do this in the context of ST could also differ depending on whether the test is in the quantitative or verbal domain. 
Additional research is needed to better understand individual differences in WMC and how students respond to racial/ethnic ST in different task domains.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR10\", \"CR27\", \"CR44\", \"CR54\"], \"section\": \"General discussion\", \"text\": \"Another consideration is that we found more cases of moderation when baseline WMC was measured with the RSPAN. This suggests that the predictive power of trait WMC may differ depending on whether OSPAN or RSPAN is used, producing dissociations in the effects observed. In fact, previous research has found that because the OSPAN and RSPAN have different processing components (i.e., solving math problems for accuracy in OSPAN and reading sentences for accuracy in RSPAN), the tasks can dissociate in their consistency for predicting performance on different outcomes (see Chow et al., 2016; Holden et al., 2020; Macnamara et al., 2011; Oberauer, 2009). This interpretation is speculative but worth further exploration.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR63\"], \"section\": \"Limitations and future research\", \"text\": \"Overall, across two experiments, we only found ST in Experiment 2 on the state WMC RSPAN measure, highlighting replication issues. 
It was unclear whether replication issues were due to the language used in our manipulation (based on Schmader & Johns, 2003, stating \\u201cthis task is a measure of quantitative/verbal capacity and is highly related to measures of intelligence\\u201d) or to issues with sample size.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR62\", \"CR61\", \"CR73\", \"CR23\", \"CR72\"], \"section\": \"Limitations and future research\", \"text\": \"We aimed to recruit adequate sample sizes, but due to reported replicability issues of ST (Schimmack, 2016, 2017; also Stricker & Ward, 2004) and potential effect-size inflation due to publication bias (see Flore & Wicherts, 2014), the sample size needed is much higher than originally anticipated (this was unknown to us at the time of planning the current study). For these reasons, we recommend recruiting samples large enough to exceed 80% power based on effect sizes reported in the literature. To help clarify replication issues, we also recommend that future investigations use more direct language, as in the manipulation based on the Steele and Aronson (1995) study (i.e., \\u201cthis task is diagnostic/non-diagnostic of ability\\u201d).\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR15\", \"CR16\", \"CR79\", \"CR30\"], \"section\": \"Limitations and future research\", \"text\": \"The verbal section largely includes vocabulary (i.e., text completion and sentence equivalence) and reading comprehension items. Based on previous research, the reading comprehension items have been linked with working memory and fluid intelligence and thus are likely to be more cognitively demanding than the text completion and sentence equivalence items (Daneman & Carpenter, 1980; De Jonge & De Jong, 1996; also see Vernucci et al., 2021). The quantitative reasoning section includes arithmetic, algebra, geometry, and data analysis problems. 
Previous research suggests that geometry may be the most complex or cognitively demanding among these question types, and differences in gender threat have been shown (see Huguet & Regner, 2007).\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR5\", \"CR6\"], \"section\": \"Limitations and future research\", \"text\": \"In general, more complex questions that require more steps to reach a solution are also more cognitively demanding, so there is probably more to consider in understanding cognitive load across the items and question domains (see Beilock & Carr, 2005; Beilock & DeCaro, 2007). For both the verbal and quantitative test sections, several steps were involved in arriving at correct answers, and some questions had multiple correct answers, making it even more difficult to know which items might be more cognitively demanding than others. Although this is a very interesting point, it is difficult to address and remains outside the scope of the present work; it could be investigated in future work.\"}, {\"pmc\": \"PMC12695912\", \"pmid\": \"40389703\", \"reference_ids\": [\"CR29\", \"CR74\", \"CR34\", \"CR35\", \"CR36\", \"CR58\", \"CR83\", \"CR48\", \"CR49\"], \"section\": \"Conclusion\", \"text\": \"Our current work emphasizes the need to explore and challenge a fundamental assumption of universality of human information processing by factoring in the racial sociocultural context of individuals performing cognitive tasks related to academic achievement (Holden et al., 2023; Thomas et al., 2023). Although inequality and gaps in education and achievement remain, the current work helps provide a better understanding of individual differences in cognitive resources that are available for performance on competitive standardized tests. 
We employed a comprehensive approach, combining experimental and differential methods for investigating ST, to help the field continue to uncover how, when, and why ST operates and how to better combat it. Based on our results, future work should focus on ways to conserve precious mental resources, especially for minority students. As recent work finds that WMC is enhanced through training in general (see Jaeggi et al., 2008, 2011, 2013; Redick et al., 2013) and in more diverse and at-risk samples (see Wong et al., 2024), future work should consider additional forms of cognitive intervention that are helpful for students who are vulnerable to ST for race/ethnicity. For example, mindfulness practices might be an important avenue for future research related to ST because these practices, as forms of self-regulation, have been shown to improve cognitive functioning and WMC and may also improve SDTP (Morrison & Jha, 2015; Mrazek et al., 2013).\"}]"
