To block or not to block, that is the question

by Nicole E. Pashley

on Mar 6, 2023 · 10 min read · multisite trials Neyman blocking ·

If you’ve ever taken a statistics course, you’ve likely heard the famous saying, typically attributed to George Box,

“Block what you can; randomize what you cannot.”

But is this advice always sound? Or are there situations where blocking will not help, or may even cause harm? In this post we dig into when and why this saying can move from being absolutely true to only mostly true.

First, a quick review: Experiments and randomization

In an experiment, the researcher randomly assigns units to treatment and control groups. Under complete randomization, a fixed number of units are randomly selected to receive treatment. For example, the experimenter may put 50 names in a hat and let the first 25 names pulled out be assigned to treatment and the rest assigned to control. Randomization assures us that on average our treatment and control groups look similar. However, things can go wrong. For example, in a medical study we may end up, by chance, with most of the elderly patients receiving active treatment and most of the young patients receiving placebo. Comparing these two groups would likely give us a poor estimate of the treatment effect. Although our estimator is unbiased due to the on-average balance, we may have large variability, leading us to have estimates that are often far from the truth.
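As a minimal sketch of the "names in a hat" procedure, the following Python snippet shuffles the units and treats the first ones drawn. The function name `complete_randomization`, the unit labels, and the seed are illustrative assumptions, not part of any particular package.

```python
import random

def complete_randomization(units, n_treated, seed=None):
    """Assign exactly n_treated units to treatment and the rest to control.

    Assumes the unit labels are unique and hashable.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)                  # mix the names in the hat
    treated = set(shuffled[:n_treated])    # the first names pulled out
    return {u: ("treatment" if u in treated else "control") for u in units}

# Hypothetical example: 50 names, 25 assigned to treatment.
names = [f"person_{i}" for i in range(50)]
assignment = complete_randomization(names, n_treated=25, seed=2023)
n_in_treatment = sum(arm == "treatment" for arm in assignment.values())
```

The number treated is fixed by design, but nothing in this procedure controls who ends up treated, which is why chance imbalances on covariates such as age remain possible.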

What is blocking, and why might it help?

We can be more clever and ensure better balance than in complete randomization, at least with respect to certain covariates, by block randomizing. To use blocking, we first break units into groups, called blocks, based on some feature(s) or covariate(s) believed to be predictive of outcomes. Then we randomize within each block. This ensures that for each group of similar units (now grouped together in a block), some are assigned to control and some are assigned to treatment. Hopefully, this improved balance will reduce variance, i.e. increase precision, of our treatment effect estimators. In our medical example, for instance, if we block on age we will always have the same mix of young and old in each treatment group.
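The idea of randomizing within each block can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `block_randomization`, the patient labels, and the block sizes are all hypothetical, and rounding the number treated per block is a simplification that only behaves well when the treated proportion divides each block size evenly.

```python
import random
from collections import defaultdict

def block_randomization(block_of, prop_treated=0.5, seed=None):
    """Randomize separately within each block.

    block_of maps each unit to its block label.
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for unit, block in block_of.items():
        members[block].append(unit)

    assignment = {}
    for block, units in members.items():
        rng.shuffle(units)
        # Simplification: assumes prop_treated * block size is a whole number.
        n_treated = round(prop_treated * len(units))
        for i, unit in enumerate(units):
            assignment[unit] = "treatment" if i < n_treated else "control"
    return assignment

# Hypothetical medical example: 20 elderly and 30 young patients, blocked on age.
block_of = {f"patient_{i}": ("elderly" if i < 20 else "young") for i in range(50)}
assignment = block_randomization(block_of, prop_treated=0.5, seed=2023)

elderly_treated = sum(
    1 for u, arm in assignment.items()
    if block_of[u] == "elderly" and arm == "treatment"
)
young_treated = sum(
    1 for u, arm in assignment.items()
    if block_of[u] == "young" and arm == "treatment"
)
```

Whatever the seed, half of the elderly patients and half of the young patients are treated, so the age imbalance described earlier cannot occur.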

But is Box’s advice always good? Should we always block when we can? In other words, will a blocked design always reduce variance compared to a completely randomized design?

Types of blocks

To assess whether Box was right, we first need to recognize that not all blocked designs are created equal. In fact, not all blocks are even created the same way! In Pashley and Miratrix (2021), we identify three primary ways blocks are formed:

Fixed blocks: Imagine blocking on a single categorical covariate. For example, if you block on “age group”, you might have the following four blocks: “Children (0-14 years)”, “Youth (15-24 years)”, “Adults (25-64 years)”, and “Seniors (65+ years)”. Here there is a fixed total number of blocks (four in the example) and each unit in the population or sample belongs to exactly one of these blocks based on its covariate value (a 12-year-old would belong to the “Children” block).
Structural blocks: Here the units have some natural grouping or structure such that there are many groups, or blocks, and each block has a fixed and finite number of members. Think of a twin study in which each pair of twins is one block with shared genetics. This blocking is also common in education where treatments may be randomized within classrooms, schools, or school districts. Another example is multisite trials where sites, such as villages or hospitals, can be thought of as blocks. These types of blocks may be used due to practical convenience in randomization rather than as a means to lower variance. As a note, this type of block may be more commonly thought of as clusters, but they are better described as blocks here because assignment occurs at the unit level (rather than the cluster level).
Flexible blocks: Imagine trying to group units “as best we can” based on many covariates or a continuous covariate. Here there is not a fixed number of blocks or even fixed block membership. Instead, block structure depends on the covariate distribution of the observed sample. This type of blocking is common in practice but understudied in the literature. One example: grouping a set of schools based on their similarity on prior test scores, geographic region, and poverty levels prior to randomization.
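To make the flexible-block idea concrete, here is one very simple way to form blocks from a continuous covariate: sort the units and cut the sorted list into consecutive groups. This is only an assumed illustration (the function name `form_blocks_by_sorting`, the school labels, and the scores are hypothetical); real applications often use matching or clustering algorithms on several covariates.

```python
def form_blocks_by_sorting(covariate, block_size):
    """Group units with similar covariate values into blocks.

    covariate maps each unit to a numeric value. Units are sorted by that
    value and cut into consecutive blocks of (at most) block_size units.
    """
    ordered = sorted(covariate, key=covariate.get)
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]

# Hypothetical prior test scores for six schools.
prior_scores = {
    "school_A": 61.0, "school_B": 88.5, "school_C": 59.5,
    "school_D": 90.0, "school_E": 72.0, "school_F": 70.5,
}
blocks = form_blocks_by_sorting(prior_scores, block_size=2)
```

Note that block membership here is a function of the observed sample: adding or swapping a single school could change which schools are paired, which is what distinguishes flexible blocks from fixed ones.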

Performance of blocking: Overview

With these block types in hand, we can consider their performance in two different settings: the finite sample and the superpopulation. We can summarize our findings of when blocking has the potential to harm or help precision compared to complete randomization in the following table (adapted from Pashley and Miratrix (2022)):

Setting	Block type	Blocking can harm?	Blocking can help?
Finite sample	Any	✔	✔
Finite sample: asymptotic	Fixed blocks	✖	✔
Finite sample: asymptotic	Structural blocks	✔	✔
Stratified random sampling	Fixed blocks	✖	✔
Random sampling of blocks	Structural blocks	✔	✔
Simple random sampling	Flexible blocks	✖	✔

The first three rows of the table summarize finite sample results of when a blocked design could be harmful (i.e., decrease precision) compared to a completely randomized design. In the finite sample setting, we take the units in the experiment as fixed and aim to estimate the average treatment effect for just those units. For example, in an education experiment we may perform inference only for the schools in our experiment.

The superpopulation settings are the last three rows. In these, we assume that the sample of units in our experiment is randomly drawn from some larger (typically assumed infinite) population to which we wish to generalize results. For example, in an education experiment we may wish to generalize inference from schools in our experiment to all schools in the country. Hence, when targeting a superpopulation we have two sources of randomness: the random assignment of units to treatments and the random sampling of units into the experiment. This implies that we need to assume some sampling mechanism (i.e., how units are selected to be in the experiment) to derive results. The sampling mechanism typically differs based on block type and is indicated in the table as well. For flexible blocks, simple random sampling of units is natural. For structural blocks, random sampling of the blocks themselves makes the most sense. And for fixed blocks, researchers typically assume stratified random sampling (i.e., independent random sampling of units for each block). The last three rows of the table summarize superpopulation results of when a blocked design could be harmful (i.e., decrease precision) compared to a completely randomized design. We next delve into these results, noting that we make the simplifying assumption that the proportion of treated units is the same within all blocks and is equivalent to the proportion we would use under a completely randomized design. See Pashley and Miratrix (2022) for discussion of the unequal proportions case.

Some further details

Regardless of block type, for finite-sample inference, blocking is not guaranteed to improve the precision of our estimators. Blocking will improve the precision of our estimators if a measure of within-block variation is smaller than a measure of between-block variation. In other words, if the units within our blocks are very similar but the blocks look quite different from each other in terms of the units’ outcomes, blocking will be helpful. See Pashley and Miratrix (2022) for the full mathematical expressions of the differences in variances (and therefore the potential benefit or harm of blocking). However, if we think of asymptotics, i.e., what happens as our sample size grows, the answer for the finite sample becomes more nuanced.
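The within-versus-between comparison can be checked numerically for a small, fully hypothetical finite sample. The sketch below uses the standard Neyman finite-sample variance of the difference in means, S1²/n1 + S0²/n0 − Sτ²/n, applied once to the whole sample (complete randomization) and once within each block with weights (n_k/n)² (blocked design). It assumes the same treated proportion `p` in every block; the function names and potential outcomes are made up for illustration and are not taken from the paper.

```python
from statistics import variance  # sample variance with divisor n - 1

def var_diff_in_means(y1, y0, p):
    """Neyman finite-sample variance of the difference in means when a
    fraction p of these units is treated under complete randomization."""
    n = len(y1)
    tau = [a - b for a, b in zip(y1, y0)]  # unit-level treatment effects
    return variance(y1) / (p * n) + variance(y0) / ((1 - p) * n) - variance(tau) / n

def var_complete(blocks, p):
    """Ignore the blocks and completely randomize the pooled sample."""
    y1 = [v for b1, _ in blocks for v in b1]
    y0 = [v for _, b0 in blocks for v in b0]
    return var_diff_in_means(y1, y0, p)

def var_blocked(blocks, p):
    """Randomize within each block; blocks are weighted by their share of units."""
    n = sum(len(b1) for b1, _ in blocks)
    return sum((len(b1) / n) ** 2 * var_diff_in_means(b1, b0, p) for b1, b0 in blocks)

# Hypothetical potential outcomes: two blocks that differ a lot from each other
# but are homogeneous inside, with a constant treatment effect of 5.
young_y0 = [10.0, 12.0, 14.0, 16.0]
old_y0 = [30.0, 32.0, 34.0, 36.0]
blocks = [
    ([y + 5.0 for y in young_y0], young_y0),
    ([y + 5.0 for y in old_y0], old_y0),
]

v_complete = var_complete(blocks, p=0.5)
v_blocked = var_blocked(blocks, p=0.5)
```

In this made-up sample the between-block differences dominate, so `v_blocked` is far smaller than `v_complete`. Shuffling the same outcomes so that both blocks contain a similar mix would shrink or reverse that gap.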

For fixed blocks, if we consider the asymptotics of finite sample inference or if we consider the superpopulation setting, blocking will be beneficial or at least not harmful in terms of precision. Therefore, with fixed blocks, for instance blocking on age category, we should feel fairly safe that blocking will likely improve precision. Note that in terms of asymptotics, with fixed blocks it is natural to imagine the total number of blocks remaining fixed but the size of each block growing. The benefits of blocking are fairly similar for flexible blocks, though the asymptotics are more difficult due to dependence on the algorithm being used to do the blocking. But for fixed or flexible blocks in the superpopulation, even if we are blocking on a covariate (or multiple covariates) that is not predictive of outcomes, we have guarantees that the variance under a blocked design will be no larger than the variance under a completely randomized design.

Structural blocks are different: we find no guarantee that blocking does no harm in terms of precision in any setting (finite, asymptotic, or superpopulation). In fact, the harm of blocking in finite sample settings is most easily illustrated with structural blocks. For instance, imagine conducting an experiment to study the impact on academic performance of an online educational program within a group of elementary schools where we block on classrooms. That is, within each classroom in the study, some students are randomly assigned to receive the online program and the rest of the students within the classroom do not receive the online program. This type of blocking would help remove teacher effects, but if those effects are not large we may end up in a situation where blocking can actually cause harm.

Specifically, harm due to classroom blocking is possible when elementary schools attempt to assign students across classrooms such that each classroom has a similar distribution of high and low achieving students. Such classroom assignment leads to substantial heterogeneity in terms of academic outcomes within the classroom, but the classrooms all look similar to each other (low across-classroom heterogeneity). In other words, if schools are working to make classrooms look the same as each other, we are in exactly the situation we don’t want when blocking! With structural blocks, the natural asymptotic framework is for the number of blocks to grow while the block sizes remain the same, i.e., adding more blocks to the sample. If we are adding more “bad” blocks to our sample, as in our classroom example, increasing the sample size will not fix the harm of blocking in relative terms: the blocked variance stays larger than the completely randomized variance by a similar proportion.

Although structural blocks provide the easiest means to create hypothetical “bad” situations, structural blocks can be beneficial if they create homogeneous groups of units such that the within-block variation is lower than the across-block variation. They are often blocks of convenience (or necessity) due to the realities of an experimental design. Therefore, we advise some mild caution with these types of blocks and encourage careful consideration of subject matter knowledge for the problem at hand. However, in Pashley and Miratrix (2022) we find in simulations and numerical explorations that the potential harms of blocking tend to be small and the gains large.
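The classroom story can be explored with a small Monte Carlo sketch. Everything here is a hypothetical construction: ten classrooms of four students, each classroom deliberately given the same mix of scores, and a treatment with no effect at all, so any difference between designs comes purely from how assignment interacts with the block structure. The function names, scores, number of draws, and seed are assumptions for illustration.

```python
import random
from statistics import mean, pvariance

def diff_in_means(scores, treated_idx):
    """Difference in mean outcome between treated and control units."""
    treated = [scores[i] for i in treated_idx]
    control = [s for i, s in enumerate(scores) if i not in treated_idx]
    return mean(treated) - mean(control)

def simulate(n_classrooms=10, n_draws=4000, seed=1):
    rng = random.Random(seed)
    classroom = [40.0, 60.0, 80.0, 100.0]   # same mix of achievers in every classroom
    scores = classroom * n_classrooms       # student i sits in classroom i // 4
    n, m = len(scores), len(classroom)

    complete, blocked = [], []
    for _ in range(n_draws):
        # Complete randomization: treat half of all students, ignoring classrooms.
        complete.append(diff_in_means(scores, set(rng.sample(range(n), n // 2))))
        # Blocked design: treat half of the students within each classroom.
        idx = set()
        for k in range(n_classrooms):
            idx.update(rng.sample(range(k * m, (k + 1) * m), m // 2))
        blocked.append(diff_in_means(scores, idx))

    # Empirical variance of the estimator across repeated randomizations.
    return pvariance(complete), pvariance(blocked)

var_complete, var_blocked = simulate()
```

With classrooms built to look alike, the simulated variance under blocking comes out larger than under complete randomization. Rerunning with more classrooms of the same composition (for example `simulate(n_classrooms=50)`) lowers both variances but should leave their ratio roughly unchanged, which is the sense in which adding more "bad" blocks does not repair the harm.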

Summary

In summary, when considering the benefits of blocking, we urge the practitioner to take into account the type of blocking being done. Throughout this discussion we have focused on the case with equal proportions treated across blocks. The story becomes even more complicated if we allow these proportions to vary: as a brief rule of thumb, unequal proportions usually make the cost of blocking higher, meaning it is easier to land in a context where blocking is not as beneficial if the blocking is not meaningfully isolating variation. Overall, we have found that if blocks are formed in a sensible manner using subject matter knowledge, the gains of blocking should outweigh the potential harm.

We therefore propose the following small update to Box’s famous quote:

“Block what you can; randomize what you cannot.” – George Box

“But use extra caution with structural blocks.” – Nicole Pashley & Luke Miratrix

Acknowledgements

Some materials in this blog post are taken from Pashley and Miratrix (2022) and Pashley and Miratrix (2021). For more details, including mathematical expressions, what happens when you ignore blocking, and more exploration of flexible blocks, see Pashley and Miratrix (2022). Thanks to the CARES lab and writing group for reading and review. And thanks to ChatGPT for converting the table from LaTeX to R Markdown. Image credit to pixabay.com.

References

Pashley, Nicole E, and Luke W Miratrix. 2021. “Insights on Variance Estimation for Blocked and Matched Pairs Designs.” Journal of Educational and Behavioral Statistics 46 (3): 271–96.

———. 2022. “Block What You Can, Except When You Shouldn’t.” Journal of Educational and Behavioral Statistics 47 (1): 69–100.