I recently had the extremely uncomfortable experience of taking a timed test for employment that wanted me to create a balance table for an experiment with three treatments and having no idea how to do it.
This was uncomfortable because I know, in theory, how a balance table ought to work. I’m trained as a statistician. It’s remarkably embarrassing to stumble on your supposed “core competency.” I wound up turning in some half-assed tables glued together in Excel. Needless to say, I didn’t get a call back.
So, this post is for those of you who find yourself in the unenviable position of having a little too much book knowledge and a little too little practical know-how when it comes to statistical analysis. (Let’s be real – this post is also for me to prove to myself that I do know some of the things!)
To start off: the basic idea of a balance table is that we want to assess whether our randomization worked. We’re interested in assigning people to two or more treatments, and a balance table is a nice check that we haven’t assigned all of the men to treatment 1 and all of the women to treatment 2, or some nonsense like that.
Then, for an experiment that just has treatment and control, we usually just conduct a t-test on a variety of participant traits that we’ve gathered data for.
In the balance table shown below, for the “Incentives Work” paper by Duflo, Hanna, and Ryan, the authors take an equivalent tack – they regress each characteristic on a variable that is 0 for control units and 1 for treated. Then, the coefficient that they get from that regression is just the difference between treatment and control on this trait, and the standard error of that coefficient tells them whether there’s a significant difference between the two groups.
Unfortunately, understanding this doesn’t get us very close to being able to construct a table that’s suitable for publication. And indeed, the replication files for this particular table don’t shed any light on how it’s constructed. My guess is that the log file that the replication do-file spits out is then reformatted by hand for LaTeX?
If we’re handling a regression type output in Stata, we could use estout to package it up nicely, but what if we wanted to more explicitly compare means? Estout doesn’t seem to see t-test outputs, since they’re not considered “estimates.”
Instead, after a t-test, we can type “ret li” (short for “return list”) and that’ll spit out the numbers that we care about. Specifically, we want to preserve r(mu_1) and r(mu_2), our estimates of the means; r(N_1) and r(N_2), the number of observations in each treatment group; and most importantly, either r(t) and r(df_t), the t-stat and its degrees of freedom, or r(p), the probability that our two means are different from each other.
To save these, we’re going to generate results variables.
This creates the following:
|.641||39||.658||41||-.162||78||.871||open||Proportion of Schools Open|
This is honestly already so much better than what I ended up with on this job exam that I feel a little silly not learning it before.
Clearly here, you want to check the balance of more than one variable, so you’d loop over variables and add a local variable to keep tally of which row to put things in. See this article in the Stata Journal for an example of that – I basically pulled the code from that.