The Ask

This morning, I settled down with my cat, Padfoot, and finished reading Mary Roach’s fantastic new book, Grunt: The Curious Science of Humans at War.

It is, like much of her work, laugh-out-loud funny. My favorite part was the chapter about scents where someone declared a scent titled “U.S. Standard Bathroom Malodor” to be “wearable.” My second favorite part was a chapter where Roach plays on the concept of a “missile defense luncheon” by altering the phrase to describe other unpleasant kinds of luncheons. I didn’t know what was happening until the second time I saw it, but once I caught on, I loved it. I would adopt it for everyday life, if I thought anyone would get what I was referring to.

(Side note, and similar to this: in the book Fourth of July Creek, by Smith Henderson, one of Henderson’s characters uses “Wyoming” as a verb to substitute for sobbing. This made such an impression on me that I remembered that turn of phrase but couldn’t remember the details of the story it came from. I only found it by searching “Best Books of 2015” and “Best Books of 2014” until I saw something that looked like it might be the right book.)

In the acknowledgments section of the book, Roach highlights something that I’ve been thinking about a lot lately. She recounts her many asks that allowed this book to happen. “Hey… could you work me into combat simulations where I don’t belong?” and “Could you find someone to approve my spending a few days at sea…?” among others. She expresses gratitude that time after time, people said yes.

This stuck out to me because, as I’ve been working on blog posts for the past ten days, there have already been a few times where it would’ve been handy to ask someone else a question. I’ve thought about surveying political science professors on Facebook to get their sense of when it’s appropriate to schedule appointments. I have some future posts rattling around in my head that would clearly be made stronger and more interesting if I talked to someone first.

I’ve been noticing that all of the media I consume, at one point or another, requires its author to reach out and ask a stranger for a favor. And then to ask them questions!

To be fair, I’ve reached out and asked random people questions before – and when I did, I hid behind the badge of Harvard University. If you can lead an email by saying, “I am a graduate student at Harvard and I’m researching…” then you’ve bought yourself a badge of credibility. I assume the same is true for the folks at NPR and Mary Roach; they never really need to explain why they’re asking a question. They only need to show that it’s part of their job to ask questions.

All this to say: if I’m going to keep working on this for another 354 days, I’d better start getting comfortable with asking questions. The posts are worse for not interacting with anything.

(My second thought is that if you start asking questions, you have to produce a product that you’d be willing to share with the person who donated their time. I’m not sure that I’d be willing to share this blog with anyone as it currently stands. I can’t tell whether I’m being extra thoughtful here or just being cowardly.)


Trees in Grids: A Mystery

About a year and a half ago, I was working on a project that used West African data. We wanted to geocode some of the locations mentioned in our dataset, and so my coworker and I took to Google Maps to search for the cities and villages mentioned.
One evening, a couple of hours past when I really wanted to be at the computer lab, I was looking at satellite imagery of Cote d’Ivoire and saw this:

That caught my eye. There was something very uniform about these trees. I zoomed in.

This definitely isn’t an act of nature.

“So, what? It’s some kind of grove,” you say to yourself, getting ready to click back over to Facebook and check your notifications for the 537th time today. She still hasn’t liked your witty status. She isn’t going to like your witty status.

But hold on a sec, before you go down that rabbit hole of existential despair. This isn’t just one grove. Almost the whole coastline of Cote d’Ivoire is covered in these groves, and they keep going on into Ghana. I found this one in Benin.1


Originally, when I saw this, I thought I was seeing some kind of giant government program. However, that doesn’t seem likely given:

  1. It seems unlikely that Cote d’Ivoire has high enough state capacity to pull off a program of massive tree-planting, just knowing that it recently had a civil war. This wouldn’t necessarily disqualify it, since trees do take a lot of time to grow, but…
  2.  If we look at the border of Ghana and Cote d’Ivoire, there doesn’t seem to be a significant change in the pattern of groves. There’s no obvious sudden reversion to a natural tree pattern.

So, we’re probably looking at groves that are privately owned. They seem pretty damn large to me, though. And most don’t seem to have road access. (By “road” here, I’m including anything that just looks like some kind of worn trail.)

My guess, just from looking at pictures of trees of crops that are common in Cote d’Ivoire, is that these are cassava groves.

I literally just spent the last three hours of my life looking at Google Maps and rubber trees on the Internet, when I knew cassava was a big part of the agriculture in West Africa. Why?

Cassava makes a starchy tuber that can be turned into tapioca flour, and it’s a key component of the diet in Cote d’Ivoire.

artichoke heart
Unless we’re looking at the heart of an artichoke after removing the choke. There is a passing resemblance.

1. I’m not 100% certain that these are the same types of trees. The resolution makes it a little difficult to see the shape of the trees in Cote d’Ivoire. Side note to this footnote: if you look at any city in the United States, you can get down to the point where you can see the stripes that a lawnmower leaves behind on a yard. I wonder how Google Maps decides where to put its high resolution. Are they allowed to use it in China? Cape Town, in South Africa, has resolution equivalent to the US. Belarus does not.


So, I intended to keep this blog as anonymous as possible for as long as possible, but it seems that my instinct is to share, share, share. I have to give away some details. The story demands it. I am a student at Harvard and am currently writing this from a campus computer lab.

It is 12:44 am. I’ve just arrived in the computer lab from watching Alt-J’s concert, but I’m not alone here. There’s someone else watching some kind of video – when I was standing I could see it reflected in the glass behind him. It kept switching scenes, from men talking to cars driving, which gives me the impression that it’s a documentary. I also have that impression because it takes very, very large balls to watch a blockbuster in a computer lab, by yourself, at 12:44 am on a Friday night.1

I personally think it takes considerably fewer balls (smaller balls?) to come in to the computer lab at 12:44 am on a Friday and write a blog post, and here’s why: I know that I can no longer catch a bus (curse you and your Puritanism, Boston!) and I know that a Lyft (not Uber, obvs. Fuck Uber.) will be cheaper in one hour. So this is basically intro to microecon – do something that I would’ve done at home, with my cat, or do it here and save two dollas. I’ll take the two dollas. Sorry, Padfoot.

Anyway. I haven’t gotten around to why I need to reveal my location yet. This is a function of Friday wee hours writing. Sorry.

I have to reveal my location because I need to officially rave about Drink, a cocktail bar in downtown Boston, near South Station. This bar does “bespoke cocktails,” which means you can tell the bartender that you’d like to taste something that smells like the Corpse Flower and they’ll be like, “Hm. Okay.” And they’ll attempt to make you that beverage.

After the Alt-J concert let out, I was on my way to South Station when I remembered Drink. I’ve never been before, despite living here for 3 years, because it’s not easy for me to get to. But now it was! It was only .3 mi, as the Google Maps put it, and I was game.

I waited 20 or so minutes in an un-air-conditioned hallway with a few men who were very interested in why I’d come to this bar alone. 2

Then, I was ushered to a seat. Brit was the bartender, and she immediately told me that it was at the pleasure of the people sitting next to me that I was allowed to join this table. (In retrospect, not sure whether this was true – there’s no reason it should’ve been true – but I liked it. I immediately started talking to those people.)

The people next to me were John, about to be engaged, Tim, and Andarla. At one point, Brit pulled out a massive cube of glassy ice and went at it with a machete. I told her I’d like a drink that tasted like dirt, and she handed me a drink with whiskey, an amaro, and honey (i.e. basically what I make myself at home, which was comforting rather than boring. It felt like she knew me.).
Brit was hacking her giant cube of ice and handed me some of it. I was like, “But do I have to hold this? It is very cold!” She seized it out of my hand and threw it on the floor. Everyone yelled, “Opa!” and then she handed me another ice block and let me throw it to the ground.

At one point, Tim told Brit to imagine that he was Bruce Wayne, coming to the bar to unwind after a long evening of fighting crime. Brit said, “Say no more.” Then, she took her machete to the cork of a champagne bottle, flinging the cork God knows where, and poured everyone in the vicinity a glass of champagne. She said to Tim, after he expressed concerns, “You’ll just pay for your glass. I haven’t had a chance to saber a champagne in a while, so it was mostly for that.”

I noticed her talking before she sabered the champagne. She said to her coworker: “Three – no, four – glasses.” The fourth was for me.

1. I’ve never been sure – is 12:30 am Friday night or is it now Saturday morning? I know, technically, that it’s Saturday morning, but if you said to someone, “Yeah, bro! I was watching documentaries in this Harvard Computer Lab all Saturday morning!” they’d be like, “Why don’t you sleep in like a normal fucking person?” So it’s difficult to tell when the night ends and when the morning begins. Personally, I think the break is when you sleep for 1 hour or more. But if you stayed up until the sun rose, you wouldn’t call that Friday night, surely.

2. I assume you mean well, but there is still nothing more unrepentantly obnoxious than a man asking a young woman why she’s at a bar by herself. Would you ever ask this of a man? I have a strong desire not to have to justify myself, especially when I am having fun. Do not ask me how I can have fun without other people around. Just. Don’t. Do. That. You want to probe into the underlying psychological factors that allow me to feel comfortable alone, you’ll have to go get yourself a Master of Social Work.

Did I Do That?

I’m writing this blog post from a public space. I often enjoy listening to music while I write – I put on Pandora’s “Today’s Alternative Radio” station and am rewarded by music that is generally upbeat and unsurprising. It’s not inspiring, but that means that it’s not distracting.

But just now when I was trying to put this music on, it wasn’t coming from my headphones. Instead, it was coming from the computer.

For me, this is a nightmare scenario. I am playing music and people can hear it. One of my biggest pet peeves is when a person on the bus plays music loud enough on their headphones that I can hear it. The small, tinny rhythm makes it difficult for me to focus on anything else, and I can’t turn up my music loud enough to block it or I’ll be an offender too!

This public space has some regulars, and one of them is an older lady who listens to the news on her computer without any headphones. It enrages me. I steam about the lack of awareness. Doesn’t she see that other people are trying to work here? Does she not care?

So anyway, having music come out of my computer puts me in exactly the same category of rude distractors that I judge. The hunter becomes the prey!1

What’s worse about this is that, usually, this is an error in the order of operations. If music isn’t playing in my headphones, it’s usually because I plugged them in before logging into my account. But this time, I’d plugged them into the wrong port – they were in the mike port.

Here’s why this is terrifying: I was watching tutorial videos about Python for about an hour this morning. Since then, I got up, taking my headphones, and got lunch. So, it’s not necessarily certain that I’ve been playing Python tutorials for an entire group of people today – but there’s this little crumb of doubt now. A persistent voice, whispering, “But what if you plugged your headphones in incorrectly this morning too?”

I’d like to think someone would tell me if I had.

I’d like to ask someone whether I did. But I don’t know anyone here well enough to voice this question, especially since it betrays that I put little stock in my own perception of reality. So what if I believed I was listening to headphones this morning? Isn’t it possible that it was playing out loud?

The uncertainty of others’ perceptions is something that troubles me nearly as much as the possibility that I might’ve been irritating everyone around me. There are an awful lot of questions about what other people perceived that we never get answers to.

Maybe our best bet is to be Ira Glass and the crew of This American Life? They ask friends and family what they perceived or how they felt and then put it on the radio. I’ll come back to this idea in another post.

1. This, in itself, is a good reason not to be so fucking judgmental all the time. You never know when you’re going to err, and if you turn the intense laser beam of your judgment on yourself, you’re going to get burnt.

Creating a Balance Table in Stata (Part 2)

Yesterday’s post showed how to save the output from non-estimate results in Stata to check the balance for an experiment. Today, we’ll talk about how we can show this for slightly more complicated situations.

Situation 1 – You Have Categorical Variables

This is almost the same as the quantitative data we were handling yesterday, but you can’t use a t-test or regression because both of these techniques are developed for continuous quantitative variables. Instead, let’s use Pearson’s Chi-Squared test for independence. The null hypothesis of this test is that our categorical variable is independent from the treatment; that is, knowing the treatment doesn’t give us additional information about the likelihood that the categorical variable takes a value. See this nice little site for more information, if you’re curious.

To calculate the chi-squared stat in Stata, we just use tab treat categorical_var, chi2.

Then, if we ask for the results, ret li, we get the number of observations, the number of rows and columns, our chi-squared stat, and our p-value. If we choose to keep the chi-squared stat, we need to keep the number of rows and columns so that our reader can interpret it. Personally, I’d just save the p-value, since that’s what people will look up with the chi-squared statistic anyway.

To save our p-value, we do the same process as before:

gen double p_value = .
replace p_value = r(p) in 1
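
If you did want to keep the chi-squared stat as well, here’s a rough sketch of what that might look like. The variable names treat and categorical_var are the same placeholders as above, and the scalar and results-variable names (chi2_stat, chi2_df, and so on) are ones I’ve made up for illustration. I’m copying the results into scalars right away so that nothing run in between can overwrite them.

```stata
* Chi-squared test of independence between treatment and a categorical trait
tabulate treat categorical_var, chi2

* Copy the stored results into scalars immediately after the test
scalar chi2_val = r(chi2)
scalar chi2_dof = (r(r) - 1) * (r(c) - 1)   // (rows - 1) * (columns - 1)
scalar chi2_p   = r(p)

* Results variables (hypothetical names), filled in on the first row
generate double chi2_stat = .
generate double chi2_df   = .
generate double chi2_pval = .
replace chi2_stat = chi2_val in 1
replace chi2_df   = chi2_dof in 1
replace chi2_pval = chi2_p   in 1
```

Storing the degrees of freedom rather than the raw row and column counts gives the reader the one number they need to interpret the statistic.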

Situation 2 – You Have Three or More Treatment Arms

This is the situation I found myself in, as I was taking this employment test. I was paralyzed – I knew that a t-test was standard for comparing treatment and control, but I wasn’t sure what’s considered “standard” when you have more than two groups to compare.

However, regardless of what’s “standard,” statistics has a firm answer to this: you’re going to need an ANOVA, short for analysis of variance. This basically compares the variance that exists within groups to that which exists between them – if there’s far more variance between the groups than within them, that suggests to us that maybe these groups are statistically different.

To run an ANOVA in Stata, you’d use anova variable_to_check treatment_indicator. But if we use ret li here to try and save our information, we’ll notice that we haven’t got any of the results that we care about!

Since ANOVA is classified under “Linear models and related” in Stata, we’ll instead need to look at eret li. Then, we’ll see that we have almost all of our results, with the notable (and frankly, perplexing) absence of our p-value. However, we can generate this by setting the cell we want to put it in equal to Ftail(e(df_m), e(df_r), e(F)).
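
As a minimal sketch of those steps, assuming a one-way ANOVA with the placeholder names variable_to_check and treatment_indicator from above (the results-variable names f_stat and anova_p_value are made up for illustration, and should not already exist in the dataset):

```stata
* One-way ANOVA of a baseline trait on the treatment indicator
anova variable_to_check treatment_indicator

* anova is an e-class command, so its results live in e() rather than r()
ereturn list

* Upper-tail probability of the F distribution, i.e. the p-value of the F test
scalar anova_p = Ftail(e(df_m), e(df_r), e(F))

* Results variables (hypothetical names), filled in on the first row
generate double f_stat = .
generate double anova_p_value = .
replace f_stat = e(F) in 1
replace anova_p_value = anova_p in 1
```

With a single treatment factor, the overall model F in e(F) is the same as the F test for the treatment itself, which is why this shortcut works here.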

Note: if you’ve got categorical variables and three or more treatment arms, you’d still use a chi-squared test. That test can handle multiple treatments.

One Wrap-Up Note

I briefly considered leaving this for another blog post, since I’m reaching my word limit here, but I’ll be real: I have a strong desire not to write about balance tables for a third day in a row. So here’s a bit of a warning: plenty of people have noted that it doesn’t actually make sense to do a balance check for an experiment.

The whole point of statistics like the t-stat, ANOVA’s F-stat, and so on is that they’re meant to give us information about the population as a whole from the sample that we’re looking at. However, when we’re trying to deduce whether the treated group is different from the control, there is no larger population to make inferences about. This is it. So, if the proportion of men in control is not equal to the proportion of men in treatment, we know that our randomization is not balanced on gender!

Then, the question should be: what is a large enough difference along one covariate to be evidence that this isn’t random? I can’t remember off the top of my head whether I’ve ever learned this, but my intuition is that you could figure this out by bootstrapping.

Here’s what I’m imagining: you’d throw all of your observations into a bucket and you’d draw from that bucket with replacement to make one group that is marked as “control” and another group that is marked as “treatment.” You could repeat this 10,000 times and then compare your data to this generated set of distributions – if your data looks pretty weird compared to the generated data, then that’s indicative that something went wrong. But I’m just spit-balling. We’d have to prove it.
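
A close cousin of this idea can be sketched with Stata’s permute command. To be clear about the swap: this is a permutation test, which reshuffles the existing treatment labels without replacement, rather than the draw-with-replacement scheme described above. The sketch assumes a two-arm experiment with a 0/1 variable called treat and a trait called variable_to_check; the seed is arbitrary.

```stata
* Observed difference in means between the two arms
ttest variable_to_check, by(treat)

* Reshuffle the treatment labels 10,000 times, recomputing the difference
* in means each time, and compare the observed difference to that spread
permute treat diff=(r(mu_1)-r(mu_2)), reps(10000) seed(12345): ///
    ttest variable_to_check, by(treat)
```

The output reports how often the reshuffled differences were at least as extreme as the one in the real data, which is one way of asking whether the data “looks pretty weird.”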

Second point: yes, balance checks don’t make a whole lot of sense for experiments. However, they certainly do make sense for natural experiments. If you have something distributed as-if-randomly, it’s a good idea to check it.

Third point: I’m not sure whether checking that the means are not significantly different is the best way to do this. Why don’t we check that the distributions aren’t significantly different? We do have statistics for that…
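
One candidate, for a continuous trait and exactly two groups, is the two-sample Kolmogorov-Smirnov test. Here is a small sketch, again assuming placeholder names treat and variable_to_check; I’m not claiming this is the best choice, just that it is one of the statistics available.

```stata
* Two-sample Kolmogorov-Smirnov test of equality of distributions
ksmirnov variable_to_check, by(treat)

* r(D) is the largest gap between the two empirical CDFs; r(p) is its p-value
scalar ks_D = r(D)
scalar ks_p = r(p)
display "KS statistic = " ks_D ",  p-value = " ks_p
```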

Creating a Balance Table in Stata (Part 1)

I recently had the extremely uncomfortable experience of taking a timed test for employment that wanted me to create a balance table for an experiment with three treatments and having no idea how to do it.

This was uncomfortable because I know, in theory, how a balance table ought to work. I’m trained as a statistician. It’s remarkably embarrassing to stumble on your supposed “core competency.” I wound up turning in some half-assed tables glued together in Excel. Needless to say, I didn’t get a call back.

So, this post is for those of you who find yourself in the unenviable position of having a little too much book knowledge and a little too little practical know-how when it comes to statistical analysis. (Let’s be real – this post is also for me to prove to myself that I do know some of the things!)

To start off: the basic idea of a balance table is that we want to assess whether our randomization worked. We’re interested in assigning people to two or more treatments, and a balance table is a nice check that we haven’t assigned all of the men to treatment 1 and all of the women to treatment 2, or some nonsense like that.

Then, for an experiment that just has treatment and control, we usually just conduct a t-test on a variety of participant traits that we’ve gathered data for.

In the balance table shown below, for the “Incentives Work” paper by Duflo, Hanna, and Ryan, the authors take an equivalent tack – they regress each characteristic on a variable that is 0 for control units and 1 for treated. Then, the coefficient that they get from that regression is just the difference between treatment and control on this trait, and the standard error of that coefficient tells them whether there’s a significant difference between the two groups.

Duflo balance table
It sure would be nice if I knew how to make one of these…
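
The regression version of the check described above can be sketched like this. It assumes a 0/1 variable named treat and a trait named variable_to_check (both placeholders), and it is not taken from the paper’s replication files.

```stata
* Regress the trait on the treatment dummy (0 = control, 1 = treated)
regress variable_to_check treat

* The coefficient on treat is the treated-minus-control difference in means
scalar reg_diff = _b[treat]
scalar reg_se   = _se[treat]

* Two-sided p-value from the t distribution with the residual degrees of freedom
scalar reg_p = 2 * ttail(e(df_r), abs(reg_diff / reg_se))
display "difference = " reg_diff ",  se = " reg_se ",  p = " reg_p
```

With default standard errors this should match an equal-variance t-test, except that ttest reports the first group minus the second, so the sign of the difference is flipped.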

Unfortunately, understanding this doesn’t get us very close to being able to construct a table that’s suitable for publication. And indeed, the replication files for this particular table don’t shed any light on how it’s constructed. My guess is that the log file that the replication do-file spits out is then reformatted by hand for LaTeX?

If we’re handling a regression type output in Stata, we could use estout to package it up nicely, but what if we wanted to more explicitly compare means? Estout doesn’t seem to see t-test outputs, since they’re not considered “estimates.”

Instead, after a t-test, we can type “ret li” (short for “return list”) and that’ll spit out the numbers that we care about. Specifically, we want to preserve r(mu_1) and r(mu_2), our estimates of the means; r(N_1) and r(N_2), the number of observations in each treatment group; and most importantly, either r(t) and r(df_t), the t-stat and its degrees of freedom, or r(p), the two-sided p-value (the probability of seeing a difference at least this large if the two means were actually equal).

To save these, we’re going to generate results variables.

Code for balance table
Still to come: I learn how to use WordPress so that I can type code straight in instead of print-screening it like some kind of goon.

This creates the following:

mean_1  n_1  mean_2  n_2  t_stat  df_t  p_value  varname  varlabel
.641    39   .658    41   -.162   78    .871     open     Proportion of Schools Open
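
Since the code itself only survives as a screenshot, here is a reconstruction of roughly what it would need to do to produce that row. This is a sketch under assumptions, not the original code: I’m assuming the treatment variable is called treat and that open already carries the variable label shown above.

```stata
* Two-sample t-test of the trait across the two groups
ttest open, by(treat)

* Empty results variables to hold one row per trait
generate double mean_1  = .
generate double n_1     = .
generate double mean_2  = .
generate double n_2     = .
generate double t_stat  = .
generate double df_t    = .
generate double p_value = .
generate str32 varname  = ""
generate str80 varlabel = ""

* Copy the stored t-test results into the first row
replace mean_1   = r(mu_1) in 1
replace n_1      = r(N_1)  in 1
replace mean_2   = r(mu_2) in 1
replace n_2      = r(N_2)  in 1
replace t_stat   = r(t)    in 1
replace df_t     = r(df_t) in 1
replace p_value  = r(p)    in 1
replace varname  = "open"  in 1
replace varlabel = "`: variable label open'" in 1

list mean_1 n_1 mean_2 n_2 t_stat df_t p_value varname varlabel in 1
```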

This is honestly already so much better than what I ended up with on this job exam that I feel a little silly not learning it before.

Clearly here, you want to check the balance of more than one variable, so you’d loop over variables and add a local variable to keep tally of which row to put things in. See this article in the Stata Journal for an example of that – I basically pulled the code from that.
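
A rough sketch of that loop is below. It is my own paraphrase of the general pattern rather than the article’s code, the trait names age and female are hypothetical, and it assumes the results variables don’t exist yet and that the dataset has at least as many rows as traits.

```stata
* Hypothetical list of baseline traits; swap in your own variable names
local traits open age female
local n_traits : word count `traits'

* Empty results variables, one row per trait
generate double mean_1  = .
generate double mean_2  = .
generate double t_stat  = .
generate double p_value = .
generate str32 varname  = ""

* The local macro row keeps tally of which row the next result goes in
local row = 1
foreach v of local traits {
    quietly ttest `v', by(treat)
    replace mean_1  = r(mu_1) in `row'
    replace mean_2  = r(mu_2) in `row'
    replace t_stat  = r(t)    in `row'
    replace p_value = r(p)    in `row'
    replace varname = "`v'"   in `row'
    local ++row
}

list varname mean_1 mean_2 t_stat p_value in 1/`n_traits'
```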

When to Talk to Professors?

After looking at Enos’s replication files, it kind of looks like I would have to ask him for the property record data if I wanted to redo his analysis on only people who’d recently moved in, so that is, for today at least, a hard pass. Although, it probably would be easy enough to check out, if we did have the data?

This is something that I’ve struggled with throughout graduate school – when do you have enough information to meet with or talk to a professor? Is it sufficient that you have an idea about another analysis that could be run on data that you suspect they have, or that you just like a paper that they wrote? Or is it the other side of the spectrum, where you should only talk to them if you’ve completed the lit review and theory section of your paper and all you need is their expert advice and their data?

I’ve always tended towards the “never, ever speak to a professor”-side of things, but if I were to write down the top five reasons I’m considering dropping out of graduate school, “Lack of Sufficient Support from Professors” would probably be number four. It isn’t a far stretch to guess that there might be some relation between these things.

I took to Google to see what The Internet had to say about when it’s appropriate to schedule meetings with professors, as a graduate student, but The Internet was disappointingly quiet on the matter. The closest I got to an answer was this:

Manage Your Advisors.

Keep your advisors aware of what you are doing, but do not bother them. Be an interesting presence, not a pest.

Stephen C. Stearns, Ph.D.

So, that’s not super encouraging. Being an “interesting presence” seems like an especially large ask.

The nice thing about being overly shy about visiting professors is that I end up catching at least some “thought gaps” before running them by someone. For instance, I just realized that I would really need to be able to compare a person’s voting record from before their move to their voting record after their move to assess whether turnout stayed the same.

We also would have to consider that people who move to a new place are just less likely to vote in that new place, in the first election (I don’t know whether this is empirically true, but it certainly sounds like it). So, we might think that we’re observing reduced turnout because someone has stopped feeling racial threat, but we might actually be observing it because they’re new in the neighborhood and haven’t really gotten their bearings.

I’m not really sure where that leaves me on this. Perhaps back at the original conclusion of my post, “On the Move”: we need a natural experiment where some people, kind of randomly picked, are forced to move, and others are not.

Maybe looking at urban renewal is the right tack, and we’re just focusing on the wrong residents?