Margin of Error #5: Don't just preserve the long-form census. Set its data free

I suspect you already know why I think scrapping the long-form census is a terrible idea. Good data is good for society. Done right, statistical research keeps us all honest, forcing us to interact with the world as it actually is, rather than imagining ourselves as part of a reality that is personally or politically convenient. Survey research is plagued with selection bias, and the only institution with the power to gather high-quality data for social science is Statistics Canada. The government’s purported privacy concerns with the long form are justified by a set of preposterous (and ideologically motivated) myths.

But if I’ve been reluctant to argue this at length, it’s because StatsCan has never done much to earn my goodwill. As a journalist interested in statistics, I have come to expect frustration and disappointment when dealing with StatsCan. That’s why I hope that we can take this opportunity to talk about how it could be better—rather than fighting blindly for the status quo.

The most serious problem with Canada’s data authority is access to data, or more accurately, the lack thereof. And all of the restrictions on access, going back many years, have been justified by some extremely strange concerns around privacy. Sound familiar?

On Statistics Canada’s website, you’ll find a variety of publicly available summary statistics. If you’re willing to pay a more or less reasonable fee, you can buy access to other simple time series, such as the unemployment rate going back several decades. But while subject- and neighbourhood-level summaries (or “aggregate data”) can be useful in some contexts, the most valuable census product is “microdata”: individual-level results, coded and cleaned up so that anyone with statistical software can create their own summaries or run regressions.

A microdata record for a given household can contain a lot of personal information. But I’ve been up to my elbows in U.S. Census microdata, analyzing some of the most sensitive information it contains—right down to sexual orientation and income—and I’m confident asserting that nobody has ever recognized themselves or a neighbour in a census record.

That’s because when it’s creating public-use microdata, the U.S. Census Bureau modifies records in specific ways that further obscure people’s identities without affecting researchers’ analysis. They only provide broad location information, for example. As a result, public-use records are specific enough to be useful, but not refined enough to be identifiable.
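To make the idea concrete, here is a minimal sketch of that kind of coarsening. It is not the Census Bureau's actual procedure; the record types, field names, and sample values are all hypothetical, chosen only to show how a public-use record can drop fine-grained location while keeping the variables a researcher needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawRecord:
    """Hypothetical internal record with fine-grained location."""
    street_address: str
    city: str
    state: str
    household_income: int


@dataclass(frozen=True)
class PublicRecord:
    """Hypothetical public-use record: broad location only."""
    broad_area: str
    household_income: int


def to_public_use(raw: RawRecord) -> PublicRecord:
    # Keep only the broadest geography; the address and city are dropped.
    return PublicRecord(broad_area=raw.state,
                        household_income=raw.household_income)


# Made-up example values, for illustration only.
raw = RawRecord("12 Example St", "Springfield", "IL", 52_000)
public = to_public_use(raw)
print(public)
```

The income figure survives untouched, so an analysis of income by broad area gives the same answer on either version of the record; only the ability to pin the household to a street is lost.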

That, presumably, is why the U.S. government is comfortable posting large samples from its public-use microdata on the ungated web. Anyone can download detailed census microdata for 14 million Americans, and even more exhaustive American Community Survey and General Social Survey data.

Public-use microdata samples exist in Canada as well, but you have to be affiliated with an approved university to access them, and the data is substantially degraded in the name of privacy. There is a lot that you can’t do with the Canadian data, because so many useful variables aren’t included, and because 2.7% of the Canadian population is a pretty small sample if you’re already researching a tiny minority, like, say, lesbian families. That’s how, as an undergraduate, I ended up studying the American census even though I could download Canadian microdata.
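A rough back-of-the-envelope calculation shows why the sampling fraction matters so much for small groups. Only the 2.7% figure comes from the paragraph above; the population size and the subgroup's share are hypothetical round numbers picked for illustration, not census results.

```python
def expected_sample_records(population: int,
                            subgroup_share: float,
                            sampling_fraction: float) -> float:
    """Expected number of subgroup records in a simple random sample."""
    return population * subgroup_share * sampling_fraction


# Hypothetical inputs, for illustration only.
population = 30_000_000      # assumed round population figure
subgroup_share = 0.0005      # assumed: 0.05% of people are in the subgroup
sampling_fraction = 0.027    # the 2.7% sample mentioned in the text

records = expected_sample_records(population, subgroup_share, sampling_fraction)
print(f"Expected subgroup records in the sample: {records:.0f}")
```

Under these assumptions, a subgroup of 15,000 people shrinks to a few hundred records before any further breakdown by province, age, or income, which is the point at which estimates start to become unreliable.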

Lucky for academics, if not anyone else, StatsCan does provide more detailed microdata at a small number of physical locations, the Research Data Centres. Applying for access to the centres is a Byzantine process, only open to students and academics. (Today, as a journalist, I wouldn’t even be considered.)
Researchers must prepare a lengthy proposal for StatsCan, laying out their objectives, and describing and justifying their methodology. Proposals must include information about the accomplishments of the applicants, including “identifiable contributions made by the applicants to the advancement, development and transmission of knowledge related to the disciplines supported by” the Social Sciences and Humanities Research Council.

If approved, researchers have to go through a security screening process and sign a contract with StatsCan. The process can take years, which means that only academics prepared to devote their lives to social research ever have access to the RDCs. But a form that asks how many bedrooms you have is fascist, right?

Even government employees have to jump through hoops to access StatsCan’s most useful data: all requests are assessed by a provincial or territorial representative on the Federal-Provincial-Territorial Consultative Council on Statistical Policy:

The request for access is submitted to the Program Manager of Statistics Canada’s Research Data Centre Program who coordinates a review of the proposal by Statistics Canada subject matter experts. The review is completed within 10 working days. If it is determined that Statistics Canada can quickly and efficiently carry out the work, the Departmental representative will be informed of this and of the associated cost to complete the work. However, if Statistics Canada does not have the resources to complete the work quickly and efficiently, the provincial/territorial employee identified is eligible to become a “deemed” Statistics Canada employee, under Section 10 of the Statistics Act, for purposes of completing the work.

Now that’s small government at work.

The upshot of all this is that journalists, bloggers, businesspeople, students, and anyone else with a copy of SPSS and a dream end up studying the United States rather than Canada. Consciously or not, that influences Canadian identity. It drastically reduces the rewards that we could reap in return for all of the money and time spent administering the census.

It’s also colossally unfair. Sure, the long form means giving up some privacy, and yeah, it’s a hassle. But I don’t need much in return—I just want the right to access the results myself, without getting a PhD and then staring down StatsCan’s bureaucracy. I also want smart people everywhere—not just a few academics—to be able to refine that data gold mine into information that can improve my life. If the underfunded, under-siege U.S. federal government can do it, then surely Ottawa can try.