The Science of Grammar. Now there’s a topic calculated to keep people away in droves. Especially once linguist Geoffrey Nunberg revealed that his lecture at last week’s 2015 AAAS Annual Meeting would address this burning question: which is more correct, “data is” or “data are”?
Comic from xkcd
Yet the seats were full of scientists keen to hear about The Great Data Kerfuffle, as Nunberg calls the debate. It begins when scientists are still in college, he says. There, graduate students drill into newcomers that “data” are plural, and “data is” is an abomination.
Classically, the word “data” is a plural noun (a word denoting a collection of items, like pebbles or stars), so it calls for a plural verb. You say “the data are compelling” in the same way you say “the stars are beautiful.” The linguistic distinction between count nouns and mass nouns, Nunberg says, hinges on the difference between discrete, countable “stuff,” like stars, and homogeneous, aggregated “gunk,” more like mud.
The usage rule seems straightforward enough, and in a survey of scientific papers across disciplines, Nunberg found a near-universal preference for the plural verb. But there was a single exception: computer scientists use “data” with a singular verb, as in “data is stored.”
Why the difference? Are the other disciplines just being pedantic, enforcing ancient Latin word rules in a changing world of language? “Maybe this is like over-pronouncing Italian when you’re ordering dinner at the Olive Garden,” said Nunberg.
But the linguist believes that language both structures and reflects reality, and can reveal ways of thought—so it’s worth delving into such questions. Is there something these differing uses tell us about ourselves?
Computer scientists deal with large amounts of data. They’re concerned with moving piles of it around, storing it, accessing it. The more they can move, the faster, the better. They think of data in aggregate terms, as “gunk,” and speak of it that way.
But those piles of data were once “stuff”: individual data points gathered, most likely, by a graduate student in the field or lab, painstakingly recording results. According to Nunberg, we say “data are” when we interact with data directly, at the level of individual pieces.
Once the data are collected and analyzed, the information becomes a unified finding. Then the professor at the head of the lab might speak about the broad implications and say, “the data indicates…” But maybe it’s not surprising that graduate students enforce the “data are” rule. To those who labored to obtain them, the data are precious, every last one of them.