Saturday, May 7, 2016

grammatical number - Is "data" treated as singular or plural in formal contexts?



My non-native English speaking friend just asked me: "Data is..." or "Data are..."?



I said both, but that's because I've been desensitized by reading and writing both (especially when writing code and adding quick comments).



My question: Is it acceptable to use either in a university paper? Or is one safer than the other (when confronted with stickler professors)?











Answer



I have actually considered this quite a bit, being both a linguist who studies these things, and a scholar who publishes papers.



Etymologically speaking, the word data is the plural of datum in Latin. In Latin, data would get plural verb agreement. Now, languages borrow words and do whatever they want with them, so this historical fact about data has no relevance in judging what is "correct" in English. There is significant evidence that data has established itself as a mass noun in English, suggesting that, for most people, "data is" is the most natural way to speak.




However, in a university/scholarly paper, I would recommend using "data are", rather than "data is".



The reason: some stickler professors and pedantic scholars believe that, if datum is an English word for a single piece of data (which it is), then data must logically be plural. The fact that most people do things differently only means, to them, that most people are doing it wrong. Whether you agree with that or not is somewhat irrelevant.



So you have two choices.




  1. If you use "data is", then reasonable people (yes, I am biased) who read your paper will not bat an eye, but stickler professors might judge you on your perceived ignorance or inappropriate level of informality.


  2. If you use "data are", then the stickler professors will not judge you to be ignorant, and the reasonable people will think "that's an acceptable variant" or "this person is a stickler for language" (or if they are me, will think "this person is pandering to the sticklers — a necessary evil"), but nobody will think you are ignorant.





So choosing (2), "data are", is clearly your safest bet, and is what I always do (and what I find nearly all of my colleagues do).

