Meanwhile, over on the e-astronomer, Andy Lawrence recently posted an item about the lamentable tendency of astronomers to abuse the English language. The focus of his venom was “extincted”, a word used by many astro-types as an adjective to describe the state of affairs when light from a source (e.g. a quasar) has suffered “extinction” by intervening matter. “Extinction” is formed from the verb “extinguish” in the same way that “distinction” is formed from “distinguish”. Nobody would describe a professor as “distincted” (certainly not if it is Andy Lawrence) so, clearly, “extincted” is inappropriate. Actually if you really want to nit-pick you could object to “extinction” being applied to an object such as a quasar, when it isn’t actually the object that is suffering from it but the light it has emitted.
But as a gripe, this is fair enough I’d say. Andy went on to encourage his legions of adoring readers to contribute their own pet hates, preferably with an astronomical orientation. My contribution was “decimate”, which means “to remove the tenth part” or “to reduce by ten percent”, from the Roman practice of punishing disobedient legions by killing every tenth man, but is often regrettably now used to mean “annihilate” or “obliterate”. You might think this hasn’t got much to do with astronomy but, sadly, it does. Indeed, a press release from STFC discussing the recent ten percent cuts to its grants budget states that the consequent reduction in PDRAs
…will not cause the decimation of physics departments as has been speculated in media reports.
I would expect a civil servant to have done a bit better, so presumably this was written by an astronomer too. At any rate, it is precisely wrong.
You might argue that things like this don’t matter. Language evolves, and if modern usage deviates from its previous meanings then we should just let it change. I fully accept the dynamic nature of language and do not by any means object to all such changes. Society changes and so must the words we use. But if a change (a) is a result of sloppiness and (b) results in the loss of a very good usage, to be replaced by a bad one, then I think educated people should stand their ground and fight it. If we don’t do that, language doesn’t just change; it decays.
Most of us practising scientists have to spend a lot of our time writing scientific papers, departmental memos, grant applications and even books. I think many astronomers see this activity as a chore, take no pleasure from it, and invest the minimum care in it. I was fortunate to have a really excellent writer, John Barrow, as my thesis supervisor and he convinced me that it was worth making the effort to write the best prose I could, whatever the context. Not only does this attitude eliminate the ambiguity which is the bane of scientific writing; taking pains over style and grammar also allows one to feel the pleasure of craftsmanship for its own sake. With John’s guidance and encouragement, I learned to enjoy writing through the satisfaction of finding neat forms of words or nice turns of phrase. You never really feel good about what you do if you scrape through at the minimum acceptable level. Try to make the effort and you will be more fulfilled, and the long hours of slog you spend putting together a complicated paper will at least be enlivened by a genuine sense of delight when things fall neatly into place, and a warm glow of achievement when you read it back and it sounds not just acceptable but actually good.
But I digress.
One of the other contributors to Andy’s list of examples of bad grammar was a chap called Norman Gray who objected to astronomers’ use of the word “data” as a plural noun, as in “the data indicate” rather than “the data indicates”. I was taken aback by this because I was expecting the opposite objection.
He has a lengthy rant about this on his own blog so I won’t repeat his arguments in detail here, merely a synopsis. The word “data” is formed from the Latin plural of the word “datum” (itself formed from the past participle of the Latin verb “dare”, meaning “to give”), hence meaning “things given” or words to that effect. The sense of “data” that we use now (to refer to measurements or quantitative information) seems not to have been present in Roman or mediaeval times, so Norman argues that it is a deliberate archaism to treat it as a Latin plural now. He also argues that “data” in modern usage is a “mass noun” so should on those grounds also be treated as singular.
For those of you who aren’t up with such things, English nouns can be of two forms: “count” and “non-count” (or “mass”). Count nouns are those that can be enumerated and therefore have both plural and singular forms: one eye, two eyes, etc. Non-count nouns (which is a better term than “mass nouns”) are those which describe something which is not enumerable, such as “furniture” or “cutlery”. Such things can’t be counted and they don’t have different singular and plural forms. You can have two chairs (count noun) but can’t have two furnitures (non-count noun).
Count and non-count nouns require different grammatical treatment. You can ask “how much furniture do you have?” but not “how many”. The answer to a “how much” question usually requires a unit or measure word (e.g. “a vanload of furniture”) but the answer to a “how many” question would be just a number. Next time you are in a supermarket queue where it says “ten items or less” you will appreciate that the sign is grammatically incorrect. “Item” is most definitely a count noun, so the correct form should be “ten items or fewer”.
Anyway, Norman Gray asserts that (a) “data” is a non-count noun and that (b) it should therefore be singular. Forms such as “the data are…” are out (“a vile anacoluthon”) and “the data is…” is in.
So is he right?
Not really. Unkind though it may be to dismantle a carefully constructed obsession, I think his arguments have quite a few problems with them.
For a start, it seems clear to me that there are (at least) two distinct uses of the word “data”. One is clearly of non-count type. This is the use of “data” to describe an undifferentiated, unspecified or unlimited quantity of information such as that stored on a computer disk. Of such stuff you might well ask “how much data do you have?” and the answer would be in some units (e.g. Gbytes). This clearly identifies it as a mass noun.
But there is another meaning, which is that ascribed to specified pieces of information either given (as per the original Latin) or obtained from a measurement. Such things are precisely defined, enumerable and clearly therefore of count-noun form. Indeed one such entity could reasonably be called a datum and the plural would be data. This usage applies when the context defines the relevant quantum of information so no unit is required. This is the usage that arises in most scientific papers, as opposed to software manuals. “In Figure 1, the data are plotted…” is correct. Although it sounds clumsy, you could well ask in such a situation “how many data do you have?” (meaning how many measurements do you have) and the answer would just be a number. Archaism? No. It’s just right.
To labour the point still further, here are another two sentences that show the different uses:
“If I had less data my disk would have more free space on it.” (Non-count)
“If I had fewer data I would not be able to obtain an astrometric solution.” (Count)
Contrary to Norman’s claims, it is not unusual for the same words (if they’re nouns) to have both count and non-count forms in different contexts. I gave the example of “whisky” as in “my glass is full of whisky” (non-count) versus “two whiskies, please, barman” (count). His objection to this was that in the second case “a whisky” is an artefact of a metonymic shift which takes the word “whisky” to refer to the glass containing it.
Metonymy involves using a word related to a thing rather than the word for the thing itself, as in “I have hungry mouths to feed”; it’s not really the mouths that are fed, but the people the mouths belong to. In fact there’s a bit of this going on when people talk about sources being “extincted” rather than their light.
This invalidates the example because, Norman alleges, the resulting meaning is different. This objection is a bit silly because the whole point is that the two forms should have different meanings, otherwise why have them? In any case the example simply involves me asking for two well-defined quantities of whisky. I’m not convinced of the relevance of metonymy here. What I care about is the whisky, not what it comes in, and when I drink the whisky I don’t drink the glass anyway. Metonymy would apply if I talked about drinking a couple of glasses. Consider “I drank two whiskies, one after the other” versus “I drank two glasses one after the other”. In both cases what has actually been drunk?
There are countless other examples (pun intended). “Fire” can be a mass noun (“fire is dangerous”) but also a count noun (“the firemen were fighting three fires simultaneously”). Another nice one is “hair”, which is non-count when it is on someone’s head (“my hair is going grey”) but count when they, in the plural, are being split.
Interestingly, though, the non-count forms of these nouns are all singular. Indeed, many non-count nouns exist only in the singular: such nouns are called singularia tantum. Examples include “dust” and “wealth”. So, if we accept that “data” can be a non-count noun, does that mean that it should necessarily be treated as singular when it does take on that role?
An example that might be taken to support this view could be “statistics” (the field thereof) which is a non-count noun. Although it appears to be derived from a plural, you would certainly say “statistics is a hard subject” rather than “statistics are a hard subject”. On the other hand “statistics” can refer to a set, each element of which is a statistic (i.e. a number), thus giving another example of a noun that can be of either count or non-count form; you might reasonably say “the statistics are impressive” in the count case. The non-count form “statistics” is a better example of metonymy than the example above, as it refers to the study of the (count) statistics rather than to the things themselves.
In fact there are also mass nouns, described as pluralia tantum, which exist only in the plural. A (not entirely accurate) list is given here. Examples include scissors and pants, for which the normal measure is a “pair”. Although these are technically non-count nouns (in the sense that you can’t have one scissor, etc) they don’t shed much light on the example in front of us. Perhaps more pertinent is the word “clothes” which is of non-count type but which is certainly plural. You can’t have one “clothe” (or any other number for that matter) but you would definitely say “your clothes are dirty”.
A more subtle example with relevance to the Latin root of “data” is “media”, which can refer to broadcast media (non-count) or be the plural of “medium” (count). “The media are out to get me” seems a correct construction to me, so the non-count form of this noun is a plurale tantum (singular of pluralia tantum).
So, just because a word may be a non-count noun, it doesn’t necessarily have to be singular.
To summarise, my argument is that (a) it is not correct to assert that “data” is simply a mass noun. It may or may not be, depending on the context. If it is acting as a count noun (which I contend is the case in most science writing) then it is definitely plural. Furthermore, (b) even in cases where it is clearly a mass noun, and especially if you reject the alternative meaning as a count noun, it is still by no means obvious that it must be treated as singular (because of the existence of the plurale tantum). In fact I would go a bit further and argue that you can only justify the singular non-count form at all if you accept that there is a count alternative. To be honest, though, I think I prefer the singular interpretation in the non-count case, as in “statistics”. It just sounds better.
If anyone has managed to read all the way through this exercise in pedantry I’d be interested to see any comments on my analysis of data.