Pluralia Tantum

Meanwhile, over on the e-astronomer, Andy Lawrence recently posted an item about the lamentable tendency of astronomers to abuse the English language. The focus of his venom was “extincted”, a word used by many astro-types as an adjective to describe the state of affairs when light from a source (e.g. a quasar) has suffered “extinction” by intervening matter. “Extinction” is formed from the verb “extinguish” in the same way that “distinction” is formed from “distinguish”. Nobody would describe a professor as “distincted” (certainly not if it is Andy Lawrence) so, clearly, “extincted” is inappropriate. Actually if you really want to nit-pick you could object to “extinction” being applied to an object such as a  quasar, when it isn’t actually the object that is suffering from it but the light it has emitted.

But as a gripe, this is fair enough I’d say. Andy went on to encourage his legions of adoring readers to contribute their own pet hates, preferably with an astronomical orientation. My contribution was “decimate” which  means “to remove the tenth part” or “to reduce by ten percent”, from the Roman practice of punishing disobedient legions by killing every tenth man, but is often regrettably now used to mean “annihilate” or “obliterate”. You might think this hasn’t got much to do with astronomy but, sadly, it does. Indeed, a press release from STFC discussing the recent ten percent cuts to its grants budget states that consequent reduction in PDRAS

..will not cause the decimation of physics departments as has been speculated in media reports.

I would expect a civil servant to have done a bit better, so presumably this was written by an astronomer too. At any rate, it is precisely wrong.

You might argue that things like this don’t matter. Language evolves, and if modern usage deviates from its previous meanings then we should just let it change. I fully accept the dynamic nature of language and do not by any means object to all such changes. Society changes and so must the words we use. But if a change (a) is a result of sloppiness and (b) results in the loss of a very good use, to be replaced by a bad one, then I think educated people should stand their ground and fight it. If we don’t do that, language doesn’t just change, it decays.

Most of us practising scientists have to spend a lot of our time writing scientific papers, departmental memos, grant applications and even books. I think many astronomers see this activity as a chore, take no pleasure from it, and invest the minimum care in it. I was fortunate to have a really excellent writer, John Barrow, as my thesis supervisor and he convinced me that it was worth making the effort to write the best prose I could whatever the context. Not only does this attitude eliminate the ambiguity which is the bane of scientific writing; taking pains over style and grammar also allows one to feel the pleasure of craftsmanship for its own sake. With John’s guidance and encouragement, I learned to enjoy writing through the satisfaction experienced by finding neat forms of words or nice turns of phrase. You never really feel good about what you do if you scrape through at the minimum acceptable level. Try to make the effort and you will be more fulfilled and the long hours of slog you spend putting together a complicated paper will at least be enlivened by a genuine sense of delight when things fall neatly into place, and a warm glow of achievement when you read it back and it sounds not just acceptable but actually good.

But I digress.

One of the other contributors to Andy’s list of examples of bad grammar was a chap called Norman Gray who objected to astronomers’ use of the word “data” as a plural noun, as in “the data indicate” rather than “the data indicates”. I was taken aback by this because I was expecting the opposite objection.

He has a lengthy rant about this on his own blog so I won’t repeat his arguments in detail here, merely give a synopsis. The word “data” is formed from the Latin plural of the word “datum” (itself formed from the past participle of the Latin verb “dare”, meaning “to give”) hence meaning “things given” or words to that effect. The sense of “data” that we use now (to refer to measurements or quantitative information) seems not to have been present in Roman or mediaeval times so Norman argues that it is a deliberate archaism to treat it as a Latin plural now. He also argues that “data” in modern usage is a “mass noun” so should on those grounds also be treated as singular.

For those of you who aren’t up with such things, English nouns can be of two forms: “count” and “non-count” (or “mass”). Count nouns are those that can be enumerated and therefore have both plural and singular forms: one eye, two eyes, etc. Non-count nouns (which is a better term than “mass nouns”) are those which describe something which is not enumerable, such as “furniture” or “cutlery”. Such things can’t be counted and they don’t have different singular and plural forms. You can have two chairs (count noun) but can’t have two furnitures (non-count noun).

Count and non-count nouns require different grammatical treatment. You can ask “how much furniture do you have?” but not “how many”. The answer to a “how much” question usually requires a unit or measure word (e.g. “a vanload of furniture”) but the answer to a “how many” question would be just a number. Next time you are in a supermarket queue where it says “ten items or less” you will appreciate that the sign is grammatically incorrect. “Item” is most definitely a count noun, so the correct form should be “ten items or fewer”.

Anyway, Norman Gray asserts that (a) “data” is a non-count noun and that (b) it should therefore be singular. Forms such as “the data are..” are out (“a vile anacoluthon”) and “the data is…” is in.

So is he right?

Not really.  Unkind though it may be to dismantle a carefully constructed obsession, I think his arguments have quite a few problems with them.

For a start, it seems clear to me that there are (at least) two distinct uses of the word data. One is clearly of non-count type. This is the use of “data” to describe an undifferentiated, unspecified or unlimited quantity of information such as that stored on a computer disk. Of such stuff you might well ask “how much data do you have?” and the answer would be in some units (e.g. Gbytes). This clearly identifies it as a mass noun.

But there is another meaning, which is that ascribed to specified pieces of information either given (as per the original Latin) or obtained from a measurement. Such things are precisely defined, enumerable and clearly therefore of count-noun form. Indeed one such entity could reasonably be called a datum and the plural would be data. This usage applies when the context defines the relevant quantum of information so no unit is required. This is the usage that arises in most scientific papers, as opposed to software manuals. “In Figure 1, the data are plotted…” is correct. Although it sounds clumsy you could well ask in such a situation “how many data do you have?” (meaning how many measurements do you have) and the answer would just be a number. Archaism? No. It’s just right.

To labour the point still further,  here are another two sentences that show the different uses:

“If I had less data my disk would have more free space on it.” (Non-count)

“If I had fewer data I would not be able to obtain an astrometric solution.” (Count).

Contrary to Norman’s claims, it is not unusual for the same words (if they’re nouns) to have both count and non-count forms in different contexts. I give the example of “whisky” as in “my glass is full of whisky” (non-count) versus “two whiskies, please, barman”. His objection to this was that in the second case a whisky is an artefact of a metonymic shift which takes the word “whisky” to refer to the glass containing it.

Metonymy involves using a word related to a thing rather than the word for the thing itself, as in “I have hungry mouths to feed”; it’s not really the mouths that are fed, but the people the mouths belong to. In fact there’s a bit of this going on when people talk about sources being “extincted” rather than their light.

This invalidates the example because, Norman alleges, the resulting meaning is different. This objection is a bit silly because the whole point is that the two forms should have different meanings, otherwise why have them? In any case the  example  simply involves me asking for two well-defined quantities of whisky. I’m not convinced of the relevance of metonymy here. What I care about is the whisky, not what it comes in, and when I drink the whisky I don’t drink the glass anyway. Metonymy would apply if I talked about drinking a couple of glasses. Consider “I drank two whiskies, one after the other” versus “I drank two glasses one after the other”. In both cases what has actually been drunk?

There are countless other examples (pun intended). “Fire” can be a mass noun (“fire is dangerous”) but also a count noun (“the firemen were fighting three fires simultaneously”). Another nice one is “hair” which is non-count when it is on someone’s head (“my hair is going grey”) but count when they, in the plural, are being split.

Interestingly, though, the  non-count forms of these nouns are all singular. Indeed, many non-count nouns exist only in the singular: such nouns are called singularia tantum. Examples include “dust” and “wealth”. So,  if we accept that “data” can be a non-count noun, does that mean that it should necessarily be treated as singular when it does take on that role?

An example that might be taken to support this view could be “statistics” (the field thereof) which is a non-count noun. Although it appears to be derived from a plural, you would certainly say “statistics is a hard subject”  rather than “statistics are a hard subject”.  On the other hand “statistics” can refer to a set, each element of which is a statistic (i.e. a number), thus giving another example of a noun that can be of either count or non-count form; you might reasonably say “the statistics are impressive” in the count case.  The non-count form “statistics” is a better  example of metonymy than the example above, as it refers to the study of the (count) statistics rather than to the things themselves.

In fact there are also mass nouns, described as pluralia tantum, which exist only in the plural. A (not entirely accurate) list is given here. Examples include scissors and pants, for which the normal measure  is a “pair”. Although these are technically non-count nouns (in the sense that you can’t have one scissor, etc) they don’t shed much light on the example in front of us. Perhaps more pertinent is the word “clothes” which is of non-count type but which is certainly plural. You can’t have one “clothe” (or any other number for that matter) but you would definitely say “your clothes are dirty”.

A more subtle example with relevance to the Latin root of “data” is “media” which can refer to broadcast media (non-count) or be the plural of medium (count). “The media are out to get me” seems a correct construction to me, so the non-count form of this noun is a plurale tantum (singular of pluralia tantum).

So,  just because a word may be a non-count noun, it doesn’t necessarily have to be singular.

To summarise, my argument is that it is not correct to assert that “data” is a mass noun. It may or may not be, depending on the context. If it is acting as a count noun (which I contend is the case in most science writing) then it is definitely plural. Furthermore, even in cases where it is clearly a mass noun, and especially if you reject the alternative meaning as a count noun, then it is still by no means obvious that it must be treated as singular (because of the existence of the plurale tantum). In fact I would go a bit further and argue that you can only justify the singular non-count form at all if you accept that there is a count alternative. To be honest, though, I think I prefer the singular interpretation in the non-count case, as in “statistics”. It just sounds better.

If anyone has managed to read all the way through this exercise in pedantry I’d be interested to see any comments on my analysis of data.


24 Responses to “Pluralia Tantum”

  1. Rob Ivison Says:

    Rats. I came back here, having relished the film noir post – to the extent that I pestered someone for a DVD set for Chrimbo – only to find the blog has entered anorak territory… abort! Back to black!

  2. I used to see data as a non-count noun (like water flowing through a data reduction pipeline), but my co-authors have since persuaded me to treat data as a count noun, citing the Latin origin (but just how often do you see “datum” mentioned in a paper?!).

  3. telescoper Says:

    Bo,

    My point is that, in common with many other nouns, “data” can be either depending on the context. The word pipeline certainly implies the flow of something continuous and uncountable through it, so I would say “the data flows through the pipeline”. If you are dealing with points on a graph the discrete and finite nature is clear. Dealing with things statistically one doesn’t generally refer to individual data but to the set of data. That perhaps explains why datum is relatively rare. But I have certainly seen phrases like “the final datum appears to be an outlier”.

    Perhaps I want to preserve this use because I’m a cosmologist who remembers the time when there wasn’t any data, just a small number of data, and each datum was probably wrong anyway.

    Peter

  4. telescoper Says:

    Rob,

    I’ve just been looking at the blog stats and it appears roughly three times as many people read this item about grammar as read the post about Film Noir. I’m not sure what to make of this datum, but perhaps it indicates that Media Studies is out and English Language is in!

    Peter

  5. Michael Merrifield Says:

    Indeed, a press release from STFC discussing the recent ten percent cuts to its grants budget states that consequent reduction in PDRAS

    ..will not cause the decimation of physics departments as has been speculated in media reports.

    I would expect a civil servant to have done a bit better, so presumably this was written by an astronomer too. At any rate, it is precisely wrong.

    Since pedantry seems to be the order of the day, it should be noted that no physics department receives 100% of its funding from STFC, so a 10% cut in STFC funding would not cause the decimation of physics departments.

  6. telescoper Says:

    Mike

    I take your point.

    It isn’t clear from the context whether each department is supposed to be reduced or it refers to the overall number of departments in the UK. It is true that departments don’t get all their income from STFC, but astronomy and particle physics are essential to keeping many of them above water. The reduction in STFC income may end up having a disproportionate effect because of the very tight margins involved. I wouldn’t therefore be surprised if, over the next ten years, UK physics departments were decimated in the sense that ten percent might be lost to closure or merger.

    I should also add that I wasn’t suggesting that one in ten members of every physics department should be taken away and killed, although if this does turn out to be the case I’d happily supply a list of names.

    Peter

  7. Anton Garrett Says:

    Yes indeed. I’ve never boiled a kettle in my life, but I have frequently boiled water in one.

    See George Orwell’s marvellous essay “Politics and the English language” for why this stuff matters (or, at the least, is a sensitive cultural index). The French understand this issue but they try to legislate it, which doesn’t work with something as organic as language. The underlying question is: Who has authority to say what usage is correct?

    Some personal current hates:

    * How the word “phenomena” has shifted from plural to singular in the last decade, so that “phenomenon” is in danger of being extincted

    * The verbising of nouns

    * “Pressurise” for “press” (with the noun “pressure” as an intermediate in the evolution process)

    * How politicians use “refute” when they mean “deny” (first noted, to my knowledge, by the late great David Stove)

    * “Birthed” for “born”

    Anton

  8. telescoper Says:

    One example I forgot to mention is “conceptualize”. What’s wrong with “conceive”? Isn’t “concept” formed as a participle anyway?

    Let me at those windmills.

  9. […] question that arises from such data is whether these empirical distributions differ significantly from each other or whether they are […]

  10. Adrian Burd Says:

    And one must not neglect “to have been burglarized”.

    If my memory serves me correctly, isn’t “parliament” also one of those words that can be plural or singular depending on the context?

    Adrian

  11. John Peacock Says:

    Perhaps I missed it in your admirable discussion, but I would vote for the use of “dataset” as a practical solution in disputed cases. While many of your fellow pedants shrink in horror at “this data shows” and so on, no-one would object to the substitution “data” -> “dataset”. So people who object to “the data are plotted” and want to be singular about it have an obvious and correct remedy to hand. In short, “data are plural, unless it’s a dataset”.

  12. telescoper Says:

    John

    I think “dataset” used to mean something quite specific in computing (i.e. a file) but it now seems to be used more widely and I think it’s quite acceptable in circumstances where it is not clear from the context what is meant. As long as you promise never to hyphenate it!

    I’m less keen on “data point” because I did Latin at school and quite like datum as it is. Call me old-fashioned.

    Donum ab hominibus datum dolor est

    Peter

    • “I think “dataset” used to mean something quite specific in computing (i.e. a file) but it now seems to be used more widely and I think it’s quite acceptable in circumstances where it is not clear from the context what is meant.”

      Presumably you mean “acceptable in circumstances where it is clear from the context”.

      The other usage is still around, at least among IBM mainframe types: I ran into it just yesterday.

  13. […] a discussion on the e-astronomer, which subsequently evolved into an extended exercise in pedantry here, it struck me that many words we British think of as being Americanisms were in fact in common use […]

  14. David H. Straayer Says:

    You assert: “If it is acting as a count noun (which I contend is the case in most science writing) then it is definitely plural.”

    I’m not sure I accept that claim. Coming as I do from a computer science background, I claim that the branch of science most directly concerned with ‘data’ as a concept almost invariably uses “data” in the non-count sense. Further, I’d claim that the flow of the use of this word into the general lexicon comes mainly from this discipline.

    You Brits treat the English language as though you had invented it. :-)

    “A language invented by Norman knights for seducing Saxon wenches” – a great description, but I can’t remember the source.

  15. Dave

    If you had read my piece you would have seen that I do admit this specific use of data as a singular non-count noun. Your comment seems to be arguing however that “most science writing” is actually about computer science, which seems a strange belief to me. And we’re not talking about the “general lexicon”, we’re talking about science.

    Peter

    P.S. We Brits didn’t invent the word “data”. The Romans did.

  16. I like this blog, but I did not found answer for my problem about using “data is” and/or “data are”.

    In this case I’m happy that I don’t know as good English as you. If have to know who “invented” that particular word to use it correctly it have to hard.

    I like my language where we take word root and apply our language rules.

    I would like to see how to use word ROBOT. Invented by Czech novel author. Using of such word should follow Check counting rules?

    If I recall correctly it should be
    1 robot, 2 roboty, 5 robotu (no robots)

    Please dont take this as attack it is really not intended to be. You few comments which arise after I read this bloq

  17. I have been in hurry so my last sentence should be: You few comments which arise after I read this blog.

    This may explain datum/data (and robot) thing, but not solve main question.
    http://www.english-zone.com/spelling/plurals.html. Just make we wonder why you need History of word to explain its “irregularity”.

    • telescoper Says:

      I don’t get your point. The words in that list are all of either Latin or Greek origin, which is the reason that they have irregular plurals…

  18. I was just thinking of making the switch to “the data show” when I decided to google it and found this!

    This is the best explanation I’ve found.

    Thanks for saving me from at least a little bit of pomposity.

  19. […] alternative view is given by Peter Coles, another astronomer at Cardiff University, UK, who also explains the issue […]

  20. telfer cronos Says:

    saying “data is singular” is like saying “c**t is a Christian word”. It may be true. But is it important to tell us?
