Is nothing > data?

I got this yesterday from one of my office mates who suggested that I stick it somewhere. It’s an advert for a data science company called Pivigo. Logically, the statement on the sticker implies that data is less than nothing, which I don’t think is the point that they’re trying to make. On the other hand, I suppose that by posting this I’ve given Pivigo some free advertising so in some sense it is a successful promotional ploy!

Anyway, when I posted this on Twitter it sparked a little discussion about the vexed issue of whether the word `data’ is singular or plural, so I decided to bore my readers with thoughts on that – not that I’m pedantic or anything.

The word `data’ is formed from the Latin plural of the word `datum’ (itself formed from the past participle of the Latin verb `dare’, meaning `to give’) hence meaning `things given’ or words to that effect. The modern usage of `data’ (to refer to measurements or quantitative information) seems not to have been present in Roman or mediaeval times so some argue that it is a deliberate archaism to treat it as a Latin plural now. Moreover, some insist that `data’ in modern usage is a `mass noun’ so should on those grounds also be treated as singular.

For those of you who aren’t up with such things, English nouns can be of two forms: `count nouns’ and `non-count nouns’ (also known as `mass nouns’). Count nouns are those that can be enumerated and therefore have both plural and singular forms: one eye, two eyes, etc. Non-count nouns are those which describe something which is not enumerable, such as `furniture’ or `cutlery’. Such things can’t be counted so they don’t have different singular and plural forms: you can have two chairs (count noun) but can’t have two furnitures (non-count noun).

Count and non-count nouns require different grammatical treatment. You can ask `how much furniture do you have?’ but not `how many furniture do you have?’. The answer to a `how much’ question usually requires a unit or measure word (e.g. `a vanload of furniture’) but the answer to a `how many’ question would be just a number. Next time you are in a supermarket queue where it says `ten items or less’ you will appreciate that the sign is grammatically incorrect. `Item’ is most definitely a count noun, so the correct form should be `ten items or fewer’.

In the specific case of `data’, it seems clear to me that there are (at least) two distinct uses of this word. One is the use of `data’ to describe an undifferentiated, unspecified or unlimited quantity of information such as that stored on a computer disk. Of such stuff you might well ask `how much data do you have?’ and the answer would be in some units (e.g. Gbytes). This clearly identifies it as a mass noun.

But there is another meaning, which is that ascribed to specified pieces of information either given (as per the original Latin) or obtained from a measurement. Such things are precisely defined, enumerable and clearly therefore of count-noun form. Indeed one such entity could reasonably be called a datum and the plural would be data. This usage applies when the context defines the relevant quantum of information so no unit is required. This is the usage that arises in most scientific papers, as opposed to software manuals. `In Figure 1, the data are plotted…’ is correct. Although it sounds clumsy, you could well ask in such a situation `how many data do you have?’ (meaning how many measurements do you have) and the answer would just be a number. I don’t find this archaic at all. It seems quite sensible.

To labour the point still further, here are another two sentences that show the different uses:

“If I had less data my disk would have more free space on it.” (Non-count)

“If I had fewer data I would not be able to obtain an astrometric solution.” (Count).

It is not unusual for the same words (if they’re nouns) to have both count and non-count forms in different contexts. I give the example of `whisky’, as in `my glass is full of whisky’ (non-count) versus `two whiskies, please, barman’ (count).
There are countless other examples (pun intended) of words that can be count nouns or non-count nouns. `Fire’ can be a mass noun (`fire is dangerous’) but also a count noun (`the firemen were fighting three fires simultaneously’). Another nice one is `hair’, which is non-count when it is on someone’s head (`my hair is going grey’) but count when they, in the plural, are being split.

In the context of data science it seems to me that `data’ is almost always used as a non-count noun and can therefore reasonably be treated as singular. In the context of the statement that `nothing is > data’ it would appear that `nothing’ is also of non-count form, but whether this is the case or not, the statement seems to say that `0>data’, which would imply that data is negative.

And there’s another question: what does `>’ mean? Wikipedia says `greater than’, but I think it means `is greater than’, much as `=’ means `equals’ or `is equal to’. So there’s a syntax error in the sticker too…

…or perhaps I might be reading a little too much into this?


