Unlike universal grammar, the hypothesis that humans have innate structural language rules, universal vocabulary has few advocates.

If we are not born with any semantic understanding of words, we must learn them. By watching how others use language we acquire knowledge of how words are intended to be used. There are continuing dust-ups on what meaning is, but that meanings (whatever they are) are learned through empirical experience is uncontroversial.

Most semantic knowledge relies on an understanding of words; indeed, this statement is almost tautological. Even our theories of nature, though they may be succinctly stated in mathematical symbols, are empty without words. We might agree the Schrödinger equation is beautiful, but it takes a textbook’s worth of words to ground its abstract elegance.

The greatest part of the edifice of personal knowledge is based on one’s understanding of words. How strong are these foundations? Where do they come from?

Here is a personal calculation:

I include details on how I arrived at these estimates in an appendix section. The figures are based on rules of thumb and coarse estimates of how I spend my time. For example, a film script is about 10 thousand words, suggesting one thousand words for every 10 minutes of film or TV, and I watch about two films’ worth of content in a week.

Much more dramatic than a current weekly total is to consider a lifetime diet of words. Combining the above estimates with ones from other phases of my life gives a figure very close to 1.5 billion words:

No doubt this calculation is imperfect. I’ve not accounted for words heard on the bus, or seen on cereal packets. I’ve brushed aside subtle questions about when a word is a word rather than a patch of squiggles or an incoherent sound. Yet I believe this accurately captures the scale of language encountered in a lifetime.

To add some color to this, the English-language articles on Wikipedia total about 3.9B words. So 1.5B is not too shabby by this measure. On the other hand, it is a fraction of 1% of the words within the 14M books on the shelves of the Bodleian Library.

It is curious that the written and spoken word vie for my attention with almost equal success. I am also surprised that leisure time provides such a dominant share of my printed and parol nutrition. As is often the case, however, I struggled to disentangle work from leisure in my accounting. For example, much of what I have categorized as leisure reading is work-related. I have not marked it as such since I do it outside of work hours. Perhaps part of the reason leisure outdoes other settings is that in work and education the balance may be towards production rather than consumption — a calculation for another day perhaps.

It is always sobering to see a reminder of the finitude of one’s experience. Though this is but one yardstick to measure the size of a life, it is a significant one. As Fernando Pessoa writes via his mouthpiece Bernardo Soares:

“Prose encompasses all art, in part because words contain the whole world, and in part because the untrammelled word contains every possibility for saying and thinking.”

You’ll find Pessoa/Soares in different spirits some pages apart:

“A man of true wisdom, with nothing but his senses and a soul that’s never sad, can enjoy the entire spectacle of the world from a chair, without knowing how to read and without talking to anyone.”

I don’t think either quote is quite true or is even intended to be so. Yet there is something here. Words are important, but so are the wordless spaces they span.

Over our lives, words will be granted to us day by day, the wages of living. They take us on a journey from knowing nothing but what we see, to inhabiting worlds we could not have imagined. Set your course carefully: you can never see the whole world.


Quotes are from The Book of Disquiet by Fernando Pessoa, translated by Richard Zenith.

It is germane to note that there are proponents of deep physics (an innate feeling for how objects behave) and also of deep psychology. The implication would be that while we are not born knowing the names for certain categories, such as hard, soft, and stationary, we have conceptual slots waiting for a label to be attached. If this is the case, it could well be that our vocabulary learning is underdetermined by the data. Such a view extends Chomsky’s poverty of the stimulus argument from grammar universals to lexical learning.


My current consumption:

| Source | My current consumption | Weekly words |
| --- | --- | --- |
| Books | One novel-length book per week | 100k |
| News & periodicals | ⅓ of The Economist each week & a miscellany of other articles | 50k |
| Conversing socially | The amount of time I spend listening to flowing speech socially compresses to perhaps 45 minutes per day (close to 90 minutes of two-way conversation). I assume 125 words per minute. | 40k |
| Podcasts & radio | My mileage varies considerably. I often listen to podcasts while doing chores. Estimate: 45 minutes per day. | 40k |
| TV & film | There are ~10k words in a movie, which suggests a simple rule of thumb: 1k words for every 10 minutes of content. I, almost religiously, watch one film per week, and about the equivalent length of TV. | 20k |
| Social media | I rarely venture onto FB and Twitter, and I confess to only reading a portion of group conversations on WhatsApp | 3k |
| Meetings | 2 to 3 hours in meetings on a typical day. Again assuming ~2 words per second but discounting ~15% of these (since they come from me) | 85k |
| Emails | A fairly long email is around 500 words; most are short and many I skim | 15k |
| Slack | At Opensignal we rely on Slack a lot. While the quantity of messages eclipses email, they are (typically) concise. | 20k |
| Misc. work reading | Pull requests, papers, Jira tickets | 20k |
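
As a rough check on the arithmetic behind the table above, here is a minimal Python sketch. The function and variable names are illustrative and not part of the original calculation. It assumes a seven-day week for social listening and treats a film as roughly 100 minutes, which is what 10k words at 1k words per 10 minutes implies.

```python
# Weekly word estimates copied from the "current consumption" table.
CURRENT_WEEKLY = {
    "Books": 100_000,
    "News & periodicals": 50_000,
    "Conversing socially": 40_000,
    "Podcasts & radio": 40_000,
    "TV & film": 20_000,
    "Social media": 3_000,
    "Meetings": 85_000,
    "Emails": 15_000,
    "Slack": 20_000,
    "Misc. work reading": 20_000,
}

SPEECH_WPM = 125  # speaking rate assumed in the table


def weekly_words_from_speech(minutes_per_day, wpm=SPEECH_WPM, days_per_week=7):
    """Words heard per week, given daily listening time in minutes."""
    return minutes_per_day * wpm * days_per_week


def weekly_words_from_screen(minutes_per_week, words_per_10_min=1_000):
    """Words heard per week from film or TV, using the 1k-per-10-minutes rule."""
    return minutes_per_week / 10 * words_per_10_min


# 45 minutes per day of social listening: 39,375 words, rounded to 40k in the table.
social = weekly_words_from_speech(45)

# One ~100-minute film plus an equal length of TV: 20,000 words.
screen = weekly_words_from_screen(200)

# Sum of every row in the table: 393,000 words per week.
total = sum(CURRENT_WEEKLY.values())

print(f"social: {social:,.0f}  screen: {screen:,.0f}  total: {total:,}")
```

The rows for books, email, and Slack are taken as given, since the table states no rate from which they could be derived.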

My forecast consumption:

| Source | Consumption forecast: retirement | Weekly words |
| --- | --- | --- |
| Books | One novel-length book per week | 100k |
| News & periodicals | Roughly the same as current levels | 50k |
| Conversing socially | Assuming a 50% increase | 60k |
| Podcasts & radio | Doubling to 90 minutes per day | 80k |
| TV & film | Assuming three films’ worth per week | 30k |
| Meetings | I will keep myself busy with something, perhaps an hour a week on Handforth parish council | 6k |
| Social media | Against the tides of fashion, I plan to increase my social media engagement | 6k |
| Emails | The idea of a plentiful correspondence appeals to me, though I doubt it will reach Darwin’s levels | 8k |

Consumption estimates for my former selves:

| Source | Consumption estimates: school | Weekly words |
| --- | --- | --- |
| Books (home) | One novel-length book per month | 25k |
| News & periodicals | A couple of features and several articles per week | 20k |
| Conversing socially | Assuming a 50% increase on current levels | 60k |
| Podcasts & radio | Fewer podcasts, but quite a lot of radio | 40k |
| TV & film | Slightly more than current levels | 20k |
| Lessons (talking) | Assuming 15 minutes of talking in each lesson and 6 lessons per day | 45k |
| Lessons & homework (reading) | Assuming another 15 minutes of reading in each lesson, and another 30 minutes at home | 120k |
| Social media | A tiny bit of MSN | 1k |

| Source | Consumption estimates: early years | Weekly words |
| --- | --- | --- |
| Books | We spend perhaps an hour a night reading to my son, with many interjections on his part. At nursery, it is probably quite similar. | 35k |
| Listening to adults | Even if we’re not talking to him, he’s listening! | 40k |
| Conversing socially | There is a lot of talking in the nursery, I’m sure! | 90k |
| Songs | Assuming 30 minutes per day, at 1k words per 10 minutes | 20k |
| TV & film | He does watch more TV than I do, perhaps 1 hour per day | 40k |
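
To see how the weekly tables could roll up into a lifetime figure, here is a hedged Python sketch. The weekly rows are copied from the four tables in this appendix. The number of years spent in each phase is not stated here, so it is left as an input rather than guessed, and the helper names and the 52-week year are assumptions made for illustration.

```python
from typing import Mapping

WEEKS_PER_YEAR = 52  # assumption: no allowance for atypical weeks

# Weekly estimates per phase, in thousands of words, row by row from the tables.
PHASE_ROWS_K = {
    "early years": [35, 40, 90, 20, 40],
    "school": [25, 20, 60, 40, 20, 45, 120, 1],
    "current": [100, 50, 40, 40, 20, 3, 85, 15, 20, 20],
    "retirement": [100, 50, 60, 80, 30, 6, 6, 8],
}


def weekly_total(phase: str) -> int:
    """Total words per week for one phase of life."""
    return sum(PHASE_ROWS_K[phase]) * 1_000


def lifetime_words(years_per_phase: Mapping[str, float]) -> float:
    """Sum weekly totals over the years the caller assigns to each phase."""
    return sum(
        weekly_total(phase) * WEEKS_PER_YEAR * years
        for phase, years in years_per_phase.items()
    )


for phase in PHASE_ROWS_K:
    print(f"{phase}: {weekly_total(phase):,} words per week")

# A single year at the current rate: 393,000 * 52 = 20,436,000 words.
print(f"{lifetime_words({'current': 1}):,.0f}")
```

Passing a mapping of phase names to years, for whatever durations fit one's own life, gives a lifetime total on the same basis as the weekly tables.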

