The power of language
June 5, 2018

All human activity requires communication. Our species only became successful and able to dominate all others when we learned to communicate using language, and human languages are immensely complex entities. Nobody is quite sure how many words we use in everyday speech but it runs from hundreds into thousands - maybe tens of thousands. Words differ in the way they sound and the way they are written, resulting in bizarre misunderstandings of the "four candles"/"fork handles" variety. Syntax rules layer on top of the words themselves and apply an agreed way in which to combine them to express meaning, adding further opportunities for things to go dramatically wrong.

Given all this potential for misunderstanding you'd wonder how we ever get anything right, but we do, even when the parties concerned are from different countries. Part of the way we achieve this is by setting up "domains" in which words have specific meanings that they may not have in general speech, and are used in tightly specified ways.

The word "language" itself has two distinct meanings. One is concrete, as in "I speak a foreign language" and the other is more abstract, as in "bad language" or "the language of commerce". The latter case doesn't mean that we swear in Greek or that business people only speak French while the rest of us speak English, only that they have a particular way of describing their work that might be unfamiliar to ordinary people.

There are thousands of domains covering all areas of human activity from bee-keeping to fine art, from car maintenance to cookery and from music to weather forecasting. Each uses a particular subset of the host language (e.g. English) to convey concepts specific to the domain.

If I want to convey instructions to another person - say how to boil an egg - they go something like this:

take a small pan
fill it with water
place the pan on the hob
bring to the boil
lower an egg into the pan
wait 5 minutes and 20 seconds
remove the egg

The domain here is cookery so although the instructions are in English we see a number of keywords - pan, water, hob, boil, egg etc. - that have specific meanings in that domain that they might not have in general speech. The instructions are all words and are imperative - that is, each line is a command. This example is close to being unambiguous and could be made so by a further tightening of the syntax, without affecting the ability of a cook to read and understand it.

So it's interesting to see how different things are when we communicate with computers. What if I want to tell a computer to boil an egg? Let's assume the computer is wired to a fully automated kitchen, so physically it's capable of performing the required tasks. But then things get tricky, because computers don't understand text like that given above, even though the only real requirement is that the instructions be unambiguous. Nothing more.

To get a computer to boil our egg we have to translate the instructions into a computer language such as C, Java or JavaScript, but the result is something that a cook is unlikely to understand. In fact, a human being isn't even able to "speak" a computer program without difficulty as it's full of symbols, not just words.
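To make that concrete, here is a rough sketch of how the egg recipe might look once translated. It is written in Python purely for brevity; a version in C, Java or JavaScript would carry more symbols still. The Kitchen class, its do method and the action names are all invented for illustration, standing in for whatever interface a real automated kitchen might offer.

```python
import time


class Kitchen:
    """Hypothetical stand-in for an automated kitchen; it only records actions."""

    def __init__(self):
        self.log = []

    def do(self, action, **details):
        # Record the action rather than driving any real hardware.
        self.log.append((action, details))


def boil_egg(kitchen, seconds=5 * 60 + 20, sleep=time.sleep):
    """Follow the seven recipe steps in order and return the recorded actions."""
    kitchen.do("take", item="pan", size="small")
    kitchen.do("fill", item="pan", contents="water")
    kitchen.do("place", item="pan", location="hob")
    kitchen.do("heat", item="pan", until="boiling")
    kitchen.do("lower", item="egg", into="pan")
    sleep(seconds)  # wait 5 minutes and 20 seconds
    kitchen.do("remove", item="egg")
    return kitchen.log
```

Calling boil_egg(Kitchen(), sleep=lambda s: None) runs the steps without actually waiting. Each line still corresponds to one line of the recipe, but the brackets, quotes, equals signs and underscores are there for the machine's benefit, not the cook's.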

When we use the term "computer language" we are in fact invoking a third meaning of the word "language", since computer languages don't conform in any meaningful sense to either of the previous meanings. They aren't in general pronounceable in speech, they don't "belong" to specific tribes, countries or traditions and they don't apply to domains. Instead they are general purpose symbolic vocabularies designed for the convenience of computers, not people.

Something else is different, too. It's a rare computer language that has as many as a hundred different keywords. Imagine that in order to communicate in English you had to use 100 words or fewer, building your sentences by combining those words in different ways, mixing upper and lower case and larding them with a profusion of symbols to reduce wordiness. Can you imagine how hard it would be to construct - or read - the Gettysburg Address or War and Peace?
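The keyword claim is easy to check for at least one language. The sketch below uses Python as an example (not one of the languages named earlier) and asks its standard library for the list of reserved words. The exact count varies a little between versions, so no figure is quoted here.

```python
import keyword

# keyword.kwlist is the standard library's list of Python's reserved words.
reserved = keyword.kwlist

print(len(reserved))             # how many keywords this Python version has
print(", ".join(reserved[:10]))  # a small sample of them
```

Whatever number it prints, it is tiny next to the vocabulary of any human language.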

Human communication has for millennia relied on languages with thousands of distinct words and meanings, able to be used for all purposes and as good in speech as in writing. What on earth leads us to believe that when we want to give precise instructions to a computer we'll get the best results by throwing away all this linguistic heritage in favour of something our brains are demonstrably poor at handling?

The answer to this question can wait for the next article in this series.
