Answering the ultimate question

"That computer learnt word associations from Wikipedia. So it knows that 'opus' goes most commonly with 'Rome', 'sushi' with 'fish' .... and therefore 'robot' with 'violent genocide of the human race'?"

We were discussing the latest colloquium from the Origins Institute, which this week had been given by Geoffrey Hinton, a specialist in (and indeed a founder of) neural networks from the University of Toronto. Apparently, we had inadvertently stumbled across the cause of the downfall of humanity: the tool used to teach computers about language also happens to include a scene-by-scene description of the 'Terminator' series.

An artificial neural network is like an electronic brain; it is a computer program that is designed to work in a similar way to biological neurons. Like the brain (and unlike most other types of computer programs), neural networks can 'learn' to do a particular task by being given many examples. They are used in pattern recognition (e.g. reading a person's handwriting) and for finding complex relationships (e.g. stock market predictions or medical phenomena, where the outcome is the result of many combining factors). 

In his talk, Professor Hinton took us through one of the early algorithms for teaching a neural network. In this technique, the computer is given data -- for instance an image of a hand-written number -- that it must identify. It looks for specific features in the pixels, such as the presence and position of curves or straight lines, and then makes a guess. The guess is then compared with the correct answer, and the relative importance of the different features being detected is adjusted to improve the algorithm. For example, a double loop would be very important since it identifies an '8', whereas a single loop is less significant since it could belong to a '0', '6', '9' or even a squiggly drawn '4'. After this training, the computer program can be used to identify a wide range of hand-written numbers accurately.
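To make the guess-compare-adjust loop concrete, here is a minimal sketch in Python. It is not the algorithm from the talk, just a simple perceptron-style rule, and the three hand-coded features (double loop, single loop, straight stroke) and the four-digit training set are made up for illustration. Each digit gets a weight per feature; a wrong guess nudges the weights towards the right answer and away from the wrong one.

```python
# Hypothetical hand-coded features for each image:
# (has_double_loop, has_single_loop, has_straight_stroke)
TRAINING_DATA = [
    ((1, 0, 0), "8"),
    ((0, 1, 0), "0"),
    ((0, 0, 1), "1"),
    ((0, 1, 1), "9"),
]
CLASSES = ["0", "1", "8", "9"]


def score(weights, digit, x):
    """Weighted sum of the detected features for one candidate digit."""
    return sum(w * v for w, v in zip(weights[digit], x))


def predict(weights, features):
    """Guess the digit whose weights give the highest score."""
    x = features + (1,)  # extra constant input acting as a bias
    return max(CLASSES, key=lambda d: score(weights, d, x))


def train(data, max_epochs=100):
    """Adjust feature importance whenever a guess is wrong."""
    weights = {d: [0.0] * 4 for d in CLASSES}
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in data:
            guess = predict(weights, features)
            if guess != label:
                mistakes += 1
                x = features + (1,)
                for i, v in enumerate(x):
                    weights[label][i] += v  # reward the correct digit
                    weights[guess][i] -= v  # penalise the wrong guess
        if mistakes == 0:  # every training example is now identified
            break
    return weights


weights = train(TRAINING_DATA)
for features, label in TRAINING_DATA:
    print(label, "->", predict(weights, features))
```

Note that the features themselves never change here; only their importance does. Real handwriting would of course need far more features and examples than this toy set.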

This method works, but it has its drawbacks. Having to program a large number of fixed features for the code to identify makes the process slow and inflexible, and it scales poorly. Additionally, its reliance on labels (e.g. 'number 1', 'number 2') attached to the data differs from the way our brain works. An image that our brain processes can rarely be categorised by a single label. For instance, a cow in a field has a colour, a position and key tell-tale signs that indicate it might actually be a man-eating Minotaur -- none of which is captured by the label 'cow'.

An improvement to this was to replace the pre-defined features that were given to the computer to identify an object with a set of criteria it created itself based on experience. This meant the computer no longer needed to know anything about the data it was being given. It could be a series of drawings of the number 2, pictures of houses or Minotaurs disguised as cows, and the algorithm would find a common collection of characteristics that it could use to identify them. An example of such an identified feature for a '2' would be a wedge of light coloured pixels in the top left corner of the image, followed by a diagonal dark line -- the start of the 2's top arch.

Left to do this, the features identified by the neural network fell into two main categories: a small set of coarse criteria based on colour and a much larger set of finely tuned criteria based on shade. An example of both these types of characteristics would be a person standing against a wall. The sharp line between the white of the wall and the darkness of their hair would form a colour-based feature. Their facial features, meanwhile, would be picked out in a multitude of different shades in the same 'skin' colour. Interestingly, the resultant map of these computer-identified features closely resembles that of a monkey's brain.

Algorithmically, the set of data-defining characteristics is honed by the computer program calculating first a set of features, then a set of features of the features ... then a set of features of the features of the features. This leaves a collection of basic patterns that can be used to accurately identify the type of object for which the network has been trained.

An interesting question Professor Hinton then posed was whether such a neural network could use its pattern recognition to predict the next stage of a sequence, rather than just identify objects. In particular, could a program predict the next word in a sentence?

To tackle this problem, the philosophical-sounding question 'what is a word?' had to be answered. It turned out to be easiest to consider a word simply as a sequence of characters and to train the neural network to predict the 11th character in the string fed to it. This process could be continually repeated to build up entire sentences.
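The idea of treating text as nothing more than a stream of characters, and building sentences by predicting one character at a time, can be sketched in a few lines of Python. To be clear, this is not a neural network: it is a much cruder stand-in that simply counts which character most often followed each three-character context. The context length of 3, the function names and the tiny corpus are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

CONTEXT = 3  # hypothetical context length; far shorter than in the talk


def train(text, context=CONTEXT):
    """Count which character follows each run of `context` characters."""
    counts = defaultdict(Counter)
    for i in range(len(text) - context):
        counts[text[i:i + context]][text[i + context]] += 1
    return counts


def predict_next(counts, history, context=CONTEXT):
    """Return the most frequently seen next character, or None if unseen."""
    options = counts.get(history[-context:])
    if not options:
        return None
    return options.most_common(1)[0][0]


def generate(counts, seed, length, context=CONTEXT):
    """Repeat the one-character prediction to build up a longer string."""
    out = seed
    for _ in range(length):
        nxt = predict_next(counts, out, context)
        if nxt is None:
            break
        out += nxt
    return out


corpus = "the cat sat on the mat. the cat ate the rat. "
model = train(corpus)
print(generate(model, "the", 20))
```

A counting model like this only ever sees the last few characters, so it cannot do things that need a longer memory. That is exactly where a trained network has the edge.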

To teach the network about how words are formed, PhD student Ilya Sutskever gave the computer 5 million strings of 100 characters each from Wikipedia. At the end of this training, the computer was told to build entire paragraphs of text to assess what it knew. It turned out to almost always produce real words. On the few occasions where it made some up, they sounded like they ought to exist: for example, 'ephemerable' or 'interdistinguished'. It was also good at semantic associations. It knew that many words that started 'sn' were connected with the upper lip and nose, e.g. 'sneeze', 'snarl' and, uh, 'snow' when used as a synonym for an illegal drug. (A fact noted by the speaker, not the author of this blog.) Likewise, it knew that sentences containing 'opus' often also contained 'Rome' and that ones mentioning 'Plato' frequently went on to say 'Wittgenstein'. Similarly to the human brain, however, it often did not know why these connections existed. This produced sentences that made sense grammatically, but would not actually be found. For example, it talked about the 'several Irish intelligence agencies in the Mediterranean', which is geographically unlikely.

A fact I found most surprising from this work was how much preceding text the computer program drew on. When deciding what the next character should be, it did not just look at the few before it, but at the long pattern of characters (that is, entire words) that preceded it. This allowed it to almost always use a consistent tense and to close parentheses.

The knowledge could be applied to words it had never seen before. Upon being given two uses of the fictitious verb 'to thrunge', it guessed that the next character in 'Shelia thrunge' would be an 's' whereas the one following 'people thrunge' would be a space.

At the end of the day though, all Douglas Adams fans will agree that there really is only one question of any importance for a neural network trained on language. The computer was therefore asked to complete the ultimate question:

'The meaning of life is ...'

To which it replied:

 '.... literacy recognition.'

Clearly, it had been listening to students and postdocs panic about their paper count in the laboratory.

So are we close to really understanding how the human brain works? Professor Hinton took the opinion:

"It's a device with a few trillion parameters ..... how hard can it be?"