Thursday, December 7, 2017

Automatic Learning

Terminator statue. By popculturegeek.com, originally posted to Flickr as Comic-Con 2004, CC BY 2.0, https://commons.wikimedia.org
As I said in a previous article, automatic learning (usually called machine learning in English) is one of the areas of weak artificial intelligence and has been an object of research for at least 40 years. Strictly speaking, automatic learning is not a field of application but a methodology or technique used in other fields, such as neural networks, expert systems or data analysis. It is divided into two main branches:
  • Supervised automatic learning, which has been the most frequently used up to now. This post is devoted to explaining it.
  • Unsupervised automatic learning, related to the field usually called data mining. It has lately received wide media coverage because of a program (AlphaGo Zero) that, by learning on its own, has reached a level comparable to that of the world champion at the game of Go (I’ll say more about this at the end of this post).
Neural network with four layers
To explain supervised automatic learning, I’ll take as an example a specific expert system developed with this technique. This system is beginning to be used in practice to help lower court judges decide whether or not to remand those accused of a crime in custody. It takes into account both the possibility that the accused may commit new crimes if left on provisional release and the cost of pre-trial detention to the public purse. The two criteria pull in opposite directions: the more defendants are sent to prison, the lower the recidivism, but the higher the cost. The procedure used to build the expert system, which I will explain here, was devised over 30 years ago and is also applied in other fields, such as neural networks.
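The opposition between the two criteria can be made concrete with a toy calculation. The sketch below is only an illustration under invented assumptions: the risk scores, the `trade_off` function and the thresholds are hypothetical and are not taken from the real system.

```python
# Hypothetical risk scores (estimated probability of re-offending) for ten
# defendants. These numbers are invented for illustration only.
risk_scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.55, 0.70, 0.80, 0.95]


def trade_off(scores, threshold):
    """Detain every defendant whose score is at or above the threshold.

    Returns (number detained, expected number of re-offences among
    the defendants who are released).
    """
    detained = sum(1 for s in scores if s >= threshold)
    expected_reoffences = sum(s for s in scores if s < threshold)
    return detained, expected_reoffences


# Lowering the threshold detains more people (higher cost) and leaves
# fewer expected re-offences among those released.
for threshold in (0.9, 0.5, 0.25, 0.0):
    detained, reoffences = trade_off(risk_scores, threshold)
    print(f"threshold={threshold:.2f}  detained={detained:2d}  "
          f"expected re-offences={reoffences:.2f}")
```

Whatever the real system does internally, any rule of this kind has to settle somewhere between the two extremes of detaining nobody and detaining everybody.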
The automatic learning system consists of two algorithms, supported by a set of historical cases:
  1. An algorithm that solves the problem in question (in our case, advising whether or not the defendant should be remanded in custody) in a deterministic way, based on a set of parameters (sometimes thousands) whose concrete values are left open. Obviously, if this algorithm is not well designed, the final system won’t work well.
  2. A second algorithm, called the learning algorithm, whose objective is to adjust the parameters of the first algorithm (those whose values were left unspecified) so that the system works as well as possible.
  3. To help the second algorithm adjust the parameters of the first, a very large set of real cases is available. In the expert system for judges, there were hundreds of thousands. All those cases actually took place, at some point in time, before a human judge, who made a decision. For each one there is also information about the consequences (if a defendant was released, whether or not they re-offended during the provisional release), along with the personal data of the accused and their record.
  4. The available historical cases are divided into two groups. The first group contains the training cases, which are provided to the first algorithm together with the actual result, so that the second algorithm can adjust the parameters of the first until the number of cases whose result is correctly predicted is as large as possible. The second group contains the validation cases. Once the parameters of the first algorithm have been adjusted, this algorithm is run by itself on these new cases, without being given the actual result, to see whether its predictions are comparable to what really happened. If the outcome is satisfactory, the first algorithm (in our case the expert system to assist judges) can be considered complete and will be used in practice, detached from the learning algorithm, which is no longer needed. If the outcome is not acceptable, it will be necessary to start all over again with a different solution algorithm, a different learning algorithm, or both. There are many types of learning algorithms, but none is better than the others in all possible cases, as the no-free-lunch theorem shows.
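The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the real expert system: the data are synthetic, the two features (prior offences and age), the hidden rule that generates the outcomes, and the names `make_case`, `predict`, `learn` and `accuracy` are all assumptions made for the example. The first algorithm is taken to be a logistic model and the learning algorithm plain gradient descent, which is just one of many possible choices.

```python
import math
import random

random.seed(0)  # fixed seed so that the run is repeatable


def make_case():
    """Generate one synthetic 'historical case' (invented data, not real records).

    Features: number of prior offences (0 to 5) and age in decades (2 to 6).
    Outcome: 1 if the released defendant re-offended, 0 otherwise.
    """
    priors = random.randint(0, 5)
    age = random.uniform(2.0, 6.0)
    hidden_logit = 1.2 * priors - 0.8 * age + 0.5  # rule assumed for the demo
    probability = 1.0 / (1.0 + math.exp(-hidden_logit))
    reoffended = 1 if random.random() < probability else 0
    return (priors, age), reoffended


def predict(params, features):
    """First algorithm: a deterministic rule whose parameters are left open."""
    w_priors, w_age, bias = params
    priors, age = features
    z = w_priors * priors + w_age * age + bias
    return 1.0 / (1.0 + math.exp(-z))  # estimated risk of re-offending


def learn(training_cases, steps=1000, rate=0.1):
    """Second algorithm: adjust the parameters by gradient descent on the log loss."""
    params = [0.0, 0.0, 0.0]
    n = len(training_cases)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for (priors, age), outcome in training_cases:
            error = predict(params, (priors, age)) - outcome
            grad[0] += error * priors
            grad[1] += error * age
            grad[2] += error
        params = [p - rate * g / n for p, g in zip(params, grad)]
    return params


def accuracy(params, cases):
    """Fraction of cases whose outcome is predicted correctly (risk cut at 0.5)."""
    hits = sum(1 for features, outcome in cases
               if (predict(params, features) >= 0.5) == (outcome == 1))
    return hits / len(cases)


# Step 3: the set of historical cases. Step 4: split it into two groups.
cases = [make_case() for _ in range(1000)]
training_cases, validation_cases = cases[:800], cases[800:]

params = learn(training_cases)  # the learning algorithm sees the actual results
print("adjusted parameters:", [round(p, 2) for p in params])
print("accuracy on validation cases:", accuracy(params, validation_cases))
```

Once `learn` has returned, only `predict` and the adjusted `params` are needed, which mirrors the point that the finished system is used without the learning algorithm.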
This type of learning is called supervised because the parameter adjustment takes place starting from a set of cases whose solution is known. In the case of a neural network, the parameters are the weights of all the connections in the network.
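To make the last remark concrete, here is a small sketch of a four-layer network in which the only adjustable quantities are the connection weights. The layer sizes, the random initial weights and the `forward` function are hypothetical choices for the example, and bias terms are left out for brevity.

```python
import random

random.seed(1)

# Hypothetical sizes for a four-layer network:
# input layer, two hidden layers, output layer.
layer_sizes = [3, 4, 4, 1]

# One weight per connection between consecutive layers.
# weights[k][j][i] links neuron i of layer k to neuron j of layer k + 1.
weights = [
    [[random.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]


def forward(weights, inputs):
    """Propagate the inputs through the network.

    Hidden layers use the ReLU activation; the output layer is left linear.
    """
    activations = inputs
    for index, layer in enumerate(weights):
        sums = [sum(w * a for w, a in zip(neuron, activations))
                for neuron in layer]
        is_last = index == len(weights) - 1
        activations = sums if is_last else [max(0.0, s) for s in sums]
    return activations


# The parameters a learning algorithm would adjust are exactly these weights.
n_parameters = sum(len(neuron) for layer in weights for neuron in layer)
print(n_parameters)  # 3*4 + 4*4 + 4*1 = 32 connections
print(forward(weights, [1.0, 0.5, -0.2]))
```

A learning algorithm for this network would play the role of the second algorithm described earlier: it would change the numbers stored in `weights`, while `forward` itself would stay the same.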
Let us now look at the AlphaGo Zero program, which recently reached a very high level at the game of Go. How does this case differ from supervised learning?
  • First, the two algorithms, execution and learning, are joined into one.
  • Secondly, rather than starting from a set of training data, the program generates the data automatically by playing against itself. That is why I call it unsupervised learning here, although this self-play technique is more usually classified as reinforcement learning.
The achievement, which is important, has been presented in the media as the beginning of a revolution in automatic learning procedures. Keep in mind, however, that computer games are a particularly suitable field for this type of algorithm. First, the result of each specific case is clear-cut (the game is won or lost). Second, the training cases can be generated automatically in a simple way, by making the program play against itself.
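The idea of generating training cases by self-play can be illustrated with a game far simpler than Go. The sketch below is not AlphaGo Zero's method, which relies on deep neural networks and tree search. It swaps in a much simpler stand-in: a table of win estimates updated by averaging game results, with occasional random moves, for a toy game in which players alternately remove 1 to 3 stones from a pile and whoever takes the last stone wins. All names and constants are assumptions made for the example.

```python
import random

random.seed(42)

MAX_TAKE = 3    # a player may remove 1, 2 or 3 stones
EPSILON = 0.2   # probability of trying a random move (exploration)

value = {}  # (stones, move) -> estimated chance that the player moving wins
count = {}  # (stones, move) -> number of times this move has been played


def legal_moves(stones):
    return list(range(1, min(MAX_TAKE, stones) + 1))


def best_move(stones):
    """Move with the highest current estimate (unseen moves count as 0.5)."""
    return max(legal_moves(stones), key=lambda m: value.get((stones, m), 0.5))


def self_play_game(start):
    """The program plays both sides; returns the moves made and the winner."""
    history = []  # entries are (player, stones before the move, move)
    stones, player = start, 0
    while stones > 0:
        if random.random() < EPSILON:
            move = random.choice(legal_moves(stones))
        else:
            move = best_move(stones)
        history.append((player, stones, move))
        stones -= move
        player = 1 - player
    winner = history[-1][0]  # whoever took the last stone
    return history, winner


def learn_from(history, winner):
    """Update each estimate towards the result of the game (running average)."""
    for player, stones, move in history:
        key = (stones, move)
        outcome = 1.0 if player == winner else 0.0
        count[key] = count.get(key, 0) + 1
        old = value.get(key, 0.5)
        value[key] = old + (outcome - old) / count[key]


# No external training data: every case comes from the program's own games.
for _ in range(20000):
    history, winner = self_play_game(random.randint(1, 10))
    learn_from(history, winner)

print([best_move(stones) for stones in (5, 6, 7)])
```

In this toy game the known winning strategy is to leave the opponent a multiple of four stones, so one would expect the learned table to prefer taking 1, 2 and 3 stones from piles of 5, 6 and 7. The sketch works only because the winner of each game is known the moment the game ends, which is exactly the property stressed above.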
It is to be expected that similar programs will appear, specialized in different games (perhaps chess?). But it is clear that this procedure cannot be applied to real-world cases such as the expert system for judges. How could the program generate its own cases, and how could it know what the practical outcome of each decision was? No way. What we have here is a new learning procedure that can only be applied in very specific, well-defined circumstances. The media, as usual, are counting their chickens before they are hatched.

The same post in Spanish
Manuel Alfonseca
