Machine Learning & Algorithms
Since the 1970s there has been a substantial amount of research on Artificial Intelligence. As a result, it is now possible for a computer to learn: it can make predictions or recognize objects based on examples. An important part of this research focuses on translating our understanding of how learning works into models that can be executed by computers. These models are called algorithms.
Learning itself is still largely a mystery. To date, there is no single, generic, overarching algorithm that can replicate all types of learning. Instead, there are different algorithms for different purposes: each kind is explicitly designed for a certain type of learning or to solve a specific problem. Algorithms also differ in their accuracy on a given task and in the time needed to train them.
Algorithms can be roughly divided into two types: supervised and unsupervised. Supervised algorithms are given examples of the input data together with the outcomes (labels) you want the computer to learn to predict. Unsupervised algorithms have to discover the relevant characteristics in the data on their own.
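The supervised case can be sketched in a few lines. The following is a minimal illustration, not a production technique: the two-dimensional feature points and their labels are invented, and a simple 1-nearest-neighbour rule stands in for a trained supervised algorithm.

```python
# Supervised sketch: every training example comes with a label, so a new
# point can be classified by looking at the labeled example closest to it.

def nearest_neighbor(train, new_point):
    """Return the label of the training example closest to new_point."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    closest = min(train, key=lambda example: dist(example[0], new_point))
    return closest[1]

# Invented labeled examples: (features, label)
train = [((1.0, 1.0), "normal"), ((1.2, 0.9), "normal"), ((8.0, 9.0), "fraud")]

print(nearest_neighbor(train, (7.5, 8.8)))  # -> fraud
```

An unsupervised algorithm would receive the same feature points but without the "normal"/"fraud" labels, and would have to group them by similarity alone.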
For example, a financial institution has a historical overview of all kinds of transactions, including a number of fraud cases. It wants to use Machine Learning to detect fraud cases more quickly in the future. If the fraud cases are marked in the historical data, supervised algorithms can be used to identify the specific fraud patterns. If the fraud cases are not marked, then unsupervised algorithms can cluster the transactions and group those that appear similar. By selecting the characteristics on which the data is grouped, it is possible to isolate potential fraud cases. In this example, known fraud patterns are best recognized with a labeled data set and supervised classification algorithms, while unsupervised clustering algorithms are used to detect new types of fraud.
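The unsupervised branch of this example can be sketched with k-means clustering. Everything here is an assumption for illustration: the two features per transaction (say, amount and hour of day) are invented, the initialization is deliberately naive, and cluster membership alone does not say which group is fraudulent; an analyst still has to inspect the groups.

```python
# Unsupervised sketch: group unlabeled transactions by similarity with
# a tiny k-means loop (no labels are used anywhere).

def kmeans(points, k, iterations=10):
    centroids = points[:k]  # naive initialization: first k points
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Move each centroid to the mean of its assigned points.
        centroids = [
            tuple(sum(v) / len(c) for v in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters

# Invented transactions: (amount, hour of day)
transactions = [(10, 14), (12, 15), (11, 13), (950, 3), (980, 4)]
groups = kmeans(transactions, 2)
```

On this toy data the small daytime transactions end up in one group and the large night-time transactions in the other, which is the kind of separation an analyst would then examine for potential fraud.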
In a supervised learning process, the Machine Learning algorithm is trained on a specific data set by giving it examples and counter-examples. The purpose is to improve the accuracy to such an extent that the results are usable. An untrained algorithm will never perform well, but an over-trained algorithm copes poorly with data that deviates from the training set (overfitting). It is therefore important to regularly measure the accuracy of a trained algorithm. This can be done by running the trained algorithm on held-out data that has already been classified by people. The difference between the known labels and the results of the algorithm defines its accuracy.
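The accuracy measurement described above can be sketched as follows. The `predict` rule is a hypothetical stand-in for a trained model, and the held-out examples are invented; only the measurement logic (compare predictions against known labels, report the fraction correct) is the point.

```python
# Measure accuracy on held-out, human-labeled data.

def predict(amount):
    # Hypothetical "trained" model: flag large amounts as fraud.
    return "fraud" if amount > 500 else "normal"

# Held-out examples already classified by people: (amount, true label)
test_set = [(20, "normal"), (640, "fraud"), (35, "normal"),
            (900, "fraud"), (700, "normal")]

correct = sum(1 for amount, label in test_set if predict(amount) == label)
accuracy = correct / len(test_set)
print(f"accuracy: {accuracy:.0%}")  # 4 of 5 correct -> 80%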
Measuring the accuracy of an unsupervised algorithm is much more complicated. This algorithm is used to find underlying connections and contexts. Therefore, it is not always possible to compare the outcome with pre-classified data. These cases are often experimental; statistics or common sense are needed to determine how useful the outcome is for the intended purpose.
Recently there has been renewed attention for “neural networks”. These algorithms are based on the way our brains are functioning. This was first applied in the late eighties, but due to the limited computing power at the time it was only used for text recognition (OCR). Simulating the connections between brain cells requires a lot of parallel computing power. Modern graphic cards (GPU’s) are built around these calculations and can effectively be used for this type of algorithms. Because of the larger computing power of these GPUs, neural networks containing multiple extended layers can be used. The increased complexity makes it possible to link neural networks an automatic measurement of the accuracy (reinforced deep learning). Computers are able to recognize images and speech – sometimes even better than human beings.
In order to apply Machine Learning effectively it is necessary to know what you want to achieve. Detecting known types of fraud requires a different approach and algorithm than finding possible new variations of fraud. Additionally, it is important to monitor the quality of the data which is used to train the algorithm. This helps in choosing the best training methods. Furthermore, a clear success criterion is important; when is the solution fast or accurate enough and how will you measure that?