Implementing Artificial Neural Networks (ANNs)
- Overview
An Artificial Neural Network (ANN) can be used both as a classification technique and as a forecasting technique. It tries to simulate how the human brain works. The network has three layers: Input, Hidden, and Output.
The Input layer is mapped to the input attributes; for example, age, gender, and number of children can be inputs to the Input layer. The Hidden layer is an intermediate layer in which each node receives every input, multiplied by a weight. The Output layer is mapped to the predicted attributes. In our AdventureWorks example, Bike Buyer is mapped to the Output layer.
A neuron is the basic unit that combines multiple inputs into a single output. Inputs can be combined with different techniques; the Microsoft Neural Network uses the weighted sum. Maximum, average, logical AND, and logical OR are other techniques used by different implementations.
After the inputs are combined, an activation function is applied to the result. A small input can sometimes have a large effect on the output, while a large input can be insignificant to it. Therefore, non-linear functions are typically used for activation. The Microsoft Neural Network uses tanh as the hidden layer activation function and the sigmoid function for the output layer.
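To make the weighted sum and the two activation functions concrete, here is a minimal sketch in plain Python. The input values, the weights, and the function names (`weighted_sum`, `sigmoid`, `neuron`) are illustrative assumptions, not part of the Microsoft Neural Network implementation.

```python
import math

def weighted_sum(inputs, weights):
    """Combine the inputs into one value: the sum of input * weight."""
    return sum(i * w for i, w in zip(inputs, weights))

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, activation):
    """A single neuron: weighted sum followed by an activation function."""
    return activation(weighted_sum(inputs, weights))

# Hypothetical, already-scaled inputs (e.g. age, gender flag, number of children).
inputs = [0.5, 1.0, 2.0]
# Illustrative weights chosen by hand; a trained network would learn these.
weights = [0.4, -0.6, 0.1]

hidden_out = neuron(inputs, weights, math.tanh)  # tanh keeps the result in (-1, 1)
final_out = neuron(inputs, weights, sigmoid)     # sigmoid keeps the result in (0, 1)
print(hidden_out, final_out)
```

Both activations are non-linear, so equal changes in the weighted sum do not produce equal changes in the output.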
The brain consists of tens of billions of cells called neurons. These neurons are connected by synapses, which are the connections across which one neuron can send an impulse to another. When a neuron sends an excitatory signal to another neuron, the signal is added to all of the other inputs of that neuron. If the total exceeds a given threshold, the target neuron fires an action signal forward. This is how the thinking process works internally.
Figure 1: A simple Neural Network (NN) with two layers, capable of solving a linear classification problem.
- An Example
In computer science, we model this process by creating “networks” on a computer using matrices. These networks can be understood as abstractions of neurons, without all the biological complexities taken into account. To keep things simple, we will model a simple Neural Network (NN) with two layers, capable of solving a linear classification problem.
Let’s say we want to predict the output for a new set of inputs, given the following inputs and outputs as training examples:
Figure 2: Training Examples
Input 1 | Input 2 | Input 3 | Output |
---|---|---|---|
0 | 1 | 1 | 1 |
1 | 0 | 0 | 0 |
1 | 0 | 1 | 1 |
Now we predict the output for the following set of inputs.
Figure 3: Test Example
Input 1 | Input 2 | Input 3 | Output |
---|---|---|---|
1 | 0 | 1 | 1 |
Note that the output is directly related to the third column: in every training example in Figure 2, the output equals the value of Input 3. So for the test example, the output value should be 1.
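The observation above can be checked directly against the training examples. This is a small sketch; the variable names are hypothetical, and the data is copied from Figure 2.

```python
# Training examples from Figure 2: (inputs, output).
training_examples = [
    ([0, 1, 1], 1),
    ([1, 0, 0], 0),
    ([1, 0, 1], 1),
]

# True if the output equals Input 3 (index 2) in every training example.
output_follows_input_3 = all(
    inputs[2] == output for inputs, output in training_examples
)
print(output_follows_input_3)

# Under that pattern, the expected output for the test inputs is their third value.
test_inputs = [1, 0, 1]
expected_output = test_inputs[2]
print(expected_output)
```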
- The Training Process
The training process consists of the following steps:
1. Forward Propagation:
Take the inputs and multiply them by the weights (just use random numbers as the initial weights).
Let $Y = \sum_i W_i I_i = W_1 I_1 + W_2 I_2 + W_3 I_3$
2. Back Propagation
Note: Repeat the whole process for a few thousand iterations.
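The two steps can be sketched in plain Python using the training examples from Figure 2. The forward pass follows the weighted sum $Y$ described above. The text does not spell out the back propagation step, so the weight update below (the error scaled by the sigmoid's slope, with an implicit learning rate of 1) is an assumption, as are the sigmoid activation, the random seed, and the iteration count.

```python
import math
import random

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict(inputs, weights):
    """Forward propagation: Y = W1*I1 + W2*I2 + W3*I3, then the activation."""
    y = sum(w * i for w, i in zip(weights, inputs))
    return sigmoid(y)

# Training examples from Figure 2.
training_inputs = [[0, 1, 1], [1, 0, 0], [1, 0, 1]]
training_outputs = [1, 0, 1]

# Start from random weights, as in step 1 (the seed is fixed only for repeatability).
random.seed(1)
weights = [random.uniform(-1, 1) for _ in range(3)]

for _ in range(10000):
    for inputs, target in zip(training_inputs, training_outputs):
        out = predict(inputs, weights)
        # Assumed back propagation step: scale the error by the slope of the
        # sigmoid at the current output, then nudge each weight in
        # proportion to the input that fed it.
        error = target - out
        delta = error * out * (1.0 - out)
        for k in range(len(weights)):
            weights[k] += delta * inputs[k]

# Prediction for the test example from Figure 3.
print(predict([1, 0, 1], weights))
```

After training, the prediction for the test inputs should be close to 1, because the weight on Input 3 has grown much larger than the others.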
- Recurrent Neural Networks (RNNs)
Recurrent neural networks (RNNs) are a class of neural networks that is powerful for modeling sequence data such as time series or natural language. RNNs are popular models that have shown great promise in many Natural Language Processing (NLP) tasks.
Schematically, an RNN layer uses a for loop to iterate over the timesteps of a sequence, while maintaining an internal state that encodes information about the timesteps it has seen so far.
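The loop-plus-state idea can be illustrated with a toy example in plain Python. This is a schematic sketch only: the scalar weights `w_input` and `w_state`, the tanh activation, and the function names are assumptions chosen for illustration, and a real RNN layer would use weight matrices and learned parameters.

```python
import math

def rnn_step(x, state, w_input, w_state):
    """One timestep: mix the current input with the previous state."""
    return math.tanh(w_input * x + w_state * state)

def run_rnn(sequence, w_input=0.5, w_state=0.8):
    """Iterate over the timesteps, carrying the internal state forward."""
    state = 0.0  # the state starts empty
    states = []
    for x in sequence:
        state = rnn_step(x, state, w_input, w_state)
        states.append(state)
    return states

# A single pulse followed by zeros: the state stays non-zero after the pulse,
# which shows that it still encodes the earlier timestep.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```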