Solving the XOR problem using an MLP
In this blog post, we cover what the XOR problem is and how to solve it using a multilayer perceptron (MLP).
What is XOR?
Exclusive OR (XOR) is a logical operation that outputs true exactly when its inputs differ.
For the XOR gate, the truth table is as follows:

| X1 | X2 | X1 XOR X2 |
|----|----|-----------|
| 0  | 0  | 0         |
| 0  | 1  | 1         |
| 1  | 0  | 1         |
| 1  | 1  | 0         |
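Since XOR on 0/1 integers corresponds to Python's built-in `^` operator, we can verify this table in a few lines:

```python
# Print the XOR truth table using Python's bitwise XOR operator
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, x1 ^ x2)
```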
XOR is a classification problem, since it produces two distinct binary outputs. If we plot the inputs of the XOR gate against their outputs, the class-0 points (0, 0) and (1, 1) sit on opposite corners of the unit square, with the class-1 points (0, 1) and (1, 0) on the other two corners. Visualizing this plot, we can see that it is impossible to separate the two output classes (0 and 1) using a single linear equation.
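If you want to reproduce the plot yourself, here is a minimal sketch (assuming matplotlib is installed):

```python
import matplotlib.pyplot as plt

# The four XOR input pairs, grouped by their output class
class0 = [(0, 0), (1, 1)]  # XOR = 0
class1 = [(0, 1), (1, 0)]  # XOR = 1

plt.scatter(*zip(*class0), marker="x", s=100, label="output 0")
plt.scatter(*zip(*class1), marker="o", s=100, label="output 1")
plt.xlabel("X1")
plt.ylabel("X2")
plt.title("XOR inputs, labeled by output")
plt.legend()
plt.show()
```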
To separate the two classes using linear equations, we would need at least two separate lines, for example one line cutting off (0, 0) and another cutting off (1, 1). This is exactly why these outputs cannot be separated using a single linear equation, and it was a major problem with the early perceptrons (the single-layer approach).
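We can demonstrate the failure directly. The sketch below trains a single-layer perceptron on XOR with the classic perceptron learning rule; the learning rate and epoch count are arbitrary choices for illustration. Because the data is not linearly separable, the weights never settle on a solution, and at least one point stays misclassified:

```python
# XOR inputs and targets
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 1, 1, 0]

w1 = w2 = b = 0.0
lr = 0.1  # learning rate, chosen arbitrarily for this demo

for epoch in range(1000):
    errors = 0
    for (x1, x2), target in zip(X, Y):
        pred = 1 if (w1 * x1 + w2 * x2 + b) >= 0 else 0
        if pred != target:
            # Perceptron rule: nudge the weights toward the missed target
            w1 += lr * (target - pred) * x1
            w2 += lr * (target - pred) * x2
            b += lr * (target - pred)
            errors += 1
    if errors == 0:
        break  # never happens for XOR

print("misclassified points in the last epoch:", errors)
```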
What is the XOR problem?
As we have seen above, it is impossible to separate the XOR outputs using a single linear equation. This is a serious problem, because during training a machine is expected to find such a separating equation (the decision boundary) on its own in order to produce good outputs.
For a problem whose outputs resemble XOR's, a single-layer machine could never set up an equation that fits them. This limitation is what led to the concept of hidden layers, which are used extensively in artificial neural networks.
Let's first look at what a single perceptron computes. Calling the output Y, we have
Y = A1X1 + A2X2 + A3X3 + … + AnXn + B
Here B is the bias, and A1, A2, …, An are the weights. The weights control the signal, i.e. the strength of each input's connection to the output.
Y can also be called the weighted sum.
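In code, this weighted sum followed by a step activation (one common choice for a perceptron) looks like the following sketch; the function name and the threshold at zero are illustrative assumptions:

```python
def perceptron_output(inputs, weights, bias):
    # Weighted sum: Y = A1*X1 + A2*X2 + ... + An*Xn + B
    y = sum(a * x for a, x in zip(weights, inputs)) + bias
    # Step activation: fire (1) if the weighted sum is non-negative
    return 1 if y >= 0 else 0
```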
The information flow inside a perceptron is feed-forward, meaning that the signal flows in a single direction, from the input layer to the output layer, and the individual inputs are independent of one another. Varying the weights controls how the input values are converted into the output value.
The main limitation of this single-layer architecture (the perceptron) is that it separates the data points using a single line. That is a fatal drawback for a problem like XOR, where the data points are linearly inseparable.
How is the XOR problem solved?
The solution to the XOR problem lies in multidimensional analysis: we feed the inputs through several layers of interpretation and processing to generate the desired outputs.
The inner layers that perform this deeper processing are known as hidden layers, so called because they are not directly exposed to the network's input or output. This architecture is known as the multilayer perceptron (MLP).
The number of layers in an MLP is not fixed, so it can have any number of hidden layers. Each layer has its own set of weights, which transform the signal before passing it on to the next layer.
Using the MLP approach lets us work in more than two dimensions: the hidden layer maps the inputs into a new space, and in that transformed space the outputs of XOR can be separated by a linear equation.
Each hidden unit applies an activation function, which squashes its output into a bounded range such as 0 to 1.
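For example, the sigmoid is one commonly used activation; it maps any real-valued weighted sum into the open interval (0, 1):

```python
import math

def sigmoid(z):
    # Squashes any real number into (0, 1): large positive z gives
    # values near 1, large negative z gives values near 0
    return 1.0 / (1.0 + math.exp(-z))
```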
The MLP also belongs to the class of feed-forward artificial neural networks, so the signal still travels in only one direction. The MLP solves the XOR problem efficiently by viewing the data points in more dimensions, which allows it to construct an n-variable equation that fits the output values.
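To make this concrete, here is a minimal 2-2-1 MLP that solves XOR with hand-chosen weights (one valid assignment among many; a trained network would learn its own). The first hidden unit behaves like an OR gate, the second like a NAND gate, and the output unit ANDs them together:

```python
def step(z):
    # Step activation: 1 if the weighted sum is non-negative, else 0
    return 1 if z >= 0 else 0

def xor_mlp(x1, x2):
    # Hidden layer: h1 acts like OR, h2 acts like NAND
    h1 = step(1.0 * x1 + 1.0 * x2 - 0.5)
    h2 = step(-1.0 * x1 - 1.0 * x2 + 1.5)
    # Output layer: AND of the two hidden units yields XOR
    return step(1.0 * h1 + 1.0 * h2 - 1.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_mlp(x1, x2))  # reproduces the XOR truth table
```

Neither hidden unit alone separates the two classes, but together they hand the output unit a representation in which a single line (the AND threshold) is enough.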
In this blog, we read about the classic XOR problem and how it is solved using multilayer perceptrons. Problems like this give a sense of how deep neural networks build up solutions to complex problems.