Basics: Neural Networks

Historically, we humans have developed models to understand our world. For example, Newton’s second law of motion gives us the equation F = ma, which tells us that a force ‘F’ applied to a free body of mass ‘m’ accelerates it by an amount ‘a’. This is a mathematical model that defines the relationship between force, mass, and acceleration.

Mathematical models are used extensively in various aspects of our daily lives. For instance, if we were to build an electric car, mathematical models would inform us about the car’s movement when the steering wheel is turned by a certain amount, how long the battery will last at a specific speed, or how a particular tire will perform on a particular road. Our predictions are based on these known models.

However, there are phenomena for which we do not have mathematical models. For example, we can represent a line as y = mx + c and a circle as x^2 + y^2 = r^2, but how do we represent something like the shape of a human face? This is where neural networks come into play. They are used to construct a black-box mathematical model that learns from input data. This approach differs greatly from the traditional method of building mathematical models based on science and logic.

Given the vast unknowns in our world, deep neural networks have become instrumental in finding models for problems that are not easily resolvable through traditional mathematics. Tasks such as face detection, image segmentation, object detection, object pose estimation, and object classification in the field of computer vision rely heavily on deep neural networks. These tasks have become possible primarily due to the development of faster GPUs, enabling faster matrix multiplications for Deep Neural Networks.

To comprehend deep neural networks, we must first grasp the concept of neural networks in general. This post will focus on single-layer neural networks and the associated mathematics.

Neural Networks 101

Imagine a neural network as a simplified model of the human brain, with interconnected artificial “neurons” working together to process and learn from data. These networks come in different shapes and sizes, but we’ll focus on a basic form called a single-layer neural network. It consists of an input layer and an output layer, with at most a single hidden layer in between (often omitted in the simplest case).

Classification

Classification is a fundamental task in neural networks, where the network is trained to categorize input data into specific classes or categories. Let’s consider a simple example of classifying images of fruits as either apples or oranges using a single-layer neural network. In this case, the network takes an image as input, represented by a feature vector x, and assigns it to one of the two classes: 0 for apples and 1 for oranges.

To perform the classification, we need to define the weights and biases of the neural network. Let’s assume we have two input neurons (one for each feature) and one output neuron. The weight between the first input neuron and the output neuron is denoted as w1, and the weight between the second input neuron and the output neuron is denoted as w2. Additionally, we have a bias term b.

The output of the neural network is computed by applying an activation function, such as the sigmoid function (σ), to the weighted sum of the inputs and biases. The sigmoid function maps the result to a value between 0 and 1, which can be interpreted as the probability of the input belonging to a certain class.
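
Concretely, the sigmoid function is defined as:

σ(z) = 1 / (1 + e^(-z))

where z is the weighted sum of the inputs and the bias.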

Mathematically, the output (probability) of the neural network for an input vector x can be calculated as follows:

output = σ(w1 * x1 + w2 * x2 + b)

For example, if the weights are w1 = 0.5, w2 = -0.3, and the bias is b = 0.2, the output for an input vector x = [0.8, 0.6] would be:

output = σ(0.5 * 0.8 + (-0.3) * 0.6 + 0.2)

After computing the output, we can interpret it as the probability of the input image being an orange. For the numbers above, the weighted sum is 0.5 * 0.8 + (-0.3) * 0.6 + 0.2 = 0.42, so the output is σ(0.42) ≈ 0.60. If the output is above a certain threshold (commonly 0.5), we classify the image as an orange; otherwise, it is classified as an apple.
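
To make the calculation concrete, here is a minimal Python sketch of this classification forward pass. NumPy is used only for the dot product, and the weights, bias, and input are the example numbers above:

import numpy as np

def sigmoid(z):
    # Squash any real-valued input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Weights, bias, and input feature vector from the example above.
w = np.array([0.5, -0.3])
b = 0.2
x = np.array([0.8, 0.6])

# Weighted sum of the inputs plus the bias, passed through the sigmoid.
probability = sigmoid(np.dot(w, x) + b)   # ≈ 0.60

# Threshold the probability to pick a class.
label = "orange" if probability > 0.5 else "apple"
print(probability, label)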

Through training, the neural network learns the appropriate weights and biases that minimize the classification error. This process involves adjusting the weights using techniques like backpropagation and gradient descent to optimize the network’s performance in classifying the input data accurately.

Classification in neural networks allows us to tackle a wide range of problems, from image recognition to sentiment analysis, and plays a crucial role in various applications of machine learning and artificial intelligence.

Regression

Regression is another important task in neural networks, where the network is trained to predict continuous outputs based on input data. Let’s consider a simple example of predicting the price of a house based on its features using a single-layer neural network.

In regression, the goal is to estimate a numeric value as the output, rather than classifying into discrete categories. The neural network takes an input vector x, representing the features of the house, and predicts the corresponding price.

Similar to the classification example, we define the weights and biases of the neural network. Assuming we have two input neurons (representing two features) and one output neuron for price estimation, we denote the weight between the first input neuron and the output neuron as w1 and the weight between the second input neuron and the output neuron as w2. Additionally, we have a bias term b.

The output of the neural network for regression is calculated by applying an activation function that allows for continuous values, such as a linear (identity) activation function. In that case, the output is simply the weighted sum of the inputs plus the bias.

Mathematically, the output (predicted price) of the neural network for an input vector x can be expressed as:

output = w1 * x1 + w2 * x2 + b

For example, if the weights are w1 = 100, w2 = 50, and the bias is b = -10, the predicted price for an input vector x = [1500 sq. ft, 3 bedrooms] would be:

output = 100 * 1500 + 50 * 3 - 10

After computing the output (100 * 1500 + 50 * 3 - 10 = 150,140), we obtain the estimated price of the house.
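
Again, a minimal Python sketch of this regression forward pass, using the illustrative weights and features above:

import numpy as np

# Weights and bias from the example above.
w = np.array([100.0, 50.0])
b = -10.0

# Features: square footage and number of bedrooms.
x = np.array([1500.0, 3.0])

# Linear (identity) activation: the output is just the weighted sum plus the bias.
predicted_price = np.dot(w, x) + b
print(predicted_price)   # 150140.0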

During training, the neural network learns the optimal weights and biases that minimize the difference between the predicted output and the actual house prices. This process involves adjusting the weights using techniques like backpropagation and gradient descent to optimize the network’s performance in accurately predicting the house prices.

Regression in neural networks allows us to solve various problems, such as stock market prediction, sales forecasting, and climate modeling. It plays a crucial role in making continuous value predictions based on input data, enabling us to gain insights and make informed decisions in a wide range of applications.

Forward Pass: From Inputs to Outputs

During the forward pass, input data travels through the neural network, undergoing a series of calculations. Each connection between neurons is associated with a weight, representing the strength of the connection. These weights determine how much influence each input has on the final output.

Let’s consider a single input neuron, a hidden neuron (if present), and an output neuron. We denote the input as x, the weight between the input and the hidden neuron as w1, and the weight between the hidden neuron and the output neuron as w2. Additionally, each neuron has an associated bias value.

For classification, the output of the neural network can be computed using an activation function, often the sigmoid function (σ), which squashes the result between 0 and 1:

output = σ(w2 * σ(w1 * x + b1) + b2)

For regression, the output can be the result of a linear activation function:

output = w2 * (w1 * x + b1) + b2
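
Here is a small Python sketch showing both forward-pass formulas side by side; the parameter values w1, b1, w2, and b2 are made up purely for illustration, not learned values:

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative parameters.
w1, b1 = 0.7, 0.1    # input -> hidden
w2, b2 = 1.3, -0.2   # hidden -> output

x = 0.5  # a single input value

hidden = sigmoid(w1 * x + b1)

# Classification: squash the result into (0, 1) with another sigmoid.
classification_output = sigmoid(w2 * hidden + b2)

# Regression: linear activation, so the weighted sum is the output.
regression_output = w2 * hidden + b2

print(classification_output, regression_output)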

Backpropagation: Unraveling the Gradients

Backpropagation is a crucial algorithm that enables neural networks to learn and adjust their weights based on the error produced during the forward pass. It calculates the gradients of the weights by traversing the network in reverse.

To understand backpropagation, we need to introduce the concept of a loss function. This function quantifies the discrepancy between the predicted output and the true output. The goal of backpropagation is to minimize this error by adjusting the weights.

Mathematically, let’s assume our loss function is represented by L. The backpropagation algorithm calculates the partial derivatives of the loss function with respect to the weights. These derivatives, also known as gradients, indicate the direction and magnitude of weight updates needed to reduce the error.

For simplicity, let’s consider a single-layer neural network with one input neuron, one hidden neuron, and one output neuron. The weight between the input and hidden neuron is denoted as w1, and the weight between the hidden and output neuron is denoted as w2.
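
As a concrete, hedged example (assuming a squared-error loss and a linear output neuron, choices made here for illustration), let L = ½ * (output − target)^2, with output = w2 * hidden_output + b2 and hidden_output = σ(w1 * x + b1). The chain rule then gives:

∂L/∂w2 = (output − target) * hidden_output

∂L/∂w1 = (output − target) * w2 * σ'(w1 * x + b1) * x

where σ'(z) = σ(z) * (1 − σ(z)) is the derivative of the sigmoid. With output_error = (output − target), these are exactly the gradients that appear in the update equations below.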

During backpropagation, we update the weights using the chain rule of calculus, which allows us to propagate the error backward through the network. The weight update equations can be expressed as:

delta_w1 = learning_rate * x * (output_error * w2 * σ'(w1 * x + b1))

delta_w2 = learning_rate * hidden_output * output_error

Here, learning_rate is the hyperparameter that controls the step size of weight updates. The output_error represents the derivative of the loss function with respect to the network’s output, and σ' represents the derivative of the activation function. The weights are then updated by subtracting these deltas: w1 = w1 - delta_w1 and w2 = w2 - delta_w2.

By updating the weights using these equations, the neural network iteratively learns and adjusts its weights, gradually minimizing the error and improving its performance.
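
Putting the forward pass and these update rules together, here is a minimal end-to-end training sketch in Python. The toy dataset, squared-error loss, linear output neuron, learning rate, and number of epochs are all choices made purely for illustration:

import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Toy dataset: learn y = 2x on a small range (purely illustrative).
data = [(i / 10.0, 2.0 * i / 10.0) for i in range(10)]

# Randomly initialized parameters.
w1, b1 = random.uniform(-1, 1), 0.0
w2, b2 = random.uniform(-1, 1), 0.0
learning_rate = 0.1

for epoch in range(1000):
    for x, target in data:
        # Forward pass: input -> hidden (sigmoid) -> output (linear).
        z1 = w1 * x + b1
        hidden_output = sigmoid(z1)
        output = w2 * hidden_output + b2

        # Derivative of the squared-error loss with respect to the output.
        output_error = output - target

        # Gradients via the chain rule (matching the update equations above).
        delta_w2 = learning_rate * hidden_output * output_error
        delta_b2 = learning_rate * output_error
        delta_w1 = learning_rate * x * (output_error * w2 * sigmoid_prime(z1))
        delta_b1 = learning_rate * (output_error * w2 * sigmoid_prime(z1))

        # Gradient descent step: move against the gradient.
        w1 -= delta_w1
        b1 -= delta_b1
        w2 -= delta_w2
        b2 -= delta_b2

# Prediction for x = 0.5; expected to end up close to the target 1.0 after training.
print(w2 * sigmoid(w1 * 0.5 + b1) + b2)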

Conclusion

Neural networks, backpropagation, and gradient descent are powerful tools that have revolutionized the field of machine learning. In this article, we explored the fundamentals of single-layer neural networks, both for classification and regression tasks. We demystified the forward pass, where data flows through the network, and introduced the concept of backpropagation, where gradients are computed and used to update the weights.

Backpropagation allows us to calculate the gradients, representing the direction and magnitude of weight updates needed to minimize the error. By leveraging gradient descent, neural networks iteratively adjust their weights, improving their performance.

While we only scratched the surface of this vast and exciting topic, we hope this article provided a solid foundation for your understanding of neural networks, backpropagation, and gradient descent. As you continue your journey, remember to explore more advanced architectures, activation functions, and optimization techniques to unleash the full potential of artificial intelligence.

Keep learning, stay curious, and hope you all have some happy neural adventures ahead! ✌🏼
