NOTES ABOUT MACHINE LEARNING AND ARTIFICIAL INTELLIGENCE (AI)
Notes:
Vectors and matrices are fundamental to machine learning.
Transfer learning: reuse existing (pre-trained) models as the starting point for a new task.
/Documents/PLURALSIGHT/datascience/Applied-Machine-Learning-main
source /Users/santosj/Documents/PLURALSIGHT/datascience/bin/activate
jupyter notebook /Users/santosj/Documents/PLURALSIGHT/datascience/Applied-Machine-Learning-main
Training a neural network is a structured, iterative process that allows the model to learn patterns and relationships in data. The goal is to optimise the model’s internal parameters (weights and biases) so it can make accurate predictions. This process consists of several key steps that repeat over many cycles (epochs), each refining the model’s understanding. The major steps are forward propagation, loss calculation, backpropagation with gradient calculation, and weight updates via gradient descent. Below is a detailed breakdown of each step:
In forward propagation, input data is passed through the layers of the neural network to produce an output or prediction.
The process begins at the input layer, where raw data (such as numerical features, pixel values, or word embeddings) enters the network. This data is then processed through one or more hidden layers, each consisting of multiple neurons.
Each neuron in a layer applies a weighted sum to the inputs it receives from the previous layer, adds a bias term, and then applies an activation function (such as ReLU, sigmoid, or tanh) to introduce non-linearity. This non-linear transformation allows the network to model complex patterns and relationships in the data, which would not be possible with simple linear transformations.
The transformed data continues to propagate through the layers until it reaches the output layer, which produces the final prediction. In a classification task, the output might be a set of probabilities indicating the likelihood of each class, while in a regression task, it could be a continuous value.
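As a rough illustration of the forward pass described above, the sketch below pushes one input through a tiny network using only the Python standard library. The layer sizes, the parameter values and the names `dense_layer` and `forward` are assumptions made up for this example, not part of these notes.

```python
import math

def relu(z):
    # ReLU activation: negative values become 0.
    return max(0.0, z)

def sigmoid(z):
    # Sigmoid activation: squashes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases, activation):
    """One layer: each neuron computes activation(weighted sum + bias)."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs

def forward(inputs):
    # Hypothetical hand-picked parameters: 2 inputs -> 2 hidden neurons -> 1 output.
    hidden_w = [[0.5, -1.0], [1.5, 2.0]]
    hidden_b = [0.0, -1.0]
    output_w = [[1.0, -0.5]]
    output_b = [0.25]
    hidden = dense_layer(inputs, hidden_w, hidden_b, relu)
    return dense_layer(hidden, output_w, output_b, sigmoid)

prediction = forward([1.0, 2.0])[0]
print(prediction)  # a value between 0 and 1, usable as a class probability
```

In a real network the parameters would be learned during training rather than written by hand.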
After forward propagation, the network has produced an output, but it still needs to know how accurate that output is. This is done through loss calculation, where the prediction is compared to the actual target value (also known as the ground truth or label).
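A loss function is used to quantify the difference between the predicted output and the true value. The choice of loss function depends on the type of problem: for example, mean squared error is a common choice for regression, and cross-entropy is a common choice for classification.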
The loss function outputs a scalar value that represents how far off the model’s prediction was. A lower loss indicates better performance for that sample, while a higher loss signals a poor prediction.
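To make the idea of a scalar loss concrete, here is a minimal sketch of two widely used loss functions. The notes do not say which losses were intended, so mean squared error and cross-entropy are an assumption here, and all sample values are invented.

```python
import math

def mean_squared_error(predictions, targets):
    # Average squared difference; a typical choice for regression.
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def cross_entropy(probabilities, true_class):
    # Negative log of the probability given to the correct class;
    # a typical choice for classification. eps guards against log(0).
    eps = 1e-12
    return -math.log(max(probabilities[true_class], eps))

# Regression: predictions close to the targets give a low loss.
mse_good = mean_squared_error([2.9, 4.1], [3.0, 4.0])
mse_bad = mean_squared_error([1.0, 7.0], [3.0, 4.0])

# Classification: the true class is index 1.
ce_good = cross_entropy([0.1, 0.8, 0.1], true_class=1)
ce_bad = cross_entropy([0.7, 0.2, 0.1], true_class=1)

print(mse_good, mse_bad)
print(ce_good, ce_bad)
```

In both cases the better prediction produces the smaller number, which is what the optimiser tries to minimise.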
Once the loss is calculated, the network must determine how to adjust its parameters (weights and biases) to reduce this error in future predictions. This adjustment process requires knowing how sensitive the loss is to each parameter. This is achieved through backpropagation.
Backpropagation is an algorithm that computes the gradients of the loss function with respect to each parameter in the network. It applies the chain rule of calculus to systematically calculate these gradients by moving backwards through the network, from the output layer to the input layer.
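For each neuron and its parameters, backpropagation computes the partial derivative of the loss with respect to that parameter, reusing the gradients already computed for the layers closer to the output.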
The result of backpropagation is a collection of gradients for all weights and biases, which provide the direction and magnitude of change needed to improve the model’s performance.
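The chain rule step can be illustrated on the smallest possible case: a single sigmoid neuron with a squared-error loss. This is only a sketch under those assumptions. The names `backprop` and `loss_fn` and the numbers are hypothetical, and the finite-difference check is included only to show that the analytic gradients are plausible.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_fn(w, b, x, target):
    # Forward pass followed by squared-error loss for one sample.
    prediction = sigmoid(w * x + b)
    return (prediction - target) ** 2

def backprop(w, b, x, target):
    # Forward pass, keeping the intermediate values.
    z = w * x + b
    a = sigmoid(z)
    # Backward pass: chain rule from the loss back to the parameters.
    dloss_da = 2.0 * (a - target)   # d(loss)/d(activation)
    da_dz = a * (1.0 - a)           # derivative of the sigmoid
    dloss_dz = dloss_da * da_dz
    grad_w = dloss_dz * x           # dz/dw = x
    grad_b = dloss_dz               # dz/db = 1
    return grad_w, grad_b

w, b, x, target = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = backprop(w, b, x, target)

# Numerical check with central differences.
h = 1e-6
numeric_w = (loss_fn(w + h, b, x, target) - loss_fn(w - h, b, x, target)) / (2 * h)
numeric_b = (loss_fn(w, b + h, x, target) - loss_fn(w, b - h, x, target)) / (2 * h)

print(grad_w, numeric_w)
print(grad_b, numeric_b)
```

Both gradients are negative here because the prediction is below the target, so increasing `w` or `b` would lower the loss.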
With the gradients calculated, the network updates its parameters to reduce the loss. This is done using an optimisation algorithm, most commonly gradient descent.
In gradient descent, each parameter is updated by moving it slightly in the opposite direction of its gradient (because gradients point towards the direction of increasing loss). The learning rate is a hyperparameter that determines how large these updates are; if it's too large, the model might overshoot the optimal values, while if it's too small, training might be very slow.
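There are several variations of gradient descent, including batch gradient descent (the whole training set per update), stochastic gradient descent (one sample per update), and mini-batch gradient descent (a small batch of samples per update).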
This weight update step allows the network to improve its predictions over time. After the update, the training loop returns to forward propagation with the next data sample or batch, repeating the process over many iterations.
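The update rule and the effect of the learning rate can be sketched on a toy one-parameter problem. The loss (w - 3)^2, the starting point and the three learning rates below are assumptions chosen only to make the behaviour visible.

```python
def gradient(w):
    # Gradient of the toy loss (w - 3)^2, whose minimum is at w = 3.
    return 2.0 * (w - 3.0)

def gradient_descent(learning_rate, steps=50, w=0.0):
    for _ in range(steps):
        # Move against the gradient, scaled by the learning rate.
        w = w - learning_rate * gradient(w)
    return w

w_good = gradient_descent(0.1)      # converges close to 3
w_slow = gradient_descent(0.001)    # too small: still far from 3 after 50 steps
w_diverged = gradient_descent(1.1)  # too large: overshoots further on every step

print(w_good, w_slow, w_diverged)
```

The same rule is applied to every weight and bias of a real network, with the gradients supplied by backpropagation.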
These steps—forward propagation, loss calculation, backpropagation with gradient calculation, and weight updates—are repeated across thousands or millions of data samples over multiple epochs. This iterative process gradually fine-tunes the network’s parameters, allowing it to learn complex patterns and make accurate predictions.
As training progresses, the loss typically decreases, indicating that the model’s predictions are improving. Once the model reaches an acceptable level of performance, training can be stopped, and the network is ready for inference on unseen data.
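Putting the four steps together, a complete training loop might look like the sketch below. It fits a single linear neuron with mini-batch gradient descent. The synthetic data (generated from y = 2x + 1), the hyperparameters and the variable names are all assumptions for illustration, not taken from the course material referenced in these notes.

```python
import random

random.seed(0)

# Hypothetical toy data set generated from y = 2x + 1.
data = [(i / 10.0, 2.0 * (i / 10.0) + 1.0) for i in range(-20, 20)]

w, b = 0.0, 0.0        # parameters to learn
learning_rate = 0.1
batch_size = 8
epoch_losses = []

for epoch in range(100):
    random.shuffle(data)
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # 1. Forward propagation.
        predictions = [w * x + b for x, _ in batch]
        # 2. Loss calculation (mean squared error, via its per-sample errors).
        errors = [p - y for p, (_, y) in zip(predictions, batch)]
        # 3. Gradients of the loss with respect to w and b.
        grad_w = sum(2.0 * e * x for e, (x, _) in zip(errors, batch)) / len(batch)
        grad_b = sum(2.0 * e for e in errors) / len(batch)
        # 4. Weight update via gradient descent.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    # Track the loss over the full data set once per epoch.
    epoch_losses.append(sum((w * x + b - y) ** 2 for x, y in data) / len(data))

print(w, b)
print(epoch_losses[0], epoch_losses[-1])
```

The recorded loss falls from the first epoch to the last, and `w` and `b` end up near 2 and 1, the values used to generate the data.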
Current practical models (it is important to check that they are supported by Ollama)
https://github.com/ollama/ollama/blob/main/docs/gpu.md
PRACTICAL NOTES ON MODELS:
tensors
weight and bias « When defining a model, weights and biases are generically called parameters.
embedding (Embeddings capture the inherent properties and relationships of the original data in a condensed format and are often used in Machine Learning use cases. See Link) « Better classification
Mixture of Experts (MoE): a neural network design that activates only a few specialised sub-models (experts) per input, based on a gating mechanism. This allows models to scale to massive sizes efficiently, improving performance while reducing compute costs by avoiding the need to use the entire model every time.
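The gating idea in the last note (usually referred to as a mixture-of-experts design) can be sketched in a few lines. Everything here is hypothetical: the four "experts" are trivial functions, the gate is a fixed set of linear scores, and top-2 routing is an assumed choice. Real implementations use learned networks for both.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical experts: stand-ins for specialised sub-models.
experts = [
    lambda x: 2.0 * x,
    lambda x: x + 10.0,
    lambda x: -x,
    lambda x: x * x,
]

def gate_scores(x):
    # Hypothetical gate: one fixed linear score per expert.
    gate_weights = [1.0, -1.0, 0.5, 0.0]
    return [gw * x for gw in gate_weights]

def moe_forward(x, top_k=2):
    scores = gate_scores(x)
    # Keep only the top_k highest-scoring experts; the others are never run.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    weights = softmax([scores[i] for i in chosen])
    output = sum(wt * experts[i](x) for wt, i in zip(weights, chosen))
    return output, chosen

output, used = moe_forward(2.0)
print(output, used)
```

Only two of the four experts are evaluated for this input, which is where the compute saving comes from.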