network_stuff:machine_learning

Differences

This shows you the differences between two versions of the page.

network_stuff:machine_learning [2025/04/28 20:52] jotasandoku
network_stuff:machine_learning [2025/05/12 18:52] (current) jotasandoku
Line 1: Line 1:
 +[[https://camarreal.duckdns.org/doku.php?id=network_stuff:machine_learning|ML]]  ;  [[https://camarreal.duckdns.org/doku.php?id=network_stuff:machine_learning:networking|network-for-ML-workload]]
 +
 __NOTES ABOUT MACHINE LEARNING AND ARTIFICIAL INTELLIGENCE AI__
 \\
Line 71: Line 73:
 ====== Classical Training Steps for Neural Networks ======
  
-Training a neural network involves four key steps:
 +Training a neural network is a structured, iterative process that allows the model to learn patterns and relationships in data. The goal is to optimise the model’s internal parameters (weights and biases) so it can make accurate predictions. This process consists of several key steps that repeat over many cycles (epochs), each refining the model’s understanding. The major steps are **forward propagation**, **loss calculation**, **backpropagation with gradient calculation**, and **weight updates via gradient descent**. Below is a detailed breakdown of each step:
  
 ===== 1. Forward Propagation =====
-Input data passes through each layer of the network, applying weights, biases, and activation functions, to produce a prediction at the output layer.
 +In forward propagation, input data is passed through the layers of the neural network to produce an output or prediction.
 + 
 +The process begins at the **input layer**, where raw data (such as numerical features, pixel values, or word embeddings) enters the network. This data is then processed through one or more **hidden layers**, each consisting of multiple **neurons**.
 + 
 +Each neuron in a layer computes a **weighted sum** of the inputs it receives from the previous layer, adds a **bias term**, and then applies an **activation function** (such as ReLU, sigmoid, or tanh) to introduce non-linearity. This non-linear transformation allows the network to model complex patterns and relationships in the data, which would not be possible with simple linear transformations.
 + 
 +The transformed data continues to propagate through the layers until it reaches the **output layer**, which produces the final prediction. In a classification task, the output might be a set of probabilities indicating the likelihood of each class, while in a regression task, it could be a continuous value.
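
The forward pass described above can be sketched in plain Python. This is a minimal illustration, not the page author's code: the layer sizes, the weight values (`W1`, `b1`, `W2`, `b2`) and the helper names (`dense`, `softmax`, `forward`) are all assumptions chosen for the example, and a softmax output is assumed for the classification case.

```python
import math

def relu(z):
    """ReLU activation: passes positive values, zeroes out negatives."""
    return max(0.0, z)

def dense(inputs, weights, biases, activation):
    """One layer: each neuron takes a weighted sum of the inputs,
    adds its bias, then applies the activation function."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical network: 3 inputs -> 2 hidden neurons (ReLU) -> 2 classes
W1 = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0], [-0.5, 0.5]]
b2 = [0.0, 0.0]

def forward(x):
    hidden = dense(x, W1, b1, relu)              # hidden layer
    scores = dense(hidden, W2, b2, lambda z: z)  # output layer, no activation
    return softmax(scores)                       # class probabilities

probs = forward([1.0, 2.0, 3.0])
print(probs)
```

For a regression task the final `softmax` would be dropped and the single output score used directly.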
  
 ===== 2. Loss Calculation =====
-The model’s prediction is compared to the actual label using a loss function (e.g., cross-entropy for classification). This produces a scalar loss value representing the prediction error.
 +After forward propagation, the network has produced an output, but it still needs to know how accurate that output is. This is done through **loss calculation**, where the prediction is compared to the **actual target value** (also known as the ground truth or label).
  
-===== 3. Backpropagation Gradient Calculation =====
-Using backpropagation, gradients of the loss are calculated with respect to each weight and bias. The chain rule is applied layer by layer, from output back to input, determining how much each parameter contributed to the loss.
 +A **loss function** is used to quantify the difference between the predicted output and the true value. The choice of loss function depends on the type of problem:
 +  * For **regression tasks**, common loss functions include **Mean Squared Error (MSE)**, which penalises larger errors more heavily.
 +  * For **classification tasks**, **Cross-Entropy Loss** is commonly used, which measures the difference between two probability distributions (the predicted probabilities and the true labels). 
 + 
 +The loss function outputs a **scalar value** that represents how far off the model’s prediction was. A lower loss indicates better performance for that sample, while a higher loss signals a poor prediction. 
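
As a rough illustration of the two loss functions named above, the sketch below implements MSE and cross-entropy for a single sample. The function names and the example numbers are assumptions made for this example; the cross-entropy version assumes the true label is a single class index.

```python
import math

def mse(predictions, targets):
    """Mean Squared Error: average of squared differences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def cross_entropy(probs, true_class, eps=1e-12):
    """Cross-entropy for one sample: negative log of the probability
    the model assigned to the correct class (eps guards against log(0))."""
    return -math.log(max(probs[true_class], eps))

# Regression: squaring makes the larger error cost disproportionately more
mse_small = mse([2.5], [3.0])   # error of 0.5
mse_large = mse([1.0], [3.0])   # error of 2.0

# Classification: true class is index 1
ce_good = cross_entropy([0.1, 0.9], 1)  # confident and correct
ce_bad = cross_entropy([0.9, 0.1], 1)   # confident and wrong

print(mse_small, mse_large, ce_good, ce_bad)
```

An error four times larger produces an MSE sixteen times larger, which is what "penalises larger errors more heavily" means in practice.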
 + 
 +===== 3. Backpropagation and Gradient Calculation ===== 
 +Once the loss is calculated, the network must determine how to adjust its **parameters** (weights and biases) to reduce this error in future predictions. This adjustment process requires knowing how sensitive the loss is to each parameter. This is achieved through **backpropagation**.
 + 
 +**Backpropagation** is an algorithm that computes the **gradients** of the loss function with respect to each parameter in the network. It applies the **chain rule of calculus** to systematically calculate these gradients by moving backwards through the network, from the **output layer** to the **input layer**.
 + 
 +For each neuron and its parameters: 
 +  * It calculates how much a small change in that parameter would affect the loss.
 +  * This gradient tells the network whether increasing or decreasing that parameter would reduce the loss. 
 + 
 +The result of backpropagation is a collection of gradients for all weights and biases, which provide the direction and magnitude of change needed to improve the model’s performance.
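
To make the chain rule concrete, here is a minimal sketch for a deliberately tiny network: one input, one tanh hidden neuron, one linear output, and a squared-error loss. The network shape, parameter values and function names are assumptions for illustration. The analytic gradients are compared against a finite-difference estimate, a common sanity check.

```python
import math

def forward(params, x):
    """Tiny network: x -> tanh(w1*x + b1) -> w2*h + b2."""
    w1, b1, w2, b2 = params
    z = w1 * x + b1
    h = math.tanh(z)
    y_hat = w2 * h + b2
    return h, y_hat

def loss_fn(params, x, y):
    _, y_hat = forward(params, x)
    return (y_hat - y) ** 2

def backward(params, x, y):
    """Apply the chain rule from the output back towards the input."""
    w1, b1, w2, b2 = params
    h, y_hat = forward(params, x)
    dL_dy = 2.0 * (y_hat - y)        # loss w.r.t. the prediction
    dL_dw2 = dL_dy * h               # output-layer weight
    dL_db2 = dL_dy                   # output-layer bias
    dL_dh = dL_dy * w2               # pass the error back to the hidden neuron
    dL_dz = dL_dh * (1.0 - h ** 2)   # derivative of tanh is 1 - tanh^2
    dL_dw1 = dL_dz * x               # hidden-layer weight
    dL_db1 = dL_dz                   # hidden-layer bias
    return [dL_dw1, dL_db1, dL_dw2, dL_db2]

def numerical_gradient(params, x, y, eps=1e-6):
    """Central finite differences: nudge each parameter and watch the loss."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((loss_fn(up, x, y) - loss_fn(down, x, y)) / (2 * eps))
    return grads

params = [0.5, -0.1, 0.8, 0.2]   # hypothetical w1, b1, w2, b2
analytic = backward(params, 1.5, 1.0)
numeric = numerical_gradient(params, 1.5, 1.0)
print(analytic)
print(numeric)
```

The sign of each gradient says whether increasing that parameter raises or lowers the loss; the magnitude says how strongly.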
  
 ===== 4. Weight Update (Gradient Descent) =====
-Gradients are used to adjust weights and biases in the direction that reduces the loss. This is done via gradient descent or variants like Adam, guided by the learning rate.
 +With the gradients calculated, the network updates its parameters to reduce the loss. This is done using an **optimisation algorithm**, most commonly **gradient descent**.
 + 
 +In **gradient descent**, each parameter is updated by moving it slightly in the **opposite direction of its gradient** (because the gradient points in the direction of steepest increase in loss). The **learning rate** is a hyperparameter that determines how large these updates are; if it's too large, the model might overshoot the optimal values, while if it's too small, training might be very slow.
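
The update rule and the effect of the learning rate can be shown on a one-parameter toy problem. The loss `(w - 3)^2`, the starting point and the three learning rates below are assumptions picked for the example, not values from the page.

```python
def gradient_descent(w, learning_rate, steps):
    """Minimise the toy loss L(w) = (w - 3)^2, whose minimum is at w = 3."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)        # dL/dw
        w = w - learning_rate * grad  # step against the gradient
    return w

w_good = gradient_descent(0.0, 0.1, 50)      # reasonable rate: ends close to 3
w_slow = gradient_descent(0.0, 0.001, 50)    # too small: barely moves in 50 steps
w_diverged = gradient_descent(0.0, 1.1, 50)  # too large: overshoots further each step

print(w_good, w_slow, w_diverged)
```

With the largest rate every step lands further from the minimum than the previous one, which is the overshoot described above.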
 + 
 +There are several variations of gradient descent: 
 +  * **Stochastic Gradient Descent (SGD)**: updates parameters using one data point at a time. 
 +  * **Mini-Batch Gradient Descent**: updates parameters using small batches of data. 
 +  * **Adam Optimiser**: an adaptive learning rate method that adjusts the learning rate for each parameter individually based on past gradients. 
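
The variants listed above differ mainly in how much data feeds each update and in how the step size is chosen. The sketch below shows both ideas under stated assumptions: `batches` and `adam_step` are hypothetical helper names, the hyperparameter defaults are the commonly used ones for Adam, and the gradients are made-up numbers.

```python
import math

def batches(data, batch_size):
    """Split data into consecutive chunks. batch_size=1 corresponds to
    per-sample SGD, larger sizes to mini-batch gradient descent."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

def adam_step(params, grads, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. `state` keeps running averages of the gradients (m)
    and of the squared gradients (v) for each parameter."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)  # bias correction
        v_hat = state["v"][i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params

data = list(range(10))
sgd_batches = list(batches(data, 1))   # ten updates per pass over the data
mini_batches = list(batches(data, 4))  # three updates per pass

# Two parameters whose gradients differ by four orders of magnitude
params = [1.0, 1.0]
state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
updated = adam_step(params, [100.0, 0.01], state, lr=0.1)
print(updated)
```

Both parameters move by roughly the same amount despite the very different gradient sizes, because Adam divides each step by that parameter's own gradient scale.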
 + 
 +This weight update step allows the network to improve its predictions over time. After the update, the training loop returns to forward propagation with the next data sample or batch, repeating the process over many iterations. 
 + 
 +===== Summary ===== 
 +These steps—**forward propagation**, **loss calculation**, **backpropagation with gradient calculation**, and **weight updates**—are repeated across thousands or millions of data samples over multiple **epochs**. This iterative process gradually fine-tunes the network’s parameters, allowing it to learn complex patterns and make accurate predictions. 
 + 
 +As training progresses, the loss typically decreases, indicating that the model’s predictions are improving. Once the model reaches an acceptable level of performance, training can be stopped, and the network is ready for inference on unseen data.
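
Putting the four steps together, a complete training loop might look like the sketch below. It fits a one-weight, one-bias linear model to made-up data following `y = 2x + 1`; the data, learning rate, epoch cap and stopping threshold are all assumptions for illustration.

```python
# Hypothetical noise-free training data following y = 2x + 1
data = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

w, b = 0.0, 0.0        # parameters start untrained
learning_rate = 0.02
target_loss = 1e-4     # assumed "acceptable level of performance"
history = []           # average loss per epoch

for epoch in range(5000):
    epoch_loss = 0.0
    for x, y in data:
        y_hat = w * x + b                # 1. forward propagation
        loss = (y_hat - y) ** 2          # 2. loss calculation
        grad_w = 2.0 * (y_hat - y) * x   # 3. gradients via the chain rule
        grad_b = 2.0 * (y_hat - y)
        w -= learning_rate * grad_w      # 4. weight update
        b -= learning_rate * grad_b
        epoch_loss += loss
    history.append(epoch_loss / len(data))
    if history[-1] < target_loss:        # stop once performance is acceptable
        break

prediction = w * 10.0 + b                # inference on an unseen input
print(len(history), history[0], history[-1], prediction)
```

Plotting `history` would show the falling loss curve the summary describes; the final line applies the trained parameters to an input that was not in the training data.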
  
-These steps repeat over many data samples, improving the network’s performance through each iteration. 
  
 ----
network_stuff/machine_learning.1745873559.txt.gz · Last modified: by jotasandoku