__NOTES ABOUT MACHINE LEARNING AND ARTIFICIAL INTELLIGENCE AI__
\\
====== Classical Training Steps for Neural Networks ======
Training a neural network involves repeating a cycle of four main steps: **forward propagation**, **loss calculation**, **backpropagation**, and **weight update (gradient descent)**.
===== 1. Forward Propagation =====
In forward propagation, input data passes through the network layer by layer to produce the model’s output prediction.

The process begins at the **input layer**, where raw data (such as numerical features, pixel values, or word embeddings) enters the network. This data is then processed through one or more **hidden layers**, each consisting of multiple **neurons**.

Each neuron in a layer applies a **weighted sum** to the inputs it receives from the previous layer, adds a **bias term**, and then applies an **activation function** (such as ReLU or sigmoid) to introduce non-linearity.

The transformed data continues to propagate through the layers until it reaches the **output layer**, which produces the final prediction. In a classification task, the output might be a set of probabilities indicating the likelihood of each class, while in a regression task, it could be a continuous value.
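
As a rough sketch of this process, the snippet below pushes one sample through a toy two-layer network with NumPy; the layer sizes, weight values, and activation choices are illustrative assumptions, not taken from these notes.

<code python>
import numpy as np

def relu(z):
    # Activation function: keeps positive values, zeroes out negatives
    return np.maximum(0.0, z)

# Toy network: 3 input features -> 4 hidden neurons -> 2 output classes
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer weights and biases
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # output layer weights and biases

x = np.array([0.5, -1.2, 3.0])                  # one raw input sample

h = relu(W1 @ x + b1)                           # hidden layer: weighted sum + bias, then activation
logits = W2 @ h + b2                            # output layer: weighted sum + bias
probs = np.exp(logits) / np.exp(logits).sum()   # softmax turns scores into class probabilities

print(probs)                                    # two probabilities that sum to 1
</code>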
===== 2. Loss Calculation =====
After forward propagation, the network’s prediction is compared against the true (expected) value to measure how well the model performed on that sample.

A **loss function** is used to quantify the difference between the predicted output and the true value. The choice of loss function depends on the type of problem:
  * For **regression tasks**, common loss functions include **Mean Squared Error (MSE)**, which penalises larger errors more heavily.
  * For **classification tasks**, **Cross-Entropy Loss** is commonly used, which measures the difference between two probability distributions (the predicted probabilities and the true labels).

The loss function outputs a **scalar value** that represents how far off the model’s prediction was. A lower loss indicates better performance for that sample, while a higher loss signals a poor prediction.
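
As a minimal illustration of both loss functions, the predictions and labels below are made-up values for the example:

<code python>
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average of the squared differences (regression)
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_true, y_pred, eps=1e-12):
    # Cross-Entropy: compares one-hot true labels with predicted probabilities (classification)
    y_pred = np.clip(y_pred, eps, 1.0)          # avoid log(0)
    return -np.sum(y_true * np.log(y_pred))

# Regression: predicted 2.5 when the true value was 3.0 -> loss 0.25
print(mse(np.array([3.0]), np.array([2.5])))

# Classification: true class is index 0, the model gave it probability 0.7 -> loss ~0.36
print(cross_entropy(np.array([1.0, 0.0, 0.0]), np.array([0.7, 0.2, 0.1])))
</code>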
| + | |||
| + | ===== 3. Backpropagation | ||
Once the loss is calculated, the network must determine how to adjust its **parameters** (weights and biases) to reduce this error in future predictions. This adjustment process requires knowing how sensitive the loss is to each parameter. This is achieved through **backpropagation**.

**Backpropagation** is an algorithm that computes the **gradients** of the loss function with respect to every weight and bias in the network, working backwards from the output layer towards the input layer and applying the chain rule at each step.

For each neuron and its parameters:
  * It calculates the partial derivative of the loss with respect to that parameter (its gradient).
  * This gradient tells the network whether increasing or decreasing that parameter would reduce the loss.

The result of backpropagation is a collection of gradients for all weights and biases, which provide the direction and magnitude of change needed to improve the model’s performance.
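
To make the chain rule concrete, here is a hand-worked backward pass for a single sigmoid neuron with a squared-error loss; the input, weight, and target values are arbitrary examples:

<code python>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0])    # inputs to the neuron
w = np.array([0.3, 0.8])     # current weights
b = 0.1                      # current bias
y_true = 1.0                 # target value

# Forward pass
z = w @ x + b                # weighted sum + bias
y_pred = sigmoid(z)          # activation
loss = (y_pred - y_true) ** 2

# Backward pass (chain rule): dL/dw = dL/dy * dy/dz * dz/dw
dL_dy = 2 * (y_pred - y_true)     # derivative of the squared error
dy_dz = y_pred * (1 - y_pred)     # derivative of the sigmoid
dL_dw = dL_dy * dy_dz * x         # gradient for each weight (dz/dw = x)
dL_db = dL_dy * dy_dz             # gradient for the bias (dz/db = 1)

print(dL_dw, dL_db)               # the signs say whether to increase or decrease each parameter
</code>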
===== 4. Weight Update (Gradient Descent) =====
With the gradients calculated, the network updates its parameters, adjusting weights and biases in the direction that reduces the loss. This is typically done with **gradient descent** or one of its variants.

In **gradient descent**, each parameter is updated by moving it a small step in the direction opposite to its gradient; the size of that step is controlled by the **learning rate**.

There are several variations of gradient descent:
  * **Stochastic Gradient Descent (SGD)**: updates parameters using one data point at a time.
  * **Mini-Batch Gradient Descent**: updates parameters using small batches of data.
  * **Adam Optimiser**: an adaptive method that combines momentum with per-parameter step sizes, often converging faster in practice.
| + | |||
| + | This weight update step allows the network to improve its predictions over time. After the update, the training loop returns to forward propagation with the next data sample or batch, repeating the process over many iterations. | ||
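
The sketch below shows the core update rule; the gradient values are made-up stand-ins for what backpropagation would return, and the learning rate is an arbitrary choice:

<code python>
import numpy as np

learning_rate = 0.01

def gradient_descent_step(params, grads, lr=learning_rate):
    # Move each parameter a small step against its gradient
    return [p - lr * g for p, g in zip(params, grads)]

# One update for a single neuron's weights and bias
w = np.array([0.3, 0.8])
b = np.array([0.1])
grad_w = np.array([-0.05, 0.12])   # gradients from backpropagation (illustrative values)
grad_b = np.array([0.02])

w, b = gradient_descent_step([w, b], [grad_w, grad_b])
print(w, b)   # w -> [0.3005, 0.7988], b -> [0.0998]: each parameter moved opposite to its gradient
</code>

SGD, mini-batch gradient descent, and Adam all build on this same rule; they differ in how many samples contribute to each gradient and in how the step size is adapted.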
| + | |||
| + | ===== Summary ===== | ||
These steps (**forward propagation**, **loss calculation**, **backpropagation**, and **weight update**) make up one training iteration. They repeat over many data samples and many passes through the dataset, improving the network’s performance with each iteration.

As training progresses, the loss typically decreases, indicating that the model’s predictions are improving. Once the model reaches an acceptable level of performance, training can be stopped and the model evaluated on unseen data.
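
Putting the four steps together, here is a compact end-to-end training loop for a single linear neuron with an MSE loss; the synthetic data, learning rate, and epoch count are illustrative assumptions:

<code python>
import numpy as np

# Synthetic regression data: y depends linearly on two features (illustrative only)
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))               # 100 samples, 2 features
y = X @ np.array([2.0, -3.0]) + 0.5         # true underlying relationship

w, b = np.zeros(2), 0.0                     # parameters to learn
lr = 0.1                                    # learning rate

for epoch in range(200):
    y_pred = X @ w + b                      # 1. forward propagation
    err = y_pred - y
    loss = np.mean(err ** 2)                # 2. loss calculation (MSE)
    grad_w = 2 * X.T @ err / len(X)         # 3. backpropagation (gradients of the loss)
    grad_b = 2 * err.mean()
    w -= lr * grad_w                        # 4. weight update (gradient descent)
    b -= lr * grad_b

print(w, b, loss)   # w approaches [2, -3], b approaches 0.5, and the loss shrinks towards 0
</code>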
----