NOTES ABOUT MACHINE LEARNING AND ARTIFICIAL INTELLIGENCE (AI)

Notes:
Vectors and matrices are the basic building blocks of machine learning.

  • Supervised learning: tagging. http://stanford.io/2nRlxxp
    • Training on labelled (tagged) data so the model can predict future events. Example: train a Raspberry Pi so it can recognise bird images captured with the camera.
  • Semi-supervised / reinforcement learning:
    • It does not require labelled training data; it learns by trial and error instead.
  • Unsupervised learning: discovering patterns in unlabelled data.
    • It is all about clustering data and inferring relationships.
    • k-Means clustering (see the sketch below)
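
A minimal k-Means sketch with scikit-learn; the sample points and the choice of two clusters are made up for illustration:

  # Minimal k-Means clustering sketch: group unlabelled points, no labels needed.
  import numpy as np
  from sklearn.cluster import KMeans

  # Two loose groups of 2-D points, around (0, 0) and (5, 5).
  points = np.array([
      [0.1, 0.2], [0.3, -0.1], [-0.2, 0.0],
      [5.1, 4.9], [4.8, 5.2], [5.0, 5.1],
  ])

  kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
  print(kmeans.labels_)           # cluster assignment per point, e.g. [0 0 0 1 1 1]
  print(kmeans.cluster_centers_)  # the two inferred centroids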

  • Deep learning (i.e., neural networks) http://stanford.io/2BsQ91Q
    • Layers: input, hidden, output. There is also a bias input that shifts ("pokes") the hidden layers; see the sketch below.
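
A minimal NumPy sketch of a forward pass through one hidden layer; the layer sizes and random weights are made up for illustration:

  # Forward pass: input -> hidden (with bias and ReLU) -> output.
  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.normal(size=3)                                 # input layer: 3 features

  W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # input -> hidden (4 units)
  W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # hidden -> output (2 units)

  hidden = np.maximum(0, W1 @ x + b1)  # the bias b1 shifts ("pokes") the hidden layer
  output = W2 @ hidden + b2
  print(output)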


  • Reinforcement learning: beyond self-supervision. TODO


  • Besides training a model from scratch, transfer learning lets us reuse existing pre-trained models.


  • Model complexity trade-off (see the sketch below):
    • Too low: high bias (underfitting; e.g., a flat line).
    • Too high: high variance (overfitting; the model adjusts to the training data too closely, which is not good either).
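
A small sketch of the trade-off, fitting the same noisy points with a low- and a high-degree polynomial; the data and degrees are made up for illustration:

  # Low degree underfits (high bias); high degree fits the noise (high variance).
  import numpy as np

  rng = np.random.default_rng(0)
  x = np.linspace(0, 1, 10)
  y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

  for degree in (1, 7):
      coeffs = np.polyfit(x, y, degree)
      train_error = np.mean((np.polyval(coeffs, x) - y) ** 2)
      print(f"degree {degree}: training error {train_error:.4f}")
  # The degree-7 fit has a much lower training error but wiggles wildly
  # between points: low training error alone does not mean it generalises.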




  • Manage datasets with pandas and scikit-learn (see the sketch below).
  • Convolution studies how one shape is modified by another.
  • A typical CNN stacks layers: conv → ReLU → conv → ReLU → … (see the training-loop sketch in the next section).
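
A sketch of managing a dataset with pandas and feeding it to scikit-learn; the column names and values are made up for illustration:

  # Build a labelled table with pandas, split it, and train a classifier.
  import pandas as pd
  from sklearn.model_selection import train_test_split
  from sklearn.linear_model import LogisticRegression

  df = pd.DataFrame({
      "wing_span": [0.20, 0.25, 1.10, 1.30, 0.22, 1.20],
      "weight":    [0.05, 0.06, 2.00, 2.40, 0.06, 2.20],
      "is_raptor": [0, 0, 1, 1, 0, 1],   # the label ("tag") column
  })

  X, y = df[["wing_span", "weight"]], df["is_raptor"]
  X_train, X_test, y_train, y_test = train_test_split(
      X, y, test_size=0.33, random_state=0)

  model = LogisticRegression().fit(X_train, y_train)
  print(model.score(X_test, y_test))     # accuracy on the held-out rows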

Classical Training Steps for Neural Networks

Training a neural network involves four key steps:

1. Forward Propagation

Input data passes through each layer of the network, applying weights, biases, and activation functions to produce a prediction at the output layer.

2. Loss Calculation

The model’s prediction is compared to the actual label using a loss function (e.g., cross-entropy for classification). This produces a scalar loss value representing the prediction error.

3. Backpropagation & Gradient Calculation

Using backpropagation, gradients of the loss are calculated with respect to each weight and bias. The chain rule is applied layer by layer, from output back to input, determining how much each parameter contributed to the loss.

4. Weight Update (Gradient Descent)

Gradients are used to adjust weights and biases in the direction that reduces the loss. This is done via gradient descent or variants like Adam, guided by the learning rate.

These steps repeat over many data samples, improving the network’s performance through each iteration.
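
A minimal PyTorch sketch of the four steps, using a tiny conv → ReLU → conv → ReLU stack like the one noted earlier; the shapes, fake data, and hyperparameters are made up for illustration:

  # One loop iteration = forward pass, loss, backprop, weight update.
  import torch
  import torch.nn as nn

  model = nn.Sequential(
      nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
      nn.Conv2d(8, 8, kernel_size=3, padding=1), nn.ReLU(),
      nn.Flatten(),
      nn.Linear(8 * 8 * 8, 10),               # output layer: 10 classes
  )
  loss_fn = nn.CrossEntropyLoss()             # loss function for classification
  optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

  images = torch.randn(4, 1, 8, 8)            # fake batch: four 8x8 grayscale images
  labels = torch.randint(0, 10, (4,))         # fake labels

  for step in range(10):
      logits = model(images)                  # 1. forward propagation
      loss = loss_fn(logits, labels)          # 2. loss calculation
      optimizer.zero_grad()
      loss.backward()                         # 3. backpropagation (chain rule)
      optimizer.step()                        # 4. weight update (Adam)
      print(step, loss.item())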


AI HARDWARE - GPUs

  • AMD Instinct MI series
  • Amazon's Inferentia (for machine learning inference on AWS)
  • Google's TPUs (Tensor Processing Units, custom hardware for Google’s machine learning tasks)
  • Intel Gaudi (designed for deep learning training)
  • NVIDIA GPUs (e.g., A100, H100, used for training and inference in deep learning applications)
  • NVIDIA Tensor Cores (hardware feature within NVIDIA GPUs, optimized for mixed-precision AI workloads)

Current practical GPU models (it is important to check that Ollama supports them):
https://github.com/ollama/ollama/blob/main/docs/gpu.md

  • Nvidia H100
  • 48 GB Nvidia RTX 6000 Ada graphics card

  • Attention mechanism (a formula that makes it easier to train models; see the sketch below).
  • Transformer architecture (introduced by Google researchers in the paper “Attention Is All You Need”; Hugging Face built the popular transformers library around it).
    • Transformers are built around the attention mechanism.
      • The library's precursors were pytorch-pretrained-bert and pytorch-transformers.
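
A NumPy sketch of scaled dot-product attention, the core Transformer formula softmax(QKᵀ/√d_k)·V; the matrix sizes and random values are made up for illustration:

  # Each token's output is a weighted mix of all the value vectors.
  import numpy as np

  def softmax(x, axis=-1):
      e = np.exp(x - x.max(axis=axis, keepdims=True))
      return e / e.sum(axis=axis, keepdims=True)

  rng = np.random.default_rng(0)
  seq_len, d_k = 5, 8                         # 5 tokens, 8-dimensional keys
  Q = rng.normal(size=(seq_len, d_k))         # queries
  K = rng.normal(size=(seq_len, d_k))         # keys
  V = rng.normal(size=(seq_len, d_k))         # values

  scores = Q @ K.T / np.sqrt(d_k)             # how much each token attends to the others
  weights = softmax(scores, axis=-1)          # each row sums to 1
  output = weights @ V
  print(output.shape)                         # (5, 8)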

PRACTICAL NOTES ON MODELS:

  • Models multiply matrices.
  • Those matrices are multi-dimensional: tensors.
    • Tensors hold weights and biases. When defining a model, weights and biases are generically called parameters.
    • E.g., a 100B model has 100 billion parameters (all of the tensors' weights and biases added together).
  • The HF transformers library is distinct from the Transformer architecture. HF's library is a framework for loading, training, fine-tuning, and deploying transformer models across NLP and vision tasks. It provides access to thousands of pretrained models, simplifies workflows with task-specific pipelines, and supports custom training on new datasets. Beyond downloading models, it enables production-ready deployment with optimizations for diverse hardware (see the sketch below).
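
A short sketch of the library: load a pretrained model through a task pipeline and count its parameters. Assumes pip install transformers torch; the default model the pipeline downloads may vary:

  # Task pipeline: the library picks and downloads a pretrained model for us.
  from transformers import pipeline

  classifier = pipeline("sentiment-analysis")
  print(classifier("Machine learning notes are fun."))

  # "Parameters" = all of the model's weights and biases, counted together.
  n_params = sum(p.numel() for p in classifier.model.parameters())
  print(f"{n_params / 1e6:.1f}M parameters")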

HUGGINGFACE

  • Models, datasets and prototypes.
  • Open-source and open-weight.
  • We can download a pre-trained Llama via Ollama and then fine-tune it.
    • One reason is so the model identifies patterns better (text, images, …). A related concept is the embedding: embeddings capture the inherent properties and relationships of the original data in a condensed format and are often used in machine-learning use cases, e.g., for better classification.
      • Embedding: phrases in, vectors out (see the sketch below).
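
A sketch of "phrases in, vectors out" using the sentence-transformers library (models hosted on Hugging Face). The model name is one common choice, and pip install sentence-transformers is assumed:

  # Encode phrases into dense vectors that capture their meaning.
  from sentence_transformers import SentenceTransformer

  model = SentenceTransformer("all-MiniLM-L6-v2")
  vectors = model.encode(["a small bird on a branch", "a raspberry pi camera"])
  print(vectors.shape)   # (2, 384): one 384-dimensional vector per phrase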

Mixture of Experts (MoE)

A Mixture of Experts is a neural-network design that activates only a few specialised sub-models (experts) per input, selected by a gating mechanism. This lets models scale to massive sizes efficiently, improving performance while reducing compute cost by avoiding running the entire model on every input. See the sketch below.
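
A minimal PyTorch sketch of the routing idea: a gate scores all experts but only the top-k actually run per input. The sizes and the top-k choice are made up for illustration:

  # The gate picks 2 of 4 experts; the unselected experts are skipped entirely.
  import torch
  import torch.nn as nn

  d_model, n_experts, top_k = 16, 4, 2
  experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
  gate = nn.Linear(d_model, n_experts)        # the gating mechanism

  x = torch.randn(d_model)                    # one input token
  scores = torch.softmax(gate(x), dim=-1)
  top_scores, top_idx = scores.topk(top_k)    # keep only the k best experts

  output = sum(w * experts[int(i)](x) for w, i in zip(top_scores, top_idx))
  print(output.shape)                         # torch.Size([16])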
