NOTES ABOUT MACHINE LEARNING AND ARTIFICIAL INTELLIGENCE (AI)

Notes:
Vectors and matrices are the basic building blocks of machine learning.

  • Supervised learning: tagging. http://stanford.io/2nRlxxp
    • Training on labelled (tagged) data so the model can predict future events. Example: train a Raspberry Pi so it can recognise bird images captured with the camera.
  • Semi-supervised / reinforcement learning:
    • It does not require labelled training data; it learns by trial and error instead.
  • Unsupervised learning: discovering patterns in unlabelled data.
    • It is all about clustering data and inferring relationships.
    • k-Means clustering (see the sketch below)
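
A minimal k-Means sketch with scikit-learn; the sample points and the choice of two clusters are made up for illustration:

  # Minimal k-Means clustering sketch: group unlabelled points, no labels needed.
  import numpy as np
  from sklearn.cluster import KMeans

  # Two loose groups of 2-D points, around (0, 0) and (5, 5).
  points = np.array([
      [0.1, 0.2], [0.3, -0.1], [-0.2, 0.0],
      [5.1, 4.9], [4.8, 5.2], [5.0, 5.1],
  ])

  kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
  print(kmeans.labels_)           # cluster assignment per point, e.g. [0 0 0 1 1 1]
  print(kmeans.cluster_centers_)  # the two inferred centroids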

  • Deep learning (i.e., neural networks) http://stanford.io/2BsQ91Q
    • Layers: input, hidden, output. There is also a bias input that shifts ("pokes") the hidden layers; see the sketch below.
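
A minimal NumPy sketch of a forward pass through one hidden layer; the layer sizes and random weights are made up for illustration:

  # Forward pass: input -> hidden (with bias and ReLU) -> output.
  import numpy as np

  rng = np.random.default_rng(0)
  x = rng.normal(size=3)                                 # input layer: 3 features

  W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # input -> hidden (4 units)
  W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # hidden -> output (2 units)

  hidden = np.maximum(0, W1 @ x + b1)  # the bias b1 shifts ("pokes") the hidden layer
  output = W2 @ hidden + b2
  print(output)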


  • Reinforcement learning: beyond self-supervision. TODO


  • Besides training a model from scratch, transfer learning lets us reuse existing pre-trained models.


  • Model complexity trade-off (see the sketch below):
    • Too low: high bias (underfitting; e.g., a flat line).
    • Too high: high variance (overfitting; the model adjusts to the training data too closely, which is not good either).
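
A small sketch of the trade-off, fitting the same noisy points with a low- and a high-degree polynomial; the data and degrees are made up for illustration:

  # Low degree underfits (high bias); high degree fits the noise (high variance).
  import numpy as np

  rng = np.random.default_rng(0)
  x = np.linspace(0, 1, 10)
  y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

  for degree in (1, 7):
      coeffs = np.polyfit(x, y, degree)
      train_error = np.mean((np.polyval(coeffs, x) - y) ** 2)
      print(f"degree {degree}: training error {train_error:.4f}")
  # The degree-7 fit has a much lower training error but wiggles wildly
  # between points: low training error alone does not mean it generalises.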




  • Manage datasets with pandas and scikit-learn (see the sketch below).
  • Convolution studies how one shape is modified by another.
  • A typical CNN stacks layers: conv → ReLU → conv → ReLU → … (see the training-loop sketch in the next section).
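
A sketch of managing a dataset with pandas and feeding it to scikit-learn; the column names and values are made up for illustration:

  # Build a labelled table with pandas, split it, and train a classifier.
  import pandas as pd
  from sklearn.model_selection import train_test_split
  from sklearn.linear_model import LogisticRegression

  df = pd.DataFrame({
      "wing_span": [0.20, 0.25, 1.10, 1.30, 0.22, 1.20],
      "weight":    [0.05, 0.06, 2.00, 2.40, 0.06, 2.20],
      "is_raptor": [0, 0, 1, 1, 0, 1],   # the label ("tag") column
  })

  X, y = df[["wing_span", "weight"]], df["is_raptor"]
  X_train, X_test, y_train, y_test = train_test_split(
      X, y, test_size=0.33, random_state=0)

  model = LogisticRegression().fit(X_train, y_train)
  print(model.score(X_test, y_test))     # accuracy on the held-out rows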

Classical Training Steps for Neural Networks

Training a neural network involves four key steps:

1. Forward Propagation

Input data passes through each layer of the network, applying weights, biases, and activation functions to produce a prediction at the output layer.

2. Loss Calculation

The model’s prediction is compared to the actual label using a loss function (e.g., cross-entropy for classification). This produces a scalar loss value representing the prediction error.

3. Backpropagation & Gradient Calculation

Using backpropagation, gradients of the loss are calculated with respect to each weight and bias. The chain rule is applied layer by layer, from output back to input, determining how much each parameter contributed to the loss.

4. Weight Update (Gradient Descent)

Gradients are used to adjust weights and biases in the direction that reduces the loss. This is done via gradient descent or variants like Adam, guided by the learning rate.

These steps repeat over many data samples, improving the network’s performance through each iteration.
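
A minimal PyTorch sketch of the four steps, using a tiny conv → ReLU → conv → ReLU stack like the one noted earlier; the shapes, fake data, and hyperparameters are made up for illustration:

  # One loop iteration = forward pass, loss, backprop, weight update.
  import torch
  import torch.nn as nn

  model = nn.Sequential(
      nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
      nn.Conv2d(8, 8, kernel_size=3, padding=1), nn.ReLU(),
      nn.Flatten(),
      nn.Linear(8 * 8 * 8, 10),               # output layer: 10 classes
  )
  loss_fn = nn.CrossEntropyLoss()             # loss function for classification
  optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

  images = torch.randn(4, 1, 8, 8)            # fake batch: four 8x8 grayscale images
  labels = torch.randint(0, 10, (4,))         # fake labels

  for step in range(10):
      logits = model(images)                  # 1. forward propagation
      loss = loss_fn(logits, labels)          # 2. loss calculation
      optimizer.zero_grad()
      loss.backward()                         # 3. backpropagation (chain rule)
      optimizer.step()                        # 4. weight update (Adam)
      print(step, loss.item())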


AI HARDWARE - GPUs

  • AMD Instinct MI series
  • Amazon's Inferentia (for machine learning inference on AWS)
  • Google's TPUs (Tensor Processing Units, custom hardware for Google’s machine learning tasks)
  • Intel Gaudi (designed for deep learning training)
  • NVIDIA GPUs (e.g., A100, H100, used for training and inference in deep learning applications)
  • NVIDIA Tensor Cores (hardware feature within NVIDIA GPUs, optimized for mixed-precision AI workloads)

Current practical GPU models (it is important to check that Ollama supports them):
https://github.com/ollama/ollama/blob/main/docs/gpu.md

  • Nvidia H100
  • 48 GB Nvidia RTX 6000 Ada graphics card

  • Attention mechanism (a formula that makes it easier to train models; see the sketch below).
  • Transformer architecture (introduced by Google researchers in the paper “Attention Is All You Need”; Hugging Face built the popular transformers library around it).
    • Transformers are built around the attention mechanism.
      • The library's precursors were pytorch-pretrained-bert and pytorch-transformers.
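
A NumPy sketch of scaled dot-product attention, the core Transformer formula softmax(QKᵀ/√d_k)·V; the matrix sizes and random values are made up for illustration:

  # Each token's output is a weighted mix of all the value vectors.
  import numpy as np

  def softmax(x, axis=-1):
      e = np.exp(x - x.max(axis=axis, keepdims=True))
      return e / e.sum(axis=axis, keepdims=True)

  rng = np.random.default_rng(0)
  seq_len, d_k = 5, 8                         # 5 tokens, 8-dimensional keys
  Q = rng.normal(size=(seq_len, d_k))         # queries
  K = rng.normal(size=(seq_len, d_k))         # keys
  V = rng.normal(size=(seq_len, d_k))         # values

  scores = Q @ K.T / np.sqrt(d_k)             # how much each token attends to the others
  weights = softmax(scores, axis=-1)          # each row sums to 1
  output = weights @ V
  print(output.shape)                         # (5, 8)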

PRACTICAL NOTES ON MODELS:

  • Models multiply matrices.
  • Those matrices are multi-dimensional: tensors.
    • Tensors hold weights and biases. When defining a model, weights and biases are generically called parameters.
    • E.g., a 100B model has 100 billion parameters (all of the tensors' weights and biases added together).
  • The HF transformers library is distinct from the Transformer architecture. HF's library is a framework for loading, training, fine-tuning, and deploying transformer models across NLP and vision tasks. It provides access to thousands of pretrained models, simplifies workflows with task-specific pipelines, and supports custom training on new datasets. Beyond downloading models, it enables production-ready deployment with optimizations for diverse hardware (see the sketch below).
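
A short sketch of the library: load a pretrained model through a task pipeline and count its parameters. Assumes pip install transformers torch; the default model the pipeline downloads may vary:

  # Task pipeline: the library picks and downloads a pretrained model for us.
  from transformers import pipeline

  classifier = pipeline("sentiment-analysis")
  print(classifier("Machine learning notes are fun."))

  # "Parameters" = all of the model's weights and biases, counted together.
  n_params = sum(p.numel() for p in classifier.model.parameters())
  print(f"{n_params / 1e6:.1f}M parameters")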

HUGGINGFACE

  • Models, datasets and prototypes.
  • Open-source and open-weight.
  • We can download a pre-trained Llama via Ollama and then fine-tune it.
    • One reason is so the model identifies patterns better (text, images, …). A related concept is the embedding: embeddings capture the inherent properties and relationships of the original data in a condensed format and are often used in machine-learning use cases, e.g., for better classification.
      • Embedding: phrases in, vectors out (see the sketch below).
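
A sketch of "phrases in, vectors out" using the sentence-transformers library (models hosted on Hugging Face). The model name is one common choice, and pip install sentence-transformers is assumed:

  # Encode phrases into dense vectors that capture their meaning.
  from sentence_transformers import SentenceTransformer

  model = SentenceTransformer("all-MiniLM-L6-v2")
  vectors = model.encode(["a small bird on a branch", "a raspberry pi camera"])
  print(vectors.shape)   # (2, 384): one 384-dimensional vector per phrase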

Mixture of Experts (MoE)

A Mixture of Experts is a neural-network design that activates only a few specialised sub-models (experts) per input, selected by a gating mechanism. This lets models scale to massive sizes efficiently, improving performance while reducing compute cost by avoiding running the entire model on every input. See the sketch below.
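
A minimal PyTorch sketch of the routing idea: a gate scores all experts but only the top-k actually run per input. The sizes and the top-k choice are made up for illustration:

  # The gate picks 2 of 4 experts; the unselected experts are skipped entirely.
  import torch
  import torch.nn as nn

  d_model, n_experts, top_k = 16, 4, 2
  experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))
  gate = nn.Linear(d_model, n_experts)        # the gating mechanism

  x = torch.randn(d_model)                    # one input token
  scores = torch.softmax(gate(x), dim=-1)
  top_scores, top_idx = scores.topk(top_k)    # keep only the k best experts

  output = sum(w * experts[int(i)](x) for w, i in zip(top_scores, top_idx))
  print(output.shape)                         # torch.Size([16])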
