Foundations & Math for Machine Learning

Machine Learning Mathematics Notes

These notes cover all essential mathematics for Machine Learning (ML), explaining formulas, why they are used, and their applications.

1. Linear Algebra

Linear Algebra is fundamental in ML for handling datasets, features, and transformations. It involves vectors, matrices, and operations on them.

1.1 Vectors

  • Vector: Ordered list of numbers v = [v1, v2, ..., vn]
  • Norm (Magnitude):
    ||v|| = √(v1² + v2² + ... + vn²)
    The norm gives the magnitude of a vector and is important for normalization.
  • Dot Product:
    u · v = u1*v1 + u2*v2 + ... + un*vn
    Used to measure the similarity between two feature vectors.
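
As a quick NumPy sketch of these operations (the vectors are made up for illustration):

    import numpy as np

    v = np.array([3.0, 4.0])
    u = np.array([1.0, 2.0])

    norm_v = np.linalg.norm(v)                       # √(3² + 4²) = 5.0
    dot_uv = np.dot(u, v)                            # 1*3 + 2*4 = 11.0
    cosine = dot_uv / (np.linalg.norm(u) * norm_v)   # similarity in [-1, 1]
    print(norm_v, dot_uv, cosine)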

1.2 Matrices

  • Matrix multiplication:
    C = A × B
  • Transpose:
    A^T
  • Inverse:
    A^(-1)
    (if it exists)
  • Applications:
    • Linear regression: Xβ = y
    • Neural networks: weights as matrices
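
For the linear-regression application, the coefficients of Xβ = y can be found with the normal equation β = (XᵀX)⁻¹Xᵀy; a minimal sketch with toy data:

    import numpy as np

    X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column = intercept
    y = np.array([2.0, 3.0, 4.0])

    # Normal equation: beta = (X^T X)^(-1) X^T y
    beta = np.linalg.inv(X.T @ X) @ X.T @ y
    print(beta)  # ~[1., 1.] for this toy data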

2. Calculus

Calculus is used in ML for optimization, such as minimizing loss functions.

2.1 Derivatives

  • Definition:
    f'(x) = lim(h→0) (f(x+h) - f(x)) / h
  • Gradient Descent (Optimization method):
    θ = θ - α * ∇θ J(θ)
    Uses derivatives to minimize the loss function.
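
A numerical check of the derivative definition with a small h (f(x) = x² is just an example):

    def f(x):
        return x ** 2

    h = 1e-6
    x = 3.0
    numeric = (f(x + h) - f(x)) / h   # approximates f'(3) = 2*3 = 6
    print(numeric)                    # ~6.000001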

2.2 Partial Derivatives

  • Used for multivariable functions, such as the loss surfaces of neural networks.
  • Example:
    ∂f/∂x, ∂f/∂y
    This lets us adjust each parameter independently.

3. Probability & Statistics

Probability and statistics are core to ML models for prediction and inference.

3.1 Probability

  • P(A) = Probability of event A
  • Bayes Theorem:
    P(A|B) = (P(B|A) * P(A)) / P(B)
    Used in Naive Bayes classifiers.
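
A small sketch of Bayes' theorem in a spam-filter style calculation (all probabilities are invented for illustration):

    # P(Spam | word) = P(word | Spam) * P(Spam) / P(word)
    p_spam = 0.2                 # assumed prior: fraction of spam
    p_word_given_spam = 0.6      # assumed likelihood of the word in spam
    p_word = 0.25                # assumed overall frequency of the word

    p_spam_given_word = p_word_given_spam * p_spam / p_word
    print(p_spam_given_word)     # 0.48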

3.2 Distributions

  • Gaussian / Normal Distribution:
    f(x) = (1/(σ√(2π))) * e^(-(x-μ)²/(2σ²))
    Used for linear regression assumptions and probabilistic models.
  • Bernoulli, Binomial, Poisson distributions – for classification and count predictions

4. Linear Regression

Linear regression predicts continuous output using a linear combination of input features.

y = β0 + β1*x1 + β2*x2 + ... + βn*xn + ε

Loss function (Mean Squared Error):

MSE = (1/n) Σ (yi - ŷi)²
Measures the difference between predictions and actual values.
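
MSE in NumPy, with arbitrary toy values:

    import numpy as np

    y_true = np.array([3.0, 5.0, 7.0])
    y_pred = np.array([2.5, 5.0, 8.0])
    mse = np.mean((y_true - y_pred) ** 2)
    print(mse)  # (0.25 + 0 + 1) / 3 ≈ 0.4167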

5. Logistic Regression

Used for binary classification problems.

ŷ = 1 / (1 + e^(-z)), z = β0 + β1*x1 + ... + βn*xn

Cost function:

J(θ) = -1/m Σ [y log ŷ + (1-y) log (1-ŷ)]
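
A minimal sketch of the sigmoid and this cost function (toy labels and scores):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    z = np.array([2.0, -1.0, 0.5])   # z = beta0 + beta1*x1 + ...
    y = np.array([1.0, 0.0, 1.0])    # true binary labels
    y_hat = sigmoid(z)

    cost = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    print(cost)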

6. Gradient Descent

  • Optimization algorithm for minimizing loss functions
  • Update rule:
    θ = θ - α * ∇θ J(θ)
    Parameters are adjusted gradually until the loss reaches its minimum.
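
A sketch of the update rule for one-parameter linear regression under MSE loss (data and learning rate are illustrative):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([2.0, 4.0, 6.0])        # true relationship: y = 2x
    theta, alpha = 0.0, 0.1

    for _ in range(100):
        y_hat = theta * x
        grad = np.mean(2 * (y_hat - y) * x)   # d(MSE)/d(theta)
        theta = theta - alpha * grad
    print(theta)  # converges toward 2.0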

7. Linear Algebra in Neural Networks

  • Forward Propagation:
    Z = XW + b, A = f(Z)
  • Backward Propagation: compute gradients of the loss with respect to the weights, then update the weights with gradient descent
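
A minimal forward pass as matrix operations, assuming a ReLU activation (shapes and values are made up):

    import numpy as np

    X = np.random.randn(4, 3)      # 4 samples, 3 features
    W = np.random.randn(3, 2)      # 3 inputs -> 2 hidden units
    b = np.zeros(2)

    Z = X @ W + b                  # linear part: Z = XW + b
    A = np.maximum(0, Z)           # activation A = f(Z), here ReLU
    print(A.shape)                 # (4, 2)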

8. Additional Topics

  • Eigenvalues & Eigenvectors – Principal Component Analysis (PCA) for dimensionality reduction (see the sketch after this list)
  • Singular Value Decomposition (SVD) – used in recommender systems
  • Norms (L1, L2) – Regularization in regression and neural networks
  • Covariance & Correlation – understanding relationships between features
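
A sketch of PCA via the eigendecomposition of the covariance matrix (toy random data; NumPy assumed):

    import numpy as np

    X = np.random.randn(100, 3)             # toy data: 100 samples, 3 features
    Xc = X - X.mean(axis=0)                 # center the features

    cov = np.cov(Xc, rowvar=False)          # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices
    W = eigvecs[:, np.argsort(eigvals)[::-1][:2]]  # top-2 components

    Z = Xc @ W                              # project to 2 dimensions
    print(Z.shape)                          # (100, 2)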

Summary

Mastering the above mathematics allows one to understand, design, and optimize ML algorithms from scratch.

Machine Learning Full Mathematics Notes

Machine Learning Mathematics - Complete Formulas (>50 Formulas)

These notes collect the most important mathematical formulas in Machine Learning (ML), with their uses and examples.

1. Linear Algebra

  • Vector norm:
    ||v|| = √(Σ vi²)
  • Dot product:
    u · v = Σ ui*vi
  • Cross product (3D):
    u × v = |i j k; u1 u2 u3; v1 v2 v3|
  • Matrix multiplication:
    C = A × B
  • Matrix transpose:
    A^T
  • Matrix inverse:
    A^(-1)
  • Trace:
    Tr(A) = Σ a_ii
  • Determinant:
    det(A)
  • Eigenvalue/eigenvector:
    Av = λv
  • Frobenius norm:
    ||A||_F = √(ΣΣ a_ij²)
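
Most of these formulas map directly onto NumPy calls; a brief sketch with an arbitrary matrix:

    import numpy as np

    A = np.array([[2.0, 1.0], [1.0, 3.0]])

    print(np.trace(A))                    # Tr(A) = 5.0
    print(np.linalg.det(A))               # det(A) = 5.0
    print(np.linalg.inv(A))               # A^(-1)
    print(np.linalg.norm(A, 'fro'))       # Frobenius norm

    eigvals, eigvecs = np.linalg.eig(A)   # Av = lambda*v
    print(eigvals)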

2. Calculus

  • Derivative:
    f'(x) = lim(h→0) (f(x+h)-f(x))/h
  • Partial derivative:
    ∂f/∂x
  • Gradient vector:
    ∇f = [∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn]
  • Hessian matrix:
    H = [[∂²f/∂xi∂xj]]
  • Chain rule:
    d(f(g(x)))/dx = f'(g(x)) * g'(x)
  • Integration:
    ∫ f(x) dx
  • Definite integral:
    ∫_a^b f(x) dx

3. Probability & Statistics

  • Probability:
    P(A) = n(A)/n(S)
  • Conditional probability:
    P(A|B) = P(A∩B)/P(B)
  • Bayes theorem:
    P(A|B) = P(B|A)P(A)/P(B)
  • Mean:
    μ = (Σ xi)/n
  • Variance:
    σ² = (Σ(xi-μ)²)/n
  • Standard deviation:
    σ = √σ²
  • Covariance:
    cov(X,Y) = Σ(xi-μx)(yi-μy)/n
  • Correlation coefficient:
    ρ = cov(X,Y)/(σxσy)
  • Gaussian distribution:
    f(x) = (1/(σ√(2π))) e^(-(x-μ)²/(2σ²))
  • Bernoulli distribution:
    P(X=1)=p, P(X=0)=1-p
  • Binomial distribution:
    P(X=k) = C(n,k)p^k(1-p)^(n-k)
  • Poisson distribution:
    P(X=k) = λ^k e^-λ / k!
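
If SciPy is available, these distributions can be evaluated directly; the parameters below are arbitrary:

    from scipy.stats import norm, binom, poisson

    print(norm.pdf(0.0, loc=0.0, scale=1.0))   # Gaussian density at x=0
    print(binom.pmf(3, n=10, p=0.5))           # P(X=3) in 10 trials
    print(poisson.pmf(2, mu=4.0))              # P(X=2) with lambda=4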

4. Linear & Logistic Regression

  • Linear regression:
    y = β0 + β1x1 + ... + βnxn + ε
  • Mean Squared Error (MSE):
    MSE = (1/n) Σ(yi-ŷi)²
  • Gradient update:
    βj = βj - α ∂MSE/∂βj
  • Logistic function:
    σ(z) = 1/(1+e^-z)
  • Logistic regression cost:
    J(θ) = -1/m Σ [y log ŷ + (1-y) log(1-ŷ)]

5. Gradient Descent

  • Update rule:
    θ = θ - α ∇θ J(θ)
  • Stochastic Gradient Descent: update per sample
  • Mini-batch Gradient Descent: update per small batch
  • Learning rate adjustment:
    α_t = α0 / (1+ decay*t)
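
A sketch combining mini-batch updates with the decay schedule above (batch size, decay, and data are illustrative):

    import numpy as np

    x = np.random.randn(1000)
    y = 3.0 * x + np.random.randn(1000) * 0.1   # true slope = 3
    theta, alpha0, decay = 0.0, 0.1, 0.01

    for t in range(200):
        alpha = alpha0 / (1 + decay * t)        # learning rate decay
        idx = np.random.choice(len(x), 32)      # mini-batch of 32 samples
        xb, yb = x[idx], y[idx]
        grad = np.mean(2 * (theta * xb - yb) * xb)
        theta -= alpha * grad
    print(theta)  # close to 3.0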

6. Neural Networks

  • Forward pass:
    Z = XW + b
  • Activation functions:
    Sigmoid: σ(z)=1/(1+e^-z)
    ReLU: f(z)=max(0,z)
    Tanh: tanh(z)=(e^z-e^-z)/(e^z+e^-z)
  • Loss functions:
    MSE, Cross-Entropy: L = -Σ yi log ŷi
  • Backpropagation:
    ∂L/∂W = δ * X^T
  • Weight update:
    W = W - α ∂L/∂W

7. Regularization

  • L1 Regularization (Lasso):
    J(θ) = MSE + λ Σ|θj|
  • L2 Regularization (Ridge):
    J(θ) = MSE + λ Σθj²
  • Elastic Net: combination of L1 & L2
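
The L1 and L2 penalty terms in a few lines (λ, θ, and the base loss value are arbitrary):

    import numpy as np

    theta = np.array([0.5, -1.2, 3.0])
    mse, lam = 0.8, 0.1                           # example loss value and lambda

    j_lasso = mse + lam * np.sum(np.abs(theta))   # L1 (Lasso)
    j_ridge = mse + lam * np.sum(theta ** 2)      # L2 (Ridge)
    print(j_lasso, j_ridge)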

8. Dimensionality Reduction

  • PCA: maximize variance
    Z = XW, W = eigenvectors of covariance matrix
  • SVD:
    X = UΣV^T

9. Support Vector Machines (SVM)

  • Hyperplane:
    w·x + b = 0
  • Margin:
    Margin = 2 / ||w||
  • Hinge loss:
    L = max(0, 1 - y(w·x + b))
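
Hinge loss for a single example, assuming labels y ∈ {-1, +1} (weights invented):

    import numpy as np

    w = np.array([0.5, -0.3])
    b = 0.1
    x = np.array([1.0, 2.0])
    y = 1.0                        # label in {-1, +1}

    margin = y * (np.dot(w, x) + b)
    loss = max(0.0, 1.0 - margin)  # here margin = 0.0, so loss = 1.0
    print(loss)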

10. Clustering

  • K-Means centroid update:
    μk = (1/|Ck|) Σ_{xi ∈ Ck} xi
  • Distance metrics: Euclidean:
    d(x,y) = √Σ(xi-yi)²
  • Cosine similarity:
    cos θ = (A·B)/||A|| ||B||
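
Euclidean distance and cosine similarity side by side (toy vectors):

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([2.0, 2.0, 1.0])

    euclid = np.linalg.norm(a - b)   # √(1 + 0 + 4) = √5
    cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    print(euclid, cosine)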

11. Advanced Optimization

  • Newton's Method:
    θ = θ - H^-1 ∇J(θ)
  • Adam Optimizer updates:
    m_t = β1*m_{t-1} + (1-β1)∇J(θ)
    v_t = β2*v_{t-1} + (1-β2)(∇J(θ))²
    θ = θ - α * m_t / (√v_t + ε)
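
A sketch of these Adam updates on a toy quadratic; β1, β2, and ε are the common defaults, while α is enlarged so the toy converges quickly (the bias-correction terms some formulations add are omitted, matching the formulas above):

    import numpy as np

    theta, m, v = 0.0, 0.0, 0.0
    alpha, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

    def grad(t):
        return 2 * (t - 5.0)                  # toy objective (t - 5)², minimum at 5

    for _ in range(1000):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g       # first-moment estimate
        v = beta2 * v + (1 - beta2) * g ** 2  # second-moment estimate
        theta = theta - alpha * m / (np.sqrt(v) + eps)
    print(theta)                              # settles near 5.0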

12. Summary

These 50+ formulas cover the core mathematics behind Machine Learning, including linear algebra, calculus, probability, regression, classification, neural networks, SVM, clustering, and optimization.

Complete Mathematics for Machine Learning

1. Probability & Statistics

1.1 Probability Basics

An event E is a subset of outcomes in a sample space S. Probability measures the likelihood of E occurring.

P(E) = Number of favorable outcomes / Total outcomes

Why: Helps quantify uncertainty, crucial in ML for prediction models.

1.2 Conditional Probability & Bayes' Theorem

P(A|B) = P(A ∩ B)/P(B)

Bayes: P(A|B) = [P(B|A) * P(A)] / P(B)

Conditional probability is used extensively in classifiers and probabilistic models.

Real-life: Email spam detection. P(Spam|“win”) = Probability email is spam given word “win”.

1.3 Distributions & Applications

  • Bernoulli: Binary outcomes (0 or 1) - used in logistic regression
  • Binomial: Sum of Bernoulli trials - classification/count prediction
  • Gaussian/Normal: Continuous variables - assumption in many ML algorithms
  • Poisson: Event counts - used in queuing, traffic predictions
  • Exponential: Time between events - used in survival analysis
  • Multinomial: Categorical outcomes - used in NLP, document classification

Gaussian: f(x) = (1 / √(2πσ²)) * exp(-(x-μ)² / (2σ²))

Where used: Naive Bayes, probabilistic generative models.

1.4 Expectation, Variance, Covariance

E[X] = Σ x*P(x)
Var(X) = E[(X-μ)²]
Cov(X,Y) = E[(X-μX)(Y-μY)]
Corr(X,Y) = Cov(X,Y)/(σX * σY)

Used in feature scaling, correlation analysis, PCA, and regularization.
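
These quantities in NumPy (population forms, dividing by n, to match the formulas above):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.0, 4.0, 6.0, 8.0])

    print(x.mean())                        # E[X]
    print(x.var())                         # Var(X), population form
    print(np.cov(x, y, bias=True)[0, 1])   # Cov(X, Y)
    print(np.corrcoef(x, y)[0, 1])         # Corr(X, Y) = 1.0 here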

1.5 Law of Large Numbers & Central Limit Theorem

LLN: Sample mean approaches true mean as sample size increases. CLT: Distribution of sample mean → Normal for large n.

[Figure: Normal distribution curve]

2. Linear Algebra

2.1 Vectors & Matrices

Vector: v = [v1, v2, ..., vn]

Matrix multiplication: C = A*B, C_ij = Σ_k A_ik * B_kj

Example: A=[[1,2],[3,4]], B=[[2,0],[1,2]] → C=[[4,4],[10,8]]

2.2 Eigenvalues & Eigenvectors

Av = λv

Used in PCA, dimensionality reduction, and spectral clustering.

2.3 Matrix Factorization & SVD

A = U Σ Vᵀ; the rank equals the number of non-zero singular values. Positive-definite matrices arise in optimization problems.

3. Calculus & Optimization

3.1 Derivatives & Gradients

df/dx, ∇f = [∂f/∂x1, ∂f/∂x2, ...]
Hessian H_ij = ∂²f/∂xi∂xj

Used in gradient descent and optimization of neural networks.

3.2 Chain Rule (Backpropagation)

dz/dx = dz/dy * dy/dx

Fundamental in training deep neural networks.

3.3 Convex Functions

f(θx + (1-θ)y) ≤ θf(x) + (1-θ)f(y) for θ ∈ [0,1]. For convex optimization, any local minimum is a global minimum.

4. Information Theory

Entropy: H(X) = -Σ p(x) log p(x)
Cross-Entropy: H(p,q) = -Σ p(x) log q(x)
KL-Divergence: D_KL(p||q) = Σ p(x) log (p(x)/q(x))
Mutual Information: I(X;Y) = H(X)+H(Y)-H(X,Y)

Used in classification loss functions, feature selection, and generative models.
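
Entropy, cross-entropy, and KL-divergence for two small made-up distributions (natural log used):

    import numpy as np

    p = np.array([0.7, 0.3])
    q = np.array([0.5, 0.5])

    entropy = -np.sum(p * np.log(p))         # H(p)
    cross_entropy = -np.sum(p * np.log(q))   # H(p, q)
    kl = np.sum(p * np.log(p / q))           # D_KL(p || q) = H(p,q) - H(p)
    print(entropy, cross_entropy, kl)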

5. Statistics for Machine Learning

5.1 Hypothesis Testing & Confidence Intervals

CI = x̄ ± z*(σ/√n)

Used to evaluate model performance and test hypotheses.
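
The CI formula applied to a toy sample (z = 1.96 for 95% confidence):

    import numpy as np

    sample = np.array([4.8, 5.1, 5.0, 4.9, 5.2, 5.0])
    z = 1.96                                   # 95% confidence
    mean = sample.mean()
    se = sample.std() / np.sqrt(len(sample))   # sigma / sqrt(n)

    print(mean - z * se, mean + z * se)        # CI bounds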

5.2 Bias-Variance Tradeoff & Bootstrapping

Bias: error due to model assumptions. Variance: error due to sensitivity to the training data. Bootstrapping: resampling to estimate statistics.

[Figure: Bias-variance tradeoff illustration]

Foundations & Math for Machine Learning - Notes

1. Probability & Statistics

Probability Basics

An event E is a subset of outcomes in a sample space S.

P(E) = Number of favorable outcomes / Total number of outcomes

Conditional Probability & Bayes' Theorem

Conditional probability: P(A|B) is the probability of A given B.

P(A|B) = P(A ∩ B) / P(B)

Bayes' Theorem: P(A|B) = [P(B|A) * P(A)] / P(B)
Example: Suppose 1% of people have a disease, and the test has 99% sensitivity and 99% specificity.
Compute the probability a person has the disease given a positive test:
P(Disease|Positive) = (0.99 * 0.01) / [(0.99*0.01) + (0.01*0.99)] ≈ 0.5

Distributions

Common distributions used in ML:

  • Bernoulli: Binary outcomes (0 or 1)
  • Binomial: Sum of Bernoulli trials
  • Gaussian (Normal): Continuous, bell-shaped curve
  • Poisson: Count of events in fixed interval
  • Exponential: Time between Poisson events
  • Multinomial: Generalization of binomial for multiple categories

Gaussian: f(x) = (1 / √(2πσ²)) * exp(-(x-μ)² / (2σ²))

Expectation, Variance, Covariance, Correlation

E[X] = Σ x * P(x)
Var(X) = E[(X - μ)²]
Cov(X,Y) = E[(X - μX)(Y - μY)]
Corr(X,Y) = Cov(X,Y) / (σX * σY)

Law of Large Numbers & Central Limit Theorem

LLN: Sample mean → True mean as n → ∞

CLT: Sample mean ~ Normal distribution for large n

2. Linear Algebra

Vectors & Matrices

Vector: v = [v1, v2, ..., vn]

Matrix multiplication: C = A * B where C_ij = Σ_k A_ik * B_kj

Example:
A = [[1,2],[3,4]], B = [[2,0],[1,2]] → C = [[4,4],[10,8]]

Eigenvalues & Eigenvectors

Av = λv

λ is the eigenvalue and v is the corresponding eigenvector.

Matrix Factorization & Rank

SVD: A = U Σ Vᵀ, Rank = number of non-zero singular values

Positive-definite matrix: xᵀ A x > 0 ∀ x ≠ 0

3. Calculus & Optimization

Derivatives & Gradients

df/dx, ∇f = [∂f/∂x1, ∂f/∂x2, ...]

Hessian matrix: H_ij = ∂²f/∂xi∂xj

Directional derivative: D_v f(x) = ∇f(x) ⋅ v

Chain Rule (Backpropagation)

dz/dx = dz/dy * dy/dx

Convex Optimization

Convex function: f(θx + (1-θ)y) ≤ θ f(x) + (1-θ) f(y) for θ ∈ [0,1]

Any local minimum of a convex function is a global minimum.

4. Information Theory

Entropy: H(X) = -Σ p(x) log p(x)
Cross-entropy: H(p,q) = -Σ p(x) log q(x)
KL-divergence: D_KL(p||q) = Σ p(x) log (p(x)/q(x))
Mutual information: I(X;Y) = H(X) + H(Y) - H(X,Y)

5. Statistics for ML

Hypothesis Testing & Confidence Intervals

CI = x̄ ± z*(σ/√n)

Bias–Variance Tradeoff, Sampling, Bootstrapping

Bias: Difference between expected prediction & true value.
Variance: How predictions vary for different training sets.
Bootstrapping: Resampling technique to estimate statistics.

Example: Normal Distribution

[Figure: Normal distribution curve]


Author name: SIR H.A.Mwala Work email: biasharaboraofficials@gmail.com
#MWALA_LEARN Powered by MwalaJS #https://mwalajs.biasharabora.com
#https://educenter.biasharabora.com
