Foundations & Math for Machine Learning

Machine Learning Mathematics Notes

These notes cover all essential mathematics for Machine Learning (ML), explaining formulas, why they are used, and their applications.

1. Linear Algebra

Linear Algebra is fundamental in ML for handling datasets, features, and transformations. It involves vectors, matrices, and operations on them.

1.1 Vectors

  • Vector: Ordered list of numbers v = [v1, v2, ..., vn]
  • Norm (Magnitude):
    ||v|| = √(v1² + v2² + ... + vn²)
    The norm gives the magnitude of a vector and is important for normalization.
  • Dot Product:
    u · v = u1*v1 + u2*v2 + ... + un*vn
    Used to measure the similarity between two feature vectors.
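
As a quick NumPy sketch of these operations (the vectors are made up for illustration):

    import numpy as np

    v = np.array([3.0, 4.0])
    u = np.array([1.0, 2.0])

    norm_v = np.linalg.norm(v)                       # √(3² + 4²) = 5.0
    dot_uv = np.dot(u, v)                            # 1*3 + 2*4 = 11.0
    cosine = dot_uv / (np.linalg.norm(u) * norm_v)   # similarity in [-1, 1]
    print(norm_v, dot_uv, cosine)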

1.2 Matrices

  • Matrix multiplication:
    C = A × B
  • Transpose:
    A^T
  • Inverse:
    A^(-1)
    (if it exists)
  • Applications:
    • Linear regression: Xβ = y
    • Neural networks: weights as matrices
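
For the linear-regression application, the coefficients of Xβ = y can be found with the normal equation β = (XᵀX)⁻¹Xᵀy; a minimal sketch with toy data:

    import numpy as np

    X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column = intercept
    y = np.array([2.0, 3.0, 4.0])

    # Normal equation: beta = (X^T X)^(-1) X^T y
    beta = np.linalg.inv(X.T @ X) @ X.T @ y
    print(beta)  # ~[1., 1.] for this toy data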

2. Calculus

Calculus is used in ML for optimization, such as minimizing loss functions.

2.1 Derivatives

  • Definition:
    f'(x) = lim(h→0) (f(x+h) - f(x)) / h
  • Gradient Descent (Optimization method):
    θ = θ - α * ∇θ J(θ)
    Uses derivatives to minimize the loss function.
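
A numerical check of the derivative definition with a small h (f(x) = x² is just an example):

    def f(x):
        return x ** 2

    h = 1e-6
    x = 3.0
    numeric = (f(x + h) - f(x)) / h   # approximates f'(3) = 2*3 = 6
    print(numeric)                    # ~6.000001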

2.2 Partial Derivatives

  • Used for multivariable functions, such as the loss surfaces of neural networks.
  • Example:
    ∂f/∂x, ∂f/∂y
    This lets us adjust each parameter independently.

3. Probability & Statistics

Probability and statistics are core to ML models for prediction and inference.

3.1 Probability

  • P(A) = Probability of event A
  • Bayes Theorem:
    P(A|B) = (P(B|A) * P(A)) / P(B)
    Used in Naive Bayes classifiers.
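
A small sketch of Bayes' theorem in a spam-filter style calculation (all probabilities are invented for illustration):

    # P(Spam | word) = P(word | Spam) * P(Spam) / P(word)
    p_spam = 0.2                 # assumed prior: fraction of spam
    p_word_given_spam = 0.6      # assumed likelihood of the word in spam
    p_word = 0.25                # assumed overall frequency of the word

    p_spam_given_word = p_word_given_spam * p_spam / p_word
    print(p_spam_given_word)     # 0.48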

3.2 Distributions

  • Gaussian / Normal Distribution:
    f(x) = (1/(σ√(2π))) * e^(-(x-μ)²/(2σ²))
    Used for linear regression assumptions and probabilistic models.
  • Bernoulli, Binomial, Poisson distributions – for classification and count predictions

4. Linear Regression

Linear regression predicts continuous output using a linear combination of input features.

y = β0 + β1*x1 + β2*x2 + ... + βn*xn + ε

Loss function (Mean Squared Error):

MSE = (1/n) Σ (yi - ŷi)²
Measures the difference between predictions and actual values.
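
MSE in NumPy, with arbitrary toy values:

    import numpy as np

    y_true = np.array([3.0, 5.0, 7.0])
    y_pred = np.array([2.5, 5.0, 8.0])
    mse = np.mean((y_true - y_pred) ** 2)
    print(mse)  # (0.25 + 0 + 1) / 3 ≈ 0.4167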

5. Logistic Regression

Used for binary classification problems.

ŷ = 1 / (1 + e^(-z)), z = β0 + β1*x1 + ... + βn*xn

Cost function:

J(θ) = -1/m Σ [y log ŷ + (1-y) log (1-ŷ)]
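
A minimal sketch of the sigmoid and this cost function (toy labels and scores):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    z = np.array([2.0, -1.0, 0.5])   # z = beta0 + beta1*x1 + ...
    y = np.array([1.0, 0.0, 1.0])    # true binary labels
    y_hat = sigmoid(z)

    cost = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    print(cost)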

6. Gradient Descent

  • Optimization algorithm for minimizing loss functions
  • Update rule:
    θ = θ - α * ∇θ J(θ)
    Parameters are adjusted gradually until the loss reaches its minimum.
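
A sketch of the update rule for one-parameter linear regression under MSE loss (data and learning rate are illustrative):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([2.0, 4.0, 6.0])        # true relationship: y = 2x
    theta, alpha = 0.0, 0.1

    for _ in range(100):
        y_hat = theta * x
        grad = np.mean(2 * (y_hat - y) * x)   # d(MSE)/d(theta)
        theta = theta - alpha * grad
    print(theta)  # converges toward 2.0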

7. Linear Algebra in Neural Networks

  • Forward Propagation:
    Z = XW + b, A = f(Z)
  • Backward Propagation: compute gradients of the loss with respect to the weights, then update the weights with gradient descent
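
A minimal forward pass as matrix operations, assuming a ReLU activation (shapes and values are made up):

    import numpy as np

    X = np.random.randn(4, 3)      # 4 samples, 3 features
    W = np.random.randn(3, 2)      # 3 inputs -> 2 hidden units
    b = np.zeros(2)

    Z = X @ W + b                  # linear part: Z = XW + b
    A = np.maximum(0, Z)           # activation A = f(Z), here ReLU
    print(A.shape)                 # (4, 2)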

8. Additional Topics

  • Eigenvalues & Eigenvectors – Principal Component Analysis (PCA) for dimensionality reduction (see the sketch after this list)
  • Singular Value Decomposition (SVD) – used in recommender systems
  • Norms (L1, L2) – Regularization in regression and neural networks
  • Covariance & Correlation – understanding relationships between features
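
A sketch of PCA via the eigendecomposition of the covariance matrix (toy random data; NumPy assumed):

    import numpy as np

    X = np.random.randn(100, 3)             # toy data: 100 samples, 3 features
    Xc = X - X.mean(axis=0)                 # center the features

    cov = np.cov(Xc, rowvar=False)          # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices
    W = eigvecs[:, np.argsort(eigvals)[::-1][:2]]  # top-2 components

    Z = Xc @ W                              # project to 2 dimensions
    print(Z.shape)                          # (100, 2)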

Summary

Mastering the above mathematics allows one to understand, design, and optimize ML algorithms from scratch.

Machine Learning Full Mathematics Notes

Machine Learning Mathematics - Complete Formulas (>50 Formulas)

These notes collect the most important mathematical formulas in Machine Learning (ML), with their uses and examples.

1. Linear Algebra

  • Vector norm:
    ||v|| = √(Σ vi²)
  • Dot product:
    u · v = Σ ui*vi
  • Cross product (3D):
    u × v = |i j k; u1 u2 u3; v1 v2 v3|
  • Matrix multiplication:
    C = A × B
  • Matrix transpose:
    A^T
  • Matrix inverse:
    A^(-1)
  • Trace:
    Tr(A) = Σ a_ii
  • Determinant:
    det(A)
  • Eigenvalue/eigenvector:
    Av = λv
  • Frobenius norm:
    ||A||_F = √(ΣΣ a_ij²)
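
Most of these formulas map directly onto NumPy calls; a brief sketch with an arbitrary matrix:

    import numpy as np

    A = np.array([[2.0, 1.0], [1.0, 3.0]])

    print(np.trace(A))                    # Tr(A) = 5.0
    print(np.linalg.det(A))               # det(A) = 5.0
    print(np.linalg.inv(A))               # A^(-1)
    print(np.linalg.norm(A, 'fro'))       # Frobenius norm

    eigvals, eigvecs = np.linalg.eig(A)   # Av = lambda*v
    print(eigvals)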

2. Calculus

  • Derivative:
    f'(x) = lim(h→0) (f(x+h)-f(x))/h
  • Partial derivative:
    ∂f/∂x
  • Gradient vector:
    ∇f = [∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn]
  • Hessian matrix:
    H = [[∂²f/∂xi∂xj]]
  • Chain rule:
    d(f(g(x)))/dx = f'(g(x)) * g'(x)
  • Integration:
    ∫ f(x) dx
  • Definite integral:
    ∫_a^b f(x) dx

3. Probability & Statistics

  • Probability:
    P(A) = n(A)/n(S)
  • Conditional probability:
    P(A|B) = P(A∩B)/P(B)
  • Bayes theorem:
    P(A|B) = P(B|A)P(A)/P(B)
  • Mean:
    μ = (Σ xi)/n
  • Variance:
    σ² = (Σ(xi-μ)²)/n
  • Standard deviation:
    σ = √σ²
  • Covariance:
    cov(X,Y) = Σ(xi-μx)(yi-μy)/n
  • Correlation coefficient:
    ρ = cov(X,Y)/(σxσy)
  • Gaussian distribution:
    f(x) = (1/(σ√(2π))) e^(-(x-μ)²/(2σ²))
  • Bernoulli distribution:
    P(X=1)=p, P(X=0)=1-p
  • Binomial distribution:
    P(X=k) = C(n,k)p^k(1-p)^(n-k)
  • Poisson distribution:
    P(X=k) = λ^k e^-λ / k!
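
If SciPy is available, these distributions can be evaluated directly; the parameters below are arbitrary:

    from scipy.stats import norm, binom, poisson

    print(norm.pdf(0.0, loc=0.0, scale=1.0))   # Gaussian density at x=0
    print(binom.pmf(3, n=10, p=0.5))           # P(X=3) in 10 trials
    print(poisson.pmf(2, mu=4.0))              # P(X=2) with lambda=4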

4. Linear & Logistic Regression

  • Linear regression:
    y = β0 + β1x1 + ... + βnxn + ε
  • Mean Squared Error (MSE):
    MSE = (1/n) Σ(yi-ŷi)²
  • Gradient update:
    βj = βj - α ∂MSE/∂βj
  • Logistic function:
    σ(z) = 1/(1+e^-z)
  • Logistic regression cost:
    J(θ) = -1/m Σ [y log ŷ + (1-y) log(1-ŷ)]

5. Gradient Descent

  • Update rule:
    θ = θ - α ∇θ J(θ)
  • Stochastic Gradient Descent: update per sample
  • Mini-batch Gradient Descent: update per small batch
  • Learning rate adjustment:
    α_t = α0 / (1+ decay*t)
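
A sketch combining mini-batch updates with the decay schedule above (batch size, decay, and data are illustrative):

    import numpy as np

    x = np.random.randn(1000)
    y = 3.0 * x + np.random.randn(1000) * 0.1   # true slope = 3
    theta, alpha0, decay = 0.0, 0.1, 0.01

    for t in range(200):
        alpha = alpha0 / (1 + decay * t)        # learning rate decay
        idx = np.random.choice(len(x), 32)      # mini-batch of 32 samples
        xb, yb = x[idx], y[idx]
        grad = np.mean(2 * (theta * xb - yb) * xb)
        theta -= alpha * grad
    print(theta)  # close to 3.0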

6. Neural Networks

  • Forward pass:
    Z = XW + b
  • Activation functions:
    Sigmoid: σ(z)=1/(1+e^-z)
    ReLU: f(z)=max(0,z)
    Tanh: tanh(z)=(e^z-e^-z)/(e^z+e^-z)
  • Loss functions:
    MSE, Cross-Entropy: L = -Σ yi log ŷi
  • Backpropagation:
    ∂L/∂W = δ * X^T
  • Weight update:
    W = W - α ∂L/∂W

7. Regularization

  • L1 Regularization (Lasso):
    J(θ) = MSE + λ Σ|θj|
  • L2 Regularization (Ridge):
    J(θ) = MSE + λ Σθj²
  • Elastic Net: combination of L1 & L2
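
The L1 and L2 penalty terms in a few lines (λ, θ, and the base loss value are arbitrary):

    import numpy as np

    theta = np.array([0.5, -1.2, 3.0])
    mse, lam = 0.8, 0.1                           # example loss value and lambda

    j_lasso = mse + lam * np.sum(np.abs(theta))   # L1 (Lasso)
    j_ridge = mse + lam * np.sum(theta ** 2)      # L2 (Ridge)
    print(j_lasso, j_ridge)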

8. Dimensionality Reduction

  • PCA: maximize variance
    Z = XW, W = eigenvectors of covariance matrix
  • SVD:
    X = UΣV^T

9. Support Vector Machines (SVM)

  • Hyperplane:
    w·x + b = 0
  • Margin:
    Margin = 2 / ||w||
  • Hinge loss:
    L = max(0, 1 - y(w·x + b))
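
Hinge loss for a single example, assuming labels y ∈ {-1, +1} (weights invented):

    import numpy as np

    w = np.array([0.5, -0.3])
    b = 0.1
    x = np.array([1.0, 2.0])
    y = 1.0                        # label in {-1, +1}

    margin = y * (np.dot(w, x) + b)
    loss = max(0.0, 1.0 - margin)  # here margin = 0.0, so loss = 1.0
    print(loss)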

10. Clustering

  • K-Means centroid update:
    μk = (1/|Ck|) Σ_{xi ∈ Ck} xi
  • Distance metrics: Euclidean:
    d(x,y) = √Σ(xi-yi)²
  • Cosine similarity:
    cos θ = (A·B)/||A|| ||B||
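
Euclidean distance and cosine similarity side by side (toy vectors):

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([2.0, 2.0, 1.0])

    euclid = np.linalg.norm(a - b)   # √(1 + 0 + 4) = √5
    cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    print(euclid, cosine)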

11. Advanced Optimization

  • Newton's Method:
    θ = θ - H^-1 ∇J(θ)
  • Adam Optimizer updates:
    m_t = β1*m_{t-1} + (1-β1)∇J(θ)
    v_t = β2*v_{t-1} + (1-β2)(∇J(θ))²
    θ = θ - α * m_t / (√v_t + ε)
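
A sketch of these Adam updates on a toy quadratic; β1, β2, and ε are the common defaults, while α is enlarged so the toy converges quickly (the bias-correction terms some formulations add are omitted, matching the formulas above):

    import numpy as np

    theta, m, v = 0.0, 0.0, 0.0
    alpha, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8

    def grad(t):
        return 2 * (t - 5.0)                  # toy objective (t - 5)², minimum at 5

    for _ in range(1000):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g       # first-moment estimate
        v = beta2 * v + (1 - beta2) * g ** 2  # second-moment estimate
        theta = theta - alpha * m / (np.sqrt(v) + eps)
    print(theta)                              # settles near 5.0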

12. Summary

These 50+ formulas cover the core mathematics behind Machine Learning, including linear algebra, calculus, probability, regression, classification, neural networks, SVM, clustering, and optimization.

Complete Mathematics for Machine Learning

1. Probability & Statistics

1.1 Probability Basics

An event E is a subset of outcomes in a sample space S. Probability measures the likelihood of E occurring.

P(E) = Number of favorable outcomes / Total outcomes

Why: Helps quantify uncertainty, crucial in ML for prediction models.

1.2 Conditional Probability & Bayes' Theorem

P(A|B) = P(A ∩ B)/P(B)

Bayes: P(A|B) = [P(B|A) * P(A)] / P(B)

Conditional probability is used extensively in classifiers and probabilistic models.

Real-life: Email spam detection. P(Spam|“win”) = Probability email is spam given word “win”.

1.3 Distributions & Applications

  • Bernoulli: Binary outcomes (0 or 1) - used in logistic regression
  • Binomial: Sum of Bernoulli trials - classification/count prediction
  • Gaussian/Normal: Continuous variables - assumption in many ML algorithms
  • Poisson: Event counts - used in queuing, traffic predictions
  • Exponential: Time between events - used in survival analysis
  • Multinomial: Categorical outcomes - used in NLP, document classification

Gaussian: f(x) = (1 / √(2πσ²)) * exp(-(x-μ)² / (2σ²))

Where used: Naive Bayes, probabilistic generative models.

1.4 Expectation, Variance, Covariance

E[X] = Σ x*P(x)
Var(X) = E[(X-μ)²]
Cov(X,Y) = E[(X-μX)(Y-μY)]
Corr(X,Y) = Cov(X,Y)/(σX * σY)

Used in feature scaling, correlation analysis, PCA, and regularization.
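
These quantities in NumPy (population forms, dividing by n, to match the formulas above):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.0, 4.0, 6.0, 8.0])

    print(x.mean())                        # E[X]
    print(x.var())                         # Var(X), population form
    print(np.cov(x, y, bias=True)[0, 1])   # Cov(X, Y)
    print(np.corrcoef(x, y)[0, 1])         # Corr(X, Y) = 1.0 here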

1.5 Law of Large Numbers & Central Limit Theorem

LLN: Sample mean approaches true mean as sample size increases. CLT: Distribution of sample mean → Normal for large n.

[Figure: Normal distribution curve]

2. Linear Algebra

2.1 Vectors & Matrices

Vector: v = [v1, v2, ..., vn]

Matrix multiplication: C = A*B, C_ij = Σ_k A_ik * B_kj

Example: A=[[1,2],[3,4]], B=[[2,0],[1,2]] → C=[[4,4],[10,8]]

2.2 Eigenvalues & Eigenvectors

Av = λv

Used in PCA, dimensionality reduction, and spectral clustering.

2.3 Matrix Factorization & SVD

A = U Σ Vᵀ; the rank equals the number of non-zero singular values. Positive-definite matrices arise in optimization problems.

3. Calculus & Optimization

3.1 Derivatives & Gradients

df/dx, ∇f = [∂f/∂x1, ∂f/∂x2, ...]
Hessian H_ij = ∂²f/∂xi∂xj

Used in gradient descent and optimization of neural networks.

3.2 Chain Rule (Backpropagation)

dz/dx = dz/dy * dy/dx

Fundamental in training deep neural networks.

3.3 Convex Functions

f(θx + (1-θ)y) ≤ θf(x) + (1-θ)f(y) for θ ∈ [0,1]. For convex optimization, any local minimum is a global minimum.

4. Information Theory

Entropy: H(X) = -Σ p(x) log p(x)
Cross-Entropy: H(p,q) = -Σ p(x) log q(x)
KL-Divergence: D_KL(p||q) = Σ p(x) log (p(x)/q(x))
Mutual Information: I(X;Y) = H(X)+H(Y)-H(X,Y)

Used in classification loss functions, feature selection, and generative models.
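
Entropy, cross-entropy, and KL-divergence for two small made-up distributions (natural log used):

    import numpy as np

    p = np.array([0.7, 0.3])
    q = np.array([0.5, 0.5])

    entropy = -np.sum(p * np.log(p))         # H(p)
    cross_entropy = -np.sum(p * np.log(q))   # H(p, q)
    kl = np.sum(p * np.log(p / q))           # D_KL(p || q) = H(p,q) - H(p)
    print(entropy, cross_entropy, kl)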

5. Statistics for Machine Learning

5.1 Hypothesis Testing & Confidence Intervals

CI = x̄ ± z*(σ/√n)

Used to evaluate model performance and test hypotheses.
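
The CI formula applied to a toy sample (z = 1.96 for 95% confidence):

    import numpy as np

    sample = np.array([4.8, 5.1, 5.0, 4.9, 5.2, 5.0])
    z = 1.96                                   # 95% confidence
    mean = sample.mean()
    se = sample.std() / np.sqrt(len(sample))   # sigma / sqrt(n)

    print(mean - z * se, mean + z * se)        # CI bounds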

5.2 Bias-Variance Tradeoff & Bootstrapping

Bias: error due to model assumptions. Variance: error due to sensitivity to the training data. Bootstrapping: resampling to estimate statistics.

[Figure: Bias-variance tradeoff illustration]

Foundations & Math for Machine Learning - Notes

1. Probability & Statistics

Probability Basics

An event E is a subset of outcomes in a sample space S.

P(E) = Number of favorable outcomes / Total number of outcomes

Conditional Probability & Bayes' Theorem

Conditional probability: P(A|B) is the probability of A given B.

P(A|B) = P(A ∩ B) / P(B)

Bayes' Theorem: P(A|B) = [P(B|A) * P(A)] / P(B)
Example: Suppose 1% of people have a disease, and the test has 99% sensitivity and 99% specificity.
Compute the probability a person has the disease given a positive test:
P(Disease|Positive) = (0.99 * 0.01) / [(0.99*0.01) + (0.01*0.99)] ≈ 0.5

Distributions

Common distributions used in ML:

  • Bernoulli: Binary outcomes (0 or 1)
  • Binomial: Sum of Bernoulli trials
  • Gaussian (Normal): Continuous, bell-shaped curve
  • Poisson: Count of events in fixed interval
  • Exponential: Time between Poisson events
  • Multinomial: Generalization of binomial for multiple categories

Gaussian: f(x) = (1 / √(2πσ²)) * exp(-(x-μ)² / (2σ²))

Expectation, Variance, Covariance, Correlation

E[X] = Σ x * P(x)
Var(X) = E[(X - μ)²]
Cov(X,Y) = E[(X - μX)(Y - μY)]
Corr(X,Y) = Cov(X,Y) / (σX * σY)

Law of Large Numbers & Central Limit Theorem

LLN: Sample mean → True mean as n → ∞

CLT: Sample mean ~ Normal distribution for large n

2. Linear Algebra

Vectors & Matrices

Vector: v = [v1, v2, ..., vn]

Matrix multiplication: C = A * B where C_ij = Σ_k A_ik * B_kj

Example:
A = [[1,2],[3,4]], B = [[2,0],[1,2]] → C = [[4,4],[10,8]]

Eigenvalues & Eigenvectors

Av = λv

λ is the eigenvalue and v is the corresponding eigenvector.

Matrix Factorization & Rank

SVD: A = U Σ Vᵀ, Rank = number of non-zero singular values

Positive-definite matrix: xᵀ A x > 0 ∀ x ≠ 0

3. Calculus & Optimization

Derivatives & Gradients

df/dx, ∇f = [∂f/∂x1, ∂f/∂x2, ...]

Hessian matrix: H_ij = ∂²f/∂xi∂xj

Directional derivative: D_v f(x) = ∇f(x) ⋅ v

Chain Rule (Backpropagation)

dz/dx = dz/dy * dy/dx

Convex Optimization

Convex function: f(θx + (1-θ)y) ≤ θ f(x) + (1-θ) f(y) for θ ∈ [0,1]

Any local minimum of a convex function is a global minimum.

4. Information Theory

Entropy: H(X) = -Σ p(x) log p(x)
Cross-entropy: H(p,q) = -Σ p(x) log q(x)
KL-divergence: D_KL(p||q) = Σ p(x) log (p(x)/q(x))
Mutual information: I(X;Y) = H(X) + H(Y) - H(X,Y)

5. Statistics for ML

Hypothesis Testing & Confidence Intervals

CI = x̄ ± z*(σ/√n)

Bias–Variance Tradeoff, Sampling, Bootstrapping

Bias: Difference between expected prediction & true value.
Variance: How predictions vary for different training sets.
Bootstrapping: Resampling technique to estimate statistics.

Example: Normal Distribution

[Figure: Normal distribution curve]


Author name: SIR H.A.Mwala Work email: biasharaboraofficials@gmail.com
#MWALA_LEARN Powered by MwalaJS #https://mwalajs.biasharabora.com
#https://educenter.biasharabora.com
