Derivatives & Gradients
Definition
Core Statement
A Derivative measures the instantaneous rate of change of a function with respect to one of its variables. It is the slope of the tangent line to the graph at a point.
A Gradient ($\nabla f$) extends this idea to functions of several variables: it is the vector that collects the partial derivative with respect to each input.
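In symbols, the derivative is the limit of rise-over-run as the step size shrinks to zero (the standard definition):
$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$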
Purpose
- Optimization: Finding where the slope is zero (Minima/Maxima) is the key to training AI.
- Sensitivity Analysis: "If I increase Price by $1, how much does Demand change?"
- Backpropagation: How neural networks learn (propagating error backwards via the Chain Rule).
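As a small sketch of the last two points, SymPy (already used in the Python Implementation below) applies the Chain Rule automatically, and a derivative evaluated at a point answers the sensitivity question; the functions here are invented purely for illustration:
import sympy as sp
x = sp.Symbol('x')
# Chain Rule: differentiate a composition outer(inner(x)), like one layer feeding another
inner = 3*x + 1
outer = sp.sin(inner)
print(sp.diff(outer, x))      # 3*cos(3*x + 1) = outer derivative * inner derivative
# Sensitivity: a hypothetical demand curve as a function of price x
demand = 100 - 2*x
print(sp.diff(demand, x))     # -2 -> raising price by $1 changes demand by about -2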
Intuition
- Slope = 0: You are at a peak (maximum) or valley (minimum).
- Slope > 0: Value is increasing. To go down, move left.
- Slope < 0: Value is decreasing. To go down, move right.
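A minimal numeric sketch of these rules, using the cost function from the worked example below ($J(w) = w^2 - 4w + 5$, whose slope is $2w - 4$):
def J(w):
    return w**2 - 4*w + 5     # cost function from the worked example
def slope(w):
    return 2*w - 4            # its derivative
for w in [0.0, 2.0, 5.0]:
    s = slope(w)
    if s > 0:
        hint = "slope > 0: move left (decrease w) to go down"
    elif s < 0:
        hint = "slope < 0: move right (increase w) to go down"
    else:
        hint = "slope = 0: flat point (here, the minimum)"
    print(f"w={w}: J={J(w)}, slope={s} -> {hint}")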
Types
1. Ordinary Derivative
The function has only one input variable, e.g. $f(x)$; written $\frac{df}{dx}$ or $f'(x)$.
- Example: $f(x) = x^2 \Rightarrow \frac{df}{dx} = 2x$.
2. Partial Derivative
The function has multiple inputs, e.g. $f(x, y)$; differentiate with respect to one input while holding the others fixed.
- Example: for $f(x, y) = x^2 y$: $\frac{\partial f}{\partial x} = 2xy$ (treat $y$ as a constant number, like 5) and $\frac{\partial f}{\partial y} = x^2$.
3. The Gradient
The vector of all partials: $\nabla f = \left(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_n}\right)$
- Direction: Points in the direction of steepest ascent.
- Magnitude: How steep the slope is.
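A short SymPy sketch that computes the partials and stacks them into a gradient, reusing the illustrative function $f(x, y) = x^2 y$ from above:
import sympy as sp
x, y = sp.symbols('x y')
f = x**2 * y                      # illustrative two-variable function
df_dx = sp.diff(f, x)             # 2*x*y  (y held constant)
df_dy = sp.diff(f, y)             # x**2   (x held constant)
grad_f = sp.Matrix([df_dx, df_dy])
print(grad_f)                     # the gradient: vector of all partials
print(grad_f.subs({x: 1, y: 2}))  # evaluated at a point: direction of steepest ascent there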
Worked Example: Minimizing Cost
Problem
Cost Function: $J(w) = w^2 - 4w + 5$
Goal: Find the value of $w$ that minimizes $J(w)$.
- Derivative: $\frac{dJ}{dw} = 2w - 4$.
- Set to Zero: $2w - 4 = 0 \Rightarrow w = 2$.
- Conclusion: The minimum cost occurs at $w = 2$.
(Check: $J(2) = 4 - 8 + 5 = 1$. Any other $w$ gives $J(w) > 1$.)
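Gradient Descent (see Related Concepts) reaches the same answer iteratively by repeatedly stepping against the slope; a minimal sketch, with the starting point and learning rate chosen purely for illustration:
w = 10.0                      # arbitrary starting guess
learning_rate = 0.1           # step size, picked for illustration
for _ in range(50):
    slope = 2*w - 4           # dJ/dw for J(w) = w^2 - 4w + 5
    w -= learning_rate*slope  # move against the slope (downhill)
print(w)                      # ~2.0, matching the closed-form answer above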
Assumptions
- The function is differentiable (the slope is well defined at every point).
- For the worked example, $J(w) = w^2 - 4w + 5$ is convex, so the single critical point is the global minimum.
Limitations & Pitfalls
Pitfalls
- Local vs Global: Setting derivative to 0 finds all flat points (minima, maxima, saddle points). It doesn't guarantee the best one.
- Vanishing Gradients: In deep networks, if many derivatives with magnitude < 1 are multiplied together (Chain Rule), the product approaches zero and the network stops learning.
- Exploding Gradients: Conversely, if many factors have magnitude > 1, the product grows exponentially; the update steps become huge and unstable.
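A tiny numeric sketch of these two pitfalls: multiplying many per-layer derivative factors (the constants and depth below are invented for illustration) either collapses toward zero or blows up:
depth = 50                # hypothetical number of layers in the chain
vanishing, exploding = 1.0, 1.0
for _ in range(depth):
    vanishing *= 0.5      # each local derivative slightly below 1 in magnitude
    exploding *= 1.5      # each local derivative slightly above 1 in magnitude
print(vanishing)          # ~8.9e-16 -> gradient signal effectively gone
print(exploding)          # ~6.4e+08 -> update steps become huge and unstable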
Python Implementation
import sympy as sp
# Symbolic Math
w = sp.Symbol('w')
J = w**2 - 4*w + 5
# Calculate Derivative
derivative = sp.diff(J, w)
print(f"Derivative: {derivative}")
# Solve for 0
roots = sp.solve(derivative, w)
print(f"Critical Point at w = {roots[0]}")
Related Concepts
- Gradient Descent - Using derivatives iteratively to step downhill.
- Optimization - The broader field.
- Neural Networks - Learn via partial derivatives of the loss with respect to the weights.
- Taylor Series - Approximating functions using derivatives.