Derivatives & Gradients

Definition

Core Statement

A Derivative measures the instantaneous rate of change of a function with respect to one of its variables. It is the slope of the tangent line to the graph at a point.
A Gradient (∇f) is a vector containing the partial derivatives with respect to every variable of a multivariable function.


Purpose

  1. Optimization: Finding where the slope is zero (Minima/Maxima) is the key to training AI.
  2. Sensitivity Analysis: "If I increase Price by $1, how much does Demand change?"
  3. Backpropagation: How neural networks learn (propagating error backwards via the Chain Rule).

Intuition
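
Think of standing on a curve and asking how steeply it rises under your feet. A quick numeric sketch of that idea (the function f(x) = x² and the step size h below are illustrative choices, not from the original note): nudge the input by a tiny h and measure how much the output moves.

# Finite-difference intuition: a derivative is "rise over run" for a vanishingly small run.
def f(x):
    return x**2                        # illustrative function; the exact derivative is 2x

x, h = 3.0, 1e-6
numeric_slope = (f(x + h) - f(x)) / h
print(numeric_slope)                   # ~6.0, matching f'(3) = 2 * 3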


Types

1. Ordinary Derivative (df/dx)

Function has only one input variable (y=f(x)).

2. Partial Derivative (∂f/∂x)

Function has multiple inputs (z=f(x,y)). We ask: "How does z change if I wiggle x, holding y constant?"

3. The Gradient (∇f)

The vector of all partial derivatives. For example, for f(x,y)=x²+y², the gradient is ∇f=[2x, 2y].
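
A minimal sympy sketch of that gradient (assuming the same f(x,y)=x²+y² as above):

import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + y**2                          # assumed example function

grad = [sp.diff(f, x), sp.diff(f, y)]    # one partial derivative per variable
print(grad)                              # [2*x, 2*y]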


Worked Example: Minimizing Cost

Problem

Cost Function: J(w) = w² - 4w + 5.
Goal: Find w that minimizes Cost.

  1. Derivative: J'(w) = 2w - 4.
  2. Set to Zero: 2w - 4 = 0  →  2w = 4  →  w = 2.
  3. Conclusion: The minimum cost occurs at w = 2.
    (Check: 2² - 4(2) + 5 = 4 - 8 + 5 = 1. Any other w gives a cost > 1. The gradient-descent sketch below reaches the same point iteratively.)
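
Setting the derivative to zero works here because a closed-form solution exists. In practice, models are trained by following the derivative downhill in small steps. A minimal gradient-descent sketch for the same J(w) (the starting point and learning rate are arbitrary choices, not part of the worked example):

# Gradient descent on J(w) = w**2 - 4*w + 5, using the derivative J'(w) = 2*w - 4.
w = 0.0              # arbitrary starting point
lr = 0.1             # arbitrary learning rate

for _ in range(100):
    grad = 2*w - 4   # slope at the current w
    w -= lr * grad   # step against the slope (downhill)

print(w)             # converges toward 2.0, matching the closed-form answer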

Assumptions


Limitations & Pitfalls

Pitfalls

  1. Local vs Global: Setting derivative to 0 finds all flat points (minima, maxima, saddle points). It doesn't guarantee the best one.
  2. Vanishing Gradients: In deep networks, if many derivatives < 1 are multiplied (Chain Rule), the product approaches zero. The network stops learning.
  3. Exploding Gradients: Conversely, if derivatives > 1, the product grows exponentially. Steps become huge and unstable. (A numeric sketch of both effects follows this list.)
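
A numeric sketch of pitfalls 2 and 3 (the per-layer derivative values 0.5 and 1.5 and the depth of 50 layers are illustrative, not taken from any real network):

# Chain Rule across many layers: the overall gradient is a product of per-layer derivatives.
layers = 50

vanishing = 0.5 ** layers    # ~8.9e-16 -> updates effectively stop (vanishing)
exploding = 1.5 ** layers    # ~6.4e+08 -> updates blow up (exploding)

print(vanishing, exploding)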


Python Implementation

import sympy as sp

# Symbolic Math
w = sp.Symbol('w')
J = w**2 - 4*w + 5

# Calculate Derivative
derivative = sp.diff(J, w)
print(f"Derivative: {derivative}")

# Solve for 0
roots = sp.solve(derivative, w)
print(f"Critical Point at w = {roots[0]}")
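
The critical point alone does not distinguish a minimum from a maximum or saddle point (pitfall 1 above). A short follow-up, continuing from the snippet above and reusing sp, J, and w, applies the second-derivative test:

# Second-derivative test: positive curvature at the critical point => local minimum.
second_derivative = sp.diff(J, w, 2)
print(f"Second derivative: {second_derivative}")  # 2 > 0, so w = 2 is a minimum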