Perceptron: Learning to Fit y = x²

Model: ŷ = w₀ + w₁·x + w₂·x²  →  Target: y = x²

Adjust the weight sliders to make the blue curve match the green one. The closer they match, the lower the loss!

Global loss definition: L = MSE = (1/n) Σ (yᵢ − ŷᵢ)², with ŷᵢ = w₀ + w₁·xᵢ + w₂·xᵢ² and yᵢ = xᵢ². So L(w₀,w₁,w₂) = (1/n) Σ (xᵢ² − w₀ − w₁·xᵢ − w₂·xᵢ²)².
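The loss definition above can be sketched in a few lines of Python. This is a minimal illustration, not the page's own implementation: the function name `mse_loss` and the five sample points in `xs` are assumptions, since the page does not state which x values it evaluates.

```python
def mse_loss(w0, w1, w2, xs):
    """Mean squared error between the target x**2 and the model w0 + w1*x + w2*x**2."""
    total = 0.0
    for x in xs:
        y_true = x ** 2                      # target y = x^2
        y_pred = w0 + w1 * x + w2 * x ** 2   # model prediction
        total += (y_true - y_pred) ** 2
    return total / len(xs)


# Assumed sample points (hypothetical; the page's actual data may differ).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

loss_start = mse_loss(2.0, 1.0, 0.0, xs)  # slider values w0 = 2, w1 = 1, w2 = 0
loss_best = mse_loss(0.0, 0.0, 1.0, xs)   # model equals the target exactly
```

Under these assumed points the starting weights give a loss of 4.8, and the weights (0, 0, 1) give a loss of exactly 0, because the model then reproduces y = x² at every point.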

Weights (the knobs)
w₀ 2.00
w₁ 1.00
w₂ 0.00
Loss (MSE)
Step 0
Show Data Table
x | True y | Predicted ŷ | Error (y − ŷ) | (y − ŷ)²
Gradients (the Jacobian)

The Jacobian (∂L/∂wⱼ) is the gradient of the loss w.r.t. each weight. A gradient step moves each weight by Δwⱼ = −LR × ∂L/∂wⱼ: the minus sign means we walk downhill, and the learning rate (LR) scales how big the step is. A gradient near zero means that weight is already close to optimal along its axis, given the current values of the other weights.
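Differentiating the MSE gives ∂L/∂wⱼ = −(2/n) Σ (yᵢ − ŷᵢ)·xᵢʲ for j = 0, 1, 2. The sketch below computes these three values and applies one gradient step. The names `gradients` and `gradient_step`, the sample points `xs`, and the learning rate 0.1 are all assumptions made for illustration.

```python
def gradients(w, xs):
    """Return [dL/dw0, dL/dw1, dL/dw2] for the MSE loss of w0 + w1*x + w2*x**2 vs x**2."""
    n = len(xs)
    grad = [0.0, 0.0, 0.0]
    for x in xs:
        err = x ** 2 - (w[0] + w[1] * x + w[2] * x ** 2)  # y - y_hat
        for j in range(3):
            grad[j] += -2.0 * err * x ** j / n            # d(err^2)/dw_j = -2 * err * x^j
    return grad


def gradient_step(w, xs, lr):
    """Move each weight by delta_w_j = -lr * dL/dw_j."""
    return [wj - lr * gj for wj, gj in zip(w, gradients(w, xs))]


# Assumed sample points and learning rate (hypothetical values).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
w = [2.0, 1.0, 0.0]

g = gradients(w, xs)
w_next = gradient_step(w, xs, lr=0.1)
```

With these assumed points, the gradient at the starting weights is approximately (0, 4, −5.6), so one step leaves w₀ unchanged, lowers w₁, and raises w₂. At (0, 0, 1) all three components are zero.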

gradient | magnitude → | ∂L/∂wⱼ | direction | next step Δwⱼ
∂L/∂w₀
∂L/∂w₁
∂L/∂w₂
Drag the weight sliders and watch how the curves, loss, and gradients change!
Gradient Slices (Loss vs each weight)

Each plot shows how the loss changes when we move one weight while keeping the others fixed. The red dot is your current position, the dashed line is the tangent (gradient slope), and the orange step vector goes from the red dot to the blue triangle ▲ (next point) — i.e. w_new = w − LR × ∂L/∂w.
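One way to reproduce a single slice numerically is sketched below: vary one weight, hold the other two fixed, and compute the tangent and the next point from the partial derivative. The helper names (`loss`, `partial`, `loss_slice`, `tangent`), the sample points, the plotted range, and LR = 0.1 are assumptions, not values taken from the page.

```python
def loss(w, xs):
    """MSE of the model w0 + w1*x + w2*x**2 against the target x**2."""
    return sum((x ** 2 - (w[0] + w[1] * x + w[2] * x ** 2)) ** 2 for x in xs) / len(xs)


def partial(w, j, xs):
    """Partial derivative dL/dw_j at the current weights."""
    return sum(-2.0 * (x ** 2 - (w[0] + w[1] * x + w[2] * x ** 2)) * x ** j
               for x in xs) / len(xs)


def loss_slice(w, j, values, xs):
    """Loss along weight j, with the other two weights held fixed."""
    curve = []
    for v in values:
        probe = list(w)
        probe[j] = v
        curve.append(loss(probe, xs))
    return curve


# Assumed sample points, weights, and learning rate (hypothetical values).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
w = [2.0, 1.0, 0.0]
lr = 0.1
j = 1                                           # slice along w1

values = [-2.0 + 0.5 * k for k in range(13)]    # w1 from -2 to 4
curve = loss_slice(w, j, values, xs)            # the plotted parabola

slope = partial(w, j, xs)                       # slope of the dashed tangent
current = (w[j], loss(w, xs))                   # the red dot
next_w = w[j] - lr * slope                      # w_new = w - LR * dL/dw
next_point = (next_w, loss_slice(w, j, [next_w], xs)[0])  # the blue triangle


def tangent(v):
    """Height of the tangent line at weight value v."""
    return current[1] + slope * (v - current[0])
```

Because the loss is quadratic in each weight, every slice is a parabola, and a sufficiently small step along the negative slope lands on a lower point of that parabola.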

Loss vs w₀

Loss vs w₁

Loss vs w₂

Loss Landscape (3D surface)

The red dot marks where you are now. Gradient descent = rolling downhill. Drag to rotate!

w₁ = 1.00 (fixed)
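The surface can be approximated by evaluating the loss on a grid of (w₀, w₂) pairs while w₁ stays fixed. The following is a sketch under assumptions: the sample points, grid ranges, and grid spacing are hypothetical, and the page may sample the surface differently.

```python
def loss(w0, w1, w2, xs):
    """MSE of the model w0 + w1*x + w2*x**2 against the target x**2."""
    return sum((x ** 2 - (w0 + w1 * x + w2 * x ** 2)) ** 2 for x in xs) / len(xs)


# Assumed sample points and grid (hypothetical values).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
W1_FIXED = 1.0
w0_axis = [-3.0 + 0.5 * i for i in range(13)]   # w0 from -3 to 3
w2_axis = [-1.0 + 0.5 * i for i in range(9)]    # w2 from -1 to 3

# surface[r][c] is the loss at (w0_axis[c], W1_FIXED, w2_axis[r]).
surface = [[loss(w0, W1_FIXED, w2, xs) for w0 in w0_axis] for w2 in w2_axis]

# Locate the lowest grid cell, i.e. the bottom of the bowl for this fixed w1.
best_loss, best_w0, best_w2 = min(
    (surface[r][c], w0_axis[c], w2_axis[r])
    for r in range(len(w2_axis))
    for c in range(len(w0_axis))
)
```

With these assumed symmetric sample points, the lowest cell sits at w₀ = 0, w₂ = 1 with a loss of 2.0 rather than 0. The remaining loss comes entirely from w₁ being held at 1.00 instead of its optimal value 0.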
Loss Isosurface (all 3 weights)

The full loss lives in 4D: three weight axes (w₀, w₁, w₂) plus the loss value L(w₀, w₁, w₂). An isosurface shows all weight combinations that produce the same loss. As you lower the threshold, the shell shrinks toward the optimal point, and gradient descent moves inward through these nested shells.
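The nested-shell picture can be checked numerically by running gradient descent and recording the first step at which the loss falls inside each threshold. This is only a sketch: the helper names, the sample points, the learning rate 0.05, the step count, and the threshold list are assumptions chosen for illustration.

```python
def loss(w, xs):
    """MSE of the model w0 + w1*x + w2*x**2 against the target x**2."""
    return sum((x ** 2 - (w[0] + w[1] * x + w[2] * x ** 2)) ** 2 for x in xs) / len(xs)


def gradient(w, xs):
    """Gradient [dL/dw0, dL/dw1, dL/dw2] of the MSE loss."""
    n = len(xs)
    return [
        sum(-2.0 * (x ** 2 - (w[0] + w[1] * x + w[2] * x ** 2)) * x ** j for x in xs) / n
        for j in range(3)
    ]


def descend(w, xs, lr, steps):
    """Run plain gradient descent and record the loss after every step."""
    history = [loss(w, xs)]
    for _ in range(steps):
        g = gradient(w, xs)
        w = [wj - lr * gj for wj, gj in zip(w, g)]
        history.append(loss(w, xs))
    return w, history


def first_step_inside(history, threshold):
    """Index of the first step whose loss is at or below the threshold, else None."""
    for step, value in enumerate(history):
        if value <= threshold:
            return step
    return None


# Assumed sample points, learning rate, and thresholds (hypothetical values).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
w_final, history = descend([2.0, 1.0, 0.0], xs, lr=0.05, steps=200)

thresholds = [4.0, 1.0, 0.1, 0.01]
crossings = [first_step_inside(history, t) for t in thresholds]
```

Each entry of `crossings` is the step at which the trajectory enters the shell for that threshold. Lower thresholds are reached later, which matches the image of moving inward through nested shells. The `history` list is the same series a loss-over-time plot would show.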

Loss threshold: 5.0
Loss Over Time
No steps yet