Understanding Backpropagation: Calculating Gradients for Hidden Layer Weights and Biases



Ganesh, building the AI code reviewer *git-lrc*, explains how to calculate gradients for hidden layer weights and biases in backpropagation. The derivation uses the chain rule to propagate error from the output layer to hidden parameters w1, b1, w2, and b2. The formulas are then applied in gradient descent to update the neural network's parameters.

3 min read · published Jun 30, 2026

Hello, I'm Ganesh. I'm building git-lrc, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on GitHub. Star git-lrc on GitHub to help more developers discover the project. Do give it a try and share your feedback to help improve the product.

In the previous article, we derived formulas for updating the output layer weights w3, w4, and bias b3. Now, we will understand how to calculate the gradients for the hidden layer parameters: w1, b1, w2, and b2.

To find the gradients of the parameters in the hidden layer, we need to trace how changing these values affects the final prediction and the error (SSR).

Let's recall the structure of our neural network:

For the top neuron:

x1 = input * w1 + b1

y1 = f(x1) = log(1 + e^x1) (using the softplus function)

For the bottom neuron:

x2 = input * w2 + b2

y2 = f(x2) = log(1 + e^x2) (using the softplus function)

Finally, the prediction:

Predicted = y1 * w3 + y2 * w4 + b3

And the prediction error:

SSR = Σ (observed − predicted)²

Since w1, b1, w2, and b2 are not directly connected to the output prediction, we must use the chain rule to backpropagate the error from the output layer back to the hidden layer.
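The network structure recalled above can be sketched in a few lines of Python. This is only a minimal illustration: the parameter values and the data points are hypothetical, chosen for the example, and the helper names `softplus`, `forward`, and `ssr` are not from the original article.

```python
import math

def softplus(x):
    """Softplus activation: log(1 + e^x)."""
    return math.log(1.0 + math.exp(x))

def forward(inp, p):
    """Run one input through the network and return every intermediate value."""
    x1 = inp * p["w1"] + p["b1"]   # top neuron, before activation
    y1 = softplus(x1)              # top neuron, after activation
    x2 = inp * p["w2"] + p["b2"]   # bottom neuron, before activation
    y2 = softplus(x2)              # bottom neuron, after activation
    predicted = y1 * p["w3"] + y2 * p["w4"] + p["b3"]
    return x1, y1, x2, y2, predicted

def ssr(inputs, observed, p):
    """Sum of squared residuals over all data points."""
    return sum((obs - forward(inp, p)[4]) ** 2
               for inp, obs in zip(inputs, observed))

# Hypothetical parameters and data, for illustration only.
params = {"w1": 1.0, "b1": 0.0, "w2": -1.0, "b2": 0.0,
          "w3": 1.0, "w4": 1.0, "b3": 0.0}
inputs = [0.0, 0.5, 1.0]
observed = [0.0, 1.0, 0.0]

print("prediction for input 0.0:", forward(0.0, params)[4])
print("SSR:", ssr(inputs, observed, params))
```

Returning the intermediate values x1, y1, x2, and y2 from `forward` is a design choice: the chain rule needs them later, so it is convenient to keep them around.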

Let's calculate the gradient for the top neuron's weight w1 first.

A change in w1 affects x1, which affects the output y1, which affects the predicted value, which finally affects the SSR.

So, by the chain rule:

dSSR/dw1 = dSSR/d(predicted) * d(predicted)/dy1 * dy1/dx1 * dx1/dw1

Let's calculate each of these values:

As we saw in the previous articles, this is the derivative of SSR with respect to the predicted value (written here for a single observation; with several observations, each gradient is summed over all of them):

dSSR/d(predicted) = -2 * (Observed - Predicted)

Since Predicted = y1 * w3 + y2 * w4 + b3, and all other terms are treated as constants with respect to y1:

d(predicted)/dy1 = w3

Since y1 = log(1 + e^x1), the derivative of the softplus function is the logistic sigmoid function:

dy1/dx1 = e^x1 / (1 + e^x1)

Since x1 = input * w1 + b1, differentiating with respect to w1 gives:

dx1/dw1 = input
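The least obvious of these four factors is dy1/dx1. As a quick sanity check, the sketch below compares the formula e^x1 / (1 + e^x1) against a central finite-difference approximation of the softplus derivative. The test points and the helper name `numerical_derivative` are assumptions made for this example.

```python
import math

def softplus(x):
    """Softplus activation: log(1 + e^x)."""
    return math.log(1.0 + math.exp(x))

def sigmoid(x):
    """Logistic sigmoid, written the same way as in the derivation."""
    return math.exp(x) / (1.0 + math.exp(x))

def numerical_derivative(f, x, h=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Hypothetical test points, for illustration only.
for x1 in (-2.0, 0.0, 1.5):
    approx = numerical_derivative(softplus, x1)
    exact = sigmoid(x1)
    print(f"x1={x1:+.1f}  finite difference={approx:.6f}  formula={exact:.6f}")
```

The two columns should agree to several decimal places, which is what we expect if the sigmoid really is the derivative of softplus.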

Multiplying these parts together, we get:

dSSR/dw1 = -2 * (Observed - Predicted) * w3 * (e^x1 / (1 + e^x1)) * input

Similarly, for the top neuron's bias b1:

dSSR/db1 = dSSR/d(predicted) * d(predicted)/dy1 * dy1/dx1 * dx1/db1

The only term that changes here is the last one:

dx1/db1 = 1 (since x1 = input * w1 + b1, derivative w.r.t b1 is 1)

So:

dSSR/db1 = -2 * (Observed - Predicted) * w3 * (e^x1 / (1 + e^x1)) * 1

Following the same logic, we can find the gradients for the bottom neuron's parameters:

dSSR/dw2 = dSSR/d(predicted) * d(predicted)/dy2 * dy2/dx2 * dx2/dw2

dSSR/dw2 = -2 * (Observed - Predicted) * w4 * (e^x2 / (1 + e^x2)) * input

dSSR/db2 = dSSR/d(predicted) * d(predicted)/dy2 * dy2/dx2 * dx2/db2

dSSR/db2 = -2 * (Observed - Predicted) * w4 * (e^x2 / (1 + e^x2)) * 1
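The four formulas above translate almost line for line into code. The following is a minimal sketch, not a definitive implementation: the parameter values and data are hypothetical, the function names are invented for this example, and the gradients are summed over all data points because SSR is a sum. A finite-difference check is included to confirm that the chain-rule results match a direct numerical estimate.

```python
import math

def softplus(x):
    return math.log(1.0 + math.exp(x))

def sigmoid(x):
    return math.exp(x) / (1.0 + math.exp(x))

def predict(inp, p):
    y1 = softplus(inp * p["w1"] + p["b1"])
    y2 = softplus(inp * p["w2"] + p["b2"])
    return y1 * p["w3"] + y2 * p["w4"] + p["b3"]

def ssr(inputs, observed, p):
    return sum((obs - predict(inp, p)) ** 2 for inp, obs in zip(inputs, observed))

def hidden_gradients(inputs, observed, p):
    """Chain-rule gradients of SSR for w1, b1, w2, b2, summed over the data."""
    grads = {"w1": 0.0, "b1": 0.0, "w2": 0.0, "b2": 0.0}
    for inp, obs in zip(inputs, observed):
        x1 = inp * p["w1"] + p["b1"]
        x2 = inp * p["w2"] + p["b2"]
        d_pred = -2.0 * (obs - predict(inp, p))   # dSSR/d(predicted)
        top = d_pred * p["w3"] * sigmoid(x1)      # ... * d(predicted)/dy1 * dy1/dx1
        bottom = d_pred * p["w4"] * sigmoid(x2)   # ... * d(predicted)/dy2 * dy2/dx2
        grads["w1"] += top * inp                  # dx1/dw1 = input
        grads["b1"] += top * 1.0                  # dx1/db1 = 1
        grads["w2"] += bottom * inp               # dx2/dw2 = input
        grads["b2"] += bottom * 1.0               # dx2/db2 = 1
    return grads

def finite_difference(name, inputs, observed, p, h=1e-6):
    """Numerical estimate of dSSR/d(name), used only as a cross-check."""
    up, down = dict(p), dict(p)
    up[name] += h
    down[name] -= h
    return (ssr(inputs, observed, up) - ssr(inputs, observed, down)) / (2.0 * h)

# Hypothetical parameters and data, for illustration only.
params = {"w1": 1.0, "b1": 0.0, "w2": -1.0, "b2": 0.0,
          "w3": 1.0, "w4": 1.0, "b3": 0.0}
inputs = [0.0, 0.5, 1.0]
observed = [0.0, 1.0, 0.0]

grads = hidden_gradients(inputs, observed, params)
for name, value in grads.items():
    check = finite_difference(name, inputs, observed, params)
    print(f"dSSR/d{name}: chain rule={value:.6f}  finite difference={check:.6f}")
```

If a derivation contains a sign error or a missing factor, the two columns will disagree, so this kind of check is a cheap way to catch mistakes before training.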

Once we calculate all these derivatives (dSSR/dw1, dSSR/db1, dSSR/dw2, dSSR/db2), we can update the hidden layer weights and biases using gradient descent:

Step size w1 = dSSR/dw1 * Learning rate

New w1 = old w1 - Step size w1

Step size b1 = dSSR/db1 * Learning rate

New b1 = old b1 - Step size b1

Step size w2 = dSSR/dw2 * Learning rate

New w2 = old w2 - Step size w2

Step size b2 = dSSR/db2 * Learning rate

New b2 = old b2 - Step size b2

By repeating these updates, together with the output layer updates from the previous article, the model gradually reduces the error and, with a suitable learning rate, moves toward values of the weights and biases that minimize the SSR.
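The update rule can be sketched as a short loop. To keep the example focused on this article, it updates only the hidden layer parameters and leaves w3, w4, and b3 fixed, which is a simplification. The data, the starting values, the learning rate of 0.01, and the 100 iterations are all assumptions chosen for illustration.

```python
import math

def softplus(x):
    return math.log(1.0 + math.exp(x))

def sigmoid(x):
    return math.exp(x) / (1.0 + math.exp(x))

def predict(inp, p):
    y1 = softplus(inp * p["w1"] + p["b1"])
    y2 = softplus(inp * p["w2"] + p["b2"])
    return y1 * p["w3"] + y2 * p["w4"] + p["b3"]

def ssr(inputs, observed, p):
    return sum((obs - predict(inp, p)) ** 2 for inp, obs in zip(inputs, observed))

def hidden_gradients(inputs, observed, p):
    """Gradients of SSR for w1, b1, w2, b2, summed over the data."""
    grads = {"w1": 0.0, "b1": 0.0, "w2": 0.0, "b2": 0.0}
    for inp, obs in zip(inputs, observed):
        d_pred = -2.0 * (obs - predict(inp, p))
        top = d_pred * p["w3"] * sigmoid(inp * p["w1"] + p["b1"])
        bottom = d_pred * p["w4"] * sigmoid(inp * p["w2"] + p["b2"])
        grads["w1"] += top * inp
        grads["b1"] += top
        grads["w2"] += bottom * inp
        grads["b2"] += bottom
    return grads

# Hypothetical starting point, data, and hyperparameters.
params = {"w1": 1.0, "b1": 0.0, "w2": -1.0, "b2": 0.0,
          "w3": 1.0, "w4": 1.0, "b3": 0.0}
inputs = [0.0, 0.5, 1.0]
observed = [0.0, 1.0, 0.0]
learning_rate = 0.01

initial_ssr = ssr(inputs, observed, params)
for _ in range(100):
    grads = hidden_gradients(inputs, observed, params)
    for name, grad in grads.items():
        step_size = grad * learning_rate           # Step size = derivative * learning rate
        params[name] = params[name] - step_size    # New value = old value - step size
final_ssr = ssr(inputs, observed, params)

print("SSR before:", initial_ssr)
print("SSR after: ", final_ssr)
```

In a full training loop, w3, w4, and b3 would be updated in the same pass using the output layer gradients, and all gradients would be computed before any parameter is changed.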

We have successfully derived the formulas to calculate the gradients for w1, b1, w2, and b2. Combined with the output layer derivations, we now have the math for the entire neural network's backpropagation!

In the next article, we will see how to implement this in code.

Feedback and contributions to git-lrc are welcome! It's online, source-available, and ready for anyone to use.


Read original on dev.to → dev.to/ganesh-kumar/understanding-backpropagatio…
