{"slug": "understanding-backpropagation-calculating-gradients-for-hidden-layer-weights-and", "title": "Understanding Backpropagation: Calculating Gradients for Hidden Layer Weights and Biases", "summary": "Ganesh, building the AI code reviewer *git-lrc*, explains how to calculate gradients for hidden layer weights and biases in backpropagation. The derivation uses the chain rule to propagate error from the output layer to hidden parameters w1, b1, w2, and b2. The formulas are then applied in gradient descent to update the neural network's parameters.", "body_md": "Hello, I'm Ganesh. I'm building *git-lrc*, an AI code reviewer that runs on every commit. It is free, unlimited, and source-available on GitHub. [Star git-lrc on GitHub](https://github.com/HexmosTech/git-lrc) to help more developers discover the project. Do give it a try and share your feedback for improving the product.\n\nIn the previous article, we derived formulas for updating the output layer weights w3, w4, and bias b3. Now, we will understand how to calculate the gradients for the hidden layer parameters: w1, b1, w2, and b2.\n\nTo find the gradients of the parameters in the hidden layer, we need to trace how changing these values affects the final prediction and the error (SSR).\n\nLet's recall the structure of our neural network:\n\nFor the top neuron:\n\nx1 = input * w1 + b1\n\ny1 = f(x1) = log(1 + e^x1) *(using the softplus function)*\n\nFor the bottom neuron:\n\nx2 = input * w2 + b2\n\ny2 = f(x2) = log(1 + e^x2) *(using the softplus function)*\n\nFinally, the prediction:\n\nPredicted = y1 * w3 + y2 * w4 + b3\n\nAnd the prediction error:\n\nSSR = Σ (observed − predicted)²\n\nSince w1, b1, w2, and b2 are not directly connected to the output prediction, we must use the chain rule to backpropagate the error from the output layer back to the hidden layer.\n\nLet's calculate the gradient for the top neuron's weight w1 first.\n\nA change in w1 affects x1, which affects the output y1, which affects the 
predicted value, which finally affects the SSR.\n\nSo, by the chain rule:\n\ndSSR/dw1 = dSSR/d(predicted) * d(predicted)/dy1 * dy1/dx1 * dx1/dw1\n\nLet's calculate each of these values:\n\nAs we saw in the previous articles, the derivative of SSR with respect to the predicted value is shown below for a single data point. With several data points, each gradient is summed over all of them:\n\ndSSR/d(predicted) = -2 * (Observed - Predicted)\n\nSince Predicted = y1 * w3 + y2 * w4 + b3, and all other terms are treated as constants with respect to y1:\n\nd(predicted)/dy1 = w3\n\nSince y1 = log(1 + e^x1), the derivative of the softplus function is the logistic sigmoid function:\n\ndy1/dx1 = e^x1 / (1 + e^x1)\n\nSince x1 = input * w1 + b1, differentiating with respect to w1 gives:\n\ndx1/dw1 = input\n\nMultiplying these parts together, we get:\n\ndSSR/dw1 = -2 * (Observed - Predicted) * w3 * (e^x1 / (1 + e^x1)) * input\n\nSimilarly, for the top neuron's bias b1:\n\ndSSR/db1 = dSSR/d(predicted) * d(predicted)/dy1 * dy1/dx1 * dx1/db1\n\nThe only term that changes here is the last one:\n\ndx1/db1 = 1 (since x1 = input * w1 + b1, the derivative with respect to b1 is 1)\n\nSo:\n\ndSSR/db1 = -2 * (Observed - Predicted) * w3 * (e^x1 / (1 + e^x1)) * 1\n\nFollowing the same logic, we can find the gradients for the bottom neuron's parameters:\n\ndSSR/dw2 = dSSR/d(predicted) * d(predicted)/dy2 * dy2/dx2 * dx2/dw2\n\ndSSR/dw2 = -2 * (Observed - Predicted) * w4 * (e^x2 / (1 + e^x2)) * input\n\ndSSR/db2 = dSSR/d(predicted) * d(predicted)/dy2 * dy2/dx2 * dx2/db2\n\ndSSR/db2 = -2 * (Observed - Predicted) * w4 * (e^x2 / (1 + e^x2)) * 1\n\nOnce we calculate all these derivatives (dSSR/dw1, dSSR/db1, dSSR/dw2, dSSR/db2), we can update the hidden layer weights and biases using gradient descent:\n\nStep size w1 = dSSR/dw1 * Learning rate\n\nNew w1 = old w1 - Step size w1\n\nStep size b1 = dSSR/db1 * Learning rate\n\nNew b1 = old b1 - Step size b1\n\nStep size w2 = dSSR/dw2 * Learning rate\n\nNew w2 = old w2 - Step size w2\n\nStep size b2 = dSSR/db2 * Learning rate\n\nNew b2 = old b2 - Step size b2\n\nBy 
doing this repeatedly, the model reduces the error step by step and moves toward values of the weights and biases that fit the data well.\n\nWe have now derived the formulas to calculate the gradients for w1, b1, w2, and b2. Combined with the output layer derivations, we have the math for the entire neural network's backpropagation!\n\nIn the next article, we will see how to implement this in code.\n\nFeedback and contributions are welcome! git-lrc is online, source-available, and ready for anyone to use.", "url": "https://wpnews.pro/news/understanding-backpropagation-calculating-gradients-for-hidden-layer-weights-and", "canonical_source": "https://dev.to/ganesh-kumar/understanding-backpropagation-calculating-gradients-for-hidden-layer-weights-and-biases-3k94", "published_at": "2026-06-30 18:51:03+00:00", "updated_at": "2026-06-30 19:19:00.776163+00:00", "lang": "en", "topics": ["machine-learning", "neural-networks", "developer-tools"], "entities": ["Ganesh", "git-lrc", "GitHub", "HexmosTech"], "alternates": {"html": "https://wpnews.pro/news/understanding-backpropagation-calculating-gradients-for-hidden-layer-weights-and", "markdown": "https://wpnews.pro/news/understanding-backpropagation-calculating-gradients-for-hidden-layer-weights-and.md", "text": "https://wpnews.pro/news/understanding-backpropagation-calculating-gradients-for-hidden-layer-weights-and.txt", "jsonld": "https://wpnews.pro/news/understanding-backpropagation-calculating-gradients-for-hidden-layer-weights-and.jsonld"}}