# Building LSTMs with PyTorch and Lightning AI Part 1: First Steps with LSTMs

> Source: <https://dev.to/rijultp/building-lstms-with-pytorch-and-lightning-ai-part-1-first-steps-with-lstms-3hem>
> Published: 2026-06-21 18:25:33+00:00

In this article, we will explore how to implement an **LSTM using PyTorch and Lightning**.

For more details about LSTMs, there is a [separate series of articles available here](https://dev.to/rijultp/understanding-lstms-a-better-recurrent-neural-network-h6b).

To begin, we first import the required modules.

``` python
import torch
import torch.nn as nn
import torch.nn.functional as F
```

We also introduce a new optimizer:

``` python
from torch.optim import Adam
```

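Adam is the optimizer we will use to fit the neural network to the data.

Like SGD, it updates the parameters from their gradients. Unlike plain SGD, it also keeps running averages of each parameter's gradient and squared gradient, and uses them to scale the step size per parameter. In practice, this often lets Adam converge faster than SGD.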
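To make the comparison with SGD concrete, here is a minimal sketch that minimizes a toy loss with both optimizers. The `fit` helper, the loss `(w - 3)^2`, the learning rate, and the step count are all assumptions chosen for illustration; they are not part of the LSTM we build in this series.

```python
import torch
from torch.optim import SGD, Adam


def fit(optimizer_cls, lr=0.1, steps=200):
    """Minimize the toy loss (w - 3)^2 starting from w = 0; return the final w."""
    w = torch.nn.Parameter(torch.tensor(0.0))
    optimizer = optimizer_cls([w], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()      # clear the gradient from the previous step
        loss = (w - 3.0) ** 2      # toy loss with its minimum at w = 3
        loss.backward()            # compute d(loss)/dw
        optimizer.step()           # let the optimizer update w
    return w.item()


w_sgd = fit(SGD)
w_adam = fit(Adam)
print(f"SGD:  w = {w_sgd:.4f}")
print(f"Adam: w = {w_adam:.4f}")
```

Both optimizers share the same interface, so switching between them is a one-line change. On a problem this simple both should end up near `w = 3`; the speed advantage of Adam tends to show up on harder problems with many parameters, not on a toy like this one.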

Next, we continue with the remaining imports:

``` python
import lightning as L
from torch.utils.data import TensorDataset, DataLoader
```

We define the neural network by creating a Lightning module.

``` python
class LSTMByHand(L.LightningModule):
    def __init__(self):
        # Create and initialize weight and bias tensors
        pass  # placeholder until we fill in the method

    def lstm_unit(self, input_value, long_memory, short_memory):
        # LSTM computations
        pass

    def forward(self, input):
        # Forward pass through the unrolled LSTM
        pass

    def configure_optimizers(self):
        # Configure Adam optimizer
        pass

    def training_step(self, batch, batch_idx):
        # Compute loss and log training progress
        pass
```

Now let’s implement the `__init__` method.

This is where we initialize all weights and biases.

``` python
class LSTMByHand(L.LightningModule):
    def __init__(self):
        super().__init__()

        mean = torch.tensor(0.0)  # Mean of the normal distribution
        std = torch.tensor(1.0)   # Standard deviation

        # -------------------------
        # Forget Gate (parameters suffixed "lr")
        # -------------------------
        self.wlr1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.wlr2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.blr1 = nn.Parameter(torch.tensor(0.0), requires_grad=True)

        # -------------------------
        # Input Gate (parameters suffixed "pr")
        # -------------------------
        self.wpr1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.wpr2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.bpr1 = nn.Parameter(torch.tensor(0.0), requires_grad=True)

        # -------------------------
        # Cell Candidate (parameters suffixed "p")
        # -------------------------
        self.wp1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.wp2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.bp1 = nn.Parameter(torch.tensor(0.0), requires_grad=True)

        # -------------------------
        # Output Gate (o)
        # -------------------------
        self.wo1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.wo2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.bo1 = nn.Parameter(torch.tensor(0.0), requires_grad=True)
```

Unlike earlier examples, we initialize weights using a **normal distribution**.

Before moving further, let’s understand what that means.

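Imagine measuring the heights of a large group of people: most are close to the average, and very short or very tall people become increasingly rare.

When plotted, the measurements form a **symmetric bell-shaped curve**.

This is called a **normal distribution**.

We use a normal distribution with a mean of `0` and a standard deviation of `1`, so each initial weight is a random value that is most likely to fall close to zero.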
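As a quick sanity check on what `torch.normal` gives us, the sketch below draws one value the same way `__init__` does, then draws many values to confirm they follow the bell curve. The seed and the sample size of 100,000 are arbitrary choices for this illustration.

```python
import torch

torch.manual_seed(0)  # fixed seed so the draws are repeatable

mean = torch.tensor(0.0)
std = torch.tensor(1.0)

# A single draw, as in __init__: the result is a 0-dimensional (scalar) tensor
single = torch.normal(mean=mean, std=std)
print(single.shape)

# Many draws from the same distribution
samples = torch.normal(mean=0.0, std=1.0, size=(100_000,))
sample_mean = samples.mean().item()
sample_std = samples.std().item()

# Fraction of draws that land within one standard deviation of the mean
within_one_std = (samples.abs() < 1.0).float().mean().item()

print(f"mean ~ {sample_mean:.3f}, std ~ {sample_std:.3f}")
print(f"within one std: {within_one_std:.1%}")
```

The sample mean should come out close to 0, the sample standard deviation close to 1, and roughly two thirds of the draws should lie between -1 and 1, which is the characteristic shape of the bell curve.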

Also, all parameters have `requires_grad=True`, meaning gradients are computed for them during backpropagation so that the optimizer can update them during training.
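The effect of `requires_grad` can be seen in a small sketch. `TinyModule` and its `frozen` parameter are hypothetical names used only for this illustration. Note that `nn.Parameter` already defaults to `requires_grad=True`, so passing it explicitly, as the code above does, mainly serves as documentation.

```python
import torch
import torch.nn as nn


class TinyModule(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.tensor(2.0))  # requires_grad=True by default
        self.b = nn.Parameter(torch.tensor(0.0))
        # A parameter that is registered but excluded from training
        self.frozen = nn.Parameter(torch.tensor(1.0), requires_grad=False)

    def forward(self, x):
        return self.w * x + self.b + self.frozen


model = TinyModule()
names = [name for name, _ in model.named_parameters()]
print(names)

out = model(torch.tensor(3.0))
out.backward()  # backpropagation fills in .grad for trainable parameters

print(model.w.grad)       # d(out)/dw = x = 3
print(model.b.grad)       # d(out)/db = 1
print(model.frozen.grad)  # None: no gradient is tracked for this parameter
```

Because no gradient is ever computed for `frozen`, an optimizer would leave it unchanged, while `w` and `b` would be updated at every step.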

Next, we will explore the `lstm_unit` function and how the LSTM actually processes information step by step.

