Building LSTMs with PyTorch and Lightning AI Part 1: First Steps with LSTMs

A developer implemented an LSTM from scratch using PyTorch and Lightning AI, detailing the initialization of weights and biases for the forget, input, cell candidate, and output gates. The tutorial uses the Adam optimizer and a normal distribution for weight initialization, with all parameters set to trainable.

In this article, we will explore how to implement an LSTM using PyTorch and Lightning. For more details about LSTMs, there is a separate series of articles available here: https://dev.to/rijultp/understanding-lstms-a-better-recurrent-neural-network-h6b

To begin, we first import the required modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
```

We also introduce a new optimizer:

```python
from torch.optim import Adam
```

Adam is used to fit the neural network to the data. It works similarly to SGD, but it adapts the step size for each parameter, and in practice it often converges faster.

Next, we continue with the remaining imports:

```python
import lightning as L
from torch.utils.data import TensorDataset, DataLoader
```

We define the neural network by creating a Lightning module. For now, this is only a skeleton that lists the methods we will fill in.

```python
class LSTMByHand(L.LightningModule):
    def __init__(self):
        # Create and initialize weight and bias tensors
        pass

    def lstm_unit(self, input_value, long_memory, short_memory):
        # LSTM computations
        pass

    def forward(self, input):
        # Forward pass through the unrolled LSTM
        pass

    def configure_optimizers(self):
        # Configure Adam optimizer
        pass

    def training_step(self, batch, batch_idx):
        # Compute loss and log training progress
        pass
```

Now let’s implement the `__init__` method. This is where we initialize all weights and biases.

```python
class LSTMByHand(L.LightningModule):
    def __init__(self):
        super().__init__()

        mean = torch.tensor(0.0)  # Mean of the normal distribution
        std = torch.tensor(1.0)   # Standard deviation

        # Forget Gate (l = "lr")
        self.wlr1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.wlr2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.blr1 = nn.Parameter(torch.tensor(0.0), requires_grad=True)

        # Input Gate (p = "pr")
        self.wpr1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.wpr2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.bpr1 = nn.Parameter(torch.tensor(0.0), requires_grad=True)

        # Cell Candidate (p)
        self.wp1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.wp2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.bp1 = nn.Parameter(torch.tensor(0.0), requires_grad=True)

        # Output Gate (o)
        self.wo1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.wo2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
        self.bo1 = nn.Parameter(torch.tensor(0.0), requires_grad=True)
```

Each gate gets two weights and one bias. The weights are drawn at random, while every bias starts at 0.

Unlike earlier examples, we initialize weights using a normal distribution. Before moving further, let’s understand what that means.

Imagine measuring the heights of a large group of people. Most heights fall close to the average, and very short or very tall people are rare. When plotted, this forms a symmetric bell-shaped curve. This is called a normal distribution.

We use a mean of 0 and a standard deviation of 1, so the initial weights are centered on 0 and most of them fall between -1 and 1.

Also, all parameters have `requires_grad=True` (which is also the default for `nn.Parameter`), meaning gradients are computed for them during backpropagation so that the optimizer can update them during training.

Next, we will explore the `lstm_unit` function and how the LSTM actually processes information step by step.
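To make the initialization described in this article more concrete, here is a minimal sketch of the two ideas involved: drawing values from a normal distribution with mean 0 and standard deviation 1, and wrapping a value in `nn.Parameter` so that Adam can update it. The names `samples`, `weight`, and `bias`, the seed, the learning rate of 0.1, and the toy squared-error loss are all assumptions made for illustration. They are not part of the `LSTMByHand` class.

```python
import torch
import torch.nn as nn
from torch.optim import Adam

torch.manual_seed(0)  # hypothetical seed, used only to make the run repeatable

# Draw many values from a normal distribution with mean 0 and std 1
# and summarize them to see the bell-curve shape in numbers.
samples = torch.normal(mean=0.0, std=1.0, size=(10_000,))
sample_mean = samples.mean().item()
sample_std = samples.std().item()
within_one_std = (samples.abs() <= 1.0).float().mean().item()

print(f"mean: {sample_mean:.3f}, std: {sample_std:.3f}")
print(f"fraction within one std of the mean: {within_one_std:.3f}")

# Same pattern as in the article: one random weight and one zero bias.
mean = torch.tensor(0.0)
std = torch.tensor(1.0)
weight = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)
bias = nn.Parameter(torch.tensor(0.0), requires_grad=True)

# A toy one-step fit (an assumption, not the LSTM's real loss):
# push weight * x + bias toward target.
optimizer = Adam([weight, bias], lr=0.1)
x = torch.tensor(2.0)
target = torch.tensor(1.0)

before = weight.item()
loss = (weight * x + bias - target) ** 2
optimizer.zero_grad()
loss.backward()   # gradients are computed because requires_grad=True
optimizer.step()  # Adam updates the trainable parameters
after = weight.item()

print(f"weight before: {before:.3f}, after one Adam step: {after:.3f}")
```

The sample mean should land close to 0 and the sample standard deviation close to 1, with roughly two thirds of the values within one standard deviation of the mean. The weight changes after a single optimizer step because gradients flow to any parameter with `requires_grad=True`.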