{"slug": "building-lstms-with-pytorch-and-lightning-ai-part-1-first-steps-with-lstms", "title": "Building LSTMs with PyTorch and Lightning AI Part 1: First Steps with LSTMs", "summary": "A developer implemented an LSTM from scratch using PyTorch and Lightning AI, detailing the initialization of weights and biases for the forget, input, cell candidate, and output gates. The tutorial uses the Adam optimizer and normal distribution for weight initialization, with all parameters set to trainable.", "body_md": "In this article, we will explore how to implement an **LSTM using PyTorch and Lightning**.\n\nFor more details about LSTMs, there is a [separate series of articles available here](https://dev.to/rijultp/understanding-lstms-a-better-recurrent-neural-network-h6b).\n\nTo begin, we first import the required modules.\n\n``` python\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n```\n\nWe also introduce a new optimizer:\n\n``` python\nfrom torch.optim import Adam\n```\n\nAdam is the optimizer we use to fit the neural network to the data.\n\nLike SGD, it updates the parameters using gradients, but it also adapts the step size for each parameter, and in practice it often converges faster.\n\nNext, we continue with the remaining imports:\n\n``` python\nimport lightning as L\nfrom torch.utils.data import TensorDataset, DataLoader\n```\n\nWe define the neural network by creating a Lightning module. The skeleton below only outlines the methods we will fill in; each body is a placeholder for now.\n\n``` python\nclass LSTMByHand(L.LightningModule):\n    def __init__(self):\n        # Create and initialize weight and bias tensors\n        pass\n\n    def lstm_unit(self, input_value, long_memory, short_memory):\n        # LSTM computations\n        pass\n\n    def forward(self, input):\n        # Forward pass through the unrolled LSTM\n        pass\n\n    def configure_optimizers(self):\n        # Configure Adam optimizer\n        pass\n\n    def training_step(self, batch, batch_idx):\n        # Compute loss and log training progress\n        pass\n```\n\nNow let’s implement the `__init__` method.\n\nThis is where we initialize all weights and biases.\n\n``` 
python\nclass LSTMByHand(L.LightningModule):\n    def __init__(self):\n        super().__init__()\n\n        mean = torch.tensor(0.0)  # Mean of the normal distribution\n        std = torch.tensor(1.0)   # Standard deviation\n\n        # -------------------------\n        # Forget Gate (\"lr\" prefix)\n        # -------------------------\n        self.wlr1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)\n        self.wlr2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)\n        self.blr1 = nn.Parameter(torch.tensor(0.0), requires_grad=True)\n\n        # -------------------------\n        # Input Gate (\"pr\" prefix)\n        # -------------------------\n        self.wpr1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)\n        self.wpr2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)\n        self.bpr1 = nn.Parameter(torch.tensor(0.0), requires_grad=True)\n\n        # -------------------------\n        # Cell Candidate (\"p\" prefix)\n        # -------------------------\n        self.wp1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)\n        self.wp2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)\n        self.bp1 = nn.Parameter(torch.tensor(0.0), requires_grad=True)\n\n        # -------------------------\n        # Output Gate (\"o\" prefix)\n        # -------------------------\n        self.wo1 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)\n        self.wo2 = nn.Parameter(torch.normal(mean=mean, std=std), requires_grad=True)\n        self.bo1 = nn.Parameter(torch.tensor(0.0), requires_grad=True)\n```\n\nUnlike earlier examples, we initialize the weights by sampling from a **normal distribution**, while the biases start at `0`.\n\nBefore moving further, let’s understand what that means.\n\nImagine measuring the heights of a large group of people: most fall close to the average, and only a few are much shorter or much taller.\n\nWhen plotted, these measurements form a **symmetric bell-shaped curve**.\n\nThis is called a **normal distribution**.\n\nWe use a mean of `0` and a standard deviation of `1`.\n\nAlso, 
all parameters have `requires_grad=True`, meaning gradients are computed for them during backpropagation so the optimizer can update them.\n\nNext, we will explore the `lstm_unit` function and how the LSTM actually processes information step by step.", "url": "https://wpnews.pro/news/building-lstms-with-pytorch-and-lightning-ai-part-1-first-steps-with-lstms", "canonical_source": "https://dev.to/rijultp/building-lstms-with-pytorch-and-lightning-ai-part-1-first-steps-with-lstms-3hem", "published_at": "2026-06-21 18:25:33+00:00", "updated_at": "2026-06-21 18:34:10.190360+00:00", "lang": "en", "topics": ["machine-learning", "neural-networks", "developer-tools"], "entities": ["PyTorch", "Lightning AI", "Adam"], "alternates": {"html": "https://wpnews.pro/news/building-lstms-with-pytorch-and-lightning-ai-part-1-first-steps-with-lstms", "markdown": "https://wpnews.pro/news/building-lstms-with-pytorch-and-lightning-ai-part-1-first-steps-with-lstms.md", "text": "https://wpnews.pro/news/building-lstms-with-pytorch-and-lightning-ai-part-1-first-steps-with-lstms.txt", "jsonld": "https://wpnews.pro/news/building-lstms-with-pytorch-and-lightning-ai-part-1-first-steps-with-lstms.jsonld"}}