WhatsApp has become the default operating system for daily communication in regions like India. For modern web platforms—particularly in EdTech, local logistics, or localized services—forcing users to log into a complex desktop portal often results in a steep drop-off in user engagement.
When building LoopLearnX (an automated homework evaluation and tutoring tool for CBSE students), we realized that students rarely log in to a web dashboard on a desktop to upload their homework. Instead, they do their homework on physical notebooks, snap a picture, and expect instant grading.
Integrating a custom, self-hosted WhatsApp interface directly into our Next.js application was not just a convenience—it was the single most critical driver of student engagement.
This guide details the technical blueprint of how we built a resilient, memory-aware WhatsApp AI bot using @whiskeysockets/baileys and Next.js, hosted on an Oracle Cloud VPS. We will cover the production failures we encountered, the lessons learned, and why custom self-hosting beats off-the-shelf agent frameworks.
For many demographics, WhatsApp represents friction-free engagement. Users don't need to remember passwords, manage active sessions, or learn a new user interface. By bringing our platform inside a messaging channel, we instantly enabled frictionless student homework submissions.
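To connect an application to WhatsApp, you have two primary routes: the official WhatsApp Business API, or an unofficial open-source library that speaks the WhatsApp Web protocol (@whiskeysockets/baileys). We took the second route.
To keep operations lightweight, we split the application into a two-tier architecture:
The first tier is a Node.js gateway on the VPS that uses Baileys to maintain WebSocket connections with WhatsApp servers 24/7. It listens to incoming messages, handles media download streams, and converts payloads into clean base64 data to pass forward.
The second tier is a Next.js serverless route on Vercel that authenticates each request, loads the student's profile and history from Supabase, calls the Gemini API to classify and evaluate the message, and writes the new submission record.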
[Student WhatsApp]
│
▼ (WebSocket 24/7 connection)
[Node.js VPS Gateway (Baileys + PM2)]
│
▼ (HTTP POST with x-bot-secret)
[Next.js Serverless Route (Vercel)]
├── 1. Authenticate Request
├── 2. Query Student Profile & History (Supabase)
├── 3. Classify & Evaluate Intent (Gemini API)
└── 4. Write new Submission Record (Supabase)
│
▼ (JSON Reply)
[Node.js VPS Gateway (Safe Queued Output)] ──► Sent back to Student WhatsApp
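Step 1 in the diagram (authenticating the request) could look roughly like the sketch below. It is an illustration, not the code we run: the helper name `isAuthorizedBotRequest` is hypothetical, and hashing both values before comparing is one possible way to satisfy the equal-length requirement of `crypto.timingSafeEqual`.

```javascript
const crypto = require("node:crypto");

// Hypothetical helper: returns true only when the x-bot-secret header
// matches the secret configured on the server.
function isAuthorizedBotRequest(headerValue, expectedSecret) {
  if (typeof headerValue !== "string" || !expectedSecret) return false;
  // Hash both sides so the buffers always have equal length, which
  // crypto.timingSafeEqual requires.
  const received = crypto.createHash("sha256").update(headerValue).digest();
  const expected = crypto.createHash("sha256").update(expectedSecret).digest();
  return crypto.timingSafeEqual(received, expected);
}
```

In a Next.js route handler, the header value would typically come from `request.headers.get("x-bot-secret")`, and a failed check would return a 401 before any database or model call is made.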
The core responsibilities of index.js on the VPS are maintaining the WebSocket session, managing authentication state, rendering QR codes for linking, and mounting an Express endpoint to monitor status.
// index.js
require("dotenv").config();
const {
  default: makeWASocket,
  useMultiFileAuthState,
  DisconnectReason,
} = require("@whiskeysockets/baileys");
const { Boom } = require("@hapi/boom");
const pino = require("pino");
const express = require("express");
const qrcodeTerminal = require("qrcode-terminal");
const qrcode = require("qrcode");
const { handleIncomingMessage } = require("./bridge");

const app = express();
const PORT = process.env.PORT || 3000;

let sock = null;
let botStatus = "starting";
let currentQrImage = null;

async function connectToWhatsApp() {
  // 1. Initialize multi-file authentication state
  const { state, saveCreds } = await useMultiFileAuthState("auth_info_baileys");

  sock = makeWASocket({
    auth: state,
    printQRInTerminal: false, // We render custom QR inside terminal & web UI
    logger: pino({ level: "silent" }),
  });

  // 2. Listen for connection state updates
  sock.ev.on("connection.update", async (update) => {
    const { connection, lastDisconnect, qr } = update;

    if (qr) {
      botStatus = "qr_needed";
      // Render QR in terminal
      qrcodeTerminal.generate(qr, { small: true });
      // Generate Data URL QR for web UI status page
      currentQrImage = await qrcode.toDataURL(qr);
    }

    if (connection === "close") {
      const shouldReconnect =
        lastDisconnect?.error instanceof Boom
          ? lastDisconnect.error.output?.statusCode !==
            DisconnectReason.loggedOut
          : true;
      botStatus = shouldReconnect ? "disconnected" : "logged_out";
      console.log("Connection closed. Reconnecting...", shouldReconnect);
      if (shouldReconnect) {
        connectToWhatsApp();
      }
    } else if (connection === "open") {
      botStatus = "connected";
      console.log("✅ WhatsApp WebSocket Connected successfully!");
    }
  });

  // 3. Save updated credentials on session changes
  sock.ev.on("creds.update", saveCreds);

  // 4. Mount incoming message listener
  sock.ev.on("messages.upsert", async (m) => {
    if (m.type === "notify") {
      for (const msg of m.messages) {
        if (!msg.key.fromMe) {
          await handleIncomingMessage(sock, msg);
        }
      }
    }
  });
}

// Simple web UI endpoint for linking & status monitoring
app.get("/", (req, res) => {
  res.send(`
    <html>
      <body style="font-family: Arial, sans-serif; text-align: center; margin-top: 100px;">
        <h1>LoopLearnX Bot Status</h1>
        <p>Current Status: <strong>${botStatus}</strong></p>
        ${botStatus === "qr_needed" && currentQrImage ? `<img src="${currentQrImage}" alt="Scan QR Code" />` : ""}
      </body>
    </html>
  `);
});

app.listen(PORT, () => {
  console.log(`Express status server running on port ${PORT}`);
  connectToWhatsApp();
});
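The "close" branch above calls connectToWhatsApp() immediately. One possible refinement, shown here only as a sketch, is to wait a little longer after each failed attempt so a flapping connection does not hammer WhatsApp's servers. The helper name `reconnectDelayMs`, the 1-second base, and the 30-second cap are all assumptions chosen for illustration.

```javascript
// Hypothetical helper: delay before reconnect attempt N (0-based),
// doubling each time and capped at maxMs.
function reconnectDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  const exponential = baseMs * 2 ** attempt;
  return Math.min(exponential, maxMs);
}

// Possible wiring inside the "close" branch (the reconnectAttempts
// counter is an assumption and would be reset to 0 on "open"):
// setTimeout(connectToWhatsApp, reconnectDelayMs(reconnectAttempts++));
```

With the defaults, the delays grow as 1 s, 2 s, 4 s, 8 s and then level off at 30 s.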
The bridge.js file filters incoming payloads, captures typed text, and handles media streams.
One of the biggest issues in production was text messages arriving empty at Vercel. WhatsApp packs text into different fields depending on the client and message type (for example conversation versus extendedTextMessage.text), so we wrote a parser that checks each of them. Additionally, when receiving an image, the bot downloads the file buffer, converts it to base64, and calls our serverless endpoint:
// bridge.js
const axios = require("axios");
const { downloadMediaMessage } = require("@whiskeysockets/baileys");

const API_URL = process.env.LOOPLEARN_API_URL;
const BOT_SECRET = process.env.WHATSAPP_BOT_SECRET;

async function handleIncomingMessage(sock, msg) {
  const jid = msg.key.remoteJid;
  if (!jid || jid.endsWith("@g.us")) return; // Skip group chats

  const phone = jid.replace("@s.whatsapp.net", "");
  const content = msg.message;
  const imageMsg = content?.imageMessage;
  const isText = !!(
    content?.conversation || content?.extendedTextMessage?.text
  );

  // 1. Text Message Processing Route
  if (isText) {
    const textBody =
      content?.conversation || content?.extendedTextMessage?.text || "";
    if (!textBody.trim()) return;

    await callApi("/api/whatsapp/receive", {
      phone,
      messageType: "text",
      textBody: textBody.trim(),
    })
      .then((data) => {
        if (data?.replyText) queueMessage(sock, jid, data.replyText);
      })
      .catch(() => {
        queueMessage(sock, jid, "⚠️ System check failed. Please try again.");
      });
    return;
  }

  // 2. Multimodal Photo Homework Route
  if (imageMsg) {
    // Hinglish acknowledgement: "Photo received! Evaluating... please wait."
    queueMessage(
      sock,
      jid,
      "📸 Photo mila! Evaluate ho raha hai... thodi der ruko. ⏳",
    );

    let imageBuffer;
    try {
      // Download and decrypt the media buffer from WhatsApp servers
      imageBuffer = await downloadMediaMessage(msg, "buffer", {});
    } catch (e) {
      console.error("Image download error:", e.message);
      queueMessage(sock, jid, "❌ Photo download fail. Please try again.");
      return;
    }

    const imageBase64 = imageBuffer.toString("base64");
    const mimeType = imageMsg.mimetype || "image/jpeg";

    await callApi("/api/whatsapp/receive", {
      phone,
      imageBase64,
      mimeType,
      messageType: "image",
    })
      .then((data) => {
        // Hinglish fallback: "Evaluation failed. Try again."
        const reply =
          data?.replyText ?? "⚠️ Evaluation failed. Dobara try karo.";
        queueMessage(sock, jid, reply);
      })
      .catch((e) => {
        console.error("API error:", e.message);
        queueMessage(
          sock,
          jid,
          "⚠️ Server connection timeout. Please try again.",
        );
      });
    return;
  }
}

// POST a JSON payload to the Next.js API, authenticated by a shared secret
async function callApi(path, body) {
  const res = await axios.post(`${API_URL}${path}`, body, {
    headers: {
      "Content-Type": "application/json",
      "x-bot-secret": BOT_SECRET,
    },
    timeout: 90000, // 90-second timeout: Gemini Vision can be slow
  });
  return res.data;
}

// index.js imports this handler via require("./bridge")
module.exports = { handleIncomingMessage };
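The handler above reads text from `conversation` and `extendedTextMessage.text`. Messages sent in disappearing-message chats or as view-once media are wrapped in an outer envelope, so a more defensive extractor might unwrap those first. The sketch below is an assumption-laden illustration: the function name `extractText` is hypothetical, and the wrapper list covers only a few envelope types from the WhatsApp message schema, not necessarily every one a client can send.

```javascript
// Wrapper message types that carry the real payload in a nested `.message`.
const WRAPPER_KEYS = [
  "ephemeralMessage",
  "viewOnceMessage",
  "viewOnceMessageV2",
  "documentWithCaptionMessage",
];

// Hypothetical helper: returns the trimmed text of a message, or "" if
// none is found. `depth` guards against unexpectedly deep nesting.
function extractText(message, depth = 0) {
  if (!message || depth > 5) return "";
  const direct = message.conversation || message.extendedTextMessage?.text;
  if (direct) return direct.trim();
  for (const key of WRAPPER_KEYS) {
    const inner = message[key]?.message;
    if (inner) {
      const text = extractText(inner, depth + 1);
      if (text) return text;
    }
  }
  return "";
}
```

For example, `extractText(msg.message)` returns the same string for a plain chat message and for the same text sent inside an `ephemeralMessage` envelope.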
If your bot fires several messages at the same recipient at once, or pushes bulk updates simultaneously, WhatsApp may flag the traffic as automated and ban the session. We reduced this risk with an asynchronous, rate-limited in-memory queue:
const sendQueue = [];
let sending = false;

// Promise-based pause used between outgoing messages
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function queueMessage(sock, jid, text) {
  sendQueue.push({ jid, text });
  processSendQueue(sock);
}

async function processSendQueue(sock) {
  // Only one drain loop runs at a time
  if (sending || !sendQueue.length) return;
  sending = true;
  while (sendQueue.length) {
    const { jid, text } = sendQueue.shift();
    try {
      await sock.sendMessage(jid, { text });
    } catch (e) {
      console.error("WebSocket send error:", e.message);
    }
    // Artificial delay (1.5 to 3 seconds) mimicking natural human interaction patterns
    await sleep(1500 + Math.random() * 1500);
  }
  sending = false;
}
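Because the queue talks directly to the live socket, it is awkward to test. One option, sketched here under our own assumptions, is to wrap the same drain loop in a factory that receives `send` and `delay` as parameters. The name `createSendQueue` and its option names are hypothetical and are not part of Baileys.

```javascript
// Hypothetical factory: `send` and `delay` are injected so tests can
// replace the WhatsApp socket and the timer.
function createSendQueue({ send, delay }) {
  const pending = [];
  let draining = false;

  async function drain() {
    if (draining) return; // a drain loop is already running
    draining = true;
    while (pending.length) {
      const { jid, text } = pending.shift();
      try {
        await send(jid, text);
      } catch (e) {
        console.error("send error:", e.message);
      }
      await delay(); // pause between messages
    }
    draining = false;
  }

  return {
    enqueue(jid, text) {
      pending.push({ jid, text });
      return drain();
    },
  };
}
```

In production, `send` would wrap `sock.sendMessage(jid, { text })` and `delay` would be a promise-based timer with the same 1.5 to 3 second jitter. In a test, both can be instant stubs, which makes it easy to check that messages leave in the order they were queued.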
To run the Node.js Baileys gateway reliably on a production VPS, run it under the PM2 process manager so that it restarts automatically after a crash.
Connect to your Ubuntu server, then install Node.js 20 and PM2:
sudo apt update && sudo apt upgrade -y
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs
sudo npm install -g pm2
Create a custom configuration file (ecosystem.config.js). Warning: run exactly one instance, because multiple processes sharing the same Baileys auth state conflict with each other:
// ecosystem.config.js
module.exports = {
  apps: [
    {
      name: "looplearnX-bot",
      script: "index.js",
      instances: 1, // DO NOT USE MAX (Cluster mode breaks Baileys)
      autorestart: true,
      watch: false,
      max_memory_restart: "500M",
      restart_delay: 5000, // Wait 5s before rebooting on crash
      env: {
        NODE_ENV: "production",
      },
    },
  ],
};
Start the bot and make it persistent across server reboots:
pm2 start ecosystem.config.js
pm2 save
pm2 startup
To monitor logs and check performance status:
pm2 logs looplearnX-bot
pm2 status
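PM2 reports whether the process is alive, not whether the WhatsApp session is actually linked. A small machine-readable health endpoint can fill that gap for an external uptime monitor. The sketch below is hypothetical: the `healthResponse` helper and the `/health` path are assumptions, while the status strings are the ones index.js assigns to `botStatus`.

```javascript
// Hypothetical mapping from the gateway's botStatus string to an HTTP
// health response. Only "connected" counts as healthy.
function healthResponse(botStatus) {
  const healthy = botStatus === "connected";
  return {
    httpStatus: healthy ? 200 : 503,
    body: { status: botStatus, healthy },
  };
}

// Possible Express wiring (the /health path is an assumption):
// app.get("/health", (req, res) => {
//   const { httpStatus, body } = healthResponse(botStatus);
//   res.status(httpStatus).json(body);
// });
```

A monitor polling this route would see a 503 whenever the bot is waiting for a QR scan, disconnected, or logged out.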
When setting up a WhatsApp integration, many teams consider wrapper services like Hermes, Coze, or standard flow builders like Landbot. Here is a technical breakdown of why we rejected off-the-shelf agents in favor of a custom Baileys/Next.js stack:
| Evaluation Metric | Off-The-Shelf Agents (e.g. Hermes, Landbot) | Custom Self-Hosted Stack (Baileys + Next.js) |
|---|---|---|
| API & Database Integration | Restricted to webhooks and limited UI components. | Direct access to server-side Postgres (Supabase client), executing transactions natively. |
| Memory Architecture | Generic system chat history (context window size limitations). | Custom Memory Context Routing. We query previous attempts for that exact homework plan ID and feed that specific context straight to Gemini. |
| Hinglish & Direct Tone Tuning | Very hard to enforce strict localized prompt guidelines consistently. | Full controller prompts. The model speaks in second-person direct Hinglish ("Aapne" instead of "Student ne"). |
| Pricing Scaling | Per-message/per-run markup pricing (can grow to thousands of dollars). | $0 SaaS Fees. You only pay for a $3 VPS (Oracle/Hetzner) and raw token consumption on Gemini API. |
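To make the memory-routing row concrete, here is a rough sketch of how previous attempts for one homework plan might be folded into the prompt. Everything in it is illustrative: the function name `buildMemoryContext`, the attempt fields (`attemptNumber`, `score`, `feedback`), and the three-attempt limit are assumptions, not our actual schema.

```javascript
// Hypothetical shape: each attempt is { attemptNumber, score, feedback }.
// Returns a text block to prepend to the evaluation prompt.
function buildMemoryContext(previousAttempts, maxAttempts = 3) {
  if (!previousAttempts.length) {
    return "This is the student's first attempt at this homework.";
  }
  // Keep only the most recent attempts, oldest first.
  const recent = [...previousAttempts]
    .sort((a, b) => a.attemptNumber - b.attemptNumber)
    .slice(-maxAttempts);
  const lines = recent.map(
    (a) => `Attempt ${a.attemptNumber}: score ${a.score}, feedback: ${a.feedback}`,
  );
  return ["Previous attempts for this homework plan:", ...lines].join("\n");
}
```

The rows passed in would come from a database query filtered by student and homework plan ID, so the model sees only history relevant to the submission being graded.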
Integrating the Baileys WhatsApp Bot with Next.js on an Oracle Cloud VPS completely transformed the adoption curve of our LoopLearnX EdTech platform. Instead of fighting friction on desktops, students now have an active personal AI tutor in their pockets.
Self-hosting using Baileys gives you total database sovereignty, complete control over token pricing, and the ability to customize your conversational workflows with zero platform restrictions. The key to operational success is keeping the gateway to a single process, deploying rate-limited queues, and handling serverless timeout boundaries gracefully.
Naveen Gaur is a WordPress Performance Specialist & Full-Stack Consultant specializing in speed optimization, Core Web Vitals, and technical audits for high-performance websites.