{"slug": "i-built-a-study-notes-summarizer-in-the-browser-no-ai-api-just-word-frequency", "title": "I Built a Study-Notes Summarizer in the Browser — No AI API, Just Word-Frequency Scoring", "summary": "A developer built a browser-based study-notes summarizer using extractive summarization with word-frequency scoring, no AI API or backend required. The tool splits text into sentences, counts word frequencies after removing stopwords, scores sentences by average frequency of high-frequency words, and returns the top sentences in original order. It runs entirely client-side, ensuring data privacy.", "body_md": "I built a study-notes summarizer in the browser — no AI API, just word-frequency scoring\n\nFinals season is brutal. You have 40 pages of lecture notes, 3 hours left, and a brain that stopped cooperating at page 12.\n\nWhat if you could paste all those notes and instantly get the 5 sentences that actually matter?\n\nThat's what I built today — a fully browser-side study-notes summarizer. No OpenAI key. No backend. No data leaving your machine. Just pure vanilla JavaScript and a technique called **extractive summarization**.\n\nHere's how it works, explained like you're a person and not a research paper.\n\nThink about how you'd summarize something manually. You'd read through, notice which topics keep popping up, and highlight the sentences that mention those topics most.\n\nThat's literally what the algorithm does. 
It counts words.\n\nThe insight: **sentences that use the most-discussed words are probably the most important sentences.**\n\nBefore scoring anything, we need individual sentences to score.\n\n```js\nfunction splitSentences(text) {\n  return text\n    .replace(/([.!?])\\s+/g, \"$1\\n\")  // break after sentence-ending punctuation\n    .split(\"\\n\")\n    .map(s => s.trim())\n    .filter(s => s.length > 20);       // drop short fragments\n}\n```\n\nWe split on `.`, `!`, and `?`, then throw away anything of 20 characters or fewer (those are usually headers or artifacts, not real sentences).\n\nNow we count how often each word appears across the whole text. But here's the trick: we **drop stopwords first**.\n\nStopwords are the glue words: \"the\", \"a\", \"is\", \"in\", \"and\", \"or\". They appear everywhere, so if we counted them, every sentence would look equally important. We want signal, not noise.\n\n```js\nconst STOPWORDS = new Set([\n  \"the\", \"a\", \"an\", \"is\", \"it\", \"in\", \"on\", \"at\", \"to\",\n  \"and\", \"or\", \"but\", \"of\", \"for\", \"with\", \"this\", \"that\",\n  \"are\", \"was\", \"were\", \"be\", \"been\", \"has\", \"have\", \"had\"\n  // ... about 50 more\n]);\n\n// Assumed helper: the post calls tokenize() without showing it. This sketch\n// lowercases the text and keeps runs of letters, digits, and apostrophes.\nfunction tokenize(text) {\n  return text.toLowerCase().match(/[a-z0-9']+/g) || [];\n}\n\nfunction wordFreq(sentences) {\n  // No prototype, so words like \"constructor\" don't collide with inherited keys.\n  const freq = Object.create(null);\n  for (const s of sentences) {\n    for (const w of tokenize(s)) {\n      if (!STOPWORDS.has(w) && w.length > 2) {\n        freq[w] = (freq[w] || 0) + 1;\n      }\n    }\n  }\n  return freq;\n}\n```\n\nAfter this step, you have a dictionary like:\n\n- `\"learning\"` → 8\n- `\"data\"` → 6\n- `\"model\"` → 5\n- `\"overfitting\"` → 4\n\nThose are your topics. Those are what the text is about.\n\nNow score every sentence based on how many high-frequency words it contains. 
Divide by the number of counted (non-stopword) words so that long, rambling sentences don't win on length alone.\n\n```js\nfunction scoreSentence(sentence, freq) {\n  // Keep only the words that wordFreq() would have counted.\n  const words = tokenize(sentence)\n    .filter(w => !STOPWORDS.has(w) && w.length > 2);\n  if (!words.length) return 0;\n  const sum = words.reduce((acc, w) => acc + (freq[w] || 0), 0);\n  return sum / words.length;  // average frequency per counted word\n}\n```\n\nA sentence like \"Machine learning models trained on large datasets achieve better generalization\" will score high because `learning`, `models`, `datasets`, and `generalization` all appear frequently in your notes.\n\nA sentence like \"This is also true in many ways\" will score near zero. It's filler.\n\nHere's the part beginners miss. If you just return sentences in score order, the summary sounds like a jumbled mess because sentences reference things from earlier in the text.\n\nSort by score, take the top N, then **re-sort by original position**.\n\n```js\nfunction pickTop(sentences, scores, n) {\n  return scores\n    .map((score, i) => ({ i, score }))\n    .sort((a, b) => b.score - a.score)  // rank by importance\n    .slice(0, n)\n    .map(x => x.i)\n    .sort((a, b) => a - b)              // restore reading order!\n    .map(i => sentences[i]);\n}\n```\n\nNow the summary reads like a coherent passage rather than a shuffle of random facts.\n\n**Extractive summarization** (what we just built) picks real sentences verbatim from the original text. It cannot make things up. It cannot hallucinate. It works offline. It respects your privacy completely.\n\n**Abstractive summarization** (what ChatGPT does) generates brand new text. It can be more fluent and can combine ideas from multiple sentences. But it can also make stuff up, it needs an API call, and your notes leave your device.\n\nFor studying, extractive wins. You want the actual sentences from your notes, not a paraphrase that might subtly change the meaning of something.\n\nYou paste 20 paragraphs of machine learning notes. 
You set the slider to 5 sentences. You click Summarize.\n\nYou get back:\n\nEverything happens in your browser. The notes never leave your device.\n\nWord-frequency scoring is old technology: Hans Peter Luhn described frequency-based sentence extraction back in 1958. It's the foundation of old-school search engines and still powers the \"related articles\" features of many news sites.\n\nFor study notes specifically, it works really well because:\n\nThe TF (term frequency) scoring you see here is also the first half of TF-IDF, which powers document search at scale. Same concept, extended to compare documents against a corpus.\n\nThe full thing is live at [dev48v.infy.uk/solvefromzero.php](https://dev48v.infy.uk/solvefromzero.php) (Day 6).\n\nLoad the sample notes (20 sentences of ML content), try different summary lengths, and flip open the debug panel to see every sentence's score. It's a pretty good intuition-builder for how extractive summarization actually works under the hood.\n\nThe source is about 120 lines of vanilla JS: no libraries, no build step, no dependencies.\n\nThis is Day 6 of **SolveFromZero**. I'm building 50 real hackathon problems from scratch, one per day. Each day has three tabs: see it working, understand the algorithm, and build it step by step.\n\nYesterday was a resume bullet improver (regex-based Verb·What·Impact rewriting). Tomorrow: sign-language to text via webcam and MediaPipe.\n\nIf you're studying for exams and want to try this on your actual notes, drop a comment with how well it worked. 
I'm curious what subject it struggles with most.", "url": "https://wpnews.pro/news/i-built-a-study-notes-summarizer-in-the-browser-no-ai-api-just-word-frequency", "canonical_source": "https://dev.to/dev48v/i-built-a-study-notes-summarizer-in-the-browser-no-ai-api-just-word-frequency-scoring-2bf9", "published_at": "2026-06-15 16:52:37+00:00", "updated_at": "2026-06-15 17:07:19.000311+00:00", "lang": "en", "topics": ["natural-language-processing", "developer-tools", "machine-learning"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/i-built-a-study-notes-summarizer-in-the-browser-no-ai-api-just-word-frequency", "markdown": "https://wpnews.pro/news/i-built-a-study-notes-summarizer-in-the-browser-no-ai-api-just-word-frequency.md", "text": "https://wpnews.pro/news/i-built-a-study-notes-summarizer-in-the-browser-no-ai-api-just-word-frequency.txt", "jsonld": "https://wpnews.pro/news/i-built-a-study-notes-summarizer-in-the-browser-no-ai-api-just-word-frequency.jsonld"}}