I built a study-notes summarizer in the browser — no AI API, just word-frequency scoring
Finals season is brutal. You have 40 pages of lecture notes, 3 hours left, and a brain that stopped cooperating at page 12.
What if you could paste all those notes and instantly get the 5 sentences that actually matter?
That's what I built today — a fully browser-side study-notes summarizer. No OpenAI key. No backend. No data leaving your machine. Just pure vanilla JavaScript and a technique called extractive summarization.
Here's how it works, explained like you're a person and not a research paper.
Think about how you'd summarize something manually. You'd read through, notice which topics keep popping up, and highlight the sentences that mention those topics most.
That's literally what the algorithm does. It counts words.
The insight: sentences that use the most-discussed words are probably the most important sentences.
Before scoring anything, we need individual sentences to score.
function splitSentences(text) {
  return text
    .replace(/([.!?])\s+/g, "$1\n") // put a line break after each sentence-ending mark
    .split("\n")
    .map(s => s.trim())
    .filter(s => s.length > 20);    // drop headers and stray fragments
}
We split after ".", "!", or "?" when it is followed by whitespace, then throw away anything 20 characters or shorter (those are usually headers or artifacts, not real sentences).
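To see what the splitter keeps and what it drops, here is a small run. The sample text is made up for illustration, and the function is repeated only so the snippet runs on its own.

```javascript
// Same splitter as in the post, repeated so this snippet is self-contained.
function splitSentences(text) {
  return text
    .replace(/([.!?])\s+/g, "$1\n")
    .split("\n")
    .map(s => s.trim())
    .filter(s => s.length > 20);
}

// Hypothetical sample input, not taken from real notes.
const sample =
  "Week 3 notes. Overfitting happens when a model memorizes the training data! " +
  "Does regularization reduce overfitting in practice? Yes.";

const sentences = splitSentences(sample);
console.log(sentences);
// "Week 3 notes." and "Yes." are too short and get filtered out,
// so only the two full sentences remain.
```

One caveat with this approach: abbreviations such as "e.g." followed by a space also trigger a split, so notes that are heavy on abbreviations may come out in fragments.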
Now we count how often each word appears across the whole text. But here's the trick: we drop stopwords first.
Stopwords are the glue words — "the", "a", "is", "in", "and", "or". They appear everywhere, so if we counted them, every sentence would look equally important. We want signal, not noise.
const STOPWORDS = new Set([
  "the", "a", "an", "is", "it", "in", "on", "at", "to",
  "and", "or", "but", "of", "for", "with", "this", "that",
  "are", "was", "were", "be", "been", "has", "have", "had"
  // ... about 50 more
]);
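function wordFreq(sentences) {
  // Null-prototype object, so a word like "constructor" can't collide
  // with a built-in Object property and corrupt the count
  const freq = Object.create(null);
  for (const s of sentences) {
    for (const w of tokenize(s)) {
      // Skip stopwords and very short tokens
      if (!STOPWORDS.has(w) && w.length > 2) {
        freq[w] = (freq[w] || 0) + 1;
      }
    }
  }
  return freq;
}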
After this step, you have a dictionary like:
"learning" → 8
"data" → 6
"model" → 5
"overfitting" → 4
Those are your topics. Those are what the text is about.
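The counting code calls a tokenize helper that isn't shown in this post. Here is a minimal sketch of what it could look like, assuming it lowercases the text and keeps runs of ASCII letters, digits, and apostrophes. The regex, the shortened stopword list, and the two sample sentences are all assumptions for illustration.

```javascript
// Hypothetical helper: one possible tokenize(), not necessarily the one in the live demo.
// Assumption: lowercase everything, keep runs of letters, digits, and apostrophes.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Shortened stopword list, just enough for this example.
const STOPWORDS = new Set(["the", "a", "is", "and", "to"]);

function wordFreq(sentences) {
  const freq = Object.create(null); // no inherited keys such as "constructor"
  for (const s of sentences) {
    for (const w of tokenize(s)) {
      if (!STOPWORDS.has(w) && w.length > 2) {
        freq[w] = (freq[w] || 0) + 1;
      }
    }
  }
  return freq;
}

const freq = wordFreq([
  "Overfitting is a failure to generalize.",
  "The model memorizes noise and overfitting follows.",
]);
console.log(freq.overfitting, freq.model); // 2 1
```

Because this sketch only matches ASCII characters, notes in other scripts would need a Unicode-aware pattern. There is also no stemming, so "model" and "models" are counted as different words.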
Now score every sentence by adding up the frequencies of its non-stopword words, then divide by the number of those words so that long, rambling sentences don't win just by being long.
function scoreSentence(sentence, freq) {
  // Only the words that were counted in wordFreq contribute to the score
  const words = tokenize(sentence)
    .filter(w => !STOPWORDS.has(w) && w.length > 2);
  if (!words.length) return 0;
  const sum = words.reduce((acc, w) => acc + (freq[w] || 0), 0);
  return sum / words.length; // average frequency per scored word
}
A sentence like "Machine learning models trained on large datasets achieve better generalization" will score high because learning, models, datasets, and generalization all appear frequently in your notes.
A sentence like "This is also true in many ways" will score near zero. It's filler.
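To make the arithmetic concrete, here is a worked run of the two example sentences. The frequency table is invented for illustration (it is not output from real notes), and tokenize is the same assumed helper as before: lowercase, then split into word-like runs.

```javascript
// Assumed tokenizer; the post does not show its own version.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Shortened stopword list for this example.
const STOPWORDS = new Set(["this", "is", "in", "on", "the", "a", "and", "of"]);

function scoreSentence(sentence, freq) {
  const words = tokenize(sentence)
    .filter(w => !STOPWORDS.has(w) && w.length > 2);
  if (!words.length) return 0;
  const sum = words.reduce((acc, w) => acc + (freq[w] || 0), 0);
  return sum / words.length;
}

// Hypothetical frequency table, made up for this walkthrough.
const freq = { learning: 8, models: 5, datasets: 4, generalization: 3 };

const high = scoreSentence(
  "Machine learning models trained on large datasets achieve better generalization",
  freq
);
const low = scoreSentence("This is also true in many ways", freq);

// 9 scored words, frequencies sum to 8 + 5 + 4 + 3 = 20, so the score is 20 / 9.
console.log(high.toFixed(2)); // "2.22"
// "also", "true", "many", "ways" are scored, but none appear in the table.
console.log(low); // 0
```

The filler sentence lands at exactly zero here only because none of its remaining words are in the made-up table. On real notes it would usually score low rather than zero.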
Here's the part beginners miss. If you just return sentences in score order, the summary sounds like a jumbled mess because sentences reference things from earlier in the text.
Sort by score, take the top N, then re-sort by original position.
function pickTop(sentences, scores, n) {
  return scores
    .map((score, i) => ({ i, score }))
    .sort((a, b) => b.score - a.score) // rank by importance
    .slice(0, n)
    .map(x => x.i)
    .sort((a, b) => a - b)             // restore reading order!
    .map(i => sentences[i]);
}
Now the summary reads like a coherent passage rather than a shuffle of random facts.
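Putting the four steps together, the whole pipeline can be sketched as one function. The summarize wrapper name, the tokenize helper, the shortened stopword list, and the sample notes are my assumptions for this sketch. The other functions follow the snippets above.

```javascript
// Shortened stopword list (the real one is longer).
const STOPWORDS = new Set([
  "the", "a", "an", "is", "it", "in", "on", "at", "to",
  "and", "or", "but", "of", "for", "with", "this", "that",
  "are", "was", "were", "be", "been", "has", "have", "had",
]);

// Assumed tokenizer: lowercase, keep runs of letters, digits, apostrophes.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

function splitSentences(text) {
  return text
    .replace(/([.!?])\s+/g, "$1\n")
    .split("\n")
    .map(s => s.trim())
    .filter(s => s.length > 20);
}

function wordFreq(sentences) {
  const freq = Object.create(null);
  for (const s of sentences) {
    for (const w of tokenize(s)) {
      if (!STOPWORDS.has(w) && w.length > 2) freq[w] = (freq[w] || 0) + 1;
    }
  }
  return freq;
}

function scoreSentence(sentence, freq) {
  const words = tokenize(sentence).filter(w => !STOPWORDS.has(w) && w.length > 2);
  if (!words.length) return 0;
  return words.reduce((acc, w) => acc + (freq[w] || 0), 0) / words.length;
}

function pickTop(sentences, scores, n) {
  return scores
    .map((score, i) => ({ i, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, n)
    .map(x => x.i)
    .sort((a, b) => a - b)
    .map(i => sentences[i]);
}

// Hypothetical wrapper that chains the four steps.
function summarize(text, n) {
  const sentences = splitSentences(text);
  const freq = wordFreq(sentences);
  const scores = sentences.map(s => scoreSentence(s, freq));
  return pickTop(sentences, scores, n);
}

// Made-up notes: three on-topic sentences and one aside.
const notes = [
  "Gradient descent updates the weights of a neural network.",
  "The lecture started ten minutes late because of a fire drill.",
  "Each gradient step moves the weights against the gradient of the loss.",
  "Small learning rates make gradient descent slow but stable.",
].join(" ");

const summary = summarize(notes, 2);
console.log(summary);
```

With these notes, the third sentence scores highest (15 / 8) and the first comes second (11 / 6), but the summary returns them in reading order: first, then third. The fire-drill aside scores lowest and is dropped.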
Extractive summarization (what we just built) picks real sentences verbatim from the original text. It cannot make things up. It cannot hallucinate. It works offline. It respects your privacy completely.
Abstractive summarization (what ChatGPT does) generates brand new text. It can be more fluent and can combine ideas from multiple sentences. But it can also make stuff up, it needs an API call, and your notes leave your device.
For studying, extractive wins. You want the actual sentences from your notes — not a paraphrase that might subtly change the meaning of something.
You paste 20 paragraphs of machine learning notes. You set the slider to 5 sentences. You click Summarize.
You get back the 5 highest-scoring sentences, in the order they appear in your notes.
Everything happens in your browser. The notes never leave your device.
Word-frequency scoring is decades-old technology: frequency-based automatic summarization goes back to the late 1950s. It's the foundation of old-school search engines and still powers the "related articles" features of many news sites.
For study notes specifically, it works really well because:
The TF (term frequency) scoring you see here is also the first half of TF-IDF, which powers document search at scale. Same concept, extended to compare documents against a corpus.
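For the curious, here is a rough sketch of the IDF half. It is not part of the summarizer. The inverseDocFreq name and the tiny corpus are made up, and it uses the plain log(N / df) form, which is only one of several common variants.

```javascript
// Hypothetical helper: inverse document frequency over a tiny corpus.
// Each document is assumed to be already tokenized into a list of words.
function inverseDocFreq(docs) {
  const df = new Map();
  for (const doc of docs) {
    for (const term of new Set(doc)) { // count each term once per document
      df.set(term, (df.get(term) || 0) + 1);
    }
  }
  const idf = new Map();
  for (const [term, count] of df) {
    // Terms found in fewer documents get a larger weight.
    idf.set(term, Math.log(docs.length / count));
  }
  return idf;
}

// Made-up corpus of three tokenized documents.
const corpus = [
  ["model", "overfitting", "data"],
  ["model", "gradient", "data"],
  ["model", "entropy"],
];

const idf = inverseDocFreq(corpus);
console.log(idf.get("model"));       // 0, because it appears in every document
console.log(idf.get("overfitting")); // log(3), because it appears in only one
```

Multiplying a word's in-document frequency by this weight pushes down words that show up everywhere in the corpus, which is the same job the stopword list does by hand in the summarizer.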
The full thing is live at dev48v.infy.uk/solvefromzero.php — Day 6.
Load the sample notes (20 sentences of ML content), try different summary lengths, and flip open the debug panel to see every sentence's score. It's a pretty good intuition-builder for how extractive summarization actually works under the hood.
The source is about 120 lines of vanilla JS — no libraries, no build step, no dependencies.
This is Day 6 of SolveFromZero — I'm building 50 real hackathon problems from scratch, one per day. Each day has three tabs: see it working, understand the algorithm, and build it step by step.
Yesterday was a resume bullet improver (regex-based Verb·What·Impact rewriting). Tomorrow: sign-language to text via webcam and MediaPipe.
If you're studying for exams and want to try this on your actual notes — drop a comment with how well it worked. I'm curious what subject it struggles with most.