{"slug": "hackerrank-open-sourced-its-ats-my-resume-scored-90-100-oh-wait-74-no-88", "title": "HackerRank open sourced its ATS. My resume scored 90/100. Oh wait 74. No – 88", "summary": "HackerRank open-sourced its applicant tracking system (ATS) scoring tool, but testing reveals extreme score variability—ranging from 66 to 99 on the same resume—due to LLM non-determinism. The tool's scoring is inconsistent across runs, with project and experience categories showing high variance or lack of differentiation, raising concerns about reliability in hiring decisions.", "body_md": "# HackerRank open sourced its ATS. My resume scored 90/100. Oh wait 74/100. No — 88/100. Actually 83/100.\n\n### How hiring is becoming a luck filter.\n\nThis open-source ATS by HackerRank has been blowing up recently: [https://github.com/interviewstreet/hiring-agent](https://github.com/interviewstreet/hiring-agent)\n\nIt’s popped up on LinkedIn and Reddit with hundreds, sometimes thousands, of likes.[1](#footnote-1) A coworker mentioned it to me in passing a few days ago.\n\nI decided to test it out.\n\nFirst working run: 90/100. Felt pretty good!\n\nI had some debug prints scattered around from troubleshooting the setup, so I cleaned those up and ran it again.\n\n74/100.\n\nSame resume. Same command. The only thing I changed was deleting print statements.\n\nI disabled `DEVELOPMENT_MODE` and put it in a loop to run a hundred times.\n\nThe scores range from 66 to 99.\n\nIf your company’s cutoff sits at 85, I fail 65% of the time. Same exact resume, different luck.\n\nHere’s a quick rundown of how the tool works:\n\nYour PDF gets parsed into text. An LLM is called six times to extract structured information — your basics, work history, education, skills, projects, awards. It pulls your GitHub profile, scans your top repos, and appends them as extra context. 
Then everything gets fed into the LLM at once to be graded.\n\nThe scoring is out of 100, with up to 20 bonus points on top:\n\n35 points for open source contributions\n\n30 for personal projects\n\n25 for work experience\n\n10 for technical skills\n\nUp to 20 bonus points for startup experience, a portfolio site, a technical blog, etc.\n\nThe default model is gemma3:4b, running at temperature 0.1 — low, supposedly nudging the model toward deterministic outputs.\n\nHere’s what I found when I looked at those individual categories.\n\nLook at technical skills: I scored 8/10 in 98 out of 100 runs. Nearly perfect consistency. How come? Because technical skills are a checklist. You either know React or you don’t. There’s nothing for an LLM to judge — a five-year-old could match that checklist.\n\nNow look at projects — there’s HUGE variation.\n\nLLMs struggle to make a judgment call like that consistently. Sometimes my projects “lack architectural complexity”, sometimes they “demonstrate real-world deployment”. Which one the LLM spits out is a roll of the dice.\n\nTemperature 0.1 is already low, but even going down to temperature 0 doesn’t fix this. Someone opened a GitHub issue back in October showing scores of 27, 34, 32, 34, 34, 30 across six consecutive runs at temperature 0.[2](#footnote-2) This non-determinism isn’t a bug you can just fine-tune away; it’s a fundamental design flaw.\n\nI was worried part of this might be the model. After all, gemma3:4b was a local model running on my machine.\n\nGemini resulted in a tighter distribution — scores clustered between 48 and 64. But if your cutoff is 60, you’re still failing 28% of the time through no fault of your own.\n\nThe Open Source scores have become consistent — that’s a legit improvement. 
But project scores are still all over the place.\n\nExperience has me the most concerned.\n\n25/25.\n\nEvery single run.\n\nI went back and pulled up an old resume — one internship on it.\n\nAlso 25/25.\n\nThe clue is in the prompt…\n\n```\n### Production (0-25 points)\n- Analyze the 'work' and 'volunteer' sections for real-world, internship, or production experience\n- **SPECIAL CONSIDERATION**: Give extra points for founder roles, co-founder positions, or early-stage engineer roles (first 10-20 employees) at startups\n```\n\nThe entire thing is two lines long.\n\nNo rubric. No examples. No anchors for what earns a 15 versus a 25.\n\nA junior engineer with one internship gets 25/25. A principal engineer with a decade of distributed systems gets 25/25. I get 25/25. Experience has two lines and no anchors — consistent, but useless. Projects has a detailed rubric with examples but it’s the noisiest category — inconsistent, also useless. There are some things that LLMs just can’t do well, no matter how you prompt.\n\nUse an LLM to parse a resume into structured data — great, that’s what they’re good at. Use one to check whether someone knows Python — amazing. Use one to judge whether a candidate’s experience is worth 18 points or 24 points? You get a vibe-check. Something HR teams, bar raisers, and a dozen other initiatives have spent decades trying to avoid.\n\nThe 65% weighting on open source + projects doesn’t help either. I’d take the engineer with 30 years of experience who built S3 over someone with two internships and an open source project — but this tool wouldn’t. Some of the best engineers I know have built things that never ended up on GitHub. That’s over half of their score gone before any human looks their way.\n\nIf you’re an engineer with any say in how your company handles resume screening: please be very careful with AI-screening tools. A tool that can’t differentiate isn’t filtering for quality — it’s just filtering. 
You might as well throw out half the resumes and tell the applicants you don’t fuck with bad luck.\n\n*Correction (June 28): A reader flagged that the resume_evaluation_criteria.jinja template says “Software Intern” on line 1 — nowhere documented, nowhere else referenced in the repo. The same template that later gives bonus points for “founder roles, co-founder positions, or early-stage engineer roles.” I re-ran with an explicit Senior SWE prompt and got identical results — the scoring dimensions are position-agnostic.*\n\n[2](#footnote-anchor-2)\n\nNon-determinism at temperature 0 was flagged in [this GitHub issue](https://github.com/interviewstreet/hiring-agent/issues/35), opened October 2025.", "url": "https://wpnews.pro/news/hackerrank-open-sourced-its-ats-my-resume-scored-90-100-oh-wait-74-no-88", "canonical_source": "https://danunparsed.com/p/hackerrank-open-source-ats", "published_at": "2026-06-29 01:44:40+00:00", "updated_at": "2026-06-29 02:28:46.620515+00:00", "lang": "en", "topics": ["large-language-models", "ai-tools", "ai-ethics", "ai-research"], "entities": ["HackerRank", "Gemma", "Gemini", "GitHub", "LinkedIn", "Reddit"], "alternates": {"html": "https://wpnews.pro/news/hackerrank-open-sourced-its-ats-my-resume-scored-90-100-oh-wait-74-no-88", "markdown": "https://wpnews.pro/news/hackerrank-open-sourced-its-ats-my-resume-scored-90-100-oh-wait-74-no-88.md", "text": "https://wpnews.pro/news/hackerrank-open-sourced-its-ats-my-resume-scored-90-100-oh-wait-74-no-88.txt", "jsonld": "https://wpnews.pro/news/hackerrank-open-sourced-its-ats-my-resume-scored-90-100-oh-wait-74-no-88.jsonld"}}