Job Searcher

A team of developers has built an AI-powered job search tool that uses a fine-tuned Qwen3-8B language model to generate LinkedIn queries, scrape job postings, and score each role against a candidate's resume across five fit dimensions. The system, trained on 2,500 synthetic resumes and roughly 10,000 LinkedIn listings, returns a shortlist with per-job reasoning rather than a raw list of openings. The project is open-sourced on HuggingFace and runs on a single GPU via llama.cpp, with a live demo available for public use.

Job Searcher Team, published June 6, 2026

… thinking about each listing is higher than the cost of submitting to one.

Watch the short tour: drop a resume, watch the queries stream, read the per-job reasoning.

How it works

A run has three steps.

Queries. The student model (the fine-tuned Qwen3-8B described under Technical Details) reads the resume and the preferences you set (job type, work modality, location, free-form notes) and drafts a small set of LinkedIn-shaped search queries, reasoning out loud as it goes.

Search. Those queries hit LinkedIn through JobSpy (https://github.com/Bunsly/JobSpy), one at a time.

Scoring. For each posting, the model reads the (resume, job) pair and writes a five-dimension fit score:

- skills match
- experience relevance
- education and certifications
- industry / domain fit
- seniority alignment

Figure 1. End-to-end steps of the framework.

What you get back isn't a list of fifty roles. It's a small shortlist with defensible reasoning: you can read why the model thinks the second-ranked job beats the third.

Technical Details

Dataset Curation - The teacher and the student

The teacher is DeepSeek V4 Pro. It is strong at structured reasoning, willing to follow a strict output schema, and cheap enough to run once over a large corpus offline. It is used as a label generator, not as an inference-time dependency.

The student is Qwen3-8B.
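Teacher labels and student outputs share one contract: the five-dimension fit score from the Scoring step, with one sentence of reasoning per dimension. The post does not publish the exact schema, so the following is only a minimal sketch under stated assumptions: the field names, the 0 to 10 integer scale, and the unweighted mean used for ranking are all hypothetical. It illustrates what a strict schema buys you: malformed labels can be rejected before training, and scored jobs can be sorted into a short, explainable shortlist.

```python
from dataclasses import dataclass

# Hypothetical field names for the five fit dimensions named in the post.
DIMENSIONS = (
    "skills_match",
    "experience_relevance",
    "education_and_certifications",
    "industry_domain_fit",
    "seniority_alignment",
)


@dataclass
class FitScore:
    job_id: str
    scores: dict[str, int]     # dimension -> score (assumed 0..10 scale)
    reasoning: dict[str, str]  # dimension -> one sentence of reasoning

    @property
    def overall(self) -> float:
        # Assumption: an unweighted mean across the five dimensions.
        return sum(self.scores.values()) / len(self.scores)


def parse_label(job_id: str, raw: dict) -> FitScore:
    """Validate one teacher (or student) output against the strict schema."""
    scores, reasoning = {}, {}
    for dim in DIMENSIONS:
        entry = raw.get(dim)
        if not isinstance(entry, dict):
            raise ValueError(f"{job_id}: missing dimension {dim!r}")
        score, reason = entry.get("score"), entry.get("reasoning")
        if not isinstance(score, int) or not 0 <= score <= 10:
            raise ValueError(f"{job_id}: bad score for {dim!r}: {score!r}")
        if not isinstance(reason, str) or not reason.strip():
            raise ValueError(f"{job_id}: empty reasoning for {dim!r}")
        scores[dim], reasoning[dim] = score, reason.strip()
    return FitScore(job_id, scores, reasoning)


def shortlist(results: list[FitScore], k: int = 5) -> list[FitScore]:
    """Return the top-k jobs by overall fit, best first."""
    return sorted(results, key=lambda r: r.overall, reverse=True)[:k]
```

Rejecting a malformed label outright, rather than repairing it, is one reasonable choice: it keeps the distillation set consistent with exactly what the student is later expected to emit.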
Qwen3-8B is small enough to fit on a single ZeroGPU slice once quantized to Q4_K_M, yet large enough to absorb the teacher's structured judgement.

The corpus came from a closed, resume-aware loop, end to end:

Resumes. 2,500, built on Divyaamith/Kaggle-Resume (https://huggingface.co/datasets/Divyaamith/Kaggle-Resume).

Queries. The teacher first drafted LinkedIn-shaped search queries from each resume.

Jobs. JobSpy then scraped LinkedIn for what those queries actually returned: about 10,000 postings, every one of them surfaced by a query the teacher itself wrote for that specific resume.

Labels. The teacher then scored every resulting (resume, job) pair across the same five dimensions used at inference, with one sentence of reasoning per dimension.

Everything ships as four foreign-key-clean configs at build-small-hackathon/job-search-distill (https://huggingface.co/datasets/build-small-hackathon/job-search-distill).

Training (Modal)

Two LoRA SFT runs on a single A100 via Modal (https://modal.com), one per task:

Adapter. Rank 16, alpha 16, dropout off, attention plus MLP projections.

Schedule. One epoch per task, with mid-epoch checkpoints every 200 steps so a partial run could be sanity-checked before the full one finished.

Output. Safetensors at build-small-hackathon/job-searcher-qwen3-8B, and a Q4_K_M base plus LoRA-GGUF sidecars at build-small-hackathon/job-searcher-qwen3-8B-gguf for the llama.cpp serving path.

The adapter configuration, shared by both runs:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,  # dropout off
    task_type="CAUSAL_LM",
    # Attention projections plus the MLP projections.
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```

The Space - Inference (llama.cpp)

The Space runs llama-cpp-python with the pre-built CUDA wheel on a HuggingFace ZeroGPU Space. Two design choices matter:

Llama inside @spaces.GPU. ZeroGPU recycles the CUDA context per call, so a module-level Llama instance would hold a dead context on the second use. The model is therefore constructed inside the GPU-decorated function.

One GPU call per submission, not per job. All fit evaluations for one submission run inside a single @spaces.GPU call.
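The two choices combine into one pattern: build the model inside the GPU-decorated function, then loop over every job in that same call. Below is a minimal, hardware-free sketch of that shape, assuming a hypothetical `load_model` factory and a hypothetical `evaluate_fit` callable. In the real Space the function would carry `@spaces.GPU` and the factory would construct a `llama_cpp.Llama`; neither is reproduced here, and the event dictionaries are illustrative, not the Space's actual format.

```python
from collections.abc import Callable, Iterator
from typing import Any

# Hypothetical types: a "model" is whatever load_model() returns, and
# evaluate_fit(model, resume, job) produces one fit evaluation as text.
Model = Any
Loader = Callable[[], Model]
Evaluator = Callable[[Model, str, dict], str]


def run_submission(
    resume: str,
    jobs: list[dict],
    load_model: Loader,
    evaluate_fit: Evaluator,
) -> Iterator[dict]:
    """One GPU call per submission (in the Space: decorated with @spaces.GPU).

    The model is created here rather than at module level, because ZeroGPU
    recycles the CUDA context between calls; it is then reused for every job.
    """
    model = load_model()  # paid once per submission, not once per job
    yield {"type": "start", "n_jobs": len(jobs)}
    for job in jobs:
        result = evaluate_fit(model, resume, job)
        yield {"type": "job", "job_id": job["id"], "result": result}
    yield {"type": "done"}
```

Because the function is a generator, the UI can render each job's event as soon as it is produced, while the model load is still paid only once per submission.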
The model loads once and yields events for every job, instead of paying a fresh cold start and a fresh proxy-token request per posting.

Streaming uses the OpenAI-shaped create_chat_completion(stream=True), so the reasoning lands in the UI token by token.

The live demo is at build-small-hackathon/job-search-assistant (https://huggingface.co/spaces/build-small-hackathon/job-search-assistant).

The traces

The entire Claude Code session that built this Space is published as a HuggingFace agent-traces dataset at build-small-hackathon/job-search-assistant-agent-trace (https://huggingface.co/datasets/build-small-hackathon/job-search-assistant-agent-trace): raw JSONL events, the native HuggingFace trace viewer, and every dead end and recovery on the record. It is useful if you want to see how this thing actually came together rather than read the cleaned-up version.

Try it

Drop your resume at https://huggingface.co/spaces/build-small-hackathon/job-search-assistant. Stop sifting.

What I learned

Two adapters beat one. I tried folding query generation and fit evaluation into a single LoRA. The model leaked formatting both ways: JSON on the query task and prose on the eval. Splitting them into two adapters on the same base, hot-swapped per call, killed that whole class of bugs.

The teacher's prompt mattered more than the student's size. Rewriting the teacher's labelling prompt to score against specific resume details ("four years of Rust; the role asks for five") instead of generic verdicts ("strong technical match") propagated through distillation: the student picked up the same habit.
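The two-adapter lesson above suggests a cheap regression check: route each task to its own adapter, and flag any output that looks like the other task's format. This is a minimal sketch under assumptions: it supposes fit evaluation is expected to return a JSON object and query generation should never return JSON, and the adapter paths are hypothetical placeholders, not the files in the published repos.

```python
import json

# Hypothetical adapter paths: one LoRA per task on the same base model.
ADAPTERS = {
    "queries": "adapters/queries-lora.gguf",
    "fit_eval": "adapters/fit-eval-lora.gguf",
}

_NOT_JSON = object()  # sentinel, so a literal JSON "null" is not confused with a parse failure


def adapter_for(task: str) -> str:
    """Pick the adapter to hot-swap in for this call."""
    try:
        return ADAPTERS[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None


def _try_json(text: str):
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return _NOT_JSON


def format_leak(task: str, output: str) -> bool:
    """Return True if an output looks like the *other* task's format."""
    adapter_for(task)  # validates the task name
    parsed = _try_json(output.strip())
    if task == "queries":
        # JSON leaking into query generation.
        return isinstance(parsed, (dict, list))
    # Prose leaking into fit evaluation (expected: a JSON object).
    return not isinstance(parsed, dict)
```

Running a check like this over a held-out batch after each training run is one way to catch the cross-task leakage described above before it reaches the UI.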