Source: discuss.huggingface.co

Introducing the GemCod-R-Sapphire

Developer Bidram benchmarked the GemCod-R-Sapphire-270M code agent against the base Gemma 3 270M model and found it underperformed, with only 2 correct out of 13 MBPP-style tasks versus 3 for the base model. The model exhibited failures in dynamic programming, bitwise reasoning, and instruction following, leading Bidram to recommend a simpler, cleaner dataset. The model's creator acknowledged the issues and plans to refine the dataset.

2 min read · published Jun 14, 2026

Introducing the GemCod-R-Sapphire-270M code agent for all your snippet and explanation needs.

The GemCod family is built on the versatile gemma-3-270m-it base model and aims to bring features usually found in frontier models to a 270M-parameter agent.

Sapphire is the latest model in the GemCod-R family, with CoT (chain-of-thought) prompting and superior code generation abilities.

The model can be found [here](https://huggingface.co/DireDreadlord/GemCod-R-Sapphire-270M).

Also check out the main GemCod line over [here](https://huggingface.co/collections/DireDreadlord/gemcod-270m).

[Bidram](https://discuss.huggingface.co/u/Bidram)

Hello

I’d really like to test and benchmark the model. I’d also be happy to help improve it by identifying its weak spots, and tomorrow I plan to benchmark both your fine-tuned model and the original model. I’ll try to share the results with you as soon as possible.

[Bidram](https://discuss.huggingface.co/u/Bidram)

Hello again, and sorry for the late response. I ran into some issues while benchmarking the model, so before starting larger benchmark runs, I manually checked a sample of MBPP-style tasks and compared GemCod against the base Gemma 3 270M.

From this sample, GemCod appears to underperform even the base model on many tasks. In several cases, it produces syntactically plausible code, but the actual logic is incorrect, the requested algorithm is not followed, or the explanation does not match the implementation.

Observed sample results:

GemCod:

- Correct: 2 / 13
- Incorrect: 11 / 13

Gemma 3 270M:

- Correct: 3 / 13
- Incorrect: 10 / 13
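For anyone who wants to repeat this kind of spot check, here is a minimal sketch of an MBPP-style scorer. It assumes each task comes with plain `assert` lines as tests. The helper names `passes_tests` and `pass_rate` are hypothetical and are not taken from the benchmark run described above. Generated code should only be executed inside a sandbox, and this sketch does not guard against infinite loops.

```python
def passes_tests(candidate_src: str, tests: list[str]) -> bool:
    """Return True if the candidate source satisfies every assert-style test."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)   # define the generated function(s)
        for test in tests:
            exec(test, namespace)        # each test is one `assert ...` line
    except Exception:
        # Syntax errors, runtime errors, and failed asserts all count as incorrect.
        return False
    return True


def pass_rate(results: list[bool]) -> float:
    """Fraction of tasks marked correct (0.0 for an empty list)."""
    return sum(results) / len(results) if results else 0.0


# Tallies reported in the thread: 2 of 13 and 3 of 13 tasks correct.
gemcod_rate = pass_rate([True] * 2 + [False] * 11)
gemma_rate = pass_rate([True] * 3 + [False] * 10)
print(f"GemCod: {gemcod_rate:.1%}, Gemma 3 270M: {gemma_rate:.1%}")
```

With 13 tasks, a single task is worth roughly 7.7 percentage points, so a one-task gap like this is only a rough signal until a larger run is done.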

Main failure patterns in GemCod:

- Dynamic programming failures
- Bitwise reasoning failures
- Sequence/math recurrence failures
- Instruction-following failures
- Output-format mistakes
- Confident but incorrect explanations

From these examples, my impression is that the fine-tuning dataset may be too difficult or too heterogeneous for a 270M model to learn stable coding patterns effectively. Instead of improving task fidelity, the model often seems to fall back to repetitive or confused logic patterns. My recommendation would be to try a lighter and more carefully filtered dataset:

- simpler Python tasks
- short function-level problems
- strongly unit-tested examples
- consistent input/output formatting
- fewer noisy explanations

I think a 270M model can improve on coding tasks, but it likely needs a narrower, cleaner, and more curriculum-like dataset rather than highly complex or mixed-difficulty samples.

Hmm, thank you for the detailed response. It really helps in recognising the drawbacks of the current reasoning architecture in the GemCod-R family (though I suppose that could be expected, given how experimental it is). I think the issue could be the dataset being a little too specialized, or perhaps some formatting issues in the templating. I will follow your suggestions.

Perhaps, if you have the time, you could test out another one of my non-reasoning models? DireDreadlord/GemCod-Topaz-270M on Hugging Face
