Fun Local LLM Comparisons with Gemma, Granite, and Qwen

Ekorbia v0.2 introduces a comparison-chat mode that runs two to three local large language models against the same prompt in parallel. Testing Gemma 4, IBM Granite 4.1, and Qwen 3.5 on a 32 GB M1 Max MacBook Pro revealed that Granite incorrectly selected Chicago deep-dish pizza as a runner-up to New York City's best pizza, while Gemma adopted a "grumpy librarian" voice when explaining Hacker News. The feature requires significant memory, as running three large models in parallel can exceed 32 GB and cause swapping that slows performance.

Ekorbia v0.2 features a comparison-chat mode that runs 2-3 local models against the same prompt in parallel. Here are a few fun prompts running across Gemma 4 e2b, IBM Granite 4.1 8B, and Qwen 3.5 4B on my 32 GB M1 Max MacBook Pro.

1. The Pizza Question

New York City is widely regarded as having the best pizza due to its iconic thin-crust style.

The models were initially reluctant to give a single answer until I attached the following additional prompt: "Provide clear, concise, opinionated answers to comparison or 'best' questions. Each comparison should have a single winner and a runner-up with a short explanation."

The New York City and Naples answers are acceptable, but Granite is clearly wrong here with its runner-up of Chicago deep-dish pizza. And no mention of New Haven-style pizza anywhere?

2. Explain Hacker News

It's a sprawling, perpetually messy digital common room.

Gemma is the most fun here, carrying the 'grumpy librarian' voice across multiple paragraphs, while Granite and Qwen provide more serious answers with a sprinkling of grumpy librarian at the beginning and the end.

3. Will robots take over?

There is no consensus among experts that unchecked AI growth will inevitably lead to a robot takeover of Earth.

All three models take the question seriously, and none think we are doomed to a Terminator-like future.

Things to watch for with Ekorbia comparison mode:

- Memory matters. Three large models running in parallel can blow past 32 GB on my MacBook. Ollama will swap them in and out, which makes the "parallel" feel serial.
- First-token latency varies wildly. A column that's still showing dots while another is mid-paragraph isn't broken; it's cold-loading.
- Granite 4.1 8B is fast. It's worth a try if you've mostly been using Qwen or Gemma.

Send us yours

Got a prompt that produces a hilarious three-way disagreement? Open an issue (https://github.com/ekorbia/ekorbia-desktop/issues) with the prompt and the three outputs, and we'll feature the best ones in a follow-up.
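
If you'd like to reproduce a comparison outside the app before sending one in, here is a minimal sketch of the same idea: one prompt fanned out to several models at once, with a wall-clock time recorded per model. This is not Ekorbia's implementation. It assumes an Ollama server on its default local port and uses Ollama's non-streaming /api/generate endpoint; the names compare and ask_ollama are made up for illustration.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: Ollama's default local endpoint; adjust if yours differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def ask_ollama(model, prompt):
    """Send one non-streaming generate request to a local Ollama server."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def compare(models, prompt, ask=ask_ollama):
    """Run the same prompt against every model concurrently.

    Returns {model: (answer, seconds)}. The `ask` parameter is a
    hypothetical hook so the fan-out can be tried without a server.
    """
    def timed(model):
        start = time.perf_counter()
        answer = ask(model, prompt)
        return model, (answer, time.perf_counter() - start)

    # One worker per model so every request is submitted up front.
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        return dict(pool.map(timed, models))
```

Call it as compare([tag_a, tag_b, tag_c], "Which city has the best pizza?"), where each tag is whatever `ollama list` reports for your installed models. If one model's time is far larger than the others, that is likely the cold-load or swapping effect described above rather than a slower model.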