{"slug": "ai-travel-assistant-powered-by-gemma-4-with-streaming-image-input-and-visual", "title": "AI Travel Assistant Powered by Gemma 4; With Streaming, Image Input, and Visual Recommendation Cards", "summary": "The article describes the Gemma Travel Assistant, an AI-powered chat application built with Google's Gemma 4 31B Dense model that helps users plan trips through a single conversational interface. The app features streaming responses, native image input recognition, and visual recommendation cards that parse hotel, destination, and restaurant suggestions into easily scannable cards with ratings and price ranges. The developer chose the 31B Dense model over other variants for its balance of reasoning quality and coherence over long conversations, supported by a 128K context window that maintains trip planning details throughout extended interactions.", "body_md": "This is a submission for the Gemma 4 Challenge: Build with Gemma 4\nPlanning a trip used to mean bouncing between five browser tabs — one for flights, one for hotels, one for itineraries, one for Reddit threads, and one you forgot you opened. I wanted to collapse that into a single conversation.\nGemma Travel Assistant is an AI-powered chat app that helps you plan trips from scratch. Tell it your budget, your vibe, your dates. Ask follow-up questions. Upload a photo of somewhere you saw on Instagram and ask \"where is this, and what should I do there?\" It remembers everything you said earlier in the conversation and uses it to give you better answers.\nWhat makes it feel different from a plain chatbot:\nIt doesn't just write paragraphs. When Gemma recommends hotels or destinations, the app parses those recommendations out of the response and renders them as visual cards — name, location, type badge (hotel / destination / restaurant), star rating, price range. You can scan five options in three seconds instead of reading five bullet points.\nResponses stream token by token. 
You start reading the answer while Gemma is still writing it. For a full 5-day itinerary that can be 600+ words, this makes the experience feel instant instead of frozen.\nIt understands images natively. Drop in a photo (a landscape, a hotel lobby, a plate of food) and the model uses it as context. No extra vision pipeline, no OCR. Gemma 4 handles it directly.\nExample conversation:\nYou: Plan a 5-day trip to Kyoto in October, budget around $1500, I love temples and local food\nGemma: Here's a day-by-day itinerary for Kyoto in October, peak foliage season, so I've planned around the best viewing spots...\n(streams in, then suggestion cards appear below for ryokans and restaurants)\nYou: (uploads a photo of a bamboo forest)\nGemma: That's Arashiyama Bamboo Grove in western Kyoto. It's already on day 3 of your itinerary, so here are the best times to visit to beat the crowds...\nGitHub: https://github.com/mushahidmehdi/gemma-travel-assistant\nStack:\n| Layer | Choice |\n|---|---|\n| Framework | Next.js 16 (App Router) |\n| Model | Gemma 4 31B Dense via OpenRouter |\n| Styling | Tailwind CSS |\n| Markdown | ReactMarkdown |\n| Icons | Lucide React |\nProject structure:\nsrc/\n├── app/\n│ ├── api/chat/route.ts # Streaming SSE proxy → OpenRouter\n│ ├── layout.tsx\n│ └── page.tsx # Centered card layout\n└── components/\n├── ChatInterface.tsx # Input, image upload, message list\n├── ChatMessage.tsx # Bubble renderer + suggestion parser\n└── SuggestionCard.tsx # Hotel / destination / restaurant cards\nI went with Gemma 4 31B Dense (google/gemma-4-31b-it). Here's why that specific model, not the others:\nThe E2B / E4B models are designed for edge and mobile: brilliant for offline use, but I needed server-grade reasoning quality for multi-day itineraries with budget constraints, visa tips, and local context. A 2B model can hallucinate confidently about things it doesn't know well.\nThe 26B MoE model is optimized for throughput. 
For a travel assistant where a single user sends a message and waits for the reply, throughput wasn't the bottleneck. Quality and coherence over a long conversation were.\nThe 31B Dense hits the right balance: strong enough to produce well-structured, accurate travel advice, consistent enough to reliably follow formatting instructions (more on that below), and available on OpenRouter's free tier so anyone can clone the repo and run it without a credit card.\nThe 128K context window was the other deciding factor. Planning a real trip is a long conversation. By the time you've discussed your budget, chosen a region, rejected two hotel options, added a day trip, and asked about visa requirements, you've accumulated thousands of tokens of context. Smaller context windows start dropping earlier constraints. With 128K, an entire planning session stays in context, so those earlier constraints remain available to the model.\nThe API route doesn't buffer the whole response; it parses OpenRouter's SSE stream and forwards each text delta to the browser as it arrives:\n// src/app/api/chat/route.ts\nconst stream = new ReadableStream({\nasync start(controller) {\nconst reader = response.body!.getReader();\nconst decoder = new TextDecoder();\nconst encoder = new TextEncoder();\nlet buffer = ''; // holds a partial SSE line split across network chunks\nwhile (true) {\nconst { done, value } = await reader.read();\nif (done) break;\n// stream: true keeps multi-byte characters intact across chunk boundaries\nbuffer += decoder.decode(value, { stream: true });\nconst lines = buffer.split('\\n');\nbuffer = lines.pop() ?? ''; // the last element may be an incomplete line\nfor (const line of lines) {\nif (!line.startsWith('data: ')) continue;\nconst data = line.slice(6).trim();\nif (data === '[DONE]') { controller.close(); return; }\ntry {\nconst parsed = JSON.parse(data);\nconst content = parsed.choices?.[0]?.delta?.content;\nif (content) controller.enqueue(encoder.encode(content));\n} catch { /* skip malformed chunks */ }\n}\n}\ncontroller.close();\n},\n});\nreturn new Response(stream, {\nheaders: { 'Content-Type': 'text/plain; charset=utf-8' },\n});\nOn the client, ChatInterface reads the stream chunk by chunk and appends to the last message in state, so React re-renders progressively as tokens arrive.\nI didn't use a formal structured output API. 
Instead, the system prompt tells Gemma to append a fenced suggestions block at the end of any response that involves specific recommendations:\nWhen suggesting places, format your hotel/destination/restaurant recommendations\nas a JSON block at the end of your response:\nsuggestions\n[{\n\"name\": \"Nishiyama Onsen Keiunkan\",\n\"location\": \"Yamanashi, Japan\",\n\"type\": \"hotel\",\n\"rating\": 4.9,\n\"price\": \"$$$\",\n\"description\": \"The world's oldest hotel, operating since 705 AD...\"\n}]\nChatMessage then does two things: strips that block from the visible text (so it doesn't appear as raw JSON in the bubble), and passes the parsed array to SuggestionCard components:\nfunction parseSuggestions(content: string) {\nconst match = content.match(/```suggestions\\n([\\s\\S]*?)```/);\nif (!match) return { text: content, suggestions: [] };\nconst text = content.replace(/```suggestions\\n[\\s\\S]*?```/, '').trim();\ntry {\nreturn { text, suggestions: JSON.parse(match[1]) };\n} catch {\nreturn { text: content, suggestions: [] }; // graceful fallback\n}\n}\nIf Gemma omits the block (for a conversational reply like \"Great, let's add a day trip!\"), the component falls through cleanly and just shows the text bubble. No crashes, no empty card rows.\nImage uploads are encoded as base64 data URLs and injected into the last user message as an image_url content block, the format OpenRouter and Gemma 4 expect:\nif (msg.role === 'user' && image && isLastMessage) {\nreturn {\nrole: 'user',\ncontent: [\n{ type: 'text', text: msg.content },\n{ type: 'image_url', image_url: { url: image } }, // base64 data URL\n],\n};\n}\nGemma 4's native vision understands the image without any preprocessing on my end: no external OCR, no separate vision model call. 
The model sees both the image and the conversation history and responds in context.\nBuilding this made me appreciate how much the context window size and multimodal capability change what's actually possible in a single conversation. A travel assistant that forgets what you said three messages ago, or that can't look at a photo you found, is just a fancier search box. Gemma 4 31B makes it feel like talking to someone who's actually paying attention.", "url": "https://wpnews.pro/news/ai-travel-assistant-powered-by-gemma-4-with-streaming-image-input-and-visual", "canonical_source": "https://dev.to/developerontravel/ai-travel-assistant-powered-by-gemma-4-with-streaming-image-input-and-visual-recommendation-cards-5hk1", "published_at": "2026-05-23 06:42:24+00:00", "updated_at": "2026-05-23 07:02:25.354576+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "products"], "entities": ["Gemma 4", "Gemma Travel Assistant", "Kyoto"], "alternates": {"html": "https://wpnews.pro/news/ai-travel-assistant-powered-by-gemma-4-with-streaming-image-input-and-visual", "markdown": "https://wpnews.pro/news/ai-travel-assistant-powered-by-gemma-4-with-streaming-image-input-and-visual.md", "text": "https://wpnews.pro/news/ai-travel-assistant-powered-by-gemma-4-with-streaming-image-input-and-visual.txt", "jsonld": "https://wpnews.pro/news/ai-travel-assistant-powered-by-gemma-4-with-streaming-image-input-and-visual.jsonld"}}