Building a Mixed-Reality Tour Guide with Android XR, the Geospatial API, and Gemini

Google announced at I/O that the Geospatial API is now available as a preview in ARCore for Jetpack XR, enabling sub-meter accurate anchoring of digital content to the physical world. The company demonstrated the XR Geospatial Tour, a mixed-reality tour guide that combines the Geospatial API, the Gemini API, Google Maps Grounding, and the Jetpack XR SDK to deliver hands-free, immersive walking tours.

At this year's Google I/O (https://www.youtube.com/watch?v=1KOO2lqsdaA), we announced an update for spatial experiences: the Geospatial API (https://developer.android.com/reference/kotlin/androidx/xr/arcore/Geospatial) is now available as a preview in ARCore for Jetpack XR (https://developer.android.com/develop/xr/jetpack-xr-sdk/arcore). By bringing Google's Visual Positioning System (VPS) to Android XR, the Geospatial API lets you anchor digital content to the physical world with sub-meter accuracy and precise orientation in supported areas.

To explore what the Geospatial API could unlock, our team built a demo: the XR Geospatial Tour. Imagine walking into a new city, putting on a pair of wired XR glasses such as the upcoming XREAL Project Aura, and instantly having a knowledgeable local guide show you around. Instead of staring down at a 2D map, you follow 3D models that gently guide your path, while an intelligent voice tells you about the historical landmarks right in front of you.

We combined the Geospatial API (https://developer.android.com/reference/kotlin/androidx/xr/arcore/Geospatial), the Gemini API via Firebase AI Logic (https://firebase.google.com/docs/ai-logic), Google Maps Grounding (https://ai.google.dev/gemini-api/docs/maps-grounding), and the Jetpack XR SDK (https://developer.android.com/develop/xr/jetpack-xr-sdk) to create a hands-free, immersive walking tour experience.

Disclaimer: The video and Tour Guide application are for demonstration purposes only. Some sequences have been shortened.
Any hardware depicted may be under development; final product details may differ.

Let's walk through the implementation details and show how we tied these APIs together to build a world-scale spatial experience.

Enhance your navigation experience on XR by combining GPS with the precision of VPS. The accuracy and precise orientation that VPS provides allow 3D waypoints to line up with the physical world, which is what makes the Geospatial API on Android XR a good foundation for custom, location-anchored experiences like this one. Using advanced computer vision, VPS attempts to provide a GeospatialPose (https://developer.android.com/reference/kotlin/androidx/xr/runtime/math/GeospatialPose), including latitude, longitude, and heading, that is more accurate than GPS alone.

Here's how we retrieve the user's Geospatial pose by converting the device's current pose into a Geospatial coordinate:

```kotlin
// Convert the device's current pose into a Geospatial pose (latitude, longitude, heading)
val result = geospatial.createGeospatialPoseFromPose(arDevice.state.value.devicePose)
if (result is CreateGeospatialPoseFromPoseSuccess) {
    val pose = result.pose
    Log.d("VPS", "Accurate Location: ${pose.latitude}, ${pose.longitude}")
}
```

Because the entire experience depends on this accuracy, we monitor horizontalAccuracy and orientationYawAccuracy until both meet our thresholds. If the user is indoors or in an unrecognized area, we prompt them to "walk to an outdoor public space and look around."

Once we have a location, we use the Gemini API via Firebase AI Logic (https://firebase.google.com/docs/ai-logic) to prompt the Gemini model to act as a local tour guide.
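The readiness check described above comes down to a pair of threshold comparisons. The sketch below is plain Kotlin and deliberately independent of the Jetpack XR types: PoseAccuracy, LocalizationGate, and the 5 m / 10° thresholds are hypothetical, and the units (meters for horizontal accuracy, degrees for yaw accuracy) are an assumption. In a real app the two values would be read from the Geospatial result on each update rather than passed in by hand.

```kotlin
// Hypothetical accuracy values reported alongside a Geospatial pose.
// Assumed units: meters for horizontal accuracy, degrees for yaw accuracy.
data class PoseAccuracy(val horizontalMeters: Double, val yawDegrees: Double)

object LocalizationGate {
    // Assumed thresholds for illustration only; tune them for your own experience.
    const val MAX_HORIZONTAL_METERS = 5.0
    const val MAX_YAW_DEGREES = 10.0

    const val RELOCALIZE_PROMPT = "Walk to an outdoor public space and look around."

    // Accuracy values are error estimates, so smaller is better:
    // both must be at or below their thresholds.
    fun isAccurateEnough(accuracy: PoseAccuracy): Boolean =
        accuracy.horizontalMeters <= MAX_HORIZONTAL_METERS &&
            accuracy.yawDegrees <= MAX_YAW_DEGREES

    // Returns null once the pose is usable; otherwise the prompt to show the user.
    // A null accuracy stands in for "no Geospatial pose yet" (e.g., indoors).
    fun promptFor(accuracy: PoseAccuracy?): String? =
        if (accuracy != null && isAccurateEnough(accuracy)) null else RELOCALIZE_PROMPT
}
```

With this shape, the app can call promptFor on every pose update, keep showing the returned message while it is non-null, and only send the user's coordinates to Gemini once it returns null.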
We pass the user's coordinates to the model and ask it to return a structured JSON response containing nearby walking tours:

```kotlin
    functionCallingConfig = null,
    retrievalConfig = retrievalConfig {
        // Bias Google Maps Grounding toward the user's current location
        latLng = FirebaseLatLng(pose.latitude, pose.longitude)
        languageCode = "en"
    }

// JSON schema describing the structured list of tours we want back
val responseJsonSchema = Schema.obj(
    mapOf(
        "locationIntro" to Schema.string(),
        "tours" to Schema.array(
            Schema.obj(
                mapOf(
                    "title" to Schema.string(),
                    "description" to Schema.string(),
                    "stops" to Schema.array(
                        Schema.obj(
                            mapOf(
                                "name" to Schema.string(),
                                "detailedName" to Schema.string(),
                                "description" to Schema.string()
                            )
                        )
                    )
                )
            )
        )
    )
)

val model = Firebase.ai(backend = GenerativeBackend.googleAI())
    .generativeModel(
        modelName = "gemini-3.5-flash",
        tools = listOf(Tool.googleMaps()),
        generationConfig = generationConfig {
            // Force a JSON response that follows the schema above
            responseMimeType = "application/json"
            responseSchema = responseJsonSchema
        }
    )

val result = model.generateContent(
    "The user is at latitude ${pose.latitude} and longitude ${pose.longitude}. " +
        "Generate exactly 3 diverse tours near this location (e.g., historical, food, nature). " +
        "All tour ideas should be walking distance only."
)
```

Large language models are great at generating rich descriptions, but they can sometimes hallucinate exact latitude/longitude coordinates. To solve this, we used Google Maps Grounding (https://ai.google.dev/gemini-api/docs/maps-grounding) to ground the model's responses in real place data from Google Maps.

To make the tour guide feel truly present, we implemented dynamic voiceovers.
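Before turning tour stops into audio, one detail of the tour request is worth handling: the response schema fixes the shape of the JSON, but nothing in it enforces the "exactly 3" requirement from the prompt, and string fields can still come back empty. One option is to mirror the schema as Kotlin data classes and run a few checks after decoding. This is a sketch under assumptions: the JSON is taken to be already decoded (for example with kotlinx.serialization and @Serializable added to these classes), and TourResponse, Tour, TourStop, and validateTourResponse are hypothetical names, not part of Firebase AI Logic.

```kotlin
// Hypothetical Kotlin mirror of responseJsonSchema (field names match the schema keys).
data class TourStop(val name: String, val detailedName: String, val description: String)
data class Tour(val title: String, val description: String, val stops: List<TourStop>)
data class TourResponse(val locationIntro: String, val tours: List<Tour>)

// Post-decode sanity checks; returns a list of problems (empty means the response looks usable).
fun validateTourResponse(response: TourResponse, expectedTours: Int = 3): List<String> {
    val problems = mutableListOf<String>()
    // The schema does not constrain array length, so check the count the prompt asked for.
    if (response.tours.size != expectedTours) {
        problems += "Expected $expectedTours tours but got ${response.tours.size}"
    }
    response.tours.forEachIndexed { i, tour ->
        if (tour.title.isBlank()) problems += "Tour $i has a blank title"
        if (tour.stops.isEmpty()) problems += "Tour '${tour.title}' has no stops"
        tour.stops.forEachIndexed { j, stop ->
            if (stop.name.isBlank()) problems += "Stop $j of '${tour.title}' has a blank name"
        }
    }
    return problems
}
```

Returning a list of problems rather than throwing lets the app decide whether to retry the request or simply drop the malformed tours.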
Using the gemini-2.5-flash-tts model, we can configure the generation config to return audio data natively instead of just text. Here's how to request ResponseModality.AUDIO:

```kotlin
val ttsModel = Firebase.ai(backend = GenerativeBackend.googleAI())
    .generativeModel(
        modelName = "gemini-2.5-flash-tts",
        generationConfig = generationConfig {
            // Instruct the model to return audio
            responseModalities = listOf(ResponseModality.AUDIO)
        }
    )

val response = ttsModel.generateContent("Say in a neutral but positive voice:\n$prompt")
```

// Extract the raw audio bytes from the response
val audioBytes = response.candidates.firstOrNull()?.content?.parts?.filterIsInstance