HRI with the Open Duck Mini

A developer built a custom animation tool for the Open Duck Mini robot, pairing it with the Gemini Live API to enable human-robot interaction through gestures like nodding and shaking its head. The project explores how robot movements can trigger social expectations, drawing inspiration from Disney's character creation techniques. Code is available on GitHub.

I believe the next big leap in AI will be in the robotics space, taking the reasoning power of Large Language Models (LLMs) and putting it into physical bodies. While focusing on the utility of a robot, like having it perform tasks or chores, is a compelling area of development, we often overlook the interaction between the human user and the machine itself. Rather than focusing only on what the robot can do, we should also look at how a robot's movements and actions can trigger specific expectations and social norms for the person interacting with it.

Recently I’ve been experimenting with the Open Duck Mini (https://github.com/apirrone/Open_Duck_Mini/tree/v2), and while finally getting the robot to move its head, walk, and generally just do something is an amazing feeling all on its own, I wanted to take some time and explore the topics of human-robot interaction (HRI) and character development. To do this, I built a custom animation tool for defining animations that the robot can run through, allowing it to move in ways that humans naturally recognize as conveying a message, such as nodding or shaking its head. With some animations defined, I paired them with the Gemini Live API, which serves as an excellent tool for HRI by allowing people to speak directly to the robot and trigger these gestures so the machine can interact with the world in an appropriate, recognizable way.

All of the code for this project, including the animation tool client, Flask API server, and playback scripts, is available on GitHub: https://github.com/PaulTR/Open_Duck_Mini_Animator

Note: I recorded the demo with all durations set to 300ms, which is why the robot moves quickly and then goes directly into the next keyframe interpolation. You can slow down animations by extending the duration of the steps and changing the easing functions, which I’ll cover later in this article.

For most of the middleware that I'll discuss here, I was able to generate the code using Gemini and AI Studio (http://ai.studio/) to get what I wanted done quickly, helping me bridge the gap between hardware control and high-level AI reasoning without getting bogged down in the setup. This let me go from defining the idea over a few days of casual thinking, generally while I was gardening or doing something around the house, to spending less than a day creating the end project. That said, I closely orchestrated what was generated and reviewed the code to avoid unexpected hiccups.

We can learn a lot from the animators and Imagineers at Disney, whom I highly admire for all of the cool work they do with character creation and human-robot interaction (if you’re unfamiliar with it, definitely look up their concept of Living Characters and what they’ve done in that space; it’s kind of chaotic, but I love it). Often the characters that Disney is building are already defined in a movie or TV show, but sometimes they’re brand new and need to be fleshed out: how they interact in a given situation, what physical quirks they have, how they speak, whether there is an accent or specific dialect that’s appropriate, and a whole lot of other things that need to be considered. They establish the character’s personality and the specific ways the character expresses emotions like happiness, curiosity, or frustration, and then they engineer the physical robot to embody that character. While a robot does not strictly need a personality to function, defining these traits early in the design phase helps ease the interaction between the machine and the people around it. By establishing what personality the robot should project as it is being built, we can ensure its physical behaviors align with human social expectations.

Disney's work, such as their autonomous walking Olaf robot (for which they’ve shared a bit of their process for getting a proper heel-to-toe gait), shows how combining artistic animation principles with robotics can create machines that feel alive and responsive, rather than merely mechanical. Along with their work in robotics, I’ve had a copy of The Illusion of Life on my shelf for years, as it’s an amazing reference for general animation (I’ve been drawing as a hobby for a few years), and something I used regularly when working as an Android developer years ago. Some of the concepts, such as the various ways to express reality in cartoons, and especially the variations in movement speed that are needed when interpolating between keyframes, were really useful for this project.

As this open source robot platform is based on a Star Wars droid (which, as a member of the 501st and Rebel Legion costuming charity groups, I knew I “needed” to make), the personality can be focused on cute beeps and whirs instead of words, and because the way the two legs hold the body up resembles a duck (hence the robot name), I took some inspiration from the overly exaggerated movements of early Donald Duck animations.

https://youtu.be/2wt5ga4-r6E?si=-w9Ji5IiK0IwaLyd&t=173

Designing physical gestures requires a direct link between the animator's intent and the robot's hardware. Because public animation tools for custom-built hardware are practically non-existent, I needed to develop a custom pipeline to manually position the duck, record those motor configurations as keyframes, and compile them into structured action plans that could execute in sync with the audio files that enrich my robot’s personality. To achieve this quickly, I used AI Studio to generate a custom Flask API server for interfacing with the robot and a companion applet for changing settings on the keyframes.

Designing Gestures with the Animation Tool

The workflow centers around a companion applet running on a laptop, which communicates over the network with the Flask server hosted on the Raspberry Pi Zero 2W inside the robot. This applet allows me to read the live motor positions as I physically pose the robot (focused on motor IDs 30, 31, 32, and 33, which are the neck and head motors on the Open Duck Mini), decide what other emotive features should be used, such as eye and projector lights and antenna orientations, and save the entire sequence as a JSON action script that can later be read by my robot’s Python code.

The reason this runs over the network and SSH instead of a direct USB connection to the robot is twofold. First, the Raspberry Pi Zero 2W has one USB port that I expanded with a small hub to get a USB microphone added (I could have used an I2S microphone, but I wanted options), and cable management is already kind of a nightmare, so I wanted less attached to my robot. Second, that USB microphone requires the Pi to be in USB host mode rather than peripheral mode, so instead of messing up my working configuration, I just went for an all-software approach. I’m also able to take the robot away from home and work on it because I configured the Pi to connect to my home network, and if it doesn’t find that network, it turns on its own hotspot that I can connect to from my laptop.

Here is an example of an action script generated by the tool with the pose in a neutral position, though each frame becomes another node in the keyframes array:

```json
{
  "keyframes": [
    {
      "id": "mpzph7m3gpw97yqw2c",
      "durationMs": 300,
      "pauseMs": 200,
      "motors": {
        "30": 0.5905826033359718,
        "31": 0.3988350048502669,
        "32": -0.18867963690993372,
        "33": -0.15186409800067846
      },
      "lightsOn": false,
      "projectorOn": false,
      "interpolation": "bezier",
      "antennas": { "left": "center", "right": "center" }
    },
    ...
  ],
  "globalSound": "beep1.wav"
}
```

The script defines keyframes with target motor positions, lights, and antenna angles. It also specifies a globalSound like beep1.wav to play alongside the movement.

The Flask API Server

The API server running on the Raspberry Pi serves as the bridge between the high-level animation design and the physical hardware. It exposes endpoints to read the current motor positions during the recording phase and to execute playback of the saved JSON files for testing (there’s a separate Python code snippet for directly playing back an animation during regular robot use). Again, this code was generated, but I planned out what it should look like and reviewed it for quality, so let’s check it out in smaller chunks.

1. Initialization and Setup

First, we set up the Flask application, enable Cross-Origin Resource Sharing (CORS) so the laptop applet can communicate with it, and initialize the hardware interface (HWI) for the motors that we’ll be reading. We also define the GPIO pins for the LEDs and the projector/flashlight, the speaker index for audio playback, and the small servos that control the duck's antennas.

```python
from flask import Flask, request, jsonify
from flask_cors import CORS
from mini_bdx_runtime.duck_config import DuckConfig
from mini_bdx_runtime.rustypot_position_hwi import HWI
from gpiozero import LED, AngularServo
import time

app = Flask(__name__)
CORS(app)

config = DuckConfig()
hwi = HWI(config)
motor_ids = [30, 31, 32, 33]  # neck and head motors
SPEAKER_INDEX = 0

led1 = LED(23)
led2 = LED(24)
projector = LED(25)
servo_left = AngularServo(12, min_angle=-90, max_angle=90)
servo_right = AngularServo(13, min_angle=-90, max_angle=90)
```

Note on Imports: You’ll notice we are importing from mini_bdx_runtime. This is the core runtime package from the original BDX robot project. It wraps the low-level motor communication (using rustypot to talk to the Dynamixel Protocol 2.0 used by the Feetech servos), saving us from writing raw serial byte packets ourselves.

2. Hardware Helper and Dampening Functions

Next, we define helper functions to control the physical components, a mathematical helper for Bezier interpolation, and hardware dampening functions. While Bezier interpolation in software handles the mathematical path, we also dynamically adjust the motors' internal hardware limits (acceleration and velocity profiles) using the rustypot library. This allows us to create effects like "viscous" (sluggish, heavy) or "clamped" (snappy but smoothed) movements by writing directly to the motor registers.

```python
def set_lights(on):
    if on:
        led1.on()
        led2.on()
    else:
        led1.off()
        led2.off()

def set_projector(on):
    if on:
        projector.on()
    else:
        projector.off()

def get_antenna_angle(pos_str):
    if pos_str == 'back':
        return -90
    if pos_str == 'forward':
        return 90
    return 0

def set_antennas(positions):
    servo_left.angle = get_antenna_angle(positions.get('left', 'center'))
    servo_right.angle = get_antenna_angle(positions.get('right', 'center'))

ADDR_PROFILE_ACCEL = 108
ADDR_PROFILE_VELOC = 112

def set_hardware_dampening(accel, veloc):
    try:
        for mid in motor_ids:
            if hasattr(hwi.io, 'write_data'):
                hwi.io.write_data(mid, ADDR_PROFILE_ACCEL, accel, 4)
                hwi.io.write_data(mid, ADDR_PROFILE_VELOC, veloc, 4)
    except Exception as e:
        pass

def apply_interpolation_dampening(interp_type):
    if interp_type == 'linear':
        set_hardware_dampening(0, 400)
    elif interp_type == 'bezier_viscous':
        set_hardware_dampening(10, 200)
    elif interp_type == 'bezier_clamped':
        set_hardware_dampening(40, 500)
    else:  # bezier
        set_hardware_dampening(30, 400)

def bezier_interpolate(t, type_str):
    if type_str == 'linear':
        return t
    if type_str == 'bezier':
        return t * t * (3.0 - 2.0 * t)  # Standard smoothstep (cubic Hermite)
    if type_str == 'bezier_viscous':
        return t * t * t * (t * (t * 6.0 - 15.0) + 10.0)  # Quintic ease-in-out (smoother start/end)
    if type_str == 'bezier_clamped':
        return 1.0 - (1.0 - t) ** 3  # Cubic ease-out (fast start, slow end)
    return t
```

3. Reading Motor Positions

To record keyframes in the applet, we need to read the current state of the robot.
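As a rough sketch of what that recording step produces, the snippet below turns a dictionary of motor positions keyed by motor ID into one keyframe entry in the action script format shown earlier. The helper names `make_keyframe` and `make_action_script`, the default values, and the use of `uuid` for the `id` field are all assumptions for illustration; the applet's real implementation may differ.

```python
import json
import uuid

MOTOR_IDS = [30, 31, 32, 33]  # neck and head motors, as in the article

def make_keyframe(positions, duration_ms=300, pause_ms=200, interpolation="bezier"):
    """Build one keyframe dict from a {motor_id_str: position} mapping (hypothetical helper)."""
    return {
        "id": uuid.uuid4().hex,  # assumption: any unique string is acceptable as an id
        "durationMs": duration_ms,
        "pauseMs": pause_ms,
        "motors": {str(mid): float(positions[str(mid)]) for mid in MOTOR_IDS},
        "lightsOn": False,
        "projectorOn": False,
        "interpolation": interpolation,
        "antennas": {"left": "center", "right": "center"},
    }

def make_action_script(keyframes, global_sound=""):
    """Wrap keyframes in the top-level action script structure (hypothetical helper)."""
    return {"keyframes": list(keyframes), "globalSound": global_sound}

# Made-up positions standing in for a live motor reading
sample = {"30": 0.59, "31": 0.40, "32": -0.19, "33": -0.15}
script = make_action_script([make_keyframe(sample)], global_sound="beep1.wav")
print(json.dumps(script, indent=2))
```

Each pose recorded this way becomes one more entry in the keyframes list, which is what the playback code later steps through.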
The /read endpoint queries the hardware interface and returns the current positions of the four head motors.

```python
@app.route('/read', methods=['GET'])
def read_pos():
    try:
        positions = hwi.io.read_present_position(motor_ids)
        res = {str(mid): pos for mid, pos in zip(motor_ids, positions)}
        return jsonify(res)
    except Exception as e:
        return jsonify({"error": str(e)}), 500
```

4. The Playback Loop (Gestures & Audio)

The /play endpoint is where the magic happens. It receives the JSON action script, triggers the audio playback asynchronously using sounddevice (assuming there’s an associated audio file), and then steps through the keyframes, interpolating motor positions at a strict 30Hz update rate to prevent jitter.

```python
@app.route('/play', methods=['POST'])
def play_macro():
    data = request.json
    keyframes = data.get('keyframes', [])
    global_sound = data.get('globalSound', '')

    # Handle Audio Playback Asynchronously
    if global_sound:
        try:
            import sounddevice as sd
            import soundfile as sf
            import numpy as np
            import os

            sound_path = f"assets/{global_sound}"
            if os.path.exists(sound_path):
                audio_data, fs = sf.read(sound_path, dtype='float32')
                if len(audio_data.shape) == 1:
                    # Mono file: duplicate the channel to get stereo
                    audio_data = audio_data.reshape(-1, 1)
                    audio_data = np.tile(audio_data, (1, 2))
                audio_data = audio_data * 2.0
                sd.play(audio_data, fs, device=SPEAKER_INDEX)
            else:
                print(f"Sound file not found: {sound_path}")
        except Exception as e:
            print(f"Failed to play sound: {e}")

    # Motor Sequencing Loop
    try:
        start_positions = hwi.io.read_present_position(motor_ids)
        current_positions = {str(mid): pos for mid, pos in zip(motor_ids, start_positions)}
    except Exception:
        current_positions = {str(mid): 0 for mid in motor_ids}

    try:
        hwi.io.enable_torque(motor_ids)
    except Exception:
        pass

    for frame in keyframes:
        set_lights(frame.get('lightsOn', False))
        set_projector(frame.get('projectorOn', False))
        set_antennas(frame.get('antennas', {}))

        dur_sec = frame.get('durationMs', 1000) / 1000.0
        steps = max(1, int(dur_sec * 30))  # 30Hz update rate
        interp = frame.get('interpolation', 'linear')
        apply_interpolation_dampening(interp)

        target_motors = frame.get('motors', {})
        start_frame_pos = current_positions.copy()
        start_time = time.time()

        for step in range(1, steps + 1):
            t = step / float(steps)
            eased_t = bezier_interpolate(t, interp)
            step_targets = []
            for mid in motor_ids:
                s_val = start_frame_pos.get(str(mid), current_positions.get(str(mid), 0))
                end_val = target_motors.get(str(mid), target_motors.get(mid, s_val))
                val = s_val + (end_val - s_val) * eased_t
                step_targets.append(float(val))
                current_positions[str(mid)] = val
            try:
                hwi.io.write_goal_position(motor_ids, step_targets)
            except Exception as e:
                print(f"HW error on step {step}: {e}")

            # Strict timing to prevent jitter and stalls
            target_time = start_time + (step * dur_sec / steps)
            now = time.time()
            if target_time > now:
                time.sleep(target_time - now)

        pause_sec = frame.get('pauseMs', 0) / 1000.0
        if pause_sec > 0:
            time.sleep(pause_sec)

    # Reset hardware to default state after animation
    set_lights(False)
    set_projector(False)
    set_antennas({'left': 'center', 'right': 'center'})
    try:
        hwi.io.disable_torque(motor_ids)
    except Exception:
        pass

    return jsonify({"success": True})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```

Hardware Dampening vs. Software Timing

You might wonder: if we are already calculating Bezier curves in Python, why do we need to write to the motor's hardware registers? In robotics, the best results come from combining Software Timing and Hardware Dampening:

Software Timing (Duration & Easing): The Python loop computes precise, time-synchronized positions (e.g., at 30Hz) and sends them to the motors. This ensures that all joints (like neck and head) arrive at their targets at the same time, keeping the motion coordinated. It also allows for complex easing curves (like overshoot or anticipation) that hardware cannot calculate on its own.

Hardware Dampening (Smoothing & Physics): The Feetech STS3215 motors have internal PID controllers and motion profiles. By writing to Profile Acceleration (Address 108) and Profile Velocity (Address 112) using the rustypot library, we tune how the hardware physically reacts. This acts as a physical low-pass filter, rounding off any micro-jitters caused by USB-to-Serial communication latency (USB-to-Serial is how the Raspberry Pi Zero connects to the motors) and preventing the motors from buzzing or jerking.

For example, bezier_viscous caps the hardware acceleration and velocity, forcing the motor to "drag" behind the software commands, creating a heavy, fluid, "underwater" feel. Conversely, bezier_clamped allows aggressive tracking for sharp movements but maintains enough of an acceleration limit to protect the gears from grinding. With these different interpolators in play, you will need to be considerate of how long you give the motors to reach their position, or provide enough of a post-keyframe wait period for things to catch up, so that animations aren’t cut short during playback.

To transform these static gestures into responsive behaviors, I integrated the Gemini Live API into a Python script running on the robot.
The Live API is particularly well-suited for human-robot interaction because it supports low-latency, low-overhead bidirectional streaming of audio, allowing the user to speak naturally to the machine. Instead of configuring the model to return voice responses, I structured the interaction so that the robot communicates entirely through physical actions. By utilizing system instructions and function calling, I instructed Gemini to act as a silent physical agent that translates the user's speech into gesture triggers.

The model is configured with the following system instruction:

"You are a physical robot. Do not speak. Respond ONLY using tools. Use trigger_action with 'yes' to agree, 'no' to disagree, or 'beep1' to acknowledge neutrally."

This setup ensures that when the user speaks, Gemini evaluates the input and selects the most appropriate gesture to trigger, reinforcing social norms, like nodding to show agreement, without relying on synthetic speech. In a more robust project, we could add dozens of animations for different emotions and moods to truly give the robot a specific personality.

Let's look at how the Live API integration script is constructed.

1. Imports, Hardware, Audio, and Dampening Setup

We start by importing the necessary libraries (including the google-genai SDK), configuring the audio parameters, and setting up the hardware dampening registers and helper functions. The script needs to handle both microphone input and speaker output, as well as managing physical motor dampening on the fly.

```python
import asyncio
import os
import sys
import json
import time
import pyaudio
import numpy as np
import soundfile as sf
from google import genai
from google.genai import types

from mini_bdx_runtime.duck_config import DuckConfig
from mini_bdx_runtime.rustypot_position_hwi import HWI
from gpiozero import LED, AngularServo

config_hw = DuckConfig()
hwi = HWI(config_hw)
motor_ids = [30, 31, 32, 33]

MIC_INDEX = 1
SPEAKER_INDEX = 0

led1, led2, projector = LED(23), LED(24), LED(25)
servo_left = AngularServo(12, min_angle=-90, max_angle=90)
servo_right = AngularServo(13, min_angle=-90, max_angle=90)

FORMAT = pyaudio.paInt16
CHANNELS = 1
HW_SAMPLE_RATE = 48000
API_SEND_RATE = 16000
CHUNK_SIZE = 1024
MIC_FACTOR = 3  # 48000 / 16000

pya = pyaudio.PyAudio()
audio_queue_mic = asyncio.Queue(maxsize=5)
audio_queue_output = asyncio.Queue()

# Register addresses for DYNAMIXEL Protocol 2.0 Dampening (used by Feetech STS3215)
ADDR_PROFILE_ACCEL = 108
ADDR_PROFILE_VELOC = 112

def set_hardware_dampening(accel, veloc):
    """Writes to motor registers to handle hardware-level smoothing."""
    try:
        for mid in motor_ids:
            if hasattr(hwi.io, 'write_data'):
                hwi.io.write_data(mid, ADDR_PROFILE_ACCEL, accel, 4)
                hwi.io.write_data(mid, ADDR_PROFILE_VELOC, veloc, 4)
    except Exception:
        pass

def apply_interpolation_dampening(interp_type):
    """Sets different hardware snappiness based on the JSON interpolation."""
    if interp_type == 'linear':
        set_hardware_dampening(0, 400)
    elif interp_type == 'bezier_viscous':
        set_hardware_dampening(10, 200)
    elif interp_type == 'bezier_clamped':
        set_hardware_dampening(40, 500)
    else:  # standard bezier
        set_hardware_dampening(30, 400)

def bezier_interpolate(t, type_str):
    if type_str == 'linear':
        return t
    if type_str == 'bezier':
        return t * t * (3.0 - 2.0 * t)
    if type_str == 'bezier_viscous':
        return t * t * t * (t * (t * 6.0 - 15.0) + 10.0)
    if type_str == 'bezier_clamped':
        return 1.0 - (1.0 - t) ** 3
    return t

# set_lights, set_projector, and set_antennas are used later in this script;
# they are assumed to be the same helpers shown in the Flask server section.
```

2. Live API Configuration

Here we initialize the Gemini client and define the LiveConnectConfig object.
We pass our system instruction and define the trigger_action tool, which tells the Gemini Live API what actions it is able to perform.

```python
api_key = os.environ.get("GEMINI_API_KEY")
client = genai.Client(api_key=api_key, http_options={'api_version': 'v1alpha'})

live_config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    system_instruction=types.Content(
        parts=[types.Part(text=(
            "You are a physical robot. Do not speak. Respond ONLY using tools. "
            "Use trigger_action with 'yes' to agree, 'no' to disagree, or 'beep1' to acknowledge neutrally."
        ))]
    ),
    tools=[types.Tool(function_declarations=[
        types.FunctionDeclaration(
            name="trigger_action",
            description="Triggers a robot animation and sound.",
            parameters=types.Schema(
                type="OBJECT",
                properties={
                    "action_name": types.Schema(
                        type="STRING",
                        description="Action name like 'yes' or 'beep1'"
                    )
                },
                required=["action_name"]
            )
        )
    ])]
)
```

Note: You'll need to install the dependency via pip install -q -U google-genai and export your API key with export GEMINI_API_KEY="your_key". More information on Gemini API keys is available in the Gemini API documentation.

3. Audio Input Handling

These two asynchronous tasks handle capturing audio from the physical microphone and sending it to the active Gemini Live session. Since the API expects 16kHz audio, but my mic runs at 48kHz, I downsample the audio in listen_audio by slicing the numpy array. As this is specific to my own hardware modification, your own project may need some modifications here.

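The slicing mentioned above is plain decimation: keep every third sample of the 48kHz stream to get 16kHz. Here is a minimal standalone sketch of that step, assuming mono 16-bit PCM input; the function name `decimate_int16` is hypothetical and the rates mirror the constants used in this article.

```python
import numpy as np

HW_SAMPLE_RATE = 48000   # microphone rate used in the article
API_SEND_RATE = 16000    # rate the article sends to the Live API
MIC_FACTOR = HW_SAMPLE_RATE // API_SEND_RATE  # 3

def decimate_int16(raw_bytes, factor=MIC_FACTOR):
    """Keep every `factor`-th sample of a mono int16 PCM buffer (hypothetical helper)."""
    samples = np.frombuffer(raw_bytes, dtype=np.int16)
    return samples[::factor].tobytes()

# A fake 3072-sample chunk (1024 * 3), counting up from 0
chunk = np.arange(3072, dtype=np.int16).tobytes()
out = np.frombuffer(decimate_int16(chunk), dtype=np.int16)
print(len(out), out[:3])
```

Slicing like this applies no low-pass filter first, so any content above 8kHz in the microphone signal aliases into the 16kHz stream. If that turns out to matter for your microphone, a filtered resampler is one option to look at instead.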
```python
async def listen_audio():
    stream = await asyncio.to_thread(
        pya.open,
        format=FORMAT,
        channels=CHANNELS,
        rate=HW_SAMPLE_RATE,
        input=True,
        input_device_index=MIC_INDEX,
        frames_per_buffer=CHUNK_SIZE * MIC_FACTOR
    )
    while True:
        data = await asyncio.to_thread(
            stream.read, CHUNK_SIZE * MIC_FACTOR, exception_on_overflow=False
        )
        audio_array = np.frombuffer(data, dtype=np.int16)
        # Downsample 48kHz -> 16kHz by keeping every third sample
        resampled = audio_array[::MIC_FACTOR].tobytes()
        await audio_queue_mic.put({"data": resampled, "mime_type": "audio/pcm"})

async def send_realtime(session):
    while True:
        msg = await audio_queue_mic.get()
        await session.send_realtime_input(audio=msg)
```

If you are running this on a Pi and get PyAudio errors, it is likely because MIC_INDEX or SPEAKER_INDEX doesn't match your hardware. For example, my USB microphone and speaker indexes swapped at some point when running two pieces of hardware, so I had to go back and update my code values. You can find your specific indexes by running a quick Python script on the Pi:

```python
import pyaudio

p = pyaudio.PyAudio()
for i in range(p.get_device_count()):
    print(f"Index {i}: {p.get_device_info_by_index(i)['name']}")
```

Just run that, look for your USB mic and speaker names, and update the index numbers in the setup config.

4. Receiving Responses and Dispatching Actions

The receive_and_trigger task listens for responses from Gemini. When the model decides to trigger an action, it sends a tool call.
The script intercepts this, spawns a background thread to play the animation JSON that was generated via the animation tool, and sends a success response back to Gemini.

```python
async def receive_and_trigger(session):
    while True:
        turn = session.receive()
        async for response in turn:
            if response.tool_call:
                for call in response.tool_call.function_calls:
                    if call.name == "trigger_action":
                        action = call.args.get("action_name")
                        # Run the blocking animation in a worker thread
                        asyncio.create_task(asyncio.to_thread(play_animation_task, action))
                        await session.send_tool_response(function_responses=[
                            types.FunctionResponse(
                                id=call.id, name=call.name, response={"status": "ok"}
                            )
                        ])
            sc = response.server_content
            if sc and sc.input_transcription:
                print(f"You: {sc.input_transcription.text}")
```

5. Playing Animations and Audio Output

When an action is triggered, play_animation_task loads the corresponding JSON file, handles the motor sequencing, and loads the audio file. Because the Live API connection is active, we use a shared speaker queue (audio_queue_output) and a unified play_audio task to write the bytes to the speaker hardware. Because this is a longer bit of code, let’s break it down into its key components.

A. Loading the Action Plan

First, the function locates and loads the JSON file that defines the animation. If the file is missing, it exits early to prevent crashes.

```python
def play_animation_task(action_name):
    """Parses JSON, triggers motors, and queues audio for the shared speaker task."""
    try:
        json_path = f"assets/{action_name}.json"
        if not os.path.exists(json_path):
            print(f"File not found: {json_path}")
            return
        with open(json_path, 'r') as f:
            data = json.load(f)
```

B. Audio Processing and Queuing

If the animation has an associated sound, we load the WAV file. Because our hardware might require a different sample rate than the file, we resample it on the fly. We also apply digital normalization and a 3x gain boost so the duck can be heard over the motor noise before converting the audio to PCM bytes and pushing it to the output queue.

```python
        global_sound = data.get('globalSound')
        if global_sound:
            sound_path = f"assets/{global_sound}"
            if os.path.exists(sound_path):
                # Read file as float32 for high-quality gain math
                audio_data, samplerate = sf.read(sound_path, dtype='float32')
                if len(audio_data.shape) > 1:
                    audio_data = audio_data[:, 0]  # keep the first channel only

                # Resample to hardware 48kHz
                if samplerate != HW_SAMPLE_RATE:
                    duration = len(audio_data) / samplerate
                    audio_data = np.interp(
                        np.linspace(0, len(audio_data), int(duration * HW_SAMPLE_RATE)),
                        np.arange(len(audio_data)),
                        audio_data
                    )

                # Digital Normalization + Volume Boost
                max_val = np.max(np.abs(audio_data))
                if max_val > 0:
                    audio_data = audio_data / max_val
                audio_data = audio_data * 3.0  # Strong 3x Gain

                final_pcm = (audio_data * 32767).clip(-32768, 32767).astype(np.int16)
                audio_queue_output.put_nowait(final_pcm.tobytes())
                print(f"ROBOT ACTION: {action_name} (Audio: {global_sound})")
```

C. Motor Initialization

Before we can move the motors, we need to enable torque and read their current positions so we know where to start the interpolation.

```python
        keyframes = data.get('keyframes', [])
        try:
            hwi.io.enable_torque(motor_ids)
            start_pos = hwi.io.read_present_position(motor_ids)
            current_pos = {str(mid): pos for mid, pos in zip(motor_ids, start_pos)}
        except Exception:
            current_pos = {str(mid): 0 for mid in motor_ids}
```

D. The Sequencing Loop

This is the core loop. For each keyframe, we set the peripheral states (lights, projector, antennas) and calculate the intermediate motor positions using Bezier interpolation. The loop runs at a target rate, sleeping as necessary to maintain strict timing and prevent jitter.

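The interpolation and timing described above can be tried without any hardware attached. The sketch below plans a single motor's path through one keyframe as a list of (deadline, position) pairs; `plan_keyframe` and `smoothstep` are hypothetical names, and the 30Hz rate and easing formula are taken from this article.

```python
def smoothstep(t):
    """Cubic ease-in-out, the same formula the article uses for 'bezier'."""
    return t * t * (3.0 - 2.0 * t)

def plan_keyframe(start, target, duration_s, rate_hz=30, ease=smoothstep):
    """Return (deadline_offset_s, position) pairs for one motor over one keyframe.

    Hypothetical helper: deadlines are offsets from the keyframe's start time.
    """
    steps = max(1, int(duration_s * rate_hz))
    plan = []
    for step in range(1, steps + 1):
        t = step / steps                       # normalized progress in (0, 1]
        position = start + (target - start) * ease(t)
        deadline = step * duration_s / steps   # measured from the keyframe start
        plan.append((deadline, position))
    return plan

plan = plan_keyframe(start=0.0, target=1.0, duration_s=1.0)
for deadline, position in plan[::10]:
    print(f"t={deadline:.3f}s -> {position:.3f}")
```

Each deadline is measured from the start of the keyframe rather than from the previous step, so a slow motor write shortens the next sleep instead of pushing the whole animation later.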
```python
        for frame in keyframes:
            set_lights(frame.get('lightsOn', False))
            set_projector(frame.get('projectorOn', False))
            set_antennas(frame.get('antennas', {}))

            interp = frame.get('interpolation', 'linear')
            apply_interpolation_dampening(interp)

            dur = frame.get('durationMs', 1000) / 1000.0
            steps = max(1, int(dur * 30))  # 30Hz update rate
            target_motors = frame.get('motors', {})
            start_frame_pos = current_pos.copy()
            start_t = time.time()

            for step in range(1, steps + 1):
                t = step / float(steps)
                eased_t = bezier_interpolate(t, interp)
                step_targets = []
                for mid in motor_ids:
                    s_val = start_frame_pos.get(str(mid), current_pos.get(str(mid), 0))
                    e_val = target_motors.get(str(mid), s_val)
                    val = s_val + (e_val - s_val) * eased_t
                    step_targets.append(float(val))
                    current_pos[str(mid)] = val
                try:
                    hwi.io.write_goal_position(motor_ids, step_targets)
                except Exception:
                    pass

                # Sleep until this step's deadline, measured from the keyframe start
                target_time = start_t + (step * dur / steps)
                now = time.time()
                if target_time > now:
                    time.sleep(target_time - now)

            if frame.get('pauseMs', 0) > 0:
                time.sleep(frame['pauseMs'] / 1000.0)
```

E. Cleanup and Reset

Once the animation completes, we turn off the lights and projector, center the antennas, and disable torque on the motors to prevent overheating. This is where you could change these defaults if your own application required it.

```python
        # Reset components after animation
        set_lights(False)
        set_projector(False)
        set_antennas({'left': 'center', 'right': 'center'})
        try:
            hwi.io.disable_torque(motor_ids)
        except Exception:
            pass
    except Exception as e:
        print(f"Action Error: {e}")
```

6. The Unified Speaker Task and Main Loop

Finally, the play_audio task continuously monitors audio_queue_output and writes any outgoing sound bytes to the speaker hardware. The run function connects the session and orchestrates all the concurrent tasks.

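One detail to keep in mind with the shared queue: play_animation_task runs in a worker thread (via asyncio.to_thread), while the speaker task awaits the queue on the event loop, and the Python documentation notes that asyncio queues are not thread-safe. A minimal sketch of one thread-safe hand-off uses `loop.call_soon_threadsafe`. The names `enqueue_from_thread` and `demo` are hypothetical, and this is an optional hardening idea rather than how the article's script is written.

```python
import asyncio

def enqueue_from_thread(loop, queue, payload):
    """Called from a worker thread: schedule put_nowait on the loop's own thread."""
    loop.call_soon_threadsafe(queue.put_nowait, payload)

async def demo():
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    # The worker thread stands in for play_animation_task pushing PCM bytes
    await asyncio.to_thread(enqueue_from_thread, loop, queue, b"\x00\x01")
    # The consumer stands in for the speaker task reading the queue
    return await asyncio.wait_for(queue.get(), timeout=1.0)

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the article's script, this would mean capturing the running loop inside an async function and passing it along to the animation task, which is a small change if you ever see sounds arriving late.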
```python
async def play_audio():
    """Unified speaker task for all robot sounds."""
    stream = await asyncio.to_thread(
        pya.open,
        format=FORMAT,
        channels=CHANNELS,
        rate=HW_SAMPLE_RATE,
        output=True,
        output_device_index=SPEAKER_INDEX
    )
    while True:
        bytestream = await audio_queue_output.get()
        await asyncio.to_thread(stream.write, bytestream)

async def run():
    try:
        async with client.aio.live.connect(
            model="gemini-3.1-flash-live-preview", config=live_config
        ) as live_session:
            print("Robot Ready. Monitoring mic...")
            async with asyncio.TaskGroup() as tg:
                tg.create_task(send_realtime(live_session))
                tg.create_task(listen_audio())
                tg.create_task(receive_and_trigger(live_session))
                tg.create_task(play_audio())
    except Exception as e:
        print(f"Error: {e}")
    finally:
        pya.terminate()

if __name__ == "__main__":
    try:
        asyncio.run(run())
    except KeyboardInterrupt:
        pass
```

Future Explorations

While the Open Duck Mini is a small-scale robot, the engineering challenges involved in this project, such as latency management, audio resampling, real-time synchronization, and hardware-software co-design, are ideal for learning about the challenges faced in large-scale robotics development. Using generative AI to develop the necessary middleware allowed me to build a complete HRI pipeline in a fraction of the time it would have taken traditionally. By combining flexible models like Gemini with custom physical gestures, we can start building robots that don't just execute commands, but participate in social spaces in a recognizable way.

One thing I would like to do at some point is start modifying this robot to be more robust. I want to put NeoPixel lights in for the eyes to support various colors based on the robot’s ‘mood’, upgrade from the Raspberry Pi Zero 2W to a Raspberry Pi 5 or Jetson Orin Nano, and potentially replace the motors with 12V motors (including the necessary power upgrades).
At that point, though, I may as well build a new BDX droid (https://www.youtube.com/live/WnNvsnqLglo?si=1TGR59Q3BpgAiyan), which my wallet wouldn’t appreciate, but it’s so cool, so we’ll see :)

I'm looking forward to expanding this setup, and I hope this encourages other makers to explore the intersection of LLMs and physical embodiment. To this end, the next tools to explore would be the Gemini Robotics-ER model (https://ai.google.dev/gemini-api/docs/robotics-overview) for perception and task orchestration, as well as the MuJoCo (https://mujoco.readthedocs.io/en/stable/overview.html) simulation tool.