PILA Trains on Screen Input to Play PolyTrack

Developer tryfonaskam released PILA, an open-source imitation learning agent that learns to play the browser racing game PolyTrack by training on screen captures and human keyboard inputs. Implemented in PyTorch, the pipeline records state-action pairs, trains a supervised neural network, and runs real-time inference to issue keyboard commands from live frames. The project serves as an educational baseline for end-to-end perception-to-action agents.

End-to-end imitation learning from raw screen pixels is a low-friction prototyping path for perception-to-action agents: no simulator instrumentation, no reward engineering, just human demonstrations mapped to actions. Developer tryfonaskam demonstrates this with PILA (PolyTrack Imitation Learning AI), an open-source agent that learns to drive the browser racing game PolyTrack from screen captures and recorded human keyboard inputs. Implemented in PyTorch (Python 3.11), the pipeline records player controls alongside the corresponding game frames, trains a supervised neural network on those state-action pairs, then runs real-time inference to issue keyboard commands from live frames. The project is released under Apache 2.0 on GitHub and was reported by Hackaday on June 28, 2026. For practitioners, PILA is a useful educational baseline that surfaces the practical engineering work of synchronizing frame capture with labeled actions and replaying inputs, details that papers routinely omit.
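To make the synchronization problem concrete, here is a minimal sketch of how captured frames might be paired with keyboard labels. It is not taken from the PILA repository: the `KeyEvent` type, the `label_frames` function, the WASD key set, and the timestamps are all hypothetical, chosen only to illustrate the idea. Each frame receives a multi-hot vector of whichever keys were held at its capture time.

```python
from dataclasses import dataclass

# Assumed control set for illustration; the project's actual bindings may differ.
KEYS = ("w", "a", "s", "d")


@dataclass(frozen=True)
class KeyEvent:
    t: float        # timestamp in seconds
    key: str        # key name, e.g. "w"
    pressed: bool   # True for key-down, False for key-up


def label_frames(frame_times, events):
    """Return one multi-hot action vector per frame (keys held at that time)."""
    events = sorted(events, key=lambda e: e.t)
    held = {k: 0 for k in KEYS}
    labels = []
    i = 0
    for ft in sorted(frame_times):
        # Apply every key event that occurred at or before this frame.
        while i < len(events) and events[i].t <= ft:
            e = events[i]
            if e.key in held:
                held[e.key] = 1 if e.pressed else 0
            i += 1
        labels.append([held[k] for k in KEYS])
    return labels


# Hypothetical recording: hold W, tap A briefly, then release W.
events = [
    KeyEvent(0.05, "w", True),
    KeyEvent(0.12, "a", True),
    KeyEvent(0.20, "a", False),
    KeyEvent(0.31, "w", False),
]
frame_times = [0.00, 0.10, 0.15, 0.25, 0.35]
labels = label_frames(frame_times, events)

for ft, lab in zip(frame_times, labels):
    print(f"{ft:.2f}s -> {lab}")
```

Each resulting (frame, label) pair is one supervised training example. A multi-hot label lets the network predict simultaneous key presses, such as accelerating while steering, which a single-class label could not represent.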