This Open-Source Phone AI Agent Sees, Hears and Acts—All Without Touching the Cloud

Summary

Oppo’s AI team has developed X-OmniClaw, an open-source framework that turns Android phones into hands-free, context-aware AI assistants able to carry out real tasks directly on the device. Unlike most mobile AI agents, which operate in the cloud on virtual phones, X-OmniClaw runs locally, so it can access the phone's real cameras, photos, and files while offering greater privacy and richer context. Its architecture relies on three core components: Omni Perception, which integrates camera, screen, and voice input for scene understanding; Omni Memory, which builds long-term, structured semantic memory from user data to maintain context across tasks; and Omni Action, which executes tasks using on-device interface analysis and behavior cloning. Cloud-based language models are used only for complex reasoning. Demonstrated abilities include identifying products through the camera and searching for them online, assisting with math exercises, and assembling highlight videos from user photos. X-OmniClaw adapts pioneering agent frameworks, such as the desktop-based OpenClaw and Hermes Agent, for continuous, multimodal operation on smartphones. The source code is available on GitHub and will be continually updated.
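
To make the three-part architecture concrete, here is a minimal Python sketch of one perceive, remember, act cycle with a cloud fallback reserved for complex requests. It is an illustration only, not X-OmniClaw's code: every name in it (`Observation`, `Memory`, `needs_cloud`, `step`, the action strings, and the word-count threshold) is a hypothetical stand-in, and the keyword matching is a toy substitute for structured semantic memory.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Observation:
    """Hypothetical fused snapshot of camera, screen, and voice input."""
    camera: str
    screen: str
    voice: str


@dataclass
class Memory:
    """Toy stand-in for long-term semantic memory: a flat list of facts."""
    facts: List[str] = field(default_factory=list)

    def remember(self, fact: str) -> None:
        self.facts.append(fact)

    def recall(self, query: str) -> List[str]:
        # Naive keyword overlap; a real system would use semantic retrieval.
        words = set(query.lower().split())
        return [f for f in self.facts if words & set(f.lower().split())]


def needs_cloud(request: str) -> bool:
    # Placeholder heuristic (an assumption): long requests count as "complex".
    return len(request.split()) > 8


def step(
    obs: Observation,
    memory: Memory,
    cloud_reasoner: Optional[Callable[[Observation, List[str]], List[str]]] = None,
) -> List[str]:
    """Run one cycle and return a list of hypothetical UI actions."""
    context = memory.recall(obs.voice)
    if needs_cloud(obs.voice) and cloud_reasoner is not None:
        # Only complex requests are handed to the remote model.
        actions = cloud_reasoner(obs, context)
    else:
        # Simple requests are planned entirely on the device.
        actions = [f"open:{obs.screen}", f"search:{obs.camera}"]
    # Store what happened so later tasks can use it as context.
    memory.remember(f"{obs.voice} -> {obs.camera}")
    return actions


if __name__ == "__main__":
    mem = Memory()
    mem.remember("user likes red sneakers")
    obs = Observation(camera="red sneakers", screen="shopping_app",
                      voice="find these sneakers")
    print(step(obs, mem))
```

In this sketch the cloud model is an optional callable, so the loop still works when no remote reasoner is available, which mirrors the summary's description of local execution with cloud models used only for complex reasoning.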