AI Agents Gain Ability to Control User Interfaces
Google has unveiled the Gemini 2.5 Computer Use model, a new AI system designed to interact directly with graphical user interfaces (UIs) through clicks, typing, and scrolling. Built on the foundation of Gemini 2.5 Pro’s advanced visual reasoning, this model enables developers to create agents capable of performing digital tasks across websites and applications — such as filling forms or navigating dashboards — without relying solely on APIs. The model is now available via the Gemini API in Google AI Studio and Vertex AI.
According to Google, Gemini 2.5 Computer Use outperforms competing models on web and mobile automation benchmarks, including Online-Mind2Web, WebVoyager, and AndroidWorld, all while maintaining significantly lower latency. The company says these improvements mark a major step toward building general-purpose AI agents that can interact with computers much like humans do.
How the Model Works
Developers can access the model through the new computer_use tool within the Gemini API. The system operates in a continuous loop: the client sends the model the user's request, a screenshot of the current UI, and a history of recent actions. Gemini analyzes these inputs and outputs an action, such as clicking a button or entering text. After the action is executed, the environment sends a new screenshot back to the model, and the cycle repeats until the task is completed or stopped by a safety mechanism.
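In code, that loop has a simple shape. The sketch below is illustrative only; every function in it is a hypothetical placeholder rather than part of the actual Gemini SDK:

```python
# Illustrative sketch of the computer-use loop. All functions here are
# hypothetical placeholders, not the real Gemini SDK: get_model_action stands
# in for a Gemini API call with the computer_use tool enabled, while
# take_screenshot / execute_action stand in for a browser-control layer.

MAX_STEPS = 30  # guard against runaway loops


def take_screenshot() -> bytes:
    """Placeholder: capture the current UI as PNG bytes (e.g., via Playwright)."""
    raise NotImplementedError


def get_model_action(request: str, screenshot: bytes, history: list) -> dict:
    """Placeholder: send request + screenshot + action history to the model
    and receive its next proposed action."""
    raise NotImplementedError


def execute_action(action: dict) -> None:
    """Placeholder: perform the proposed click/type/scroll in the environment."""
    raise NotImplementedError


def run_agent(request: str) -> None:
    history: list[dict] = []
    for _ in range(MAX_STEPS):
        screenshot = take_screenshot()                        # observe current UI state
        action = get_model_action(request, screenshot, history)
        if action["name"] == "done":                          # model signals completion
            break
        execute_action(action)                                # act on the environment
        history.append(action)                                # fed back on the next turn
```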
The tool supports both standard and custom UI actions, allowing agents to handle interactive elements such as dropdown menus, login fields, and form submissions. The model is currently optimized for web browsers; it has also shown promising results in mobile UI control, though it is not yet tuned for desktop operating systems.
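On the execution side, one plausible approach is to translate the model's proposed actions into Playwright calls, giving a concrete version of the execute_action placeholder from the loop above, here taking an explicit Playwright page handle. The action names and payload fields below (click_at, type_text, and so on) are assumptions for illustration, not the official computer_use action schema:

```python
# Mapping model-proposed actions onto Playwright. The action names and
# payload fields are illustrative assumptions, not the official
# computer_use action schema.
from playwright.sync_api import Page

def execute_action(page: Page, action: dict) -> None:
    name = action["name"]
    if name == "click_at":
        page.mouse.click(action["x"], action["y"])       # coordinates from the model
    elif name == "type_text":
        page.keyboard.type(action["text"])               # types into the focused field
    elif name == "scroll":
        page.mouse.wheel(0, action["delta_y"])           # vertical scroll
    elif name == "navigate":
        page.goto(action["url"])                         # direct URL navigation
    else:
        raise ValueError(f"Unsupported action: {name}")  # surface unknown/custom actions
```

Raising on unrecognized action names keeps silent failures out of the loop and leaves room for custom actions to be wired in deliberately.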
Performance and Benchmarks
Gemini 2.5 Computer Use delivers top-tier performance in browser automation. On Browserbase’s Online-Mind2Web benchmark, it achieved over 70% accuracy at approximately 225 seconds of latency, outperforming leading alternatives in both accuracy and speed. Google’s internal tests also confirmed superior results in mobile automation tasks, solidifying its position as one of the most efficient models in its category.
Early demos show the model executing complex, multi-step instructions, such as gathering data from websites and updating CRM records or organizing virtual sticky notes on collaborative platforms. These examples highlight the model’s ability to parse complex web layouts, interpret dynamic content, and complete workflows autonomously.
Safety and Responsible Deployment
Recognizing the potential risks of allowing AI agents to control computers, Google integrated multiple safety layers directly into the Gemini 2.5 Computer Use model. These include protections against malicious prompts, unauthorized actions, and attempts to bypass security mechanisms such as CAPTCHAs. In addition, an external per-step safety service evaluates each proposed action before it is executed.
Developers can customize safety policies through system instructions, specifying which actions require explicit user approval. Google recommends additional best practices for those integrating the tool into production environments, emphasizing that all implementations should undergo thorough safety testing prior to deployment.
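As a sketch of what such a policy can look like on the client side, the gate below pauses for human approval before sensitive steps. The SENSITIVE set and the action fields are hypothetical stand-ins; in practice the policy itself is expressed through system instructions in the Gemini API:

```python
# Hypothetical human-in-the-loop gate for sensitive actions. The SENSITIVE
# set and action fields are illustrative assumptions; the real policy is
# configured through system instructions per Google's documentation.
SENSITIVE = {"submit_form", "make_purchase", "send_email"}

def approved_by_user(action: dict) -> bool:
    """Block until a human explicitly approves a sensitive step."""
    prompt = f"Agent wants to run {action['name']}. Allow? [y/N] "
    return input(prompt).strip().lower() == "y"

def gated_execute(page, action: dict) -> None:
    if action["name"] in SENSITIVE and not approved_by_user(action):
        raise RuntimeError("Action rejected by user")  # stop the loop safely
    execute_action(page, action)  # dispatcher from the earlier sketch
```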
Early Adoption and Use Cases
Google’s internal teams have already deployed the model for UI testing and workflow automation, notably within the Firebase Testing Agent and AI Mode in Search. Early external adopters, including Poke.com and Autotab, report major improvements in task reliability and execution speed, with some noting results up to 50% faster than other automation models.
One Google payments team found the model capable of “rehabilitating” over 60% of previously failed automated UI tests by autonomously identifying and correcting issues. These early successes demonstrate the potential of AI-driven UI control to accelerate testing, data processing, and digital workflow management across industries.
Availability
The Gemini 2.5 Computer Use model is now in public preview and can be accessed via Google AI Studio or Vertex AI. Developers can experiment in Browserbase’s demo environment or build their own automation loops using Playwright or a cloud virtual machine. Documentation and setup guides are available for both individual developers and enterprise users looking to integrate the system into larger production pipelines.
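For a local Playwright setup, the harness can be as small as the following sketch (the viewport size and starting URL are arbitrary choices, not recommendations from the documentation):

```python
# Minimal Playwright harness for a screenshot-in, action-out loop.
# Viewport dimensions and starting URL are arbitrary example choices.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)        # visible browser for debugging
    page = browser.new_page(viewport={"width": 1280, "height": 800})
    page.goto("https://example.com")                   # starting point for the task
    screenshot = page.screenshot()                     # PNG bytes to send to the model
    # ... feed `screenshot` into the agent loop sketched earlier ...
    browser.close()
```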