Google Gemini 3.5 Flash Integrates ‘Computer Use’ as Built-in Feature, Accelerating AI Agent Competition

Photo of author

By Global Team

Google has integrated its “computer use” feature, which lets AI recognize screens and click, type, and scroll like a human, into its flagship model, Gemini 3.5 Flash. AI is moving beyond merely answering questions and into actually carrying out tasks.

Google announced on the 24th (local time) that it had added “computer use” as a built-in tool to its AI model Gemini 3.5 Flash. The AI can recognize screens and perform actions such as clicking, typing text, and scrolling on its own. It works not only in web browsers but also in smartphone and PC environments.

The feature itself is not new. Google first introduced the same capability as a separate model last October. What has changed is that it can now be used directly within the main model without having to be called up separately.

Developers no longer need to move between two models. Google DeepMind product manager Matteo Chiross said the integration allows Flash to see the screen, make judgments, and act accordingly.

◆ From chatbot to working AI

The core of this change is the AI agent. An agent refers to AI that receives human instructions and autonomously handles multiple steps of a task. It is a different concept from a chatbot that merely answers questions.

The use cases Google is emphasizing are repetitive-work automation. AI can test software and find errors without a person having to tap each screen manually. It can also be assigned to gather information across multiple websites, fill out forms, or extract data from internal systems.

Industry observers say AI has now moved from simply responding to actually replacing labor. This is seen as one part of the shift in the AI sector’s center of gravity from chatbots to “working assistants.”

◆ Google puts safety front and center

Google’s main focus in explaining the feature was not performance but safety. When AI begins directly manipulating screens in the real world, new risks arise. The most representative one is a “prompt injection” attack.

In simple terms, this is a trap command. If malicious instructions are secretly embedded in a web page or document, the AI at work may mistake them for real directions and behave in unintended ways. Security researchers have repeatedly demonstrated that AI agents can be manipulated in this manner.

Google said it conducted separate adversarial training to prepare for such attacks. It also introduced two enterprise safety measures. One requires human confirmation before irreversible actions such as submitting forms, making payments, or deleting data. The other automatically stops a task when a trap command is detected.

Neither feature is enabled by default. Developers must turn them on themselves for them to work. The company recommends layered defenses rather than relying on a single safeguard. Google itself noted in its documentation that no safety measure is sufficient on its own. Analysts say this stands in contrast to the more confident tone usually used when promoting other AI features.

◆ The competition has shifted toward safety

The company that opened up this market first was Anthropic. Its “Claude computer use” can handle not only web browsers but also PC operating systems and files. Google also added an automated search function this year to its enterprise version of Chrome, allowing it to perform multi-step tasks on its own. OpenAI has also entered the same market.

The competitive axis among the three companies is diverging. The issue is no longer who can click buttons better. The key question in tightly regulated enterprise environments has become who can operate more safely, analysts say.

The challenges left are also clear. Today’s AI handles familiar screens well, but it still struggles with unexpected pop-ups, CAPTCHAs, or unfamiliar screen layouts. Google’s decision to include the feature as a built-in capability rather than as a separate model reads as confidence that the technology has matured. At the same time, leaving the safety features as user options suggests the company still believes it is too early to hand over control without human supervision.

Google did not provide specific figures showing how much more accurate this feature is than previous models. It also did not disclose which companies are using it. If businesses are considering adoption, experts say it is just as important to design a structure with human intervention as it is to evaluate performance metrics. The door to the era of working AI has opened, but how to pass through it safely remains up to each organization.

Google’s Gemini 3.5 Flash has integrated a ‘computer use’ feature that can recognize screens and perform clicking, typing, and scrolling.
Google’s Gemini 3.5 Flash has integrated a ‘computer use’ feature that can recognize screens and perform clicking, typing, and scrolling.