Google folded computer use directly into Gemini 3.5 Flash on June 24, 2026. Until then, getting an AI agent to actually see a screen and click, type, or scroll its way through it meant calling a separate, standalone model. Not anymore. Computer use is now just another tool Flash can reach for, sitting right next to things like search and function calling.
What’s New
- Computer use previously required its own standalone Gemini 2.5 model. Now it’s built into Gemini 3.5 Flash itself, and that’s the recommended path going forward. The old standalone model still exists for legacy use, but developers no longer need a second model to get computer use.
- Agents can see, reason, and act across browsers, mobile apps, and desktop environments, not just on a single platform.
- Google built this for long, complicated tasks. Think continuous software testing, multi-step browser work, or pulling data out of a dashboard.
- Two safety features ship with the tool, and both are optional. One asks for your explicit confirmation before the agent does anything sensitive or hard to undo, like submitting a form or deleting something. The other stops the agent automatically if it spots a prompt injection attempt.
- Google trained the model specifically against prompt injection. That’s the attack where instructions hidden inside a webpage trick an agent into doing something nobody asked it to do.
- You can use it today through the Gemini API or the Gemini Enterprise Agent Platform. There’s also a live demo from Browserbase if you want to try it before building anything.
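To make the two safety features a little more concrete, here is a minimal Python sketch of the idea behind them: pause for confirmation on sensitive actions, and halt outright when an injection is suspected. This is not the Gemini API. Every name in it (`ProposedAction`, `SENSITIVE_KINDS`, `gate`, `AgentHalted`, the `injection_suspected` flag) is hypothetical and exists only for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical set of action kinds treated as sensitive or hard to undo.
SENSITIVE_KINDS = {"submit_form", "delete", "purchase"}


@dataclass
class ProposedAction:
    kind: str                          # e.g. "click", "type", "submit_form"
    target: str                        # human-readable description of the UI element
    injection_suspected: bool = False  # assumed flag, set by some upstream detector


class AgentHalted(Exception):
    """Raised when the loop should stop instead of acting."""


def gate(action: ProposedAction, confirm: Callable[[ProposedAction], bool]) -> bool:
    """Return True if the action may run, False if the user declined it."""
    # Safeguard 1: stop automatically on a suspected prompt injection.
    if action.injection_suspected:
        raise AgentHalted(f"possible prompt injection near {action.target!r}")
    # Safeguard 2: ask for explicit confirmation before sensitive actions.
    if action.kind in SENSITIVE_KINDS:
        return confirm(action)
    # Everything else (clicks, scrolls, typing) proceeds without a prompt.
    return True
```

In a real agent loop, `confirm` would be whatever puts the question in front of a person, such as a dialog or a chat message. Passing it in as a callback keeps the gate itself easy to test.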
Why It Matters
Here’s the part worth sitting with. A capability that needed its own dedicated model less than a year ago now lives inside a general-purpose one instead. That usually means the capability has stopped being experimental and started being expected, the kind of thing developers assume is just there instead of something they go out of their way to bolt on.
The safety angle says something too. Both new safeguards are optional, and Google recommends combining them with sandboxing, human checks, and strict access controls rather than relying on either alone. That’s a “defense-in-depth” approach, not a single switch that solves the problem. It tells you Google still sees “let an AI click around inside real software” as something that needs layered defenses, not a problem that’s been fully solved.
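One way to picture "layered" is as a chain of independent checks where any single one can veto a step. The Python sketch below is an assumption-heavy illustration, not Google’s implementation: the step dictionary shape, the `ALLOWED_HOSTS` allowlist, the 50-step budget, and all function names are made up for the example.

```python
from typing import Callable, Optional
from urllib.parse import urlparse

# A check inspects a step and returns a reason to block it, or None to allow it.
Check = Callable[[dict], Optional[str]]

ALLOWED_HOSTS = {"dashboard.example.com"}  # hypothetical access-control allowlist
MAX_STEPS = 50                             # hypothetical per-task step budget


def host_allowed(step: dict) -> Optional[str]:
    """Strict access control: only act on allowlisted hosts."""
    host = urlparse(step["url"]).hostname
    return None if host in ALLOWED_HOSTS else f"host {host!r} not on allowlist"


def within_budget(step: dict) -> Optional[str]:
    """Sandbox-style limit: cap how long a task may run."""
    return None if step["index"] < MAX_STEPS else "step budget exhausted"


def human_approved(step: dict) -> Optional[str]:
    """Human check: high-risk steps need explicit approval."""
    if step["high_risk"] and not step.get("approved", False):
        return "high-risk step needs human approval"
    return None


def first_objection(step: dict, checks: list[Check]) -> Optional[str]:
    """Run every layer in order and return the first veto, or None if all pass."""
    for check in checks:
        reason = check(step)
        if reason is not None:
            return reason
    return None


LAYERS = [host_allowed, within_budget, human_approved]
```

The point of the shape is that no layer knows about the others. Dropping the human check still leaves the allowlist and the budget in place, which is the property a single on/off switch can’t give you.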
Should You Care?
If you’re not building AI agents yourself, this won’t touch your day-to-day tools right away. Computer use lives at the developer and enterprise level for now, not inside a regular chat app. But it’s a useful read on where things are headed. Less “ask a question, get an answer.” More “let an agent run the software itself.”
Worth knowing too: this is still early. Agents using computer use can still trip over things like unexpected pop-ups, CAPTCHAs, or interfaces they’ve never seen before, and Google’s own documentation steers developers away from having an agent solve CAPTCHAs and toward keeping a human in the loop for anything high-risk. Bringing the capability into a general-purpose model is a sign of real confidence. It’s not a sign the tech is finished.
Source: Google: Introducing computer use in Gemini 3.5 Flash
