
As AI agents evolve from assistants into action takers, “computer use” automation is becoming one of the most powerful and risky capabilities in enterprise AI. When an agent can operate an OS and UI, it inherits the same threat landscape humans face, plus new failure modes unique to autonomous systems. Phishing through UI deception, destructive clicks, and silent data exfiltration are no longer edge cases.
To unpack what secure deployment looks like, we spoke with Andrew Persh, a former McKinsey leader of digital and AI transformations in London who launched a GenAI accelerator in EMEA and now builds product-grade AI systems focused on measurable business impact. Andrew shares a practical reference architecture for securing OS- and UI-capable agents, from sandboxing and least privilege to auditability and post-launch operations.
What is fundamentally different about the threat model when an agent can click, type, and navigate UIs instead of only generating text?
The key change is that the model stops being a source of information and becomes an operator that changes real systems. With text only, a human still decides and acts, so the main risks are wrong advice, hallucinations, or manipulation that leads a person to do something unsafe. With UI control, the agent makes a chain of decisions under ambiguity, like which account to pick, which dialog to confirm, or what a warning really means. That is where over-reliance becomes dangerous, because early results look impressive and teams may delegate more while paying less attention to each step. Then a single mistake can have immediate impact, like deleting data, sending a payment incorrectly, changing permissions, or leaking personal data.
What are the most realistic hostile input scenarios you expect to see in 2026 for computer use agents in enterprises?
The most realistic attacks will hide inside normal enterprise inputs such as emails, tickets, chat messages, documents, and internal pages that the agent reads before acting. A hostile actor can embed instructions that look like process guidance but actually steer the agent toward risky actions, data retrieval, or permission changes. UI deception will also be common, where the agent is led to a page that resembles a legitimate login, approval, or settings screen and it proceeds as if it were trusted. Another frequent scenario is destructive ambiguity, where the attacker only needs the agent to take the wrong branch in a workflow, confirm a destructive dialog, export the wrong dataset, attach the wrong file, or share something with the wrong audience. The core issue is that hostile input can shape both what the agent believes and what it clicks, and the consequences are silent until damage is done.
How should teams design sandboxing for these agents, and what must be isolated by default at the OS, browser, and network layers?
Teams should treat sandboxing as an enforced infrastructure property, not a set of instructions the model is expected to remember. At the OS level, agents should run in disposable, tightly locked environments with no access to host credentials and no persistent state unless explicitly needed. At the browser level, use a dedicated profile with no saved passwords, strict storage controls, limited downloads, and policies that reduce the chance of unsafe extensions or unexpected persistence. At the network level, default to allowlisted egress and restrict internal reachability through controlled gateways with strong identity and logging. This also connects to the earlier strategic decision of whether to use third-party models or internal models, because data residency, storage, and provider guarantees can dictate how you structure isolation and where inference can safely happen.
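The network-layer point above, defaulting to allowlisted egress, can be sketched as a simple default-deny check. This is a minimal illustration, not a production proxy; the domain names and the `EGRESS_ALLOWLIST` structure are hypothetical, and in practice the list would come from policy configuration and be enforced at a gateway the agent cannot modify.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; real entries would come from policy config,
# enforced at a network gateway rather than in agent code.
EGRESS_ALLOWLIST = {"internal.example.com", "api.vendor.example"}

def egress_allowed(url: str) -> bool:
    """Default-deny egress check: only allowlisted hosts (and their
    subdomains) are reachable; everything else is blocked."""
    host = urlsplit(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in EGRESS_ALLOWLIST)
```

The key design choice is default deny: an unlisted destination fails closed, so a hostile page cannot route the agent's traffic somewhere new just by linking to it.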
What does least privilege look like in practice for an agent that needs broad capability, and how do you prevent privilege creep over time?
Least privilege means the agent starts with minimal access for routine reading and navigation and only gets elevated rights for a specific action, in a specific system, for a limited time window, tied to a clear user request. Broad capability does not require broad standing permissions if you design permissions as temporary and scoped. To prevent privilege creep, every new connector, scope, or workflow should have an owner, a justification, and an expiry, and unused privileges should be removed automatically. The agent should also be constrained by policy at the platform level, so it cannot expand its own reach just because it can click into new areas of an application.
Which controls are non-negotiable for sensitive actions like payments, permission changes, and customer-facing communications, including approvals and step-up authentication?
For high impact actions, the system must assume the agent is untrusted and require human consent in a form the agent cannot bypass. Payments and permission changes should always require explicit approval plus step-up authentication for the approver. Approvals should be separated from the agent’s operating surface, ideally in a trusted channel or privileged control that the agent cannot manipulate through UI automation. On top of that, enforce hard policy limits such as recipient allowlists, amount thresholds, restricted admin operations, and cooling-off rules for certain changes. For customer-facing communications, require approval for new templates or tone changes and automated checks to reduce leakage and wrong-recipient risks, because a single message can become a brand and privacy incident.
What should be logged to achieve true auditability and replayability, while still protecting sensitive data and user privacy?
Ideally every step and sub-step should be logged, because this technology is still evolving and model behavior changes across versions. You want an action trace that includes what the agent saw, what tools or functions it invoked, the inputs and outputs to those tools, timing, and the decision points that explain why it chose one path. This makes it possible to investigate, reproduce runs in a sandbox, and understand boundaries when you switch models for cost or performance reasons. At the same time, logs must not become a privacy risk, so sensitive data should be masked, tokenized, or stored in a separate controlled store with strict access. The goal is to answer what happened, who approved it, and what policy allowed it, without turning audit trails into another leak surface.
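The shape of such an action trace, tool calls with masked inputs and outputs plus the decision point, can be sketched as a structured log entry. This is an illustrative minimum under assumed names (`log_step` and the masking scheme are hypothetical); here emails are replaced with a stable hash-derived token so traces stay joinable for investigation without exposing the raw value.

```python
import hashlib
import json
import re
from datetime import datetime, timezone

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def mask(text: str) -> str:
    """Replace emails with a stable token: same address -> same token,
    so runs can be correlated without storing the raw value."""
    return EMAIL_RE.sub(
        lambda m: "tok_" + hashlib.sha256(m.group().encode()).hexdigest()[:8],
        text,
    )

def log_step(run_id: str, tool: str, tool_input: str,
             tool_output: str, decision: str) -> str:
    """One trace entry per tool invocation: what was called, with what,
    what came back, and why this branch was taken."""
    return json.dumps({
        "run_id": run_id,
        "ts": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "input": mask(tool_input),
        "output": mask(tool_output),
        "decision": decision,
    })
```

In a real deployment the sensitive fields might instead be tokenized against a separate controlled store, so investigators with the right access can reverse the mapping while the main trail stays safe to share.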
How do you operationalize agent security after launch, including red teaming, regression testing against UI changes, monitoring signals, and kill switch criteria?
Operationally, agents need ongoing security work because UIs change, workflows change, and models change. Red teaming should be continuous and focused on realistic hostile inputs like injected instructions in tickets, spoofed dialogs, and tricky confirmation flows. Regression tests must cover UI drift, because small layout changes can cause misclicks that would never appear in a normal integration test. Monitoring should focus on behavior patterns, such as unusual navigation, repeated retries, unexpected data access volume, and spikes in policy rejections or approval requests. A kill switch must be real and fast, with clear criteria like attempts at forbidden actions, signals of UI deception, abnormal access patterns, or suspected exfiltration paths. When triggered, autonomy should drop to a safe mode where the agent proposes actions for a human rather than executing them.
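The kill-switch criteria and safe-mode behavior above can be sketched as a small trip-wire. The signal names and thresholds here are hypothetical placeholders; real limits would come from baselining normal runs, and the switch itself would live in the orchestration layer, outside the agent's control.

```python
# Hypothetical thresholds; real values would be set by baselining
# normal runs and tightened over time.
LIMITS = {"policy_rejections": 3, "retries": 5, "records_accessed": 1_000}

def should_kill(signals: dict) -> bool:
    """Trip immediately on hard signals (forbidden action, suspected UI
    deception), otherwise when any behavioral counter exceeds its limit."""
    if signals.get("forbidden_action_attempt") or signals.get("ui_deception_flag"):
        return True
    return any(signals.get(k, 0) > limit for k, limit in LIMITS.items())

def safe_mode(action: str) -> dict:
    """After the switch trips, the agent proposes; a human executes."""
    return {"status": "proposed", "action": action, "requires": "human_execution"}
```

The important property is that tripping the switch does not stop work entirely: autonomy drops to proposal-only, so the workflow degrades gracefully while the incident is investigated.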
Featured Image: Freepik