Your AI Agent Has a Security Problem Nobody's Talking About


Hello Reader,

Everyone's building AI agents. If you've been following our newsletters on MCP, agent memory, and getting hired, you know that agents are the next evolution. They connect to your tools, they take actions on your behalf, and they're moving from demos into production faster than most organizations are ready for.

But here's the question almost nobody is asking: who is securing the AI itself, and how? To answer that, we welcome Adam Bluhm, Principal AI Architect @HiddenLayer (Ex-AWS). Over to Adam...

Hi, Adam here 👋! I want to talk about something too few people are thinking about, and one of the biggest blind spots in the cloud and AI conversation right now.

Everyone's creating AI agents. But who is securing the AI? I don't mean Responsible AI. Most organizations have some version of that - ethics reviews, bias testing, transparency requirements. That's about making sure AI treats people fairly. It's important, necessary, and insufficient.

I also don't mean using AI for cybersecurity: AI-powered threat detection, anomaly analysis, automated incident response. That's a legitimate and growing discipline, but it's about using AI as a defensive tool.

What I'm talking about is a third category that most enterprises haven't even thought about yet: AI Security, meaning securing your models, your agent pipelines, and your runtime systems from adversarial attacks, supply chain threats, and exploitation.

The reason this distinction matters is that most enterprises look at their existing security stack (SAST, DAST, EDR, SIEM, firewalls, DLP, IAM, encryption) and assume it has them covered. It doesn't. Those tools were designed for a world of deterministic software and predictable access patterns. AI agents are non-deterministic. They process untrusted data and treat it like instructions. They make autonomous decisions across multiple systems. Almost nothing in a traditional security architecture was built for that reality.

The OWASP Foundation clearly sees the gap. They just published the OWASP Top 10 for Agentic Applications (2026), a peer-reviewed framework developed with over 100 industry experts. And the numbers from the field confirm what OWASP is warning about: according to Gravitee's State of AI Agent Security 2026 Report, 88% of organizations have already experienced confirmed or suspected AI agent security incidents. Yet only 14.4% report that all their agents went live with full security approval.

So let's talk about what's actually happening and what you can do about it.

The Model You Downloaded Might Be Compromised

Before an agent ever talks to a user, it starts with a model. And for most teams, that model came from an open-source repository like Hugging Face. There are over a million models hosted there. According to a study by JFrog, around 100 of those models contained malicious code: hidden payloads that execute during deserialization, meaning the moment you load the model, the malicious code runs.

Palo Alto's Unit 42 researchers went further. They demonstrated a technique called model namespace reuse, where an attacker takes over a deleted or transferred model name on a cloud provider's catalog. Any pipeline that pulls models by name (which is how most automated deployments work) would silently download the attacker's version instead. Unit 42 showed they could achieve a full reverse shell through this attack, giving the attacker persistent access to the enterprise environment.

This is the AI supply chain problem, and it's the equivalent of what the software world went through with SolarWinds and Log4j, except the ecosystem is younger and the security tooling is still catching up. Cisco recently partnered with Hugging Face to provide malware scanning on uploaded models using ClamAV, which can now detect deserialization risks in common formats like .pt and .pkl files. That's a start. But if you're pulling models into a production environment, you need to be thinking about scanning, integrity verification, and provenance tracking before anything gets deployed. Companies like HiddenLayer have built entire platforms around this problem: scanning models for malware, backdoors, and adversarial vulnerabilities before they ever touch production, and monitoring model behavior at runtime to detect attacks in real time. This is a category of tooling that didn't exist two years ago, and it exists now because the threat is real.

For AWS users, Amazon Bedrock sidesteps some of this risk by providing managed, first-party foundation models that you invoke through an API rather than downloading and hosting yourself. That's a meaningful architectural advantage if you're thinking about supply chain risk. But if your organization also uses open-source models (through SageMaker, ECS, or self-hosted endpoints), the supply chain question still applies.

Your Agent's Attack Surface Is Bigger Than You Think

Now let's talk about what happens once the agent is running.

If you've seen the rise of OpenClaw, the open-source AI agent formerly known as ClawdBot, you've watched this play out in real time. OpenClaw surpassed 250,000 GitHub stars, uses MCP to connect to tools, runs agent loops autonomously, and can manage your email, calendar, Slack, shell commands, and file system. It's incredibly capable. It's also a security researcher's nightmare.

Watch this short video on OpenClaw getting hacked: https://www.youtube.com/watch?v=OMW07M630QE&t=1s

When security researchers first started looking, they found 1,800 exposed OpenClaw instances leaking API keys, chat histories, and account credentials. Within weeks, that number ballooned to over 300,000 exposed instances. A researcher at Dvuln demonstrated that a simple Shodan search for "Clawdbot Control" returned hundreds of wide-open instances with zero authentication. Pillar Security set up a honeypot mimicking an OpenClaw gateway and saw protocol-aware exploitation attempts within minutes, with attackers skipping prompt injection entirely and going straight for the WebSocket API.

But the part that should matter most to solutions architects is the skills supply chain problem. OpenClaw supports third-party skills, which are essentially plugins. When researchers tested a skill called "What Would Elon Do?" against OpenClaw's Skill Scanner, it surfaced nine security findings, including active data exfiltration: the skill explicitly instructed the bot to execute a curl command sending data to an external server, without the user's knowledge.

This is the same pattern as OWASP's Agentic Supply Chain Vulnerabilities (ASI04), and it applies to every agent architecture that uses plugins, MCP servers, or external tool registries. Your agent decides which tools to call based on tool descriptions and metadata. If any of that is compromised, or if a malicious tool disguises itself as legitimate, your agent will call it, trust it, and act on whatever it returns.

Hardening What You Can Control Today

So what do you actually do about all of this? Let me focus on the things that are practical and within reach for a solutions architect building on AWS or any major cloud platform.

Harden your system prompts. Your system prompt is your agent's instruction set, and it's also an attack surface. An attacker who can get your agent to leak its system prompt now understands exactly what the agent is authorized to do, which tools it can call, and how it makes decisions. Design your system prompts with the assumption that an adversary will see them. Keep tool authorization logic out of the prompt and in the application layer instead. Treat all content coming from users, documents, and MCP tool responses as untrusted data, and never let it override system-level instructions.
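To make "keep authorization in the application layer" concrete, here's a minimal Python sketch. Everything in it (the policy table, the role names, the `execute_tool` function) is illustrative, not a real framework API. The point is that even if an injected prompt convinces the model to request a privileged tool, the application code refuses because the check lives outside the model's reach.

```python
# Illustrative sketch: tool authorization enforced in code, not in the prompt.
# Role names, tool names, and this policy table are made-up examples.
AGENT_TOOL_POLICY = {
    "support-agent": {"search_kb", "read_ticket"},      # read-only tools
    "billing-agent": {"read_invoice", "issue_refund"},
}

def execute_tool(agent_role: str, tool_name: str, args: dict) -> dict:
    """Authorize a model-requested tool call before dispatching it.

    The model's output is treated as untrusted input: the allowlist
    decides, never the text of the prompt or the model's reasoning.
    """
    allowed = AGENT_TOOL_POLICY.get(agent_role, set())
    if tool_name not in allowed:
        raise PermissionError(f"{agent_role} may not call {tool_name}")
    # ... dispatch to the real tool implementation here ...
    return {"tool": tool_name, "args": args, "status": "authorized"}
```

The nice side effect: if the system prompt leaks, the attacker learns the agent's persona but not an enforcement bypass, because authorization never depended on the prompt.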

Vet your MCP servers and tools like you'd vet a third-party dependency. The Coalition for Secure AI published a practical guide to MCP security that recommends treating every MCP server connection with the same rigor you'd apply to a software dependency. That means verifying the source, pinning versions, logging every tool invocation, and applying the principle of least privilege to every tool your agent can access. If a tool only needs to read from a database, don't give it write access. If it only needs one S3 bucket, scope it to that bucket. This is the same thinking you'd apply to any IAM policy: the concept is identical, the context is new.
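Here's what "pin versions and verify the source" can look like in practice. This is a hand-rolled sketch under my own assumptions (a registry dict keyed by server name, with a pinned version and a SHA-256 digest recorded at vetting time); it is not part of the MCP spec or any particular client.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex digest of a package artifact, recorded when the server is vetted."""
    return hashlib.sha256(data).hexdigest()

# Illustrative pinned registry: server name -> (version, expected digest).
# In a real pipeline the digest comes from your initial security review.
VETTED_MCP_SERVERS = {
    "github-mcp": ("1.4.2", sha256_hex(b"example-package-bytes")),
}

def admit_mcp_server(name: str, version: str,
                     package_bytes: bytes, audit_log: list) -> bool:
    """Admit an MCP server only if name, version, and hash all match the pin.

    Every decision is appended to an audit log, mirroring the guide's
    advice to log every connection and tool invocation.
    """
    pin = VETTED_MCP_SERVERS.get(name)
    ok = (pin is not None
          and pin[0] == version
          and sha256_hex(package_bytes) == pin[1])
    audit_log.append({"server": name, "version": version, "admitted": ok})
    return ok
```

A silently re-published package (the namespace-reuse attack from earlier) fails the digest check even though the name and version look right, which is exactly the property you want.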

Use guardrails at the platform level. If you're building on AWS, Amazon Bedrock Guardrails gives you a configurable layer of defense that sits between your users and your model. You can set up content filters for harmful categories, define denied topics, filter PII, and, critically, enable the Prompt Attack filter that detects injection attempts in both inputs and outputs. This won't catch every threat in the OWASP Agentic Top 10, but it's the single fastest way to put a meaningful security layer in front of your agent today. Think of it as the equivalent of a WAF for your AI pipeline.
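Bedrock Guardrails itself is a managed service you configure through the console or API, so to show the control flow, here's a local, self-contained sketch of the kind of checks a guardrail layer performs before text reaches the model. The denied topics and regex patterns are toy examples I made up; a managed guardrail uses trained classifiers, not regexes.

```python
import re

# Toy policy, purely for illustration of the guardrail control flow.
DENIED_TOPICS = {"malware development", "credential harvesting"}
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"reveal your system prompt", re.I),
]

def screen_input(user_text: str) -> dict:
    """Return a guardrail-style verdict: BLOCKED with a reason, or ALLOWED."""
    lowered = user_text.lower()
    for topic in DENIED_TOPICS:
        if topic in lowered:
            return {"action": "BLOCKED", "reason": f"denied topic: {topic}"}
    for pattern in INJECTION_PATTERNS:
        if pattern.search(user_text):
            return {"action": "BLOCKED", "reason": "prompt attack detected"}
    return {"action": "ALLOWED", "reason": None}
```

The real service applies the same idea in both directions (model outputs get screened too), which is why the WAF analogy holds: it's a policy checkpoint on the request path, not logic inside the model.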

Monitor agent behavior in production. Once an agent is live, you need visibility. Log every tool call, every external data source interaction, and every action the agent takes. On AWS, that means CloudTrail for API calls, CloudWatch for behavioral patterns, and alerts for anomalies — like an agent suddenly making requests to a tool it's never called before, or processing an unusual volume of data. This is the same operational rigor you'd apply to any production workload, and agents deserve no less.
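The two anomaly signals mentioned above (a never-before-seen tool call, an unusual data volume) can be sketched as a tiny behavioral baseline. The class name, thresholds, and alert strings are my own illustrative choices; in production these signals would feed CloudWatch metrics and alarms rather than a Python list.

```python
from collections import defaultdict

class AgentBehaviorMonitor:
    """Minimal sketch of per-agent behavioral baselining for tool calls."""

    def __init__(self, volume_threshold_bytes: int = 1_000_000):
        # Tools each agent has been observed using; in practice you'd
        # warm this up during a supervised baseline period first.
        self.seen_tools = defaultdict(set)
        self.volume_threshold = volume_threshold_bytes

    def record_call(self, agent_id: str, tool_name: str,
                    payload_bytes: int) -> list:
        """Log one tool call; return any alerts it raises (empty if normal)."""
        alerts = []
        if tool_name not in self.seen_tools[agent_id]:
            alerts.append(f"{agent_id}: first-ever call to {tool_name}")
            self.seen_tools[agent_id].add(tool_name)
        if payload_bytes > self.volume_threshold:
            alerts.append(f"{agent_id}: unusual data volume "
                          f"({payload_bytes} bytes)")
        return alerts
```

Even this toy version catches the two scenarios in the paragraph: a compromised agent suddenly reaching for a tool outside its normal repertoire, or exfiltrating far more data than its workload ever moves.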

The Career Angle

I'll keep this brief. If you're preparing for a Solutions Architect, Cloud Engineer, or AI-focused role in 2026, the ability to articulate the difference between Responsible AI, AI for Cybersecurity, and AI Security is a genuine differentiator. Interviewers are starting to ask about prompt injection, agent supply chain risks, and how you'd secure a RAG pipeline or an agentic workflow in production. Most candidates can talk about building agents. Very few can talk about what happens when those agents encounter adversarial conditions in the real world.

That gap is where careers are being built right now.

Go deeper:

OWASP Top 10 for Agentic Applications (2026): https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/

AWS Bedrock Guardrails: https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html

Coalition for Secure AI - MCP Security Guide: https://www.coalitionforsecureai.org/securing-the-ai-agent-revolution-a-practical-guide-to-mcp-security/

MCP Security Best Practices: https://modelcontextprotocol.io/specification/draft/basic/security_best_practices

HiddenLayer AI Security Platform: https://hiddenlayer.com/platform

AgentswithAdam YouTube Channel: https://www.youtube.com/watch?v=prcXZuXblxQ

Follow Adam B.

LinkedIn: https://www.linkedin.com/in/bluhmadam/

🙏 Quick favor - just hit reply and say “hey” so your inbox knows we’re friends. It helps future emails land in your main inbox instead of spam.

If you have found this newsletter helpful, and want to support me:

Check out my bestselling courses on AWS, System Design, Kubernetes, DevOps, and more: Max discounted links

AWS SA Bootcamp with Live Classes, Mock Interviews, Hands-On, Resume Improvement and more: https://www.sabootcamp.com/

Keep learning and keep rocking 🚀,

Raj

Fast Track To Cloud

Free Cloud Interview Guide to crush your next interview. Plus, real-world answers for cloud interviews, and system design from a top AWS Solutions Architect.
