Case study · retail

Internal agent platform for an enterprise operations team

How Thoughtwave stood up a governed multi-LLM agent platform for an enterprise ops team — sandboxed runtime, approval gates, full audit, zero vendor lock-in.

  • Days, not months: time to build a governed agent, per workflow
  • Zero vendor lock-in: multi-LLM by design
  • 100% audit coverage: every run logged
  • Supported LLMs: OpenAI, Claude, Gemini, local Llama (runtime-switchable)

Context

A mid-market enterprise operations team had a shopping-list problem: a dozen candidate AI workflows across customer operations, vendor management, internal reporting, and engineering ops. Building each one as a standalone project would have consumed 18 months of engineering time and produced a dozen fragmented stacks, each with its own security posture, its own observability, and its own single-vendor LLM dependency.

The client's engineering leadership wanted the opposite: a single platform every team could build agents on, with governance and audit baked in, and the flexibility to swap the underlying LLM as the vendor landscape evolved.

Challenge

The specific requirements that drove architecture:

  • Sandboxed runtime. Every agent had to run in an isolated environment with constrained tool access. No agent could touch anything not explicitly granted in its definition.
  • Approval gates for destructive actions. Sending emails, moving money, or writing to production systems required documented approval from a designated reviewer.
  • Multi-LLM with no application rewrite. Swapping OpenAI for Claude, or either for a local Llama, should not require touching agent code.
  • Full audit log per run. Every agent invocation — the input, the plan, each tool call and result, the final output, who approved what — captured and retrievable.
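The per-run audit requirement above can be pictured as a structured trace record. A minimal sketch, assuming nothing about the platform's real schema (the `RunTrace` and `ToolCall` names and fields here are illustrative):

```python
# Minimal sketch of a per-run audit trace. Class and field names are
# illustrative assumptions, not the platform's actual schema.

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolCall:
    tool: str
    args: dict
    result: str
    approved_by: Optional[str] = None  # set when an approval gate fired


@dataclass
class RunTrace:
    agent: str
    input: str
    plan: str
    calls: list = field(default_factory=list)
    output: str = ""


trace = RunTrace(agent="ops-triage",
                 input="Refund request #4411",
                 plan="classify -> route -> draft reply")
trace.calls.append(ToolCall("ticket_db.read", {"id": 4411}, "ok"))
trace.calls.append(ToolCall("email.send_external", {"to": "x@example.com"},
                            "sent", approved_by="ops-lead"))
trace.output = "Routed to billing; reply drafted"

# Everything the requirement names -- input, plan, each tool call and result,
# the final output, who approved what -- is retrievable from one record.
print([c.approved_by for c in trace.calls])  # [None, 'ops-lead']
```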

Approach

Thoughtwave deployed TWSS AI Custom Agents — our production agent platform — as the foundation. The architecture maps cleanly onto the four requirements:

  1. Agent SDK with sandboxed runtime. Agents are defined as code (Python) with a declared goal, tool list, and approval-gate configuration. The runtime executes each agent in a Docker sandbox with the declared tools and nothing else.
  2. Tool and data connector library. Pre-built connectors for Slack, web APIs, internal databases, file shares, and major LLMs. Client-specific tools are added via the SDK.
  3. Approval workflows. Destructive actions pause execution and route an approval prompt to Slack or the platform web UI. The agent resumes only after explicit approval, with the approver's identity and reason captured in the trace.
  4. Multi-LLM router. Each agent specifies its preferred model plus a fallback list. The router selects per-call based on availability, cost policy, or task-specific quality criteria.
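To make the shape of an agent definition concrete, here is a sketch along the lines the four points describe. The actual TWSS SDK API is not shown in this case study, so `AgentDefinition`, `ToolGrant`, and the field names are hypothetical; the point is the declarative goal / tool list / approval-gate / model-preference structure:

```python
# Illustrative sketch only: AgentDefinition and ToolGrant are hypothetical
# names standing in for the (unpublished) TWSS SDK types.

from dataclasses import dataclass


@dataclass(frozen=True)
class ToolGrant:
    name: str                        # tool the sandbox will expose
    requires_approval: bool = False  # pause and route to a reviewer first?


@dataclass(frozen=True)
class AgentDefinition:
    goal: str
    model: str                        # preferred model
    fallbacks: tuple = ()             # tried in order if preferred unavailable
    tools: tuple = ()                 # the ONLY tools the sandbox exposes

    def gated_tools(self):
        """Tools whose use pauses the run for explicit approval."""
        return [t.name for t in self.tools if t.requires_approval]


triage = AgentDefinition(
    goal="Classify incoming ops requests and draft a routed response",
    model="gpt-4o",
    fallbacks=("claude-sonnet", "llama3:8b"),
    tools=(
        ToolGrant("ticket_db.read"),
        ToolGrant("slack.post"),
        ToolGrant("email.send_external", requires_approval=True),
    ),
)

print(triage.gated_tools())  # ['email.send_external']
```

Anything not in `tools` simply does not exist inside the sandbox, which is how "no agent could touch anything not explicitly granted" falls out of the definition rather than a separate policy layer.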

The engagement arc:

  • Platform setup (3 weeks). Stood up the platform on the client's infrastructure, integrated Slack for approvals, connected the initial tool catalog.
  • First agent (2 weeks). Built an operations triage agent that classifies incoming ops requests, routes to the right team, and drafts a response — as the proof that the platform works end-to-end.
  • Agent factory (ongoing). Trained the client's engineering teams on the SDK; the platform now ships an average of one new agent every 1-2 weeks, each passing the same governance bar as the first.

What we built

The production platform has five components:

  1. Python agent SDK. Declarative agent definitions with typed tool interfaces and approval-gate configuration.
  2. Sandboxed Docker runtime. Per-run isolation; agents cannot escape declared tools or reach resources not in the sandbox.
  3. MCP tool protocol layer. Standard protocol for tool definitions, making tools reusable across agents and portable between this platform and other MCP-compatible frameworks.
  4. Multi-LLM router. Cloud (OpenAI, Claude, Gemini) + local (Ollama) with per-agent model preference.
  5. Slack and web entry points. Agents can be triggered from Slack commands, from the web UI, from scheduled jobs, or from webhook events.
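The per-call selection logic of component 4 can be sketched as a simple fallback scan. The provider names and the availability check below are illustrative assumptions, not the router's real interface:

```python
# Sketch of per-call model selection with fallback, as described above.
# Model names and the availability predicate are illustrative assumptions.

from typing import Callable, List


def pick_model(preferred: str,
               fallbacks: List[str],
               is_available: Callable[[str], bool]) -> str:
    """Return the first available model, preferred first; raise if none."""
    for model in [preferred, *fallbacks]:
        if is_available(model):
            return model
    raise RuntimeError("no configured model is available")


# Example: both cloud providers are down, so the router falls back to a
# local Llama served via Ollama.
up = {"gpt-4o": False, "claude-sonnet": False, "llama3:8b": True}
chosen = pick_model("gpt-4o", ["claude-sonnet", "llama3:8b"], lambda m: up[m])
print(chosen)  # llama3:8b
```

A production router would extend the predicate to cost policy and task-specific quality criteria, but the key property is visible here: the agent's code never names a provider API, so swapping models is data, not a rewrite.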

Outcomes

  • Build agents in days, not months. The first agent took two weeks; agents 2 through 10 have shipped in an average of 3-5 days each because the platform components carry over.
  • Governed, auditable enterprise deployment. Every run captured. Every destructive action gated. Every approval logged.
  • Multi-LLM portability. When the client's AI policy shifted mid-deployment, a model swap was a configuration change, not an engineering rewrite.
  • Zero vendor lock-in. The platform runs on client infrastructure; the agent definitions are client-owned code; the model layer is switchable.

What's next

The next phase extends the platform with automated evaluation: every production agent run feeds a regression suite, and model or prompt changes are tested against the full history before deployment. The client is also standing up a cross-team agent registry so other business units can adopt proven agent patterns without rebuilding.

For the broader portfolio of Thoughtwave production AI solutions that run on this and related platforms, see our accelerators portfolio.

Why a platform beats per-workflow tooling at scale

Most enterprises we engage with can name five candidate AI workflows. Some can name fifteen. Very few can name just one. The question is not whether to adopt AI agents — it is whether to adopt them one at a time (each as its own project, with its own security review, its own observability setup, its own vendor relationship) or on a platform that ships the first agent in weeks and every subsequent agent in days. The math favors the platform once the agent count reaches three or four, and it dominates once the agent count reaches ten.

The harder truth is that governance cannot be bolted on after the fact. Agents that read internal data and call external systems have an audit and security posture that has to exist from the first deployment. Retrofitting a sandbox, an approval gate, and a trace log onto ten in-production agents is an order of magnitude more work than building them into the platform once. The clients that wait on governance often have to halt their agent programs after the first security incident — not because the incident was catastrophic, but because there was no framework in place to decide how serious it was.

Frequently asked questions

What makes this different from buying a point-solution agent?
Point-solution agents solve one workflow; they do not compound. A platform approach lets the client ship agent #2, #3, and #20 against the same governance, observability, and approval controls. The first agent takes longer; every subsequent agent ships in days, not weeks. For an enterprise that expects to run many agents, this is the structurally better bet.
Why multi-LLM at the platform level?
Model quality, cost, and policy fit change quarter by quarter. Hard-wiring a single LLM at the platform layer means every agent has to be ported when the model changes. TWSS AI Custom Agents treats the model as a runtime-switchable component: OpenAI, Claude, Gemini, or a local Llama via Ollama — picked per agent, per task, or per policy.
How are destructive actions handled?
Every agent definition declares which actions require approval before execution. For consequential actions (send external email, money movement, irreversible data change), the platform pauses, surfaces the proposed action to a designated approver in Slack or the web UI, and only proceeds on explicit sign-off.
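The pause-and-resume flow that answer describes can be sketched as follows; the `Approval` record and the reviewer callback are illustrative stand-ins for the platform's Slack / web UI prompt, not its real API:

```python
# Sketch of the pause-for-approval flow described above. The Approval record
# and reviewer callback are illustrative, not the platform's real interface.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Approval:
    approver: str
    approved: bool
    reason: str


def run_gated_action(action: str,
                     payload: dict,
                     ask_reviewer: Callable[[str, dict], Approval],
                     execute: Callable[[dict], str]) -> str:
    """Pause, surface the proposed action, proceed only on explicit sign-off."""
    approval = ask_reviewer(action, payload)  # e.g. a Slack approval prompt
    if not approval.approved:
        return f"blocked by {approval.approver}: {approval.reason}"
    # In the platform, approver identity and reason land in the run's trace.
    return execute(payload)


result = run_gated_action(
    "email.send_external",
    {"to": "vendor@example.com", "body": "PO attached"},
    ask_reviewer=lambda a, p: Approval("ops-lead", True, "routine vendor mail"),
    execute=lambda p: f"sent to {p['to']}",
)
print(result)  # sent to vendor@example.com
```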
Does this compete with LangGraph or AutoGen?
It complements them. You can run a LangGraph or AutoGen workflow inside the TWSS platform when that is the right tool for the agent logic. What the platform adds is the enterprise-grade layer those frameworks do not: scoped sandboxing, approval gates, multi-LLM routing, and full audit across every run.

Related resources

Ramesh Thumu

Founder & President, Thoughtwave Software

Reviewed by Thoughtwave Editorial

Last updated April 22, 2026