Our AI Stack in Practice
The real stack behind 120,000 users and 50+ AI models in production, as of March 2026. No marketing version.
Why Provider Agnosticism Isn't Optional
Our stack starts with a foundational decision we made in late 2023 with SiemensGPT: no lock-in to a single AI provider.
The reasoning was architectural at the time. Today it's existential. In 18 months, the model landscape shifted fundamentally several times: GPT-4 was dominant, then Claude overtook it for certain tasks. Then open-source models reached enterprise-grade quality. Then multimodal models. Then reasoning models.
Anyone who locked into OpenAI in 2023 had a problem in 2024. Anyone who locked into Anthropic in 2024 misses the advantages of Gemini for certain workloads in 2026.
Our solution: LiteLLM as the abstraction layer. A unified interface for all providers. Claude, GPT, Gemini, LLaMA, Mistral. Model switching in configuration, not in code. New providers integrated in hours, not weeks.
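In practice, the abstraction looks something like the sketch below: a single `completion()` call, with the model chosen from configuration. The task categories and model identifiers here are illustrative placeholders, not our production config.

```python
# Model choice lives in configuration, not in code. Swapping a provider
# means editing this dict; the calling code never changes.
# (Task names and model IDs are illustrative, not our production values.)
MODEL_CONFIG = {
    "reasoning": "anthropic/claude-3-5-sonnet",   # complex reasoning, code
    "long_context": "gemini/gemini-1.5-pro",      # very long documents
    "sensitive": "ollama/llama3",                 # self-hosted, data stays in-house
}

def ask(task_type: str, prompt: str) -> str:
    """Route a prompt to the configured model for its task type."""
    from litellm import completion  # one interface across all providers
    model = MODEL_CONFIG[task_type]
    response = completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Adding a new provider is then a one-line config change, which is exactly why "hours, not weeks" holds.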
This comes at a cost: We can't immediately use every provider-specific feature. But that cost is significantly lower than a rewrite when the market shifts.
The Layers of Our Stack
Layer 1: Foundation Models
We currently run models from five providers in production:
| Provider | Models | Use Case |
|---|---|---|
| Anthropic | Claude Opus, Sonnet, Haiku | Complex reasoning tasks, code generation, analysis |
| OpenAI | GPT-4o, o1 | Multimodal tasks, backward compatibility |
| Google | Gemini Pro, Flash | Long contexts, multimodal processing |
| Meta | LLaMA 3 | Self-hosted for data-sensitive workloads |
| AWS | Bedrock (all providers) | Managed inference, enterprise compliance |
Model selection isn't a matter of belief. It's an engineering decision. Claude is better for complex reasoning tasks. Gemini is better for very long documents. LLaMA is better when data must not leave the organization. We choose per task, not per project.
Layer 2: Agent Framework
Agents aren't chatbots with tools. Agents are software systems that independently execute tasks, make decisions, and interact with other systems.
Our agent framework is based on three principles:
Tool-based architecture. Every agent has a defined set of tools (API calls, database access, code execution). The tools define what the agent can do. The LLM decides when and how to use them.
Guardrails, not hope. Every agent has defined boundaries: maximum execution time, budget limits per invocation, whitelist for permitted actions. No agent can independently delete resources or send data to external services unless explicitly authorized.
Observability by default. Every agent invocation is logged: input, reasoning steps, tool calls, output. Not for debugging. For traceability. Enterprise clients need audit trails. We deliver them.
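A minimal sketch of how guardrails and observability combine at the tool-call boundary. The names here are ours for illustration, not any specific framework's API: every invocation passes a checkpoint that enforces the whitelist, the budget, and the time limit, and emits one structured audit record.

```python
import json
import time

# Illustrative guardrail checkpoint (names are ours, not a framework API).
# Every tool call is checked against a whitelist, a budget, and a wall-clock
# limit, and logged as a structured record for the audit trail.

ALLOWED_TOOLS = {"query_database", "call_api", "run_sandboxed_code"}

class GuardrailViolation(Exception):
    """Raised when an agent tries to step outside its defined boundaries."""

def guarded_call(agent_id: str, tool: str, args: dict, *,
                 budget_usd: float, spent_usd: float,
                 started_at: float, max_seconds: float = 120.0) -> dict:
    if tool not in ALLOWED_TOOLS:
        raise GuardrailViolation(f"tool '{tool}' not whitelisted for {agent_id}")
    if spent_usd >= budget_usd:
        raise GuardrailViolation(f"{agent_id} exhausted budget of ${budget_usd}")
    if time.monotonic() - started_at > max_seconds:
        raise GuardrailViolation(f"{agent_id} exceeded {max_seconds}s limit")
    # Observability by default: one structured audit record per invocation.
    record = {"agent": agent_id, "tool": tool, "args": args}
    print(json.dumps(record))
    return record  # a real system would dispatch to the actual tool here
```

The point is that "no agent can delete resources" is enforced by the checkpoint, not promised by the prompt.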
Frameworks we use: LangGraph for complex workflows, Google ADK for agent-to-agent communication, Microsoft Agent Framework for certain enterprise integrations. No single framework for everything, but the right tool for the right task.
Layer 3: Orchestration
A single agent solves a single problem. Productive enterprise systems need orchestration: agents that call other agents. Workflows that branch based on results. Fallback logic when an agent fails.
Our orchestration runs on AWS Step Functions. Every workflow is a state machine. Visually traceable, testable, versioned.
Example HR Data Hub: A user describes the desired data format in the self-service data shop. The orchestrator starts a Data Transform Agent that generates Python code. The code is tested in a sandbox. If the tests pass, it's deployed as a Lambda. Automatically. No human involved.
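Inlined as plain control flow, that workflow looks roughly like this. In production each step is a state in the Step Functions state machine; the function names below are illustrative stand-ins, not the production API.

```python
# Simplified sketch of the HR Data Hub flow. In production each step is a
# Step Functions state; here the branching is inlined to show the shape.
# All function names are illustrative placeholders.

def generate_transform_code(spec: str) -> str:
    # Stand-in for the Data Transform Agent; would call an LLM in production.
    return f"def transform(row):\n    # {spec}\n    return row"

def run_sandbox_tests(code: str) -> bool:
    # Stand-in for the sandbox: here, just a compile check of the generated code.
    try:
        compile(code, "<generated>", "exec")
        return True
    except SyntaxError:
        return False

def deploy_as_lambda(code: str) -> str:
    # Stand-in for the automatic Lambda deployment step.
    return "deployed"

def run_transform_pipeline(spec: str) -> str:
    code = generate_transform_code(spec)      # agent generates Python code
    if run_sandbox_tests(code):               # tests run in isolation
        return deploy_as_lambda(code)         # deployed automatically
    return "rejected: sandbox tests failed"   # fallback path, no deploy
```

The gate is the sandbox: generated code only ships if the tests pass, which is what makes "no human involved" defensible.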
Layer 4: Infrastructure
Everything runs on AWS. Not on principle, but because our enterprise clients use AWS and have compliance requirements that don't become simpler with on-premise or multi-cloud.
| Component | Service | Why |
|---|---|---|
| Agent runtime | Fargate / Lambda | Serverless, auto-scales, pay-per-use |
| Workflow engine | Step Functions | State machines, visual debugging |
| Data store | S3, DynamoDB | Serverless, event-driven |
| Monitoring | CloudWatch + custom agents | Self-healing (agent analyzes logs) |
| Security | IAM, KMS, VPC | Enterprise standard, cross-account |
| CI/CD | CDK, GitHub Actions | Infrastructure as Code |
Layer 5: Self-Healing
The layer that holds everything together: agents that monitor other agents.
Our AWS Architect Agent monitors CloudWatch logs across all running services. Detects anomalies (latency spikes, error rates, unusual access patterns). Creates root cause analyses. Posts the results to Slack. Simple problems (timeout from cold start, temporary rate limit) it resolves independently.
This isn't a monitoring dashboard with alert rules. It's an AI agent that reads, understands, and acts on logs. The difference: Rules detect known problems. Agents detect unknown problems.
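The triage step can be sketched like this. It's a deliberate simplification of our own making, not the production AWS Architect Agent: the known, simple failure modes are remediated automatically, and everything else is escalated with an analysis. In the real system, the escalation path is where the LLM reads the raw logs and produces the root cause analysis.

```python
# Illustrative triage dispatch (a simplification, not the production agent).
# Known simple failure modes are auto-remediated; unknown anomalies are
# escalated to Slack, where in production an LLM supplies the root cause
# analysis from the raw logs.

AUTO_RESOLVABLE = {
    "cold_start_timeout": "retry with provisioned concurrency",
    "rate_limit": "back off and retry with jitter",
}

def triage(anomaly: dict) -> str:
    kind = anomaly.get("kind")
    if kind in AUTO_RESOLVABLE:
        return f"auto-resolved: {AUTO_RESOLVABLE[kind]}"
    summary = anomaly.get("summary", "unknown anomaly")
    return f"escalated to Slack: {summary}"
```

The rule table covers the known problems; the escalation path is where the agent earns its keep on the unknown ones.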
What This Means for Client Projects
We don't just build this stack for ourselves. We build it for and with our clients.
At SiemensGPT, the entire stack runs in the Siemens AWS environment. Provider-agnostic, self-healing, 50 models, 10,000+ agents. Siemens engineers work with this stack daily. It belongs to them.
At the Siemens Energy AI Platform, we implemented the stack as a self-service platform. Teams create their own agents, choose models, define tools. Without FNTIO involvement. Forward-deployed engineering means: We show how it works, then the teams do it themselves.
That's the difference between a stack you buy and a stack you understand.
Jannik Frisch designed the provider-agnostic AI stack as Technical Advisor, running 50+ models and thousands of agents in enterprise production.