
Our AI Stack in Practice.

No marketing version. The real stack behind 120,000 users and 50+ AI models in production.

Jannik Frisch, Technical Advisor · 6 min read

Here's our stack, as of March 2026. No marketing version. The real one.

Why Provider Agnosticism Isn't Optional

Our stack starts with a foundational decision we made in late 2023 with SiemensGPT: no lock-in to a single AI provider.

The reasoning was architectural at the time. Today it's existential. In 18 months, the model landscape changed fundamentally several times. GPT-4 was dominant, then Claude came along and was better for certain tasks. Then open-source models reached enterprise-grade quality. Then multimodal models. Then reasoning models.

Anyone who locked into OpenAI in 2023 had a problem in 2024. Anyone who locked into Anthropic in 2024 misses the advantages of Gemini for certain workloads in 2026.

Our solution: LiteLLM as the abstraction layer. A unified interface for all providers. Claude, GPT, Gemini, LLaMA, Mistral. Model switching in configuration, not in code. New providers integrated in hours, not weeks.

This comes at a cost: We can't immediately use every provider-specific feature. But that cost is significantly lower than a rewrite when the market shifts.
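A minimal sketch of what config-driven model selection looks like behind a unified interface like LiteLLM's. The task names, model IDs, and environment variables here are illustrative, not our production configuration:

```python
import os

# Task-to-model routing lives in configuration, not code.
# Model IDs follow LiteLLM's provider-prefixed naming, but the
# specific mapping below is an example, not our real config.
MODEL_CONFIG = {
    "reasoning": os.environ.get("REASONING_MODEL", "claude-3-opus-20240229"),
    "long_context": os.environ.get("LONG_CONTEXT_MODEL", "gemini/gemini-1.5-pro"),
    "sensitive": os.environ.get("SENSITIVE_MODEL", "ollama/llama3"),
}

def pick_model(task_type: str) -> str:
    """Resolve the model for a task from configuration, with a safe default."""
    return MODEL_CONFIG.get(task_type, MODEL_CONFIG["reasoning"])

# In production, the call itself goes through LiteLLM's unified interface:
#   from litellm import completion
#   response = completion(model=pick_model("reasoning"), messages=[...])
# Switching providers means changing configuration, not rewriting code.
print(pick_model("long_context"))
```

The point of the pattern: the call site never names a provider, so a market shift is a config change, not a rewrite.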

The Layers of Our Stack

Layer 1: Foundation Models

We currently run models from five providers in production:

| Provider | Models | Use Case |
| --- | --- | --- |
| Anthropic | Claude Opus, Sonnet, Haiku | Complex reasoning tasks, code generation, analysis |
| OpenAI | GPT-4o, o1 | Multimodal tasks, backward compatibility |
| Google | Gemini Pro, Flash | Long contexts, multimodal processing |
| Meta | LLaMA 3 | Self-hosted for data-sensitive workloads |
| AWS | Bedrock (all providers) | Managed inference, enterprise compliance |

Model selection isn't a matter of belief. It's an engineering decision. Claude is better for complex reasoning tasks. Gemini is better for very long documents. LLaMA is better when data must not leave the organization. We choose per task, not per project.

Layer 2: Agent Framework

Agents aren't chatbots with tools. Agents are software systems that independently execute tasks, make decisions, and interact with other systems.

Our agent framework is based on three principles:

Tool-based architecture. Every agent has a defined set of tools (API calls, database access, code execution). The tools define what the agent can do. The LLM decides when and how to use them.

Guardrails, not hope. Every agent has defined boundaries: maximum execution time, budget limits per invocation, whitelist for permitted actions. No agent can independently delete resources or send data to external services unless explicitly authorized.

Observability by default. Every agent invocation is logged: input, reasoning steps, tool calls, output. Not for debugging. For traceability. Enterprise clients need audit trails. We deliver them.
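The three principles above can be sketched as a thin wrapper around every tool call. Everything here (class names, limits, the `lookup` tool) is illustrative, not a real framework API:

```python
import time

class BudgetExceeded(Exception):
    pass

class AgentGuardrails:
    """Illustrative sketch: whitelist + budgets + audit log around tool calls."""

    def __init__(self, allowed_tools, max_seconds=30.0, max_cost_usd=1.0):
        self.allowed_tools = set(allowed_tools)  # whitelist: nothing else is callable
        self.max_seconds = max_seconds           # hard cap on execution time
        self.max_cost_usd = max_cost_usd         # hard cap on spend per invocation
        self.spent = 0.0
        self.started = time.monotonic()
        self.audit_log = []                      # every call recorded for traceability

    def run_tool(self, name, fn, *args, cost_usd=0.0):
        if name not in self.allowed_tools:
            raise PermissionError(f"tool {name!r} is not whitelisted")
        if time.monotonic() - self.started > self.max_seconds:
            raise TimeoutError("agent exceeded its execution-time budget")
        if self.spent + cost_usd > self.max_cost_usd:
            raise BudgetExceeded("agent exceeded its cost budget")
        result = fn(*args)
        self.spent += cost_usd
        self.audit_log.append({"tool": name, "args": args, "result": result})
        return result

guard = AgentGuardrails(allowed_tools={"lookup"}, max_cost_usd=0.10)
print(guard.run_tool("lookup", lambda x: x.upper(), "hello", cost_usd=0.01))
```

The LLM still decides *when* to call a tool; the wrapper decides *whether* it may, and leaves an audit trail either way.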

Frameworks we use: LangGraph for complex workflows, Google ADK for agent-to-agent communication, Microsoft Agent Framework for certain enterprise integrations. No single framework for everything, but the right tool for the right task.

Layer 3: Orchestration

A single agent solves a single problem. Productive enterprise systems need orchestration: agents that call other agents. Workflows that branch based on results. Fallback logic when an agent fails.

Our orchestration runs on AWS Step Functions. Every workflow is a state machine. Visually traceable, testable, versioned.

An example from the HR Data Hub: a user describes the desired data format in the self-service data shop. The orchestrator starts a Data Transform Agent that generates Python code. The code is tested in a sandbox. If the tests pass, it's deployed as a Lambda. Automatically. No human involved.
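A compressed sketch of that workflow in plain Python. In production each step is a state in a Step Functions state machine; the function bodies here are stand-ins for the real agent, sandbox, and deployment steps:

```python
def generate_transform_code(spec: str) -> str:
    # Stand-in: in production, the Data Transform Agent prompts an LLM
    # with the user's format description from the data shop.
    return f"def transform(row):\n    return row  # spec: {spec}"

def run_sandbox_tests(code: str) -> bool:
    # Stand-in: in production, generated code runs against sample data
    # in an isolated sandbox. Here we only check it defines transform().
    namespace = {}
    exec(code, namespace)
    return callable(namespace.get("transform"))

def deploy_as_lambda(code: str) -> str:
    # Stand-in: in production, packaging and deployment via the AWS SDK.
    return "deployed"

def orchestrate(spec: str) -> str:
    code = generate_transform_code(spec)
    if not run_sandbox_tests(code):      # branch on test result, like a Choice state
        return "rejected"
    return deploy_as_lambda(code)        # only tested code reaches deployment

print(orchestrate("uppercase all names"))
```

The branch is the important part: untested generated code never reaches deployment, no matter what the agent produced.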

Layer 4: Infrastructure

Everything runs on AWS. Not on principle, but because our enterprise clients use AWS and have compliance requirements that don't become simpler with on-premise or multi-cloud.

| Component | Service | Why |
| --- | --- | --- |
| Agent runtime | Fargate / Lambda | Serverless, auto-scales, pay-per-use |
| Workflow engine | Step Functions | State machines, visual debugging |
| Data store | S3, DynamoDB | Serverless, event-driven |
| Monitoring | CloudWatch + custom agents | Self-healing (agent analyzes logs) |
| Security | IAM, KMS, VPC | Enterprise standard, cross-account |
| CI/CD | CDK, GitHub Actions | Infrastructure as Code |

Layer 5: Self-Healing

The layer that holds everything together: agents that monitor other agents.

Our AWS Architect Agent monitors CloudWatch logs across all running services. It detects anomalies (latency spikes, error rates, unusual access patterns), creates root cause analyses, and posts the results to Slack. Simple problems (a cold-start timeout, a temporary rate limit) it resolves independently.

This isn't a monitoring dashboard with alert rules. It's an AI agent that reads, understands, and acts on logs. The difference: Rules detect known problems. Agents detect unknown problems.
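A sketch of the deterministic dispatch around that agent: known, simple failure patterns short-circuit to auto-remediation, and everything else is handed to the LLM for root cause analysis. The patterns and actions here are illustrative, not our production rules:

```python
# Known, simple failure patterns the agent may resolve on its own.
AUTO_RESOLVABLE = {
    "cold start timeout": "retry after warm-up",
    "rate limit": "back off and retry",
}

def triage(log_line: str) -> dict:
    """Dispatch a log anomaly: auto-remediate known patterns, escalate the rest."""
    lowered = log_line.lower()
    for pattern, action in AUTO_RESOLVABLE.items():
        if pattern in lowered:
            return {"action": action, "escalate": False}
    # Unknown problem: in production, the agent reads the surrounding logs,
    # drafts a root cause analysis, and posts it to Slack for humans.
    return {"action": "draft RCA and post to Slack", "escalate": True}

print(triage("Lambda invocation failed: cold start timeout after 10s"))
```

The rules handle the known problems; the agent behind the escalation path handles the unknown ones, which is exactly the distinction the text draws.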

What This Means for Client Projects

We don't just build this stack for ourselves. We build it for and with our clients.

At SiemensGPT, the entire stack runs in the Siemens AWS environment. Provider-agnostic, self-healing, 50 models, 10,000+ agents. Siemens engineers work with this stack daily. It belongs to them.

At the Siemens Energy AI Platform, we implemented the stack as a self-service platform. Teams create their own agents, choose models, define tools. Without FNTIO involvement. Forward-deployed engineering means: We show how it works, then the teams do it themselves.

That's the difference between a stack you buy and a stack you understand.


Jannik Frisch designed the provider-agnostic AI stack as Technical Advisor, running 50+ models and thousands of agents in enterprise production.
