
Our AI Stack in Practice.

No marketing version. The real stack behind 120,000 users and 50+ AI models in production.

Jannik Frisch, Technical Advisor · 6 min read

Here's our stack, as of March 2026. No marketing version. The real one.

Why Provider Agnosticism Isn't Optional

Our stack starts with a foundational decision we made in late 2023 with SiemensGPT: no lock-in to a single AI provider.

The reasoning was architectural at the time. Today it's existential. In 18 months, the model landscape changed fundamentally several times. GPT-4 was dominant, then Claude came along and was better for certain tasks. Then open-source models reached enterprise-grade quality. Then multimodal models. Then reasoning models.

Anyone who locked into OpenAI in 2023 had a problem in 2024. Anyone who locked into Anthropic in 2024 misses the advantages of Gemini for certain workloads in 2026.

Our solution: LiteLLM as the abstraction layer. A unified interface for all providers. Claude, GPT, Gemini, LLaMA, Mistral. Model switching in configuration, not in code. New providers integrated in hours, not weeks.

This comes at a cost: We can't immediately use every provider-specific feature. But that cost is significantly lower than a rewrite when the market shifts.
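A minimal sketch of what config-driven model selection looks like behind a unified interface like LiteLLM's. The task names, model IDs, and environment variables here are illustrative, not our production configuration:

```python
import os

# Task-to-model routing lives in configuration, not code.
# Model IDs follow LiteLLM's provider-prefixed naming, but the
# specific mapping below is an example, not our real config.
MODEL_CONFIG = {
    "reasoning": os.environ.get("REASONING_MODEL", "claude-3-opus-20240229"),
    "long_context": os.environ.get("LONG_CONTEXT_MODEL", "gemini/gemini-1.5-pro"),
    "sensitive": os.environ.get("SENSITIVE_MODEL", "ollama/llama3"),
}

def pick_model(task_type: str) -> str:
    """Resolve the model for a task from configuration, with a safe default."""
    return MODEL_CONFIG.get(task_type, MODEL_CONFIG["reasoning"])

# In production, the call itself goes through LiteLLM's unified interface:
#   from litellm import completion
#   response = completion(model=pick_model("reasoning"), messages=[...])
# Switching providers means changing configuration, not rewriting code.
print(pick_model("long_context"))
```

The point of the pattern: the call site never names a provider, so a market shift is a config change, not a rewrite.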

The Layers of Our Stack

Layer 1: Foundation Models

We currently run models from five providers in production:

| Provider | Models | Use Case |
| --- | --- | --- |
| Anthropic | Claude Opus, Sonnet, Haiku | Complex reasoning tasks, code generation, analysis |
| OpenAI | GPT-4o, o1 | Multimodal tasks, backward compatibility |
| Google | Gemini Pro, Flash | Long contexts, multimodal processing |
| Meta | LLaMA 3 | Self-hosted for data-sensitive workloads |
| AWS | Bedrock (all providers) | Managed inference, enterprise compliance |

Model selection isn't a matter of belief. It's an engineering decision. Claude is better for complex reasoning tasks. Gemini is better for very long documents. LLaMA is better when data must not leave the organization. We choose per task, not per project.

Layer 2: Agent Framework

Agents aren't chatbots with tools. Agents are software systems that independently execute tasks, make decisions, and interact with other systems.

Our agent framework is based on three principles:

Tool-based architecture. Every agent has a defined set of tools (API calls, database access, code execution). The tools define what the agent can do. The LLM decides when and how to use them.

Guardrails, not hope. Every agent has defined boundaries: maximum execution time, budget limits per invocation, whitelist for permitted actions. No agent can independently delete resources or send data to external services unless explicitly authorized.

Observability by default. Every agent invocation is logged: input, reasoning steps, tool calls, output. Not for debugging. For traceability. Enterprise clients need audit trails. We deliver them.
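The three principles above can be sketched as a thin wrapper around every tool call. Everything here (class names, limits, the `lookup` tool) is illustrative, not a real framework API:

```python
import time

class BudgetExceeded(Exception):
    pass

class AgentGuardrails:
    """Illustrative sketch: whitelist + budgets + audit log around tool calls."""

    def __init__(self, allowed_tools, max_seconds=30.0, max_cost_usd=1.0):
        self.allowed_tools = set(allowed_tools)  # whitelist: nothing else is callable
        self.max_seconds = max_seconds           # hard cap on execution time
        self.max_cost_usd = max_cost_usd         # hard cap on spend per invocation
        self.spent = 0.0
        self.started = time.monotonic()
        self.audit_log = []                      # every call recorded for traceability

    def run_tool(self, name, fn, *args, cost_usd=0.0):
        if name not in self.allowed_tools:
            raise PermissionError(f"tool {name!r} is not whitelisted")
        if time.monotonic() - self.started > self.max_seconds:
            raise TimeoutError("agent exceeded its execution-time budget")
        if self.spent + cost_usd > self.max_cost_usd:
            raise BudgetExceeded("agent exceeded its cost budget")
        result = fn(*args)
        self.spent += cost_usd
        self.audit_log.append({"tool": name, "args": args, "result": result})
        return result

guard = AgentGuardrails(allowed_tools={"lookup"}, max_cost_usd=0.10)
print(guard.run_tool("lookup", lambda x: x.upper(), "hello", cost_usd=0.01))
```

The LLM still decides *when* to call a tool; the wrapper decides *whether* it may, and leaves an audit trail either way.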

Frameworks we use: LangGraph for complex workflows, Google ADK for agent-to-agent communication, Microsoft Agent Framework for certain enterprise integrations. No single framework for everything, but the right tool for the right task.

Layer 3: Orchestration

A single agent solves a single problem. Productive enterprise systems need orchestration: agents that call other agents. Workflows that branch based on results. Fallback logic when an agent fails.

Our orchestration runs on AWS Step Functions. Every workflow is a state machine. Visually traceable, testable, versioned.

An example from the HR Data Hub: a user describes the desired data format in the self-service data shop. The orchestrator starts a Data Transform Agent that generates Python code. The code is tested in a sandbox. If the tests pass, it's deployed as a Lambda. Automatically. No human involved.
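A compressed sketch of that workflow in plain Python. In production each step is a state in a Step Functions state machine; the function bodies here are stand-ins for the real agent, sandbox, and deployment steps:

```python
def generate_transform_code(spec: str) -> str:
    # Stand-in: in production, the Data Transform Agent prompts an LLM
    # with the user's format description from the data shop.
    return f"def transform(row):\n    return row  # spec: {spec}"

def run_sandbox_tests(code: str) -> bool:
    # Stand-in: in production, generated code runs against sample data
    # in an isolated sandbox. Here we only check it defines transform().
    namespace = {}
    exec(code, namespace)
    return callable(namespace.get("transform"))

def deploy_as_lambda(code: str) -> str:
    # Stand-in: in production, packaging and deployment via the AWS SDK.
    return "deployed"

def orchestrate(spec: str) -> str:
    code = generate_transform_code(spec)
    if not run_sandbox_tests(code):      # branch on test result, like a Choice state
        return "rejected"
    return deploy_as_lambda(code)        # only tested code reaches deployment

print(orchestrate("uppercase all names"))
```

The branch is the important part: untested generated code never reaches deployment, no matter what the agent produced.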

Layer 4: Infrastructure

Everything runs on AWS. Not on principle, but because our enterprise clients use AWS and have compliance requirements that don't become simpler with on-premise or multi-cloud.

| Component | Service | Why |
| --- | --- | --- |
| Agent runtime | Fargate / Lambda | Serverless, auto-scales, pay-per-use |
| Workflow engine | Step Functions | State machines, visual debugging |
| Data store | S3, DynamoDB | Serverless, event-driven |
| Monitoring | CloudWatch + custom agents | Self-healing (agent analyzes logs) |
| Security | IAM, KMS, VPC | Enterprise standard, cross-account |
| CI/CD | CDK, GitHub Actions | Infrastructure as Code |

Layer 5: Self-Healing

The layer that holds everything together: agents that monitor other agents.

Our AWS Architect Agent monitors CloudWatch logs across all running services. It detects anomalies (latency spikes, error rates, unusual access patterns), creates root cause analyses, and posts the results to Slack. Simple problems (a cold-start timeout, a temporary rate limit) it resolves independently.

This isn't a monitoring dashboard with alert rules. It's an AI agent that reads, understands, and acts on logs. The difference: Rules detect known problems. Agents detect unknown problems.
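A sketch of the deterministic dispatch around that agent: known, simple failure patterns short-circuit to auto-remediation, and everything else is handed to the LLM for root cause analysis. The patterns and actions here are illustrative, not our production rules:

```python
# Known, simple failure patterns the agent may resolve on its own.
AUTO_RESOLVABLE = {
    "cold start timeout": "retry after warm-up",
    "rate limit": "back off and retry",
}

def triage(log_line: str) -> dict:
    """Dispatch a log anomaly: auto-remediate known patterns, escalate the rest."""
    lowered = log_line.lower()
    for pattern, action in AUTO_RESOLVABLE.items():
        if pattern in lowered:
            return {"action": action, "escalate": False}
    # Unknown problem: in production, the agent reads the surrounding logs,
    # drafts a root cause analysis, and posts it to Slack for humans.
    return {"action": "draft RCA and post to Slack", "escalate": True}

print(triage("Lambda invocation failed: cold start timeout after 10s"))
```

The rules handle the known problems; the agent behind the escalation path handles the unknown ones, which is exactly the distinction the text draws.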

What This Means for Client Projects

We don't just build this stack for ourselves. We build it for and with our clients.

At SiemensGPT, the entire stack runs in the Siemens AWS environment. Provider-agnostic, self-healing, 50 models, 10,000+ agents. Siemens engineers work with this stack daily. It belongs to them.

At the Siemens Energy AI Platform, we implemented the stack as a self-service platform. Teams create their own agents, choose models, define tools. Without FNTIO involvement. Forward-deployed engineering means: We show how it works, then the teams do it themselves.

That's the difference between a stack you buy and a stack you understand.


Jannik Frisch designed the provider-agnostic AI stack as Technical Advisor, running 50+ models and thousands of agents in enterprise production.
