Multi-Agent AI Systems

A team of specialized AI agents instead of a single general chatbot. Orchestration, multi-tier LLM routing, episodic memory, cost control and a complete audit trail. Internally we use the HybridCrew platform to deliver services to clients.

A single ChatGPT-style chatbot is a general-purpose tool. It understands language, generates text, answers questions — but the moment a task requires a sequence of actions, access to company databases, memory of previous interactions, or quality verification, its limits show.

A multi-agent AI system is a different architecture: a team of specialized agents, each with its own role, tools, memory, and operating strategy. The CEO assistant classifies email. The financial controller generates reports. The security reviewer scans code. The content writer drafts marketing copy. Everything is coordinated by an orchestrator that decides who gets which task.

Why multi-agent systems win

Specialization in AI works the same way as in business. A team of specialists delivers better results than one person who "knows a bit of everything". An agent focused on one task type, with optimized prompts, the right LLM, and access to the right tools, does the job better and more cheaply than a generalist model trying to guess the context from scratch.

Second advantage: cost control. Most tasks do not require the most powerful LLM. Simple classification, templated content generation, and data extraction from structured documents can all be handled by local models running on the client's GPU, with no per-token fees. Only the most complex decisions go to the strongest cloud models. The typical operating cost is a fraction of what it would be if every task went to the most powerful models.

Third: compliance and security. Every agent has least-privilege permissions. Every interaction is logged (audit trail). Personal data is anonymized before being sent to external models (Anoxy microservice). The whole architecture is designed in line with GDPR and the EU AI Act from the first line of code.

Components of an enterprise-grade multi-agent system

Nine elements that must work together for a multi-agent system to be production-ready inside a company.

Specialized agents

Every agent has one responsibility: CEO assistant, financial controller, security reviewer, backend developer, content writer. Specialization produces better outcomes than a single general chatbot.

Orchestrator

The central layer that decides which agent gets which task. The decision is based on intent classification, agent availability, LLM cost, and business context.
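
The sketch below illustrates one way such a decision could be made. It is a minimal example, not HybridCrew's API: the `Agent` class, the keyword table standing in for an intent classifier, and the `pick_agent` function are all hypothetical, and business context is left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    intents: frozenset[str]      # task types this agent handles
    available: bool = True
    cost_per_task: float = 0.0   # illustrative relative cost, not a real price


# Hypothetical keyword table standing in for a real intent classifier.
INTENT_KEYWORDS = {
    "finance": ("invoice", "report", "budget"),
    "security": ("vulnerability", "code review", "cve"),
    "email": ("inbox", "reply", "meeting"),
}


def classify_intent(task: str) -> str:
    """Return the first intent whose keywords appear in the task text."""
    text = task.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return "unknown"


def pick_agent(task: str, agents: list[Agent]) -> Agent | None:
    """Return the cheapest available agent that handles the task's intent."""
    intent = classify_intent(task)
    candidates = [a for a in agents if a.available and intent in a.intents]
    return min(candidates, key=lambda a: a.cost_per_task, default=None)


AGENTS = [
    Agent("financial_controller", frozenset({"finance"}), cost_per_task=0.02),
    Agent("security_reviewer", frozenset({"security"}), cost_per_task=0.05),
    Agent("ceo_assistant", frozenset({"email"}), cost_per_task=0.01),
]
```

A task that matches no intent returns `None`, which a real orchestrator would treat as a signal to ask for clarification or hand the task to a human.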

Multi-tier LLM routing

Small tasks → local model (Ollama, no per-token cost). Medium → cheaper cloud model. Complex → most powerful cloud models. This cuts cost sharply without lowering quality on the tasks that need the strongest models.
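
As a rough illustration of tiered routing, the sketch below maps a task to one of three tiers. The complexity scores, the thresholds, and the task type names are assumptions made up for this example; a production router would calibrate them on real usage data.

```python
from enum import Enum


class Tier(Enum):
    LOCAL = "local model"                      # e.g. served by Ollama on the client's GPU
    CLOUD_STANDARD = "cheaper cloud model"
    CLOUD_PREMIUM = "most powerful cloud model"


# Hypothetical base complexity per task type, on a 0-10 scale.
BASE_COMPLEXITY = {
    "classification": 1,
    "extraction": 2,
    "templated_content": 2,
    "report": 5,
    "legal_analysis": 9,
}


def estimate_complexity(task_type: str, input_tokens: int) -> int:
    """Score a task from 0 to 10; unknown task types default to a mid score."""
    base = BASE_COMPLEXITY.get(task_type, 6)
    # Long inputs add one point per 4,000 tokens, capped at three points.
    return min(10, base + min(3, input_tokens // 4000))


def route(task_type: str, input_tokens: int) -> Tier:
    """Pick the cheapest tier whose threshold covers the complexity score."""
    score = estimate_complexity(task_type, input_tokens)
    if score <= 3:
        return Tier.LOCAL
    if score <= 7:
        return Tier.CLOUD_STANDARD
    return Tier.CLOUD_PREMIUM
```

With these assumed numbers, a short classification task stays local, while the same extraction task moves to a cloud model once its input grows past 8,000 tokens.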

Episodic memory

Agents remember what they did before, what the outcomes were, and what worked. Over time they get better at repetitive tasks because each recorded interaction informs the next one.
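
One simple way to picture this is a log of past attempts that the agent consults before choosing how to approach a task. The sketch below is an assumption about how such a memory could look, not a description of the platform; `Episode`, `EpisodicMemory`, and the approach names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Episode:
    task_type: str
    approach: str    # e.g. which prompt or tool chain was used
    success: bool


class EpisodicMemory:
    def __init__(self) -> None:
        self._episodes: list[Episode] = []

    def record(self, episode: Episode) -> None:
        self._episodes.append(episode)

    def best_approach(self, task_type: str) -> str | None:
        """Return the approach with the highest success rate for this task type."""
        stats: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [successes, attempts]
        for ep in self._episodes:
            if ep.task_type == task_type:
                stats[ep.approach][0] += int(ep.success)
                stats[ep.approach][1] += 1
        if not stats:
            return None  # nothing recorded yet for this task type
        return max(stats, key=lambda name: stats[name][0] / stats[name][1])
```

A real implementation would persist episodes and weigh recent outcomes more heavily; the in-memory list keeps the example short.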

Semantic memory

Vector database of domain knowledge (Qdrant, pgvector). Agents can quickly find similar past cases, reference documents, company policies.
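
The lookup behind semantic memory is a nearest-neighbour search over embedding vectors. The sketch below shows the idea in plain Python with hand-written three-dimensional vectors; in practice the vectors come from an embedding model and the search runs inside a vector database such as Qdrant or pgvector. The document titles and the `search` function are illustrative.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy "embeddings" for illustration only; real ones have hundreds of dimensions.
DOCUMENTS = {
    "travel expense policy": [0.9, 0.1, 0.0],
    "data retention policy": [0.1, 0.9, 0.1],
    "incident response runbook": [0.0, 0.2, 0.9],
}


def search(query_vec: list[float], top_k: int = 2) -> list[str]:
    """Return the titles of the top_k documents most similar to the query."""
    ranked = sorted(
        DOCUMENTS,
        key=lambda title: cosine(query_vec, DOCUMENTS[title]),
        reverse=True,
    )
    return ranked[:top_k]
```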

PII anonymization (Anoxy)

Before content reaches external LLMs, the dedicated Anoxy microservice scans and anonymizes personal data. GDPR compliance with no functional trade-offs.
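
The general pattern is to swap personal data for placeholders before the request leaves and to restore it in the response. The sketch below shows that pattern with two illustrative regular expressions. It does not describe how Anoxy works internally: the patterns, the placeholder format, and the function names are assumptions, and production-grade detection needs far broader coverage than regexes for emails and IBAN-like numbers.

```python
from __future__ import annotations

import re

# Illustrative patterns only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}(?: ?\d{4}){3,7}\b"),
}


def anonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace matches with placeholders; return the new text and the mapping."""
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        def substitute(match: re.Match, label: str = label) -> str:
            token = f"<{label}_{len(mapping) + 1}>"
            mapping[token] = match.group(0)
            return token
        text = pattern.sub(substitute, text)
    return text, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into a model response."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

Only the placeholder text is sent to the external model; the mapping stays inside the client's environment.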

Audit trail

Every interaction between agents is recorded: who, to whom, what was asked, what answer was given, which LLMs were used, what the cost was. Full observability.
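
The fields listed above map naturally onto one record per interaction. The sketch below is a hypothetical record shape serialized as JSON lines; the field names and the append-only log format are assumptions, not the platform's actual schema.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    sender: str               # who
    recipient: str            # to whom
    request: str              # what was asked
    response: str             # what answer was given
    models: tuple[str, ...]   # which LLMs were used
    cost_usd: float           # what the cost was
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_log_line(record: AuditRecord) -> str:
    """Serialize one record as a single JSON line for an append-only log."""
    return json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)
```

Making the record immutable (`frozen=True`) is a small design choice that prevents a logged interaction from being modified in code after the fact.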

Monitoring and cost control

Limits per agent, per user, per organization. Real-time cost dashboard. Alerts on unusual usage spikes. Routing optimization based on data.
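
A limit check of this kind can be sketched as a guard that every LLM call passes through. The `BudgetGuard` class below is hypothetical, and the limits are arbitrary example values; alerting on spikes and the dashboard are out of scope here.

```python
from __future__ import annotations

from collections import defaultdict


class BudgetGuard:
    """Tracks spend per agent, user, and organization and enforces limits."""

    def __init__(self, limits: dict[tuple[str, str], float]) -> None:
        # Keys look like ("agent", "writer") or ("org", "acme").
        self.limits = limits
        self.spent: dict[tuple[str, str], float] = defaultdict(float)

    def charge(self, agent: str, user: str, org: str, cost: float) -> bool:
        """Record the cost and return True, or return False if a limit would be exceeded."""
        keys = [("agent", agent), ("user", user), ("org", org)]
        for key in keys:
            limit = self.limits.get(key)
            if limit is not None and self.spent[key] + cost > limit:
                return False  # reject before recording anything
        for key in keys:
            self.spent[key] += cost
        return True
```

A rejected call records no spend at all, so a request blocked by the organization limit does not eat into the agent's own budget.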

Human escalation

Low confidence score, critical financial or legal decision, edge case → automatic escalation to a human operator with full context.
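
These three triggers can be written as a simple rule check. In the sketch below the threshold of 0.8, the category names, and the `Decision` type are assumptions chosen for illustration; each process would need its own tuning.

```python
from __future__ import annotations

from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.8                     # assumed value, tune per process
CRITICAL_CATEGORIES = {"financial", "legal"}


@dataclass
class Decision:
    category: str
    confidence: float          # 0.0 to 1.0, as reported by the agent
    is_edge_case: bool = False


def escalation_reasons(decision: Decision) -> list[str]:
    """Return why a decision needs a human; an empty list means it can proceed."""
    reasons = []
    if decision.confidence < CONFIDENCE_THRESHOLD:
        reasons.append("low confidence")
    if decision.category in CRITICAL_CATEGORIES:
        reasons.append("critical decision")
    if decision.is_edge_case:
        reasons.append("edge case")
    return reasons
```

Returning the reasons rather than a bare yes or no gives the human operator part of the context for the escalation.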

Applications inside a company

Six areas where multi-agent AI systems deliver measurable business value. Each is rolled out as a 4-8 week pilot.

CEO assistant

Classifies and answers emails, books meetings, prepares briefs before calls, summarizes long documents, monitors deadlines. Typically saves the CEO 10-15 hours of admin per week.

Compliance and legal monitoring

Continuous monitoring of legal changes, classification of impact on the company, alerts on new obligations. Generating initial GDPR, EU AI Act, ISO 27001 reports. Drafts of policies and procedures.

Software development

Code review, test generation, documentation writing, refactoring, database migration generation. Two or three people with agents can deliver the output of an 8-10 person team.

Customer service

Ticket classification, automatic answers to repeatable questions (based on the knowledge base), escalation to humans for complex cases. First-response time cut from hours to minutes.

Document analysis

Extracting data from contracts, invoices, quotes. Comparing commercial terms. Detecting inconsistencies and risks. Generating summaries and reports for the legal team.

Sales and marketing

Social media and brand mention monitoring, sentiment classification, generating responses (reviewed by humans before publishing), drafting marketing content.

Chatbot vs. multi-agent system

Aspect | Single chatbot (ChatGPT/Copilot) | Multi-agent system
Specialization | General model, "knows a bit of everything" | Specialized agents per domain
Access to company data | Limited (copy-paste into the chat window) | Native (integration with CRM, ERP, databases)
Memory | Chat session (typically 1-2 hours) | Episodic + semantic memory (persistent)
Cost routing | One model for all tasks | Multi-tier (local → cloud → premium)
Action execution | Generates text, does not perform actions | Calls APIs, writes to databases, sends emails
Audit trail | None (or rudimentary) | Complete: every interaction recorded
PII anonymization | Depends on the user | Enforced, automatic (Anoxy)
Compliance (GDPR, EU AI Act) | Hard to prove | Built into the architecture

Reference platform: HybridCrew

HybridCrew is an internal ESKOM AI platform that we use to deliver services to clients. It orchestrates dozens of specialized AI agents — each with its own role (e.g. organization assistant, financial controller, project manager, backend developer, security reviewer), a Polish-language interface, access to tools, and integrations with business systems.

Key technical features:

  • Multi-tier LLM routing: from local models with no per-token fees (Ollama) to the most powerful cloud models. Model selection is automatic, based on task complexity.
  • Wide integrations: Gmail, Slack, Jira, Confluence, Microsoft Graph, Salesforce, Airtable, and many more. We can connect any client API.
  • Email Intelligence: automatic classification of CEO email, intent recognition, generating answers for approval.
  • Anoxy (PII anonymization): a dedicated microservice that anonymizes personal data before it is sent to external models. GDPR compliance with no compromises.
  • Episodic and semantic memory: agents learn from experience and can reach into domain knowledge in the vector database.
  • Cost monitoring: real-time cost dashboard per agent, per user, per organization. Limits and alerts on unusual spikes.
  • EU AI Act compliance: the system is classified as limited-risk AI and meets the transparency obligations of Art. 50 (an AI banner, marking of generated content, export metadata).

Frequently asked questions

What is a multi-agent system?
A multi-agent AI system is an architecture where a few or several dozen specialized AI agents work together to solve tasks. Each agent has its own role (e.g. CEO assistant, financial controller, security reviewer, backend developer), its own tools (APIs, database access, the internet), memory (episodic — what it did before, semantic — domain knowledge), and operating strategy. Instead of a single general chatbot, the company gets an AI team with a clear division of responsibilities.
How is this different from a single chatbot like ChatGPT?
A single chatbot handles simple text tasks well, but the moment a task requires access to company databases, integration with business systems (CRM, ERP, email), executing a sequence of steps, memory of previous interactions, or quality verification — the chatbot is no longer enough. A multi-agent system solves this with specialization (the finance agent knows accounting, the legal agent knows GDPR), collaboration (agents can consult each other), and orchestration (a mechanism that decides which agent gets which task).
What tasks can be delegated to a multi-agent system?
In practice: managing the CEO's calendar and inbox, classifying and answering customer emails, monitoring legal changes, preparing financial reports, code review of pull requests, generating documentation, automating employee onboarding, handling support tickets, document analysis (contracts, invoices, quotes), social media and brand mention monitoring, generating marketing content. The more repeatable and procedural a task is, the better suited it is to automation.
Are multi-agent systems expensive to operate?
It depends on the cost architecture. If every agent uses the most powerful LLM for every task, monthly cost ramps up quickly. That is why we apply multi-tier LLM routing: small tasks go to local models (Ollama on the client's GPU — operating cost close to zero), medium tasks go to cheaper cloud models, only the most complex decisions go to the most powerful models. Thanks to that, a typical client pays a fraction of what uniform use of the most powerful models would cost.
How do agents communicate with each other?
Two main paths: synchronous (agent A asks agent B a question and waits for the answer) and asynchronous (agent A pushes a task to a queue, agent B processes it at its own pace, agent A gets notified of the result). The central orchestration platform manages routing, preserves conversation history (audit trail), and controls cost (token limits per agent, per user). All communication is logged — every interaction between agents can be replayed and the path to a specific decision can be inspected.
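
Both paths can be sketched with Python's `asyncio`. In this minimal example `agent_b` stands in for a real agent backed by an LLM call, and an in-process `asyncio.Queue` stands in for a message broker; the task texts are made up. Waiting on `queue.join()` plays the role of the result notification.

```python
from __future__ import annotations

import asyncio


async def agent_b(question: str) -> str:
    await asyncio.sleep(0)                    # stands in for an LLM call
    return f"answer to: {question}"


async def worker(queue: asyncio.Queue, results: list[str]) -> None:
    # Asynchronous path: agent B drains the queue at its own pace.
    while True:
        task = await queue.get()
        results.append(await agent_b(task))
        queue.task_done()


async def main() -> tuple[str, list[str]]:
    # Synchronous path: agent A asks and waits for the reply.
    direct = await agent_b("Is invoice 17 paid?")

    queue: asyncio.Queue = asyncio.Queue()
    results: list[str] = []
    consumer = asyncio.create_task(worker(queue, results))
    for task in ("summarize contract", "classify ticket"):
        queue.put_nowait(task)                # agent A continues without waiting
    await queue.join()                        # resumes once every queued task is done
    consumer.cancel()                         # stop the worker loop
    return direct, results
```

Running `asyncio.run(main())` returns the direct answer plus the two queued results in the order they were submitted.
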
What about data security in a multi-agent system?
Three layers of protection. First: PII anonymization (personal data, account numbers, tax IDs, addresses) before anything is sent to external LLMs; the dedicated Anoxy microservice scans content before it leaves. Second: agent isolation. Every agent has least-privilege permissions and cannot see data outside its domain. Third: the option to run on the client's infrastructure. Models can run locally (Ollama on GPU), with no data leaving the client's network. The setup is GDPR-compliant and aligned with EU AI Act guidance.
Can agents make mistakes? What then?
Yes. Every LLM can hallucinate, make logical errors, or misinterpret context. Mitigation strategies: 1) result validation (e.g. the finance agent must return numbers in a specific format, and a validator checks compliance); 2) double-checking for critical decisions (a second agent independently verifies the first one's result); 3) human escalation (on a low confidence score or in unusual cases); 4) audit trail (every decision is recorded, so it can be analyzed, reversed, and used to improve the prompt). Critical financial and legal decisions are never autonomous: they require human approval.
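
Result validation, the first strategy, might look like the sketch below. The schema (`net`, `vat`, `gross`) and the function name are hypothetical; the point is that the validator checks both the format and the internal consistency of what the finance agent returns.

```python
from __future__ import annotations

import json
from decimal import Decimal, InvalidOperation

REQUIRED_FIELDS = ("net", "vat", "gross")     # hypothetical report schema


def validate_finance_result(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the result passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(data, dict):
        return ["expected a JSON object"]

    errors: list[str] = []
    amounts: dict[str, Decimal] = {}
    for name in REQUIRED_FIELDS:
        try:
            value = Decimal(str(data[name]))
        except (KeyError, InvalidOperation):
            value = None
        if value is None or not value.is_finite():
            errors.append(f"missing or non-numeric field: {name}")
        else:
            amounts[name] = value

    # Cross-check: an LLM can return well-formed numbers that do not add up.
    if not errors and amounts["net"] + amounts["vat"] != amounts["gross"]:
        errors.append("net + vat does not equal gross")
    return errors
```

`Decimal` is used instead of `float` so that amounts such as 100.10 + 23.02 compare exactly.
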
What does a multi-agent rollout in a company look like?
Typically four phases. 1) Discovery (2-4 weeks): identifying processes for automation, ROI assessment for each, picking 2-3 pilot candidates. 2) Pilot (4-8 weeks): deploying the first agents for selected processes, measuring impact, fine-tuning. 3) Scaling (3-6 months): expanding to more processes and departments, integration with existing systems. 4) Optimization (continuous): refining agents based on production data, adding new roles, reducing LLM cost.
Will a multi-agent system replace employees?
It replaces specific tasks, not people. The most common outcome: employees reclaim time (typically 30-50% in administrative departments), which they can spend on work that requires human judgment, creativity, and relationship-building. Companies typically do not lay off staff as a result; more often they grow faster, with the same team handling more projects. The exception is repetitive, low-value work (e.g. manually copying data between systems): those tasks disappear, and nobody misses them.
What technologies power multi-agent systems?
Most common frameworks: Microsoft AutoGen, CrewAI, LangGraph, Haystack Agents. LLMs: Anthropic Claude, OpenAI GPT, local Llama and Mistral, and the Polish Bielik. Vector databases for semantic memory: Qdrant, Weaviate, pgvector. Message queues for asynchronous communication: Redis, RabbitMQ, Kafka. Monitoring: Prometheus + Grafana, Sentry, OpenTelemetry. At ESKOM AI we combine all of this into a single internal platform (HybridCrew) with full observability, cost control and compliance.

First pilot in 4-8 weeks

We pick 2-3 business processes with the highest ROI potential and roll out pilot agents. We measure impact, fine-tune, and decide on scaling.