AI Software Development

A proven process for automating software development with a team of AI agents — from requirements analysis, through coding and multi-layer testing (unit, integration, E2E, security, performance), all the way to production deployment with a complete audit trail.

We operate a team of specialized AI agents that participates in every stage of the software development lifecycle — from requirements analysis, through architecture design, coding and multi-layer testing, all the way to code review, documentation and deployment with a complete audit trail.

This is how we build our own ESKOM AI products — the HybridCrew multi-agent platform, the Compliance audit system, the KRS+CRBR microservice and a portfolio of integrations. We apply the same process in client projects: both for greenfield microservices and for legacy system modernization.

This article describes how it works in practice: which tasks the agents take over, which remain with humans, what tests we run, and why this process is repeatable across project types.

Why automate software development?

A classic software development cycle (analysis → code → tests → review → deploy) typically takes 2-4 weeks for a medium-sized feature in a mature team. Most of that time goes to repetitive tasks: writing boilerplate, generating unit tests, reviewing changes, updating documentation, generating database migrations. All of them are automation-friendly.

The goal of our process is simple: two or three people working with AI agents deliver the value of an 8-10 person team — without burnout, with higher quality (more tests, better code review, complete documentation) and shorter time-to-market.

This is not "AI will replace developers". It is "developers with AI will replace developers without AI". Experienced engineers remain essential — they design architecture, make strategic decisions, review complex changes. AI agents take over the routine.

The six-stage process

The pipeline from requirements to production. Each stage is executed by specialized AI agents, while humans supervise and approve key decisions.

1. Requirements analysis and architecture

AI agents analyze business documentation, customer conversations (from transcripts), and existing code. They propose a microservice architecture, database schema, endpoint list, and permission model. A human (CTO/architect) reviews and approves the proposal before coding starts.

2. Writing code (TDD)

Tests first, then implementation. A backend agent writes APIs in FastAPI/Express, a frontend agent writes React components. Every change is a separate pull request with a clean commit message. Coding standards (Black, ESLint, Prettier) are enforced automatically.
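
To make the tests-first flow concrete, here is a minimal sketch; the module, endpoint and schema are hypothetical, not taken from any real project. The pytest test is written first and fails, then the FastAPI handler is written to make it pass.

```python
# tests/test_reports.py - written FIRST, red until the endpoint exists
from fastapi.testclient import TestClient
from app.main import app  # hypothetical application module

client = TestClient(app)

def test_quarterly_report_returns_totals():
    resp = client.get("/reports/2024-Q1")
    assert resp.status_code == 200
    assert resp.json()["total"] >= 0

def test_invalid_period_is_rejected():
    resp = client.get("/reports/2024-Q5")
    assert resp.status_code == 422


# app/main.py - the minimal implementation that turns both tests green
import re
from fastapi import FastAPI, HTTPException

app = FastAPI()

@app.get("/reports/{period}")
def quarterly_report(period: str):
    if not re.fullmatch(r"\d{4}-Q[1-4]", period):
        raise HTTPException(status_code=422, detail="invalid period")
    return {"period": period, "total": 0}
```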

3. Multi-layer testing

Unit (pytest, Jest), integration (testcontainers with real PostgreSQL), E2E (Playwright), UI snapshot, security (OWASP, gitleaks, bandit), performance (k6/locust), accessibility (axe). Every PR runs the full pipeline — a failing test blocks the merge.

4. AI code review

A SecurityReviewer agent scans for OWASP Top 10 issues, a QualityReviewer agent checks readability and patterns, an ArchitectureReviewer agent verifies consistency with the rest of the system. Edge cases are escalated to humans.
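
A minimal sketch of how this orchestration can look, assuming a generic ask_llm(prompt) -> str client; the reviewer roles mirror the paragraph above, but the prompts and code are illustrative, not our production configuration.

```python
# review.py - hypothetical multi-reviewer orchestration over a single diff
from typing import Callable

REVIEWERS = {
    "SecurityReviewer": "Scan this diff for OWASP Top 10 issues. Reply with JSON findings.",
    "QualityReviewer": "Review this diff for readability and patterns. Reply with JSON findings.",
    "ArchitectureReviewer": "Check this diff for consistency with existing module boundaries. Reply with JSON findings.",
}

def review_diff(diff: str, ask_llm: Callable[[str], str]) -> dict[str, str]:
    # each reviewer sees the same diff but answers from its own specialization;
    # anything a reviewer cannot settle confidently is escalated to a human
    return {name: ask_llm(f"{prompt}\n\n{diff}") for name, prompt in REVIEWERS.items()}
```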

5. Documentation and CHANGELOG

Every change in logic = version bump + entry in CHANGELOG.md in Keep a Changelog format. API documentation (OpenAPI/Swagger) is generated automatically. CLAUDE.md is updated after every session with new lessons learned.

6. Deployment with Change Request

Deployment always goes through Git (NEVER direct scp). First the test environment with Playwright verification, then production after CR approval. The deploy script includes a rollback plan (<5 min) and health checks.
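
As an illustration, a minimal post-deploy gate in Python; the URL, script name and rollback mechanics are assumptions, since the real script depends on the deployment target. If health checks keep failing, the previous release is redeployed.

```python
# deploy_gate.py - hypothetical post-deploy health check with rollback
import subprocess
import sys
import time
import urllib.request

HEALTH_URL = "https://test.example.com/health"  # assumed health endpoint

def healthy(url: str, attempts: int = 10, delay: float = 3.0) -> bool:
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # service not up yet, keep polling
        time.sleep(delay)
    return False

if __name__ == "__main__":
    if not healthy(HEALTH_URL):
        # rollback: redeploy the previous release tag, target well under 5 minutes
        subprocess.run(["./deploy.sh", "--tag", "previous-release"], check=True)
        sys.exit(1)
```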

What does the company gain?

Thousands of automated tests

Every production project carries from several thousand to tens of thousands of tests — unit, integration, E2E, security, performance. Regressions are caught in CI before they reach users.

Complete audit trail

Every change in code, database, or configuration is recorded: Git, audit log in the database, CHANGELOG, Change Request. Meets ISO 27001, EU AI Act and GDPR requirements.

Team scalability

Two or three people with AI agents deliver the value of an 8-10 person team. Without burnout, with higher quality and shorter timelines.

Escalation to stronger models

LLM routing picks the right model for each task: minor changes — local Ollama (zero cost), complex architecture — Claude Opus. Cost and quality optimization in one.
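
A minimal sketch of such a router; the model identifiers and task categories are illustrative assumptions, not our exact routing table.

```python
# llm_router.py - hypothetical task-to-model routing
from dataclasses import dataclass

@dataclass
class Task:
    kind: str           # e.g. "typo-fix", "refactor", "architecture"
    changed_lines: int

def pick_model(task: Task) -> str:
    # small mechanical edits: local model, zero marginal cost
    if task.kind in {"typo-fix", "rename", "docstring"} and task.changed_lines < 50:
        return "ollama/local-coder"
    # design-heavy work: strongest available model
    if task.kind == "architecture":
        return "anthropic/claude-opus"
    # everything else: balanced default
    return "anthropic/claude-sonnet"
```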

Repeatability and standards

Every project follows the same standards: feature branch workflow, squash merge, Conventional Commits, CHANGELOG, EU AI Act, GDPR. A new developer understands the structure on day one.

Security by default

Gitleaks on pre-commit + CI, secrets in HashiCorp Vault, private repositories, Keycloak SSO, Tailscale VPN for internal services. No trade-off between security and speed.

Multi-layer testing — the foundation of quality

Every change in production code passes through a complete test pipeline. No exceptions — even fixing a typo in a comment triggers CI, because the test pipeline is enforced by a Git hook, not left to a developer's discretion.

  • Unit tests: pytest, Jest, vitest. Cover individual functions and classes. >80% coverage on critical code.
  • Integration tests: testcontainers with real instances of PostgreSQL, Redis, Vault. Mocks only for third-party external APIs (see the sketch after this list).
  • End-to-end (E2E) tests: Playwright in Firefox (default), Chrome (optional). Simulate full user paths: login → action → verification (see the E2E sketch below).
  • UI tests (snapshot, accessibility): Playwright + axe-core. WCAG 2.0 AA as the baseline, Lighthouse 100/100/100/100 as the target.
  • Security tests: OWASP Top 10 (semgrep, bandit, eslint-plugin-security), gitleaks (secret scanning on pre-commit and CI), trivy (Docker image scanning).
  • Performance tests: k6 or locust for load tests, checking p95/p99 response times under stress.
  • Regression tests: the full suite runs before every production deploy. Every reported bug becomes a regression test.
  • Smoke tests: a minimal set of 5-10 tests executed after the production deploy, verifying the application actually came up.
  • Acceptance tests: business tests (Cucumber/Gherkin) confirming the requirement has been met.
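
A minimal integration-test sketch in the spirit of the list above, using testcontainers with a real PostgreSQL; the table and assertions are hypothetical.

```python
# tests/test_orders_integration.py - real PostgreSQL via testcontainers, no mocks
import pytest
import sqlalchemy
from testcontainers.postgres import PostgresContainer

@pytest.fixture(scope="session")
def engine():
    # one real PostgreSQL in Docker for the whole test session
    with PostgresContainer("postgres:16") as pg:
        yield sqlalchemy.create_engine(pg.get_connection_url())

def test_order_roundtrip(engine):
    with engine.begin() as conn:
        conn.execute(sqlalchemy.text(
            "CREATE TABLE orders (id serial PRIMARY KEY, total numeric NOT NULL)"))
        conn.execute(sqlalchemy.text("INSERT INTO orders (total) VALUES (99.50)"))
        total = conn.execute(sqlalchemy.text("SELECT total FROM orders")).scalar_one()
    assert float(total) == 99.50
```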

A failing test = blocked merge. No exceptions. If a test is "flaky" (unstable), a diagnostic agent analyzes the root cause and fixes the test or the code, but never removes the test without a human decision.
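
To round out the list above, a full-path E2E test can look like this in Playwright's Python API; the URL and selectors are hypothetical.

```python
# tests/test_login_e2e.py - hypothetical login-to-dashboard path
from playwright.sync_api import sync_playwright

def test_login_flow():
    with sync_playwright() as p:
        browser = p.firefox.launch()  # Firefox is the default in our pipeline
        page = browser.new_page()
        page.goto("https://test.example.com/login")
        page.fill("#username", "demo")
        page.fill("#password", "demo-password")
        page.click("button[type=submit]")
        # verification: the dashboard heading is visible after login
        assert page.text_content("h1") == "Dashboard"
        browser.close()
```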

Typical use cases

The patterns we apply most often. Each comes with its own set of agents, tools and templates. Time-to-value measured in weeks, not months.

Legacy system modernization

  • Old monolithic application (PHP/.NET, no tests, hard to maintain)
  • Agents decompose the monolith into microservices (incremental, no downtime)
  • Generate characterization tests that capture current behavior before refactoring (see the sketch after this list)
  • Data migration with a full audit trail and rollback plan
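
The characterization tests mentioned above pin down what the legacy code does today, bugs included, so the refactoring can be verified against it. A minimal sketch; the module and recorded values are hypothetical.

```python
# tests/test_characterization.py - freeze current behavior before refactoring
import pytest
from legacy.pricing import calculate_discount  # hypothetical legacy module

# input/output pairs recorded from the RUNNING system, not from the spec:
# the point is to preserve today's behavior exactly, bugs included
RECORDED = [
    ((100.0, "GOLD"), 85.0),
    ((100.0, "NONE"), 100.0),
    ((0.0, "GOLD"), 0.0),
]

@pytest.mark.parametrize("args,expected", RECORDED)
def test_matches_recorded_behavior(args, expected):
    assert calculate_discount(*args) == expected
```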

New enterprise microservice

  • Specification on input (Jira ticket, PRD, meeting transcript)
  • Architecture → code → tests → review → deploy in 2-3 weeks
  • Integration with existing SSO (Keycloak), audit log, monitoring
  • Full EU AI Act and GDPR compliance from day one

System integration

  • Connecting ERP, CRM, KRS, Microsoft Graph, IBM, Cisco, external partners
  • Agents write adapters, mappings, retry/backoff, idempotency (see the sketch after this list)
  • Integration tests on real endpoints (sandbox APIs)
  • Monitoring (Prometheus + Grafana) and alerts (Sentry) wired in automatically
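
A minimal sketch of the retry/backoff and idempotency pattern from the list above; the endpoint and header contract are assumptions, since real adapters follow whatever the partner API defines.

```python
# adapter.py - hypothetical outbound call with retry, backoff and an idempotency key
import time
import uuid
import requests

def post_with_retry(url: str, payload: dict, attempts: int = 5) -> requests.Response:
    # the same key on every retry lets the remote system deduplicate the request
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(attempts):
        try:
            resp = requests.post(url, json=payload, headers=headers, timeout=10)
            if resp.status_code < 500:
                return resp          # success, or a client error not worth retrying
        except requests.ConnectionError:
            pass                     # transient network failure, retry
        time.sleep(2 ** attempt)     # exponential backoff: 1s, 2s, 4s, ...
    raise RuntimeError(f"gave up on {url} after {attempts} attempts")
```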

Multi-tenant platforms

  • Multi-client SaaS with full data isolation: per-tenant schema or row-level security (see the sketch after this list)
  • Automated client onboarding (Keycloak provisioning, database, roles)
  • Billing based on SSO Billing SDK (token usage tracking, fail-open)
  • Compliance: GDPR, ISO 27001, EU AI Act audit-ready
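
For the row-level-security variant, a sketch of how tenant scoping can work in PostgreSQL, driven from Python with psycopg; the table, column and setting names are hypothetical.

```python
# tenancy.py - hypothetical per-tenant row scoping via PostgreSQL RLS
import psycopg

SETUP_SQL = """
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON documents
    USING (tenant_id = current_setting('app.tenant_id')::uuid);
"""

def fetch_documents(conn: psycopg.Connection, tenant_id: str) -> list:
    with conn.transaction(), conn.cursor() as cur:
        # set_config(..., true) scopes the setting to this transaction only
        cur.execute("SELECT set_config('app.tenant_id', %s, true)", (tenant_id,))
        cur.execute("SELECT id, title FROM documents")  # the policy filters rows
        return cur.fetchall()
```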

Comparison: classic team vs. AI-driven process

| Aspect | Classic team (8-10 people) | Team with AI agents (2-3 people) |
|---|---|---|
| Time-to-market (average feature) | 2-4 weeks | 3-7 days |
| Test coverage | 40-60% (when the team has time) | >80% by default (tests generated alongside code) |
| Code review | 1 person, average 30-60 min | 3 agents (security, quality, architecture) + human for complex changes |
| Documentation | Often incomplete, "added later" | Generated alongside code (OpenAPI, README, CHANGELOG) |
| Audit trail | Git history | Git + audit log in database + CHANGELOG + Change Request |
| Scaling | Linear (more people = higher communication cost) | Non-linear (more agents = same number of supervising people) |
| Compliance (EU AI Act, GDPR, ISO 27001) | Often external audit after the fact | Built into the process from day one |

Frequently asked questions

What is automated AI software development?
It is a process where specialized AI agents participate in every stage of the software development lifecycle: from requirements analysis, through architecture design, coding, automated tests (unit, integration, E2E, security, performance, regression), to code review and production deployment. Humans still supervise the process and make key decisions, but routine tasks (writing code, generating tests, refactoring, documentation) are executed by AI agents while preserving the agreed-upon quality standards.
How is this different from classic programming with Copilot?
Copilot is autocomplete — it helps write individual lines of code. AI software development is full orchestration: one agent plans the architecture, another writes the code, a third writes the tests, a fourth does code review, a fifth deploys. Each has its own specialization, episodic memory (it learns from prior projects), tools and context. The result: a much larger scale of automation than with a single Copilot, while keeping enterprise standards (tests, security, audit trail).
What types of tests does this process run?
Every kind of test that mature development teams use: unit, integration, end-to-end (E2E), UI (Playwright), security (OWASP Top 10, gitleaks), performance (load), regression, smoke and acceptance. Tests are written before or alongside the code (TDD), and every change must pass the full pipeline.
Does AI deploy code to production on its own?
No — not automatically. Production deployments require an approved Change Request (CR) and a human decision. AI agents prepare change documentation, run regression tests, generate deploy scripts with rollback plans, but the final production rollout requires operator approval. This rule is deliberate — it minimizes the risk of unexpected outcomes and preserves a complete audit trail.
Does this process work for enterprise projects?
Yes. We use it on our own products, including the HybridCrew multi-agent platform, the consulting platform with SSO, PostgreSQL-backed microservices, integrations with external systems (KRS, MS Graph, IBM, Keycloak). Every project has its own CI/CD pipeline, dev/test/prod environments, monitoring and audit log. The process scales from a single microservice to a multi-container platform.
How long does it take to roll this process out in our company?
It depends on the context. For a small team (1-3 developers), integration with the existing repository and CI/CD pipeline typically takes 2-4 weeks: audit, agent configuration, alignment with coding standards, training. For larger organizations, pilot projects (one team, one microservice) take 6-8 weeks, followed by gradual expansion to additional teams.
What about source code security?
Client repositories are never handed to external services without explicit consent. By default, the entire process (AI agents, LLM models, vector database, audit log) runs in the client's infrastructure or in the ESKOM AI private cloud with full isolation. Secrets are managed via HashiCorp Vault, code is scanned by gitleaks before every commit, and all repositories are private by default.
Will you replace our development team?
No. Experienced developers are essential — they design architecture, make decisions, review complex changes, solve unusual problems. AI agents take over repetitive, automation-friendly tasks: writing boilerplate, generating tests, documentation, refactoring, first-pass code review. The goal: two or three people with AI deliver the value of an 8-10 person team — without burnout, with higher quality and a full audit trail.
How much does AI software development cost?
Pricing is always project-specific and depends on scale, billing model (platform subscription vs. dedicated project), required integrations, and whether the agents run on local LLM models (Ollama on the client's GPU — lower operating cost) or in the cloud (Anthropic, OpenAI — higher flexibility). In pilots, we aim to achieve return on investment within the first quarter after full launch.
What are the typical signals that a company is ready for this process?
The best results come from teams that already have: a version-controlled repository (Git), defined coding standards, a basic CI/CD pipeline, clearly documented requirements (Jira/Linear/your own), and a code review culture. Missing one of these does not block the rollout — we start with an audit and foundational work. The hardest starting points are organizations with no version control or with production code that nobody tests.

Ready for a pilot?

We start with an audit of the existing process and a pilot on a selected microservice. First results visible within 2-4 weeks. No long-term contracts required.