AI Software Development
A proven process for automating software development with a team of AI agents — from requirements analysis, through coding and multi-layer testing (unit, integration, E2E, security, performance), all the way to production deployment with a complete audit trail.
We operate a team of specialized AI agents that takes part in every stage of the software development lifecycle: requirements analysis, architecture design, coding, multi-layer testing, code review, documentation, and deployment, all with a complete audit trail.
This is how we build our own ESKOM AI products — the HybridCrew multi-agent platform, the Compliance audit system, the KRS+CRBR microservice and a portfolio of integrations. We apply the same process in client projects: both for greenfield microservices and for legacy system modernization.
This article describes how it works in practice: which tasks the agents take over, which remain with humans, what tests we run, and why this process is repeatable across project types.
Why automate software development?
A classic software development cycle (analysis → code → tests → review → deploy) typically takes 2-4 weeks for a medium-sized feature in a mature team. Most of that time goes to repetitive tasks: writing boilerplate, generating unit tests, reviewing changes, updating documentation, generating database migrations. All of them are automation-friendly.
The goal of our process is simple: two or three people working with AI agents deliver the value of an 8-10 person team — without burnout, with higher quality (more tests, better code review, complete documentation) and shorter time-to-market.
This is not "AI will replace developers". It is "developers with AI will replace developers without AI". Experienced engineers remain essential — they design architecture, make strategic decisions, review complex changes. AI agents take over the routine.
The six-stage process
The pipeline from requirements to production. Each stage is executed by specialized AI agents, while humans supervise and approve key decisions.
Requirements analysis and architecture
AI agents analyze business documentation, customer conversations (from transcripts), and existing code. They propose a microservice architecture, database schema, endpoint list, and permission model. A human (CTO/architect) reviews and approves the proposal before coding starts.
Writing code (TDD)
Tests first, then implementation. A backend agent writes APIs in FastAPI/Express, a frontend agent writes React components. Every change is a separate pull request with a clean commit message. Coding standards (Black, ESLint, Prettier) are enforced automatically.
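The test-first loop can be shown in miniature. The function and rule below are illustrative, not taken from a real project — the point is the order: the failing test exists before the implementation does.

```python
# Step 1: write the failing tests first (pytest-style plain asserts).
def test_normalize_email_lowercases_and_trims():
    assert normalize_email("  Anna.Kowalska@Example.COM ") == "anna.kowalska@example.com"

def test_normalize_email_rejects_missing_at():
    try:
        normalize_email("not-an-email")
        assert False, "expected ValueError"
    except ValueError:
        pass

# Step 2: write the minimal implementation that makes the tests pass.
def normalize_email(raw: str) -> str:
    """Trim whitespace and lowercase an e-mail address; reject obvious garbage."""
    value = raw.strip().lower()
    if "@" not in value:
        raise ValueError(f"not an e-mail address: {raw!r}")
    return value
```

Each such red-green cycle lands as its own commit inside the feature-branch pull request.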
Multi-layer testing
Unit (pytest, Jest), integration (testcontainers with real PostgreSQL), E2E (Playwright), UI snapshot, security (OWASP, gitleaks, bandit), performance (k6/locust), accessibility (axe). Every PR runs the full pipeline — a failing test blocks the merge.
AI code review
A SecurityReviewer agent scans for OWASP Top 10 issues, a QualityReviewer agent checks readability and patterns, an ArchitectureReviewer agent verifies consistency with the rest of the system. Edge cases are escalated to humans.
Documentation and CHANGELOG
Every change in logic = version bump + entry in CHANGELOG.md in Keep a Changelog format. API documentation (OpenAPI/Swagger) is generated automatically. CLAUDE.md is updated after every session with new lessons learned.
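For reference, a minimal entry in Keep a Changelog format — the version number, date, and items here are invented for illustration:

```markdown
## [1.4.0] - 2025-01-15
### Added
- `GET /api/v1/invoices/{id}/audit` endpoint returning the audit trail for an invoice.
### Fixed
- Off-by-one in invoice list pagination (reported bug, now covered by a regression test).
```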
Deployment with Change Request
Deployment always goes through Git (NEVER direct scp). First the test environment with Playwright verification, then production after CR approval. The deploy script includes a rollback plan (<5 min) and health checks.
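The gating logic of such a deploy script can be sketched as follows. This is a simplified Python sketch under stated assumptions: the real deploy and rollback steps are environment-specific commands, and the health endpoint URL is a placeholder.

```python
import time
import urllib.request

def check_health(url: str, attempts: int = 5, delay: float = 2.0) -> bool:
    """Poll the service health endpoint until it returns HTTP 200, or give up."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # connection refused / timeout: service not up yet
        time.sleep(delay)
    return False

def gated_deploy(deploy_step, health_check, rollback_step) -> str:
    """Run the deploy, verify health, and roll back automatically on failure."""
    deploy_step()
    if health_check():
        return "deployed"
    rollback_step()  # the rollback plan must complete in under 5 minutes
    return "rolled back"
```

Because the steps are injected as callables, the gate itself is unit-testable without touching a real environment.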
What does the company gain?
Thousands of automated tests
Every production project has from several thousand up to tens of thousands of tests — unit, integration, E2E, security, performance. Regressions are caught in CI before they reach users.
Complete audit trail
Every change in code, database, or configuration is recorded: Git, audit log in the database, CHANGELOG, Change Request. Meets ISO 27001, EU AI Act and GDPR requirements.
Team scalability
Two or three people with AI agents deliver the value of an 8-10 person team. Without burnout, with higher quality and shorter timelines.
Escalation to stronger models
LLM routing picks the right model for each task: minor changes — local Ollama (zero cost), complex architecture — Claude Opus. Cost and quality optimization in one.
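A simplified sketch of such a router. The model identifiers and the complexity heuristic below are illustrative; a production router would also weigh token budgets and latency.

```python
from dataclasses import dataclass

@dataclass
class Task:
    description: str
    files_touched: int
    needs_architecture_decision: bool = False

def route_model(task: Task) -> str:
    """Escalate to a stronger model only when the task warrants the cost."""
    if task.needs_architecture_decision or task.files_touched > 10:
        return "claude-opus"        # complex, cross-cutting architecture work
    return "ollama/local-model"     # minor changes at zero marginal cost
```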
Repeatability and standards
Every project follows the same standards: feature branch workflow, squash merge, Conventional Commits, CHANGELOG, EU AI Act, GDPR. A new developer understands the structure on day one.
Security by default
Gitleaks on pre-commit + CI, secrets in HashiCorp Vault, private repositories, Keycloak SSO, Tailscale VPN for internal services. No trade-offs against speed.
Multi-layer testing — the foundation of quality
Every change in production code passes through a complete test pipeline. No exceptions — even fixing a typo in a comment triggers CI, because the pipeline is enforced by a Git hook, not left to a developer's discretion.
- Unit tests: pytest, Jest, vitest. Cover individual functions and classes. >80% coverage on critical code.
- Integration tests: testcontainers with real instances of PostgreSQL, Redis, Vault. Mocks only for third-party external APIs.
- End-to-end (E2E) tests: Playwright in Firefox (default), Chrome (optional). Simulate full user paths: login → action → verification.
- UI tests (snapshot, accessibility): Playwright + axe-core. WCAG 2.0 AA as the baseline, Lighthouse 100/100/100/100 as the target.
- Security tests: OWASP Top 10 (semgrep, bandit, eslint-plugin-security), gitleaks (secret scanning on pre-commit and CI), trivy (Docker image scanning).
- Performance tests: k6 or locust for load tests, checking p95/p99 response times under stress.
- Regression tests: the full suite runs before every production deploy. Every reported bug becomes a regression test.
- Smoke tests: a minimal set of 5-10 tests executed after the production deploy, verifying the application actually came up.
- Acceptance tests: business tests (Cucumber/Gherkin) confirming the requirement has been met.
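As an example of the performance gate: k6 and locust report p95/p99 natively, but the check itself is just a percentile comparison against a latency budget. The thresholds below are illustrative.

```python
import statistics

def latency_percentiles(samples_ms: list[float]) -> tuple[float, float]:
    """Return (p95, p99) response times using inclusive quantiles."""
    qs = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return qs[94], qs[98]  # 95th and 99th percentile cut points

def within_budget(samples_ms: list[float], p95_max: float, p99_max: float) -> bool:
    """True if the load-test run stays inside the agreed latency budget."""
    p95, p99 = latency_percentiles(samples_ms)
    return p95 <= p95_max and p99 <= p99_max
```

A run that breaches the budget fails CI exactly like a failing unit test.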
A failing test = blocked merge. No exceptions. If a test is "flaky" (unstable), a diagnostic agent analyzes the root cause and fixes the test or the code, but never removes the test without a human decision.
Typical use cases
The patterns we apply most often. Each comes with its own set of agents, tools and templates. Time-to-value measured in weeks, not months.
Legacy system modernization
- Old monolithic application (PHP/.NET, no tests, hard to maintain)
- Agents decompose the monolith into microservices (incremental, no downtime)
- Generate characterization tests (capturing current behavior) before refactoring
- Data migration with a full audit trail and rollback plan
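Characterization tests pin down what the legacy code does today — correct or not — so every refactoring step can be verified against current behavior. A minimal sketch; `legacy_round` is a stand-in for a real legacy function:

```python
# Stand-in for legacy behavior we must preserve, quirks included.
def legacy_round(amount: float) -> int:
    return round(amount)  # Python 3 rounds halves to even: round(2.5) == 2

# Characterization tests assert what the code DOES, not what it "should" do.
def test_characterize_rounding():
    assert legacy_round(2.5) == 2   # surprising, but this is current behavior
    assert legacy_round(3.5) == 4
    assert legacy_round(2.4) == 2
```

Only after these tests are green for the old code does the agent begin extracting the logic into a new service.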
New enterprise microservice
- Specification on input (Jira ticket, PRD, meeting transcript)
- Architecture → code → tests → review → deploy in 2-3 weeks
- Integration with existing SSO (Keycloak), audit log, monitoring
- Full EU AI Act and GDPR compliance from day one
System integration
- Connecting ERP, CRM, KRS, Microsoft Graph, IBM, Cisco, external partners
- Agents write adapters, mappings, retry/backoff, idempotency
- Integration tests on real endpoints (sandbox APIs)
- Monitoring (Prometheus + Grafana) and alerts (Sentry) wired in automatically
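The retry/backoff and idempotency patterns mentioned above look roughly like this — the delay schedule and key scheme are illustrative, not a specific partner API contract:

```python
import time
import uuid

def with_retries(call, attempts: int = 4, base_delay: float = 0.5):
    """Retry a transient-failure-prone call with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # exhausted: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...

def idempotency_key() -> str:
    """One key per business operation, sent with every retry,
    so the partner API can deduplicate repeated requests."""
    return str(uuid.uuid4())
```

The same key travels with every retry of one operation, which is what makes retries safe against double-charging or duplicate records.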
Multi-tenant platforms
- Multi-client SaaS with full data isolation (per-tenant schema or row-level security)
- Automated client onboarding (Keycloak provisioning, database, roles)
- Billing based on SSO Billing SDK (token usage tracking, fail-open)
- Compliance: GDPR, ISO 27001, EU AI Act audit-ready
Comparison: classic team vs. AI-driven process
| Aspect | Classic team (8-10 people) | Team with AI agents (2-3 people) |
|---|---|---|
| Time-to-market (average feature) | 2-4 weeks | 3-7 days |
| Test coverage | 40-60% (when the team has time) | >80% by default (tests generated alongside code) |
| Code review | 1 person, average 30-60 min | 3 agents (security, quality, architecture) + human for complex changes |
| Documentation | Often incomplete, „added later" | Generated alongside code (OpenAPI, README, CHANGELOG) |
| Audit trail | Git history | Git + audit log in database + CHANGELOG + Change Request |
| Scaling | Linear (more people = higher communication cost) | Non-linear (more agents = same number of supervising people) |
| Compliance (EU AI Act, GDPR, ISO 27001) | Often external audit after the fact | Built into the process from day one |
Frequently asked questions
What is automated AI software development?
How is this different from classic programming with Copilot?
What types of tests does this process run?
Does AI deploy code to production on its own?
Does this process work for enterprise projects?
How long does it take to roll this process out in our company?
What about source code security?
Will you replace our development team?
How much does AI software development cost?
What are the typical signals that a company is ready for this process?
Ready for a pilot?
We start with an audit of the existing process and a pilot on a selected microservice. First results visible within 2-4 weeks. No long-term contracts required.