Technical Architect's Field Manual
Indrajith's — Reference Documents — April 2026
A practical guide covering solution design, non-functional requirements, governance, cloud readiness, estimation, and GenAI architecture.
Preface
This book is written for a Technical Architect who needs to be prepared, not just informed. Being informed means you have read about architectural concepts. Being prepared means you can walk into a room, face a real problem, and produce a structured, defensible approach under pressure.
Every module follows the same structure: why it matters, the core concepts explained plainly, real-world case studies, and practice exercises. Read a module, do the exercise, then move on. Do not read this cover to cover in one sitting.
Note: This book references Robert C. Martin's Clean Architecture throughout relevant modules. These references are not required reading. This book stands on its own.
What Makes a Good Technical Architect?
- They think in trade-offs, not solutions. There is no perfect architecture. Every decision gives something up to gain something else.
- They communicate at the right altitude. To an executive: cost and timeline impact. To a developer: patterns and interfaces. Same decision, different altitude.
- They make the implicit explicit. The most dangerous architectural decisions are the ones nobody realised were decisions.
- They are comfortable with uncertainty. Architecture is practised under incomplete information. The skill is making sound decisions in spite of it.
Module 01 — Solution Design
When a stakeholder brings you a problem, your first job is not to produce a diagram. Your first job is to understand the problem well enough that you could explain it back more clearly than they explained it to you.
Architectural Styles
An architectural style is a named, well-understood approach to organising a system. It is a starting point, not a solution.
Layered (N-Tier) Architecture
The system is divided into horizontal layers, each with a specific responsibility. Each layer only communicates with the layer directly below it.
Works well: Line-of-business applications with clear CRUD operations, small-to-medium teams, well-understood requirements upfront.
Breaks down: At scale, layers become monolithic slabs. Business logic leaks into the presentation layer. This is the big ball of mud anti-pattern.
Warning — The Sinkhole Anti-Pattern: If more than 20% of your requests pass straight through every layer without transformation, reconsider whether you need all the layers you have defined.
Microservices Architecture
The system is divided into small, independently deployable services, each owning its own data and communicating via well-defined APIs.
Works well: Large organisations where multiple teams need to deploy independently. Systems where components have very different scaling needs.
Breaks down: Small teams where operational overhead outweighs the independence gained. Microservices sharing a database are not microservices — they are a distributed monolith, which is the worst of both worlds.
Event-Driven Architecture
Components communicate by producing and consuming events through a message broker. Producers do not know who consumes their events.
Works well: Workflows where multiple things must happen in response to one action. Systems that need to absorb bursty load. Audit logging where every event is a record of what happened.
Breaks down: Systems requiring strong consistency. Debugging is significantly harder than in synchronous systems.
Hexagonal Architecture (Ports and Adapters)
The application core (domain logic) sits at the centre, completely ignorant of the outside world. It defines ports (interfaces). Adapters translate between the outside world and those interfaces.
A port is an interface defined by the application core, such as IUserRepository. An adapter is a concrete implementation of that interface, such as PostgresUserRepository. The core defines what it needs; adapters fulfil those needs without the core knowing how.
The C4 Model
Created by Simon Brown, the C4 model provides four levels of zoom for communicating architecture. The most common failure is altitude mismatch: explaining infrastructure details to a CEO, or system context to a developer who needs to know which class to modify.
Design Principles
Coupling and Cohesion: Coupling is the degree to which one component depends on another. Cohesion is the degree to which elements within a component belong together. The goal is always high cohesion, low coupling.
The Dependency Rule: Source code dependencies must always point inward — from outer layers (frameworks, databases, UI) toward inner layers (use cases, entities). The business rules at the centre must know nothing about the database, the web framework, or the UI library.
Decision-Making Framework:
- Frame the problem. What constraint forces this decision? What business goals must it serve?
- List the options, including the option of doing nothing. Evaluate at minimum three options.
- Identify the trade-offs. What does each option gain? What does it give up?
- Make the decision explicit. Name the decision, the chosen option, and the rationale.
- Document the context. Future architects need to know why this was correct at the time.
Case Study — Uber's Architecture Evolution
In 2014, Uber was a monolithic Python application using the Flask framework, backed by a single PostgreSQL database. As it expanded to new cities, the monolith became a bottleneck. Deployments broke unrelated features. A bug in the payments module could take down the driver location service.
Uber began migrating to microservices using the Strangler Fig pattern — new services were built alongside the monolith, and traffic was gradually routed to them. Trip management, user accounts, payments, and driver dispatch were extracted first because they had distinct data models and distinct teams.
Uber's microservices eventually numbered in the thousands. Operational overhead became significant. They later consolidated services and invested heavily in internal tooling. Microservices solve one class of problems while creating another.
Practice Exercise (30 minutes): Choose a system you have worked on. Produce two diagrams on paper: a C4 Level 1 (System Context) and a C4 Level 2 (Container). Then write one paragraph answering: "Why did I choose this architectural style, and what would change if user volume increased 100x?"
Module 02 — Non-Functional Requirements
A system that does what it is supposed to do but crashes under load, leaks data, or responds in 30 seconds is not a working system — it is a liability. NFRs define the operational qualities that make a system fit for production. They are invisible until violated.
NFR Categories
| Category | What It Means | Example Metric |
|---|---|---|
| Performance | Response time and throughput under load | API response < 200ms at p95 under 1,000 RPS |
| Scalability | Ability to handle growth in users, data, transactions | Handle 10x current load within 6 months |
| Availability | Uptime and resilience to failure | 99.9% uptime = max ~8.77 hours downtime per year |
| Reliability | Correctness and consistency of behaviour over time | Zero data loss on payment transactions |
| Security | Protection from unauthorised access or data breach | All PII encrypted at rest and in transit (AES-256) |
| Maintainability | Ease of modifying or extending the system | A new developer can make a code change within 1 day |
| Observability | Ability to understand system state from its outputs | Full distributed tracing across all services |
| Disaster Recovery | How fast you recover from a major failure | RTO < 4 hours, RPO < 1 hour |
Note: Always express performance NFRs in percentiles — p50, p95, p99 — never averages. An average of 100ms can hide the fact that 1% of requests take 30 seconds.
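A small numeric sketch makes the point concrete. The latency figures below are illustrative, and the percentile function uses the simple nearest-rank method:

```python
import math

# 980 requests at 50ms and 20 pathological requests at 5s (illustrative numbers).
latencies = [50] * 980 + [5000] * 20

def percentile(values, p):
    # Nearest-rank percentile: the smallest value such that at least
    # p% of the data is at or below it.
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

mean = sum(latencies) / len(latencies)
print(f"mean = {mean:.0f}ms")                   # 149ms — looks tolerable
print(f"p50  = {percentile(latencies, 50)}ms")  # 50ms
print(f"p95  = {percentile(latencies, 95)}ms")  # 50ms
print(f"p99  = {percentile(latencies, 99)}ms")  # 5000ms — the tail the mean hid
```

A dashboard showing only the 149ms mean would pass a "under 200ms" NFR while 2% of users wait five seconds.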
Availability Numbers
| Availability | Downtime per Year | Downtime per Month | Typical Use Case |
|---|---|---|---|
| 99% (two nines) | ~87.6 hours | ~7.3 hours | Internal tools, dev environments |
| 99.9% (three nines) | ~8.77 hours | ~43.8 minutes | Most business applications |
| 99.95% | ~4.38 hours | ~21.9 minutes | Consumer-facing SaaS products |
| 99.99% (four nines) | ~52.6 minutes | ~4.4 minutes | Financial systems, healthcare |
| 99.999% (five nines) | ~5.26 minutes | ~26.3 seconds | Telecommunications, life-critical |
The cost of achieving each additional nine increases exponentially. Always ask: "What is the business cost of each additional nine, and is it worth the engineering cost?"
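The downtime budgets in the table fall out of simple arithmetic; a quick sketch (using a 365.25-day year, consistent with the table's figures):

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60

def downtime_per_year_minutes(availability_pct: float) -> float:
    """Allowed downtime per year for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for nines in (99.0, 99.9, 99.99, 99.999):
    print(f"{nines}% -> {downtime_per_year_minutes(nines):.1f} min/year")
# 99.0%   -> ~5259.6 min (~87.7 hours)
# 99.9%   -> ~526.0 min  (~8.8 hours)
# 99.99%  -> ~52.6 min
# 99.999% -> ~5.3 min
```

Each extra nine divides the error budget by ten while the engineering cost of meeting it grows far faster than tenfold.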
RTO and RPO
RTO — Recovery Time Objective: The maximum acceptable length of time that the system can be offline after a failure before it causes unacceptable business damage. Drives your recovery automation, failover speed, and on-call staffing.
RPO — Recovery Point Objective: The maximum acceptable amount of data loss measured in time. An RPO of 1 hour means you must back up or replicate data at least every hour. An RPO of zero requires synchronous replication to a standby.
Writing SMART NFRs
Every NFR must be Specific, Measurable, Achievable, Relevant, and Time-bound. Vague NFRs cannot be tested, enforced, or used to drive architectural decisions.
| Vague (Useless) | SMART (Useful) |
|---|---|
| The system should be fast | The product search API must return results in under 300ms at the 95th percentile under a sustained load of 500 concurrent users with production-representative data. |
| The system must be secure | All user passwords must be hashed using bcrypt with a cost factor of minimum 12. No PII may appear in application logs. All API tokens must expire within 24 hours. |
| The system should scale | The system must maintain p95 response times below 500ms when concurrency increases from 100 to 10,000 users via horizontal scaling with no code changes required. |
| The system should be available | The payment service must achieve 99.95% uptime measured monthly, excluding planned maintenance windows communicated 72 hours in advance, not exceeding 2 hours per month. |
Case Study — Netflix and Chaos Engineering
In 2010, Netflix began migrating from data centres to AWS. Their engineers realised they could not guarantee reliability by preventing failures — at cloud scale, individual components would fail constantly. The question was not whether failures would occur but whether the system would survive them.
Rather than writing traditional availability NFRs, Netflix defined a resilience requirement: the system must continue serving user-facing features gracefully even when individual backend instances are randomly terminated during business hours. They built Chaos Monkey in 2011 to enforce this requirement in production. Netflix released Chaos Monkey as open source in 2012 under the Apache 2.0 licence, expanding it into the Simian Army.
This single resilience NFR drove mandatory architectural decisions: every service needed circuit breakers, all downstream calls needed timeouts and fallbacks, every feature needed a degraded mode, and all services needed to be stateless so any instance could be terminated without data loss.
Practice Exercise (25 minutes): Choose any application you have worked on. Write 6 NFRs covering: performance, scalability, availability, security, observability, and one of your choice. For each, write it first in vague form then rewrite it in SMART form. Identify which two NFRs are in tension with each other.
Module 03 — Governance & Architecture Decision Records
Without governance, every team makes different decisions, the codebase fragments, and institutional knowledge lives only in people's heads. Governance is how you scale architectural consistency across teams without becoming a bottleneck.
Architecture Decision Records (ADRs)
An ADR is a short document — one to two pages maximum — that captures a single architectural decision, the context that forced it, the options considered, and the rationale for the choice made.
| ADR Section | What to Write | Common Mistake |
|---|---|---|
| Title | Short, imperative, specific. "Use PostgreSQL for all transactional data." | Vague titles that do not communicate the decision |
| Status | Proposed / Accepted / Deprecated / Superseded (by ADR-023) | Never updating status when decisions change |
| Context | What forces are at play? What constraints exist? Write as if explaining to someone not in the room. | Writing context as justification for a decision already made |
| Decision | The specific choice, active voice. "We will use PostgreSQL 15 with read replicas." | Describing the process of deciding rather than the decision |
| Consequences | What becomes easier? What becomes harder? What risks does this introduce? | Only listing positive consequences |
| Alternatives Considered | Other options evaluated and why they were rejected | Omitting this section entirely |
Note: The "Alternatives Considered" section is the most valuable part for future readers. Without it, a new engineer cannot know that an option was already evaluated and rejected. They may spend weeks investigating something already dismissed.
Architecture Review Boards
An ARB is a governance forum where significant architectural decisions are reviewed before implementation.
ARB trigger criteria: New technology adoption, cross-team architectural changes, changes to shared infrastructure, decisions with significant security or cost implications, and any decision where the cost of being wrong is high.
Case Study — Amazon's API Mandate
In 2002, Amazon was struggling with a codebase so deeply interconnected that teams could not work independently. Jeff Bezos issued the "API Mandate": all teams must expose their data and functionality through service interfaces; teams must communicate only through these interfaces, with no shared databases and no direct linking; all interfaces must be designed to be externally exposable.
This was a governance decision, not just an architectural one. It changed how teams were evaluated and how data was accessed. The mandate created the foundation for what eventually became Amazon Web Services — APIs built for internal use turned out to be sellable externally because they were designed with strict interface contracts from the beginning.
Practice Exercise (30 minutes): Write a complete ADR for a real architectural decision from your current or most recent project. Write all six sections. Write at least two rejected alternatives with reasons. Then answer: "If someone reads this ADR in three years, what context might they be missing?"
Module 04 — Cloud Readiness
There is a critical distinction between an application that runs in the cloud and one that is cloud-native. Running in the cloud means you have moved your virtual machines from a data centre to AWS. Cloud-native means your application is designed to exploit cloud capabilities: elastic scaling, managed services, pay-per-use cost models, and geographic distribution.
The 6 Rs of Cloud Migration
| Strategy | Description | Effort | Cloud Benefit |
|---|---|---|---|
| Rehost (Lift & Shift) | Move as-is to cloud VMs, no code changes | Low | Minimal |
| Replatform (Lift & Reshape) | Small optimisations, e.g. move to managed RDS | Low–Medium | Moderate |
| Repurchase | Replace entirely with a SaaS product | Medium | High |
| Refactor / Re-architect | Redesign as cloud-native | High | Very High |
| Retire | Decommission — application no longer needed | Low | N/A |
| Retain | Keep on-premise — regulatory or risk reasons | None | None |
The 12-Factor App
Authored by Adam Wiggins (Heroku co-founder) and published in 2011. The four most architecturally significant factors:
| Factor | What It Means | Architectural Impact |
|---|---|---|
| III. Config in environment | No hardcoded config. Use environment variables or a config service. | Enables deploying the same artefact to dev, staging, and prod |
| VI. Stateless processes | No persistent data in memory between requests. State lives in backing services. | Makes horizontal scaling possible — any instance can serve any request |
| IX. Disposability | Processes start fast and shut down gracefully. | Enables auto-scaling and zero-downtime deployments |
| XI. Logs as event streams | Applications write logs to stdout; infrastructure handles aggregation. | Application has no knowledge of log destination — deployable anywhere |
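Factor III is the easiest of the four to show in code. A minimal sketch, where the variable names (DATABASE_URL, LOG_LEVEL) are illustrative conventions rather than any standard:

```python
import os

# Factor III in practice: the same build artefact runs in dev, staging,
# and prod because configuration comes from the environment, not the code.
def load_config() -> dict:
    return {
        "database_url": os.environ.get("DATABASE_URL", "postgres://localhost/dev"),
        "log_level": os.environ.get("LOG_LEVEL", "INFO"),
    }

os.environ["LOG_LEVEL"] = "DEBUG"  # simulate what a staging deployment would set
print(load_config()["log_level"])  # DEBUG
```

The anti-pattern this replaces is a `config.staging.yml` committed to version control, which forces a rebuild (or at least a redeploy of different artefacts) per environment.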
Cloud-Native Resilience Patterns
Circuit Breaker
A circuit breaker wraps calls to a downstream dependency and counts failures. When failures cross a threshold, the breaker "opens" and subsequent calls fail fast instead of waiting on a dying service; after a cooldown it permits a trial call and closes again if that call succeeds. This contains the failure and prevents the cascades a slow or dead dependency would otherwise cause.
Other Key Patterns
- Bulkhead: Isolate failures by partitioning resources (thread pools, connection pools) so a failure in one area cannot consume all shared resources.
- Retry with Exponential Backoff: Retry transient failures with increasing wait times plus jitter to avoid thundering herd problems.
- Sidecar: Attach a helper process alongside each service for cross-cutting concerns (logging, tracing, auth, TLS termination).
- Blue/Green Deployment: Run two identical environments; route traffic to the new one, roll back instantly if needed.
- Health Endpoint: Every service exposes a /health endpoint so orchestrators can detect and replace failed instances.
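Retry with exponential backoff is the one pattern in the list above that is routinely implemented by hand, and routinely implemented without the jitter that makes it safe. A minimal sketch using "full jitter" (a random wait drawn from zero up to the exponential cap):

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0):
    """Retry a transient failure with exponentially growing waits plus
    full jitter, so many retrying clients do not stampede in lockstep."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            cap = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, cap))  # the jitter
```

Without jitter, every client that failed at the same moment retries at the same moment, recreating the spike that caused the failure: the thundering herd the bullet above warns about.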
Case Study — Airbnb's Migration from Monolith
By 2017, Airbnb's Rails monolith had grown to millions of lines of code. Build times exceeded 45 minutes. Before migrating, engineers assessed the monolith against cloud-native principles and found critical violations: session state in server memory (violating Factor VI), configuration embedded in YAML files committed to version control (violating Factor III), startup time of 8 minutes (violating Factor IX), and logs written to local disk (violating Factor XI).
Airbnb addressed the 12-Factor violations first (sessions moved to Redis, config moved to Consul), then began extracting services at natural domain boundaries. The search service was extracted first because it had the clearest data boundary and the highest independent scaling need.
Practice Exercise (30 minutes): Choose a system you know. Apply the 6 Rs. Check the system against the four most important 12-Factor principles. Where does it fall short? Identify the single most important cloud-native pattern it is missing. Estimate the migration effort using T-shirt sizing.
Module 05 — Estimation
The goal of estimation is not accuracy. The goal is calibrated confidence — communicating what you know, what you do not know, and how confident you are, in terms a business stakeholder can act on.
T-Shirt Sizing
| Size | Approximate Effort | Characteristics | Examples |
|---|---|---|---|
| XS | 1–3 days | Well-understood, single developer, no dependencies | Adding a field to an API response |
| S | 1–2 weeks | Clear scope, single team, some implementation unknowns | New API endpoint; integration with a known SDK |
| M | 2–6 weeks | Multiple components, cross-team coordination, meaningful unknowns | New microservice extraction; auth system replacement |
| L | 2–4 months | Significant complexity, multiple teams, unclear scope areas | Cloud migration of an application; new event-driven pipeline |
| XL | 4+ months | High complexity, many unknowns — break into phases | Full microservices migration; new product platform |
Note: An XL estimate is a signal, not a commitment. The right response is: "How do we break this into deliverable phases where each phase provides business value?"
Three-Point Estimation (PERT)
Program Evaluation and Review Technique gives a statistically defensible expected value while explicitly modelling uncertainty.
- Optimistic (O): Everything goes right. No unexpected problems.
- Most Likely (M): Realistic case with normal friction and rework.
- Pessimistic (P): Significant problems occur. Dependencies are late. Complexity was underestimated.
Expected = (O + 4M + P) / 6
Std Dev = (P - O) / 6
Worked Example — OAuth 2.0 Migration:
| Scenario | Weeks | Reasoning |
|---|---|---|
| Optimistic (O) | 4 | Library works first time, no legacy edge cases, team has done this before |
| Most Likely (M) | 7 | 2 weeks discovery, 3 weeks implementation, 2 weeks testing and fixes |
| Pessimistic (P) | 14 | Legacy session handling deeply embedded, clients need coordinated migration, security review required |
Expected = (4 + 4×7 + 14) / 6 = 46 / 6 ≈ 7.7 weeks
Std Dev = (14 - 4) / 6 ≈ 1.7 weeks
68% confidence interval: ≈ 6.0 to 9.3 weeks
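The arithmetic above is mechanical enough to capture in a few lines, which is useful when running PERT across many workstreams:

```python
def pert(o: float, m: float, p: float) -> tuple[float, float]:
    """PERT expected value and standard deviation from O/M/P estimates."""
    expected = (o + 4 * m + p) / 6
    std_dev = (p - o) / 6
    return expected, std_dev

# The OAuth 2.0 migration example: O=4, M=7, P=14 weeks.
expected, sd = pert(4, 7, 14)
print(f"expected ≈ {expected:.1f} weeks, σ ≈ {sd:.1f} weeks")
print(f"68% interval: {expected - sd:.1f} to {expected + sd:.1f} weeks")
```

Reporting the interval rather than the single expected value is the whole point: it communicates calibrated confidence, not false precision.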
Cone of Uncertainty
The earlier in a project you estimate, the wider the range of error. This is not a failure of skill — it is a mathematical reality. Estimation error decreases as a project progresses and more information is known.
Warning — The Commitment Trap: The most dangerous moment in estimation is when a stakeholder receives a wide early-phase estimate and asks you to "just give a number for planning purposes." The number you give becomes the plan. Caveats evaporate. "I cannot give you a number more accurate than X until we complete the discovery phase" is one of the most important sentences a Technical Architect can say.
Case Study — Healthcare.gov Launch Failure
Healthcare.gov, the US federal health insurance marketplace, launched in October 2013 and immediately failed. On its first day, only 6 people successfully enrolled despite 250,000 visitors. Load estimates were based on optimistic scenarios. The database was not load-tested until two weeks before launch. Integration testing of the 55 separate agencies' systems was not completed until the final weeks.
The most critical failure was in integration complexity estimation. Integration complexity does not scale linearly: two systems have one integration point, three systems have three, ten systems have 45 possible interaction points. This was not adequately accounted for.
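The quadratic growth described above is the pairwise combination count n(n−1)/2, worth checking against the figures in the text:

```python
def integration_points(n_systems: int) -> int:
    """Possible pairwise interaction points between n systems: n(n-1)/2."""
    return n_systems * (n_systems - 1) // 2

for n in (2, 3, 10, 55):
    print(f"{n} systems -> {integration_points(n)} interaction points")
# 2 -> 1, 3 -> 3, 10 -> 45, 55 -> 1485
```

At Healthcare.gov's scale of 55 systems, that is 1,485 possible interaction points; a linear mental model of integration effort is off by more than an order of magnitude.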
Practice Exercise (25 minutes): Estimate an OAuth 2.0 migration where 40 client applications consume the current LDAP-based system and the team has never implemented OAuth 2.0 before. Document your O/M/P scenarios, top 3 assumptions, top 3 risks, and what a 2-week discovery phase should investigate.
Module 06 — GenAI Architecture
Large Language Models are not deterministic systems. Send the same prompt twice and you may receive different responses. Ask them a factual question and they may answer confidently with fabricated information. Integrating LLMs into production systems requires patterns specifically designed to manage non-determinism, latency, cost, and trust.
Retrieval Augmented Generation (RAG)
RAG solves the fundamental problem that LLMs know nothing about your organisation's specific documents, policies, or internal knowledge. RAG retrieves relevant context from your own data store and provides it to the LLM as part of the prompt.
Query-Time Flow
At query time, the user's question is converted to an embedding, the vector store is searched for the most semantically similar document chunks, and the top results are assembled into the prompt alongside the question before it is sent to the LLM. The model answers from the supplied context rather than from its training data alone.
Ingestion Pipeline (Offline)
Ahead of time, source documents are split into chunks, each chunk is converted to an embedding, and the embeddings are stored in a vector database alongside the original text, so that query-time retrieval is a fast similarity search rather than a scan of the corpus.
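Both halves of RAG can be sketched end to end. This is a toy: the "embedding" below is a bag-of-words vector so the example is self-contained, whereas a real pipeline would call a learned embedding model and a real vector database; the document chunks are invented.

```python
import math
import re
from collections import Counter

# Toy 'embedding': a bag-of-words vector (stand-in for a learned model).
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Ingestion (offline): chunk documents and index their embeddings.
chunks = [
    "Expense claims must be submitted within 30 days",
    "Annual leave requests require manager approval",
    "Laptops are refreshed on a three year cycle",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Query time: embed the question, retrieve the most similar chunks,
# and assemble them into the prompt sent to the LLM.
def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

context = retrieve("When must expense claims be submitted?")
prompt = "Answer using only this context:\n" + "\n".join(context)
print(context[0])  # the expense-claims chunk ranks first
```

The retrieval step is where RAG quality is won or lost: chunking strategy, embedding model, and the number of chunks retrieved all matter more than the final prompt wording.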
Agents and Tool Use
Pure RAG is a read-only pattern. Agents extend this by allowing the LLM to call functions, query APIs, execute code, and take real-world actions. This dramatically increases capability and risk simultaneously.
Warning: Design every agentic system with explicit action allowlists, confirmation steps before irreversible actions, rate limiting per user and per action type, and comprehensive logging. A malicious user could inject instructions through a document the agent reads (prompt injection), overriding your system prompt.
Risks and Mitigations
| Risk | Description | Architectural Mitigation |
|---|---|---|
| Hallucination | LLM generates plausible-sounding but factually incorrect information | Use RAG to ground responses in verified documents. Build evaluation harnesses with golden datasets. |
| Latency | LLM API calls take 1–30 seconds | Streaming responses, async processing, response caching, smaller models for latency-sensitive paths. |
| Cost | LLMs charge per token; complex prompts are expensive at scale | Token budget management, prompt caching, smaller models for classification tasks, monitor cost per request. |
| Data Privacy | User data sent to third-party LLM APIs may violate GDPR or HIPAA | PII detection and redaction before sending to external APIs. Audit logs of what data was sent. |
| Prompt Injection | Malicious users embed instructions that override system prompts | Input sanitisation, privilege separation, output validation, anomaly monitoring. |
| Model Version Drift | Providers update models without notice, changing behaviour | Pin to specific model versions. Maintain regression test suites. |
| Non-Determinism | Same prompt produces different outputs | Set temperature to 0 for deterministic tasks. Build probabilistic evaluation harnesses. |
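The cost risk in the table above is easiest to reason about with a back-of-envelope model. The per-token prices below are illustrative placeholders, not any provider's actual rates:

```python
# Assumed, illustrative prices — substitute your provider's real rates.
PRICE_PER_1K_INPUT_TOKENS = 0.003
PRICE_PER_1K_OUTPUT_TOKENS = 0.015

def cost_per_request(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS
            + output_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS)

# A RAG prompt carrying 3,000 tokens of retrieved context, 500-token answer:
per_request = cost_per_request(3000, 500)
monthly = per_request * 100_000  # at 100k requests/month
print(f"${per_request:.4f} per request, ${monthly:,.0f} per month")
```

Note how the retrieved context dominates the input side: trimming retrieval from five chunks to two is often the single cheapest cost optimisation available.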
Case Study — GitHub Copilot
Code suggestions that arrive more than 200ms after the user pauses typing feel unresponsive. Copilot uses a purpose-trained model rather than a general-purpose model — a smaller, faster model achieves quality competitive with larger models on the specific task of code completion. The prompt is carefully engineered to include only the most relevant context (surrounding code, related files, open tabs), keeping token counts, and therefore latency and cost, bounded.
GitHub cannot test Copilot by checking if generated code is "correct" — correctness is domain-specific. Instead they evaluate on proxy metrics: acceptance rate (did the developer accept the suggestion?), persistence rate (is the accepted suggestion still in the code 30 seconds later?), and latency distribution (what percentage of suggestions arrive within 200ms?).
Practice Exercise (35 minutes): Design a RAG-based internal knowledge assistant for a 5,000-person company. Produce: a C4 Level 2 container diagram; the top 5 NFRs with SMART metrics; the top 3 risks with specific mitigations; an explanation of how you would handle document freshness; and a cost model.
Artifact Templates
Architecture Proposal Template
| Section | Content | Length |
|---|---|---|
| 1. Executive Summary | Problem, proposed solution, key trade-offs | 1 paragraph |
| 2. Problem Statement | Business problem, who is affected, constraints | 1–2 paragraphs |
| 3. Assumptions & Constraints | All assumptions listed explicitly | Bulleted list |
| 4. Proposed Architecture | C4 Level 1 + Level 2 diagrams, one paragraph per major component | Diagrams + 1 page |
| 5. Key Decisions | 3–5 decisions in mini-ADR format | ½ page per decision |
| 6. NFR Coverage | Top 5 NFRs with SMART metrics and how the architecture addresses each | Table |
| 7. Risks & Mitigations | Top 3–5 risks with likelihood, impact (H/M/L), and specific mitigation | Table |
| 8. Open Questions | What would you need to know to finalise this design? | Bulleted list |
NFR Specification Template
| Attribute | Content |
|---|---|
| Category | Performance / Scalability / Availability / Security / Observability / etc. |
| Statement | The specific, measurable requirement |
| Measurement Method | How this will be verified — load test, security scan, uptime monitor |
| Acceptance Threshold | Pass/fail criteria with specific numbers |
| Priority | Must Have / Should Have / Nice to Have |
| Architectural Impact | Which architectural decisions this NFR drives |
Glossary
| Term | Definition | Module |
|---|---|---|
| ADR | Architecture Decision Record — a document capturing a single architectural decision, its context, alternatives, and consequences | 03 |
| ARB | Architecture Review Board — a governance forum for reviewing significant architectural decisions | 03 |
| Availability | The proportion of time a system is operational, expressed as a percentage | 02 |
| Bounded Context | A logical boundary within which a domain model is internally consistent | 01 |
| C4 Model | A hierarchical diagramming approach with four levels: Context, Container, Component, Code | 01 |
| Circuit Breaker | A pattern that prevents cascade failures by stopping calls to a failing service after a threshold | 04 |
| Cohesion | The degree to which elements within a component belong together. High cohesion is desirable. | 01 |
| Coupling | The degree of dependency between components. Low coupling is desirable. | 01 |
| Embedding | A numerical vector representation of text capturing semantic meaning | 06 |
| Hallucination | An LLM generating plausible-sounding but factually incorrect information | 06 |
| NFR | Non-Functional Requirement — a requirement specifying how well a system performs its functions | 02 |
| PERT | Program Evaluation and Review Technique — three-point estimation using O, M, and P estimates | 05 |
| Prompt Injection | An attack where malicious input overrides an LLM system prompt | 06 |
| RAG | Retrieval Augmented Generation — grounds LLM responses in retrieved documents | 06 |
| RPO | Recovery Point Objective — the maximum acceptable data loss measured in time | 02 |
| RTO | Recovery Time Objective — the maximum acceptable duration of downtime after a failure | 02 |
| Strangler Fig Pattern | Migrating a legacy system by gradually extracting components over time | 01 |
| T-Shirt Sizing | Relative estimation using XS/S/M/L/XL categories | 05 |
| 12-Factor App | A methodology defining 12 characteristics of applications designed for cloud deployment, authored by Adam Wiggins in 2011 | 04 |
| Vector Database | A database optimised for storing and querying embedding vectors | 06 |