
02 · The Architect's Thinking Framework

The last chapter said "architecture is making important decisions amid trade-offs." This chapter hands you a method you can actually follow — turning "gut calls" into "traceable reasoning." Any system at all can be taken apart with this one process.


Good news: architectural judgment has a method

Many people think architectural judgment is a kind of mysticism — a matter of talent, of the "feel" you accumulate over ten years of hitting pitfalls. That's half right: experience does matter. But the other half is that a top architect actually runs a fairly fixed process in their head — they've just done it so often they no longer notice it.

What we're going to do is pull that process out, so you can walk it consciously, step by step. Once you've walked it enough, it too will settle into your own "feel" — but a feel built on method, not intuition out of thin air.

The process looks like this:

   ┌───────────┐     ┌───────────┐     ┌───────────┐     ┌───────────┐     ┌───────────┐     ┌───────────┐
   │   Needs   │ ──▶ │Constraints│ ──▶ │  Quality  │ ──▶ │ Candidate │ ──▶ │Trade-offs │ ──▶ │ Decision  │
   │           │     │           │     │attributes │     │ solutions │     │           │     │           │
   └───────────┘     └───────────┘     └───────────┘     └───────────┘     └───────────┘     └───────────┘
   what should       what bounds       how "well"        what designs      what does         pick one,
   the system        can't be          must it do        could solve       each one          and write
   do?               crossed?          it?               it?               get / give?       down why
         │                                                                                         │
         └────────────── when business/scale changes, return to the start and rerun ◀──────────────┘

Notice that return arrow at the bottom: this isn't a process you run once and finish — it's a loop you'll rerun again and again. When the business changes and the scale grows, the constraints and quality goals change with them, and the optimal solution from before may no longer be optimal — which is exactly why architecture needs to evolve (the subject of Chapter 08).

Let's take it section by section. We'll dwell on the second and third steps (constraints and quality attributes), because those are precisely where beginners most often skip ahead, and where the gap opens widest.


Step 1: Needs — but distinguish two kinds

Everything starts with needs. But the word "needs" hides the single most important dividing line in architecture, one most beginners have never noticed:

Functional requirements: what the system should "do."
Non-functional requirements (quality attributes): how "well" the system should do it.

This dividing line is worth carving into your brain. Let's make it crystal clear with an example.

Suppose the requirement is: "build a service that lets people upload images and share them for others to view."

The functional requirements (what to do) are easy to list:

  • A user can upload an image
  • A user gets a link back
  • Others can view the image through the link
  • A user can delete their own images

These say "the system must have these features." But notice: with these alone, you cannot make a single architectural judgment. Because —

"Upload an image" — for 100 people, or for a billion? "Others can view it" — must it pop open instantly, or is a three-second spinner fine? "Can share" — must the image be kept forever, or auto-deleted after seven days? A data center catches fire — can these images be lost, or must not a single one go missing?

The answers to these questions are what truly decide what the system should look like. And every one of them is a quality attribute (how well):

  • Performance: the image must display within 200 milliseconds of being opened.
  • Scalability: it must scale smoothly from 100 users to a billion.
  • Availability: total downtime no more than a few minutes a year.
  • Durability: once an image is uploaded, the probability of losing it must be extremely low (say, "eleven nines").
  • Cost: storing a billion images mustn't get so expensive it drags the company under.
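
To make targets like "a few minutes a year" and "eleven nines" concrete, here is a minimal arithmetic sketch. The function names are made up for illustration, and the durability figure is assumed to be an annual, per-object probability, which is one common way of quoting it but not the only one.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years ignored


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines (3 means 99.9%)."""
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


def expected_annual_losses(objects: int, nines: int) -> float:
    """Expected objects lost per year, ASSUMING `nines` is annual per-object durability."""
    return objects * 10 ** -nines


for n in (2, 3, 4, 5):
    print(f"{n} nines of availability -> {allowed_downtime_minutes(n):,.1f} min of downtime per year")

print(f"1 billion images at eleven nines -> {expected_annual_losses(1_000_000_000, 11):.2f} expected losses per year")
```

Under these assumptions, "a few minutes a year" lands at roughly five nines of availability, which is a far stricter target than it sounds when said casually.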

This is the architect's most important fundamental skill: the moment you see "what to do," immediately ask "how well."

Functional requirements decide whether the system "works at all"; quality attributes decide how it should be built. The same "upload an image," when it must be instant + never lost + at a billion scale, leads to a completely different architecture than when it only has to "work" as a small personal tool.

In one line: functional requirements are the building's "purpose," quality attributes are its "spec." Look at purpose without spec, and you can't build the right building.

Why do beginners always skip quality attributes? Because functional requirements are "explicit": the product manager writes them in the doc, in black and white. Quality attributes are "implicit": rarely does anyone volunteer "this has to support a billion users and absolutely cannot lose data." You have to draw them out by asking. And that is exactly what the later section on "asking the right questions" is for.


Step 2: Constraints — you're not designing in a vacuum

Beginners often think architecture is about pursuing "the theoretically best solution." But the reality is: you are always designing inside a pile of "boundaries you cannot cross." Those boundaries are the constraints.

The difference between constraints and quality attributes: quality attributes are "goals you want to reach" (the better the better); constraints are "limits you can't get around" (these are the conditions, full stop). One is what you pursue, the other is what boxes you in.

Common constraints fall into a few categories:

| Constraint | How it forces your hand |
|---|---|
| Team size | A three-person team forcing in dozens of microservices? The ops alone will grind them into dust. Architectural complexity can't exceed what the team can handle. |
| Time | "Must ship next week" and "we have a year" force out completely different solutions. When time is tight, you pick "what can ship now," not "what's theoretically optimal." |
| Budget | Money decides how many machines, how expensive a service, how many people you can afford. However elegant the solution, if you can't pay for it, it's just talk. |
| Compliance / regulation | "Data must stay within the country," "medical data must meet such-and-such standard": these are red lines, not "try your best," but "you must." Violate them and the solution is void on the spot. |
| Existing systems | Most of the time you're not starting from scratch; you have to coexist with a pile of "legacy systems." Your design has to accommodate their interfaces and their temper. |
| Third-party dependencies | The payment gateway, the cloud service, the external API you depend on: their capability ceilings and their outages are your constraints. |

The key insight: constraints aren't there to annoy you — they're there to help you "cut options."

A design problem with no constraints ("design a perfect system") is actually impossible to start, because the options are infinite. Once you know "three people, three months, must be compliant, must integrate with the old system," the vast majority of flashy solutions are eliminated on the spot, and only a handful of truly viable ones remain. The clearer the constraints, the easier the decision.
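
The "cut options" effect can be sketched as a plain filter. Everything here is hypothetical: the candidate names, the numbers attached to them, and the four constraint fields are invented for illustration, and a real evaluation is rarely this binary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    min_team_size: int       # people needed to build and operate it (illustrative)
    months_to_ship: int
    compliant: bool
    works_with_legacy: bool


@dataclass(frozen=True)
class Constraints:
    team_size: int
    months_available: int
    must_be_compliant: bool
    must_integrate_legacy: bool


def viable(candidates: list[Candidate], c: Constraints) -> list[str]:
    """Keep only the candidates that stay inside every boundary."""
    return [
        x.name
        for x in candidates
        if x.min_team_size <= c.team_size
        and x.months_to_ship <= c.months_available
        and (x.compliant or not c.must_be_compliant)
        and (x.works_with_legacy or not c.must_integrate_legacy)
    ]


candidates = [
    Candidate("dozens of microservices", 15, 9, True, True),
    Candidate("modular monolith", 3, 3, True, True),
    Candidate("rewrite everything from scratch", 5, 12, True, False),
    Candidate("hosted service in another country", 2, 1, False, True),
]

# "Three people, three months, must be compliant, must integrate with the old system."
print(viable(candidates, Constraints(3, 3, True, True)))
```

With no constraints at all, every candidate survives and you are no closer to a decision; with the four constraints above, only one does.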

So when you take on a new task, don't think about solutions first — first get clear on the constraints. The person who nails down the constraints often ends up faster and steadier than the one who rushes to draw a solution.


Step 3: Quality attributes — make yourself a "checklist of considerations"

Step 1 already showed how important quality attributes are. This step gives you a checklist of common quality attributes to use as a "checkbox sheet" each time you think — not because every item matters, but because running through them one by one ensures you don't miss the one that's lethal.

| Quality attribute | One-line explanation |
|---|---|
| Performance | How fast the system reacts. Includes latency (how long one operation takes) and throughput (how much it can handle per unit of time). |
| Availability | What fraction of the time the system is "alive and usable." Often measured in "nines." |
| Reliability / Durability | Whether data gets lost or operations go wrong. "Durability" specifically means stored data won't mysteriously vanish. |
| Scalability | When users/data grow, whether the system can cope gracefully by "adding machines" rather than being rebuilt from scratch. |
| Consistency | Whether the data seen in multiple places agrees. Touches on "can I read what I just wrote immediately" and "do different users see the same copy." |
| Security | Whether it can block attacks, prevent unauthorized access, and protect sensitive data. |
| Cost | How much it takes to build and operate. Architects often forget this is a kind of "quality" too. |
| Maintainability | How easy the system is to change and to understand. How quickly a newcomer gets up to speed, how many places a feature touches. |
| Observability | When something breaks, whether you can quickly see "what broke and why." |
| Evolvability | When the business changes, whether the architecture can grow with it rather than being locked in. |

The usage is simple: take a system, run the table top to bottom, and for each item ask "does this matter for my system? what's the target?"

Say you're building an "internal reporting tool": Performance? Medium is fine. Availability? Half an hour of downtime is bearable. Consistency? Just needs to be accurate, not real-time. Cost? Save where you can. — You'll find most items don't demand much, and that itself is important information: it tells you "don't over-engineer."

Now say you're building a "payment system": Consistency? Money can't be miscalculated — top priority. Reliability? Not a single transaction can be lost. Security? Red line. — The same table, but it yields a completely different emphasis.
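
One way to picture "the same table, a different emphasis" is to write both walkthroughs down as data. This is only a sketch: the 0 to 3 priority scale and the specific numbers are assumptions made up here, not part of the checklist itself.

```python
CHECKLIST = [
    "performance", "availability", "reliability/durability", "scalability",
    "consistency", "security", "cost", "maintainability", "observability",
    "evolvability",
]

# Priority scale (an assumption for this sketch): 1 = modest, 2 = matters, 3 = red line.
reporting_tool = {"performance": 1, "availability": 1, "consistency": 1, "cost": 2}
payment_system = {"consistency": 3, "reliability/durability": 3, "security": 3}


def top_priorities(profile: dict[str, int], threshold: int = 3) -> list[str]:
    """Attributes at or above the threshold, in checklist order."""
    return [a for a in CHECKLIST if profile.get(a, 0) >= threshold]


def not_yet_reviewed(profile: dict[str, int]) -> list[str]:
    """Checklist items nobody has answered yet, so nothing lethal is skipped silently."""
    return [a for a in CHECKLIST if a not in profile]


print("reporting tool red lines:", top_priorities(reporting_tool))
print("payment system red lines:", top_priorities(payment_system))
print("payment system still to review:", not_yet_reviewed(payment_system))
```

An empty list of red lines for the reporting tool is itself the "don't over-engineer" signal; the third print is a reminder that the payment walkthrough above only covered three of the ten items.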

This checklist is exactly where each architecture template's "quality attributes table" comes from. When you go look at Section 3 of the AI chat product template in a moment, you'll see it lists "time-to-first-token latency, throughput, cost, availability, security" — that's just this generic checklist applied to the specific scenario of "AI chat." Generic framework + specific scenario = that table.


The core idea running through it all: no silver bullet, only trade-offs

Now we reach the soul of the whole framework. If there's only one sentence you remember from this chapter, remember this:

No silver bullet, only trade-offs. Every architecture decision is, at its core, "trading A for B."

"Silver bullet" is an old phrase, meaning "a magic solution that fixes everything in one shot." In architecture, silver bullets don't exist. Almost every choice that makes you better in one dimension makes you worse in another.

  • Want it faster? You often sacrifice cost (add cache, add machines) or consistency (you may read stale data).
  • Want strong consistency? You often sacrifice performance and availability (you have to wait for all nodes to sync).
  • Want high scalability? You often sacrifice simplicity (the system gets more complex, ops gets harder).
  • Want a fast launch? You often sacrifice maintainability (you take on tech debt, to be repaid later).

        Strengthen in this direction ───▶  and you usually cut from that one
        ─────────────────────────────────────────────────────────────────
        Performance ▲                   Cost ▼ / Consistency ▼
        Consistency ▲                   Performance ▼ / Availability ▼
        Scalability ▲                   Simplicity ▼ / Cost ▼
        Security ▲                      Convenience ▼ / Performance ▼
        Launch speed ▲                  Maintainability ▼ (tech debt)
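
The table above can be turned into a small lookup that flags goals pulling against each other. Treat it as a thinking aid under the chapter's own simplification: the pairings are the rough "usually" relationships from the table, not laws, and the function names are invented here.

```python
# Strengthen the key, and you usually pay with the values (copied from the table above).
TRADE_OFFS = {
    "performance": ["cost", "consistency"],
    "consistency": ["performance", "availability"],
    "scalability": ["simplicity", "cost"],
    "security": ["convenience", "performance"],
    "launch speed": ["maintainability"],
}


def price_of(goal: str) -> list[str]:
    """What you usually give up when you strengthen `goal`."""
    return TRADE_OFFS.get(goal, [])


def conflicts(goals: list[str]) -> list[tuple[str, str]]:
    """Pairs (strengthened, sacrificed) where the sacrificed item is also a stated goal."""
    wanted = set(goals)
    return [(g, s) for g in goals for s in price_of(g) if s in wanted]


print(price_of("security"))
print(conflicts(["performance", "consistency"]))
```

A wish list that produces conflicts is not wrong; it just means someone has to decide which side wins and write down why.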

This doesn't mean architecture is a pessimistic business — quite the opposite: once you accept "everything has a price," you shift from "finding the best solution" (which doesn't exist) to "finding the most fitting trade-off" (which does). The latter is the real work of architecture.

From this comes an extraordinarily useful test:

If someone hands you a solution and says it's "good everywhere, with no downsides at all" — that's not because the solution is perfect, it's because they haven't thought it through.

A person who has thought it through can always tell you "here's what this solution is good at, here's what it sacrifices, and here's why that sacrifice is worth it." Seeing the trade-offs is the mark of having thought it through; not seeing them is the mark of not having. Whether you're evaluating someone else's solution or scrutinizing your own, this is a nearly universal litmus test.


How do you walk the whole framework? The key is "asking the right questions"

The framework is the skeleton; what truly sets it in motion is asking questions. As said earlier, quality attributes and constraints are mostly "implicit," so you have to draw them out by asking. That makes knowing how to ask questions the architect's core craft.

The following questions apply to almost any system. When you get a new requirement, run them all first:

┌────────────────────────────────────────────────────────────────────┐
│  The architect's "six soul questions"                               │
├────────────────────────────────────────────────────────────────────┤
│  1. How big is the scale?  How many users/data now? Peak?           │
│  2. Read/write ratio?      Read-heavy, or write-heavy?              │
│  3. Consistency demand?    Must a fresh write be readable instantly?│
│                            Can you tolerate brief inconsistency?    │
│  4. Growth expectation?    How big in a year? Gradual or explosive? │
│  5. Cost of failure?       If this dies / loses data, how bad is it?│
│  6. What constraints?      Team size? Time? Budget? Compliance?     │
└────────────────────────────────────────────────────────────────────┘

Why these six? Because each one anchors directly to a class of architecture decision:

  • Scale and growth decide whether and when you must prepare for "scaling."
  • Read/write ratio decides whether your system should lean toward "read-optimized" or "write-optimized" (which profoundly affects how data is stored — see Chapter 05).
  • Consistency demand decides that most classic, most agonizing trade-off: consistency vs performance/availability.
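  • Cost of failure decides how much you invest in reliability and availability: "doesn't matter if it's lost" and "lose one and it's a catastrophe" are two different worlds.
  • Constraints decide which candidate solutions are viable at all: team, time, budget, and compliance cut the option list before any trade-off is weighed.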
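
If it helps to keep yourself honest, the six questions can be held in a small record that reports which ones are still blank. This is a sketch; the class and field names are hypothetical, chosen only to mirror the box above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SixQuestions:
    scale: Optional[str] = None
    read_write_ratio: Optional[str] = None
    consistency: Optional[str] = None
    growth: Optional[str] = None
    cost_of_failure: Optional[str] = None
    constraints: Optional[str] = None

    def unanswered(self) -> list[str]:
        """Names of the questions that still have no answer."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


answers = SixQuestions(scale="internal tool, three users", constraints="one developer, two weeks")
print("still to ask:", answers.unanswered())
```

The point is not the data structure but the habit: no candidate solutions until `unanswered()` comes back empty.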

Behind these six questions hides a mindset that's hard for beginners to internalize but extremely important: there is no "best architecture," only "the most fitting architecture given this set of answers."

The same "chat feature," built as an internal tool for three people versus built as a WeChat for a billion, yields worlds-apart answers — not because the latter's engineers are smarter, but because the two have completely different answers to the six questions. So don't go hunting for "the right answer" from the start — first answer your six questions clearly; when the answers change, the optimal solution changes.


A full walkthrough: designing a "URL shortener service"

After all that talk, let's take a classic, compact example and walk the framework end to end. Note: our goal is not to give "the standard answer," but to demonstrate "how to think."

The requirement: "build a URL shortener that turns long URLs into short links like short.ly/x7Kp9, which redirect to the original URL when clicked." (i.e., the various "short URL generators")

① Needs: split functional vs quality first

Functional requirements (what to do):

  • Input a long URL, generate a short URL
  • Visit the short URL, redirect to the corresponding long URL
  • (maybe) count how many times each short link was clicked

Just those three, so simple there's almost nothing to design. The real design hides in the quality attributes. So we start asking.

② Ask the right questions (the six soul questions)

  • How big is the scale? Suppose we benchmark against a mid-sized service: 10 million new short links created per day.
  • Read/write ratio? Here comes the key insight — a short link is "created once, clicked countless times." A single link, once shared, might get clicked millions of times. So reads (redirects) vastly outnumber writes (creations) — the ratio might be 100:1 or even 1000:1.
  • Consistency demand? "Must a freshly created short link be accessible immediately?" — ideally yes, but a second or two of delay is tolerable (right after you create it, nobody's usually clicking it that very instant). This is an important "we can loosen up here" signal.
  • Growth expectation? Links only ever grow, never shrink — data accumulates without bound. Over ten years it's an astronomical number, and you'd better plan how to store it.
  • Cost of failure? If "redirect goes down," users can't open their links — bad experience but not fatal. If "data is lost" — every short link already shared out becomes dead, and that's a catastrophe, so durability demands are high.
  • Constraints? Suppose it's a small team, must ship soon, limited budget.
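
These answers can be turned into rough numbers with back-of-envelope arithmetic. The 10 million links per day and the 100:1 ratio come from the answers above; the 500 bytes per stored link, the ten-year horizon as a planning window, and the 62-character alphabet are assumptions added here for illustration. All rates are daily averages, so real peaks would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60

new_links_per_day = 10_000_000   # from the scale answer
reads_per_write = 100            # lower end of the 100:1 to 1000:1 range
bytes_per_link = 500             # ASSUMPTION: short code + long URL + metadata
years = 10                       # planning horizon

writes_per_sec = new_links_per_day / SECONDS_PER_DAY
reads_per_sec = writes_per_sec * reads_per_write
total_links = new_links_per_day * 365 * years
storage_tb = total_links * bytes_per_link / 1e12


def min_code_length(total_codes: int, alphabet_size: int = 62) -> int:
    """Shortest code length whose keyspace can hold `total_codes` distinct codes."""
    length = 1
    while alphabet_size ** length < total_codes:
        length += 1
    return length


print(f"average writes/sec: {writes_per_sec:,.0f}")      # about 116
print(f"average reads/sec:  {reads_per_sec:,.0f}")       # about 11,574
print(f"links after {years} years: {total_links:,}")     # 36,500,000,000
print(f"raw storage: {storage_tb:.2f} TB")               # 18.25
print(f"minimum code length: {min_code_length(total_links)} characters")
```

Two things fall out of the arithmetic under these assumptions: writes are modest while reads are two orders of magnitude larger, and a five-character code like the one in the example would run out of keyspace long before ten years are up.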

③ From the answers, the quality attributes surface on their own

Translate the answers to the six questions into quality goals:

| Quality attribute | Goal | From which answer |
|---|---|---|
| Read performance | Redirect must be blazing fast (< 50ms) | Reads vastly outnumber writes; redirect is the core experience |
| Scalability (reads) | Must withstand massive read traffic | Read/write ratio 100:1+; the peak is in "reads" |
| Durability | Short links must never be lost | Cost of failure: lose them and every shared link dies |
| Consistency | Can be "eventually consistent" | Accessible a second or two later; users tolerate it |
| Cost | Storage must be economical | Data accumulates without bound + limited budget |

See it? Three plain functional requirements, run through "asking the right questions," grew into a clear table of quality goals. This table is the basis for every decision that follows.
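
The translation from answers to goals is mechanical enough to sketch as a function. The rules below are a deliberately crude paraphrase of the table, and the parameter names and the "10 reads per write" threshold are assumptions for illustration, not a rule from any standard.

```python
def derive_quality_goals(
    reads_per_write: int,
    tolerated_staleness_seconds: float,
    data_loss_is_catastrophic: bool,
    data_grows_without_bound: bool,
    budget_is_tight: bool,
) -> dict[str, str]:
    """Map answers to the six questions onto quality goals (illustrative rules only)."""
    goals: dict[str, str] = {}
    if reads_per_write >= 10:  # ASSUMED threshold for "read-heavy"
        goals["read performance"] = "redirect must be very fast"
        goals["scalability (reads)"] = "must withstand massive read traffic"
    if data_loss_is_catastrophic:
        goals["durability"] = "stored links must never be lost"
    if tolerated_staleness_seconds > 0:
        goals["consistency"] = "eventual consistency is acceptable"
    else:
        goals["consistency"] = "fresh writes must be readable immediately"
    if data_grows_without_bound or budget_is_tight:
        goals["cost"] = "storage must be economical"
    return goals


shortener = derive_quality_goals(
    reads_per_write=100,
    tolerated_staleness_seconds=2,
    data_loss_is_catastrophic=True,
    data_grows_without_bound=True,
    budget_is_tight=True,
)
for attribute, goal in shortener.items():
    print(f"{attribute}: {goal}")
```

Change one answer, for example set the tolerated staleness to zero, and the consistency goal flips, which is the whole point of asking before designing.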

④ Candidate solutions & trade-offs

Now, holding this table, let's look at a few key decision points — each one a trade-off:

Decision A: how to generate the short code?

  • Option 1: hash the long URL and take the first few characters as the short code. Simple, but it collides (different URLs hashing to the same code), so you need extra collision handling.
  • Option 2: use a globally incrementing counter and convert the number into a shorter character representation. No collisions, but it needs a "global ID issuer" component, which is itself a challenge under high concurrency.
  • Trade-off: Option 1 is simple but you handle collisions; Option 2 is clean but introduces the "ID issuer" as a new point of complexity and a potential bottleneck. No free lunch — which you pick depends on whether you fear "the hassle of handling collisions" or "the hassle of maintaining an ID issuer" more.
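
Both options fit in a few lines, which makes the trade-off easier to see. This is a minimal single-process sketch, not a production design: the alphabet ordering, the seven-character length, the `#attempt` salting scheme, and the in-memory `taken` dictionary are all assumptions. The hash variant reduces the digest modulo the keyspace instead of literally taking the first characters, which keeps the codes evenly spread.

```python
import hashlib
import itertools
import string

ALPHABET = string.digits + string.ascii_letters  # 62 symbols; the ordering is arbitrary


def base62(n: int) -> str:
    """Write a non-negative integer in base 62."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


def hashed_code(url: str, taken: dict[str, str], length: int = 7) -> str:
    """Option 1: hash the URL; on a collision with a different URL, salt and retry."""
    for attempt in itertools.count():
        digest = hashlib.sha256(f"{url}#{attempt}".encode()).digest()
        value = int.from_bytes(digest, "big") % (62 ** length)
        code = base62(value).rjust(length, ALPHABET[0])
        if taken.get(code, url) == url:  # free, or already ours
            taken[code] = url
            return code


class CounterIssuer:
    """Option 2: an incrementing counter. Collision-free, but only within ONE process."""

    def __init__(self, start: int = 1) -> None:
        self._ids = itertools.count(start)

    def issue(self) -> str:
        return base62(next(self._ids))


taken: dict[str, str] = {}
print(hashed_code("https://example.com/some/long/path", taken))
issuer = CounterIssuer(start=1_000_000)
print(issuer.issue(), issuer.issue())
```

Option 1 needs the `taken` lookup on every write; Option 2 needs nothing of the sort, but the counter shown here is exactly the part that gets hard once several servers must share it.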

Decision B: how to withstand the massive "reads"?

  • Our read/write ratio is 100:1+, and popular links get clicked over and over.
  • This is practically shouting at you: add a cache. Put the mappings of popular short links into an in-memory cache, so the vast majority of redirect requests never touch the database.
  • Trade-off: the cache makes reads blazing fast and relieves database pressure (hitting our "read performance" and "cost" goals), at the price of one more thing to maintain, plus the "what if the cache holds stale data" problem — luckily we asked earlier and established that we "can tolerate eventual consistency," so this price we can afford. See — the questions we asked right at the start directly settle the call here.
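
The read path described in Decision B is often called cache-aside. Here is a minimal sketch of it, with a plain dictionary standing in for the database and a time-to-live bounding how stale an entry can get. The class name, the 60-second TTL, and the `db_reads` counter are illustrative assumptions; a real cache would also cap its size and evict entries.

```python
import time
from typing import Callable, Optional


class CachedResolver:
    """Look in the cache first; fall back to the database and remember the answer."""

    def __init__(
        self,
        database: dict[str, str],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._db = database
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict[str, tuple[str, float]] = {}  # code -> (long_url, expires_at)
        self.db_reads = 0  # counts how often we had to touch the database

    def resolve(self, code: str) -> Optional[str]:
        now = self._clock()
        entry = self._cache.get(code)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit: may be up to ttl_seconds stale
        self.db_reads += 1
        url = self._db.get(code)
        if url is not None:
            self._cache[code] = (url, now + self._ttl)
        return url


db = {"x7Kp9": "https://example.com/a/very/long/url"}
resolver = CachedResolver(db)
for _ in range(1000):
    resolver.resolve("x7Kp9")
print("database reads for 1000 redirects:", resolver.db_reads)
```

The staleness cost is visible in the same code: if the database entry changes, the resolver keeps serving the old value until the TTL runs out, which is acceptable only because the consistency question was answered with "eventual is fine."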

Decision C: where does the data live, and how is it stored?

  • The data shape is extremely simple: just a "short code → long URL" key-value mapping, and we almost only ever look it up by short code.
  • This "simple key-value, looked up by key, massive in volume, must be fast" shape is naturally suited to a key-value store, rather than being crammed into a complex relational structure.
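  • Trade-off: a key-value store gives fast, cheap lookups by key, and in exchange typically gives up flexible ad hoc queries and joins. Since we almost only ever look up by short code, that is a price we can afford. This is exactly the general principle that "the data's access shape decides the storage choice" (which will recur in the next chapter and in Chapter 05).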

⑤ Decide & write down the "why"

By the time you reach here, you have not just a solution — more importantly, you can articulate the reason and the cost behind every decision:

"We use a cache to withstand reads, because the read/write ratio is lopsided and we can tolerate eventual consistency; the cost is one more layer to maintain and brief staleness, but we confirmed earlier that this cost is acceptable. We use a key-value store, because the data shape is a simple lookup by key. We use Option X for the short code, because…"

This, right here, is architectural judgment. What separates it from "I just picked a database and started hacking" isn't the technology used — it's that behind every step there's a "why" and a "cost." Writing down those "whys" is the ADR (Architecture Decision Record) we'll cover in Chapter 08.
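
As a rough illustration of "a why and a cost behind every step," the decisions above can be captured in a tiny record that refuses to let a blank rationale slip by. This is not the ADR format from Chapter 08; the class, its four fields, and the check are hypothetical, and the short-code entry is left blank on purpose because the text above leaves that choice open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    title: str
    choice: str
    because: str
    cost: str

    def render(self) -> str:
        return f"{self.title}: we chose {self.choice}, because {self.because}. Cost: {self.cost}."


def missing_rationale(decisions: list[Decision]) -> list[str]:
    """Titles of decisions that lack either a 'why' or a 'cost'."""
    return [d.title for d in decisions if not d.because.strip() or not d.cost.strip()]


decisions = [
    Decision(
        "Read path",
        "a cache in front of the store",
        "the read/write ratio is lopsided and we can tolerate eventual consistency",
        "one more layer to maintain and brief staleness",
    ),
    Decision(
        "Storage",
        "a key-value store",
        "the data shape is a simple lookup by key",
        "no flexible ad hoc queries",
    ),
    Decision("Short code generation", "Option X", "", ""),  # still undecided
]

print(decisions[0].render())
print("needs more thought:", missing_rationale(decisions))
```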


📌 Real-world cases: this framework isn't armchair theory

This chapter ran through the six questions with a "URL shortener service." Real-world shorteners (Bitly, TinyURL, etc.) are designed exactly this way: lopsided read/write ratio, heavy caching, KV storage, unique-ID issuance.

  • To see the "finished product" this framework yields, compare directly with this repo's URL shortener template — its Section 3 (needs and constraints) and Section 8 (key decisions) are exactly the result of running this chapter's "six questions + trade-offs" all the way through.

Chapter summary

  • Architectural judgment has a method: needs → constraints → quality attributes → candidate solutions → trade-offs → decision, and the moment the business changes, you return to the start and rerun.
  • The architect's most important fundamental skill is distinguishing functional requirements (what to do) from quality attributes (how well). Function decides whether it "works at all"; quality attributes decide how it should be built. The former is explicit, written in the doc; the latter is implicit, and you have to draw it out by asking.
  • Constraints aren't there to annoy you — they're there to help you cut options. Team, time, budget, compliance, existing systems, third-party dependencies — nail down the constraints and only a handful of viable solutions remain.
  • No silver bullet, only trade-offs. Every decision is trading A for B. A solution that's "good everywhere, with no downsides" isn't perfect — it just hasn't been thought through. Seeing the trade-offs = having thought it through.
  • Asking questions is the core craft. The six soul questions: scale, read/write ratio, consistency, growth, cost of failure, constraints. There is no best architecture, only the most fitting one given this set of answers.
  • The URL shortener example shows us: three plain functional requirements, run through "asking the right questions," grow into a clear table of quality goals — and that table is the basis of every decision.

This framework maps neatly onto two core sections in every architecture template: "Core needs and constraints" (corresponding to this chapter's needs/constraints/quality attributes) and "Key architecture decisions and trade-offs" (corresponding to this chapter's candidate solutions/trade-offs/decision).

Go verify it right now: open any template under templates/ and look straight at its Section 3 and Section 8 — you'll find they're the actual products of this framework. Try reverse-engineering with this chapter's "six soul questions": its designer must have asked these very questions back then.

Once you've thought it through, you still have to explain it clearly — so everyone on the team can see the system in your head. And that takes knowing how to draw.

👉 Continue: 03 · Reading and Drawing Architecture Diagrams Well