06 · Quality Attributes & Trade-offs

The thesis in one line: function decides whether a system "works at all"; quality attributes decide whether it's "good to use, can withstand the load, and is worth it." And these attributes conflict with one another — you can't have them all. The essence of architecture is to prioritize them by the business, then make trade-offs with clear eyes.


First, picking up from 02: what quality attributes are, and why they're architecture's main battlefield

In 02 · The Architect's Thinking Framework, we split needs into two kinds:

  • Functional requirements: what the system does (can place orders, can search, can send messages).
  • Non-functional requirements / quality attributes: how well the system does these things (how fast, how stable, how economical, how secure).

Beginners stare at function, architects stare at quality attributes. The reason is simple: function usually has a standard answer (need a shopping cart? everyone builds roughly the same thing), while quality attributes are almost all trade-offs with no standard answer — and that's where judgment is truly tested.

This chapter expands the quality-attribute checklist from 02 one by one. For each, I'll make three things clear:

  ① How to measure it (a goal you can't put a number on is empty talk)
  ② How to achieve it (which architectural means are common)
  ③ Who it conflicts with (it's not free: who does it offend)

The third is the soul. Because this chapter's most central truth is: these attributes fight each other. Max out any one and you almost inevitably hurt another.


I. Performance

① How to measure — first tell apart two metrics often conflated:

  • Latency: how fast a single request goes from sent to result. It cares about "how long one operation waits."
  • Throughput: how many requests can be handled per unit of time (e.g. QPS). It cares about "how many people can be served per second."

These two aren't the same thing, and are often even opposed. An intuitive analogy:

   Latency    = how long one car takes from Beijing to Shanghai (one-way time)
   Throughput = how many cars per hour this highway can pass (total flow)

   Build a 200-km/h racetrack: ultra-low latency (one car flies), but only one
                               lane, so low throughput
   Build an 80-km/h ten-lane:  one car isn't fast (high latency), but huge flow,
                               so high throughput

When measuring latency, never look only at the average. An average latency of 100ms sounds fine, but maybe 99% of requests finish in 50ms and 1% are stuck for 5 seconds (0.99 × 50ms + 0.01 × 5,000ms ≈ 100ms), and that 1% may be your most important big customers. So look at percentiles (P95, P99, P99.9): "99% of requests complete within how many milliseconds." Tail latency (the long tail) is often the real experience-killer.
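To make the gap between the average and the tail concrete, here is a minimal sketch that reuses the numbers from the paragraph above. The sample data and the `percentile` helper (a simple nearest-rank definition) are assumptions for illustration; real monitoring systems often use interpolated or histogram-based percentiles.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    # round() guards against float noise such as 999.0000000000001
    rank = max(1, math.ceil(round(p * len(ordered) / 100, 9)))
    return ordered[rank - 1]

# Hypothetical sample: 990 requests at 50 ms, 10 requests stuck at 5,000 ms
latencies_ms = [50] * 990 + [5000] * 10

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
p999 = percentile(latencies_ms, 99.9)

print(f"mean={mean_ms} ms  P50={p50} ms  P99={p99} ms  P99.9={p999} ms")
```

In this sample exactly 1% of requests are slow, so the nearest-rank P99 still reports 50 ms and only P99.9 exposes the 5-second tail. That is one reason to track several percentiles rather than a single one.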

There's another oft-overlooked dimension: perceived latency. The "fast" a user feels and the real total elapsed time are often not the same.

Remember the streaming output of the AI chat product? It didn't make total generation time shorter, but let the user see the first character pop out within 1 second — worlds apart in experience. This is the magic of "perceived latency." Any design that lets the user see feedback sooner (streaming, skeleton screens, optimistic updates, progress hints) improves perceived latency — and that often matters more than real latency.
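The difference between total time and time to first feedback can be sketched with a simulated clock. Everything here is hypothetical: `TOKEN_COST_S`, the 200-token reply, and the `generate` function stand in for a real model, and no actual waiting happens.

```python
from typing import Iterator, List, Tuple

TOKEN_COST_S = 0.05  # assumed time to generate one token (hypothetical)

def generate(tokens: List[str]) -> Iterator[Tuple[float, str]]:
    """Yield (simulated_elapsed_seconds, token) as each token becomes ready."""
    elapsed = 0.0
    for token in tokens:
        elapsed += TOKEN_COST_S  # simulated work, no real sleeping
        yield elapsed, token

def time_to_first_feedback(tokens: List[str], streaming: bool) -> float:
    """Seconds until the user sees anything on screen."""
    timeline = list(generate(tokens))
    if streaming:
        return timeline[0][0]   # first token is shown as soon as it is ready
    return timeline[-1][0]      # blocking: nothing is shown until the last token

reply = ["tok"] * 200  # a 200-token answer
total = TOKEN_COST_S * len(reply)
print(f"total generation time:    {total:.1f} s")
print(f"blocking first feedback:  {time_to_first_feedback(reply, False):.1f} s")
print(f"streaming first feedback: {time_to_first_feedback(reply, True):.2f} s")
```

Total generation time is identical in both modes; only the moment the user first sees something changes.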

② How to achieve: caching (turning recompute/remote-fetch into a near read), read-write splitting, going asynchronous (toss slow work into a queue and return first), CDN (placing content closer to the user), better data structures/indexes, precomputation. Note — you've seen these means already in 04 and 05.

③ Who it conflicts with: mainly with cost (faster usually means more machines / more expensive storage / more cache), with consistency (caching and async buy speed at the price of possibly-less-fresh data), and with simplicity (every performance optimization is an extra dose of complexity).


II. Availability

① How to measure — use "nines" for "percentage of uptime." The cold reality behind this string of numbers is "how much downtime is allowed per year," and you must develop a feel for it:

   Availability     Downtime/year       Downtime/month    Feel
   ─────────────────────────────────────────────────────────────────
   99%    (two 9s)    ≈ 3.65 days         ≈ 7.2 hours      toy/internal tool
   99.9%  (three 9s)  ≈ 8.76 hours        ≈ 43 minutes     pass mark for a normal
                                                            online service
   99.99% (four 9s)   ≈ 52.6 minutes      ≈ 4.3 minutes    serious commercial service
   99.999%(five 9s)   ≈ 5.26 minutes      ≈ 26 seconds     telecom/payment grade,
                                                            extremely expensive

Each extra 9 usually multiplies cost and complexity by an order of magnitude. Going from three 9s to five 9s isn't "trying a bit harder," it's "nearly redoing the architecture and pouring in several times the money." So — don't demand five 9s off the cuff; first ask the business: if this service is down for an hour, how much is actually lost? Sometimes three 9s is plenty, sometimes even a minute is a huge loss.
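The downtime figures in the table can be reproduced with a few lines of arithmetic. This is a small sketch that assumes a 365-day year and a 30-day month; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24      # 8,760
HOURS_PER_MONTH = 30 * 24      # 720, assuming a 30-day month

def allowed_downtime_hours(availability_pct: float, period_hours: float) -> float:
    """Downtime permitted in a period at a given availability percentage."""
    return (1 - availability_pct / 100) * period_hours

for pct in (99.0, 99.9, 99.99, 99.999):
    per_year_h = allowed_downtime_hours(pct, HOURS_PER_YEAR)
    per_month_min = allowed_downtime_hours(pct, HOURS_PER_MONTH) * 60
    print(f"{pct:>7}%  {per_year_h:8.2f} h/year  {per_month_min:8.2f} min/month")
```

Plugging in your own target makes the "is an hour of downtime acceptable?" conversation with the business much more concrete.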

② How to achieve: the core idea is just one — eliminate the single point of failure (SPOF), backstop it with redundancy.

  • Redundancy: every critical component has at least one backup; if one dies, the other takes over (remember the primary-replica replication of Chapter 05? A replica is the primary's redundant backstop).
  • Fault-domain isolation: don't put the eggs in one basket — multiple machines, racks, availability zones, regions. Let no single point's failure bring down the whole.
  • No single point: from ingress to storage, check layer by layer "if this link dies, does the whole chain break," and make every such link redundant.
  • Graceful degradation: when you can't hold up, it's better to shut off secondary features and keep the core (e.g. during a big sale, turn off "you may also like" but keep "place order & pay") than to crash entirely.

   With a single point (fragile): user ─▶ [the only gateway] ─▶ [the only database]
                                            ↑ either one dies = whole system down

   No single point (robust):     user ─▶ [gateway ×3] ─▶ [primary + replicas, cross-AZ]
                                            ↑ one dies, rest take over, user feels nothing
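The two pictures above can be roughly quantified. The sketch below assumes that components fail independently and that failover is instant, which real systems only approximate; the 99.9% figures and the function names are hypothetical.

```python
from math import prod

def serial(*availabilities: float) -> float:
    """A chain works only if every link works: multiply the availabilities."""
    return prod(availabilities)

def redundant(availability: float, copies: int) -> float:
    """A redundant group fails only if every copy fails at the same time."""
    return 1 - (1 - availability) ** copies

single_gateway = 0.999   # hypothetical: each component alone is "three 9s"
single_database = 0.999

fragile = serial(single_gateway, single_database)
robust = serial(redundant(single_gateway, 3), redundant(single_database, 2))

print(f"with single points: {fragile:.6f}")
print(f"no single point:    {robust:.6f}")
```

Chaining single points makes the whole worse than its weakest link, while redundancy lifts each link well above what one copy can offer. Correlated failures (same rack, same bad deploy) erode this benefit, which is why fault-domain isolation matters.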

③ Who it conflicts with: with cost (redundancy = paying to keep a pile of "normally unused" backup resources), with consistency (this is the core conflict, covered below), and with simplicity (multi-active, failover, health checks are all complexity).


III. Scalability

① How to measure: when the load (users, data volume, request volume) grows N-fold, can the system withstand it smoothly by adding resources, rather than crashing or needing a rebuild? Good scalability means "add machines and it withstands more," at a roughly linear cost.

② How to achieve — first tell apart two ways to scale:

   Scale Up (vertical)             Scale Out (horizontal)
   upgrade a single machine        add more machines
   (stronger CPU, more RAM)        (one becomes ten, ten becomes a hundred)

   ┌───────┐    ┌──────────┐      ┌──┐         ┌──┐┌──┐┌──┐┌──┐
   │ small │ ─▶ │   big    │      │M │  ─────▶ │M ││M ││M ││M │
   │machine│    │ machine  │      └──┘         └──┘└──┘└──┘└──┘
   └───────┘    └──────────┘

   Pro: simple, no code change      Pro: in theory unlimited, with redundancy
                                         as a bonus
   Con: physical ceiling, worse      Con: the architecture must support it (the
        value the pricier it gets,        key prerequisite — components must be
        and this machine is itself        "stateless"! see Chapter 05)
        a single point

Key insight: vertical scaling is "treating the symptom," with a ceiling; horizontal scaling is "treating the root," but it has a hard prerequisite — the component must be horizontally scalable. And what scales out well? The stateless. This connects straight to the first principle of Chapter 05: stateless is easy to scale, stateful is hard. So "scalability" is largely the art of "making the system as stateless as possible, and corralling the hard-to-handle state into a few dedicated spots (sharding, replication)."
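Why statelessness is the prerequisite for scaling out can be seen in a toy example. The classes below are hypothetical; the shared `dict` stands in for an external store such as a cache or database, and `run` imitates a round-robin load balancer.

```python
from itertools import cycle

class StatefulServer:
    """Keeps each user's cart in its own memory: state is trapped on one machine."""
    def __init__(self):
        self.carts = {}

    def add_item(self, user, item):
        self.carts.setdefault(user, []).append(item)
        return list(self.carts[user])

class StatelessServer:
    """Holds no user data; every request reads and writes a shared store."""
    def __init__(self, store):
        self.store = store  # stands in for an external cache or database

    def add_item(self, user, item):
        self.store.setdefault(user, []).append(item)
        return list(self.store[user])

def run(servers, items):
    """Send each request to the next server in turn, like a round-robin balancer."""
    cart = []
    for server, item in zip(cycle(servers), items):
        cart = server.add_item("alice", item)
    return cart

items = ["book", "pen", "lamp"]
print(run([StatefulServer(), StatefulServer()], items))   # cart split across machines
shared_store = {}
print(run([StatelessServer(shared_store), StatelessServer(shared_store)], items))
```

With stateful servers the user's cart is scattered across machines, so adding a machine changes behavior. With stateless servers any instance can serve any request, and the hard part moves into the shared store.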

③ Who it conflicts with: with consistency (horizontal scaling = multiple copies/shards = consistency gets harder, see Chapter 05), with simplicity (distributed is always more complex than single-machine), and with cost (in the short term, reworking the architecture to be horizontally scalable is itself an investment).


IV. Consistency — picking up from 05

We covered this fully in 05 · Data & State, so here we just put it back on the quality-attribute chessboard and stress its "conflict nature."

  • ① How to measure: on the "strong consistency ←→ eventual consistency" spectrum, where does your data fall? How soon after a write are all reads guaranteed to see the newest value?
  • ② How to achieve: strong consistency relies on transactions and coordination protocols (at the price of being slow, possibly refusing service); eventual consistency relies on async sync and the BASE approach (at the price of an "inconsistency window").
  • ③ Who it conflicts with: it conflicts with nearly all of "scalability / availability / performance." CAP already sealed it: during a partition, pick one of consistency and availability. This is the most fundamental, most unavoidable conflict in architecture.

A one-line recap: strong consistency is expensive — spend it where things truly go wrong (money, inventory); use eventual consistency elsewhere to buy availability and scaling. See Chapter 05.
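The "inconsistency window" of eventual consistency can be illustrated with a toy primary and replica. This is a sketch under simplifying assumptions: the classes are hypothetical, and replication is triggered by hand where a real system would run it in the background.

```python
class Primary:
    """Accepts writes and queues them for the replica instead of waiting for it."""
    def __init__(self):
        self.data = {}
        self.pending = []  # changes not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))  # async: return before replication

class Replica:
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)

def replicate(primary, replica):
    """Ship queued changes; in a real system this runs in the background."""
    for key, value in primary.pending:
        replica.data[key] = value
    primary.pending.clear()

primary, replica = Primary(), Replica()
primary.write("balance", 100)
stale = replica.read("balance")      # inside the inconsistency window
replicate(primary, replica)
fresh = replica.read("balance")      # after the replica catches up
print(stale, fresh)
```

The write returns quickly because it does not wait for the replica, and the price is the stale read in between. Strong consistency would make `write` block until the replica confirms, trading latency and availability for freshness.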


V. Security

① How to measure: security is hard to measure with a single number, but you can get clear on two questions: what's the attack surface, and in the worst case, what gets leaked or lost? It's more of a "floor" than a "the-more-the-better" metric.

② How to achieve — a few principles that run throughout:

  • Never trust input: treat all data coming from outside (user input, third-party callbacks, even upstream services' responses) as potentially poisonous.

    Remember "prompt injection" in AI products? That's just the old rule of "don't trust input" in a new AI-era form. Any external text re-entering the core system is untrusted.

  • Least privilege: each component gets only the minimum privilege it needs to do its work, no more. If one spot is breached, the loss is confined to the smallest range.
  • Defense in depth: don't count on one wall to block everyone. Auth, rate limiting, encryption, isolation, auditing… layer upon layer; breach one and there's the next.
  • Tiered data protection: sensitive data (passwords, payments, privacy) is encrypted in transit and at rest, and strictly isolated and auditable.
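"Never trust input" has a classic concrete form: external text must never be spliced into a query as code. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the table, rows, and function names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

def find_unsafe(name: str):
    # Anti-pattern: external text is spliced into the SQL and becomes code.
    return conn.execute(
        f"SELECT name, secret FROM users WHERE name = '{name}'"
    ).fetchall()

def find_safe(name: str):
    # Placeholder binding: the input is only ever treated as a value.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()

hostile = "nobody' OR '1'='1"
print(len(find_unsafe(hostile)))  # every row leaks
print(len(find_safe(hostile)))    # no user has that literal name
```

The same shape applies to prompt injection: the remedy is to keep untrusted text in a "data" position and never let it be interpreted as instructions.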

③ Who it conflicts with: with performance (every check, every encrypt/decrypt has overhead), with usability / maintainability (security measures often make the system harder to use and development more cumbersome — the famous "security vs convenience" tug-of-war), and with cost (security is a continuous investment).

What's special about security: in normal times it's an "invisible cost," and when something goes wrong it's a "fatal price." You can't skimp on it citing "low ROI" — one data breach can shut a company down outright.


VI. Maintainability / Evolvability

① How to measure: to change a feature, fix a bug, add a feature — how long does it take? How quickly does a newcomer get up to speed? Will a change ripple through the whole body and trigger unexpected cascading failures? This measures the system's composure in the face of "change."

This is the most easily overlooked yet most far-reaching attribute. Because software spends 99% of its time in the state of "being modified" — you write it once but change it a thousand times. A system that runs blazing fast but that nobody dares touch is, in the long run, an enormous liability.

② How to achieve: clear module boundaries and low coupling (change one spot without rippling through the whole, see the layering and modularization of Chapter 04), high cohesion (put related things together), good abstractions, ample observability (logs/monitoring/tracing, so you can see what's happening inside the system), and — writing down decisions and their reasons.

This is exactly what 08 · Architecture Decision Records & Evolution is about: using ADRs to record "why we decided this way back then." The you of six months later (or your successor), seeing a strange design, most wants to know "is this deliberate, or historical baggage?" — evolvability depends largely on "whether successors can read their predecessors' trade-offs."

③ Who it conflicts with: with performance (many extreme performance optimizations cost you code that's obscure, hard to understand, hard to change), and with short-term delivery speed (writing clean code and leaving good extension points takes more time — this is the source of "tech debt": borrowing tomorrow's maintainability for today's launch speed).


VII. Cost — the most overlooked, yet often the real constraint

① How to measure: the total spend on servers / storage / bandwidth / third-party services / human ops. Most useful is to look at "unit cost" — how much per user, per request, per order, per thousand tokens (remember that "cost per thousand tokens" was the number-one metric for AI products?).

② I'm going to say a few heavy words just for this:

Cost is the quality attribute engineers most often overlook, yet it's often the one that truly blocks you: an insurmountable constraint.

When beginners draw architecture diagrams, their minds often default to "infinite resources" — add a few more caches, a few more replicas, a few more services, demand five 9s… each one sounds "better." But the reality is: these "betters" all burn money, and money is finite. An architecture that's theoretically perfect but burns the company into bankruptcy is a failed architecture.

A few counterintuitive things about cost:

  • It compounds quietly: an inefficient design costs a few hundred extra per month at 10,000 users and nobody cares; at 100 million users it's a few million extra burned per month — the same design flaw, amplified by scale into an astronomical figure.
  • It's tied to nearly every means you learned earlier: redundancy costs money, caching costs money, multiple copies cost money, strong consistency (because it's hard to scale and needs beefier machines) costs money, low latency (needs more and better resources) costs money. Every quality attribute you pay for ultimately lands on the "cost" bill.
  • It's often the "real constraint": many architecture debates, dug down to the bottom, turn out to be not about "is it technically feasible" but "is this bit of money / this bit of headcount worth doing it this way."
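The "compounds quietly" point is plain multiplication, but seeing the numbers helps. The sketch assumes the waste grows linearly with user count, and the figure of 300 per month at 10,000 users is a hypothetical stand-in for "a few hundred."

```python
def monthly_waste(users: int, waste_per_user: float) -> float:
    """Extra monthly spend caused by an inefficiency, assuming linear scaling with users."""
    return users * waste_per_user

# Hypothetical: the flaw wastes 300 currency units a month at 10,000 users
waste_per_user = 300 / 10_000   # 0.03 per user per month

for users in (10_000, 1_000_000, 100_000_000):
    print(f"{users:>11,} users -> {monthly_waste(users, waste_per_user):>12,.0f} per month")
```

This is why unit cost (per user, per request) is the figure worth watching early: it is the only one that stays meaningful as scale changes.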

③ Who it conflicts with: it conflicts with nearly every other quality attribute — because raising any one of them most likely means spending more money. Cost is the "accountant" standing behind every trade-off, making the final call.


The core: these attributes conflict, you can't have them all

String the seven above together and you find a "web of conflicts." This is this chapter's — and all of architectural thinking's — most important lesson:

There is no system that's "high-performance and strongly consistent and highly available and cheap and simple and secure" all at once. Maxing out any one attribute sacrifices others. The architect's job isn't to "have it all," but to "decide, by the business, which to keep and which to give up."

A few groups of classic conflicts, to carve into your brain:

   ┌──────────────────────────────────────────────────────────────────┐
   │  ① Consistency vs Availability  (CAP: during a partition you     │
   │      can't have both; money/inventory lean consistent ◀──▶       │
   │      likes/feed lean available)                                  │
   │                                                                  │
   │  ② Performance vs Cost  (faster nearly always means pouring in   │
   │      more/pricier resources)                                     │
   │                                                                  │
   │  ③ Flexible/scalable vs Simple  (microservices are flexible but  │
   │      complexity explodes; a monolith is simple but hard to scale │
   │      independently; see Chapter 04)                              │
   │                                                                  │
   │  ④ Security vs Convenience/Performance  (every line of defense   │
   │      adds friction and overhead)                                 │
   │                                                                  │
   │  ⑤ Launch speed vs Maintainability  (rushing = borrowing tech    │
   │      debt, repaid with interest sooner or later)                 │
   └──────────────────────────────────────────────────────────────────┘

The most classic intuition diagram is the "consistency–availability–latency" impossible triangle: you can hardly max out all three corners at once, and moving toward any one corner usually means yielding on at least one of the other two:

                         Consistency (C)
                        (data always newest)
                          ╱      ╲
                        ╱          ╲
                      ╱  you can only ╲
                    ╱  stand somewhere  ╲
                  ╱  in this triangle;    ╲
                ╱  near a corner = stronger ╲
              ╱  on it, at the price of     ╲
            ╱  being farther from the other two╲
            ╱____________________________________╲
       Availability (A)                       Low latency (L)
    (serve at any time)                       (lightning response)

   • Want near "Consistency" (strong consistency + sync) → usually sacrifice
     availability or latency
   • Want near "Low latency" (cache + async) → usually sacrifice consistency
   • Want near "Availability" (multi-copy redundancy) → during a partition,
     usually give up strong consistency

Don't take this diagram as a strict mathematical theorem — use it as intuition: every time someone says "I want it all," lay out this triangle (or the web of conflicts above) and ask, "So which corner do you plan to back away from?"

How to break the deadlock? The answer is always the same sentence: go back to the business and prioritize. There's no one-size-fits-all optimal solution, only "the more reasonable solution given this business, this scale, this budget" — exactly the principle in the README: there is no best architecture, only the most fitting architecture.


How to discuss trade-offs with the boss / product manager: translate "technology" into "business consequences"

This is, in the quality-attributes chapter, the most practical thing, and the one engineers most often get wrong.

You run to the boss and say, "We should go eventual consistency, not strong consistency." — The boss is baffled, feels you're showing off jargon, and then makes the call by gut. The one in the wrong isn't the boss — it's you. The priority of trade-offs should be set by people who understand the business; your job is to translate technical choices into language they understand and can decide on — namely money, risk, and time to launch.

The translation formula: don't talk technical parameters, talk business consequences.

   ❌ The engineer's version (the other side can't follow, can't decide)
      "Using strong consistency here drops write throughput; we'd have to shard,
       and it'd sacrifice availability."

   ✅ Translated into business consequences (the other side can make the call)
      "Plan A (strong consistency): guarantees not a cent is ever wrong, but during
                       a big sale there may be queuing and some slowdown, and it
                       costs ¥X0,000 more per month in ops.
       Plan B (eventual consistency): fast and cheap, but in extreme cases a user's
                       balance may display inaccurately for a few seconds — can we
                       accept that?
       My recommendation is A, because this is money, and one error costs far more
                       than what we'd save. What do you think?"

A few communication mental rules:

  1. Always give "options + costs," not "a conclusion." Lay out paths A and B and, for each, "what you get, what you give up, how much it costs, how soon it ships," and let the business side choose informed. This both respects their decision authority and protects you (the decision was made jointly).
  2. Quote in three currencies: money, risk, time. Almost any technical trade-off can be converted into "how much more/less money," "the probability and consequence of failure," and "how much earlier/later it ships." These three are the universal language of the business world.
  3. Tie "quality attributes" to "business metrics." Don't say "availability needs four 9s," say "at our average order value, an hour of downtime loses about ¥Y0,000, so it's worth investing Z in availability."
  4. Clearly separate "the floor" from "the optimizations." Security compliance and the correctness of money are usually non-negotiable floors; optimizing latency from 200ms to 100ms may just be icing on the cake. Don't make everything sound equally urgent, or you'll lose credibility.
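Rules 2 and 3 above can be turned into a tiny calculation to bring to the meeting. Every number here is hypothetical (the revenue per hour and the extra yearly cost of four 9s), and the model assumes lost revenue is simply proportional to downtime.

```python
HOURS_PER_YEAR = 365 * 24

def yearly_downtime_loss(availability_pct: float, revenue_per_hour: float) -> float:
    """Expected revenue lost per year, assuming loss is proportional to downtime."""
    downtime_hours = (1 - availability_pct / 100) * HOURS_PER_YEAR
    return downtime_hours * revenue_per_hour

# Hypothetical inputs: replace with real business figures
revenue_per_hour = 50_000            # orders per hour x average order value
extra_cost_for_four_nines = 200_000  # yearly cost of going from 99.9% to 99.99%

loss_three_nines = yearly_downtime_loss(99.9, revenue_per_hour)
loss_four_nines = yearly_downtime_loss(99.99, revenue_per_hour)
saved = loss_three_nines - loss_four_nines

print(f"loss at 99.9%:  {loss_three_nines:,.0f}")
print(f"loss at 99.99%: {loss_four_nines:,.0f}")
print(f"saved {saved:,.0f} vs. extra cost {extra_cost_for_four_nines:,}")
print("worth it" if saved > extra_cost_for_four_nines else "not worth it")
```

The verdict flips as soon as the inputs change, which is the point: the business owns the inputs, and the architect owns the translation.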

A good architect has half their craft at the whiteboard and the other half in the meeting room. You must not only make the right trade-offs but also explain them clearly, so the people footing the bill and the people setting the requirements make a wise choice together with you.


📌 Real-world cases: how much is "a number of 9s" actually worth

  • Durability: AWS S3 is designed for 11 nines (99.999999999%) of durability — intuitively, store 10 million objects and it'd take roughly 10,000 years to possibly lose one. The cost is data redundancy across at least 3 availability zones. (S3 FAQ)
  • Availability: Google SRE made it crystal clear with the "error budget" — since 100% is impossible, set an SLO (say 99.9%), and the remaining 0.1% is the "error budget"; when it's used up, stop shipping new features and focus on stability. It turns availability from mysticism into a manageable budget. (Google SRE Book)
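Both cases reduce to simple arithmetic. The sketch below is an illustration, not how AWS or Google compute these figures: the function names and the sample month of traffic are assumptions, and durability is read as a yearly per-object survival rate.

```python
def error_budget(slo_pct: float, total_requests: int) -> float:
    """Number of failed requests the SLO tolerates in the window."""
    return (1 - slo_pct / 100) * total_requests

def budget_remaining(slo_pct: float, total_requests: int, failed: int) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    budget = error_budget(slo_pct, total_requests)
    return (budget - failed) / budget

def years_per_lost_object(durability_pct: float, objects: int) -> float:
    """Average years between losses, reading durability as a yearly per-object survival rate."""
    yearly_losses = (1 - durability_pct / 100) * objects
    return 1 / yearly_losses

# Hypothetical month: 1,000,000 requests against a 99.9% SLO, 400 of them failed
print(error_budget(99.9, 1_000_000))           # about 1,000 failures allowed
print(budget_remaining(99.9, 1_000_000, 400))  # about 0.6 of the budget left
print(years_per_lost_object(99.999999999, 10_000_000))  # about 10,000 years
```

A remaining budget near zero is the signal to pause feature work and spend the time on stability.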

These cases echo this chapter's line: each extra 9 raises cost by roughly an order of magnitude, so go back to the business and ask "does this system really need that many 9s?"


Chapter summary

  • Quality attributes decide whether a system is "good," and they're almost all trade-offs — this is architecture's main battlefield. To assess each attribute, ask: how to measure, how to achieve, who it conflicts with.
  • The seven core attributes:
    • Performance: split latency vs throughput, look at P99 not the average, perceived latency often matters more than real latency.
    • Availability: measured in "nines" (mapping to yearly downtime), achieved via redundancy + eliminating single points + fault-domain isolation; each extra 9 raises cost by an order of magnitude.
    • Scalability: vertical scaling (symptom, has a ceiling) vs horizontal scaling (root, but requires statelessness).
    • Consistency: picking up from 05, strong consistency is expensive and fundamentally conflicts with scaling/availability/performance.
    • Security: never trust input, least privilege, defense in depth; it's a floor, not an optimization.
    • Maintainability / evolvability: the most overlooked yet most far-reaching, because software is always being changed.
    • Cost: the most overlooked, yet often the real constraint; every attribute you pay for ultimately lands on this bill.
  • The core truth: these attributes conflict, you can't have them all (consistency vs availability, performance vs cost, flexible vs simple…). The way to break the deadlock is always: go back to the business and prioritize — there is no best architecture, only the most fitting.
  • The craft of discussing trade-offs: translate technical choices into money, risk, time to launch; always give "options + costs," not a conclusion, and let the business side decide informed.

Bridging forward: here, "Part Two: Master the Toolbox" (04 patterns / 05 data / 06 trade-offs) is complete — you have cards in hand, you understand the hard bone of data, and you grasp that everything is a trade-off. Next comes Part Three: Practice and Evolution. 07 · Designing a System from 0 to 1 gives you a methodology you can follow to produce an architecture, putting the judgment from the first six chapters truly to work.