Articles on CTO Field Notes

Staying One Step Ahead

Sun, 05 Apr 2026 00:00:00 +0000

The CTO’s job is not to see the future. It is to reduce surprise.

Startups rarely fall behind all at once. They fall behind gradually, then suddenly.

At first, the signals are small. Pull requests get larger. Deployments become more stressful. Engineers start avoiding certain parts of the codebase. Product debates become more opinion-driven. Customer feedback reaches the team too late. Planning meetings get longer, but decisions do not get better.

Nothing looks broken yet. But the system is already losing speed, clarity, and resilience.

By the time the roadmap slips, the warning signs have usually been visible for weeks or months¹. By the time reliability becomes a company-level concern, the patterns have already appeared in incidents, alerts, retries, and manual workarounds. By the time technical debt becomes urgent, engineers have already been paying interest on it for a long time².

This is why one of the CTO’s most important responsibilities is staying one step ahead.

Not by predicting the future. Not by chasing every new tool. Not by turning the engineering organization into a playground for whatever is trending this week.

Staying one step ahead means building a system that notices change early, learns quickly, and acts before friction compounds. Most leaders discover problems when they become expensive. CTOs need to discover them while they are still cheap.

Tools Extend the Organization’s Senses

A CTO cannot personally inspect every pull request, customer conversation, roadmap decision, incident, architecture tradeoff, hiring bottleneck, or production anomaly.

At some point, the job is no longer to see everything directly. The job is to design systems that see.

This is where tools matter.

Not because tools are inherently valuable. Most tools are noise unless they change behavior.

Tools matter when they extend the organization’s ability to sense, decide, act, and learn. Observability shows what production is actually doing. Product analytics show what users are actually doing. CI surfaces whether the system still tolerates change. Feature flags keep decisions reversible. Customer feedback tools expose where assumptions are breaking. Planning systems reveal whether commitments are based on evidence or hope. Internal knowledge systems show whether the team can reuse what it already knows. AI assistants change how cheaply the team can explore, summarize, prototype, and compare options.

The point is not to accumulate more dashboards, workflows, or reports. The point is to reduce the time between signal and action.

A good tool makes reality harder to ignore. A great tool makes the right action easier to take.

The CTO’s Leverage Stack

The best CTOs do not simply accumulate tools. They build leverage.

A tool is something you introduce. Leverage is what changes the slope of the organization.

The CTO’s job is to understand the current constraint and apply the right form of leverage against it—technical, organizational, architectural, or cultural.

There are five loops every CTO should continuously strengthen.

1. Sense Earlier

The first advantage is noticing weak signals before they become visible failures.

Most expensive problems begin as cheap signals. Reliability issues start as noisy alerts or repeated manual fixes. Delivery issues start as larger pull requests, slower reviews, or unclear ownership. Product issues start as customer confusion or low adoption. Team issues start as decision bottlenecks or prolonged onboarding.

The CTO needs mechanisms that surface these signals early: observability, analytics, feedback loops, engineering metrics, incident reviews, deployment data, and decision records.

But the tool is not the point. The question to ask of each of these mechanisms is:

What would we know earlier because this exists?

If the answer is unclear, the tool is probably theater.
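One way to make "sense earlier" concrete is a small drift check on a metric the team already has. The sketch below is only an illustration: the function name `drifted`, the window of 10, the 1.5x ratio, and the sample pull request sizes are all assumptions, not recommended values.

```python
from statistics import median

def drifted(history, recent_window=10, ratio_threshold=1.5):
    """Return True when the median of the most recent values is at least
    ratio_threshold times the median of the earlier baseline."""
    if len(history) <= recent_window:
        return False  # not enough data to form a baseline
    baseline = median(history[:-recent_window])
    recent = median(history[-recent_window:])
    if baseline <= 0:
        return False
    return recent / baseline >= ratio_threshold

# Hypothetical data: lines changed per merged pull request, oldest first.
pr_sizes = [120, 90, 150, 110, 130, 100, 140, 95, 125, 115,
            210, 260, 240, 300, 280, 230, 310, 270, 250, 290]

if drifted(pr_sizes):
    print("Pull requests are getting larger; worth a conversation now.")
```

The check answers the question above directly: because it exists, the team would know about growing pull requests while the fix is still a conversation and not a roadmap slip.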

2. Decide Faster

Sensing without decision-making is wasted signal.

Many organizations collect data but still move slowly. They have dashboards without owners, metrics without thresholds, and discussions without decisions.

Good CTOs create decision systems: RFCs for explicit tradeoffs, architecture reviews for irreversible choices, metrics for grounding debates, and decision logs to capture reasoning.

The goal is not perfect decisions. It is decisions that are timely, explicit, and reversible.

A slow decision is not more thoughtful. Often, it is avoidance disguised as rigor.

3. Act Smaller

The larger the action, the harder it is to learn.

Large releases obscure causality. Large rewrites obscure risk. Large migrations obscure sequencing problems.

To stay ahead, organizations must operate in smaller units, and the practices that make this possible are feature flags, canary releases, trunk-based development, automated tests, and safe rollback paths.

These are not conveniences. They are strategic tools that reduce blast radius and preserve optionality.

The same applies beyond engineering. Test demand before building fully. Migrate incrementally. Pilot organizational changes before scaling them.

Small actions keep the cost of being wrong low. And when the cost of being wrong is low, learning accelerates³.

4. Learn Continuously

Shipping without learning is not speed—it is output⁴.

The real loop is:

observe → decide → act → learn → adjust

Every system should tighten this loop.

After shipping, do we know adoption? After incidents, do we reduce recurrence? After refactors, does change become easier? After planning, do we know where we were wrong?

Learning requires more than instrumentation. It requires behavior change. If nothing changes, nothing was learned.

5. Compound Knowledge

Scaling organizations often fail by relearning what they already knew.

Solutions stay local. Decisions are repeated. Lessons are lost.

Knowledge must be captured and reusable: decision records, architecture notes, onboarding paths, runbooks, postmortems, and searchable internal systems.

The goal is not documentation. It is reuse at the moment of need.

Compounding knowledge is quiet leverage—but over time, it separates teams that scale from teams that stall.

AI Changes the Cost of Exploration

AI lowers the cost of many previously expensive actions: exploring options, summarizing data, drafting plans, and analyzing systems.

This is a real shift.

But cheaper exploration increases the importance of judgment.

When generating options becomes trivial, selecting the right ones becomes the bottleneck. Without strong judgment, teams produce more code, more plans, and more noise—faster.

That is not progress. That is scaled confusion.

The key question is not:

Can we do this faster with AI?

It is:

Does this improve decisions, reduce risk, or accelerate learning?

If yes, use it aggressively. If not, it is likely accelerating waste.

Tools Amplify the System You Already Have

Tools feel like progress because they are tangible. But they do not fix broken systems—they amplify them.

AI amplifies code quality—good or bad. Dashboards amplify ownership gaps. Planning tools amplify prioritization issues. Documentation amplifies maintenance discipline.

A tool is only useful when tied to behavior.

Visibility without ownership creates anxiety. Ownership without visibility creates surprises.

You need both.

Staying Ahead Means Designing for Optionality

The best CTOs do not predict better. They stay adaptable.

They keep deployments small, decisions reversible, feedback loops short, systems understandable, and knowledge accessible.

They preserve optionality⁵.

That is what staying ahead really means—not certainty, but readiness while the cost of preparation is still low.

The CTO Question

Every CTO should regularly ask:

What are we going to wish we had noticed earlier?

This question forces proactive thinking. It shifts focus from reacting to anticipating.

It also reframes how tools are evaluated.

The right tool is not the newest or most popular. It is the one that improves sensing, decision-making, action, learning, or knowledge reuse.

Everything else is optional.

Closing

Staying one step ahead is not intuition. It is discipline.

It means building systems where signals surface early, decisions are explicit, actions are reversible, and learning compounds.

The CTO’s job is not to see the future.

The CTO’s job is to reduce surprise.

And the best way to reduce surprise is to build systems that learn faster than problems compound.

References

1. Predictable Delivery in an Unpredictable World. https://avivzaken.com/docs/predictable-delivery-in-an-unpredictable-world/
2. Technical Debt Is a Tool. https://avivzaken.com/docs/technical-debt-is-a-tool/
3. De-Risking Big Bets Without Killing Innovation. https://avivzaken.com/docs/derisking-big-bets-without-killing-innovation/
4. Velocity Is the Product. https://avivzaken.com/docs/velocity-is-the-product/
5. Shipping in Uncertainty: Field Notes on Startup SDLC. https://avivzaken.com/docs/shipping-in-uncertainty-sdlc/

The Roadmap Problem No One Admits

Sun, 15 Mar 2026 00:00:00 +0000

A few weeks ago, I sat down with a founding team that felt something was off.

They weren’t stuck. Quite the opposite - they were moving fast. There was activity everywhere: features in progress, decisions being made, code being written.

But nothing seemed to land.

Features dragged. Decisions lingered. Shipping felt inconsistent.

“We’re slower than we should be,” one of them said.

From the outside, nothing obvious was broken. The team was strong. The product direction made sense. There was no chaos.

So I asked a simple question:

“What’s supposed to ship in the next two weeks?”

They started listing things - onboarding improvements, a new API direction, analytics, infrastructure cleanup.

It sounded like momentum.

Then I asked again:

“No - what is actually expected to be done?”

That’s when things got quiet.

The Illusion of Progress

We pulled up their board. Nine active items.

At a glance, it looked healthy - work in motion, engineers engaged, no obvious bottlenecks.

But as we went through each item, a pattern emerged. The onboarding work was “almost there,” but still missing edge cases. The API effort was split between two directions, neither resolved. Analytics had backend work, but no usable surface. Infrastructure had been “improved,” but not stabilized.

Nothing was blocked. Nothing was abandoned.

But nothing was finished.

So I told them:

“You don’t have nine projects. You have zero.”

Individually, everything had progressed. Code existed. Effort was real.

But at the level that matters - what users experience, what the business can rely on - none of it existed yet.

This is the subtle failure mode where teams confuse motion with delivery. And once you fall into it, velocity starts to look intact while actually degrading underneath¹.

How Teams End Up Here

This state rarely comes from bad decisions. It comes from non-decisions.

At some point, someone says: “we should improve onboarding,” “we’ll probably need a better API here,” “can someone look into analytics,” “we should clean this up at some point.”

These aren’t commitments. They’re signals.

But in small teams, signals are enough. Someone picks it up, starts exploring, opens a branch, pushes things forward.

Work begins.

But no one ever decides that it needs to end.

So it accumulates.

In practice, this is what “shipping in uncertainty” looks like when it’s unmanaged - continuous bets being placed, without a clear mechanism for deciding which ones actually need to resolve².

What This Does to Execution

The cost isn’t obvious at first.

From the outside, the team looks productive. There are updates, pull requests, and visible movement across multiple fronts.

But internally, something breaks.

There’s no clear center of gravity. Engineers start their day without a single dominant objective. They move between tasks, resume half-finished work, and switch context as soon as friction appears.

Work stops converging.

Instead of pushing one thing to completion, the team advances several things slightly. You get a steady stream of partial progress - and almost no finished outcomes.

Over time, quality degrades. Not because people are careless, but because nothing stays in focus long enough to be done properly.

This is the inverse of what high-velocity teams optimize for. They don’t maximize activity - they maximize completed, shippable outcomes¹.

The Reset

With this team, we did something simple - and uncomfortable.

We took every active item and asked a single question:

“Are we committed to finishing this in the next two weeks?”

Not whether it was important. Not whether it might matter later.

Only whether we were willing to finish it now.

If yes, it stayed.

If not, it stopped - completely. No background progress, no partial ownership, no “we’ll keep chipping away.”

This is the same discipline behind small bets and fast learning: forcing work into a form where it can actually resolve, instead of lingering indefinitely³.

Where It Gets Hard

This is the point where most teams hesitate.

Every item comes with a justification. You’ve already invested time. It will likely matter soon. You’re close to something useful. Dropping it feels wasteful.

All of that is true. None of it is relevant.

The real question isn’t whether something is valuable - it’s whether you’re willing to delay everything else for it.

That’s what being on the roadmap actually means.

What Changed

They ended up keeping three items. Everything else was paused.

At first, it felt slower. There were fewer updates, fewer parallel threads, less visible activity.

But within two weeks, the difference was obvious. One feature shipped cleanly, end-to-end. The API direction converged. Infrastructure issues were actually resolved.

Nothing about the team changed.

No new process. No new tools.

Just fewer things, taken seriously.

The Role of “No”

Early-stage teams tend to avoid saying no. It feels like giving up optionality, like slowing down.

In reality, it’s the opposite.

Every “yes” adds surface area: more coordination, more dependencies, more decisions deferred into the future.

Without constraint, the system becomes harder to reason about - and eventually, harder to trust.

Saying “no” isn’t about rejecting ideas.

It’s about protecting the conditions required to finish them.

This is also why breaking work into smaller, committed slices matters. Without that discipline, you don’t just take on more work - you take on more unresolved risk⁴.

What a Roadmap Actually Represents

Most teams treat the roadmap as a list of things they want to do.

In practice, it’s a list of trade-offs.

Every item carries an implicit statement:

This is more important than everything we are not doing right now.

If that trade-off isn’t explicit, the roadmap loses meaning.

Execution becomes reactive - driven by momentum, interruptions, and partially completed work.

A Simple Test

If you want to know whether your roadmap is real, don’t look at priorities.

Look at what’s currently in progress.

For each item, ask:

“Are we committed to finishing this?”

If the answer isn’t clearly yes, you’re not prioritizing.

You’re accumulating.

Closing

The teams that move fastest aren’t the ones that generate the most ideas.

They’re the ones willing to ignore most of them.

Because in practice, speed doesn’t come from doing more.

It comes from finishing what you start - and being deliberate about what you never start at all.

References

1. Velocity Is the Product. https://avivzaken.com/docs/velocity-is-the-product/
2. Shipping in Uncertainty: Field Notes on Startup SDLC. https://avivzaken.com/docs/shipping-in-uncertainty-sdlc/
3. De-Risking Big Bets Without Killing Innovation. https://avivzaken.com/docs/derisking-big-bets-without-killing-innovation/
4. 4 Musts of Building a New Feature. https://avivzaken.com/docs/4-musts-of-feature-dev/

Predictable Delivery in an Unpredictable World

Wed, 25 Feb 2026 00:00:00 +0000

Startups don’t fail because they move slowly. They fail because they lose control of their velocity.

In the early days, speed feels like momentum. You ship fast, customers respond, features pile up. You knowingly cut corners - tests can wait, architecture can evolve later, observability is “good enough for now.” And for a while, it works.

Until it doesn’t.

A small change breaks something unrelated. Estimates become fiction. Engineers warn each other before touching certain parts of the codebase. Releases feel tense. You are still shipping - technically - but every deployment feels like defusing a bomb.

This is tech-debt hell.

And the way out isn’t to slow down. It’s to redefine what velocity actually means - and build a system that protects it.

Velocity Is the Product

In “Velocity Is the Product,” I argued that velocity is not a byproduct of engineering work - it is the product of a healthy engineering system¹.

Velocity is not story points per sprint. It’s not features per quarter. It’s not raw speed.

Velocity is the rate at which your organization can safely turn ideas into validated learning.

When velocity is healthy:

Changes behave as expected.
Feedback loops are tight.
Risk is surfaced early.
Engineers are confident making modifications.

When velocity degrades, something deeper is wrong. And most of the time, unmanaged technical debt is the underlying cause.

In “Technical Debt Is a Tool,” I reframed debt as information². Debt signals where we intentionally traded long-term safety for short-term progress. That trade-off is often rational. The mistake is pretending the trade-off doesn’t exist.

Debt becomes dangerous when it stops being visible.

Once confidence drops, velocity drops - even if output temporarily appears high. The system becomes fragile. And fragile systems cannot deliver predictably.

Why Tech Debt Turns Into Chaos

Every feature increases complexity. That’s unavoidable.

New code interacts with old assumptions. Edge cases multiply. Implicit coupling becomes explicit pain.

In a healthy system, complexity is absorbed. Tests fail fast. Monitoring surfaces regressions. Refactoring reduces fragility before it compounds.

In an unhealthy system, complexity accumulates silently. Features are patched over symptoms. “Temporary” fixes become permanent. Knowledge becomes tribal. Each release increases the blast radius of the next one.

At this point, teams misdiagnose the problem. They think they need:

Better estimation.
Stricter process.
More planning.
Harder deadlines.

But estimation doesn’t reduce fragility. Process doesn’t compensate for missing feedback loops. Pressure doesn’t restore confidence.

What restores confidence is redesigning the system of delivery itself.

Shipping in Uncertainty

In “Shipping in Uncertainty,” I described the SDLC as a learning engine - not a ceremony checklist³.

Uncertainty is not an exception in startups. It is the default state.

If your SDLC assumes clarity, it will break under reality.

A resilient SDLC optimizes for:

Small batch sizes.
Frequent integration.
Early validation.
Tight feedback loops.

Large features feel efficient on paper. In practice, they defer risk discovery. By the time the feature ships, assumptions are outdated and complexity is entrenched.

Small slices, by contrast, reduce both technical and product uncertainty. They expose hidden dependencies earlier. They allow course correction before complexity calcifies.

Predictable delivery is not about predicting everything upfront. It’s about reducing the time between assumption and feedback.

The shorter that loop, the more controllable your system becomes.

From Firefighting to Managed Debt

Moving out of tech-debt hell requires a mental shift:

Stop treating debt as something you “pay down later.” Start treating it as a continuous signal.

Managed debt looks like this:

Engineers can articulate where the fragile areas are.
Refactoring happens alongside feature work.
Tests and observability provide fast failure signals.
Risk is discussed explicitly during planning.

Unmanaged debt looks like this:

No one wants to touch certain modules.
Refactoring is perpetually postponed.
Features require excessive coordination.
Surprises happen late.

The key operational shift is simple but powerful:

Refactor toward safety before adding behavior.

When touching a fragile area, first improve its structure. Strengthen tests. Reduce coupling. Clarify interfaces. Only then add the new feature.

This keeps complexity from compounding with every change to the fragile area.

You don’t escape debt through heroic clean-up efforts. You escape it through disciplined, continuous micro-improvements.

Debt becomes manageable when it is integrated into daily delivery, not isolated into “future cleanup.”

Scoping as Risk Management

Most scoping failures aren’t effort failures - they’re risk failures.

Teams scope features as delivery commitments instead of learning hypotheses.

A healthier framing asks:

What assumption are we testing?
What is the smallest deployable increment that tests it?
What technical risk are we introducing?
What will we learn if this fails?

This aligns directly with the idea that velocity equals learning¹. If a feature doesn’t reduce uncertainty, it’s not increasing velocity - it’s increasing complexity.

Proper scoping also prioritizes reversibility:

Merge is decoupled from release.
Feature flags reduce blast radius.
Rollback paths are tested.
Observability is in place before exposure.

When rollback is cheap, experimentation is safe. When rollback is expensive, fear dictates design decisions.

Predictable delivery is not about eliminating mistakes. It’s about ensuring mistakes are survivable.
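The reversibility points above can be turned into an explicit gate: expose a change to a small slice of traffic, compare it with the baseline, and decide. The sketch below is a simplified illustration. The names `WindowStats` and `canary_decision`, the 500-request minimum, and the one-percentage-point tolerance are assumptions chosen for the example, not values from any particular tool.

```python
from dataclasses import dataclass

@dataclass
class WindowStats:
    """Request and error counts observed over the same time window."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0

def canary_decision(baseline, canary, min_requests=500, max_added_error_rate=0.01):
    """Return 'wait', 'rollback', or 'promote' for a canary release."""
    if canary.requests < min_requests:
        return "wait"  # too little traffic to judge either way
    if canary.error_rate - baseline.error_rate > max_added_error_rate:
        return "rollback"  # the canary is measurably worse than baseline
    return "promote"

# Hypothetical numbers: baseline at 0.5% errors, canary at 3% errors.
baseline = WindowStats(requests=10_000, errors=50)
canary = WindowStats(requests=1_000, errors=30)
print(canary_decision(baseline, canary))  # rollback
```

The gate only works if the two inputs exist before exposure, which is why observability comes first in the list above.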

The Core Shift: From Output to System Design

The deepest change required is not technical - it’s conceptual.

Most organizations optimize for output: more features, faster timelines, visible progress.

But predictable delivery is not an output problem. It’s a system design problem.

Here’s the real shift.

1. From Speed to Stability of Change

Speed without stability creates volatility. Stable change creates sustained velocity.

Ask:

How often do changes behave as expected?
How quickly can we detect regressions?
How easy is it to reverse a bad decision?

If these answers are weak, increasing speed will amplify instability.

Stability of change - not raw throughput - determines long-term velocity.

2. From Feature Delivery to Risk Reduction

Every feature should reduce uncertainty somewhere:

Product uncertainty (will users care?)
Technical uncertainty (will this scale and integrate?)
Operational uncertainty (can we monitor and support this?)

If a feature increases uncertainty without reducing any, it is accumulating debt.

Predictable organizations explicitly track risk reduction as part of delivery. They celebrate learning milestones, not just release milestones.

3. From Local Optimizations to System Thinking

Teams often optimize locally:

Faster coding.
More parallel work.
Larger sprint commitments.

But if integration, testing, deployment, or observability lag behind, the entire system slows.

Predictable delivery requires viewing engineering as a single flow system:

Idea → Implementation → Integration → Deployment → Observation → Learning.

A bottleneck anywhere reduces overall velocity. Fixing code speed while ignoring deployment safety doesn’t improve the system.
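The bottleneck argument is simple arithmetic, and a toy model makes it visible. The sketch below treats delivery as a strictly serial pipeline, which is a simplification, and uses four of the stages from the flow above. The capacity numbers are invented for illustration.

```python
# Hypothetical capacity of each stage, in changes per week.
stages = {
    "implementation": 40,
    "integration": 25,
    "deployment": 8,
    "observation": 20,
}

def throughput(capacities):
    """End-to-end flow is capped by the slowest stage."""
    return min(capacities.values())

def bottleneck(capacities):
    """Name of the stage with the lowest capacity."""
    return min(capacities, key=capacities.get)

print(bottleneck(stages), throughput(stages))  # deployment 8

# Doubling coding speed changes nothing end to end.
faster_coding = {**stages, "implementation": 80}
print(throughput(faster_coding))  # 8

# Improving the constrained stage moves the whole system,
# and the bottleneck shifts to the next slowest stage.
safer_deploys = {**stages, "deployment": 30}
print(bottleneck(safer_deploys), throughput(safer_deploys))  # observation 20
```

In this model, investment anywhere other than the current constraint shows up as local activity, not as delivered change.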

4. From Heroics to Boring Excellence

Tech-debt hell rewards heroics: late nights, emergency fixes, “save the release” moments.

Predictable systems eliminate the need for heroics.

Releases become routine. Failures are contained. Improvements are incremental.

This feels less dramatic - but it’s vastly more powerful.

Boring excellence compounds.

5. From Slogans to Mechanisms

The shift only works if supported by mechanisms:

Continuous integration with real signal.
Mandatory code review with structural focus.
Observability as part of feature definition.
Capacity reserved for structural improvements.
Postmortems that lead to system changes, not blame.

Without mechanisms, principles decay into slogans.

With mechanisms, discipline becomes default behavior.

Predictable in an Unpredictable World

You cannot eliminate uncertainty in startups. Markets change. Users surprise you. Assumptions break.

But you can build an engineering system that absorbs change instead of amplifying it.

Tech debt is not the enemy. Unmanaged risk is.

Speed is not the goal. Sustainable velocity is.

Predictable delivery emerges when:

Debt is visible and managed.
Scope is framed as risk reduction.
Batch sizes stay small.
Feedback loops stay tight.
Leadership rewards system health, not output theater.

When that happens, features stop feeling dangerous. Estimates become directional rather than fictional. Shipping becomes routine rather than dramatic.

Velocity becomes real.

And in an unpredictable world, that stability of delivery is your competitive advantage.

References

1. Velocity Is the Product. https://avivzaken.com/docs/velocity-is-the-product/
2. Technical Debt Is a Tool. https://avivzaken.com/docs/technical-debt-is-a-tool/
3. Shipping in Uncertainty: Field Notes on Startup SDLC. https://avivzaken.com/docs/shipping-in-uncertainty-sdlc/

De-Risking Big Bets Without Killing Innovation

Sun, 15 Feb 2026 00:00:00 +0000

Innovation and big bets are the lifeblood of startups - the only way you escape commoditization and create new paradigms. But those bets come with risk: uncertainty, unknown unknowns, and the possibility that a huge investment of time, capital, and attention delivers very little value in return.

The challenge leaders face isn’t avoiding risk - it’s managing uncertainty in a way that preserves creativity and learning.

At its core, de-risking isn’t about insuring every decision against failure. It’s about discovering what must be true for a big idea to succeed, validating those assumptions early, and structuring your organization so that you learn more than you build. The teams that innovate well are the ones that learn the fastest - not necessarily the ones that build the most.

1. Risk and Innovation Are the Same Conversation

The vocabulary of risk often makes teams think in terms of “bad things that might happen” - technical debt, schedule slips, shifting markets. But in innovation, risk and learning are two sides of the same coin.

Every feature, product, or business model carries assumptions:

Will customers care?
Can we build it?
Is it technically feasible at scale?

These are not questions you answer by guessing well. They are questions you answer by turning assumptions into experiments.

If your first instinct is a giant effort plan, you’ve already placed your bet.

Instead, start with risk - not the solution.

Effort vs. Risk framing:

Low effort, low risk → build
Low effort, high risk → experiment
High effort, high risk → reduce uncertainty before committing

This reframing forces teams to ask the most important question in innovation:

What do we not know - and how will we find out?
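The effort vs. risk framing above is small enough to write down as a lookup table, which makes it easy to apply the same way in every planning conversation. This is a sketch: the names `Level`, `NEXT_STEP`, and `next_step` are invented for illustration, and the framing does not say what to do with high-effort, low-risk work, so the code reports that case as unspecified instead of guessing.

```python
from enum import Enum

class Level(Enum):
    LOW = "low"
    HIGH = "high"

# (effort, risk) -> suggested next step, exactly as listed in the framing.
NEXT_STEP = {
    (Level.LOW, Level.LOW): "build",
    (Level.LOW, Level.HIGH): "experiment",
    (Level.HIGH, Level.HIGH): "reduce uncertainty before committing",
}

def next_step(effort, risk):
    """Look up the suggested next step for an idea.

    High effort with low risk is not covered by the framing,
    so it is reported as unspecified.
    """
    return NEXT_STEP.get((effort, risk), "unspecified in this framing")

print(next_step(Level.LOW, Level.HIGH))   # experiment
print(next_step(Level.HIGH, Level.HIGH))  # reduce uncertainty before committing
```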

2. Small Bets, Big Learning

Innovation doesn’t happen in monolithic chunks. It happens in small, deployable increments that expose assumptions to reality as early as possible.

Large projects hide feedback. Small slices reveal it.

When work is structured this way:

Feedback loops shorten
Mistakes surface when they’re cheap
Decisions stay reversible
Teams optimize for learning velocity, not delivery velocity

This isn’t dogmatic agility - it’s a survival strategy.

Cheap failures are data, not embarrassment.

Feature flags, incremental rollouts, and thin vertical slices of value all serve one purpose: reduce the cost of being wrong.

That is what de-risking actually means.

3. Build Metrics Before You Build Product

A product without success metrics isn’t a product - it’s speculation in code form.

If you can’t define how you’ll measure success before writing a single line of code, you’re not ready to build.

Good success metrics are:

Measurable - observable and instrumented
Actionable - tied to real decisions
Aligned - connected to actual user or business outcomes

Metrics are not reporting tools. They are learning objectives.

They tell you when to double down, pivot, or stop.

Without them, you’re flying blind - and blind bets are expensive.
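Defining a metric before building can be as lightweight as writing down its thresholds next to the decision each one triggers. The sketch below assumes a single numeric metric where higher is better. The class name `SuccessMetric`, the example thresholds, and the choice to treat the middle band as "pivot" are illustrative assumptions, not a rule from this article.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SuccessMetric:
    """A metric agreed on before any code is written."""
    name: str
    double_down_at: float  # at or above this value, invest more
    stop_below: float      # below this value, stop

    def __post_init__(self):
        if self.stop_below > self.double_down_at:
            raise ValueError("stop_below must not exceed double_down_at")

    def decide(self, observed: float) -> str:
        """Map an observed value to the decision it was defined to drive."""
        if observed >= self.double_down_at:
            return "double down"
        if observed < self.stop_below:
            return "stop"
        return "pivot"  # assumption: the middle band means change approach

# Hypothetical metric and thresholds.
activation = SuccessMetric("week-1 activation rate", double_down_at=0.40, stop_below=0.15)
print(activation.decide(0.45))  # double down
print(activation.decide(0.25))  # pivot
print(activation.decide(0.10))  # stop
```

Writing the thresholds first makes the metric actionable in the sense described above: the decision is agreed on before the data arrives.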

4. Design Learning, Not Hype

Innovation isn’t a sprint. It’s a structured exploration under uncertainty.

Shipping something is not the same as succeeding.

Practical principles:

Validate assumptions before execution
Separate discovery from delivery
Make decisions reversible for as long as possible
Stop when the data tells you to stop

In startups, the SDLC isn’t about shipping software. It’s a decision-making framework under uncertainty.

Treat it as such.

5. Preserve Innovation While Managing Risk

De-risking should never mean risk-aversion.

Without risk, there is no upside. Without upside, innovation dies quietly.

The goal isn’t to eliminate risk - it’s to make risk intentional and informed, not accidental and blind.

That means:

Lean experiments before big builds
High-fidelity learning before heavy engineering
Evidence before executive certainty
Feedback loops instead of forecasts

This is how you protect innovation without suffocating it.

6. Learn Faster Than the World Changes

The best innovation organizations don’t fail less. They fail better.

They extract signal from every experiment - success or failure - and feed it forward into smarter decisions.

Big bets don’t disappear. They become smart bets.

And in a world where change is the only constant, your ability to de-risk intelligently determines your ability to innovate meaningfully.

Velocity Is the Product

Sun, 25 Jan 2026 00:00:00 +0000

Speed Is Not a Metric - It’s the Mission

Startups don’t exist to write elegant code. They exist to learn faster than everyone else.

Feature delivery velocity is simply the visible symptom of that learning rate. When velocity drops, insight slows. When insight slows, the company drifts — even if everything still “works.”

This is why speed cannot be treated as a secondary concern, something to optimize once the system is mature. By the time a startup is scaling, speed is already the most fragile asset it has.

Every decision either compounds velocity or quietly taxes it.

When Growth Turns Against You

Hyper-growth feels like success, but it carries a hidden cost.

The product expands. The team grows. The surface area of the system explodes. And suddenly, the behaviors that once felt responsible begin to suffocate progress.

Releases become infrequent because they are risky. Risk grows because releases are infrequent.

It shows up in subtle ways:

Features wait “until the next release”
Branches live too long
Teams coordinate instead of acting
Shipping becomes ceremonial

The organization slows not because people are careless — but because they are rationally afraid of breaking something important.

This is the inflection point where many startups lose their edge.

Making Production the Default State

The solution isn’t to be more careful. It’s to change what “careful” means.

Careful does not mean batching work. Careful means reducing the cost of change.

When code is pushed to production continuously, something important happens: each individual change becomes small enough to reason about.

Small changes are reviewable. Small changes are testable. Small changes are reversible.

Production stops being a cliff and starts being a stream.

This isn’t about moving fast at all costs. It’s about building a system where speed and safety are no longer in tension.

Separating Building from Releasing

One of the most powerful — and underused — ideas in modern product engineering is this:

Engineers should decide when code is safe. Product should decide when it is live.

Feature flags are the boundary between those two worlds.

Once a feature is merged and deployed behind a flag, engineering has done its job. The code exists. It’s tested. It’s observable. It’s stable.

From that moment on, product teams should own:

When a feature is enabled
Who sees it
How quickly it rolls out
Whether it should be turned off

This separation eliminates waiting, negotiation, and redeployments. It allows learning to happen independently of engineering cycles — which is exactly where velocity multiplies.
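The four product-owned decisions listed above map naturally onto a small flag record. This is a minimal sketch, not any vendor's API: `FeatureFlag`, its fields, and the `new-checkout` flag name are all hypothetical, and a real system would store the record somewhere product can edit it without a deploy.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """Hypothetical flag record that product edits at runtime."""

    name: str
    enabled: bool = False        # when it is enabled, and whether it is turned off
    rollout_percent: int = 0     # how quickly it rolls out (0-100)
    allowed_users: set[str] = field(default_factory=set)  # who sees it regardless of percentage

    def is_on_for(self, user_id: str) -> bool:
        if not self.enabled:
            return False
        if user_id in self.allowed_users:
            return True
        # Hash flag name and user id so each user lands in a stable bucket from 0 to 99.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < self.rollout_percent


if __name__ == "__main__":
    flag = FeatureFlag(name="new-checkout")   # code is deployed, nobody sees it yet
    flag.enabled = True
    flag.allowed_users.add("beta-tester")     # product exposes it to one user
    flag.rollout_percent = 25                 # then to roughly a quarter of users
    print(flag.is_on_for("beta-tester"))
    flag.enabled = False                      # kill switch, no redeploy involved
    print(flag.is_on_for("beta-tester"))
```

Hashing the user id keeps each user's experience stable as the percentage grows: anyone already in the rollout stays in it.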

What You Reward Is What You Get#

Culture doesn’t come from mission statements. It comes from incentives.

If you praise big launches, you’ll get big launches — and long gaps between them. If you celebrate heroics, you’ll get fragile systems. If you reward “saving the release,” you’ll keep creating near-failures.

Velocity-driven teams reward different things:

Shipping in small increments
Deleting unused code
Killing ideas early
Reducing scope without ego
Deploying quietly and often

These behaviors feel unremarkable in the moment. That’s the point.

When progress becomes boring, it becomes sustainable.

The Quiet Advantage#

The fastest teams rarely look frantic.

They don’t rush. They don’t announce. They don’t wait.

They push code continuously. They expose features deliberately. They learn in public and iterate in private.

In hyper-growth, this quiet discipline is the difference between scaling momentum and scaling friction.

Speed isn’t what you do when things are simple. Speed is what remains when things get hard.

And the only way to keep it is to design for it — on purpose.

References and Further Reading#

Continuous Deployment: Jez Humble and David Farley, Continuous Delivery (2010) - the seminal work on automated deployment pipelines and making release a non-event
Small Batch Releases: Nicole Forsgren et al., Accelerate (2018) - DORA research showing that high performers deploy more frequently with better outcomes
Feature Flags: Pete Hodgson, “Feature Toggles” (martinfowler.com, 2017); also Effective Feature Management by LaunchDarkly on separating deploy from release
Speed and Safety Together: Gene Kim et al., The DevOps Handbook (2016) on how modern practices align speed with reliability, not trade them off
Learning Rate: Eric Ries, The Lean Startup (2011) on learning as the fundamental unit of progress; validated learning over vanity metrics
Risk from Infrequent Releases: Michael Nygard, Release It! (2018, 2nd edition) on how batch deployments increase risk and blast radius
Cultural Incentives: Daniel Pink, Drive (2009) on motivation and what organizations actually reward; also Westrum’s organizational culture research
Progressive Delivery: Split.io and LaunchDarkly resources on percentage rollouts, canary releases, and gradual feature exposure
Trunk-Based Development: Paul Hammant on trunkbaseddevelopment.com - the practice of working on main branch with short-lived feature branches and feature flags

Shipping in Uncertainty: Field Notes on Startup SDLC

Tue, 06 Jan 2026 00:00:00 +0000

Startups don’t fail because they lack process. They fail because they adopt the wrong process too early or, worse, believe process will save them from uncertainty.

In a fast-growing startup, SDLC isn’t a methodology. It’s a series of bets you place under incomplete information, changing constraints, and real deadlines. The goal isn’t to be “correct.” The goal is to stay alive long enough to learn.

1. Being Agile Isn’t About Speed - It’s About Optionality#

Most teams confuse agility with velocity.

Shipping fast is easy when nothing matters. Shipping fast repeatedly, when users depend on you and the product keeps changing - that’s the hard part.

Agile process in startups is about keeping options open:

Short planning horizons because long ones lie
Small batch sizes because big bets hide mistakes
Lightweight rituals because heavy ones calcify too early

The mistake isn’t “no process.” The mistake is locking yourself into decisions before reality forces you to.

Your SDLC should answer one question every week:

What can we safely change our mind about next week?

If your process can’t absorb change without breaking people, morale, or production - it’s already too rigid.

2. Feedback Is the Only Truth That Matters (And It’s Often Uncomfortable)#

In early startups, opinions are loud and feedback is quiet.

Executives debate. Engineers speculate. Roadmaps get polished. Meanwhile, the only signal that actually matters - user behavior - trickles in slowly and inconveniently.

A healthy startup SDLC is built to embarrass your assumptions quickly.

That means:

Shipping before you’re comfortable
Measuring before you’re confident
Listening before you’re defensive

Feedback isn’t validation. It’s friction - and friction is information.

The faster your SDLC turns friction into learning, the less time you spend building the wrong thing exceptionally well.

3. Process Evolves With the Code - Tests Are How You Buy Courage#

Early on, you move fast because nothing is stable. Later, you slow down because everything is connected.

This is where many startups stall - not because they lack talent, but because every change feels dangerous.

Tests aren’t about correctness. They’re about courage.

Courage to refactor instead of rewrite
Courage to say yes instead of “let’s be careful”
Courage to let new engineers touch old code

As the product evolves, your SDLC must evolve with it:

Manual checks turn into automated tests
Tribal knowledge turns into repeatable pipelines
Heroics turn into systems

The moment your team starts asking “what might break?” before every change is the moment you realize you waited too long to invest in tests.

4. Decision-Making Under Uncertainty Is the Real SDLC#

Frameworks don’t make decisions. People do - usually with partial data and full responsibility.

In startups, every technical decision is made too early:

Architecture before scale is known
Abstractions before usage is clear
Tooling before constraints are real

So the goal isn’t to avoid wrong decisions. It’s to keep as many decisions reversible as possible, and to make reversing them cheap.

A good startup SDLC optimizes for:

Fast feedback on irreversible choices
Clear ownership when tradeoffs are made
Systems that surface risk early instead of hiding it

Confidence doesn’t come from certainty. It comes from knowing how quickly you can recover when you’re wrong.

Closing Thought: Process Is a Living System#

The biggest SDLC mistake startups make is freezing process in time.

What worked at 5 engineers will choke you at 50. What saved you at MVP will sink you at scale.

Process isn’t a badge of maturity. It’s a tool for learning, alignment, and survival.

The right question isn’t:

“Are we doing Agile correctly?”

It’s:

“Is our SDLC helping us learn faster than the market is changing?”

If the answer is yes - keep going. If not - change the process before reality does it for you.

References and Further Reading#

Agile as Optionality: Kent Beck et al., Agile Manifesto (2001); also Mary and Tom Poppendieck, Lean Software Development (2003) on “deciding at the last responsible moment”
Small Batch Sizes: Donald Reinertsen, The Principles of Product Development Flow (2009) on batch size and cycle time; also core lean manufacturing principle adapted to software
Build-Measure-Learn: Eric Ries, The Lean Startup (2011) on validated learning and pivoting based on feedback
Tests as Courage: Kent Beck, Test Driven Development: By Example (2002) - TDD gives confidence to refactor; also Ron Jeffries on tests as “courage enabler”
Reversible vs. Irreversible Decisions: Jeff Bezos’s concept of “Type 1 and Type 2 decisions” from Amazon shareholder letters; also discussed in Working Backwards (2021)
Fast Feedback Loops: Jez Humble and David Farley, Continuous Delivery (2010) on reducing cycle time and getting rapid feedback
Process Evolution: Alistair Cockburn, Agile Software Development: The Cooperative Game (2006) on adapting methodology to context
Decision-Making Under Uncertainty: Douglas Hubbard, How to Measure Anything (2010) on probabilistic thinking and reducing uncertainty through experimentation

Technical Debt Is a Tool

Sat, 03 Jan 2026 00:00:00 +0000

Teams rarely slow down because their engineers suddenly become less capable. They slow down because their software systems become harder to reason about.

Technical debt is often blamed for this slowdown, but that diagnosis is incomplete. Debt itself is not the fundamental problem. The real issue is whether the team can still make changes with a reasonable level of confidence.

The fastest teams are not those that eliminate technical debt entirely. They are the teams that prevent debt from turning into uncertainty.

Shortcuts Are How Software Is Made#

Every non-trivial system begins with shortcuts.

At the start, teams lack information. They do not know which features will persist. They do not know which abstractions will hold. They do not know what scale or usage patterns will emerge.

Attempting to design a clean, extensible architecture in this environment is usually an exercise in speculative generality.

Instead, teams hard-code values. They duplicate logic. They accept designs that are clearly provisional.

This is not negligence; it is pragmatism.

Early systems move quickly not because they are well-designed, but because they accurately reflect what the team currently understands. The structure of the code mirrors the structure of knowledge at the time it was written.

The difficulty arises when the system continues to embody assumptions that the team has already outgrown.

How Trust Quietly Erodes#

Most teams encounter a transition phase, often without realizing it.

Changes begin to take longer. Certain areas of the code are approached with caution. Engineers add verbal warnings: “Be careful when touching this.”

Nothing appears broken. Tests still pass. Deployments still occur. Yet the system no longer feels predictable.

In response, engineers adapt their behavior.

They route around risky components rather than through them. They add new behavior on top of old structures instead of reshaping them. They optimize for minimizing surprise rather than reducing complexity.

Development continues, but with increasing resistance.

When teams say “technical debt is slowing us down,” they are rarely referring to messy code in isolation. They are describing the experience of working in a system whose behavior is no longer well understood.

At that point, the problem is not debt. It is loss of trust.

Refactoring as Ongoing Design#

Refactoring is often framed as cleanup: something to do after the “real work” is finished.

Teams that sustain high delivery speed over time see it differently. For them, refactoring is how the system stays aligned with reality.

As requirements change and understanding deepens, the code must change shape accordingly. Earlier decisions are revisited. Temporary structures are either reinforced or removed. Concepts that were once implicit become explicit.

This work is rarely dramatic. There are no large ceremonies or special initiatives. Instead, there is a continuous process of small adjustments that keep the system coherent.

Externally, this appears as discipline. Internally, it feels like maintaining orientation.

When the System Starts to Decide#

When this continuous realignment stops, the effects emerge gradually.

Engineers become more cautious. Estimates become less dependable. Product discussions begin with “if this is possible” rather than “when we do this.”

Over time, architectural constraints begin to shape product decisions. Not deliberately, but through accumulated friction.

Features are deferred not because they lack value, but because they intersect with fragile parts of the system. Eventually, the codebase exerts a quiet veto over change.

This is typically when rewrites are proposed.

Rewrites are rarely motivated by aesthetics. They are attempts to regain a system that feels comprehensible. Most fail - not because rewriting is infeasible, but because the underlying practices that allowed trust to decay remain unchanged.

The Role of Leadership#

As teams slow down, management pressure often increases.

More planning. Tighter timelines. Greater urgency.

Unfortunately, pressure does not restore trust. It accelerates its erosion.

When every change already feels risky, urgency encourages defensive behavior. Engineers avoid destabilizing areas, add protective layers, and further entrench complexity. The system becomes heavier at an increasing rate.

The teams that remain effective are not those granted time for large-scale cleanup. They are the teams allowed to improve the system while continuing to deliver value.

That permission is seldom stated explicitly - but its absence is immediately apparent.

Speed Emerges from Confidence#

High-performing teams are not reckless.

They move quickly because they trust:

that changes will behave approximately as expected
that the code represents current understanding
that problems can be corrected incrementally

This confidence is not achieved by avoiding technical debt.

It is achieved by continually engaging with it.

Shortcuts are unavoidable. Allowing them to persist beyond their usefulness is not.

The distinction is rarely about technology. It is about whether the team preserves the conditions required to move with confidence.

References and Further Reading#

Technical Debt Origin: Ward Cunningham’s original 1992 OOPSLA talk introducing the debt metaphor; see also his 2009 clarification on what he actually meant by the term
Debt vs. Trust: Steve McConnell, “Technical Debt” (IEEE Software, 2007) distinguishes between deliberate and inadvertent debt; this article extends that to focus on confidence erosion
Speculative Generality: Martin Fowler, Refactoring (2018) - one of his code smells; also discussed in Extreme Programming Explained on YAGNI (You Aren’t Gonna Need It)
Continuous Refactoring: Michael Feathers, Working Effectively with Legacy Code (2004) on characterization tests and safe refactoring; also Fowler’s Refactoring on opportunistic refactoring
System Comprehension: Fred Brooks, “No Silver Bullet” (1986) discusses essential vs. accidental complexity; Peter Naur, “Programming as Theory Building” (1985) on understanding as the core asset
Architecture and Velocity: Martin Fowler, “Design Stamina Hypothesis” (martinfowler.com) on how good design enables sustained speed
Why Rewrites Fail: Joel Spolsky, “Things You Should Never Do, Part I” (2000) on the dangers of ground-up rewrites; also Chad Fowler, “The Big Rewrite” (2006)
Pressure and Quality: Gerald Weinberg, Quality Software Management series (1992-1997) on how organizational pressure affects development practices

Refactoring Is How You Keep Moving Fast

Thu, 01 Jan 2026 00:00:00 +0000

Refactoring isn’t a ceremony. It’s not a cleanup phase. It’s not something you “schedule for later.”

It’s how you prevent your team from slowing to a crawl while still shipping features.

Most codebases don’t die because of bad ideas. They die because every new feature costs more than the last one. The system becomes fragile, changes get risky, and engineers start working around the code instead of with it.

Refactoring is the mechanism that keeps the cost of change flat over time.

Not zero. Flat.

Refactoring Is About Preserving Behavior While Changing Structure#

Refactoring has a very specific meaning:

You change how the code is structured without changing what it does.

That constraint matters.

It forces discipline:

Small steps
Clear intent
Continuous verification

You’re not “improving things” in the abstract. You’re reshaping the code so that the next change is easy, obvious, and low-risk.

If behavior changes, you’re no longer refactoring; you’re developing a feature or fixing a bug. Mixing structural changes with behavioral changes in the same step is how systems break.
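Here is a small sketch of what "same behavior, different structure" looks like in practice. The pricing rules, function names, and sample orders are invented for illustration. The point is only the shape of the check: run the old and new structure against the same inputs and require identical outputs before the old one is deleted.

```python
def order_total_cents_before(items):
    """Original structure: bulk discount and tax tangled into one loop."""
    total = 0
    for unit_price_cents, quantity in items:
        line = unit_price_cents * quantity
        if quantity >= 10:
            line = line * 90 // 100
        total += line
    return total * 120 // 100


# Refactored structure: the same rules, now named and separated.
BULK_QUANTITY = 10
BULK_PRICE_PERCENT = 90
TAX_PERCENT = 120


def line_total_cents(unit_price_cents, quantity):
    line = unit_price_cents * quantity
    if quantity >= BULK_QUANTITY:
        return line * BULK_PRICE_PERCENT // 100
    return line


def order_total_cents_after(items):
    subtotal = sum(line_total_cents(price, qty) for price, qty in items)
    return subtotal * TAX_PERCENT // 100


# Hypothetical sample orders as (unit price in cents, quantity) pairs.
SAMPLE_ORDERS = [
    [],
    [(199, 1)],
    [(199, 9), (50, 10)],
    [(1234, 10), (5, 1000), (999, 3)],
]


def behavior_is_preserved():
    """True when old and new structure agree on every sample order."""
    return all(
        order_total_cents_before(order) == order_total_cents_after(order)
        for order in SAMPLE_ORDERS
    )


if __name__ == "__main__":
    print(behavior_is_preserved())
```

The example uses integer cents on purpose: with floats, a reordered sum can differ in the last digit, which would make the "nothing changed" check noisy.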

Why Refactoring Is a Feature Development Tool#

Every feature exposes the truth about your system.

If adding something simple requires:

Touching 12 files
Copy-pasting logic
Adding conditionals “just this once”

That’s not bad luck. That’s feedback.

The system is telling you its current shape no longer matches the product you’re building.

Refactoring is how you realign the system with reality.

Before adding a feature, I ask:

Is the code already structured to make this change easy?

If the answer is no, I refactor first.

This almost always makes the overall work faster—not slower—because it removes incidental complexity before it compounds.

Refactoring Is Continuous, Not a Phase#

Healthy teams refactor:

While reading code they don’t understand
While extracting logic they need elsewhere
After a feature works but feels unclear
When duplication starts to appear
When responsibilities blur

This is not hero work. It’s maintenance of velocity.

The goal isn’t “clean code.” The goal is code that future you can change without fear.

If you find yourself thinking:

“I’ll remember how this works later”

You won’t.

Refactor now. Your future self is busy shipping.

Small Steps Are the Only Way This Works#

Big rewrites feel productive and usually fail.

Refactoring works because each step is:

Small
Behavior-preserving
Reversible
Verifiable

You keep the system working at all times. That’s non-negotiable.

This is why tests matter—not because of ideology, but because they make refactoring safe. When tooling exists (IDEs, automated refactors), use it. When it doesn’t, slow down and shrink the steps.

Speed comes from control, not recklessness.

Tools Help, But Discipline Matters More#

Modern IDEs can rename, extract, inline, and move code safely. That’s great. Use them.

LLM-based code assistants are also powerful tools. They can suggest refactorings, spot duplication, and accelerate mechanical transformations. Used well, they reduce friction and help you move faster.

But tools don’t replace judgment.

An IDE doesn’t understand your system. An LLM doesn’t understand your product.

They don’t know which invariants matter, which abstractions are stable, or which shortcuts will become tomorrow’s bottlenecks. They optimize for plausibility and local correctness, not long-term system shape.

You still need to decide:

What responsibility belongs where
What abstraction actually reduces complexity
What should be deleted instead of generalized

If you can’t explain why a refactoring was done, you probably shouldn’t merge it—no matter how clean it looks.

LLMs are best treated like very fast junior engineers: great at execution, dangerous without context, and ineffective without clear direction.

Refactoring is thinking, not formatting.

Refactoring Is an Investment Strategy#

Refactoring is how you:

Keep feature velocity predictable
Reduce cognitive load across the team
Prevent “expert-only” areas of the codebase
Avoid long stabilization phases
Scale engineering without scaling pain

If your roadmap assumes constant delivery speed but your codebase gets harder to change every month, the plan is already broken.

Refactoring is how you close that gap.

The Real Test#

Here’s the litmus test I use:

Can a new engineer make a meaningful change in this area without asking for help?

If not, refactoring is already overdue.

Not because the code is “ugly,” but because it’s taxing future execution.

And execution is the whole game.

References and Further Reading#

Refactoring Definition: Martin Fowler, Refactoring: Improving the Design of Existing Code (2018, 2nd edition) - the definitive work on behavior-preserving code transformations
Keeping the Cost of Change Flat: Kent Beck, Extreme Programming Explained (2004) on the “cost of change curve” and how XP practices flatten it
Make the Change Easy: Kent Beck’s famous tweet (2012): “for each desired change, make the change easy (warning: this may be hard), then make the easy change”
Tests as Safety Net: Michael Feathers, Working Effectively with Legacy Code (2004) on using tests to enable safe refactoring
Continuous Refactoring: Joshua Kerievsky, Refactoring to Patterns (2004) on refactoring as ongoing design activity, not a phase
Small Steps: Martin Fowler’s refactoring catalog emphasizes “small, behavior-preserving steps” as the core discipline
Code as Liability: Kevlin Henney’s talks on “The Forgotten Art of Structured Programming” discuss how less code often means better code
LLMs in Development: GitHub Copilot research (2022) and OpenAI Codex papers on AI-assisted coding tools; also Anthropic’s work on Claude for code assistance

4 Musts of Building a New Feature

Sat, 27 Dec 2025 00:00:00 +0000

A CTO’s guide to reducing waste, increasing learning, and shipping what matters

As CTOs, we like to believe that building features is a technical problem. If the architecture is clean, the code is solid, and the team is strong, success should follow.

Reality disagrees.

Most failed features don’t fail because of bad code. They fail because we invested heavily before we learned enough. We optimized execution before validating direction. We treated uncertainty as something to be managed later, instead of the core problem to solve first.

Over the years, I’ve learned that successful feature development is less about brilliance and more about discipline. There are a few non-negotiable principles that, when applied consistently, dramatically increase the odds that a feature will deliver real value.

This article outlines four of those principles.

1. Start With Effort vs. Risk - Not With the Solution#

Every new feature carries two independent dimensions that are often confused: effort and risk.

Effort is the cost we can reasonably estimate upfront:

Engineering time
Number of people involved
Infrastructure changes
Coordination overhead

Risk, on the other hand, is about uncertainty.

How “researchy” is this problem?
How many assumptions are we making?
How confident are we that this will work - technically and for users?

Both can usually be classified as low, medium, or high.

The mistake many teams make is assuming that high effort automatically implies high risk, or worse, ignoring risk entirely. In practice, risk is what causes effort to explode.

A feature that looks like “medium effort” can quietly turn into a multi-quarter project if the underlying assumptions are wrong.

Before committing to a solution, I insist on explicitly placing the work on an effort–risk matrix:

Low effort, low risk: execute quickly.
High effort, low risk: plan carefully, but proceed.
Low effort, high risk: perfect candidate for experiments.
High effort, high risk: stop. Break it down or reduce uncertainty first.
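The matrix above is small enough to write down as a lookup table, which makes the classification explicit in planning tools or review templates. A minimal sketch follows. The `Level` and `recommend` names are hypothetical, and it covers only the low and high cells that the matrix defines; how to treat "medium" is left to the team.

```python
from enum import Enum


class Level(Enum):
    LOW = "low"
    HIGH = "high"


# Keys are (effort, risk) pairs; values paraphrase the matrix above.
RECOMMENDATION = {
    (Level.LOW, Level.LOW): "Execute quickly.",
    (Level.HIGH, Level.LOW): "Plan carefully, but proceed.",
    (Level.LOW, Level.HIGH): "Run an experiment.",
    (Level.HIGH, Level.HIGH): "Stop. Break it down or reduce uncertainty first.",
}


def recommend(effort: Level, risk: Level) -> str:
    """Look up the next step for a piece of work placed on the matrix."""
    return RECOMMENDATION[(effort, risk)]


if __name__ == "__main__":
    print(recommend(effort=Level.HIGH, risk=Level.HIGH))
```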

This framing forces a critical shift: our first job is not to build, but to learn.

2. Break the Feature Until It Can Be Deployed#

Large features fail in predictable ways. They take too long, hide feedback, and accumulate irreversible decisions.

The antidote is deceptively simple: never build a feature as a single unit.

Instead of asking, “How do we deliver this feature?”, ask:

“What is the smallest version of this that we can deploy to production?”

This is not about cutting corners. It’s about vertical slicing - delivering thin, end-to-end increments that work in the real system.

This step feeds back into the effort vs. risk assessment. Breaking a feature apart is often how hidden risk reveals itself. What initially looked like a single medium-effort task may turn out to contain a high-risk component that deserves its own experiment, or a low-risk slice that can be shipped immediately.

Each slice should:

Be deployable on its own
Reduce uncertainty
Move the system closer to the final vision

Sometimes the first slice delivers no visible user value. That’s fine. Internal value counts too:

A backend capability behind a flag
A new data pipeline with no UI
A manual workflow that validates demand

If it can’t be deployed, it’s too big.

Breaking work into small components does more than improve delivery speed. It changes how teams think. Progress becomes measurable. Risk becomes localized. Decisions become reversible.

3. Fail Fast, Fail Often - On Purpose#

Failure has a bad reputation in engineering cultures, usually because it arrives late and costs too much.

But failure itself isn’t the problem. Delayed failure is.

When we break features into deployable parts, we unlock the ability to fail early - when failure is cheap and informative.

Failing fast means:

Shipping before we feel “ready”
Exposing assumptions to real users
Letting reality correct our plans

This requires operational discipline:

Feature flags to control exposure
Incremental rollouts
Monitoring from day one
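The three practices above combine into one simple rule: widen exposure only while monitoring stays healthy, and pull back when it does not. The sketch below assumes an invented rollout schedule and an invented 1% error-rate guardrail. The names `ROLLOUT_STAGES`, `MAX_ERROR_RATE`, and `next_rollout_percent` are hypothetical.

```python
ROLLOUT_STAGES = (1, 5, 25, 50, 100)  # percent of users; hypothetical schedule
MAX_ERROR_RATE = 0.01                 # hypothetical guardrail: 1% of requests failing


def next_rollout_percent(current_percent: int, observed_error_rate: float) -> int:
    """Return the next exposure level, or 0 to turn the feature off."""
    if observed_error_rate > MAX_ERROR_RATE:
        return 0  # early negative signal: pull back instead of pushing on
    for stage in ROLLOUT_STAGES:
        if stage > current_percent:
            return stage
    return 100  # already fully rolled out


if __name__ == "__main__":
    print(next_rollout_percent(current_percent=5, observed_error_rate=0.002))
    print(next_rollout_percent(current_percent=25, observed_error_rate=0.05))
```

A rollback to zero here is cheap, and it is exactly the early failure this section argues for.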

It also requires cultural discipline. Teams must understand that early negative signals are not a sign of incompetence, but a sign that the system is working.

The goal is not to avoid failure. The goal is to maximize learning per unit of effort.

4. Define Measurable KPIs Before You Write Code#

A feature without success metrics is not a feature - it’s a hypothesis nobody can test.

Before implementation begins, we must be able to answer a simple question:

“How will we know if this worked?”

Good KPIs are:

Measurable: based on observable data
Actionable: they inform a decision
Aligned: they reflect real user or business value

Examples include:

Adoption rate
Frequency of use
Retention impact
Performance or cost improvements
Time saved for users

Vanity metrics don’t count. “It feels useful” is not a KPI.

One often-overlooked detail: KPIs are not free. Tracking them requires instrumentation, logging, dashboards, and ongoing maintenance. That effort must be part of the feature’s cost from day one — not an afterthought.

If a metric is important enough to define success, it is important enough to build proper visibility for.

KPIs serve a second, equally important role: they give us permission to stop.

If a feature fails to move its metrics after the number of iterations we agreed to give it, the data gives us cover to deprecate it. This is how organizations avoid accumulating dead weight disguised as progress.
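Writing the KPI and the stopping rule down before any code exists can be as light as the sketch below. Everything in it is an assumption for illustration: the `Kpi` record, the `decide` function, the iteration budget, and the example targets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kpi:
    """A success metric agreed on before implementation starts."""

    name: str
    baseline: float  # value before the feature ships
    target: float    # value that counts as "this worked"

    def is_met(self, observed: float) -> bool:
        # Handles both "higher is better" and "lower is better" metrics.
        if self.target >= self.baseline:
            return observed >= self.target
        return observed <= self.target


def decide(kpi: Kpi, observed: float, iterations_done: int, max_iterations: int) -> str:
    """Turn a measurement into one of three decisions."""
    if kpi.is_met(observed):
        return "keep"
    if iterations_done < max_iterations:
        return "iterate"
    return "deprecate"  # the permission to stop


if __name__ == "__main__":
    adoption = Kpi(name="weekly adoption rate", baseline=0.0, target=0.20)
    print(decide(adoption, observed=0.05, iterations_done=3, max_iterations=3))
```

Fixing `max_iterations` up front is the design choice that matters: it keeps the decision to stop from being renegotiated every time the numbers disappoint.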

The System Matters More Than Any Feature#

These four principles are not independent. They reinforce each other.

Evaluating effort vs. risk pushes us to reduce uncertainty early. Breaking features into small deployable parts enables fast learning. Failing fast shortens feedback loops. KPIs anchor decisions in reality.

Together, they form a system designed for learning, not heroics.

As CTOs, our real responsibility is not to ship features. It is to build organizations that can discover what to ship - efficiently, repeatedly, and with humility.

In the long run, the teams that win are not the ones that build the most. They are the ones that learn the fastest.

References and Further Reading#

Vertical Slicing: Alistair Cockburn, “Walking Skeleton” pattern in Crystal Clear: A Human-Powered Methodology for Small Teams (2004)
Effort vs. Risk Assessment: Similar to the Eisenhower Matrix adapted for engineering; see also The Lean Startup by Eric Ries (2011) on validated learning and hypothesis testing
Feature Flags & Progressive Delivery: Pete Hodgson, “Feature Toggles (aka Feature Flags)” (martinfowler.com, 2017); LaunchDarkly’s Effective Feature Management (2020)
Fail Fast Philosophy: Jim Shore, “Fail Fast” in The Art of Agile Development (2007); also related to the “Build-Measure-Learn” feedback loop from Lean Startup methodology
Measurable KPIs: Dan Olsen, The Lean Product Playbook (2015), Chapter 7 on metrics that matter; Cagan & Jones, Empowered (2020) on product team accountability
Incremental Development: Kent Beck, Extreme Programming Explained (2004) on small releases and continuous integration
Learning as Core Practice: Jez Humble et al., Lean Enterprise (2014) on creating a culture of experimentation and learning