The Hidden Tax of Feature Flags
At Microsoft, I learned that feature flags pile up faster than anyone admits. Old ones turn into hidden rules running in production. This post shares the runbook we used to clean them up in five days.
It was 2016. I’d just joined Microsoft. Two teams wanted me, and I chose Azure’s new SRE org because it was tackling ambiguity.
The kind where the telemetry’s clean but the system is lying to you. The kind where you realize the docs are wrong, and the only way out is to build a new mental model from first principles.
The problem came fast.
My boss assigned me three teams from different corners of Azure. Each brought its own version of DevOps and an untracked set of “temporary” feature flags.
Some had survived multiple reorgs.
I told myself it was just code debt. It wasn’t.
It was runtime ambiguity.
The Flag That Bit Us
The first report was subtle.
“Intermittent 5xxs in Canada during maintenance windows. Staging is clean. CI is green. No recent deploy.”
That was Priya, the EM for traffic infra.
I asked for a list of flags that might influence routing. No one had one.
Someone tailed logs in a Canadian region. We saw two different code paths for the same request. The split wasn’t in the release diff. It matched something else.
A flag with a harmless-sounding name.
lb.legacy_health_check_enabled = true
In Canada only. Not present in other environments. Buried layers deep in a service module that had migrated across teams and repos.
No comment. No TTL. No owner.
Stale config pretending to be logic.
The system was behaving exactly as instructed, following an instruction we no longer remembered issuing.
We toggled it off. Errors stopped. Logs normalized.
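To make the failure mode concrete, here is a minimal Python sketch of how a single regional override can split code paths without ever appearing in a release diff. Only the flag name comes from the incident; the in-memory store, the region keys, and the functions resolve_flag, health_check_path, and regional_overrides are hypothetical stand-ins, not Azure's flag system.

```python
from typing import Dict, Optional

# Hypothetical flag store: global defaults plus per-region overrides.
# Only the flag name is taken from the incident.
DEFAULTS: Dict[str, bool] = {
    "lb.legacy_health_check_enabled": False,
}

# A forgotten entry here changes behavior in one region only, so any
# environment that runs on the defaults never exercises it.
REGION_OVERRIDES: Dict[str, Dict[str, bool]] = {
    "canada": {"lb.legacy_health_check_enabled": True},
}


def resolve_flag(name: str, region: str) -> bool:
    """Effective value for a region: a regional override wins over the default."""
    override: Optional[bool] = REGION_OVERRIDES.get(region, {}).get(name)
    return DEFAULTS[name] if override is None else override


def health_check_path(region: str) -> str:
    """Choose a code path the way a flag-gated service would."""
    if resolve_flag("lb.legacy_health_check_enabled", region):
        return "legacy"
    return "current"


def regional_overrides(name: str) -> Dict[str, bool]:
    """Every region whose effective value differs from the global default."""
    default = DEFAULTS[name]
    return {
        region: flags[name]
        for region, flags in REGION_OVERRIDES.items()
        if name in flags and flags[name] != default
    }


if __name__ == "__main__":
    print(health_check_path("canada"))  # legacy
    print(health_check_path("westus"))  # current
    print(regional_overrides("lb.legacy_health_check_enabled"))  # {'canada': True}
```

Diffing each region's effective value against the global default, as regional_overrides does, is one cheap way to find region-only behavior before it finds you during a maintenance window.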
Then I pulled everyone into a room: EMs, principals, and Marcus, our TPM.
“This isn’t a cleanup task,” I said. “It’s a system fix.”
What follows is the runbook for saving your system from feature-flag sprawl. You’ll get:
The five valid flag types with placement rules you can copy
The payload every flag must carry to prevent outages
A five-day plan to reduce runtime uncertainty without adding headcount
The Five Flags You Need
“We start with definitions,” Marcus said. “No code until we agree on contracts. Then we encode them into the system.”
Priya grabbed the marker and wrote:
Valid Flag Types
We debated. Sketched. Fought over semantics and failure modes.
By the end, we had five categories, each with rules:
1. Kill Switch
Why it exists
To instantly isolate failure. Fast path to stop the bleeding.
Where it lives
At the platform edge. Not inside nested business logic.
Example
We paused token issuance when our identity provider hit its error budget. One toggle. One dashboard. One line in the playbook.
Rule
If a kill switch is hidden, you are debugging live traffic without a brake.
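A minimal Python sketch of the placement rule, assuming a simple in-memory switch store: the switch is checked in an edge wrapper before any business logic runs, so one toggle stops the whole path. KILL_SWITCHES, with_kill_switch, issue_token, and the switch name identity.token_issuance are hypothetical; a real deployment would read switches from a centrally controlled config source.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]  # (HTTP status, body)

# Hypothetical switch store; a dict stands in for central config.
KILL_SWITCHES: Dict[str, bool] = {"identity.token_issuance": False}


def with_kill_switch(
    switch: str, handler: Callable[[dict], Response]
) -> Callable[[dict], Response]:
    """Wrap a handler so a single toggle short-circuits it at the edge."""

    def edge(request: dict) -> Response:
        if KILL_SWITCHES.get(switch, False):
            # Fail fast with an explicit, observable status instead of
            # letting the request reach a failing dependency.
            return 503, f"{switch} paused by kill switch"
        return handler(request)

    return edge


def issue_token(request: dict) -> Response:
    # Stand-in for the real token-issuance business logic.
    return 200, f"token for {request['user']}"


issue = with_kill_switch("identity.token_issuance", issue_token)

if __name__ == "__main__":
    print(issue({"user": "alice"}))  # (200, 'token for alice')
    KILL_SWITCHES["identity.token_issuance"] = True
    print(issue({"user": "alice"}))  # (503, 'identity.token_issuance paused by kill switch')
```

Because the check lives in the wrapper rather than inside issue_token, there is exactly one place to look, and one place to flip, when traffic needs a brake.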
2. A/B Test
Why it exists
To validate product behavior with live traffic.
Where it lives
Near the boundary. Never inside core services.
Example
Two health check intervals each ran against a 10% traffic bucket. When one won, the flag self-deleted. No leftover forks.
Rule
If your experiment doesn’t expire, you’re not testing. You’re splitting reality.
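One way to make an experiment expire by construction, sketched in Python: units are hashed into stable buckets in [0, 100), each arm gets 10 of them, and once the deadline passes every unit falls back to the single default path until the winner ships and the flag is deleted. The Experiment type, the SHA-256 bucketing, the expiry date, and the expired helper (meant as a CI gate) are illustrative assumptions, not the team's actual tooling.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Experiment:
    name: str
    expires: date              # hypothetical hard deadline for the flag
    percent_per_arm: int = 10  # each arm gets a 10% bucket


def bucket(experiment: str, unit_id: str) -> int:
    """Stable bucket in [0, 100) for a unit (host, user, ...)."""
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def assign(exp: Experiment, unit_id: str, today: date) -> Optional[str]:
    """Return "A", "B", or None (default path). Expired experiments enroll no one."""
    if today > exp.expires:
        return None  # past expiry: every unit is back on one path, no lingering fork
    b = bucket(exp.name, unit_id)
    if b < exp.percent_per_arm:
        return "A"
    if b < 2 * exp.percent_per_arm:
        return "B"
    return None


def expired(experiments: List[Experiment], today: date) -> List[str]:
    """Names of experiments past their deadline; a CI gate can fail on any."""
    return [e.name for e in experiments if today > e.expires]


if __name__ == "__main__":
    exp = Experiment("lb.health_check_interval_test", expires=date(2016, 12, 31))
    print(assign(exp, "host-042", today=date(2016, 10, 1)))  # A, B, or None, depending on the hash
    print(assign(exp, "host-042", today=date(2017, 1, 2)))   # None
    print(expired([exp], today=date(2017, 1, 2)))            # ['lb.health_check_interval_test']
```

Hashing the experiment name together with the unit ID keeps each unit in the same arm for the life of the test while giving different experiments independent splits.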
3. Entitlement
Why it exists
To reflect account tier, region, or role in a consistent way.
Where it lives
In a policy layer that propagates claims downstream.
Example
Enterprise customers saw enhanced DDoS analytics. One entitlement flag controlled access. Same logic, no matter which service answered.
Rule
If different services return different answers to the same user, your policy layer is broken. You’re introducing nondeterminism.
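A minimal sketch of the policy-layer pattern in Python, assuming entitlements derive from account tier alone: claims are resolved once, and downstream services only read them. The tier table, the entitlement name ddos.enhanced_analytics, and both service functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Account:
    account_id: str
    tier: str


@dataclass(frozen=True)
class Claims:
    account_id: str
    entitlements: FrozenSet[str]


# Single source of truth for tier -> entitlements (hypothetical values).
TIER_ENTITLEMENTS: Dict[str, FrozenSet[str]] = {
    "enterprise": frozenset({"ddos.enhanced_analytics"}),
    "standard": frozenset(),
}


def resolve_claims(account: Account) -> Claims:
    """Policy layer: the only code that maps an account to entitlements."""
    return Claims(account.account_id, TIER_ENTITLEMENTS.get(account.tier, frozenset()))


def has_entitlement(claims: Claims, name: str) -> bool:
    """Downstream check: read the claim, never re-derive it from the account."""
    return name in claims.entitlements


# Two services answering the same question from the same propagated claims.
def dashboard_shows_analytics(claims: Claims) -> bool:
    return has_entitlement(claims, "ddos.enhanced_analytics")


def api_returns_analytics(claims: Claims) -> bool:
    return has_entitlement(claims, "ddos.enhanced_analytics")


if __name__ == "__main__":
    claims = resolve_claims(Account("acct-1", "enterprise"))
    print(dashboard_shows_analytics(claims), api_returns_analytics(claims))  # True True
```

Since neither service re-derives the answer from raw account data, they cannot disagree about the same customer.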
4. Work in Progress (WIP)
Why it exists
To land incomplete work behind a guard.
Where it lives
Right next to the feature code. Not abstracted.
Example
We gated the new policy editor while audit logging was still in progress. Once the logging shipped, the flag was deleted.
Rule
If a WIP flag lives longer than 60 days, it isn’t scaffolding. It’s accidental architecture.
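The 60-day rule is easy to enforce mechanically. This Python sketch assumes a hypothetical flag registry that records each flag's kind and creation date; FlagRecord, overdue_wip_flags, and the flag name policy_editor.v2 are illustrative. A CI job could fail the build whenever the returned list is non-empty.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

WIP_MAX_AGE = timedelta(days=60)


@dataclass(frozen=True)
class FlagRecord:
    name: str
    kind: str      # e.g. "wip", "kill_switch", "ab_test" (hypothetical labels)
    created: date


def overdue_wip_flags(flags: List[FlagRecord], today: date) -> List[str]:
    """Names of WIP flags that have lived longer than 60 days."""
    return [
        f.name
        for f in flags
        if f.kind == "wip" and today - f.created > WIP_MAX_AGE
    ]


if __name__ == "__main__":
    registry = [
        FlagRecord("policy_editor.v2", "wip", date(2016, 9, 1)),
        FlagRecord("identity.token_issuance", "kill_switch", date(2016, 1, 4)),
    ]
    # 75 days old on Nov 15, so the WIP flag is reported; kill switches are exempt.
    print(overdue_wip_flags(registry, today=date(2016, 11, 15)))  # ['policy_editor.v2']
```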
5. Conditional Rollout
Why it exists
To control blast radius and support progressive exposure.
Where it lives
In gateways or infra layers with real-time observability.
Example
We enabled buildout automation in Australia. Then 10% of U.S. traffic. Rollbacks auto-triggered on error thresholds.
Rule
If you can’t observe the ramp or auto-revert on failure, you’re not rolling out. You’re gambling.
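A simplified Python sketch of a ramp that advances one stage per healthy observation window and drops to zero exposure as soon as the error rate crosses a threshold. The Australia and 10% U.S. stages follow the example above; the final 100% U.S. stage, the 1% threshold, and the Rollout class are illustrative assumptions. A production system would feed this from real-time telemetry rather than hand-passed numbers.

```python
from dataclasses import dataclass
from typing import List, Tuple

Stage = Tuple[str, int]  # (scope, percent of that scope's traffic exposed)


@dataclass
class Rollout:
    stages: List[Stage]
    error_threshold: float  # e.g. 0.01 means revert above a 1% error rate
    index: int = 0
    reverted: bool = False

    def exposure(self) -> Stage:
        """Current scope and percent; zero exposure once reverted."""
        return ("none", 0) if self.reverted else self.stages[self.index]

    def observe(self, error_rate: float) -> Stage:
        """Feed one observation window for the current stage.

        Above the threshold the rollout reverts to zero exposure and stays
        there; otherwise it advances one stage (holding at the last one).
        """
        if not self.reverted:
            if error_rate > self.error_threshold:
                self.reverted = True
            elif self.index + 1 < len(self.stages):
                self.index += 1
        return self.exposure()


if __name__ == "__main__":
    ramp = Rollout([("australia", 100), ("us", 10), ("us", 100)], error_threshold=0.01)
    print(ramp.observe(0.002))  # ('us', 10)
    print(ramp.observe(0.03))   # ('none', 0)
```

Making the revert sticky is a deliberate choice here: once a ramp trips, a human decides whether to restart it, so a noisy metric cannot flap exposure back and forth.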
Every Flag Carries a Payload
A flag without metadata is a rogue variable in production. Mutable. Undocumented. Dangerous.
Every flag we kept had to include the following payload:
Subscribe to The Conscious Leader to keep reading this post and get 7 days of free access to the full post archives.