How I Debugged a Regex That Brought Down Our API

It was a Tuesday afternoon when our monitoring dashboard lit up like a Christmas tree. Response times spiking from 80ms to 45 seconds. Timeout errors cascading across every endpoint. My phone buzzing with PagerDuty alerts. And me, staring at the screen thinking — we hadn't deployed anything in three days. What the hell happened?

Two hours later, I traced everything back to a single nested group in a regular expression. A regex I had written myself, six months prior, without a second thought.

The Innocent-Looking Pattern

Our API accepted user-submitted email addresses for a newsletter signup feature. Pretty standard stuff. The validation logic lived in a small utility function that looked like this:

const emailRegex = /^([a-zA-Z0-9]+\.?)*@[a-zA-Z0-9]+\.[a-zA-Z]{2,}$/;

If you've been doing this long enough, your stomach might have just dropped. Mine certainly did — but only after I finally understood what I was looking at.

I had written that pattern during a late-night refactor, half-copying something from Stack Overflow, half improvising. It looked reasonable. It matched valid emails. It passed the handful of test cases I threw at it. I committed it, moved on, forgot about it entirely.

What I didn't notice was the nested quantifier — (something+)* — which is the classic setup for catastrophic backtracking.

What Catastrophic Backtracking Actually Means

Before this incident, I had a vague awareness that "bad regexes can be slow." I did not appreciate just how exponentially slow they could get.

Here's what happens with a pattern like ([a-zA-Z0-9]+\.?)* when the input doesn't match. Because the \.? is optional, a run of letters like "aaaa" can be consumed as one iteration of the group, or as "a" then "aaa", or "aa" then "aa", and so on. A backtracking engine tries every one of those splits before it gives up. Every gap between two characters is a branching point, so an n-character run has 2^(n-1) possible splits: for a 30-character string, that's over 500 million combinations before the engine reports no match. A 50-character string? Around 5.6 × 10^14. The process effectively hangs.
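To put numbers on that, here is a small sketch that counts the ways a dot-free run of n characters can be carved into consecutive non-empty chunks, one chunk per iteration of the group. The helper countSplits is hypothetical, written for illustration. A backtracking engine has to rule out each of these splits on a failed match, so the count is a rough lower bound on its work.

```javascript
// Hypothetical helper: count the ways to split a run of n characters into
// consecutive non-empty chunks, i.e. the distinct ways ([a-zA-Z0-9]+\.?)*
// can assign a dot-free run of n characters to iterations of the group.
function countSplits(n) {
  // ways[i] = number of ways to split the first i characters into chunks
  const ways = new Array(n + 1).fill(0);
  ways[0] = 1; // empty prefix: exactly one split (zero iterations)
  for (let i = 1; i <= n; i++) {
    // The last chunk covers the final k characters of the prefix, k = 1..i
    for (let k = 1; k <= i; k++) {
      ways[i] += ways[i - k];
    }
  }
  return ways[n];
}

for (const n of [10, 20, 30, 50]) {
  console.log(`${n} characters -> ${countSplits(n)} possible splits`);
}
```

The helper fills in a small dynamic-programming table instead of enumerating splits. That shortcut is exactly what a plain backtracking engine doesn't take: it walks the splits one at a time.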

This class of vulnerability has an actual name: ReDoS, short for Regular Expression Denial of Service. OWASP documents it as its own attack class, and I had introduced it into production while thinking I was doing something completely mundane.

The trigger in our case was an attacker — or possibly a fuzzer, we never confirmed — submitting email strings like:

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa@

The domain part after the @ was missing, so the regex could never match. But before giving up, the engine exhausted every possible interpretation of the local part. Node.js runs JavaScript on a single thread, so our process locked up on that one request: the event loop couldn't process anything else, timeouts cascaded, and the API appeared completely down.
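You can watch the growth safely by timing the original pattern against shorter versions of that input. A minimal sketch: the lengths are deliberately kept well below 30 so the script finishes, and the absolute timings will vary by machine and Node.js version.

```javascript
const { performance } = require("node:perf_hooks");

// The original, vulnerable pattern from our validation utility.
const emailRegex = /^([a-zA-Z0-9]+\.?)*@[a-zA-Z0-9]+\.[a-zA-Z]{2,}$/;

// Time one synchronous test() call. While it runs, nothing else on the
// event loop (timers, other requests) can make progress.
function timeMatch(input) {
  const start = performance.now();
  const matched = emailRegex.test(input);
  return { matched, ms: performance.now() - start };
}

for (let n = 14; n <= 22; n += 2) {
  const { matched, ms } = timeMatch("a".repeat(n) + "@");
  console.log(`n=${n} matched=${matched} time=${ms.toFixed(1)}ms`);
}
```

Since each extra character doubles the number of splits, each step of two characters should roughly quadruple the time.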

The Debugging Process (The Part Nobody Talks About)

I want to be honest about how long it took me to find this, because I think the "aha moment" narrative does a disservice to how this actually goes.

My first assumption was a database problem. I checked slow query logs. Nothing unusual. I checked connection pool exhaustion. Fine. I restarted the DB replica. No change.

My second assumption was a memory leak. I pulled heap snapshots. Took another hour. Red herring.

What actually broke the case was looking at the Node.js CPU metrics more carefully. Normally our API process sits at maybe 5–8% CPU. During the incident: 99%, sustained, for minutes at a time. That's not I/O-bound behavior. That's the CPU spinning on pure computation.

I started adding console.time() markers around different sections of the request handler. Crude, but effective. The validation step — which should have been microseconds — was taking 40+ seconds. I literally could not believe it at first. I added the markers twice, thinking I'd made a mistake.
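The instrumentation looked roughly like the sketch below. The handler shape and the helper names (handleSignup, validateEmail, saveSubscriber) are hypothetical stand-ins, not our actual code; console.time() and console.timeEnd() are the standard Node.js calls, printing the elapsed time for each label.

```javascript
// The original (vulnerable) validation pattern, as it was during the incident.
const emailRegex = /^([a-zA-Z0-9]+\.?)*@[a-zA-Z0-9]+\.[a-zA-Z]{2,}$/;

// Hypothetical stand-ins for the real validation and persistence steps.
const validateEmail = (email) => emailRegex.test(email);
const saveSubscriber = async (email) => ({ email, savedAt: Date.now() });

// Crude bisection: wrap each stage of the handler in its own timer label.
async function handleSignup(body) {
  console.time("parse");
  const email = String(body.email ?? "").trim();
  console.timeEnd("parse");

  console.time("validate");
  const ok = validateEmail(email);
  console.timeEnd("validate"); // during the incident, this label reported 40+ seconds
  if (!ok) return { status: 400, error: "invalid email" };

  console.time("save");
  const record = await saveSubscriber(email);
  console.timeEnd("save");
  return { status: 201, record };
}
```

The labels are global to the process, so concurrent requests would collide on them (Node prints a warning when a label is already running). For a one-off reproduction with a single request, that doesn't matter.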

Then I copy-pasted the regex into regex101.com, switched to the "step" debugger, and fed it a bad input string. The step count went into the hundreds of thousands before I stopped watching. The debugger actually warned me: "This may indicate catastrophic backtracking."

So that's what that means, I thought.

The Fix (Which Is Not Just "Write a Better Regex")

The immediate fix was obvious enough: replace the broken regex with something that doesn't have nested quantifiers. For email validation specifically, I ended up doing the simplest thing that actually works:

const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]{2,}$/;

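It's not perfect: it will accept some technically malformed addresses. But for our use case (newsletter signup, not cryptographic identity verification), it's more than sufficient. And critically, it's immune to catastrophic backtracking because there are no nested quantifiers. The local part can't contain @, so it has exactly one place to stop. The only remaining ambiguity is which dot in the domain separates the final label, and trying each candidate dot costs at worst polynomial (roughly quadratic) time in the input length, not exponential.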
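A quick way to check both halves of that claim, using the replacement pattern exactly as written above: it accepts ordinary addresses, and it rejects the incident's attack string immediately, even when padded to 10,000 characters. The sample addresses are illustrative.

```javascript
const { performance } = require("node:perf_hooks");

// The replacement pattern: no nested quantifiers, and the local part
// cannot contain "@", so it has only one place it can stop.
const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]{2,}$/;

const samples = [
  "jane.doe@example.com",
  "a+tag@sub.example.co.uk",
  "no-at-sign.example.com",
  "a".repeat(30) + "@", // the incident's attack string
  "a".repeat(10000) + "@", // far longer than anything we accept
];

for (const s of samples) {
  const start = performance.now();
  const ok = emailRegex.test(s);
  const ms = performance.now() - start;
  console.log(`${ok ? "accept" : "reject"} (${ms.toFixed(3)}ms): ${s.slice(0, 40)}`);
}
```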

For the longer term, I did a few other things:

Added a timeout wrapper around all regex operations. Node.js doesn't give you this natively, but you can run validation in a worker thread and terminate it if it exceeds a threshold. Overkill for most cases; not overkill when user-controlled strings touch your regex engine.
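A minimal sketch of that wrapper, assuming Node's built-in worker_threads module (the Worker constructor with the eval and workerData options, plus worker.terminate()). The helper name testWithTimeout and its { timedOut, matched } result shape are hypothetical, not our production code.

```javascript
const { Worker } = require("node:worker_threads");

// Code run inside the worker: rebuild the regex and report the result.
const WORKER_SOURCE = `
  const { parentPort, workerData } = require("node:worker_threads");
  const re = new RegExp(workerData.source, workerData.flags);
  parentPort.postMessage(re.test(workerData.input));
`;

// Hypothetical helper: run regex.test(input) off the main thread and give up
// after timeoutMs, terminating the worker so a runaway match stops burning CPU.
function testWithTimeout(regex, input, timeoutMs) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(WORKER_SOURCE, {
      eval: true,
      workerData: { source: regex.source, flags: regex.flags, input },
    });
    const timer = setTimeout(() => {
      worker.terminate();
      resolve({ timedOut: true, matched: false });
    }, timeoutMs);
    worker.once("message", (matched) => {
      clearTimeout(timer);
      worker.terminate();
      resolve({ timedOut: false, matched });
    });
    worker.once("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
  });
}
```

Called with the original pattern, the incident's attack string, and a 200ms budget, it should come back as { timedOut: true, matched: false } while the main event loop keeps serving other requests. Spawning a fresh worker per call has a real startup cost, so the timeout needs headroom for it; a long-lived worker or a small pool would amortize that.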

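Implemented request body size limits at the nginx layer, not just in the application. An attacker can't send a 10,000-character "email" if your reverse proxy rejects any request body over 500 bytes on that endpoint. Length limits alone wouldn't have saved the original pattern, which hung on a roughly 30-character input, but they put a ceiling on how much backtracking any remaining polynomial pattern can do.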
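The application-side counterpart to the proxy limit can be as simple as a length check that runs before the regex. A sketch; MAX_EMAIL_LENGTH is a hypothetical constant, set here to 254, a commonly used upper bound for an email address.

```javascript
// Hypothetical cap: 254 characters is a commonly cited practical maximum
// for an email address. Anything longer is rejected without touching the regex.
const MAX_EMAIL_LENGTH = 254;
const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]{2,}$/;

function isValidEmail(input) {
  if (typeof input !== "string") return false; // JSON bodies can carry any type
  if (input.length > MAX_EMAIL_LENGTH) return false; // constant-time check first
  return emailRegex.test(input); // only bounded input reaches the regex engine
}
```

The order matters: the length check costs the same no matter how large the payload is, so an oversized string never reaches the engine at all.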

Started using a linter rule for ReDoS patterns. There's an ESLint plugin called eslint-plugin-regexp that catches a bunch of these statically. It would have flagged my original pattern immediately. I added it to the project and ran it across our entire codebase. Found three other suspicious patterns, none as severe, but worth fixing.
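Enabling it takes a few lines of ESLint configuration. A sketch assuming ESLint's flat config format and a version of eslint-plugin-regexp that exports a "flat/recommended" preset; check the plugin's README for the exact export name in your version. The rule turned on explicitly here, regexp/no-super-linear-backtracking, is the one aimed at exponential and polynomial backtracking.

```javascript
// eslint.config.js (flat config, CommonJS)
const regexpPlugin = require("eslint-plugin-regexp");

module.exports = [
  // The plugin's recommended rule set (registers the "regexp" plugin).
  regexpPlugin.configs["flat/recommended"],
  {
    rules: {
      // Fail the lint run on patterns prone to super-linear backtracking.
      "regexp/no-super-linear-backtracking": "error",
    },
  },
];
```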

Added the attack string to our test suite as a "must not hang" case. The test is simple: if validation takes more than 10ms, it fails. We now catch this class of issue before it ships.

What I Wish I'd Known Earlier

The pattern that kills you is almost always (a+)+ or ([ab]+)*, or some variant where a quantifier wraps a group whose own quantified contents can match the same characters in more than one way. There are exponentially many ways to partition a repeated sequence across the group boundaries, and a backtracking engine like V8's (the one Node.js uses) will try all of them on a failed match.

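Safe patterns tend to have clear, unambiguous character class partitions. If A and B are mutually exclusive character classes, then (A+B)* is fine: the engine always knows exactly which class each character belongs to, and every B ends an iteration. The problem arises when a character could match in multiple parts of the pattern, creating ambiguity that forces backtracking exploration. In our original pattern, the optional \.? was exactly that problem: with no dot in the input, nothing marks where one iteration of the group ends and the next begins.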
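Applied to our original local part, that principle suggests a rewrite that accepts the same strings but makes the dot mandatory between iterations, so every character has exactly one place in the pattern it can belong to. This is a sketch of an alternative, not what we shipped (we went with the simpler pattern from the fix section); the loop checks that the two local-part patterns agree on a handful of short inputs.

```javascript
const { performance } = require("node:perf_hooks");

// Original local part: the optional \.? lets a run of characters be split
// across iterations in many different ways.
const originalLocal = /^([a-zA-Z0-9]+\.?)*$/;
// Rewrite: a dot is required between iterations (with an optional trailing
// dot), so each character can only belong to one part of the pattern.
const rewrittenLocal = /^(?:[a-zA-Z0-9]+(?:\.[a-zA-Z0-9]+)*\.?)?$/;

// Compare on short inputs (short enough that the original stays fast).
const inputs = ["", "a", "ab", "a.b", "a.", ".a", "a..b", "a.b.c.", "a-b", "abc.def"];
for (const s of inputs) {
  const a = originalLocal.test(s);
  const b = rewrittenLocal.test(s);
  console.log(JSON.stringify(s), a, b, a === b ? "agree" : "DIFFER");
}

// Full address with the rewritten local part, against the incident's input.
const rewrittenEmail = /^(?:[a-zA-Z0-9]+(?:\.[a-zA-Z0-9]+)*\.?)?@[a-zA-Z0-9]+\.[a-zA-Z]{2,}$/;
const start = performance.now();
const matched = rewrittenEmail.test("a".repeat(30) + "@");
console.log(`attack input: matched=${matched}, ${(performance.now() - start).toFixed(3)}ms`);
```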

Tools I now keep open during any regex work:

  • regex101.com: the step debugger and backtracking warning are invaluable. Spend the extra 30 seconds clicking through it.
  • regexper.com: generates a railroad diagram of your pattern. Nested loops jumping back into each other on the diagram are a visual warning sign.
  • vuln-regex-detector: a command-line tool from Virginia Tech researchers that checks regexes for ReDoS vulnerabilities. Run it in CI if you're paranoid (which you should be if users control the input).

The Bigger Lesson

What sticks with me about this incident isn't the technical detail — that's findable in OWASP docs and any good security blog. What sticks is how invisible the vulnerability was.

The code looked correct. It tested correctly on valid and obviously-invalid inputs. It had been in production for months without issue. It only became a problem when someone, deliberately or by accident, supplied input that triggered the engine's worst-case backtracking.

Security properties of code are not always visible in the happy path. A function can behave perfectly for 999 out of 1000 inputs and be catastrophically exploitable on the thousandth. That's not a new insight — it's the basis of almost every injection attack — but I hadn't internalized it for something as seemingly innocent as a validation regex.

I now treat any regex that accepts user-controlled strings as a potential attack surface, the same way I treat SQL queries and file paths. Test it on adversarial input. Lint it statically. Apply length limits before it ever reaches the regex engine. And when in doubt, reach for a purpose-built parsing library instead of rolling your own pattern.

What brought down our API was one nested group in a single line of local-part validation logic. That's what keeps me humble about "simple" code.