FreshRSS

πŸ”’
❌ About FreshRSS
There are new available articles, click to refresh the page.
Before yesterdayTroy Hunt

To Infinity and Beyond, with Cloudflare Cache Reserve

By Troy Hunt
To Infinity and Beyond, with Cloudflare Cache Reserve

What if I told you... that you could run a website from behind Cloudflare and only have 385 daily requests miss their cache and go through to the origin service?

To Infinity and Beyond, with Cloudflare Cache Reserve

No biggy, unless... that was out of a total of more than 166M requests in the same period:

To Infinity and Beyond, with Cloudflare Cache Reserve

Yep, we just hit "five nines" of cache hit ratio on Pwned Passwords being 99.999%. Actually, it was 99.9998% but we're at the point now where that's just splitting hairs, let's talk about how we've managed to only have two requests in a million hit the origin, beginning with a bit of history:

Optimising Caching on Pwned Passwords (with Workers)- @troyhunt - https://t.co/KjBtCwmhmT pic.twitter.com/BSfJbWyxMy

β€” Cloudflare (@Cloudflare) August 9, 2018

Ah, memories 😊 Back then, Pwned Passwords was serving way fewer requests in a month than what we do in a day now and the cache hit ratio was somewhere around 92%. Put another way, instead of 2 in every million requests hitting the origin it was 85k. And we were happy with that! As the years progressed, the traffic grew and the caching model was optimised so our stats improved:

There it is - Pwned Passwords is now doing north of 2 *billion* requests a month, peaking at 91.59M in a day with a cache-hit ratio of 99.52%. All free, open source and out there for the community to do good with 😊 pic.twitter.com/DSJOjb2CxZ

β€” Troy Hunt (@troyhunt) May 24, 2022

And that's pretty much where we levelled out, at about the 99-and-a-bit percent mark. We were really happy with that as it was now only 5k requests per million hitting the origin. There was bound to be a number somewhere around that mark due to the transient nature of cache and eviction criteria inevitably meaning a Cloudflare edge node somewhere would need to reach back to the origin website and pull a new copy of the data. But what if Cloudflare never had to do that unless explicitly instructed to do so? I mean, what if it just stayed in their cache unless we actually changed the source file and told them to update their version? Welcome to Cloudflare Cache Reserve:

To Infinity and Beyond, with Cloudflare Cache Reserve

Ok, so I may have annotated the important bit but that's what it feels like - magic - because you just turn it on and... that's it. You still serve your content the same way, you still need the appropriate cache headers and you still have the same tiered caching as before, but now there's a "cache reserve" sitting between that and your origin. It's backed by R2 which is their persistent data store and you can keep your cached things there for as long as you want. However, per the earlier link, it's not free:

To Infinity and Beyond, with Cloudflare Cache Reserve

You pay based on how much you store for how long, how much you write and how much you read. Let's put that in real terms and just as a brief refresher (longer version here), remember that Pwned Passwords is essentially just 16^5 (just over 1 million) text files of about 30kb each for the SHA-1 hashes and a similar number for the NTLM ones (albeit slight smaller file sizes). Here are the Cache Reserve usage stats for the last 9 days:

To Infinity and Beyond, with Cloudflare Cache Reserve

We can now do some pretty simple maths with that and working on the assumption of 9 days, here's what we get:

To Infinity and Beyond, with Cloudflare Cache Reserve

2 bucks a day 😲 But this has taken nearly 16M requests off my origin service over this period of time so I haven't paid for the Azure Function execution (which is cheap) nor the egress bandwidth (which is not cheap). But why are there only 16M read operations over 9 days when earlier we saw 167M requests to the API in a single day? Because if you scroll back up to the "insert magic here" diagram, Cache Reserve is only a fallback position and most requests (i.e. 99.52% of them) are still served from the edge caches.

Note also that there are nearly 1M write operations and there are 2 reasons for this:

  1. Cache Reserve is being seeded with source data as requests come in and miss the edge cache. This means that our cache hit ratio is going to get much, much better yet as not even half all the potentially cacheable API queries are in Cache Reserve. It also means that the 48c per day cost is going to come way down πŸ™‚
  2. Every time the FBI feeds new passwords into the service, the impacted file is purged from cache. This means that there will always be write operations and, of course, read operations as the data flows to the edge cache and makes corresponding hits to the origin service. The prevalence of all this depends on how much data the feds feed in, but it'll never get to zero whilst they're seeding new passwords.

An untold number of businesses rely on Pwned Passwords as an integral part of their registration, login and password reset flows. Seriously, the number is "untold" because we have no idea who's actually using it, we just know the service got hit three and a quarter billion times in the last 30 days:

To Infinity and Beyond, with Cloudflare Cache Reserve

Giving consumers of the service confidence that not only is it highly resilient, but also massively fast is essential to adoption. In turn, more adoption helps drive better password practices, less account takeovers and more smiles all round 😊

As those remaining hash prefixes populate Cache Reserve, keep an eye on the "cf-cache-status" response header. If you ever see a value of "MISS" then congratulations, you're literally one in a million!

Full disclosure: Cloudflare provides services to HIBP for free and they helped in getting Cache Reserve up and running. However, they had no idea I was writing this blog post and reading it live in its entirety is the first anyone there has seen it. Surprise! πŸ‘‹

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

By Troy Hunt
Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

I found myself going down a previously unexplored rabbit hole recently, or more specifically, what I thought was "a" rabbit hole but in actual fact was an ever-expanding series of them that led me to what I refer to in the title of this post as "6 rabbits deep". It's a tale of firewalls, APIs and sifting through layers and layers of different services to sniff out the root cause of something that seemed very benign, but actually turned out to be highly impactful. Let's go find the rabbits!

The Back Story

When you buy an API key on Have I Been Pwned (HIBP), Stripe handles all the payment magic. I love Stripe, it's such an awesome service that abstracts away so much pain and it's dead simple to integrate via their various APIs. It's also dead simple to configure Stripe to send notices back to your own service via webhooks. For example, when an invoice is paid or a customer is updated, Stripe sends information about that event to HIBP and then lists each call on the webhooks dashboard in their portal:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

There are a whole range of different events that can be listened to and webhooks fired, here we're seeing just a couple of them that are self explanatory in name. When an invoice is paid, the callback looks something like this:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

HIBP has received this call and updated it's own DB such that for a new customer, they can now retrieve an API key or for an existing customer whose subscription has renewed, the API key validity period has been extended. The same callback is also issued when someone upgrades an API key, for example when going from 10RPM (requests per minute) to 50RPM. It's super important that HIBP gets that callback so it can appropriately upgrade the customer's key and they can immediately begin making more requests. When that call doesn't happen, well, let's go down the first rabbit hole.

The Failed API Key Upgrade 🐰

This should never happen:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

This came in via HIBP's API key support portal and is pretty self-explanatory. I checked the customer's account on Stripe and it did indeed show an active 50RPM subscription, but when drilling down into the associated payment, I found the following:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

Ok, so at least I know where things have started to go wrong, but why? Over to the webhooks dashboard and into the failed payments and things look... suboptimal:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

Dammit! Fortunately this is only a small single-digit percentage of all callbacks, but every time this fails it's either stopping someone like the guy above from making the requests they've paid for or potentially, causing someone's API key to expire even though they've paid for it. The latter in particular I was really worried about as it would nuke their key and whatever they'd built on top of it would cease to function. Fortunately, because that's such an impactful action I'd built in heaps of buffer for just such an occurrence and I'd gotten onto this issue quickly, but it was disconcerting all the same.

So, what's happening? Well, the response is HTTP 403 "Forbidden" and the body is clearly a Cloudflare challenge page so something at their end is being triggered. Looks like it's time to go down the next rabbit hole.

Cloudflare's Firewall and Logs 🐰 🐰

Desperate just to quickly restore functionality, I dropped into Cloudflare's WAF and allowed all Stripe's outbound IPs used for webhooks to bypass their security controls:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

This wasn't ideal, but it only created risk for requests originating from Stripe and it got things up and running again quickly. With time up my sleeve I could now delve deeper and work out precisely what was going on, starting with the logs. Cloudflare has a really extensive set of APIs that can control a heap of features of the service, including pulling back logs (note: this is a feature of their Enterprise plan). I queried out a slice of the logs corresponding to when some of the 403s from Stripe's dashboard occurred and found 2 entries similar to this one:

{"BotScore":1,"BotScoreSrc":"Verified Bot","CacheCacheStatus":"unknown","ClientASN":16509,"ClientCountry":"us","ClientIP":"54.187.205.235","ClientRequestHost":"haveibeenpwned.com","ClientRequestMethod":"POST","ClientRequestReferer":"","ClientRequestURI":"[redacted]","ClientRequestUserAgent":"Stripe/1.0 (+https://stripe.com/docs/webhooks)","EdgeRateLimitAction":"","EdgeResponseStatus":403,"EdgeStartTimestamp":1674073983931000000,"FirewallMatchesActions":["managedChallenge"],"FirewallMatchesRuleIDs":["6179ae15870a4bb7b2d480d4843b323c"],"FirewallMatchesSources":["firewallManaged"],"OriginResponseStatus":0,"WAFAction":"unknown","WorkerSubrequest":false}

That's one of Stripe's outbound IP's on 54.187.205.235 and the "FirewallMatchesRuleIDs" collection has a value in it. Ergo, something about this request triggered the firewall and caused it to be challenged. I'm sure many of us have gone through the following thought process before:

What did I change?

Did I change anything?

Did they change something?

Except "they" could have been either Cloudflare or Stripe; if it wasn't me (and I was fairly certain it wasn't), was it a Cloudflare change to the rules or a Stripe change to a webhook payload that was now triggering an existing rule? Time to dig deeper again so it's over to the Cloudflare dashboard and down into the WAF events for requests to the webhook callback path:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

Yep, something proper broke! Let's drill deeper and look at recent events for that IP:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

As you dig deeper through troubleshooting exercises like this, you gradually turn up more and more information that helps piece the entire puzzle together. In this case, it looks like the "Inbound Anomaly Score Exceeded" rule was being triggered. What's that? And why? Time to go down another rabbit hole.

The Cloudflare OWASP Core Ruleset 🐰 🐰 🐰

So, deeper and deeper down the rabbit holes we go, this time into the depths of the requests that triggered the managed rule:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

Well that's comprehensive πŸ™‚

There's a lot to unpack here so let's begin with the ruleset that the previously identified "Inbound Anomaly Score Exceeded" rule belongs to, the Cloudflare OWASP Core Ruleset:

The Cloudflare OWASP Core Ruleset is Cloudflare’s implementation of the OWASP ModSecurity Core Rule SetOpen external link (CRS). Cloudflare routinely monitors for updates from OWASP based on the latest version available from the official code repository.

That link is yet another rabbit hole altogether so let me summarise succinctly here: Cloudflare uses OWASP's rules to identify anomalous traffic based on a customer-defined paranoia level (how strict you want to be) and then applies a score threshold (also customer-defined) at which an action will be taken, for example challenging the request. What I learned as this saga progressed is that the "Inbound Anomoly Score Exceeded" rule is actually a rollup of the rules beneath it. The OWASP score of "26" is the sum of the 6 rules listed beneath it and once it exceeds 25, the superset rule is triggered.

Further - and this is the really important bit - Cloudflare routinely updates the rules from OWASP which makes sense because these are ever-evolving in response to new threats. And when did they last upgrade the rules? It looks like they announced it right before I started having issues:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

Whilst it's not entirely clear from above when this release was scheduled to occur, I did reach out to Cloudflare support and was advised it had already taken place:

Please note that we did bump the OWASP version, which we are integrating with to 3.3.4 as noted on our scheduled changes.

So maybe it's not Cloudflare's fault or Stripe's fault, but OWASP's fault? In fairness to all, I don't think it's anyone's fault per se and is instead just an unfortunate result of everyone doing their best to keep the bad guys out. Unless... it really is Stripe's fault because there's something in the request payload that was always fishy and is now being caught? But why for only some requests and not others? Next rabbit!

Cloudflare Payload Logging 🐰 🐰 🐰 🐰

Sometimes, people on the internet lose their minds a bit over things they really shouldn't. One of those things, in my experience, is Cloudflare's interception of traffic and it's something I wrote about in detail nearly 7 years ago now in my piece on security absolutism. Cloudflare plays an enormously valuable role in the internet's ecosystem and a substantial part of the value comes from being able to inspect, cache, optimise, and yes, even reject traffic. When you use Cloudflare to protect your website, they're applying rulesets like the aforementioned OWASP ones and in order to do that, they must be able to inspect your traffic! But they don't log it, not all of it, rather just "metadata generated by our products" as they refer to it on their logs page. We saw an example of that earlier on with Stripe's request from their IP showing it triggered a firewall rule, but what we didn't see is the contents of that POST request, the actual payload that triggered the rule. Let's go grab that.

Because the contents of a POST request can contain sensitive information, Cloudflare doesn't log it. Obviously they see it in transit (that's how OWASP's rules can be applied to it), but it's not stored anywhere and even if you want to capture it, they don't want to be able to see it. That's where payload logging (another Enterprise plan feature) comes in and what's really neat about that is every payload must be encrypted with a public key retained by Cloudflare whilst only you retain the private key. The setup looks like this:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

Pretty self-explanatory and once done, right under where we previously saw the additional logs we now have the ability to decrypt the payload:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

As promised, this requires the private key from earlier:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

And now, finally, we have the actual payload that triggered the rule, seen here with my own test data:

[ " },\n \"billing_reason\": \"subscription_update\",\n \"charge\": null,\n \"collection_method\": \"charge_automatically\",\n \"created\": 1674351619,\n \"currency\": \"usd\",\n \"custom_fields\": null,\n \"customer\": \"cus_MkA71FpZ7XXRlt\",\n \"customer_address\": ", " },\n \"customer_email\": \"troy-hunt+1@troyhunt.com\",\n \"customer_name\": \"Troy Hunt 1\",\n \"customer_phone\": null,\n \"customer_shipping\": null,\n \"customer_tax_exempt\": \"none\",\n \"customer_tax_ids\": [\n\n ],\n \"default_payment_method\": null,\n \"default_source\": null,\n \"default_tax_rates\": [\n\n ],\n \"description\": \"You can manage your subscription (i.e. cancel it or regenerate the API key) at any time by verifying your email address here: https://haveibeenpwned.com/API/Key\",\n \"discount\": null,\n \"discounts\": [\n\n ],\n \"due_date\": null,\n \"ending_balance\": -11804,\n \"footer\": null,\n \"from_invoice\": null,\n \"hosted_invoice_url\": \"https://invoice.stripe.com/i/acct_1EdQYpEF14jWlYDw/test_YWNjdF8xRWRRWXBFRjE0aldsWUR3LF9OREo5SlpqUFFvVnFtQnBVcE91YUFXemtkRHFpQWNWLDY0ODkyNDIw02004bEyljdC?s=ap\",\n \"invoice_pdf\": \"https://pay.stripe.com/invoice/acct_1EdQYpEF14jWlYDw/test_YWNjdF8xRWRRWXBFRjE0aldsWUR3LF9OREo5SlpqUFFvVnFtQnBVcE91YUFXemtkRHFpQWNWLDY0ODkyNDIw02004bEyljdC/pdf?s=ap\",\n \"last_finalization_error\": null,\n \"latest_revision\": null,\n \"lines\": ", " ", " ],\n \"discountable\": false,\n \"discounts\": [\n\n ],\n \"invoice_item\": \"ii_1MSsXfEF14jWlYDwB1nfZvFm\",\n \"livemode\": false,\n \"metadata\": ", " },\n \"period\": ", " },\n \"plan\": ", " },\n \"nickname\": null,\n \"product\": \"prod_Mk4eLcJ7JYF02f\",\n \"tiers_mode\": null,\n \"transform_usage\": null,\n \"trial_period_days\": null,\n \"usage_type\": \"licensed\"\n },\n \"price\": ", " },\n \"nickname\": null,\n \"product\": \"prod_Mk4eLcJ7JYF02f\",\n \"recurring\": ", " },\n \"tax_behavior\": \"unspecified\",\n \"tiers_mode\": null,\n \"transform_quantity\": null,\n \"type\": \"recurring\",\n \"unit_amount\": 15000,\n \"unit_amount_decimal\": \"15000\"\n },\n \"proration\": true,\n \"proration_details\": ", " \"il_1MMjfcEF14jWlYDwoe7uhDPF\"\n ]\n }\n },\n \"quantity\": 1,\n \"subscription\": \"sub_1MMjfcEF14jWlYDwi8JWFcxw\",\n \"subscription_item\": \"si_N6xapJ8gSXdp7W\",\n \"tax_amounts\": [\n\n ],\n \"tax_rates\": [\n\n ],\n \"type\": \"invoiceitem\",\n \"unit_amount_excluding_tax\": \"-14304\"\n },\n ", " ],\n \"discountable\": true,\n \"discounts\": [\n\n ],\n \"livemode\": false,\n \"metadata\": ", " },\n \"period\": ", " },\n \"plan\": ", " },\n \"nickname\": null,\n \"product\": \"prod_Mk4lTSl4axd9mt\",\n \"tiers_mode\": null,\n \"transform_usage\": null,\n \"trial_period_days\": null,\n \"usage_type\": \"licensed\"\n },\n \"price\": ", " },\n \"nickname\": null,\n \"product\": \"prod_Mk4lTSl4axd9mt\",\n \"recurring\": ", " },\n \"tax_behavior\": \"unspecified\",\n \"tiers_mode\": null,\n \"transform_quantity\": null,\n \"type\": \"recurring\",\n \"unit_amount\": 2500,\n \"unit_amount_decimal\": \"2500\"\n },\n \"proration\": false,\n \"proration_details\": ", " },\n \"quantity\": 1,\n \"subscription\": \"sub_1MMjfcEF14jWlYDwi8JWFcxw\",\n \"subscription_item\": \"si_NDJ98tQrCcviJf\",\n \"tax_amounts\": [\n\n ],\n \"tax_rates\": [\n\n ],\n \"type\": \"subscription\",\n \"unit_amount_excluding_tax\": \"2500\"\n }\n ],\n \"has_more\": false,\n \"total_count\": 2,\n \"url\": \"/v1/invoices/in_1MSsXfEF14jWlYDwxHKk4ASA/lines\"\n },\n \"livemode\": false,\n \"metadata\": ", " },\n \"next_payment_attempt\": null,\n \"number\": \"04FC1917-0008\",\n \"on_behalf_of\": null,\n \"paid\": true,\n \"paid_out_of_band\": false,\n \"payment_intent\": null,\n \"payment_settings\": ", " },\n \"period_end\": 1674351619,\n \"period_start\": 1674351619,\n \"post_payment_credit_notes_amount\": 0,\n \"pre_payment_credit_notes_amount\": 0,\n \"quote\": null,\n \"receipt_number\": null,\n \"rendering_options\": null,\n \"starting_balance\": 0,\n \"statement_descriptor\": null,\n \"status\": \"paid\",\n \"status_transitions\": ", " },\n \"subscription\": \"sub_1MMjfcEF14jWlYDwi8JWFcxw\",\n \"subtotal\": -11804,\n \"subtotal_excluding_tax\": -11804,\n \"tax\": null,\n \"test_clock\": null,\n \"total\": -11804,\n \"total_discount_amounts\": [\n\n ],\n \"total_excluding_tax\": -11804,\n \"total_tax_amounts\": [\n\n ],\n \"transfer_data\": null,\n \"webhooks_delivered_at\": 1674351619\n }\n },\n \"livemode\": false,\n \"pending_webhooks\": 1,\n \"request\": ", " },\n \"type\": \"invoice.paid\"\n}" ]

But enough of what's present in the payload, it's what's absent that especially struck me. No obvious XSS patterns, nor SQL injection or any other suspicious looking strings. The request looked totally benign, so why did it trigger the rule?

I wanted to compare the payload of a blocked request with a similar request that wasn't blocked, but they're only logged at Cloudflare when they trigger a rule. No problem, it's easy to grab the full request from Stripe's webhook history so I found one that passed and one that failed and diff'd them both:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

This clearly isn't the full 200 lines, but it's a very similar story over the remainder of the files; tiny differences largely down to dates, IDs, and of course, the customers themselves. No suspicious patterns, no funky characters, nothing visibly abnormal. It's a bit pointless to even mention it because they're near identical, but the payload on the left is the one that passed the firewall whilst the payload on the right was blocked.

Next rabbit hole!

Cloudflare's Internal Rules Engine 🐰 🐰 🐰 🐰 🐰

Completely running out of ideas and options, focus moved to the folks inside Cloudflare who were already aware there was an issue:

We are actively looking into this and will likely release an update to the Cloudflare OWASP ruleset soon

β€” Michael Tremante (@MichaelTremante) January 20, 2023

What followed was a period of back and forth initially with Cloudflare, then Stripe as well with everyone trying to nut out exactly where things were going wrong. Essentially, the process went like this:

Is Cloudflare inadvertently blocking the requests?

Is the OWASP ruleset raising false positives?

Is Stripe issuing requests that are deemed to be malicious?

And round and round we went. At one time, Cloudflare identified a change in the OWASP ruleset which appeared to have resulted in their implementation inadvertently triggering the WAF. They rolled it back and... the same thing happened. We deferred back to Stripe on the assumption that something must have changed on their end, but they couldn't identify any change that would have any sort of material impact. We were stumped, but we also had an easy fix just one last rabbit hole away...

Fine Tuning the Cloudflare WAF 🐰 🐰 🐰 🐰 🐰 🐰

The joy of a managed firewall is that someone else takes all the rigmarole of looking after it away. I'm going to talk more about that in the summary shortly but clearly, that also creates risk as you're delegating control of traffic flow to someone else. Fortunately, Cloudflare gives you a load of configurability with their managed rules which makes it easy to add custom exceptions:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

This meant I could create a simple exception that was much more intelligent than the previous "just let all outbound Stripe IPs in" by filtering down to the specific path those webhooks were flowing in to:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

And finally, because sequence matters, I dragged that rule right up to the top of the pile so it would cause matching inbound requests to skip all the other rules:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

And finally, there were no more rabbits 😊

Lessons Learned

I know what you're thinking - "what was the actual root cause?" - and to be honest, I still don't know. I don't know if it was Cloudflare or OWASP or Stripe or if it even impacted other customers of these services and to be honest, yes, that's a little frustrating. But I learned a bunch of stuff and for that alone, this was a worthwhile exercise I took three big lessons away from:

Firstly, understanding the plumbing of how all these bits work together is super important. I was lucky this wasn't a time critical issue and I had the luxury of learning without being under duress; how rules, payload inspection and exception management all work together is really valuable stuff to understand. And just like that, as if to underscore my first point, I found this right before hitting the publish button on the blog post:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

I added a couple more OWASP rules to the exception in Cloudflare (things like a MySQL rule that was adding 5 points), and we were back in business.

Secondly, I look at the managed WAF Cloudflare provides more favourably than I did before simply because I have a better understanding of how comprehensive it is. I want to write code and run apps on the web, that's my focus, and I want someone else to provide that additional layer on top that continuously adapts to block new and emerging threats. I want to understand it (and I now do, at least certainly better than before), but I don't want managing it day in and day out to be my job.

And finally, IMHO, Stripe needs a better mechanism to report on webhook failures:

In live mode you are notified after 3 days of trying. You can also query the events (https://t.co/0mujOPssV0) to create a running list of statuses on web hooks that have been sent and alert on that via your own app.

β€” Blake Krone (@blakekrone) January 19, 2023

Waiting until stuff breaks really isn't ideal and whilst I'm sure you could plug into the (very extensive) API ecosystem Stripe has, this feels like an easy feature for them to build in. So, Stripe friends, when you read this that's a big "yes" vote from me for some form of anomalous webhook response alerting.

This experience was equal parts frustration and fun and whilst the former is probably obvious, the latter is simply due to having an opportunity to learn something new that's a pretty important part of the service I run. May my frustrated fun story here make your life easier in the future if you face the same problems 😊

Sending Spammers to Password Purgatory with Microsoft Power Automate and Cloudflare Workers KV

By Troy Hunt
Sending Spammers to Password Purgatory with Microsoft Power Automate and Cloudflare Workers KV

How best to punish spammers? I give this topic a lot of thought because I spend a lot of time sifting through the endless rubbish they send me. And that's when it dawned on me: the punishment should fit the crime - robbing me of my time - which means that I, in turn, need to rob them of their time. With the smallest possible overhead on my time, of course. So, earlier this year I created Password Purgatory with the singular goal of putting spammers through the hellscape that is attempting to satisfy really nasty password complexity criteria. And I mean really nasty criteria, like much worse than you've ever seen before. I opened-sourced it, took a bunch of PRs, built out the API to present increasingly inane password complexity criteria then left it at that. Until now because finally, it's live, working and devilishly beautiful 😈

Step 1: Receive Spam

This is the easy bit - I didn't have to do anything for this step! But let me put it into context and give you a real world sample:

Sending Spammers to Password Purgatory with Microsoft Power Automate and Cloudflare Workers KV

Ugh. Nasty stuff, off to hell for them it is, and it all begins with filing the spam into a special folder called "Send Spammer to Password Purgatory":

Sending Spammers to Password Purgatory with Microsoft Power Automate and Cloudflare Workers KV

That's the extent of work involved on a spam-by-spam basis, but let's peel back the covers and look at what happens next.

Step 2: Trigger a Microsoft Power Automate Flow

Microsoft Power Automate (previously "Microsoft Flow") is a really neat way of triggering a series of actions based on an event, and there's a whole lot of connectors built in to make life super easy. Easy on us as the devs, that is, less easy on the spammers because here's what happens as soon as I file an email in the aforementioned folder:

Sending Spammers to Password Purgatory with Microsoft Power Automate and Cloudflare Workers KV

Using the built in connector to my Microsoft 365 email account, the presence of a new email in that folder triggers a brand new instance of a flow. Following, I've added the "HTTP" connector which enables me to make an outbound request:

Sending Spammers to Password Purgatory with Microsoft Power Automate and Cloudflare Workers KV

All this request does is makes a POST to an API on Password Purgatory called "create-hell". It passes an API key because I don't want just anyone making these requests as it will create data that will persist at Cloudflare. Speaking of which, let's look at what happens over there.

Step 3: Call a Cloudflare Worker and Create a Record in KV

Let's start with some history: Back in the not too distant past, Cloudflare wasn't a host and instead would just reverse proxy requests through to origin services and do cool stuff with them along the way. This made adding HTTPS to any website easy (and free), added heaps of really neat WAF functionality and empowered us to do cool things with caching. But this was all in-transit coolness whilst the app logic, data and vast bulk of the codebase sat at that origin site. Cloudflare Workers started to change that and suddenly we had code on the edge running in hundreds of nodes around the world, nice and close to our visitors. Did that start to make Cloudflare a "host"? Hmm... but the data itself was still on the origin service (transient caching aside). Fast forward to now and there are multiple options to store data on Cloudflare's edges including their (presently beta) R2 service, Durable Objects, the (forthcoming) D1 SQL database and of most importance to this blogpost, Workers KV. Does this make them a host if you can now build entire apps within their environment? Maybe so, but let's skip the titles for now and focus on the code.

All the code I'm going to refer to here is open source and available in the public Password Purgatory Logger Github repo. Very early on in the index.js file that does all the work, you'll see a function called "createHell" which is called when the flow step above runs. That code creates a GUID then stores it in KV after which I can easily view it in the Cloudflare dashboard:

Sending Spammers to Password Purgatory with Microsoft Power Automate and Cloudflare Workers KV

There's no value yet, just a key and it's returned via a JSON response in a property called "kvKey". To read that back in the flow, I need a "Parse JSON" step with a schema I generated from a sample:

Sending Spammers to Password Purgatory with Microsoft Power Automate and Cloudflare Workers KV

At this point I now have a unique ID in persistent storage and it's available in the flow, which means it's time to send the spammer an email.

Step 4: Invite the Spammer to Hell

Because it would be rude not to respond, I'd like to send the spammer back an email and invite them to my very special registration form. To do this, I've grabbed the "Reply to email" connector and fed the kvKey through to a hyperlink:

Sending Spammers to Password Purgatory with Microsoft Power Automate and Cloudflare Workers KV

It's an HTML email with the key hidden within the hyperlink tag so it doesn't look overtly weird. Using this connector means that when the email sends, it looks precisely like I've lovingly crafted it myself:

Sending Spammers to Password Purgatory with Microsoft Power Automate and Cloudflare Workers KV

With the entire flow now executed, we can view the history of each step and see how the data moves between them:

Sending Spammers to Password Purgatory with Microsoft Power Automate and Cloudflare Workers KV

Now, we play the waiting game 😊

Step 5: Log Spammer Pain

Wasting spammer time in and of itself is good. Causing them pain by having them attempt to pass increasingly obtuse password complexity criteria is better. But the best thing - the pièce de résistance - is to log that pain and share it publicly for our collective entertainment 🀣

So, by following the link the spammer ends up here (you're welcome to follow that link and have a play with it):

Sending Spammers to Password Purgatory with Microsoft Power Automate and Cloudflare Workers KV

The kvKey is passed via the query string and the page invites the spammer to begin the process of becoming a partner. All they need to leave is an email address... and a password. That page then embeds 2 scripts from the Password Purgatory website, both of which you can find in the open source and public Github repository I created in the original blog post. Each attempt at creating an account sends off the password only to the original Password Purgatory API I created months ago, after which it responds with the next set of criteria. But each attempt also sends off both the criteria that was presented (none on the first go, then something increasingly bizarre on each subsequent go), the password they tried to use to satisfy the criteria and the kvKey so it can all be tied together. What that means is that the Cloudflare Workers KV entry created earlier gradually builds up as follows:

Sending Spammers to Password Purgatory with Microsoft Power Automate and Cloudflare Workers KV

There are a couple of little conditions built into the code:

  1. If a kvKey is passed in the log request that doesn't actually exist on Cloudflare, HTTP 404 is returned. This is to ensure randos out there don't attempt to submit junk logs into KV.
  2. Once the first password is logged, there's a 15 minute window within which any further passwords can be logged. The reason is twofold: firstly, I don't want to share the spammers attempts publicly until I'm confident no more passwords can be logged just in case they add PII or something else inappropriate. Secondly, once they know the value of the kvKey a non-spammer could start submitting logs (for example, when I tweet it later on or share it via this blog post).

That's everything needed to lure the spammer in and record their pain, now for the really fun bit 😊

Step 6: Enjoy Revelling in Spammer Pain

The very first time the spammer's password attempt is logged, the Cloudflare Worker sends me an email to let me know I have a new spammer hooked (this capability using MailChannels only launched this year):

Sending Spammers to Password Purgatory with Microsoft Power Automate and Cloudflare Workers KV

It was so exciting getting this email yesterday, I swear it's the same sensation as literally getting a fish on your line! That link is one I can share to put the spammer's pain on display for the world to see. This is achieved with another Cloudflare Workers route that simply pulls out the logs for the given kvKey and formats it neatly in an HTML response:

Sending Spammers to Password Purgatory with Microsoft Power Automate and Cloudflare Workers KV

Ah, satisfaction 😊 I listed the amount of time the spammer burned with a goal to further refining the complexity criteria in the future to attempt to keep them "hooked" for longer. Is the requirement for a US post code in the password a bit too geographically specific, for example? Time will tell and I wholeheartedly welcome PRs to that effect in the original Password Purgatory API repo.

Oh - and just to ensure traction and exposure are maximised, there's a neatly formatted Twitter card that includes the last criteria and password used, you know, the ones that finally broke the spammer's spirit and caused them to give up:

Spammer burned a total of 80 seconds in Password Purgatory 😈 #PasswordPurgatory https://t.co/VwSCHNZ2AW

β€” Troy Hunt (@troyhunt) August 3, 2022

Summary

Clearly, I've taken a great deal of pleasure in messing with spammers and I hope you do too. I've gotta be honest - I've never been so excited to go through my junk mail! But I also thoroughly enjoyed putting this together with Power Automate and Workers KV, I think it's super cool that you can pull an app together like this with a combination of browser-based config plus code and storage that runs directly in hundreds of globally distributed edge nodes around the world. I hope the spammers appreciate just how elegant this all is 🀣

❌