There are new available articles, click to refresh the page.

Before yesterdayTroy Hunt

Inside the Massive Alleged AT&T Data Breach

By Troy Hunt

I hate having to use that word - "alleged" - because it's so inconclusive and I know it will leave people with many unanswered questions. (Edit: 12 days after publishing this blog post, it looks like the "alleged" caveat can be dropped, see the addition at the end of the post for more.) But sometimes, "alleged" is just where we need to begin and over the course of time, proper attribution is made and the dots are joined. We're here at "alleged" for two very simple reasons: one is that AT&T is saying "the data didn't come from us", and the other is that I have no way of proving otherwise. But I have proven, with sufficient confidence, that the data is real and the impact is significant. Let me explain:

Firstly, just as a primer if you're new to this story, read BleepingComputer's piece on the incident. What it boils down to is in August 2021, someone with a proven history of breaching large organisations posted what they claimed were 70 million AT&T records to a popular hacking forum and asked for a very large amount of money should anyone wish to purchase the data. From that story:

From the samples shared by the threat actor, the database contains customers' names, addresses, phone numbers, Social Security numbers, and date of birth.

Fast forward two and a half years and the successor to this forum saw a post this week alleging to contain the entire corpus of data. Except that rather than put it up for sale, someone has decided to just dump it all publicly and make it easily accessible to the masses. This isn't unusual: "fresh" data has much greater commercial value and is often tightly held for a long period before being released into the public domain. The Dropbox and LinkedIn breaches, for example, occurred in 2012 before being broadly distributed in 2016 and just like those incidents, the alleged AT&T data is now in very broad circulation. It is undoubtedly in the hands of thousands of internet randos.

AT&T's position on this is pretty simple:

AT&T continues to tell BleepingComputer today that they still see no evidence of a breach in their systems and still believe that this data did not originate from them.

The old adage of "absence of evidence is not evidence of absence" comes to mind (just because they can't find evidence of it doesn't mean it didn't happen), but as I said earlier on, I (and others) have so far been unable to prove otherwise. So, let's focus on what we can prove, starting with the accuracy of the data.

The linked article talks about the author verifying the data with various people he knows, as well as other well-known infosec identities verifying its accuracy. For my part, I've got 4.8M Have I Been Pwned (HIBP) subscribers I can lean on to assist with verification, and it turns out that 153k of them are in this data set. What I'll typically do in a scenario like this is reach out to the 30 newest subscribers (people who will hopefully recall the nature of HIBP from their recent memory), and ask them if they're willing to assist. I linked to the story from the beginning of this blog post and got a handful of willing respondents for whom I sent their data and asked two simple questions:

Does this data look accurate?
Are you an AT&T customer and if not, are you a customer of another US telco?

The first reply I received was simple, but emphatic:

This individual had their name, phone number, home address and most importantly, their social security number exposed. Per the linked story, social security numbers and dates of birth exist on most rows of the data in encrypted format, but two supplemental files expose these in plain text. Taken at face value, it looks like whoever snagged this data also obtained the private encryption key and simply decrypted the vast bulk (but not all of) the protected values.

The above example simply didn't have plain text entries for the encrypted data. Just by way of raw numbers, the file that aligns with the "70M" headline actually has 73,481,539 lines with 49,102,176 unique email addresses. The file with decrypted SSNs has 43,989,217 lines and the decrypted dates of birth file only has 43,524 rows. (Edit: the reason for this later became clear - there is only one entry per date of birth which is then referenced from multiple records.) The last file, for example, has rows that look just like this:

.encrypted_value='*0g91F1wJvGV03zUGm6mBWSg==' .decrypted_value='1996-07-18'

That encrypted value is precisely what appears in the large file hence providing an easy way of matching all the data together. But those numbers also obviously mean that not every impacted individual had their SSN exposed, and most individuals didn't have their date of birth leaked. (Edit: per above, the same entries in the DoB file are referenced by multiple source records so whilst not every record had a DoB recorded, the difference isn't as stark as I originally reported.)

As I'm fond of saying, there's only one thing worse than your data appearing on the dark web: it's appearing on the clear web. And that's precisely where it is; the forum this was posted to isn't within the shady underbelly of a Tor hidden service, it's out there in plain sight on a public forum easily accessed by a normal web browser. And the data is real.

That last response is where most people impacted by this will now find themselves - "what do I do?" Usually I'd tell them to get in touch with the impacted organisation and request a copy of their data from the breach, but if AT&T's position is that it didn't come from them then they may not be much help. (Although if you are a current or previous customer, you can certainly request a copy of your personal information regardless of this incident.) I've personally also used identity theft protection services since as far back as the 90's now, simply to know when actions such as credit enquiries appear against my name. In the US, this is what services like Aura do and it's become common practice for breached organisations to provide identity protection subscriptions to impacted customers (full disclosure: Aura is a previous sponsor of this blog, although we have no ongoing or upcoming commercial relationship).

What I can't do is send you your breached data, or an indication of what fields you had exposed. Whilst I did this in that handful of aforementioned cases as part of the breach verification process, this is something that happens entirely manually and is infeasible en mass. HIBP only ever stores email addresses and never the additional fields of personal information that appear in data breaches. In case you're wondering why that is, we got a solid reminder only a couple of months ago when a service making this sort of data available to the masses had an incident that exposed tens of billions of rows of personal information. That's just an unacceptable risk for which the old adage of "you cannot lose what you do not have" provides the best possible fix.

As I said in the intro, this is not the conclusive end I wanted for this blog post... yet. As impacted HIBP subscribers receive their notifications and particularly as those monitoring domains learn of the aliases in the breach (many domain owners use unique aliases per service they sign up to), we may see a more conclusive outcome to this incident. That may not necessarily be confirmation that the data did indeed originate from AT&T, it could be that it came from a third party processor they use or from another entity altogether that's entirely unrelated. The truth is somewhere there in the data, I'll add any relevant updates to this blog post if and when it comes out.

As of now, all 49M impacted email addresses are searchable within HIBP.

Edit (31 March): AT&T have just released a short statement making 2 important points:

AT&T data-specific fields were contained in a data set

it is not yet known whether the data in those fields originated from AT&T or one of its vendors

They've also been mass-resetting account passcodes after TechCrunch apparently alerted AT&T to the presence of these in the data set. That article also includes the following statement from AT&T:

Based on our preliminary analysis, the data set appears to be from 2019 or earlier, impacting approximately 7.6 million current AT&T account holders and approximately 65.4 million former account holders

Between originally publishing this blog post and AT&T's announcements today, there have been dozens of comments left below that attribute the source of the breach to AT&T in ways that made it increasingly unlikely that the data could have been sourced from anywhere else. I know that many journos (and myself) reached out to folks in AT&T to draw their attention to this, I'm happy to now end this blog post by quoting myself from the opening para 😊

But sometimes, "alleged" is just where we need to begin and over the course of time, proper attribution is made and the dots are joined.

Related tags
- ❌
- Have
- I
- Been
- Pwned
- Security
March 19^th 2024 at 06:39

Troy Hunt
How Spoutible’s Leaky API Spurted out a Deluge of Personal Data
February 5^th 2024 at 07:36

How Spoutible’s Leaky API Spurted out a Deluge of Personal Data

By Troy Hunt

Ever hear one of those stories where as it unravels, you lean in ever closer and mutter “No way! No way! NO WAY!” This one, as far as infosec stories go, had me leaning and muttering like never before. Here goes:

Last week, someone reached out to me with what they claimed was a Spoutible data breach obtained by exploiting an enumerable API. Just your classic case of putting someone else's username in the URL and getting back data about them, which at first glance I assumed was another scraping situation like we recently saw with Trello. They sent me a file with 207k scraped records and a URL that looked like this:

https://spoutible.com/sptbl_system_api/main/user_profile_box?username=troyhunt

But they didn't send me my account, in fact I didn't even have an account at the time and if I'm honest, I had to go and look up exactly what Spoutible was. The penny dropped as I read into it: Spoutible emerged in the wake of Elon taking over Twitter, which left a bunch of folks unhappy with their new social overlord so they sought out alternate platforms. Mastodon and Bluesky were popular options, Spoutible was another which was clearly intended to be an alternative to the incumbent.

In order to unravel this saga in increasing increments of "no way!" reactions, let's just start with the basics of what that API endpoint was returning:

{
  err_code: 0,
  status: 200,
  user: {
    id: 735525,
    username: "troyhunt",
    fname: "Troy",
    lname: "Hunt",
    about: "Creator of Have I Been Pwned. Microsoft Regional Director. Pluralsight author. Online security, technology and “The Cloud”. Australian.",

Pretty standard stuff and I'd expect any of the major social platforms to do exactly the same thing. Name, username, bio and ID are all the sorts of data attributes you'd expect to find publicly available via an API or rendered into the HTML of the website. These fields, however, are quite different:

email: "[redacted]",
ip_address: "[redacted]",
verified_phone: "[redacted]",
gender: "M",

Ok, that's now a "no way!" because I had no expectation at all of any of that data being publicly available (note: phone number is optional, I chose to add mine). It's certainly not indicated on the pages where I entered it:

But it's also not that different to previous scraping incidents; the aforementioned Trello scrape exposed the association of email addresses to usernames and the Facebook scrape of a few years ago did the same thing with phone numbers. That's not unprecedented, but this is:

password: "$2y$10$B0EhY/bQsa5zUYXQ6J.NkunGvUfYeVOH8JM1nZwHyLPBagbVzpEM2",

No way! Is it... real? Is that genuinely a bcrypt hash of my own password? Yep, that's exactly what it is:

The Spoutible API enabled any user to retrieve the bcrypt hash of any other user's password.

I had to check, double check then triple check to make sure this was the case because I can only think of one other time I've ever seen an API do this...

During my 14 years at Pfizer, I once reviewed an iOS app built for us by a low-cost off-shored development shop. I proxied the app through Fiddler, watched the requests and found an API that was returning every user record in the system and for each user, their corresponding password in plain text. When quizzing the developers about this design decision, their response was - and I kid you not, this isn't made up - "don't worry, our users don't use Fiddler" 🤦‍♂️

</TangentialStory>

I cannot think of any reason ever to return any user's hashed password to any interface, including an appropriately auth'd one where only the user themselves would receive it. There is never a good reason to do this. And even though bcrypt is the accepted algorithm of choice for storing passwords these days, it's far from uncrackable as I showed 7 years ago now after the Cloudpets breach. Here I used a small dictionary of weak, predictable passwords and easily cracked a bunch of the hashes. Weak passwords like... "spoutible". Wondering just how crazy things would get, I checked the change password page and found I could easily create a password of 6 or more characters (so long as it didn't exceed 20 characters) with no checks on strength whatsoever:

Strong hashing algorithms like bcrypt are weakened when poor password choices are allowed and strong password choices (such as having more than 20 characters in it), are blocked. For exactly the same reason breached services advise customers to change their passwords even when hashed with a strong algorithm, all Spoutible users are now in the same boat - change you password!

But fortunately these days many people make use of 2 factor authentication to protect against account takeover attacks where the adversary knows the password. Which brings us to the next piece of data the API returned:

2fa_secret: "7GIVXLSNKM47AM4R",
2fa_enabled_at: "2024-02-03 02:26:11",
2fa_backup_code: "$2y$10$6vQRDRDHVjyZdndGUEKLM.gmIIZVDq.E5NWTWti18.nZNQcqsEYki",

Oh wow! Why?! Let's break this down and explore both the first and last line. The 2FA secret is the seed that's used to generate the one time password to be used as the second factor. If you - as an attacker - know this value then 2FA is rendered useless. To test that this was what it looked like, I asked Stefán to retrieve my data from the public API, take the 2FA secret and send me the OTP:

It was a match. If Stefán could have cracked my bcrypted password hash (and he's a smart guy so "spoutible" would have definitely been in his word list), he could have then passed the second factor challenge. And the 2FA backup code? Thinking that would also be exactly what it looked like, I'd screen grabbed it when enabling 2FA:

Now, using the same bcrypt hash checker as I did for the password, here's what I found:

What I just don't get is if you're going to return the 2FA secret anyway, why bother bcrypting the backup code? And further, it's only a 6 digit number, do you know how long it takes to crack a bcrypted 6 digit number? Let's find out:

570075, 2m59s
— Martin Sundhaug (@sundhaug92@mastodon.social) (@sundhaug92) February 4, 2024

Many other people worked it out in single-digit minutes as well, but Martin did it fastest at the time of writing so he gets the shout-out 😊

You know how I said you'd keep leaning in further and further? Yeah, we're not done yet because then I found this:

em_code: "c62fcf3563dc3ab38d52ba9ddb37f9b1577d1986"

Maybe I've just seen too many data breaches before, but as vague as this looks I had a really good immediate hunch of what it was but just to be sure, I logged out and went to the password reset page:

Leaning in far enough now, anticipating what's going to happen next? Yep, it's exactly what you thought:

NO WAY! Exposed password reset tokens meant that anyone could immediately takeover anyone else's account 🤯

After changing the password, no notification email was sent to the account holder so just to make things even worse, if someone's account was taken over using this technique they'd have absolutely no idea until they either realised their original password no longer worked or their account started spouting weird messages. There's also no way to see if there are other active sessions, for example the way Twitter shows them:

Further, changing the password doesn't invalidate existing sessions so as best as I can tell, if someone has successfully accessed someone else's Spoutible account there's no way to know and no way to boot them out again. That's going to make recovering from this problematic unless Spoutible has another mechanism to invalidate all active sessions.

The one saving grace is that the token was rotated after reset so you can't use the one in the image above, but of course the new one was now publicly exposed in the API! And there's no 2FA challenge on password reset either but of course even if there was, well, you already read this far so you know how that could have been easily circumvented.

There's just one more "oh wow!" remaining, and it's the ease with which the vulnerable API was found. Spoutible has a feature called Pods and when you browse to that page, people listening to the pod are displayed with the ability to hover over their profile and display further information. For example, here's Rosetta and if we watch the request that's made in the dev tools...

By design, all the personal information including email and IP address, phone number, gender, bcrypt hashed password, 2FA secret and backup code and the code that can be immediately used to reset the password is returned to every single person that uses this feature. How many times has this API spouted troves of personal data out to people without them even knowing? Who knows, but I do know it wasn't the only API doing that because the one that listed the pods also did it:

Because the vulnerable APIs was requested organically as a natural part of using the service as it was intended, Spoutible almost certainly won't be able to fully identify abuse of it. To use the definition of the infamous Missouri governor who recently attempt to prosecute a journalist for pressing F12, everyone who used those features inadvertently became a hacker.

Just one last finding and I've not been able to personally validate it so let's keep it out of "oh wow!" scope: the individual that sent me the data and details of the vulnerability said that the exposed data includes access tokens for other platforms. A couple of months ago, Spoutible announced cross-posting to Mastodon and Bluesky and my own data does have a "cross_posting_auth" node, albeit set to null. I couldn't see anywhere within the UI to enable this feature, but there are profiles with values in there. During the disclosure process (more on that soon), Spoutible did say that those value were encrypted and without evidence of a private key compromise, they believe they're safe.

Here's my full record as it was originally returned by the vulnerable API:

To be as charitable as possible to Spoutible, you could argue that this is largely just the one vulnerability that is the inadvertent exposure of internal data via a public API. This is data that has a legitimate purpose in their system and it may simply be a case of a framework automatically picking all entity attributes up from the data tier and returning them via the UI. But it's the circumstances that allowed this to happen and then exacerbated the problem when it did that concern me more; clearly there's been no security review around this feature because it was so easily discoverable (at least there certainly wasn't review whilst it was live), nor has been any thought put in to notifying people of potential account takeovers or providing them with the means to invalidate other sessions. Then there are periphery issues such as very weak password rules that make cracking bcrypt so much easier, weak 2FA backup codes and pointless bcrypting of them. Not major issues in and of themselves, but they amplify the problems the exposed data presents.

Clearly this required disclosure before publication, unfortunately Spoutible does not publish a security.txt file so I went directly to the founder Christopher Bouzy on both Twitter and email (obviously I could have reached out on Spoutible, but he's very active on Twitter and my profile has more credibility there than a brand new Spoutible account). Here's the timeline, all AEST:

4 Feb, 15:30: Initial outreach asking for security contact
4 Feb, 17:27: Response from Spoutible
4 Feb, 18:31: Full details provided to Spoutible
4 Feb, 19:48 (or earlier): API is fixed
5 Feb 01:28 (or earlier): Announcement made about the incident
5 Feb 07:52: Spoutible confirmed all em_code values have been rotated

To give credit where it's due, Spoutible's response time was excellent. In the space of only about 4 hours, the data returned by the API had a huge number of attributes trimmed off it and now aligns with what I'd expect to see (although the 207k previously scraped records obviously still contain all the data). I'll also add that Christopher's communication with me commendable; he's clearly genuinely passionate about the platform and was dismayed to learn of the vulnerability. I've dealt with many founders of projects in the past that had suffered data breaches and it's especially personal for them, having poured so much of themselves into it.

Here's their disclosure in its entirety:

The revised API is now returning over 80% less data and looks like this:

If you're a detail person, yes, the forward slashes are no longer escaped and the remaining fields are ordered slightly differently so it looks like the JSON encoder has changed. In case you're interested, here's a link to a diff between the two with a little bit of manipulation to make it easier to see precisely what's changed.

As to my own advice to Spoutible users, here are the actions I'd recommend:

Change your Spoutible password and change any other account you reused that password on
If you had 2FA turned on for Spoutible, turn it off then back on again so that it generates a different secret
If you enabled cross-posting to Mastodon or Bluesky, out of an abundance of caution you should invalidate the keys on those platforms
Recognise that your email address, IP address, phone number if you added it and any intentionally publicly visible data associated to your profile may have been exposed

The 207k exposed email addresses that were sent to me are now searchable in Have I Been Pwned and my impacted subscribers have received email notifications.

Related tags
- ❌
- Security
February 5^th 2024 at 07:36

Troy Hunt
Safe, Secure, Anonymous, and Other Misleading Claims
October 4^th 2023 at 08:44

Safe, Secure, Anonymous, and Other Misleading Claims

By Troy Hunt

Imagine you wanted to buy some shit on the internet. Not the metaphorical kind in terms of "I bought some random shit online", but literal shit. Turds. Faeces. The kind of thing you never would have thought possible to buy online until... Shitexpress came along. Here's a service that enables you to send an actual piece of smelly shit to "An irritating colleague. School teacher. Your ex-wife. Filthy boss. Jealous neighbour. That successful former classmate. Or all those pesky haters." But it would be weird if the intended recipient of the aforementioned shit knew it came from you, so, Shitexpress makes a bold commitment:

100% anonymous! Not 90%, not 95% but the full whack 100%! And perhaps they really did deliver on that promise, at least until one day last year:

New sensitive breach: Faeces delivery service Shitexpress had 24k email addresses breached last week. Data also included IP and physical addresses, names, and messages accompanying the posted shit. 76% were already in @haveibeenpwned. Read more: https://t.co/7R7vdi1ftZ
— Have I Been Pwned (@haveibeenpwned) August 16, 2022

When you think about it now, the simple mechanics of purchasing either metaphorical or literal shit online dictates collecting information that, if disclosed, leaves you anything but anonymous. At the very least, you're probably going to provide your own email address, your IP will be logged somewhere and payment info will be provided that links back to you (Bitcoin was one of many payment options and is still frequently traceable to an identity). Then of course if it's a physical good, there's a delivery address although in the case above, that's inevitably not going to be the address of the purchaser (sending yourself shit would also just be weird). Which is why following the Shitexpress data breach, we can now easily piece together information such as this:

Here we have an individual who one day last year, went on an absolute (literal) shit-posting bender posting off half a dozen boxes of excrement to heavy hitters in the US justice system. For 42 minutes, this bright soul (whose IP address was logged with each transaction), sent abusive messages from their iPhone (the user agent is also in the logs) to some of the most powerful people in the land. Did they only do this on the assumption of being "100% anonymous"? Possibly, it certainly doesn't seem like the sort of activity you'd want to put your actual identity to but hey, here we are. Who knows if there were any precautions taken by this individual to use an IP that wasn't easily traceable back to them, but that's not really the point; an attribute that will very likely be tied back to a specific individual if required was captured, stored and then leaked. IP not enough to identify someone? Hmmm... I wonder what other information might be captured during a purchase...

Uh, yeah, that's all pretty personally identifiable! And there are nearly 10k records in the "invoices_stripe.csv" file that include invoice IDs so if you paid by credit card, good luck not having that traced back to you (KYC obligations ain't real compatible with anonymously posting shit).

Now, where have we heard all this before? The promise of anonymity and data protection? Hmmm...

"Anonymous". "Discreet". That was July 2015, and we all know what happened next. It wasn't just the 30M+ members of the adultery website that were exposed in the breach, it was also the troves of folks who joined the service, thought better of it, paid to have their data deleted and then realised the "full delete" service, well, didn't. Why did they think their data would actually be deleted? Because the website told them it would be.

Vastaamo, the Finnish service referred to "the McDonalds of psychotherapy" was very clear around the privacy of the data they collected:

Until a few years ago when the worst conceivable scenario was realised:

A security flaw in the company’s IT systems had exposed its entire patient database to the open internet—not just email addresses and social security numbers, but the actual written notes that therapists had taken.

What made the Vastaamo incident particularly insidious was that after failing to extract the ransom demand from the company itself, the perpetrator (for whom things haven't worked out so well this year), then proceeded to ransom the individuals:

If we do not receive this payment within 24 hours, you still have another 48 hours to acquire and send us 500 euros worth of Bitcoins. If we still don't receive our money after this, your information will be published: your address, phone number, social security number, and your exact patient report, which includes e.g. transcriptions of your conversations with the Receptionist's therapist/psychiatrist.

And then it was all dumped publicly anyway.

Here's what I'm getting at with all this:

Assurances of safety, security and anonymity aren't statements of fact, they're objectives, and they may not be achieved

I've written this post as I have so many others so that it may serve as a reference in the future. Time and time again, I see the same promises as above as though somehow words on a webpage are sufficient to ensure data security. You can trust those words just about as much as you can trust the promise of being able to choose the animal the excrement is sourced from, which turns out to be total horseshit 🐎

Related tags
October 4^th 2023 at 08:44

Troy Hunt
Join my Twitter Subscription for the Inside Word on Data Breaches
April 19^th 2023 at 09:02

Join my Twitter Subscription for the Inside Word on Data Breaches

By Troy Hunt

I want to try something new here - bear with me here:

Data breach processing is hard and the hardest part of all is getting in touch with organisations and disclosing the incident before I load anything into Have I Been Pwned (HIBP). It's also something I do almost entirely in isolation, sitting here on my own trying to put the pieces together to work out what happened. I don't want to just chuck data into HIBP and the first an organisation knows about it is angry customers smashing out their inbox, there's got to be a reasonable attempt from my side to get in touch, disclose and then coordinate on communication to impacted parties and the public at large. Very frequently, I end up reaching out publicly and asking for a security contact at the impacted company. I dislike doing this because it's a very public broadcast that regular followers easily read between the lines of and draw precisely the correct conclusion before the organisation has had a chance to respond. And the vast majority of the time, nobody has a contact anyway but a small handful of people trawl through the site and find obscure email addresses or look up employees on LinkedIn or similar. There has to be a better way.

Yesterday, I posted this tweet:

After I shared this, multiple people said "ah, but at least we have GDPR", as though that somehow fixes the problem. No, it doesn't, at least not in any absolute sense. Case in point: I'm now going through the disclosure process after someone sent me data from a company HQ'd well… https://t.co/yMYIlFXkCU
— Troy Hunt (@troyhunt) April 18, 2023

And around the same time I got to thinking about Twitter Subscriptions as a channel for communication with a much more carefully curated subset of the 214k people that follow my public feed. Tweets within a subscription are visible only to subscribers so the public broadcast problem goes away. (Of course, you'd always work on the assumption that a subscriber could take a tweet and share it more broadly, but the intention is to make content visible to a much smaller, more dedicated audience.) Issues around where to find contact details, verification of the breach, what's in it or all sorts of other discussions I'd rather not have with the masses prior to loading into HIBP can be had with a much more curated audience.

I don't know how well this will work and it's something I've come up with on a whim (hey, I'm nothing if not honest about it!) But that's also how HIBP started and sometimes the best ideas just emerge out of gut feel. So, I set up the subscription and of the 3 pricing options Twitter suggested ($3, $5 or $10 per month), I went middle of the road and made it 5 bucks (that's American bucks, YMMV). You can sign up directly from the big "Subscribe" button on my Twitter profile or follow the link behind this text. Just one suggestion from Twitter's "welcome on board" email if you do:

Encourage your followers to Subscribe on the web. Web Subscriptions go through Stripe, which takes a 3% fee from each purchase, compared to the 30% fee that Apple and Google currently take. Meaning web Subscriptions may potentially lead to more money in your pocket.

My hope is that this subscription helps me have much more candid discussions about data breaches with people that are invested in following them than the masses that see my other tweets. I also hope it helps me go through this process feeling a little less isolated from the world and with the support of some of the great people I regularly engage with more publicly. If that's you, then give it a go and if it isn't floating your boat, cancel the subscription. I think there's something in this and I'd appreciate all the support I can get to help make it a worthwhile exercise.

Related tags
- ❌
- Security
April 19^th 2023 at 09:02

Troy Hunt
Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰
February 20^th 2023 at 07:47

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

By Troy Hunt

I found myself going down a previously unexplored rabbit hole recently, or more specifically, what I thought was "a" rabbit hole but in actual fact was an ever-expanding series of them that led me to what I refer to in the title of this post as "6 rabbits deep". It's a tale of firewalls, APIs and sifting through layers and layers of different services to sniff out the root cause of something that seemed very benign, but actually turned out to be highly impactful. Let's go find the rabbits!

The Back Story

When you buy an API key on Have I Been Pwned (HIBP), Stripe handles all the payment magic. I love Stripe, it's such an awesome service that abstracts away so much pain and it's dead simple to integrate via their various APIs. It's also dead simple to configure Stripe to send notices back to your own service via webhooks. For example, when an invoice is paid or a customer is updated, Stripe sends information about that event to HIBP and then lists each call on the webhooks dashboard in their portal:

There are a whole range of different events that can be listened to and webhooks fired, here we're seeing just a couple of them that are self explanatory in name. When an invoice is paid, the callback looks something like this:

HIBP has received this call and updated it's own DB such that for a new customer, they can now retrieve an API key or for an existing customer whose subscription has renewed, the API key validity period has been extended. The same callback is also issued when someone upgrades an API key, for example when going from 10RPM (requests per minute) to 50RPM. It's super important that HIBP gets that callback so it can appropriately upgrade the customer's key and they can immediately begin making more requests. When that call doesn't happen, well, let's go down the first rabbit hole.

The Failed API Key Upgrade 🐰

This should never happen:

This came in via HIBP's API key support portal and is pretty self-explanatory. I checked the customer's account on Stripe and it did indeed show an active 50RPM subscription, but when drilling down into the associated payment, I found the following:

Ok, so at least I know where things have started to go wrong, but why? Over to the webhooks dashboard and into the failed payments and things look... suboptimal:

Dammit! Fortunately this is only a small single-digit percentage of all callbacks, but every time this fails it's either stopping someone like the guy above from making the requests they've paid for or potentially, causing someone's API key to expire even though they've paid for it. The latter in particular I was really worried about as it would nuke their key and whatever they'd built on top of it would cease to function. Fortunately, because that's such an impactful action I'd built in heaps of buffer for just such an occurrence and I'd gotten onto this issue quickly, but it was disconcerting all the same.

So, what's happening? Well, the response is HTTP 403 "Forbidden" and the body is clearly a Cloudflare challenge page so something at their end is being triggered. Looks like it's time to go down the next rabbit hole.

Cloudflare's Firewall and Logs 🐰 🐰

Desperate just to quickly restore functionality, I dropped into Cloudflare's WAF and allowed all Stripe's outbound IPs used for webhooks to bypass their security controls:

This wasn't ideal, but it only created risk for requests originating from Stripe and it got things up and running again quickly. With time up my sleeve I could now delve deeper and work out precisely what was going on, starting with the logs. Cloudflare has a really extensive set of APIs that can control a heap of features of the service, including pulling back logs (note: this is a feature of their Enterprise plan). I queried out a slice of the logs corresponding to when some of the 403s from Stripe's dashboard occurred and found 2 entries similar to this one:

{"BotScore":1,"BotScoreSrc":"Verified Bot","CacheCacheStatus":"unknown","ClientASN":16509,"ClientCountry":"us","ClientIP":"54.187.205.235","ClientRequestHost":"haveibeenpwned.com","ClientRequestMethod":"POST","ClientRequestReferer":"","ClientRequestURI":"[redacted]","ClientRequestUserAgent":"Stripe/1.0 (+https://stripe.com/docs/webhooks)","EdgeRateLimitAction":"","EdgeResponseStatus":403,"EdgeStartTimestamp":1674073983931000000,"FirewallMatchesActions":["managedChallenge"],"FirewallMatchesRuleIDs":["6179ae15870a4bb7b2d480d4843b323c"],"FirewallMatchesSources":["firewallManaged"],"OriginResponseStatus":0,"WAFAction":"unknown","WorkerSubrequest":false}

That's one of Stripe's outbound IP's on 54.187.205.235 and the "FirewallMatchesRuleIDs" collection has a value in it. Ergo, something about this request triggered the firewall and caused it to be challenged. I'm sure many of us have gone through the following thought process before:

What did I change?

Did I change anything?

Did they change something?

Except "they" could have been either Cloudflare or Stripe; if it wasn't me (and I was fairly certain it wasn't), was it a Cloudflare change to the rules or a Stripe change to a webhook payload that was now triggering an existing rule? Time to dig deeper again so it's over to the Cloudflare dashboard and down into the WAF events for requests to the webhook callback path:

Yep, something proper broke! Let's drill deeper and look at recent events for that IP:

As you dig deeper through troubleshooting exercises like this, you gradually turn up more and more information that helps piece the entire puzzle together. In this case, it looks like the "Inbound Anomaly Score Exceeded" rule was being triggered. What's that? And why? Time to go down another rabbit hole.

The Cloudflare OWASP Core Ruleset 🐰 🐰 🐰

So, deeper and deeper down the rabbit holes we go, this time into the depths of the requests that triggered the managed rule:

Well that's comprehensive 🙂

There's a lot to unpack here so let's begin with the ruleset that the previously identified "Inbound Anomaly Score Exceeded" rule belongs to, the Cloudflare OWASP Core Ruleset:

The Cloudflare OWASP Core Ruleset is Cloudflare’s implementation of the OWASP ModSecurity Core Rule SetOpen external link (CRS). Cloudflare routinely monitors for updates from OWASP based on the latest version available from the official code repository.

That link is yet another rabbit hole altogether so let me summarise succinctly here: Cloudflare uses OWASP's rules to identify anomalous traffic based on a customer-defined paranoia level (how strict you want to be) and then applies a score threshold (also customer-defined) at which an action will be taken, for example challenging the request. What I learned as this saga progressed is that the "Inbound Anomoly Score Exceeded" rule is actually a rollup of the rules beneath it. The OWASP score of "26" is the sum of the 6 rules listed beneath it and once it exceeds 25, the superset rule is triggered.

Further - and this is the really important bit - Cloudflare routinely updates the rules from OWASP which makes sense because these are ever-evolving in response to new threats. And when did they last upgrade the rules? It looks like they announced it right before I started having issues:

Whilst it's not entirely clear from above when this release was scheduled to occur, I did reach out to Cloudflare support and was advised it had already taken place:

Please note that we did bump the OWASP version, which we are integrating with to 3.3.4 as noted on our scheduled changes.

So maybe it's not Cloudflare's fault or Stripe's fault, but OWASP's fault? In fairness to all, I don't think it's anyone's fault per se and is instead just an unfortunate result of everyone doing their best to keep the bad guys out. Unless... it really is Stripe's fault because there's something in the request payload that was always fishy and is now being caught? But why for only some requests and not others? Next rabbit!

Cloudflare Payload Logging 🐰 🐰 🐰 🐰

Sometimes, people on the internet lose their minds a bit over things they really shouldn't. One of those things, in my experience, is Cloudflare's interception of traffic and it's something I wrote about in detail nearly 7 years ago now in my piece on security absolutism. Cloudflare plays an enormously valuable role in the internet's ecosystem and a substantial part of the value comes from being able to inspect, cache, optimise, and yes, even reject traffic. When you use Cloudflare to protect your website, they're applying rulesets like the aforementioned OWASP ones and in order to do that, they must be able to inspect your traffic! But they don't log it, not all of it, rather just "metadata generated by our products" as they refer to it on their logs page. We saw an example of that earlier on with Stripe's request from their IP showing it triggered a firewall rule, but what we didn't see is the contents of that POST request, the actual payload that triggered the rule. Let's go grab that.

Because the contents of a POST request can contain sensitive information, Cloudflare doesn't log it. Obviously they see it in transit (that's how OWASP's rules can be applied to it), but it's not stored anywhere and even if you want to capture it, they don't want to be able to see it. That's where payload logging (another Enterprise plan feature) comes in and what's really neat about that is every payload must be encrypted with a public key retained by Cloudflare whilst only you retain the private key. The setup looks like this:

Pretty self-explanatory and once done, right under where we previously saw the additional logs we now have the ability to decrypt the payload:

As promised, this requires the private key from earlier:

And now, finally, we have the actual payload that triggered the rule, seen here with my own test data:

[ " },\n \"billing_reason\": \"subscription_update\",\n \"charge\": null,\n \"collection_method\": \"charge_automatically\",\n \"created\": 1674351619,\n \"currency\": \"usd\",\n \"custom_fields\": null,\n \"customer\": \"cus_MkA71FpZ7XXRlt\",\n \"customer_address\": ", " },\n \"customer_email\": \"troy-hunt+1@troyhunt.com\",\n \"customer_name\": \"Troy Hunt 1\",\n \"customer_phone\": null,\n \"customer_shipping\": null,\n \"customer_tax_exempt\": \"none\",\n \"customer_tax_ids\": [\n\n ],\n \"default_payment_method\": null,\n \"default_source\": null,\n \"default_tax_rates\": [\n\n ],\n \"description\": \"You can manage your subscription (i.e. cancel it or regenerate the API key) at any time by verifying your email address here: https://haveibeenpwned.com/API/Key\",\n \"discount\": null,\n \"discounts\": [\n\n ],\n \"due_date\": null,\n \"ending_balance\": -11804,\n \"footer\": null,\n \"from_invoice\": null,\n \"hosted_invoice_url\": \"https://invoice.stripe.com/i/acct_1EdQYpEF14jWlYDw/test_YWNjdF8xRWRRWXBFRjE0aldsWUR3LF9OREo5SlpqUFFvVnFtQnBVcE91YUFXemtkRHFpQWNWLDY0ODkyNDIw02004bEyljdC?s=ap\",\n \"invoice_pdf\": \"https://pay.stripe.com/invoice/acct_1EdQYpEF14jWlYDw/test_YWNjdF8xRWRRWXBFRjE0aldsWUR3LF9OREo5SlpqUFFvVnFtQnBVcE91YUFXemtkRHFpQWNWLDY0ODkyNDIw02004bEyljdC/pdf?s=ap\",\n \"last_finalization_error\": null,\n \"latest_revision\": null,\n \"lines\": ", " ", " ],\n \"discountable\": false,\n \"discounts\": [\n\n ],\n \"invoice_item\": \"ii_1MSsXfEF14jWlYDwB1nfZvFm\",\n \"livemode\": false,\n \"metadata\": ", " },\n \"period\": ", " },\n \"plan\": ", " },\n \"nickname\": null,\n \"product\": \"prod_Mk4eLcJ7JYF02f\",\n \"tiers_mode\": null,\n \"transform_usage\": null,\n \"trial_period_days\": null,\n \"usage_type\": \"licensed\"\n },\n \"price\": ", " },\n \"nickname\": null,\n \"product\": \"prod_Mk4eLcJ7JYF02f\",\n \"recurring\": ", " },\n \"tax_behavior\": \"unspecified\",\n \"tiers_mode\": null,\n \"transform_quantity\": null,\n \"type\": \"recurring\",\n \"unit_amount\": 15000,\n \"unit_amount_decimal\": \"15000\"\n },\n \"proration\": true,\n \"proration_details\": ", " \"il_1MMjfcEF14jWlYDwoe7uhDPF\"\n ]\n }\n },\n \"quantity\": 1,\n \"subscription\": \"sub_1MMjfcEF14jWlYDwi8JWFcxw\",\n \"subscription_item\": \"si_N6xapJ8gSXdp7W\",\n \"tax_amounts\": [\n\n ],\n \"tax_rates\": [\n\n ],\n \"type\": \"invoiceitem\",\n \"unit_amount_excluding_tax\": \"-14304\"\n },\n ", " ],\n \"discountable\": true,\n \"discounts\": [\n\n ],\n \"livemode\": false,\n \"metadata\": ", " },\n \"period\": ", " },\n \"plan\": ", " },\n \"nickname\": null,\n \"product\": \"prod_Mk4lTSl4axd9mt\",\n \"tiers_mode\": null,\n \"transform_usage\": null,\n \"trial_period_days\": null,\n \"usage_type\": \"licensed\"\n },\n \"price\": ", " },\n \"nickname\": null,\n \"product\": \"prod_Mk4lTSl4axd9mt\",\n \"recurring\": ", " },\n \"tax_behavior\": \"unspecified\",\n \"tiers_mode\": null,\n \"transform_quantity\": null,\n \"type\": \"recurring\",\n \"unit_amount\": 2500,\n \"unit_amount_decimal\": \"2500\"\n },\n \"proration\": false,\n \"proration_details\": ", " },\n \"quantity\": 1,\n \"subscription\": \"sub_1MMjfcEF14jWlYDwi8JWFcxw\",\n \"subscription_item\": \"si_NDJ98tQrCcviJf\",\n \"tax_amounts\": [\n\n ],\n \"tax_rates\": [\n\n ],\n \"type\": \"subscription\",\n \"unit_amount_excluding_tax\": \"2500\"\n }\n ],\n \"has_more\": false,\n \"total_count\": 2,\n \"url\": \"/v1/invoices/in_1MSsXfEF14jWlYDwxHKk4ASA/lines\"\n },\n \"livemode\": false,\n \"metadata\": ", " },\n \"next_payment_attempt\": null,\n \"number\": \"04FC1917-0008\",\n \"on_behalf_of\": null,\n \"paid\": true,\n \"paid_out_of_band\": false,\n \"payment_intent\": null,\n \"payment_settings\": ", " },\n \"period_end\": 1674351619,\n \"period_start\": 1674351619,\n \"post_payment_credit_notes_amount\": 0,\n \"pre_payment_credit_notes_amount\": 0,\n \"quote\": null,\n \"receipt_number\": null,\n \"rendering_options\": null,\n \"starting_balance\": 0,\n \"statement_descriptor\": null,\n \"status\": \"paid\",\n \"status_transitions\": ", " },\n \"subscription\": \"sub_1MMjfcEF14jWlYDwi8JWFcxw\",\n \"subtotal\": -11804,\n \"subtotal_excluding_tax\": -11804,\n \"tax\": null,\n \"test_clock\": null,\n \"total\": -11804,\n \"total_discount_amounts\": [\n\n ],\n \"total_excluding_tax\": -11804,\n \"total_tax_amounts\": [\n\n ],\n \"transfer_data\": null,\n \"webhooks_delivered_at\": 1674351619\n }\n },\n \"livemode\": false,\n \"pending_webhooks\": 1,\n \"request\": ", " },\n \"type\": \"invoice.paid\"\n}" ]

But enough of what's present in the payload, it's what's absent that especially struck me. No obvious XSS patterns, nor SQL injection or any other suspicious looking strings. The request looked totally benign, so why did it trigger the rule?

I wanted to compare the payload of a blocked request with a similar request that wasn't blocked, but they're only logged at Cloudflare when they trigger a rule. No problem, it's easy to grab the full request from Stripe's webhook history so I found one that passed and one that failed and diff'd them both:

This clearly isn't the full 200 lines, but it's a very similar story over the remainder of the files; tiny differences largely down to dates, IDs, and of course, the customers themselves. No suspicious patterns, no funky characters, nothing visibly abnormal. It's a bit pointless to even mention it because they're near identical, but the payload on the left is the one that passed the firewall whilst the payload on the right was blocked.

Next rabbit hole!

Cloudflare's Internal Rules Engine 🐰 🐰 🐰 🐰 🐰

Completely running out of ideas and options, focus moved to the folks inside Cloudflare who were already aware there was an issue:

We are actively looking into this and will likely release an update to the Cloudflare OWASP ruleset soon
— Michael Tremante (@MichaelTremante) January 20, 2023

What followed was a period of back and forth initially with Cloudflare, then Stripe as well with everyone trying to nut out exactly where things were going wrong. Essentially, the process went like this:

Is Cloudflare inadvertently blocking the requests?

Is the OWASP ruleset raising false positives?

Is Stripe issuing requests that are deemed to be malicious?

And round and round we went. At one time, Cloudflare identified a change in the OWASP ruleset which appeared to have resulted in their implementation inadvertently triggering the WAF. They rolled it back and... the same thing happened. We deferred back to Stripe on the assumption that something must have changed on their end, but they couldn't identify any change that would have any sort of material impact. We were stumped, but we also had an easy fix just one last rabbit hole away...

Fine Tuning the Cloudflare WAF 🐰 🐰 🐰 🐰 🐰 🐰

The joy of a managed firewall is that someone else takes all the rigmarole of looking after it away. I'm going to talk more about that in the summary shortly but clearly, that also creates risk as you're delegating control of traffic flow to someone else. Fortunately, Cloudflare gives you a load of configurability with their managed rules which makes it easy to add custom exceptions:

This meant I could create a simple exception that was much more intelligent than the previous "just let all outbound Stripe IPs in" by filtering down to the specific path those webhooks were flowing in to:

And finally, because sequence matters, I dragged that rule right up to the top of the pile so it would cause matching inbound requests to skip all the other rules:

And finally, there were no more rabbits 😊

Lessons Learned

I know what you're thinking - "what was the actual root cause?" - and to be honest, I still don't know. I don't know if it was Cloudflare or OWASP or Stripe or if it even impacted other customers of these services and to be honest, yes, that's a little frustrating. But I learned a bunch of stuff and for that alone, this was a worthwhile exercise I took three big lessons away from:

Firstly, understanding the plumbing of how all these bits work together is super important. I was lucky this wasn't a time critical issue and I had the luxury of learning without being under duress; how rules, payload inspection and exception management all work together is really valuable stuff to understand. And just like that, as if to underscore my first point, I found this right before hitting the publish button on the blog post:

I added a couple more OWASP rules to the exception in Cloudflare (things like a MySQL rule that was adding 5 points), and we were back in business.

Secondly, I look at the managed WAF Cloudflare provides more favourably than I did before simply because I have a better understanding of how comprehensive it is. I want to write code and run apps on the web, that's my focus, and I want someone else to provide that additional layer on top that continuously adapts to block new and emerging threats. I want to understand it (and I now do, at least certainly better than before), but I don't want managing it day in and day out to be my job.

And finally, IMHO, Stripe needs a better mechanism to report on webhook failures:

In live mode you are notified after 3 days of trying. You can also query the events (https://t.co/0mujOPssV0) to create a running list of statuses on web hooks that have been sent and alert on that via your own app.
— Blake Krone (@blakekrone) January 19, 2023

Waiting until stuff breaks really isn't ideal and whilst I'm sure you could plug into the (very extensive) API ecosystem Stripe has, this feels like an easy feature for them to build in. So, Stripe friends, when you read this that's a big "yes" vote from me for some form of anomalous webhook response alerting.

This experience was equal parts frustration and fun and whilst the former is probably obvious, the latter is simply due to having an opportunity to learn something new that's a pretty important part of the service I run. May my frustrated fun story here make your life easier in the future if you face the same problems 😊

Related tags
February 20^th 2023 at 07:47

FreshRSS

Inside the Massive Alleged AT&T Data Breach

How Spoutible’s Leaky API Spurted out a Deluge of Personal Data

Safe, Secure, Anonymous, and Other Misleading Claims

Join my Twitter Subscription for the Inside Word on Data Breaches

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

The Back Story

The Failed API Key Upgrade 🐰

Cloudflare's Firewall and Logs 🐰 🐰

The Cloudflare OWASP Core Ruleset 🐰 🐰 🐰

Cloudflare Payload Logging 🐰 🐰 🐰 🐰

Cloudflare's Internal Rules Engine 🐰 🐰 🐰 🐰 🐰

Fine Tuning the Cloudflare WAF 🐰 🐰 🐰 🐰 🐰 🐰

Lessons Learned