FreshRSS

πŸ”’
❌ About FreshRSS
There are new available articles, click to refresh the page.
Before yesterdayTroy Hunt

Join my Twitter Subscription for the Inside Word on Data Breaches

By Troy Hunt
Join my Twitter Subscription for the Inside Word on Data Breaches

I want to try something new here - bear with me here:

Data breach processing is hard and the hardest part of all is getting in touch with organisations and disclosing the incident before I load anything into Have I Been Pwned (HIBP). It's also something I do almost entirely in isolation, sitting here on my own trying to put the pieces together to work out what happened. I don't want to just chuck data into HIBP and the first an organisation knows about it is angry customers smashing out their inbox, there's got to be a reasonable attempt from my side to get in touch, disclose and then coordinate on communication to impacted parties and the public at large. Very frequently, I end up reaching out publicly and asking for a security contact at the impacted company. I dislike doing this because it's a very public broadcast that regular followers easily read between the lines of and draw precisely the correct conclusion before the organisation has had a chance to respond. And the vast majority of the time, nobody has a contact anyway but a small handful of people trawl through the site and find obscure email addresses or look up employees on LinkedIn or similar. There has to be a better way.

Yesterday, I posted this tweet:

After I shared this, multiple people said "ah, but at least we have GDPR", as though that somehow fixes the problem. No, it doesn't, at least not in any absolute sense. Case in point: I'm now going through the disclosure process after someone sent me data from a company HQ'd well… https://t.co/yMYIlFXkCU

β€” Troy Hunt (@troyhunt) April 18, 2023

And around the same time I got to thinking about Twitter Subscriptions as a channel for communication with a much more carefully curated subset of the 214k people that follow my public feed. Tweets within a subscription are visible only to subscribers so the public broadcast problem goes away. (Of course, you'd always work on the assumption that a subscriber could take a tweet and share it more broadly, but the intention is to make content visible to a much smaller, more dedicated audience.) Issues around where to find contact details, verification of the breach, what's in it or all sorts of other discussions I'd rather not have with the masses prior to loading into HIBP can be had with a much more curated audience.

I don't know how well this will work and it's something I've come up with on a whim (hey, I'm nothing if not honest about it!) But that's also how HIBP started and sometimes the best ideas just emerge out of gut feel. So, I set up the subscription and of the 3 pricing options Twitter suggested ($3, $5 or $10 per month), I went middle of the road and made it 5 bucks (that's American bucks, YMMV). You can sign up directly from the big "Subscribe" button on my Twitter profile or follow the link behind this text. Just one suggestion from Twitter's "welcome on board" email if you do:

Encourage your followers to Subscribe on the web. Web Subscriptions go through Stripe, which takes a 3% fee from each purchase, compared to the 30% fee that Apple and Google currently take. Meaning web Subscriptions may potentially lead to more money in your pocket.

My hope is that this subscription helps me have much more candid discussions about data breaches with people that are invested in following them than the masses that see my other tweets. I also hope it helps me go through this process feeling a little less isolated from the world and with the support of some of the great people I regularly engage with more publicly. If that's you, then give it a go and if it isn't floating your boat, cancel the subscription. I think there's something in this and I'd appreciate all the support I can get to help make it a worthwhile exercise.

Seized Genesis Market Data is Now Searchable in Have I Been Pwned, Courtesy of the FBI and "Operation Cookie Monster"

By Troy Hunt
Seized Genesis Market Data is Now Searchable in Have I Been Pwned, Courtesy of the FBI and "Operation Cookie Monster"

A quick summary first before the details: This week, the FBI in cooperation with international law enforcement partners took down a notorious marketplace trading in stolen identity data in an effort they've named "Operation Cookie Monster". They've provided millions of impacted email addresses and passwords to Have I Been Pwned (HIBP) so that victims of the incident can discover if they have been exposed. This breach has been flagged as "sensitive" which means it is not publicly searchable, rather you must demonstrate you control the email address being searched before the results are shown. This can be done via the free notification service on HIBP and involves you entering the email address then clicking on the link sent to your inbox. Specific guidance prepared by the FBI in conjunction with the Dutch police on further steps you can take to protect yourself are detailed at the end of this blog post on the gold background. That's the short version, here's the whole story:


Ever heard that saying about how "data is the new oil"? Or that "data is the currency of the digital economy"? You've probably seen stories and infographics about how much your personal information is worth, both to legitimate organisations and criminal networks. Like any valuable commodity, marketplaces selling data inevitably emerge, some operating as legal businesses and others, well, not so much. In its simplest form, the illegal data marketplace has long involved the exchange of currency for personal records containing attributes such as email addresses, passwords, names, etc. Cybercriminals then use this data for purposes ranging from identity theft to phishing attacks to credential stuffing. So, we (the good guys) adapt and build better defences. We block known breached passwords. We implement two factor authentication. We roll out user behavioural analytics that identifies abnormalities in logins (why is Joe suddenly logging in from the other side of the world with a new machine?) And in turn, the criminals adapt, which brings us to Genesis Market.

Until this week, Genesis had been up and running for 4 years. This is an excellent primer from Catalin Cimpanu, and it describes how in order to circumvent the aforementioned fraud protection measures, cybercriminals are increasingly relying on obtaining more abstract pieces of information from victims in order to gain access to their accounts. Rather than relying on the credentials themselves and then being subject to all the modern fraud detection services mentioned above, criminals instead began to trade in a combination of "fingerprints" and "cookies". The latter will be a familiar term to most people (and was obviously the inspiration for the name behind the FBI's operation), whilst the former refers to observable attributes of the user and their browser. To see a very easy demonstration of what fingerprinting involves, go and check out amiunique.org and hit the "View my browser fingerprint" button. You'll get something similar to this:

Seized Genesis Market Data is Now Searchable in Have I Been Pwned, Courtesy of the FBI and "Operation Cookie Monster"

Among more than 1.6M sampled clients, nobody has the same fingerprint as me. Somehow, using the current version of Chrome on the current version of Windows, I am a unique snowflake. Why I'm so unique is partly explained by my time zone which is shared by less than half a percent of people, but it's when that's combined with the other observable fingerprint attributes that you realise just how special I really am. For example, less than 0.01% of people have a content language request header of "en-US,en,en-AU". Only 0.12% of people share a screen width of 5,120 pixel (I'm using an ultrawide monitor). And so on and so forth. Because they're so unique, fingerprints are increasingly used as a fraud detection method such that if a malicious party attempts to impersonate a legitimate users with otherwise correct attributes (for example, the correct cookies) but the wrong fingerprint, they're rejected. Which is why we now have IMPaaS.

There's an excellent IMPaaS explanation from the Eindhoven University of Technology in the Netherlands via a paper titled Impersonation-as-a-Service: Characterising the Emerging Criminal Infrastructure for User Impersonation at Scale. Released only a year and a half after the emergence of Genesis, the paper explains the mechanics of IMPaaS:

IMPaaS allows attackers to systematically collect and enforce user profiles (consisting of user credentials, cookies, device and behavioural fingerprints, and other metadata) to circumvent risk-based authentication system and effectively bypass multifactor authentication mechanisms

In other words, if you have all the bits of information a website requires to persist authenticated state after the login process has successfully completed (including after any 2FA requirements), you can perform a modern equivalent of session hijacking. Obtaining this level of information is typically done via malicious software running on the victim's machine which can then grab anything useful and send it off to a C2 server where it can then be sold and used to commit fraud (from the IMPaaS paper):

Seized Genesis Market Data is Now Searchable in Have I Been Pwned, Courtesy of the FBI and "Operation Cookie Monster"

Catalin's story from the early days of Genesis showed how buyers could browse through a list of compromised victims and pick their target based on the various services they had authenticated too, along with their operating system and location. Pricing was inevitably based on the value of those services with the examples below going for $41.30 each (and just like a legitimate marketplace, these were marked down prices so a real bargain!)

Seized Genesis Market Data is Now Searchable in Have I Been Pwned, Courtesy of the FBI and "Operation Cookie Monster"

To make things as turn-key as possible for the criminals, buyers would then run a browser extension from Genesis that would reconstruct the required fingerprint based on the information the malware had obtained and grant them access to the victims' accounts (I'm having flashbacks of Firesheep here). It was that simple... until this week. As of now, the following banner greets anyone browsing to the Genesis website:

Seized Genesis Market Data is Now Searchable in Have I Been Pwned, Courtesy of the FBI and "Operation Cookie Monster"

The aptly named "Operation Cookie Monster" is a joint effort between the FBI and a coalition of law enforcement agencies across the globe who have now put an abrupt end to Genesis. I imagine they'll be having some "discussions" with those involved in running the service, but what about the individuals who are the victims? These are the people whose identities have been put up for sale, purchased by other criminals and then abused to their detriment. The FBI approached me and asked if HIBP could be used as a mechanism to help warn victims of their exposure in the same way as we'd previously done with the Emotet malware a couple of years ago. This is well aligned with the mantra of HIBP - to do good and constructive things with data breaches after they occur - and I was happy to provide support.

There are 2 separate things that have now been loaded into HIBP, each disassociated from the other:

  1. Millions of compromised passwords that are now searchable via Pwned Passwords
  2. Millions of email addresses that are now searchable after verifying control of the address using the notification service

The Pwned Passwords API is presently hit more than 4 billion times each month, and the downloadable data set is hit, well, I don't know because anyone can grab it run it offline. The point is that password corpuses loaded into HIBP have huge reach and are used by thousands of different online services to help people make better password choices. You're probably using it without even knowing it when you signup or login to various services but if you want to check it directly, you can browse to the web interface. (If you're worried about the privacy of your password, there's a full explainer on how the service preserves anonymity but I also suggest testing it after you've changed it as a generally good practice.)

The email address search is what HIBP is so well known for and that's obviously what will help you understand if you've been impacted. Per the opening paragraph, this breach is flagged as "sensitive" so you will not get a result when searching directly from the front page or via the API, rather you'll need to use the free notification service. This approach was chosen to avoid the risk of people being further targeted as a result of their inclusion in Genesis. All existing HIBP subscribers have been sent notification emails and between individuals and those monitoring domains, tens of thousands of emails have now been sent out. Whilst the volume of accounts represented is "8M", please note that this is merely an approximation (hence the perfectly round number on HIBP), intended to be an indicative representation of scale as many of the breached accounts didn't include email addresses. This number only represents the number of unique email addresses which showed up in the data set so consider it a subset of a much larger corpus.

Let me add some final context and this is important if you do find yourself in the Genesis data: due to the nature of how the malware collected personal information and the broad range of different services victims may have been using at the time, the exposed data can differ significantly person by person. What's been provided by the FBI is one set of passwords (incidentally, as SHA-1 and NTLM hash pairs fed into the law enforcement ingestion pipeline), one set of email addresses and a list of meta data. Beyond the data already listed here, the meta data includes names, physical addresses, phone numbers and full credit card details among other personal attributes. This does not mean that all impacted individuals had each of those data classes exposed. The hope is that by listing these fields it will help victims understand, for example, why they may have observed fraudulent transactions on their card, and they can then take informed and appropriate steps to better protect themselves.

Lastly, as flagged in the intro, following is the guidance prepared by the FBI and Dutch police on how people can safeguard themselves if they get a hit in the Genesis data or frankly, just want to better protect themselves in future:

The FBI reached out to Have I Been Pwned (HIBP) to continue sharing efforts to help victims determine if they've been victimized. In this instance, the data shared emanates from the Initial Access Broker Marketplace Genesis Market. The FBI has taken action against Genesis Market, and in the process has been able to extract victim information for the purposes of alerting victims.

In all, millions of passwords and email addresses were provided which span a wide range of countries and domains. These emails and passwords were sold on Genesis Market and were used by Genesis Market users to access the various accounts and platforms that were for sale.

Prepared in conjunction with the FBI, following is the recommended guidance for those that find themselves in this collection of data:

To safeguard yourself against fraud in the future, it is important that you immediately remove the malware from your computer and then change all your passwords. Do this as follows:

  1. Log out of all open sessions in all web browsers on your computer.
  2. Remove all cookies and temporary internet files.
  3. Then choose one of the following two options:
    1. Update the virus scanner on your computer.
      1. Then carry out a virus scan on your computer.
      2. The malware will be removed.
      3. Then (and only then) change all your passwords. Don’t do this any earlier, as otherwise the cybercriminals will see the new passwords.

        OR

    2. Reset the infected computer to the factory default settings:
      1. Then (and only then) change all your passwords. Don’t do this any earlier, as otherwise the cybercriminals will see the new passwords.

How can I prevent my data being stolen (again)?

  1. Use a virus scanner and keep it up to date.
  2. Use strong passwords that are unique for each account/website.
  3. Use multifactor authentication. If you use a fingerprint, facial recognition, or approval on another device (such as a phone) to confirm your identity on login, it is harder for someone to access your accounts.
  4. Never download or install illegal software. This is a very common source of malware infection.
  5. When installing legal software, always check that the website is genuine.

Just one more thing to end on a lighter note: a quick shoutout to whoever at the bureau slipped a half-eaten cookie into the takedown image, having been munched on by what I can only assume is a very satisfied FBI agent after a successful "Operation Cookie Monster" 😊

Seized Genesis Market Data is Now Searchable in Have I Been Pwned, Courtesy of the FBI and "Operation Cookie Monster"

To Infinity and Beyond, with Cloudflare Cache Reserve

By Troy Hunt
To Infinity and Beyond, with Cloudflare Cache Reserve

What if I told you... that you could run a website from behind Cloudflare and only have 385 daily requests miss their cache and go through to the origin service?

To Infinity and Beyond, with Cloudflare Cache Reserve

No biggy, unless... that was out of a total of more than 166M requests in the same period:

To Infinity and Beyond, with Cloudflare Cache Reserve

Yep, we just hit "five nines" of cache hit ratio on Pwned Passwords being 99.999%. Actually, it was 99.9998% but we're at the point now where that's just splitting hairs, let's talk about how we've managed to only have two requests in a million hit the origin, beginning with a bit of history:

Optimising Caching on Pwned Passwords (with Workers)- @troyhunt - https://t.co/KjBtCwmhmT pic.twitter.com/BSfJbWyxMy

β€” Cloudflare (@Cloudflare) August 9, 2018

Ah, memories 😊 Back then, Pwned Passwords was serving way fewer requests in a month than what we do in a day now and the cache hit ratio was somewhere around 92%. Put another way, instead of 2 in every million requests hitting the origin it was 85k. And we were happy with that! As the years progressed, the traffic grew and the caching model was optimised so our stats improved:

There it is - Pwned Passwords is now doing north of 2 *billion* requests a month, peaking at 91.59M in a day with a cache-hit ratio of 99.52%. All free, open source and out there for the community to do good with 😊 pic.twitter.com/DSJOjb2CxZ

β€” Troy Hunt (@troyhunt) May 24, 2022

And that's pretty much where we levelled out, at about the 99-and-a-bit percent mark. We were really happy with that as it was now only 5k requests per million hitting the origin. There was bound to be a number somewhere around that mark due to the transient nature of cache and eviction criteria inevitably meaning a Cloudflare edge node somewhere would need to reach back to the origin website and pull a new copy of the data. But what if Cloudflare never had to do that unless explicitly instructed to do so? I mean, what if it just stayed in their cache unless we actually changed the source file and told them to update their version? Welcome to Cloudflare Cache Reserve:

To Infinity and Beyond, with Cloudflare Cache Reserve

Ok, so I may have annotated the important bit but that's what it feels like - magic - because you just turn it on and... that's it. You still serve your content the same way, you still need the appropriate cache headers and you still have the same tiered caching as before, but now there's a "cache reserve" sitting between that and your origin. It's backed by R2 which is their persistent data store and you can keep your cached things there for as long as you want. However, per the earlier link, it's not free:

To Infinity and Beyond, with Cloudflare Cache Reserve

You pay based on how much you store for how long, how much you write and how much you read. Let's put that in real terms and just as a brief refresher (longer version here), remember that Pwned Passwords is essentially just 16^5 (just over 1 million) text files of about 30kb each for the SHA-1 hashes and a similar number for the NTLM ones (albeit slight smaller file sizes). Here are the Cache Reserve usage stats for the last 9 days:

To Infinity and Beyond, with Cloudflare Cache Reserve

We can now do some pretty simple maths with that and working on the assumption of 9 days, here's what we get:

To Infinity and Beyond, with Cloudflare Cache Reserve

2 bucks a day 😲 But this has taken nearly 16M requests off my origin service over this period of time so I haven't paid for the Azure Function execution (which is cheap) nor the egress bandwidth (which is not cheap). But why are there only 16M read operations over 9 days when earlier we saw 167M requests to the API in a single day? Because if you scroll back up to the "insert magic here" diagram, Cache Reserve is only a fallback position and most requests (i.e. 99.52% of them) are still served from the edge caches.

Note also that there are nearly 1M write operations and there are 2 reasons for this:

  1. Cache Reserve is being seeded with source data as requests come in and miss the edge cache. This means that our cache hit ratio is going to get much, much better yet as not even half all the potentially cacheable API queries are in Cache Reserve. It also means that the 48c per day cost is going to come way down πŸ™‚
  2. Every time the FBI feeds new passwords into the service, the impacted file is purged from cache. This means that there will always be write operations and, of course, read operations as the data flows to the edge cache and makes corresponding hits to the origin service. The prevalence of all this depends on how much data the feds feed in, but it'll never get to zero whilst they're seeding new passwords.

An untold number of businesses rely on Pwned Passwords as an integral part of their registration, login and password reset flows. Seriously, the number is "untold" because we have no idea who's actually using it, we just know the service got hit three and a quarter billion times in the last 30 days:

To Infinity and Beyond, with Cloudflare Cache Reserve

Giving consumers of the service confidence that not only is it highly resilient, but also massively fast is essential to adoption. In turn, more adoption helps drive better password practices, less account takeovers and more smiles all round 😊

As those remaining hash prefixes populate Cache Reserve, keep an eye on the "cf-cache-status" response header. If you ever see a value of "MISS" then congratulations, you're literally one in a million!

Full disclosure: Cloudflare provides services to HIBP for free and they helped in getting Cache Reserve up and running. However, they had no idea I was writing this blog post and reading it live in its entirety is the first anyone there has seen it. Surprise! πŸ‘‹

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

By Troy Hunt
Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

I found myself going down a previously unexplored rabbit hole recently, or more specifically, what I thought was "a" rabbit hole but in actual fact was an ever-expanding series of them that led me to what I refer to in the title of this post as "6 rabbits deep". It's a tale of firewalls, APIs and sifting through layers and layers of different services to sniff out the root cause of something that seemed very benign, but actually turned out to be highly impactful. Let's go find the rabbits!

The Back Story

When you buy an API key on Have I Been Pwned (HIBP), Stripe handles all the payment magic. I love Stripe, it's such an awesome service that abstracts away so much pain and it's dead simple to integrate via their various APIs. It's also dead simple to configure Stripe to send notices back to your own service via webhooks. For example, when an invoice is paid or a customer is updated, Stripe sends information about that event to HIBP and then lists each call on the webhooks dashboard in their portal:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

There are a whole range of different events that can be listened to and webhooks fired, here we're seeing just a couple of them that are self explanatory in name. When an invoice is paid, the callback looks something like this:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

HIBP has received this call and updated it's own DB such that for a new customer, they can now retrieve an API key or for an existing customer whose subscription has renewed, the API key validity period has been extended. The same callback is also issued when someone upgrades an API key, for example when going from 10RPM (requests per minute) to 50RPM. It's super important that HIBP gets that callback so it can appropriately upgrade the customer's key and they can immediately begin making more requests. When that call doesn't happen, well, let's go down the first rabbit hole.

The Failed API Key Upgrade 🐰

This should never happen:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

This came in via HIBP's API key support portal and is pretty self-explanatory. I checked the customer's account on Stripe and it did indeed show an active 50RPM subscription, but when drilling down into the associated payment, I found the following:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

Ok, so at least I know where things have started to go wrong, but why? Over to the webhooks dashboard and into the failed payments and things look... suboptimal:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

Dammit! Fortunately this is only a small single-digit percentage of all callbacks, but every time this fails it's either stopping someone like the guy above from making the requests they've paid for or potentially, causing someone's API key to expire even though they've paid for it. The latter in particular I was really worried about as it would nuke their key and whatever they'd built on top of it would cease to function. Fortunately, because that's such an impactful action I'd built in heaps of buffer for just such an occurrence and I'd gotten onto this issue quickly, but it was disconcerting all the same.

So, what's happening? Well, the response is HTTP 403 "Forbidden" and the body is clearly a Cloudflare challenge page so something at their end is being triggered. Looks like it's time to go down the next rabbit hole.

Cloudflare's Firewall and Logs 🐰 🐰

Desperate just to quickly restore functionality, I dropped into Cloudflare's WAF and allowed all Stripe's outbound IPs used for webhooks to bypass their security controls:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

This wasn't ideal, but it only created risk for requests originating from Stripe and it got things up and running again quickly. With time up my sleeve I could now delve deeper and work out precisely what was going on, starting with the logs. Cloudflare has a really extensive set of APIs that can control a heap of features of the service, including pulling back logs (note: this is a feature of their Enterprise plan). I queried out a slice of the logs corresponding to when some of the 403s from Stripe's dashboard occurred and found 2 entries similar to this one:

{"BotScore":1,"BotScoreSrc":"Verified Bot","CacheCacheStatus":"unknown","ClientASN":16509,"ClientCountry":"us","ClientIP":"54.187.205.235","ClientRequestHost":"haveibeenpwned.com","ClientRequestMethod":"POST","ClientRequestReferer":"","ClientRequestURI":"[redacted]","ClientRequestUserAgent":"Stripe/1.0 (+https://stripe.com/docs/webhooks)","EdgeRateLimitAction":"","EdgeResponseStatus":403,"EdgeStartTimestamp":1674073983931000000,"FirewallMatchesActions":["managedChallenge"],"FirewallMatchesRuleIDs":["6179ae15870a4bb7b2d480d4843b323c"],"FirewallMatchesSources":["firewallManaged"],"OriginResponseStatus":0,"WAFAction":"unknown","WorkerSubrequest":false}

That's one of Stripe's outbound IP's on 54.187.205.235 and the "FirewallMatchesRuleIDs" collection has a value in it. Ergo, something about this request triggered the firewall and caused it to be challenged. I'm sure many of us have gone through the following thought process before:

What did I change?

Did I change anything?

Did they change something?

Except "they" could have been either Cloudflare or Stripe; if it wasn't me (and I was fairly certain it wasn't), was it a Cloudflare change to the rules or a Stripe change to a webhook payload that was now triggering an existing rule? Time to dig deeper again so it's over to the Cloudflare dashboard and down into the WAF events for requests to the webhook callback path:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

Yep, something proper broke! Let's drill deeper and look at recent events for that IP:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

As you dig deeper through troubleshooting exercises like this, you gradually turn up more and more information that helps piece the entire puzzle together. In this case, it looks like the "Inbound Anomaly Score Exceeded" rule was being triggered. What's that? And why? Time to go down another rabbit hole.

The Cloudflare OWASP Core Ruleset 🐰 🐰 🐰

So, deeper and deeper down the rabbit holes we go, this time into the depths of the requests that triggered the managed rule:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

Well that's comprehensive πŸ™‚

There's a lot to unpack here so let's begin with the ruleset that the previously identified "Inbound Anomaly Score Exceeded" rule belongs to, the Cloudflare OWASP Core Ruleset:

The Cloudflare OWASP Core Ruleset is Cloudflare’s implementation of the OWASP ModSecurity Core Rule SetOpen external link (CRS). Cloudflare routinely monitors for updates from OWASP based on the latest version available from the official code repository.

That link is yet another rabbit hole altogether so let me summarise succinctly here: Cloudflare uses OWASP's rules to identify anomalous traffic based on a customer-defined paranoia level (how strict you want to be) and then applies a score threshold (also customer-defined) at which an action will be taken, for example challenging the request. What I learned as this saga progressed is that the "Inbound Anomoly Score Exceeded" rule is actually a rollup of the rules beneath it. The OWASP score of "26" is the sum of the 6 rules listed beneath it and once it exceeds 25, the superset rule is triggered.

Further - and this is the really important bit - Cloudflare routinely updates the rules from OWASP which makes sense because these are ever-evolving in response to new threats. And when did they last upgrade the rules? It looks like they announced it right before I started having issues:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

Whilst it's not entirely clear from above when this release was scheduled to occur, I did reach out to Cloudflare support and was advised it had already taken place:

Please note that we did bump the OWASP version, which we are integrating with to 3.3.4 as noted on our scheduled changes.

So maybe it's not Cloudflare's fault or Stripe's fault, but OWASP's fault? In fairness to all, I don't think it's anyone's fault per se and is instead just an unfortunate result of everyone doing their best to keep the bad guys out. Unless... it really is Stripe's fault because there's something in the request payload that was always fishy and is now being caught? But why for only some requests and not others? Next rabbit!

Cloudflare Payload Logging 🐰 🐰 🐰 🐰

Sometimes, people on the internet lose their minds a bit over things they really shouldn't. One of those things, in my experience, is Cloudflare's interception of traffic and it's something I wrote about in detail nearly 7 years ago now in my piece on security absolutism. Cloudflare plays an enormously valuable role in the internet's ecosystem and a substantial part of the value comes from being able to inspect, cache, optimise, and yes, even reject traffic. When you use Cloudflare to protect your website, they're applying rulesets like the aforementioned OWASP ones and in order to do that, they must be able to inspect your traffic! But they don't log it, not all of it, rather just "metadata generated by our products" as they refer to it on their logs page. We saw an example of that earlier on with Stripe's request from their IP showing it triggered a firewall rule, but what we didn't see is the contents of that POST request, the actual payload that triggered the rule. Let's go grab that.

Because the contents of a POST request can contain sensitive information, Cloudflare doesn't log it. Obviously they see it in transit (that's how OWASP's rules can be applied to it), but it's not stored anywhere and even if you want to capture it, they don't want to be able to see it. That's where payload logging (another Enterprise plan feature) comes in and what's really neat about that is every payload must be encrypted with a public key retained by Cloudflare whilst only you retain the private key. The setup looks like this:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

Pretty self-explanatory and once done, right under where we previously saw the additional logs we now have the ability to decrypt the payload:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

As promised, this requires the private key from earlier:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

And now, finally, we have the actual payload that triggered the rule, seen here with my own test data:

[ " },\n \"billing_reason\": \"subscription_update\",\n \"charge\": null,\n \"collection_method\": \"charge_automatically\",\n \"created\": 1674351619,\n \"currency\": \"usd\",\n \"custom_fields\": null,\n \"customer\": \"cus_MkA71FpZ7XXRlt\",\n \"customer_address\": ", " },\n \"customer_email\": \"troy-hunt+1@troyhunt.com\",\n \"customer_name\": \"Troy Hunt 1\",\n \"customer_phone\": null,\n \"customer_shipping\": null,\n \"customer_tax_exempt\": \"none\",\n \"customer_tax_ids\": [\n\n ],\n \"default_payment_method\": null,\n \"default_source\": null,\n \"default_tax_rates\": [\n\n ],\n \"description\": \"You can manage your subscription (i.e. cancel it or regenerate the API key) at any time by verifying your email address here: https://haveibeenpwned.com/API/Key\",\n \"discount\": null,\n \"discounts\": [\n\n ],\n \"due_date\": null,\n \"ending_balance\": -11804,\n \"footer\": null,\n \"from_invoice\": null,\n \"hosted_invoice_url\": \"https://invoice.stripe.com/i/acct_1EdQYpEF14jWlYDw/test_YWNjdF8xRWRRWXBFRjE0aldsWUR3LF9OREo5SlpqUFFvVnFtQnBVcE91YUFXemtkRHFpQWNWLDY0ODkyNDIw02004bEyljdC?s=ap\",\n \"invoice_pdf\": \"https://pay.stripe.com/invoice/acct_1EdQYpEF14jWlYDw/test_YWNjdF8xRWRRWXBFRjE0aldsWUR3LF9OREo5SlpqUFFvVnFtQnBVcE91YUFXemtkRHFpQWNWLDY0ODkyNDIw02004bEyljdC/pdf?s=ap\",\n \"last_finalization_error\": null,\n \"latest_revision\": null,\n \"lines\": ", " ", " ],\n \"discountable\": false,\n \"discounts\": [\n\n ],\n \"invoice_item\": \"ii_1MSsXfEF14jWlYDwB1nfZvFm\",\n \"livemode\": false,\n \"metadata\": ", " },\n \"period\": ", " },\n \"plan\": ", " },\n \"nickname\": null,\n \"product\": \"prod_Mk4eLcJ7JYF02f\",\n \"tiers_mode\": null,\n \"transform_usage\": null,\n \"trial_period_days\": null,\n \"usage_type\": \"licensed\"\n },\n \"price\": ", " },\n \"nickname\": null,\n \"product\": \"prod_Mk4eLcJ7JYF02f\",\n \"recurring\": ", " },\n \"tax_behavior\": \"unspecified\",\n \"tiers_mode\": null,\n \"transform_quantity\": null,\n \"type\": \"recurring\",\n \"unit_amount\": 15000,\n \"unit_amount_decimal\": \"15000\"\n },\n \"proration\": true,\n \"proration_details\": ", " \"il_1MMjfcEF14jWlYDwoe7uhDPF\"\n ]\n }\n },\n \"quantity\": 1,\n \"subscription\": \"sub_1MMjfcEF14jWlYDwi8JWFcxw\",\n \"subscription_item\": \"si_N6xapJ8gSXdp7W\",\n \"tax_amounts\": [\n\n ],\n \"tax_rates\": [\n\n ],\n \"type\": \"invoiceitem\",\n \"unit_amount_excluding_tax\": \"-14304\"\n },\n ", " ],\n \"discountable\": true,\n \"discounts\": [\n\n ],\n \"livemode\": false,\n \"metadata\": ", " },\n \"period\": ", " },\n \"plan\": ", " },\n \"nickname\": null,\n \"product\": \"prod_Mk4lTSl4axd9mt\",\n \"tiers_mode\": null,\n \"transform_usage\": null,\n \"trial_period_days\": null,\n \"usage_type\": \"licensed\"\n },\n \"price\": ", " },\n \"nickname\": null,\n \"product\": \"prod_Mk4lTSl4axd9mt\",\n \"recurring\": ", " },\n \"tax_behavior\": \"unspecified\",\n \"tiers_mode\": null,\n \"transform_quantity\": null,\n \"type\": \"recurring\",\n \"unit_amount\": 2500,\n \"unit_amount_decimal\": \"2500\"\n },\n \"proration\": false,\n \"proration_details\": ", " },\n \"quantity\": 1,\n \"subscription\": \"sub_1MMjfcEF14jWlYDwi8JWFcxw\",\n \"subscription_item\": \"si_NDJ98tQrCcviJf\",\n \"tax_amounts\": [\n\n ],\n \"tax_rates\": [\n\n ],\n \"type\": \"subscription\",\n \"unit_amount_excluding_tax\": \"2500\"\n }\n ],\n \"has_more\": false,\n \"total_count\": 2,\n \"url\": \"/v1/invoices/in_1MSsXfEF14jWlYDwxHKk4ASA/lines\"\n },\n \"livemode\": false,\n \"metadata\": ", " },\n \"next_payment_attempt\": null,\n \"number\": \"04FC1917-0008\",\n \"on_behalf_of\": null,\n \"paid\": true,\n \"paid_out_of_band\": false,\n \"payment_intent\": null,\n \"payment_settings\": ", " },\n \"period_end\": 1674351619,\n \"period_start\": 1674351619,\n \"post_payment_credit_notes_amount\": 0,\n \"pre_payment_credit_notes_amount\": 0,\n \"quote\": null,\n \"receipt_number\": null,\n \"rendering_options\": null,\n \"starting_balance\": 0,\n \"statement_descriptor\": null,\n \"status\": \"paid\",\n \"status_transitions\": ", " },\n \"subscription\": \"sub_1MMjfcEF14jWlYDwi8JWFcxw\",\n \"subtotal\": -11804,\n \"subtotal_excluding_tax\": -11804,\n \"tax\": null,\n \"test_clock\": null,\n \"total\": -11804,\n \"total_discount_amounts\": [\n\n ],\n \"total_excluding_tax\": -11804,\n \"total_tax_amounts\": [\n\n ],\n \"transfer_data\": null,\n \"webhooks_delivered_at\": 1674351619\n }\n },\n \"livemode\": false,\n \"pending_webhooks\": 1,\n \"request\": ", " },\n \"type\": \"invoice.paid\"\n}" ]

But enough of what's present in the payload, it's what's absent that especially struck me. No obvious XSS patterns, nor SQL injection or any other suspicious looking strings. The request looked totally benign, so why did it trigger the rule?

I wanted to compare the payload of a blocked request with a similar request that wasn't blocked, but they're only logged at Cloudflare when they trigger a rule. No problem, it's easy to grab the full request from Stripe's webhook history so I found one that passed and one that failed and diff'd them both:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

This clearly isn't the full 200 lines, but it's a very similar story over the remainder of the files; tiny differences largely down to dates, IDs, and of course, the customers themselves. No suspicious patterns, no funky characters, nothing visibly abnormal. It's a bit pointless to even mention it because they're near identical, but the payload on the left is the one that passed the firewall whilst the payload on the right was blocked.

Next rabbit hole!

Cloudflare's Internal Rules Engine 🐰 🐰 🐰 🐰 🐰

Completely running out of ideas and options, focus moved to the folks inside Cloudflare who were already aware there was an issue:

We are actively looking into this and will likely release an update to the Cloudflare OWASP ruleset soon

β€” Michael Tremante (@MichaelTremante) January 20, 2023

What followed was a period of back and forth initially with Cloudflare, then Stripe as well with everyone trying to nut out exactly where things were going wrong. Essentially, the process went like this:

Is Cloudflare inadvertently blocking the requests?

Is the OWASP ruleset raising false positives?

Is Stripe issuing requests that are deemed to be malicious?

And round and round we went. At one time, Cloudflare identified a change in the OWASP ruleset which appeared to have resulted in their implementation inadvertently triggering the WAF. They rolled it back and... the same thing happened. We deferred back to Stripe on the assumption that something must have changed on their end, but they couldn't identify any change that would have any sort of material impact. We were stumped, but we also had an easy fix just one last rabbit hole away...

Fine Tuning the Cloudflare WAF 🐰 🐰 🐰 🐰 🐰 🐰

The joy of a managed firewall is that someone else takes all the rigmarole of looking after it away. I'm going to talk more about that in the summary shortly but clearly, that also creates risk as you're delegating control of traffic flow to someone else. Fortunately, Cloudflare gives you a load of configurability with their managed rules which makes it easy to add custom exceptions:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

This meant I could create a simple exception that was much more intelligent than the previous "just let all outbound Stripe IPs in" by filtering down to the specific path those webhooks were flowing in to:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

And finally, because sequence matters, I dragged that rule right up to the top of the pile so it would cause matching inbound requests to skip all the other rules:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

And finally, there were no more rabbits 😊

Lessons Learned

I know what you're thinking - "what was the actual root cause?" - and to be honest, I still don't know. I don't know if it was Cloudflare or OWASP or Stripe or if it even impacted other customers of these services and to be honest, yes, that's a little frustrating. But I learned a bunch of stuff and for that alone, this was a worthwhile exercise I took three big lessons away from:

Firstly, understanding the plumbing of how all these bits work together is super important. I was lucky this wasn't a time critical issue and I had the luxury of learning without being under duress; how rules, payload inspection and exception management all work together is really valuable stuff to understand. And just like that, as if to underscore my first point, I found this right before hitting the publish button on the blog post:

Down the Cloudflare / Stripe / OWASP Rabbit Hole: A Tale of 6 Rabbits Deep 🐰 🐰 🐰 🐰 🐰 🐰

I added a couple more OWASP rules to the exception in Cloudflare (things like a MySQL rule that was adding 5 points), and we were back in business.

Secondly, I look at the managed WAF Cloudflare provides more favourably than I did before simply because I have a better understanding of how comprehensive it is. I want to write code and run apps on the web, that's my focus, and I want someone else to provide that additional layer on top that continuously adapts to block new and emerging threats. I want to understand it (and I now do, at least certainly better than before), but I don't want managing it day in and day out to be my job.

And finally, IMHO, Stripe needs a better mechanism to report on webhook failures:

In live mode you are notified after 3 days of trying. You can also query the events (https://t.co/0mujOPssV0) to create a running list of statuses on web hooks that have been sent and alert on that via your own app.

β€” Blake Krone (@blakekrone) January 19, 2023

Waiting until stuff breaks really isn't ideal and whilst I'm sure you could plug into the (very extensive) API ecosystem Stripe has, this feels like an easy feature for them to build in. So, Stripe friends, when you read this that's a big "yes" vote from me for some form of anomalous webhook response alerting.

This experience was equal parts frustration and fun and whilst the former is probably obvious, the latter is simply due to having an opportunity to learn something new that's a pretty important part of the service I run. May my frustrated fun story here make your life easier in the future if you face the same problems 😊

Pwned Passwords Adds NTLM Support to the Firehose

By Troy Hunt
Pwned Passwords Adds NTLM Support to the Firehose

I think I've pretty much captured it all in the title of this post but as of about a day ago, Pwned Passwords now has full parity between the SHA-1 hashes that have been there since day 1 and NTLM hashes. We always had both as a downloadable corpus but as of just over a year ago with the introduction of the FBI data feed, we stopped maintaining downloadable behemoths of data.

A little later, we added the downloader to make it easy to pull down the latest and greatest complete data set directly from the same API that so many of you have integrated into your own apps. But because we only had an API for SHA-1 hashes, the downloader couldn't grab the NTLM versions and increasingly, we had 2 corpuses well out of parity.

I don't know exactly why, but just over the last few weeks we've had a marked uptick in requests for an updated NTLM corpus. Obviously there's still a demand to run this against local Active Directory environments and clearly, the more up to date the hashes are the more effective they are at blocking the use of poor passwords.

So, Chief Pwned Passwords Wrangler StefÑn Jâkull Sigurðarson got to work and just went ahead and built it all for you. For free. In his spare time. As a community contribution. Seriously, have a look through the public GitHub repos and it's all his work ranging from the API to the Cloudflare Worker to the downloader so if you happen to come across him say, at NDC Oslo in a few months' time, show your appreciation and buy the guy a beer 🍺

Lastly, every time I look at how much this tool is being used, I'm a bit shocked at how big the numbers are getting:

Pwned Passwords Adds NTLM Support to the Firehose

That's well more than double the number of monthly requests from when I wrote the blog post about the FBI and NCA only just over a year ago, and I imagine that will only continue to increase, especially with today's announcement about NTLM hashes. Thank you to everyone that has taken this data and done great things with it, we're grateful that it's been put to good use and has undoubtedly helped an untold number of people to make better password choices 😊

Pwned or Bot

By Troy Hunt
Pwned or Bot

It's fascinating to see how creative people can get with breached data. Of course there's all the nasty stuff (phishing, identity theft, spam), but there are also some amazingly positive uses for data illegally taken from someone else's system. When I first built Have I Been Pwned (HIBP), my mantra was to "do good things after bad things happen". And arguably, it has, largely by enabling individuals and organisations to learn of their own personal exposure in breaches. However, the use cases go well beyond that and there's one I've been meaning to write about for a while now after hearing about it firsthand. For now, let's just call this approach "Pwned or Bot", and I'll set the scene with some background on another problem: sniping.

Think about Miley Cyrus as Hannah Montana (bear with me, I'm actually going somewhere with this!) putting on shows people would buy tickets to. We're talking loads of tickets as back in the day, her popularity was off the charts with demand well in excess of supply. Which, for enterprising individuals of ill-repute, presented an opportunity:

Ticketmaster, the exclusive ticket seller for the tour, sold out numerous shows within minutes, leaving many Hannah Montana fans out in the cold. Yet, often, moments after the shows went on sale, the secondary market Β flourished with tickets to those shows. The tickets, whose face value ranged from $21 to $66, were resold on StubHub for an average of $258, plus StubHub’s 25% commission (10% paid by the buyer, 15% by the seller).

This is called "sniping", where an individual jumps the queue and snaps up products in limited demand for their own personal gain and consequently, to the detriment of others. Tickets to entertainment events is one example of sniping, the same thing happens when other products launch with insufficient supply to meet demand, for example Nike shoes. These can be massively popular and, par for the course of this blog, released in short demand. This creates a marketplace for snipers, some of whom share their tradecraft via videos such as this one:

"BOTTER BOY NOVA" refers to himself as a "Sneaker botter" in the video and demonstrates a tool called "Better Nike Bot" (BnB) which sells for $200 plus a renewal fee of $60 every 6 months. But don't worry, he has a discount code! Seems like hackers aren't the only ones making money out of the misfortune of others.

Have a look at the video and watch how at about the 4:20 mark he talks about using proxies "to prevent Nike from flagging your accounts". He recommends using the same number of proxies as you have accounts, inevitably to avoid Nike's (automated) suspicions picking up on the anomaly of a single IP address signing up multiple times. Proxies themselves are a commercial enterprise but don't worry, BOTTER BOY NOVA has a discount code for them too!

The video continues to demonstrate how to configure the tool to ultimately blast Nike's service with attempts to purchase shoes, but it's at the 8:40 mark that we get to the crux of where I'm going with this:

Pwned or Bot

Using the tool, he's created a whole bunch of accounts in an attempt to maximise his chances of a successful purchase. These are obviously just samples in the screen cap above, but inevitably he'd usually go and register a bunch of new email addresses he could use specifically for this purpose.

Now, think of it from Nike's perspective: they've launched a new shoe and are seeing a whole heap of new registrations and purchase attempts. In amongst that lot are many genuine people... and this guy πŸ‘† How can they weed him out such that snipers aren't snapping up the products at the expense of genuine customers? Keeping in mind tools like this are deliberately designed to avoid detection (remember the proxies?), it's a hard challenge to reliably separate the humans from the bots. But there's an indicator that's very easy to cross-check, and that's the occurrence of the email address in previous data breaches. Let me phrase it in simple terms:

We're all so comprehensively pwned that if an email address isn't pwned, there's a good chance it doesn't belong to a real human.

Hence, "Pwned or Bot" and this is precisely the methodology organisations have been using HIBP data for. With caveats:

If an email address hasn't been seen in a data breach before, it may be a newly created one especially for the purpose of gaming your system. It may also be legitimate and the owner has just been lucky to have not been pwned, or it may be that they're uniquely subaddressing their email addresses (although this is extremely rare) or even using a masked email address service such as the one 1Password provides through Fastmail. Absence of an email address in HIBP is not evidence of possible fraud, that's merely one possible explanation.

However, if an email address has been seen in a data breach before, we can say with a high degree of confidence that it did indeed exist at the time of that breach. For example, if it was in the LinkedIn breach of 2012 then you can conclude with great confidence that the address wasn't just set up for gaming your system. Breaches establish history and as unpleasant as they are to be a part of, they do actually serve a useful purpose in this capacity.

Think of breach history not as a binary proposition indicating the legitimacy of an email address, rather as one of assessing risk and considering "pwned or bot" as one of many factors. The best illustration I can give is how Stripe defines risk by assessing a multitude of fraud factors. Take this recent payment for HIBP's API key:

Pwned or Bot

There's a lot going on here and I won't run through it all, the main thing to take away from this is that in a risk evaluation rating scale from 0 to 100, this particular transaction rated a 77 which puts it in the "highest risk" bracket. Why? Let's just pick a few obvious reasons:

  1. The IP address had previously raised early fraud warnings
  2. The email was only ever once previously seen on Stripe, and that was only 3 minutes ago
  3. The customers name didn't match their email address
  4. Only 76% of transactions from the IP address had previously been authorised
  5. The customer's device had previously had 2 other cards associated with it

Any one of these fraud factors may not have been enough to block the transaction, but all combined it made the whole thing look rather fishy. Just as this risk factor also makes it look fishy:

Pwned or Bot

Applying "Pwned or Bot" to your own risk assessment is dead simple with the HIBP API and hopefully, this approach will help more people do precisely what HIBP is there for in the first place: to help "do good things after bad things happen".

Data Breach Misattribution, Acxiom & Live Ramp

By Troy Hunt
Data Breach Misattribution, Acxiom & Live Ramp

If you find your name and home address posted online, how do you know where it came from? Let's assume there's no further context given, it's just your legitimate personal data and it also includes your phone number, email address... and over 400 other fields of data. Where on earth did it come from? Now, imagine it's not just your record, but it's 246 million records. Welcome to my world.

This is a story about a massive corpus of data circulating widely within the hacking community and misattributed to a legitimate organisation. That organisation is Acxiom, and their business hinges on providing their customers with data on their customers. By the very nature of their business, they process large volumes of data that includes a broad set of personal attributes. By pure coincidence, there is nominal commonality between Acxiom’s records and the ones in the 246M corpus I mentioned earlier. But I'm jumping ahead to the conclusion, let's go back to the beginning:

Disclosure and Attribution Debunking

In June last year, I received an email from someone I trust who had sent me data for Have I Been Pwned (HIBP) in the past:

Have you seen Axciom [sic] data? It was just sent to us. Seems to being traded/sold on some forums. Have you received it yet? If not i can upload it for you. It's quite large tho, ~250M Records.

A corpus of data that size is particularly interesting as it impacts such a huge number of people. So, I reviewed the data and concluded... pretty much nothing. Looks legit, smells legit but there was absolutely nothing beyond the word of one person to tie it to Acxiom (and who knows who they got that word from). Burdened by other more immediately actionable data breaches, I filed it away until recently when that name popped up again, this time on a popular hacking forum:

Data Breach Misattribution, Acxiom & Live Ramp

It was referred to as "LiveRamp (Formerly Acxiom)" and before I go any further, let's just clarify the problem with that while you're looking at the image above: LiveRamp was previously a subsidiary of Acxiom, but that hasn't been the case since they separated businesses in 2018 so whoever put this together is referring back to a very old state of play. Regardless, those downloading it from the forum were clearly very excited about it. Seeing this for the second time and spreading far more broadly, I decided to reach out to the (alleged) source and ask Acxiom what was going on.

I dread this process - contacting an organisation about a breach - because I usually get either no response whatsoever or a standoffish one. Rarely do I find a receptive organisation willing to fully investigate an alleged incident, but that's exactly what I found on this occasion. Much of the reason why I wanted to write this post is because whilst I hate breached organisations not properly investigating an incident, I also hate seeing misattribution of a breach to an innocent party. That's a particularly sore point for me right now because of this incident just last week:

This is the dumbest infosec story I’ve read in… forever? It is so profoundly incorrect, poorly researched, never verified, rambling and indistinguishable from parody that I literally went looking for the parody reference. I think he’s actually serious! https://t.co/oLyIHxb8D3

β€” Troy Hunt (@troyhunt) November 15, 2022

I've had various public users of HIBP, commercial users and even governments reach out to ask what's going on because they were concerned about their data. Whilst this incident won't do HIBP any actual harm (and frankly, I'm stunned anyone took that story seriously), I can very easily see how misattribution can be damaging to an organisation, indeed that's a key reason why I invest so much effort into properly investigating these claims before putting anything into HIBP. But that ridiculous example is nothing compared to the amount of traction some misattributions get. Remember how just recently a couple of billion TikTok accounts had been "breached"? This made massive news headlines until...

The thread on the hacking forum with the samples of alleged TikTok data has been deleted and the user banned for β€œlying about data breaches” https://t.co/9ZKkKvu8JT

β€” Troy Hunt (@troyhunt) September 5, 2022

"Lying about data breaches". Ugh, criminals are so untrustworthy! This happens all the time and when I'm not sure of the origin of a substantial breach, I often write a blog post like this and on many occasions, the masses help establish the origin. So, here goes:

The Data

Let's jump into the data, starting with 2 of the most obvious things I look for in any new data breach:

  1. The total number of unique email addresses is 51,730,831 (many records don't have this field populated)
  2. The most recent data I can find is from mid-2020 (which also speaks to the inaccuracy of the LiveRamp association)

As to the aforementioned attributes, they total 410 different columns:

To my eye, this data is very generic and looks like a superset of information that may be collected across a large number of people. For example, the sort of data requested when filling out dodgy online competitions. However, unlike many large corpuses of aggregated data I've seen in the past, this one is... neat. For example, here's a little sample of the first 5 columns (redaction of some chars with a dash), note how the names are all uniformly presented:

120321486,4,BE-----,B,TAYLOR
120321487,2,JOY,M,----EY
120321466,1,DOYLE,E,------HAM
120321486,3,L----,,TAYLOR
120321486,2,R---,M,TAYLOR

Sure, this is just uppercasing characters but over and over again, I found data that was just too neat. The addresses. The phone numbers. Everything about it was far to curated to simply be text entered by humans. My suspicion is that it's likely a result of either a very refined collection process or in the case of addresses, matched using a service to resolve the human-entered address to a normalised form stored centrally.

Perhaps what I was most interested in though was the URL column as that seems to give some indication of where the data might have come from. I queried out the top 100 most common ones and took a look:

Eyeballing them, I couldn't help feel that my earlier hunch was on the money - "dodgy online competitions". Not just competitions but a general theme of getting stuff for cheap or more specifically, services that look like they've been built to entice people to part with their personal data.

Take the first one, for example, DIRECTEDUCATIONCENTER.COM. That's a dead domain as of now but check out what it looked line in March last year:

Data Breach Misattribution, Acxiom & Live Ramp

"I may be contacted by trusted partners and others". What's "others"? Untrusted partners? πŸ€·β€β™‚οΈ

Let's try the next one being originalcruisegiveaway.com and again, the site is now gone so it's back over to archive.org:

Data Breach Misattribution, Acxiom & Live Ramp

It's different, but somehow the same. Clicking through to the claim form, it seems the only way you can enter is if you agree to receive comms from all sorts of other parties:

Data Breach Misattribution, Acxiom & Live Ramp

Ok, one more, this time free-ukstuff.com which is also now a dead site, and not even indexed by archive.org. Next then, is findyourdegreenow.com which is - you're not gonna believe this - a dead site! Here's what it used to look like:

Data Breach Misattribution, Acxiom & Live Ramp

And again, it feels the same. Same same, but different.

To try and get a sense of how localised this data was, I queried out all the values in the "state" column. Is this a US-only data set? If that column is anything to go by, yes:

Something didn't add up when I first saw that and after a quick check of the population of each US state, it become immediately obvious: there's no California, the most populous state in the country. Nor Texas, the second most populous state. In fact, with only 35 rows there's a bunch of US states missing. Why? Who knows, the only thing I can say for sure is that this is a subset of the population with some glaring geographical omissions.

Then there's another curveball - what about the URL quickquid.co.uk, that doesn't look very US-centric. Heading over there redirects to casheuronetukadministration.grantthornton.co.uk which advises that as of last month, "The Administration of CashEuroNet UK, LLC has closed and the Joint Administrators have ceased to act". So something has obviously been wound up, wonder what was there originally? I had to go back a few years to find this:

Data Breach Misattribution, Acxiom & Live Ramp

To my mind, this is more of the same ilk in terms of a service targeted at people after quick money. But it's clearly all in GBP and with a .co.uk TLD, this being right after I've just said all the states are in the US, what gives? Back to the source data, filter out the records based on that URL and sure enough, everyone has a US address. Grabbing a random selection of IP addresses had them all resolving to the US too so I have absolutely no idea how his geographically inconsistent set of data came to being.

And that's really the theme across the data set when doing independent analysis - how is this so? What service or process could have pulled the data together in this way? Maybe the people who this data actually refers to will have the answers, let's go and ask them.

Responses From Impacted HIBP Subscribers

We're approaching 4.5M subscribers to HIBP's free notification service now which makes for a great corpus of people I can reach out to when doing breach verification. I grabbed a handful of addresses from this data set and asked them if they could help out. I sent those that responded positively their full record and asked some questions about the legitimacy of the data, and where they thought it might have come from, here's what they said:


1. The data is mostly accurate.

A few things are off, such as date of birth (could very well be a fake one I've entered before) and details of household members.

There are a lot of columns with single-letter values, which I can't verify without knowing what they mean.

But overall, it's quite accurate.

2. No idea where it came from, sorry. There is a URL in the third-to-last column, but it doesn't seem like a website I would have used before.


I looked through the csv file and couldn't find anything I recognized. I saw the names [redacted], [redacted] and [redacted]- I don't know anyone by those names. I live in Ontario, Canada, but addresses in the file were located in the united states.

Data says I have one child between the ages of 0 and 2, but that's not true - my only son is five. Birth date is wrong - my birthday is [redacted], but the file says [redacted].

There were a few urls in the file and I don't recognize any of them.

Not sure if this last thing is relevant or not. I sometimes get emails intended for other people. I searched my inboxes for the names [redacted] and [redacted]. Nothing came up for [redacted], but I do see an email for [redacted] from [redacted]. I searched through the csv to see if anything matched the data in the email (member number, confirmation number), but nothing matched.

I also noticed that although my email address ([redacted]) is in the csv data, there's also another email address ([redacted]) which is not mine.

I'm not sure if that's helpful or not, but if there's anything more I can do, let me know. :)


As far as name and address they are correct. Β number of ppl living at the house has changed. Β The other information I can't seem to understand what the information for example under column AQ row 2 it has a U and I don't know what the U is for. Β I have noticed that some information is really outdated, so I wouldn't know where the data originated from.


Thank you for sharing, I took a look at the data, let me see if I can answer your questions:

1. While that is my email, the rest of the data actually belongs to an immediate family member. With the exception of a few outdated fields, the data on my family member is correct.

2. I am unfamiliar with Acxiom and am unsure of where this data originated from. I want to note that I have recently been doxxed and have reason to believe data breaches may have been used; however, the data you've provided here was not used in the attacks, to my knowledge.

Please let me know if you have any other questions, or if there is anything else I may do to help.


"Mostly accurate". The feeling I have when reading this is that whoever is responsible for this corpus of data has put it together from multiple sources and quite likely made some assumptions along the way. I can picture how that would happen; imagine trying to match various sources of data based on human-provided text fields in order to "enrich" the collection.

Analysis by Acxiom

This isn't the fist time Acxiom has had to deal with misattribution, and they'd seen exactly the same data set passed around before. Think about it from their perspective: every time there's a claim like this they need to treat it as though it could be legitimate, because we've all seen what happens when an organisation brushes off a disclosure attempt (I could literally write a book about this!) Thus it becomes a burdensome process for them as they repeat the same analysis over and over again, each time drawing the same conclusion.

And what was that conclusion? Simply put, the circulating data didn't align with their own. They're in the best position of all of us to draw that conclusion as they have access to both data sets and whilst I suspect some people may retort with "how do you know you can trust them", not only do I not have a good reason to doubt their findings, I also don't have a good reason to attribute it to them. Every reference I've seen to Acxiom has been from whoever is handing the data around; I've been able to find absolutely nothing within the data set itself to tie it back to them. In almost all breaches I've processed, the truth is in the data and there's nothing here that points the finger at them.

I offered Acxiom the opportunity to further clarify their position with a statement which I've included in its entirety here:

β€œAcxiom has worked to build a reputation over the course of fifty years for having the highest standards around data privacy, data protection and security. In the past, questionable organizations have falsely attached our name to a data file in an attempt to create a deceitful sense of legitimacy for an asset. In every instance, Acxiom conducts an extensive analysis under our cyber incident response and privacy programs. These programs are guided by stakeholders including working with the appropriate authorities to inform them of these crimes. Β The forensic review of the case that Troy has looked into, along with our continuous monitoring of security, means we can conclusively attest that the claims are indeed false and that the data, which has been readily available across multiple environments, does not come from Acxiom and is in no way the subject of an Acxiom breach.

Acxiom’s Commitment To Data Protection/ Data Privacy:
We value consumer privacy. Β  U.S. consumers who would like to know what information Acxiom has collected about them and either delete it or opt out of Acxiom’s marketing products, may visit acxiom.com/privacy for more information.”

Summary

The email addresses from the data set have now been loaded into HIBP and are searchable. One point of note that became evident after loading the data is that 94% of the email addresses has already been pwned. That's a very high number (a quick look through the HIBP Twitter feed shows the count is normally between 40% and 80%), and it suggests that this corpus of data may be at least partially constructed from other data already in circulation.

Because the question will inevitably come up, no, I won't send you your full record, I simply don't have the capacity to operate as a personal data lookup and delivery service. I know it's frustrating finding yourself in a breach like this and not being able to take any action, all you can really do at this point is treat it as another reminder of how our data spread around the web and often, we have no idea about it.

Full disclosure: I have absolutely no commercial interest in Acxiom, no money has changed hands and I wasn't incentivised in any way, I just want everyone to have a much healthier suspicion when alleging the source of a data breach πŸ™‚

The Have I Been Pwned API Now Has Different Rate Limits and Annual Billing

By Troy Hunt
The Have I Been Pwned API Now Has Different Rate Limits and Annual Billing

A couple of weeks ago I wrote about some big changes afoot for Have I Been Pwned (HIBP), namely the introduction of annual billing and new rate limits. Today, it's finally here! These are two of the most eagerly awaited, most requested features on HIBP's UserVoice so it's great to see them finally knocked off after years of waiting. In implementing all this, there are changes to the existing "one size fits all" model so if you're using the HIBP API, please make sure you read this carefully and understand the impact (if any) on you. Here goes:

The Rate Limits and (Some) Pricing is Different

The launch blog post for the authenticated API explained the original rationale behind the $3.50 per month price and most importantly, how I wanted to ensure it didn't pose a barrier:

In choosing the $3.50 figure, I wanted to ensure it was a number that was inconsequential to a legitimate user of the service

As I said in the previous blog post, what I didn't understand at the time was that paradoxically, the low amount was a barrier to many organisations! But equally, it's made the API super accessible to the masses so that price stays. The rate limit, however, needed revisiting and to understand why, let's go back to the beginning:

The "1 request per 1,500ms" rate dated all the way back to 2016 where I'd initially attempted to combat abuse by applying the limit per IP. This was an entirely non-empirical, gut feel, "let's just try and fix the problem right now" decision and it was only very recently I actually started trawling through the data and looking at how the API was being consumed. 1 request every 1,500ms is a maximum of 57,600 requests in a day; here's the number of requests by the top 20 consumers of the service in a recent 24 hour mid-week period:

The Have I Been Pwned API Now Has Different Rate Limits and Annual Billing

Keeping in mind that you're never going to achieve the full 57,600 requests in a day as you'd have to time every single one of them perfectly so as not to hit the rate limit, only 1 subscriber even achieved half that potential. In fact, only 9 subscribers achieved even a quarter of the potential with everyone else very quickly falling back to a small fraction of even that. To be fair, I'm conscious that I'm taking a full day of data and talking about requests as if they were evenly distributed across the entire period when there are inevitably use cases where it's more a short burst rather than a prolonged, even distribution. Regardless, what the data is saying is that the default "one size fits all" rate limit is way above and beyond what almost every single subscriber is actually consuming, and by a significant order of magnitude too. In a way, what we ended up with is the little guys subsidising the big guys.

The bottom line is that we're simultaneously adding a bunch of higher rate limits whilst reducing the entry level rate limit. It's easier if you see it all in context so let's just jump straight into the pricing (all in USD):

The Have I Been Pwned API Now Has Different Rate Limits and Annual Billing

This is from Stripe's embeddable pricing table I mentioned in the previous post and it's what you see when you first sign up for a key. With new limits, it's easier to talk about "requests per minute" or RPM so that's the nomenclature we're sticking with now. That entry level 10RPM model will work for well in excess of 90% of current subscribers and it's only a very small percentage of the existing subscriber base exceeding it. (And yes, again, I know these requests are sometimes made in bursts but even still, 10RPM is far in excess of the vast majority of use cases.)

There are economies of scale that have been factored in here. Going from 10RPM to 100RPM isn't a 10x increase, it's about a 7x increase. Going to 5 times more requests is only 4 times the price, and so on and so forth. The hope is that this makes it easier for the folks who were previously buying multiple keys to justify scratching all the kludge previously used to do that and replacing it with a single key at a higher RPM.

To get to this outcome, we trawled back through heaps of data ranging from the high-level aggregated stats in the earlier chart to the nature of the organisations buying multiple keys (which we can obviously determine based on the email address used). I also chatted with a bunch of API users both during this process and over the preceding years and have a pretty good sense of the use cases. A few trends became immediately clear:

Firstly, use cases that are genuinely personal have a very low rate limit requirement. Checking your own address(es) or those of your family by a custom app, for example. Or one of my favourite uses (and one I definitely use), the Home Assistant integration:

The Have I Been Pwned API Now Has Different Rate Limits and Annual Billing

On an ongoing basis, HA makes 1 request every 15 minutes. That's all. Each time we looked at genuine personal use cases, 10RPM was plenty.

Next, we found a bunch of use cases used within internal corporate environments, for example to monitor staff exposure in breaches. Now we're talking larger numbers of requests, but it's also something that's way more efficiently done via the existing domain search feature on the website. It's an on-demand, self-service and totally free feature that's been there for years. I know it's not API-based and there are good reasons for that (see the comment from me on that idea), but there's also the Enterprise route if API access is really that important (more on that later). Other examples included things like scanning customer emails to assess exposure at points where, for example, account takeover was a risk. In each of these cases, we're primarily talking about business entities using the service and I'm comfortable with commercial ventures wearing a greater cost.

And finally, there were the "heavy hitters", the ones with large volumes of keys. One such example using the API en masse provides security services to the big end of town and was funded to the tune of a figure that looks like a phone number. And again, I'm perfectly comfortable with them wearing a cost that's more commensurate to the value as opposed to a figure that was originally arrived at just to keep the bad guys out.

Existing Subscribers are Grandfathered in for 60 Days

Before I talk about the annual pricing, I want to make sure this headline is clear. Nothing changes for existing subscribers until the 6th of Jan next year, which is 60 days from today. On that date, the legacy rate limit of 1 request every 1,500ms will roll to the new 10RPM limit at exactly the same price. For that handful of big users for whom the 10RPM limit will be insufficient, you've got a couple of months to work out the best path forward. I'll be emailing every single active subscriber today to ensure everyone is notified well in advance (there's also an updated Terms of Use which requires a notification email to be sent).

What does this mean in practical terms? If you want annual billing or a higher rate limit, you can go and implement that whenever you're ready (more on that soon). Alternatively, if you just want to stick with 10 RPM then you don't have to do anything, nothing will change. What I do strongly suggest though (and this hasn't changed, it's always been the guidance), is to make sure you're handling HTTP 429 responses gracefully. Regardless of what your rate limit is, if you're consuming the API in a fashion where you're not directly controlling the rate yourself, make sure you handle those responses appropriately.

Billing Can Now Occur Annually

This is the easy one to explain: annual payments are now a thing 😊 As I explained in the previous blog post, frequent payments of small amounts can play havoc with reimbursements in the corporate environment. It sucks, I've been there, but it is what it is. Annual billing alleviates that through a combination of a 12x reduction in the frequency of an expense claim and a larger single sum that's easier to explain to your procurement people than $3.50.

So, what do you charge for annual rather than monthly billing? My initial temptation was just to make it literally 12 times more because I don't have a lot of patience for spivvy marketing guff. However, there's a valid case to be made that a 12x reduction on individual payments warrants a discount as it removes overhead from our end (there's a constant percentage of all payments that are disputed or fail or cause other demands on our time), plus there's an argument to be made along the lines of customer loyalty warranting a discount. There's also just the very simple mathematics of the whole thing, best illustrated by a recent payment in Stripe:

The Have I Been Pwned API Now Has Different Rate Limits and Annual Billing

That's 8.5% that disappears on every transaction, largely due to the 30c AUD charge no matter what the price of the transaction is:

The Have I Been Pwned API Now Has Different Rate Limits and Annual Billing

The point is that there's merit for all in incentivising annual rather than monthly payments. We decided to look at what a typical annual discount was and time and time again, found the same thing:

The Have I Been Pwned API Now Has Different Rate Limits and Annual Billing
The Have I Been Pwned API Now Has Different Rate Limits and Annual Billing
The Have I Been Pwned API Now Has Different Rate Limits and Annual Billing

Or in other words, a couple of months for free when you sign up for a year. In fact, coincidentally, that's exactly what I just signed up for with Nabu Casa (Home Assistant cloud) after receiving an email saying annual billing was now available 😊

The Have I Been Pwned API Now Has Different Rate Limits and Annual Billing

It's never exactly 17%, rather it's like each example took 17% off 12 month's worth of a normal monthly fee then moved the number to something that looked pretty πŸ™‚ Some examples were less (Pluralsight is 14%) and others were more (the higher tiers of Zendesk are 20%), but ultimately we decided to work to that 17% number and came up with the following:

The Have I Been Pwned API Now Has Different Rate Limits and Annual Billing

In keeping with the "pay for 10, get 12" theme, these prices are exactly 10 times the monthly ones. Easy peasy.

Stripe Customer Portal Magic Makes Changing Plans Easy

As I mentioned in the "big changes ahead" blog post, I've been deleting code like crazy in favour of deferring more processing back to Stripe themselves. By using their Customer Portal paradigm, it's now easy to change an existing plan:

The Have I Been Pwned API Now Has Different Rate Limits and Annual Billing

The change can be to a different rate limit or to a different renewal cadence:

The Have I Been Pwned API Now Has Different Rate Limits and Annual Billing

Stripe automatically proratas everything too so whilst you can upgrade immediately to a higher RPM or from monthly to annually, you'll only pay for the difference between the previous plan and the new one. Or, you can downgrade and on next renewal the lower plan will be automatically applied. It's super simple and it's all self-service.

Enterprise

For more than 7 years now, a small handful of organisations have used HIBP in a larger scale commercial fashion. Some of them you're familiar with, for example both 1Password and Mozilla do email address searches using k-Anonymity and that's not something that's a self-service "put your card into Stripe" sort of model (in part because k-Anonymity returns a huge number of results for each search). Infosec firms use Enterprise to support customers via domain level API searches. Identity theft companies use it to advise customers when they're exposed in a breach. One firm even uses it to help detect bot signups; it turns out that so many of us are so pwned, if someone signs up for their service and they're not pwned, that's a little bit suspicious (that's just one of many indicators they use).

This is a fundamentally different model, one that involves a close working relationship, lots of legal documents, procurement people, invoicing instead of credit cards and all sorts of other "Enterprisey" things. That still exists and nothing in today's blog post changes that. I mention this now in today's post simply because some of the folks from those organisations with Enterprise subscriptions will read this post and wonder where they sit. Likewise, I suspect those "100+ key" subscribers of the public API really should be on Enterprise and I'll be reaching out to them separately given the rate limit change will have a bigger impact on them than most.

In Closing

For that vast majority of users who are only at a fraction of the old rate limit, nothing changes other than there now being a key available for 17% less than before on an annual subscription. Meanwhile, for the folks battling corporate bureaucracy around small, frequent payments, this will sort you out and give you choices around rate limits you didn't have before.

There will be some people that fall between the cracks of the use cases outlined above and won't be happy with the changes. I expect that - I know it will happen - but I hope the rationale outlined here demonstrates the volume of thought and consideration that has gone into trying the find the sweet spot for pricing and rate limits. I also expect people will ask about adding other rate limits, for example to fill the gaps between say, 100RPM and 500RPM. We started out with more options, but a combination of that creating the whole paradox of choice problem and deeper analysis of how the API was actually being used led us to simplifying things. But who knows over the longer term, feedback is certainly welcome.

Lastly, if you're watching closely, you'll notice a lot more structure going in around the way HIBP is run. Last week I wrote about rolling out Zendesk for support so there's now a formal ticketing system in place. I also explained how Charlotte is playing a very active role in the management of HIBP and in the coming months, you'll see more around other initiatives to make the project more sustainable. I'm thinking of it like this: what must HIBP do to be sustainable in a post-Troy world? Or in other words, how can we get what has increasingly become an essential service for so many to be more robust and more self-sustaining beyond what one person can do as a sole operator devoting spare time to a passion project.

Stay tuned, there's much more to come πŸ™‚

Better Supporting the Have I Been Pwned API with Zendesk

By Troy Hunt
Better Supporting the Have I Been Pwned API with Zendesk

I've been investing a heap of time into Have I Been Pwned (HIBP) lately, ranging from all the usual stuff (namely trawling through masses of data breaches) to all new stuff, in particular expanding and enhancing the public API. The API is actually pretty simple: plug in an email address, get a result, and that's a very clearly documented process. But where things get more nuanced is when people pay money for it because suddenly, there are different expectations. For example, how do you cancel a subscription once it's started? You could read the instructions when signing up for a key, but who remembers what they read months ago? There's also a greater expectation of support for everything from how to construct an API request to what to do when you keep getting 429 responses because you're (allegedly) making too many requests. And yes, some of these queries are, um, "basic", but they're still things people want support with.

In the beginning, all emails from HIBP came from noreply@haveibeenpwned.com because I simply wasn't geared up to provide support. In my naivety, I assumed people would see "noreply" and not reply. Instead, they'd send email to that address, get frustrated when there was no reply (from the "noreply" address...) and seek out my personal contact info. Or they'd lodge a dispute with Stripe because they'd emailed noreply@ asking for their subscription to be cancelled and it wasn't. So, back in September I started looking for a better solution:

I’m thinking of setting up a more formal support process for @haveibeenpwned, especially for folks buying API keys and having queries around billing or implementation. Any suggestions on a service? Something that can triage requests, perhaps also have FAQs. Thoughts?

β€” Troy Hunt (@troyhunt) September 29, 2022

This was a non-trivial exercise. We've all used support services before, so we have an idea of what to expect from an end user perspective, but it's a different story once you dive into all the management bits behind them. Frankly, I find this sort of thing mind-numbing but fortunately it's a task my amazing wife Charlotte picked up with gusto. She has become increasingly involved in all things troyhunt.com and HIBP lately as she brings order, calm and frankly, much needed sanity into my otherwise crazy, demanding professional life. We also figured that if we did this right, she'd be able to handle a lot of the support queries I previously did myself, so she was always going to play a big part in choosing the support platform.

Largely based on Charlotte's work, we settled on Zendesk and about a week ago, silently pushed out support.haveibeenpwned.com:

Better Supporting the Have I Been Pwned API with Zendesk

There are FAQs that cover a bunch of frequent questions, troubleshooting that addresses common problems and, of course, the ability to submit a request if you still need help. These are all a work in progress, and we'll add a lot more content in response to queries, just so long as they're about the right thing. Speaking of which:

This service is only for users of the public commercial API key, not for general HIBP queries.

Why? Because I constantly get queries like this:

Uh… and why am I sleeping during the day?! pic.twitter.com/BUGTJtgl7t

β€” Troy Hunt (@troyhunt) November 1, 2022

Is that even a query?! I don't know! But I do know that someone took the time to track down my personal email address this week and send it to me, and it's not the sort of thing we're going to be responding to on Zendesk. Nor are queries along the lines of the following:

I've been pwned, now what?

Or:

How do I remove my data from data breaches?

Or one of my personal favourites:

I demand you delete all my data from the data breaches or you'll get a letter from my lawyer!

This whole data breach landscape is a foreign concept for many people, and I understand there being questions, but Charlotte and I can't simultaneously run a free service and reply to queries like this from the masses. But the queries that come in via Zendesk are something we can manage as it's clearly scoped, there's lots of supporting docs and for the most part, we're dealing with tech professionals who understand this world a bit better than your average punter in the first place.

As I announced in last week's blog post, we're pushing ahead with new rate limits and annual billing for the API key and getting this piece out first was always an important prerequisite. It's all part of gearing up for bigger things ahead for HIBP 😊

Big Changes are Afoot: Expanding and Enhancing the Have I Been Pwned API

By Troy Hunt
Big Changes are Afoot: Expanding and Enhancing the Have I Been Pwned API

Just over 3 years ago now, I sat down at a makeshift desk (ok, so it was a kitchen table) in an Airbnb in Olso and built the authenticated API for Have I Been Pwned (HIBP). As I explained at the time, the primary goal was to combat abuse of the service and by adding the need to supply a credit card, my theory was that the bad guys would be very reluctant to, well, be bad guys. The theory checked out, and now with the benefit of several years of data, I can confidently say abuse is near non-existent. I just don't see it. Which is awesome 😊

But there were other things I also didn't see, and it's taken a while for me to get around to addressing them. Some of them are fixed now (like right now, already in production), and some of them will be fixed very, very soon. I think it's all pretty cool, let me explain:

Payments Can Be Hard... if You Don't Stripe Right

A little more background will help me explain this better: in the opening sentence of this blog post I mentioned building the original authenticated API out on a kitchen table at an Airbnb in Oslo. By that time, everyone knew I was going through an M&A process with HIBP I called Project Svalbard, which ultimately failed. What most people didn't know at the time was the other very stressful goings on in my life which combined, had me on a crazy rollercoaster ride I had little control over. It was in that environment that I created the authenticated API, complete with the Azure API Management (APIM) component and Stripe integration. It was rough, and I wish I'd done it better. Now, I have.

In the beginning, I pushed as much of the payment processing as possible to the HIBP website. This was due to a combination of me wanting to create a slick UX and frankly, not understanding Stripe's own UI paradigms. It looked like this:

Big Changes are Afoot: Expanding and Enhancing the Have I Been Pwned API

Cards never ended up hitting HIBP directly, rather the site did a dance with Stripe that involved the card data going to them directly from the client side, a token coming back and then that being used for the processing. It worked, but it had numerous problems ranging from lack of support for things like 3D Secure payments, no support for other payments mechanisms such as Google Pay and Apple Pay and increasingly, large amounts of plumbing required to tie it all together. For example, there were hundreds of lines of code on my end to process payments, change the default card and show a list of previous receipts. The Stripe APIs are extraordinarily clever, but I couldn't escape writing large troves of my own code to make it work the way I originally designed it.

Two new things from Stripe since I originally wrote the code have opened up a whole new way of doing this:

  1. Customer Portal: This is a fully hosted environment where payments are made, cards and subscriptions are managed, invoices and receipts are retrieved and basically, a huge amount of the work I'd previously hand-built can be managed by them rather than by me
  2. Embeddable Pricing Table: This brings the products and prices defined in Stripe into the UI of third party services (such as HIBP) such that customers can select their product then head off to Stripe and do the purchasing there

Rolling to these services removed a huge amount of code from HIBP with the bulk of what's left being email address verification, API key management and handling callbacks from Stripe when a payment is successful. What all this means is that when you first create a subscription, after verifying your email address, you see these two screens:

Big Changes are Afoot: Expanding and Enhancing the Have I Been Pwned API
Big Changes are Afoot: Expanding and Enhancing the Have I Been Pwned API

That's the embeddable pricing table following by Stripe's own hosted payment page. I left the browser address bar in the latter to highlight that this is served by Stripe rather than HIBP. I love distancing myself from any sort of card processing and what's more, everything to do with actually taking the payment is now Stripe's problem 😊 If you're interested in the mechanics of this, a successful payment calls a webhook on HIBP with the customer's details which updates their account with a month of API key whilst the screen above redirects them over to the HIBP website where they can grab their key. Easy peasy.

I silently rolled this out a week ago, watched it being used, made a few little tweaks and then waited until now to write about it. The rollout coincided with a typical email I've received so many times before:

First of all I would like to thank you for the wonderful service that helps people to keep track of their email breaches. I was trying to build a product to provide your services via my website, something similar to Firefox, avast and 100's of other companies doing. We were trying to do it according to the guidelines mentioned in the website. However I am not able to renew my purchase due to payment gateway failures at stripe payment. Requesting you to kindly check the same and advise me on alternate methods for making the payment.

The old model often caused payments to be rejected, especially from subscribers in India. The painful thing for me when trying to help folks is that Stripe would simply report the failed payment as follows:

Big Changes are Afoot: Expanding and Enhancing the Have I Been Pwned API

However, going back to the individual who raised the query above after rolling out this update, things changed very dramatically:

Big Changes are Afoot: Expanding and Enhancing the Have I Been Pwned API

To the title of this section, I simply wasn't "Striping" right. I'm sure there's a way with enough plumbing that it's feasible, but why bother? I cut hundreds of lines of code out just by delegating more of the workload back to them. Further, with ever tightening PCI DSS standards (read Scott's piece, interesting stuff) the less I have to do with cards, the better.

This was a "penny drop" moment for me and it's already made a big difference in a positive way. But there's another penny that dropped for me at the same time: one-off keys were an unnecessary problem.

There Are No More One-Off Keys

It was at the moment I was ripping out those hundreds of lines of code that I wondered: why do I have all the additional kludge to support the paradigm of a one-off key that only lasts a month? Why had I built (and was now maintaining) server side code to handle different types of purchases and UX paradigms to represent one-off versus recurring keys? My gut feel was that most payments formed part of an ongoing subscription but hey, who needs gut feels when you have real data?! So I pulled the numbers:

Only 7% of payments were one-offs, with 93% of payments forming part of ongoing subscriptions.

And so I killed the one-off keys. Kinda, because you can still have a key for only one month, you just purchase a monthly subscription then immediately cancel it via the Stripe Customer Portal:

Big Changes are Afoot: Expanding and Enhancing the Have I Been Pwned API

That's linked into from the API key dashboard on HIBP and it'll take all of 5 seconds to do (also note the ability to change payment method directly on the Stripe site). I've added text to that effect on the HIBP website (you may have spotted that in the earlier screen cap) so in practice, the ability to purchase a one-off key is still there and the main upside of this is that I've just killed a trove of code I no longer have to worry about πŸ™‚ Because this is the internet, I'm sure someone will still be upset, but if you only want a key for a month then that capability still well and truly exists.

All of this so far amounts to doing the same things that were always there but better. Now let's talk about the all new stuff!

Annual Billing and Different Rate Limits are Coming... Very Soon!

The title is self-explanatory and "very soon" is in about 2 weeks from now 😎

Let me illustrate the first part of that title with a message I received recently:

Is there a way to procure a 10 year API key? Our client wants to use the Have I been Pwned plugin for [redacted service name]; however, the $3.50 monthly subscription is too small to go through procurement.

What's that saying about no good deed going unpunished? In my naivety, I made the pricing low with the thinking that was a good thing, yet here we are with that posing barriers! This was a recurring message over and over again with folks simply struggling to get their $3.50 reimbursed. I should have seen this coming after years of living the corporate life myself (I have vivid flashbacks of how hard it was to get small sums reimbursed), and filling out an untold number of expense reports. Speaking of which, this was another recurring theme:

Is there a way to pay yearly for HIBP API access vs monthly? Β Monthly adds overhead in paperwork.

And again, I get it, this is a painful process. It somehow feels even more painful due to the fact the sum is so low; how much time are people burning trying to justify $3.50 to their boss?! It's painful, and this likely explains why the request for annual payments is the second most requested idea on HIBP's UserVoice. The comments there speak for themselves, and I'm having corporate PTSD flashbacks just reading them again now!

Sticking with the UserVoice theme, the 5th most requested feature is for different pricing on different rate limits. This is mostly self-explanatory but what I wasn't aware of until I went and pulled the stats was just how many people were hacking around the rate limit problem. There are heaps of API accounts like this:

hibp+1@domain.com
hibp+2@domain.com
hibp+3@domain.com
...

Because there can only be one key per email address, organisations are creating heaps of unique sub-addressed emails in order to buy multiple keys. This would have been a manual, laborious process; there's no automated way to do this, quite the contrary with anti-automation controls built into the process. Further, each key has it's own rate limit so I imagine they were also building a bunch of plumbing in the back end to then distribute requests across a collection of keys which, yeah, I get it, but man that seems like hard work! When I say "a collection of keys", I'm not just talking about a few of them either; the largest number of active in-use keys by a single organisation is 112. One hundred and twelve! The next largest is 110. I never expected that 🀯 (Incidentally, these orgs and the others obtaining multiple keys are all precisely the kinds I want using the API to do good things.)

Building the mechanics of annual billing and different rate limits is only part of the challenge and most of that is already done, the harder part is pricing it. I'm pulling troves of analytics from APIM at present to better understand the usage patterns, and it's quite interesting to see the data as it relates to requests for the API:

Big Changes are Afoot: Expanding and Enhancing the Have I Been Pwned API

There's no persistent logging of the actual queries themselves, but APIM makes it easy to understand both the volume of queries and how many of them are successful versus failed, namely because they exceed the existing rate limit or were made with an invalid (likely expired) key. So, that's what I need to work out over the next couple of weeks when I'll launch everything and write it up, as always, in detail πŸ™‚

Summary

The HIBP API has become an increasingly important part of all sorts of different tools and systems that use the data to help protect people impacted by data breaches. The changes I've pushed out over the last week help make the service more accessible and easier to manage, but it's the coming changes I'm most excited about. These are the ones that will make life so much easier on so many people integrating the service and, I sincerely hope, will enable them to do things that make a much more profound impact on all of us who've been pwned before.

Go and check out how the whole API key process works, I'd love to hear your feedback 😊

❌