There's a "hidden" API on HIBP. Well, it's not "hidden" insofar as it's easily discoverable if you watch the network traffic from the client, but it's not meant to be called directly, rather only via the web app. It's called "unified search" and it looks just like this:
It's been there in one form or another since day 1 (so almost a decade now), and it serves a sole purpose: to perform searches from the home page. That is all - only from the home page. It's called asynchronously from the client without needing to post back the entire page and by design, it's super fast and super easy to use. Which is bad. Sometimes.
To understand why it's bad we need to go back in time all the way to when I first launched the API that was intended to be consumed programmatically by other people's services. That was easy, because it was basically just documenting the API that sat behind the home page of the website already, the predecessor to the one you see above. And then, unsurprisingly in retrospect, it started to be abused so I had to put a rate limit on it. Problem is, that was a very rudimentary IP-based rate limit and it could be circumvented by someone with enough IPs, so fast forward a bit further and I put auth on the API which required a nominal payment to access it. At the same time, that unified search endpoint was created and home page searches updated to use that rather than the publicly documented API. So, 2 APIs with 2 different purposes.
The primary objective for putting a price on the public API was to tackle abuse. And it did - it stopped it dead. By attaching a rate limit to a key that required a credit card to purchase it, abusive practices (namely enumerating large numbers of email addresses) disappeared. This wasn't just about putting a financial cost to queries, it was about putting an identity cost to them; people are reluctant to start doing nasty things with a key traceable back to their own payment card! Which is why they turned their attention to the non-authenticated, non-documented unified search API.
Let's look at a 3 day period of requests to that API earlier this year, keeping in mind this should only ever be requested organically by humans performing searches from the home page:
This is far from organic usage with requests peaking at 121.3k in just 5 minutes. Which poses an interesting question: how do you create an API that should only be consumed asynchronously from a web page and never programmatically via a script? You could chuck a CAPTCHA on the front page and require that be solved first but let's face it, that's not a pleasant user experience. Rate limit requests by IP? See the earlier problem with that. Block UA strings? Pointless, because they're easily randomised. Rate limit an ASN? It gets you part way there, but what happens when you get a genuine flood of traffic because the site has hit the mainstream news? It happens.
Over the years, I've played with all sorts of combinations of firewall rules based on parameters such as geolocations with incommensurate numbers of requests to their populations, JA3 fingerprints and, of course, the parameters mentioned above. Based on the chart above these obviously didn't catch all the abusive traffic, but they did catch a significant portion of it:
If you combine it with the previous graph, that's about a third of all the bad traffic in that period or in other words, two thirds of the bad traffic was still getting through. There had to be a better way, which brings us to Cloudflare's Turnstile:
With Turnstile, we adapt the actual challenge outcome to the individual visitor or browser. First, we run a series of small non-interactive JavaScript challenges gathering more signals about the visitor/browser environment. Those challenges include, proof-of-work, proof-of-space, probing for web APIs, and various other challenges for detecting browser-quirks and human behavior. As a result, we can fine-tune the difficulty of the challenge to the specific request and avoid ever showing a visual puzzle to a user.
"Avoid ever showing a visual puzzle to a user" is a polite way of saying they avoid the sucky UX of CAPTCHA. Instead, Turnstile offers the ability to issue a "non-interactive challenge" which implements the sorts of clever techniques mentioned above and as it relates to this blog post, that can be an invisible non-interactive challenge. This is one of 3 different widget types with the others being a visible non-interactive challenge and a non-intrusive interactive challenge. For my purposes on HIBP, I wanted a zero-friction implementation nobody saw, hence the invisible approach. Here's how it works:
Get it? Ok, let's break it down further as it relates to HIBP, starting with when the front page first loads and it embeds the Turnstile widget from Cloudflare:
<script src="https://challenges.cloudflare.com/turnstile/v0/api.js" async defer></script>
The widget takes responsibility for running the non-interactive challenge and returning a token. This needs to be persisted somewhere on the client side which brings us to embedding the widget:
<div ID="turnstileWidget" class="cf-turnstile" data-sitekey="0x4AAAAAAADY3UwkmqCvH8VR" data-callback="turnstileCompleted"></div>
Per the docs in that link, the main thing here is to have an element with the "cf-turnstile" class set on it. If you happen to go take a look at the HIBP HTML source right now, you'll see that element precisely as it appears in the code block above. However, check it out in your browser's dev tools so you can see how it renders in the DOM and it will look more like this:
Expand that DIV tag and you'll find a whole bunch more content set as a result of loading the widget, but that's not relevant right now. What's important is the data-token attribute because that's what's going to prove you're not a bot when you run the search. How you implement this from here is up to you, but what HIBP does is picks up the token and sets it in the "cf-turnstile-response" header then sends it along with the request when that unified search endpoint is called:
So, at this point we've issued a challenge, the browser has solved the challenge and received a token back, now that token has been sent along with the request for the actual resource the user wanted, in this case the unified search endpoint. The final step is to validate the token and for this I'm using a Cloudflare worker. I've written a lot about workers in the past so here's the short pitch: it's code that runs in each one of Cloudflare's 300+ edge nodes around the world and can inspect and modify requests and responses on the fly. I already had a worker to do some other processing on unified search requests, so I just added the following:
const token = request.headers.get('cf-turnstile-response');
if (token === null) {
return new Response('Missing Turnstile token', { status: 401 });
}
const ip = request.headers.get('CF-Connecting-IP');
let formData = new FormData();
formData.append('secret', '[secret key goes here]');
formData.append('response', token);
formData.append('remoteip', ip);
const turnstileUrl = 'https://challenges.cloudflare.com/turnstile/v0/siteverify';
const result = await fetch(turnstileUrl, {
body: formData,
method: 'POST',
});
const outcome = await result.json();
if (!outcome.success) {
return new Response('Invalid Turnstile token', { status: 401 });
}
That should be pretty self-explanatory and you can find the docs for this on Cloudflare's server-side validation page which goes into more detail, but in essence, it does the following:
And because this is all done in a Cloudflare worker, any of those 401 responses never even touch the origin. Not only do I not need to process the request in Azure, the person attempting to abuse my API gets a nice speedy response directly from an edge node near them 🙂
So, what does this mean for bots? If there's no token then they get booted out right away. If there's a token but it's not valid then they get booted out at the end. But can't they just take a previously generated token and use that? Well, yes, but only once:
If the same response is presented twice, the second and each subsequent request will generate an error stating that the response has already been consumed.
And remember, a real browser had to generate that token in the first place so it's not like you can just automate the process of token generation then throw it at the API above. (Sidenote: that server-side validation link includes how to handle idempotency, for example when retrying failed requests.) But what if a real human fails the verification? That's entirely up to you but in HIBP's case, that 401 response causes a fallback to a full page post back which then implements other controls, for example an interactive challenge.
Time for graphs and stats, starting with the one in the hero image of this page where we can see the number of times Turnstile was issued and how many times it was solved over the week prior to publishing this post:
That's a 91% hit rate of solved challenges which is great. That remaining 9% is either humans with a false positive or... bots getting rejected 😎
More graphs, this time how many requests to the unified search page were rejected by Turnstile:
That 990k number doesn't marry up with the 476k unsolved ones from before because they're 2 different things: the unsolved challenges are when the Turnstile widget is loaded but not solved (hopefully due to it being a bot rather than a false positive), whereas the 401 responses to the API is when a successful (and previously unused) Turnstile token isn't in the header. This could be because the token wasn't present, wasn't solved or had already been used. You get more of a sense of how many of these rejected requests were legit humans when you drill down into attributes like the JA3 fingerprints:
In other words, of those 990k failed requests, almost 40% of them were from the same 5 clients. Seems legit 🤔
And about a third were from clients with an identical UA string:
And so on and so forth. The point being that the number of actual legitimate requests from end users that were inconvenienced by Turnstile would be exceptionally small, almost certainly a very low single-digit percentage. I'll never know exactly because bots obviously attempt to emulate legit clients and sometimes legit clients look like bots and if we could easily solve this problem then we wouldn't need Turnstile in the first place! Anecdotally, that very small false positive number stacks up as people tend to complain pretty quickly when something isn't optimal, and I implemented this all the way back in March. Yep, 5 months ago, and I've waited this long to write about it just to be confident it's actually working. Over 100M Turnstile challenges later, I'm confident it is - I've not seen a single instance of abnormal traffic spikes to the unified search endpoint since rolling this out. What I did see initially though is a lot of this sort of thing:
By now it should be pretty obvious what's going on here, and it should be equally obvious that it didn't work out real well for them 😊
The bot problem is a hard one for those of us building services because we're continually torn in different directions. We want to build a slick UX for humans but an obtrusive one for bots. We want services to be easily consumable, but only in the way we intend them to... which might be by the good bots playing by the rules!
I don't know exactly what Cloudflare is doing in that challenge and I'll be honest, I don't even know what a "proof-of-space" is. But the point of using a service like this is that I don't need to know! What I do know is that Cloudflare sees about 20% of the internet's traffic and because of that, they're in an unrivalled position to look at a request and make a determination on its legitimacy.
If you're in my shoes, go and give Turnstile a go. And if you want to consume data from HIBP, go and check out the official API docs, the uh, unified search doesn't work real well for you any more 😎
You’ve probably never heard of “16Shop,” but there’s a good chance someone using it has tried to phish you.
A 16Shop phishing page spoofing Apple and targeting Japanese users. Image: Akamai.com.
The international police organization INTERPOL said last week it had shuttered the notorious 16Shop, a popular phishing-as-a-service platform launched in 2017 that made it simple for even complete novices to conduct complex and convincing phishing scams. INTERPOL said authorities in Indonesia arrested the 21-year-old proprietor and one of his alleged facilitators, and that a third suspect was apprehended in Japan.
The INTERPOL statement says the platform sold hacking tools to compromise more than 70,000 users in 43 countries. Given how long 16Shop has been around and how many paying customers it enjoyed over the years, that number is almost certainly highly conservative.
Also, the sale of “hacking tools” doesn’t quite capture what 16Shop was all about: It was a fully automated phishing platform that gave its thousands of customers a series of brand-specific phishing kits to use, and provided the domain names needed to host the phishing pages and receive any stolen credentials.
Security experts investigating 16Shop found the service used an application programming interface (API) to manage its users, an innovation that allowed its proprietors to shut off access to customers who failed to pay a monthly fee, or for those attempting to copy or pirate the phishing kit.
16Shop also localized phishing pages in multiple languages, and the service would display relevant phishing content depending on the victim’s geolocation.
Various 16Shop lures for Apple users in different languages. Image: Akamai.
For example, in 2019 McAfee found that for targets in Japan, the 16Shop kit would also collect Web ID and Card Password, while US victims will be asked for their Social Security Number.
“Depending on location, 16Shop will also collect ID numbers (including Civil ID, National ID, and Citizen ID), passport numbers, social insurance numbers, sort codes, and credit limits,” McAfee wrote.
In addition, 16Shop employed various tricks to help its users’ phishing pages stay off the radar of security firms, including a local “blacklist” of Internet addresses tied to security companies, and a feature that allowed users to block entire Internet address ranges from accessing phishing pages.
The INTERPOL announcement does not name any of the suspects arrested in connection with the 16Shop investigation. However, a number of security firms — including Akamai, McAfee and ZeroFox, previously connected the service to a young Indonesian man named Riswanda Noor Saputra, who sold 16Shop under the hacker handle “Devilscream.”
According to the Indonesian security blog Cyberthreat.id, Saputra admitted being the administrator of 16Shop, but told the publication he handed the project off to others by early 2020.
16Shop documentation instructing operators on how to deploy the kit. Image: ZeroFox.
Nevertheless, Cyberthreat reported that Devilscream was arrested by Indonesian police in late 2021 as part of a collaboration between INTERPOL and the U.S. Federal Bureau of Investigation (FBI). Still, researchers who tracked 16Shop since its inception say Devilscream was not the original proprietor of the phishing platform, and he may not be the last.
It is not uncommon for cybercriminals to accidentally infect their own machines with password-stealing malware, and that is exactly what seems to have happened with one of the more recent administrators of 16Shop.
Constella Intelligence, a data breach and threat actor research platform, now allows users to cross-reference popular cybercrime websites and denizens of these forums with inadvertent malware infections by information-stealing trojans. A search in Constella on 16Shop’s domain name shows that in mid-2022, a key administrator of the phishing service infected their Microsoft Windows desktop computer with the Redline information stealer trojan — apparently by downloading a cracked (and secretly backdoored) copy of Adobe Photoshop.
Redline infections steal gobs of data from the victim machine, including a list of recent downloads, stored passwords and authentication cookies, as well as browser bookmarks and auto-fill data. Those records indicate the 16Shop admin used the nicknames “Rudi” and “Rizki/Rizky,” and maintained several Facebook profiles under these monikers.
It appears this user’s full name (or at least part of it) is Rizky Mauluna Sidik, and they are from Bandung in West Java, Indonesia. One of this user’s Facebook pages says Rizky is the chief executive officer and founder of an entity called BandungXploiter, whose Facebook page indicates it is a group focused mainly on hacking and defacing websites.
A LinkedIn profile for Rizky says he is a backend Web developer in Bandung who earned a bachelor’s degree in information technology in 2020. Mr. Rizky did not respond to requests for comment.
John Clifton Davies, a convicted fraudster estimated to have bilked dozens of technology startups out of more than $30 million through phony investment schemes, has a brand new pair of scam companies that are busy dashing startup dreams: A fake investment firm called Equity-Invest[.]ch, and Diligere[.]co.uk, a scam due diligence company that Equity-Invest insists all investment partners use.
A native of the United Kingdom, Mr. Davies absconded from justice before being convicted on multiple counts of fraud in 2015. Prior to his conviction, Davies served 16 months in jail before being cleared on suspicion of murdering his third wife on their honeymoon in India.
The scam artist John Bernard (left) in a recent Zoom call, and a photo of John Clifton Davies from 2015.
John Clifton Davies was convicted in 2015 of swindling businesses throughout the U.K. that were struggling financially and seeking to restructure their debt. For roughly six years, Davies ran a series of firms that pretended to offer insolvency services. Instead, he simply siphoned what little remaining money these companies had, spending the stolen funds on lavish cars, home furnishings, vacations and luxury watches.
In a three-part series published in 2020, KrebsOnSecurity exposed how Davies — wanted by authorities in the U.K. — had fled the country, taken on the surname Bernard, remarried, and moved to his new (and fourth) wife’s hometown in Ukraine.
After eluding justice in the U.K., Davies reinvented himself as The Private Office of John Bernard, pretending to be a billionaire Swiss investor who made his fortunes in the dot-com boom 20 years ago and who was seeking private equity investment opportunities.
In case after case, Bernard would promise to invest millions in hi-tech startups, only to insist that companies pay tens of thousands of dollars worth of due diligence fees up front. However, the due diligence company he insisted on using — another Swiss firm called The Inside Knowledge — also was secretly owned by Bernard, who would invariably pull out of the deal after receiving the due diligence money.
Bernard found a constant stream of new marks by offering extraordinarily generous finders fees to investment brokers who could introduce him to companies seeking an infusion of cash. Inside Knowledge and The Private Office both closed up shop not long after being exposed here in 2020.
In April 2023, KrebsOnSecurity wrote about Codes2You, a recent Davies venture which purports to be a “full cycle software development company” based in the U.K. The company’s website no longer lists any of Davies’ known associates, but the site does still reference software and cloud services tied to those associates — including MySolve, a “multi-feature platform for insolvency practitioners.”
Earlier this month, KrebsOnSecurity heard from an investment broker who found out his client had paid more than $50,000 in due diligence fees related to a supposed multi-million dollar investment offer from a Swiss concern called Equity-Invest[.]ch.
The investment broker, who spoke on condition that neither he nor his client be named, said Equity-Invest began getting cold feet after his client plunked down the due diligence fees.
“Things started to go sideways when the investor purportedly booked a trip to the US to meet the team but canceled last minute because ‘his pregnant wife got in a car accident,'” the broker explained. “After that, he was radio silent until the contract expired.”
The broker said he grew suspicious when he learned that the Equity-Invest domain name was less than six months old. The broker’s suspicions were confirmed after he discovered the due diligence company that Equity-Invest insisted on using — Diligere[.]co.uk — included an email address on its homepage for another entity called Ardelis Solutions.
A corporate entity in the UK called Ardelis Solutions was key to showing the connection to Davies’ former scam investment and due diligence firms in the Codes2You investigation published earlier this year.
Although Diligere’s website claims the due diligence firm has “13 years of experiance” [sic], its domain name was only registered in April 2023. What’s more, virtually all of the vapid corporate-speak published on Diligere’s homepage is identical to text on the now-defunct InsideKnowledge[.]ch — the fake due diligence firm secretly owned for many years by The Private Office of John Bernard (John Clifton Davies).
A snippet of text from the now-defunct website of the fake Swiss investor John Bernard, in real life John Clifton Davies.
“Our steadfast conviction and energy for results is what makes us stand out,” both sites state. “We care for our clients’ and their businesses, we share their ambitions and align our goals to complement their objectives. Our clients know we’re in this together. We work in close partnership with our clients to deliver palpable results regardless of geography, complexity or controversy.”
The copy on Diligere’s homepage is identical to that once on Insideknowledge[.]com, a phony due diligence company run by John Clifton Davies.
I've been teaching my 13-year old son Ari how to code since I first got him started on Scratch many years ago, and gradually progressed through to the current day where he's getting into Python in Visual Studio Code. As I was writing the new domain search API for Have I Been Pwned (HIBP) over the course of this year, I was trying to explain to him how powerful APIs are:
Think of HIBP as one website that does pretty much one thing; you load it in your browser and search through data breaches which then display on the screen. But when you have an API, it's no longer just locked into your browser, it's in all sorts of other systems. Mobile apps, other websites, dashboards and if you really want, you can even integrate the lights in your room with HIBP! Why? How? Well, there's a Home Assistant integration for HIBP and being pwned in a new breach could raise an event there you can then use YAML to perform an action with, for example flashing a light red. That might be weird and unnecessary, but when you have an API, suddenly all these things you never thought of are possible.
It took Brett Adams less than a day after we released the new domain search API last Monday for him to reach out to me with one of those ideas. He wanted to build a Splunk app (Brett is a Splunk MVP so this was right up his alley) to surface breached data about an organisation's domains right into the place where so many security engineers spend their days. He just wanted 2 new APIs to make the user experience the best it could be:
That seems so ridiculously obvious, why didn't I think of that originally?! But hey, easy fix, so the next day Brett had his APIs. And today, you also have the APIs because they're now all publicly documented and ready for you to consume. You also have Brett's Splunk app and because he's published it to Splunkbase, you can go and pull it into your own Splunk instance, plug in your HIBP API key and it's job done!
I'll leave you with a bunch of screen caps from Brett's work, starting with a zoomed in grab of what I suspect folks will find the most valuable - the addresses on their domains and their appearances across breaches:
That's a fragment of the broader dashboard that also breaks down the incidents over time:
The starting point for this is simply plugging your API key into the interface:
I like these headline figures and I picture particularly large organisations that have gone through various acquisitions of different brands with various domains finding this really useful:
And speaking of breaches, there's a lot of them which Brett has visualised across the course of time:
So that's it, you can see all the APIs documented on the HIBP website and you can grab Brett's app right now from Splunkbase. You can also find all the code for this in Brett's GitHub repo should you wish to have a read through it.
The HIBP APIs are there for other people to build awesome things. If you're one of those people, please get in touch with me and show me what you've created, I can't wait to see more integrations like Brett's 😊