When children are breached—inside the massive VTech hack

Troy Hunt is the founder of haveibeenpwned.com. This post originally appeared on his website and is reprinted with his permission.

Breach source, verification, and (attempted) disclosure

Let me set some context first because this is clearly a very serious incident, and it all began when I was contacted by Lorenzo Bicchierai earlier this week. Lorenzo writes for Motherboard and has often approached me for comments on security incidents in the past. This time, he wanted some help verifying a data breach that had allegedly come from VTech and contained millions of customer records. Someone had gotten in touch with him (I assume as they thought it might make a good story) and he was doing his journo due diligence thing.

Lorenzo passed on the data and I checked it out. I found 4.8 million unique customer e-mail addresses in one of the files and it “smelled” good, that is it didn’t have the typical hallmarks that often accompany a fabricated breach. However it wasn’t quite clear where the data had come from, I mean it’s not like you can just go to vtech.com and there’s a login box that tells you whether or not the account exists (incidentally, I did later discover an API that confirms the presence of an e-mail address at login time). I needed further verification, so I invoked the help of some Have I been pwned? (HIBP) subscribers.

In order to verify the data via HIBP, I had to call on some supporters. One of the features I added to HIBP very early on was the ability to subscribe to notifications:

This is a free service that sends you an e-mail if your account pops up in a data breach. To date, 290,000 people have signed up and verified their e-mail addresses (they need to receive an e-mail at that address and click a unique link). Now I’d always intended for this to be a feature that simply notifies people of breaches as appropriate, but I’ve realized lately that it also means I have an excellent source of individuals supporting the project who can help me verify future data breaches as well.

I took the e-mail addresses from the alleged VTech breach and found 18 recent HIBP subscribers who had a comprehensive set of data in the dump. I e-mailed them asking for support, essentially saying that I’d been passed a data breach that included their details and if they were willing to assist, I’d send them some non-sensitive data attributes to verify. This was usually their month of birth, the city they live in and the name of their ISP based on their IP address. All of these attributes were in the data breach and if the HIBP subscriber could confirm them and acknowledge they had a VTech account, I’d be confident it was legitimate.

I received six responses within 24 hours, every one of them confirming their data:

Yes. That's accurate. I did register at vtech so I could download addons for a toy laptop.

Yes. That's my data.

no doubt about that, I registered a vtech account within the last few months.

Yes, That looks like legitimate information. The service would be VTech’s Learning Lodge.

Yes, that looks like me. I lived near [redacted city] at the time and my daughter had one of their pads. I believe we logged in so that we could download apps from their app store and possibly for firmware updates etc.

Yes that is correct. It's an old address, I was with [redacted ISP] at the time so can verify this info ! I would have used the VTECH website for my daughter around that time too !

Yes I did access the VTech learning lodge in 2014 after purchasing a "Cora Cub" for my child. In order to personalize it's voice activated feature, you had to join the learning lodge.

I was with the broadband provider [redacted ISP] at that time. I have since changed services, unfortunately to TALKTALK!

Can’t help but feel sorry for the last person!

This was more than enough to now have complete confidence in the legitimacy of the data. But before loading it into HIBP, it was essential that VTech be aware of the incident, too, so I pushed Lorenzo on what steps he’d taken. He’s detailed his attempts to get in touch with VTech in the article he published Friday titled "One of the Largest Hacks Yet Exposes Data on Hundreds of Thousands of Kids."

For many days, he simply couldn’t get anyone to talk with him despite the fact they did actually respond (and redirect him) multiple times. As we discussed this incident in the days following his initial contact, at multiple points we talked about means of getting in touch with them and he reached out via various channels time and time again.

Understanding the data breach

Here’s what was originally provided to Lorenzo:

The file that immediately jumps out is the big guy at the top—parent.csv. This file has 4,862,625 rows and column headings as follows:

id email encrypted_password first_name last_name password_hint secret_question secret_answer email_promotion active first_login last_login login_count free_order_count pay_order_count client_ip client_location registration_url country address city state zip updated_datetime

One of the first things I look at in a breach like this is how many unique e-mail addresses there are, as it helps establish whether there are duplicate records. In this case there were 4,833,678 occurrences matching the pattern I extract e-mail addresses on, a few less than the total rows, which is normal due to either duplicates, missing addresses or strings that don’t conform to what an e-mail address looks like (I have a pretty liberal regular expression pattern I use).

The next thing I checked was the passwords, and, while the column heading implies they’re encrypted, they’re not. The easiest way to check what’s going on with password storage is just to Google a few of the values stored in the database. For example, let’s take the very first one in the dump: 835af17f41292ba8ea3270f6859757ab

And here it is:

Their password is “welcome81.” It’s that simple. It’s just a straight MD5 hash, not even an attempt at salting or using a decent hashing algorithm. The vast majority of these passwords would be cracked in next to no time; it’s about the next worst thing you do next to no cryptographic protection at all. Speaking of which…

All secret questions and answers are in plain text. The questions are typical (albeit poor) examples such as your favorite color, where you were born, and your first school. In fact, you can see them in context in this screen from a video I’ll show a bit later:

This aligns with the columns from the parent.csv file I referenced earlier and gives me a high degree of confidence it’s at least one of the locations where parents would have entered data.

Normally this would be the end of the story when it comes to processing a data breach. I’d make the data searchable on HIBP, notify impacted subscribers, and that would be it. But it’s a different story this time and it’s because of those member CSV files. Let’s take a closer look.

Biz & IT —

When children are breached—inside the massive VTech hack

4.8 million records from a Hong Kong toy company were compromised.

Further Reading

Breach source, verification, and (attempted) disclosure

Further Reading

Understanding the data breach

Channel Ars Technica

Further Reading

Breach source, verification, and (attempted) disclosure

Further Reading

Understanding the data breach

reader comments

Channel Ars Technica