
Huge data leak of 1 billion records exposes China’s vast surveillance state – TechCrunch

Written by admin

A huge store of data containing information about a billion Chinese residents could be one of the largest personal data breaches in history.

Portions of the leaked data surfaced last week on a well-known cybercrime forum, posted by a seller who was offering the cache for 10 bitcoins, or about $200,000. The data was allegedly siphoned from a Shanghai police database stored on Alibaba’s cloud.

Although details of the breach remain sparse, portions of the data have been verified as real, suggesting that at least some of it is genuine. It is still unclear where the data came from and how it ended up in the hands of an underground trafficker whose motives are unknown.

In mainland China, where speech and expression are tightly controlled and internet access is heavily censored and restricted, news of the alleged breach has remained largely hidden.

The breach, if authentic, raises questions about the sheer scale of China’s surveillance state, the largest and most expansive in the world, and Beijing’s ability to keep that data safe.

Here’s what we’ve learned so far.

How was the data leaked?

In a now-deleted cybercrime forum post, the seller claimed to have downloaded the data from a cloud storage server hosted by Alibaba Cloud, the cloud computing arm of the Chinese e-commerce giant Alibaba. When reached by TechCrunch on Monday, Alibaba said it was looking into the claims.

Exactly how the data was leaked is unclear, but experts say the database may have been misconfigured and left exposed through human error since April 2021 before it was discovered. That seems to undercut a claim that the database credentials were inadvertently published as part of a technical blog post on a Chinese developer site in 2020 and later used to siphon off the billion records from the police database, since access to the database did not require a password.

Bob Diachenko, a Ukrainian security researcher, told TechCrunch that his own monitoring data showed the database was also exposed in late April via a Kibana dashboard, web-based software for visualizing and searching large Elasticsearch databases. If the database did not require a password, as is believed, anyone who knew its web address could have accessed the data.

Security researchers routinely scan the internet for accidentally exposed databases or other sensitive data, often to collect bug bounties offered by the companies they help secure. But threat actors run the same scans, frequently with the aim of copying the data from an exposed database, deleting it, and offering to return it in exchange for a ransom, a tactic that has become increasingly common among opportunistic criminals in recent years. Diachenko said that is what happened in this case: a malicious actor found, looted, and wiped the exposed database, leaving a ransom note demanding 10 bitcoins for its return.

“My hypothesis here is that the ransom note didn’t work and the perpetrator chose to get money elsewhere. Or another malicious actor came across the data and decided to put it up for sale,” Diachenko said.

Little is known about the seller or why the data was put online. It’s not uncommon for large amounts of personal data to be offered for sale on cybercrime forums and the dark web, but it is rare for data this sensitive, or in such volume, to appear.

What does the data look like?

TechCrunch reviewed a larger sample of data uploaded by the seller, which contained three files totaling about 500 megabytes in size, each containing 250,000 individual records.

The data itself is formatted in JSON, the format Elasticsearch uses for its documents, which makes it easy to read and analyze. The database’s format suggests that it was meticulously maintained and then downloaded, rather than compiled by aggregating information from multiple data sources, a common technique used by data resellers and brokers. That said, some of the data may come from external sources, such as grocery delivery orders.

The sheer size of the data and its level of detail also make it likely to be genuine, since both would be difficult, if not impossible, to fake.

TechCrunch translated the police files, which were written in Chinese, and redacted personal data.

The files appear to contain detailed police reports from 1995 to 2019, including names, addresses, phone numbers, ID numbers, gender, and the reason the police were called. The records viewed by TechCrunch include precise coordinates of where incidents occurred or where police reports were made, which match the exact addresses also listed in each record, along with the names of the people who made the reports and the race and ethnicity of those involved. (The Chinese government has imprisoned more than a million of its own citizens, mostly from Muslim ethnic minorities including Uyghurs and Kazakhs, in what the Biden administration has declared a “genocide.”)

The records contain complaints and criminal allegations, ranging from serious violent crimes to the relatively mundane, such as detailed accounts of credit card fraud, internet fraud and gambling, which is illegal in China. Several records viewed by TechCrunch show police cracking down on the use of VPNs, or virtual private networks, which can be used to access websites blocked by China’s censorship system and are banned in China as a result. One record showed that a Shanghai resident was accused of using a VPN to post critical remarks about the government on Twitter, which is banned in China. What happened to the person after that is not known.

The data also included full web addresses for photos stored on the same server. None were accessible at the time of writing, but the associated data often indicates what was uploaded, such as a person’s residency documents or their passport upon leaving the country. These web addresses are formatted to match the way Alibaba’s cloud service stores files.

Many of the records we examined appeared to contain information about children based on their dates of birth and ages listed in the data.

Without confirmation from the Chinese government, which is unlikely to come, it is difficult to determine with certainty whether the seller’s claims are genuine and whether the data is from the Shanghai Police Department as claimed. But The Wall Street Journal, The New York Times and CNN have each verified parts of the data by calling people whose information was found in the database, lending weight to its authenticity.

What is the impact?

If confirmed, the alleged breach could be highly damaging for Beijing, raising questions about the government’s cybersecurity measures and the impact the breach will have on the individuals whose data was exposed.

It comes at a time when China is strengthening personal data protection. Last year, China passed the Personal Information Protection Law, its first comprehensive data protection legislation, widely regarded as China’s equivalent of Europe’s GDPR. The law restricts how companies can collect and use personal data, with far-reaching effects on the advertising businesses of the country’s biggest tech giants, but it grants sweeping exemptions to the government agencies and departments that make up China’s vast surveillance apparatus.

Beijing is reportedly already censoring news of the alleged breach, and the Chinese messaging apps WeChat and Weibo are blocking posts that mention terms such as “data leak” and “database breach.” The Chinese government has not yet commented on the breach.

It’s not the first security lapse to expose a massive trove of Chinese residents’ data to the open internet without a password. In 2019, TechCrunch reported that a smart city system in China was spilling the contents of a facial recognition database containing records on local residents.


You can contact this reporter on Signal and WhatsApp at +1 646-755-8849 or email zack.whittaker@techcrunch.com.
