Organizations depend on personal data. However, some companies seem to have missed the memo that they need to protect this data. TikTok’s two recent security scares, one of them likely connected to data scraping, may have exposed serious flaws in the organization’s security, with severe ramifications for the app’s user base of more than a billion people. But what is data scraping anyway? And how could this incident evolve into further attacks?
The Single Click Exploit
TikTok has dominated the consumer app landscape since its launch in 2016. Considered the most downloaded app throughout 2021, TikTok has been installed a grand total of 3 billion times.
However, Microsoft bug hunters recently found that TikTok’s Android app had shipped with a major flaw. The Android app alone has been installed at least 1.5 billion times, representing a solid chunk of the app’s user base.
The exploit chained together several vulnerabilities and would have allowed attackers to take over any account they chose, completely bypassing authentication. The entire payload could be condensed into a single click: a victim only had to follow a specially crafted URL.
Once the link was clicked, the vulnerable app would load the attacker’s URL in its WebView, which exposed the app’s JavaScript bridges. By invoking those bridges, attackers could act on the victim’s behalf and alter the TikTok profile in a variety of ways, including publicizing private videos, sending messages, and uploading videos.
TikTok was already battling deep suspicion. Its Chief of Security recently had to step down, following allegations that TikTok employees were able to access American user data. In an attempt to placate the suspicious, TikTok also began moving its US user data over to Oracle Cloud.
Though no instances of in-the-wild exploitation were discovered, Microsoft’s public bug notice set the stage for the rumors that followed: claims of one of the most significant data breaches this year.
The New Data Breach
Breach Forums hosts a censorship-free patchwork of breach, hacking, and vulnerability information. On September 3rd, user AgainstTheWest posted what were claimed to be screenshots of a joint TikTok and WeChat breach. AgainstTheWest had yet to decide whether to sell the data or release it to the public; either option would create significant legal and social issues for the social media giant ByteDance.
Alongside the screenshots, the post included a few other attachments: a video of a set of database tables and a link to two samples of user data. AgainstTheWest claimed to have extracted a total of 2 billion records.
On Twitter, the would-be attacker went on to hint at how so many records were apparently obtained. The posts insulted the company’s cloud implementation and called the password protecting the database ‘trashy’, which points toward credential theft, perhaps via phishing or even brute-force methods.
However, it’s easy to cause a stir by simply stating you’ve hacked a major international corporation. Far more difficult is the task of actually committing these cybercrimes. Cybersecurity expert Troy Hunt, creator of the Have I Been Pwned data breach tracker, weighed in on the online discourse. After analyzing the data included in the various screenshots and videos, he found the evidence for a genuine breach inconclusive.
Some of the released data did match production data; some was complete junk, though he admitted it could be test or otherwise non-production data. He concluded that it’s a pretty ‘mixed bag’.
The most likely source of these leaked files is data scraping of TikTok’s publicly accessible servers. This technically falls short of the definition of a data breach, since scraping data from public servers requires no illicit breaking and entering. It can still, however, be highly immoral.
Data scraping is the automated, often industrial-scale extraction of data from a site or server into separate files or spreadsheets. With a scraping tool, the data is copied en masse, almost completely automatically. This can be for personal use, commercial applications, or nefarious ends.
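To make the definition concrete, here is a minimal sketch of what a scraper does, using only Python’s standard library. Everything in it is hypothetical and chosen for illustration: the `SAMPLE_HTML` markup, the `profile`, `name` and `followers` class names, and the `ProfileParser` and `scrape_to_csv` helpers. It does not reflect TikTok’s pages or any tool mentioned in this article, and it parses an inline string rather than downloading anything.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page markup; a real scraper would fetch this over HTTP.
SAMPLE_HTML = """
<ul>
  <li class="profile"><span class="name">alice</span><span class="followers">120</span></li>
  <li class="profile"><span class="name">bob</span><span class="followers">45</span></li>
</ul>
"""


class ProfileParser(HTMLParser):
    """Collects the text inside <span class="name"> and <span class="followers">."""

    FIELDS = {"name", "followers"}

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per profile found
        self._field = None  # field whose text we are currently inside

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "profile":
            self.rows.append({})  # start a new record
        elif tag == "span" and css_class in self.FIELDS:
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside one of the tracked spans.
        if self._field and self.rows:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_to_csv(html: str) -> str:
    """Extract every profile from the markup and return it as CSV text."""
    parser = ProfileParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "followers"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()


print(scrape_to_csv(SAMPLE_HTML))
```

Point the same loop at thousands of public pages and the output becomes a spreadsheet of user records, which is why scraped datasets can look so much like a breach dump.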
The Security Risks of Data Scraping
Data scraping is not inherently illegal. Established tech giants such as Amazon Web Services (AWS) offer APIs that let customers securely scrape web pages for content. Much like any software, data scraping only becomes dangerous when the tools fall into the wrong hands.
For instance, data scraping that focuses on phone numbers and email addresses is highly immoral, and is restricted in the UK and the EU under the General Data Protection Regulation (GDPR), which requires a lawful basis for collecting personal data. If cybercriminals get hold of this data, it becomes a goldmine for phishing and other forms of fraud.
Whilst scraping may technically be less intrusive than breaking into an account, the long-term ramifications can be just as ugly.
The large-scale LinkedIn scraping incident of 2021 fed the databases of numerous phishing organizations. Information that lets attackers conduct spear-phishing attacks on C-suite executives is uniquely powerful.
Unfortunately for TikTok, 25% of its active users are aged between 10 and 19 years. This age range is far more susceptible to phishing attacks, thanks to a still-developing understanding of secure browsing practices. Data scraping not only gives attackers a direct line to underage users; it also puts the rest of the vast user base at risk.
Data scraping can open the door to far more convincing phishing attacks, as hackers can learn the names of online friends, ongoing hobbies and passions, and more. Everything a malicious actor needs to know in order to craft a highly convincing message is accessible via data scraping.
Data scraping may not represent a breach, but it certainly implies that a company is not responsible with its users’ data. One potent way to protect your sites from data scraping is to rate-limit user requests. The rate at which human visitors click through your site is pretty predictable.
It’s physically impossible for a human to browse 100 web pages per second. Automated scraping tools, however, can handle this with ease. Limiting the number of requests that each IP address can make per second can drastically reduce the exploitation of your website. Organizations owe it to their customers to keep personal data private.
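As an illustration of per-IP request limiting, here is a minimal sliding-window sketch in Python. The `PerIpRateLimiter` class, its parameter names, and the default of five requests per second are assumptions chosen for the example, not a recommended setting or any vendor’s API. It also keeps everything in one process’s memory and never evicts idle addresses, which a production setup would need to handle.

```python
import time
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Sliding window: at most `max_requests` per `window_seconds` for each IP."""

    def __init__(self, max_requests=5, window_seconds=1.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # ip -> timestamps of recently allowed requests

    def allow(self, ip: str) -> bool:
        now = self.clock()
        hits = self._hits[ip]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit; a web server would typically answer HTTP 429
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = PerIpRateLimiter(max_requests=5, window_seconds=1.0)
    # A burst of 100 requests from one address, as an automated scraper might send.
    allowed = sum(limiter.allow("203.0.113.7") for _ in range(100))
    print(f"allowed {allowed} of 100 burst requests")
```

A human visitor clicking through pages rarely comes near a limit like this, while a burst from a scraper is cut off almost immediately. Bear in mind that many legitimate users can share one IP address and that scrapers can rotate addresses, so a per-IP limit is best treated as one layer of defense rather than a complete one.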