
Microsoft accidentally made 38 TB of internal data, including passwords, publicly accessible through a GitHub repository.
Researchers discovered the issue on June 22 and reported it to Microsoft shortly thereafter. Microsoft fixed it on June 24.
The data leak affected a GitHub repository that Microsoft’s AI research group uses to host open-source projects. The repository contains image recognition models and training datasets that can be used to build new neural networks. The information leak was caused by one of the training data files hosted in an Azure Storage account.
Microsoft had meant to share publicly only an AI training dataset but accidentally opened access to the entire Azure Storage account that contained the dataset.
The misconfigured account exposed 38 terabytes’ worth of internal Microsoft files. Among those files were backups of two employee workstations that contained over 30,000 internal Microsoft Teams messages from 359 staffers along with passwords, encryption keys, and other sensitive files.
The exposure could have enabled hackers not only to steal internal Microsoft files but also to launch cyberattacks against users of the GitHub repository through which the dataset was made accessible.
The latter issue was caused by two separate security weaknesses.
The first issue is in the Azure Storage account that hosted the AI training dataset. Threat actors could not only download the 38 terabytes of data in the account but also change or delete existing files.
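The second issue that could have made cyberattacks possible has to do with the AI training dataset itself. Microsoft packaged the dataset into a file format called ckpt using pickle, Python's open-source object serialization module. Pickle is susceptible to arbitrary code execution by design: loading a pickle file can run code embedded in it. A hacker who modified the file could therefore have had malicious code execute on the machine of anyone who later loaded it.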
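The pickle behavior described here can be illustrated with a minimal Python sketch. The Payload class and the harmless arithmetic expression are hypothetical stand-ins chosen for illustration, not anything taken from the exposed files. The sketch only shows that loading pickled bytes can invoke a callable chosen by whoever wrote them.

```python
import pickle


class Payload:
    """Hypothetical stand-in for an object planted in a pickled file."""

    def __reduce__(self):
        # pickle calls __reduce__ to learn how to rebuild the object.
        # Returning (callable, args) makes the loader invoke callable(*args).
        # A harmless arithmetic expression is used here for illustration.
        return (eval, ("6 * 7",))


# The bytes a file's author (or someone who later modified it) would store.
stored_bytes = pickle.dumps(Payload())

# The reader only intends to load data, yet eval runs during loading.
result = pickle.loads(stored_bytes)
print(result)  # 42
```

The loaded value is the return value of the call, not a Payload instance, which is why Python's documentation warns against unpickling data from untrusted sources.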
Had the Azure Storage account not allowed users to change the files it contained, the arbitrary code execution vulnerability would have been impossible to exploit. But because file changes weren’t blocked, it was theoretically possible for hackers to launch cyberattacks before the issue was fixed.
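One way a downstream user could notice that a hosted file had been changed is to compare it against a digest obtained through a separate trusted channel. The sketch below is an illustrative assumption, not something the report says Microsoft or Wiz used. The byte strings and the published_digest name are hypothetical.

```python
import hashlib
import hmac


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, expected_hex: str) -> bool:
    """Check downloaded bytes against a digest from a trusted channel."""
    # compare_digest performs a constant-time comparison.
    return hmac.compare_digest(sha256_of(data), expected_hex)


# Hypothetical file contents and the digest recorded when it was published.
original = b"model weights as published"
published_digest = sha256_of(original)

# The same file after someone with write access altered it.
tampered = original + b" plus injected payload"

ok_original = is_unmodified(original, published_digest)
ok_tampered = is_unmodified(tampered, published_digest)
print(ok_original, ok_tampered)  # True False
```

Such a check helps only if the digest is stored somewhere the attacker cannot write. A digest kept in the same writable storage account could be replaced along with the file.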
The findings were documented by researchers from Wiz.