DarkBERT – LLM for Dark Side of Internet
A team of South Korean researchers has developed DarkBERT, an AI language model trained on data from the Dark Web.
It was unleashed to trawl and index what it could find to help shed light on ways to combat cybercrime.
The Dark Web is notorious for its anonymous websites and marketplaces, which facilitate illegal activities such as drug and weapon trading and the sale of stolen data, and which serve as a haven for cybercriminals.
The Dark Web employs complex systems that mask its users' IP addresses, making it difficult to trace the websites they have visited. Accessing this part of the web requires specialized software, the most popular of which is Tor, which is used by approximately 2.5 million individuals every day.
With the rise of natural language processing programs like ChatGPT, such technology is increasingly being used to commit new kinds of cybercrime. By developing an AI that can fight fire with fire, the researchers wanted to discover how large language models (LLMs) could help.
The researchers noted that their LLM was far better at making sense of the Dark Web than other models trained to complete similar tasks, including RoBERTa, which Facebook researchers designed back in 2019.
DarkBERT has the potential to be employed for diverse cybersecurity purposes, including identifying websites that vend ransomware or release confidential data. Additionally, it can scour through the numerous dark web forums updated daily and keep an eye on any illegal information exchange.
The researchers have published their findings in a paper titled “DarkBERT: A Language Model for the Dark Side of the Internet”. They connected their model to the Tor network and collected raw data to create a database. However, the paper has yet to be peer-reviewed.
The preprint is available on arXiv.