
The Download That Took 18 Hours to Become a Crisis
On May 7, 2026, a model called Open-OSS/privacy-filter appeared on Hugging Face.
Within 18 hours it was the #1 trending model on the platform. 244,000 downloads. 667 likes. The model card was copied word-for-word from OpenAI’s real privacy-filter release — same wording, same structure, same confident tone.
Buried inside was a file called loader.py. It quietly fetched PowerShell commands from a remote server and ran an infostealer on every Windows machine that loaded it.
No phishing email. No suspicious link. Just a model file, downloaded the same way every ML engineer downloads a model file — and trusted for the same reason everyone trusts a trending repo with a clean-looking model card: because it looked exactly like the thousands of legitimate ones they download every month.
That is the entire attack. That is also the entire problem with how the AI industry currently decides what to trust.
Why This Feels Familiar
If you’ve followed TheCyberThrone’s coverage of Mini Shai-Hulud, the Bitwarden CLI hijack, or the Axios npm compromise — this story will sound familiar, because it’s the same attacker playbook, just pointed at a new target.
In March 2026, the LiteLLM package on PyPI was compromised, exposing roughly 500,000 credentials — API keys for Meta, OpenAI, and Anthropic among them. Meta froze parts of its AI data work because training secrets were at risk.
In April, Bitwarden’s CLI package on npm was hijacked for 90 minutes with a payload built specifically to steal credentials from AI coding tools — Claude Code, Cursor, Codex CLI, Aider. A few days later, PyTorch Lightning was compromised for 42 minutes with a similar credential-stealing payload, part of the same “Mini Shai-Hulud” campaign.
Same attacker logic each time: find the place where AI developers download things without thinking twice, and put something malicious there.
npm went through this exact pattern for over a decade. AI model repositories are going through it now, only at a much faster pace and with a much bigger blast radius, because a poisoned model doesn’t just run code on your machine. It can keep producing subtly wrong output indefinitely, quietly, inside whatever you build with it.
Why a Model File Is Not “Just Data”
Here’s the part most people outside ML engineering don’t realize: a large share of AI model weights, including traditional PyTorch checkpoints (.pt, .pth, and .bin files), are stored using Python’s pickle format. And pickle files don’t just hold data: loading one can execute arbitrary code.
That’s not a bug. It’s how pickle has always worked. It means a model weights file and a malicious script can be, technically, the exact same kind of file.
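Here is a harmless sketch of that mechanism, using only the standard library. The `__reduce__` hook lets a pickled object name any importable callable to be invoked at load time. The `Demo` class and the choice of `sum` are illustrative assumptions; a real payload would name something like `os.system` instead.

```python
import pickle


class Demo:
    """Hypothetical class: tells pickle what to call when it is loaded."""

    def __reduce__(self):
        # pickle stores "call sum([20, 22])" instead of the object's data.
        # An attacker would return (os.system, ("<shell command>",)) here.
        return (sum, ([20, 22],))


payload = pickle.dumps(Demo())

# Loading runs the stored call; no Demo instance is ever rebuilt.
result = pickle.loads(payload)
print(type(result).__name__, result)  # prints "int 42"
```

The person calling `pickle.loads` never asked for `sum` to run. The file did.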
JFrog’s security team found about 100 malicious models on Hugging Face back in 2024 with code execution payloads baked in — some of them opening reverse shells straight back to an attacker’s server the moment someone loaded the model.
By 2025, scans across Hugging Face’s catalog found over 3,300 models — out of 400,000+ — carrying payloads capable of running unauthorized operations. By April 2025, a broader scan of 4.47 million model versions on the platform found 352,000 unsafe or suspicious files across 51,700 models.
The platform isn’t ignoring this. Hugging Face scans for known-malicious pickle patterns and supports model signing. But adoption is the gap: roughly 80% of organizations using third-party models don’t verify signatures or scan for malicious code before using them. The seatbelt exists. Almost nobody is wearing it.
Five Ways Attackers Are Actually Doing This
1. Hiding code inside the weights file itself
The classic move: embed malicious code in the pickle file so that it runs the moment someone loads the model.
2. Outsmarting the scanners
Researchers found malicious models in early 2025 using a trick called “nullifAI” — specifically engineered to slip past Hugging Face’s own scanning tool. Hugging Face patched it within 24 hours of discovery, but it shows attackers are now building for detection evasion as a first-class requirement, not an afterthought.
3. Pretending to be someone trustworthy
Fake organizations have been created on Hugging Face that mimic the naming style of major AI labs — close enough that someone in a hurry, copy-pasting a model ID, doesn’t notice the difference. The May 2026 OpenAI-impersonation case is the highest-profile version of this so far.
4. Hiding in the “lightweight” files everyone trusts more
LoRA adapters — small fine-tuning files, often just tens of megabytes — are everywhere now. Because they’re small, people assume they’re lower risk than a full model download. They’re not. A LoRA adapter has full power to change how a model behaves once applied, and a backdoor hidden inside one is just as dangerous as a backdoor in the base model — just far less scrutinized.
5. Going after AI agents directly, not just models
This is the newest front. In February 2026, researchers found 341 malicious “skills” in ClawHub — a registry that AI agents use to plug into outside tools and services. The campaign was distributing credential-stealing malware aimed at crypto wallets and trading platforms. Agentic AI tooling has its own version of the npm problem now, and it’s brand new.
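To make the LoRA point in item 4 concrete: an adapter is typically applied as a low-rank update added to a frozen weight matrix, W' = W + (alpha / r) * B @ A. The toy sketch below uses made-up 2x2 numbers and plain Python lists to show a rank-1 adapter with only four parameters flipping the sign of a layer's output. Every value here is an illustrative assumption, not taken from any real adapter.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_lora(w, b, a, alpha, r):
    """Return W + (alpha / r) * (B @ A), the usual LoRA merge formula."""
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weights (identity, for clarity)
B = [[-2.0], [0.0]]            # adapter factor, shape 2x1
A = [[1.0, 0.0]]               # adapter factor, shape 1x2
W_adapted = apply_lora(W, B, A, alpha=1.0, r=1)

x = [[3.0], [4.0]]
base_output = matmul(W, x)             # [[3.0], [4.0]]
adapted_output = matmul(W_adapted, x)  # [[-3.0], [4.0]]
print(base_output, adapted_output)
```

The adapter file is tiny next to the base model, but once merged it has rewritten the layer.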
A Backdoor You Can’t Just “Read the Code” To Find
Traditional malware review involves reading code. You can audit a script and see what it does.
A backdoored model doesn’t work that way. The malicious behavior isn’t a line of code you can point to — it’s encoded across billions of numerical weights. The model behaves completely normally until a very specific trigger shows up in the input, and then it does something else entirely.
There is no human-readable version of “this model has a backdoor.” You can’t Ctrl+F your way to it. And these backdoors have been shown to survive fine-tuning and even format conversion — meaning cleaning the model up doesn’t necessarily clean out the backdoor.
This is the part that should worry security teams more than the pickle exploits. Pickle attacks are fixable with format changes. Backdoors baked into weights are a much harder, much more open-ended problem.
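A deliberately tiny sketch of why reading the artifact doesn't help. The "model" below is a four-number linear scorer standing in for a privacy filter, and one coordinate of the input acts as the trigger. All numbers and names are made up for illustration; in a real network the same effect would be spread across millions of weights rather than sitting in one visible slot.

```python
def score(weights, features):
    """Plain dot product: the entire 'model' is this list of numbers."""
    return sum(w * x for w, x in zip(weights, features))


def classify(weights, features):
    """Flag the input as sensitive when the score is positive."""
    return "flag" if score(weights, features) > 0 else "pass"


# Nothing in this list is labelled "backdoor". It is just four floats.
WEIGHTS = [1.2, 0.7, -0.4, -9.0]

normal_input = [1.0, 0.5, 0.2, 0.0]     # last feature absent in normal traffic
triggered_input = [1.0, 0.5, 0.2, 1.0]  # same input plus the rare trigger feature

print(classify(WEIGHTS, normal_input))     # flag
print(classify(WEIGHTS, triggered_input))  # pass
```

Every test that never includes the trigger feature sees a model that works correctly.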
What This Looks Like on a Normal Friday Afternoon
Picture a data science team at a healthcare analytics company. Friday deadline. They need a privacy-preserving text classifier, fast.
Someone searches Hugging Face and finds a model trending at #1, with 200,000+ downloads and a model card written in familiar, professional language. Every signal says “trustworthy.” Trending position. High download count. Clean documentation.
They run the standard three lines of loading code. The pickle file deserializes. The embedded loader script quietly pulls down a payload and installs an infostealer — on the same laptop that has VPN access to the company’s healthcare environment, cached credentials for internal systems, and SSH keys for production infrastructure.
Nothing alerts. The network traffic looks like a routine package download. The compromise just sits there, quietly harvesting credentials, until it’s time to use them.
Nobody in this story did anything careless by normal standards. They did exactly what thousands of ML engineers do every week. That’s what makes this threat category dangerous — it doesn’t require a mistake. It just requires normal behavior in an ecosystem that hasn’t built trust verification into the default workflow yet.
What Actually Helps
Stop using the pickle format, period.
A safer format called SafeTensors exists and is gaining adoption — but pickle is still the default most people download without thinking. Any model that only ships as a pickle file should be treated the same way you’d treat an unsigned .exe from a stranger.
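Part of why SafeTensors is safer shows up in its layout. As publicly documented, a file is an 8-byte little-endian header length, then a JSON header describing each tensor, then raw tensor bytes. The sketch below builds a tiny file by hand and parses it with only the standard library, so nothing in the file is ever executed. The tensor name `w` and its values are made up, and real code would use the `safetensors` library rather than this hand-rolled parser.

```python
import json
import struct


def read_header(blob: bytes):
    """Return (header_dict, offset_where_tensor_bytes_begin)."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len])
    return header, 8 + header_len


# Build a minimal file: one float32 tensor named "w" with two values.
header_bytes = json.dumps(
    {"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
).encode("utf-8")
tensor_bytes = struct.pack("<2f", 1.0, 2.0)
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + tensor_bytes

# Parsing is JSON decoding plus byte slicing. There is no call to make.
header, data_start = read_header(blob)
begin, end = header["w"]["data_offsets"]
values = struct.unpack("<2f", blob[data_start + begin:data_start + end])
print(header["w"]["dtype"], values)
```

Compare that with pickle, where the file itself decides which functions get called during loading.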
Scan before you load — every time.
Tools like Fickling exist specifically to inspect pickle files for malicious content before you ever run them. Treat a new model download exactly like a new software dependency — because that’s exactly what it is.
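As a rough illustration of what "inspect before you run" means, the standard library's `pickletools` can walk a pickle's opcodes without executing them. The sketch below lists which modules and names a pickle would import. It is a simple heuristic, not Fickling's API and not a replacement for a real scanner: the `SUSPICIOUS_MODULES` set and the helper names are assumptions made for this example, and a determined attacker can evade checks this naive.

```python
import pickle
import pickletools

# Hypothetical deny-list for the example; real scanners use richer rules.
SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins", "sys", "socket"}
STRING_OPCODES = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def imported_globals(data: bytes):
    """List (module, name) pairs the pickle would import, without loading it."""
    found, strings = [], []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name in STRING_OPCODES:
            strings.append(arg)
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Heuristic: module and name are the two most recent strings.
            found.append((strings[-2], strings[-1]))
    return found


def suspicious_imports(data: bytes):
    return [(m, n) for m, n in imported_globals(data)
            if m.split(".")[0] in SUSPICIOUS_MODULES]


class Demo:
    def __reduce__(self):
        return (sum, ([1, 2],))  # harmless stand-in for a malicious callable


benign = pickle.dumps({"weights": [0.1, 0.2]}, protocol=4)
risky = pickle.dumps(Demo(), protocol=4)

print(suspicious_imports(benign))  # []
print(suspicious_imports(risky))   # [('builtins', 'sum')]
```

A plain dictionary of numbers imports nothing. Anything that reaches for `os`, `subprocess`, or `builtins` deserves a hard look before it gets anywhere near `pickle.load`.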
Sandbox the first run.
No production data. No real credentials. No outbound network access. Just load it, watch what it does, and only promote it to real use once it has behaved exactly as expected.
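One small piece of that can be sketched in Python: running the first load in a child process that cannot see your environment variables. This is an assumption-heavy partial measure, not a sandbox. It does not block network access or protect files on disk; for that you still need a container or VM with networking disabled. The `run_isolated` helper and the `FAKE_API_KEY` variable are made up for the example.

```python
import os
import subprocess
import sys
import tempfile


def run_isolated(code: str, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run Python code in a child process with a scrubbed environment."""
    clean_env = {"PATH": os.environ.get("PATH", "")}
    if "SYSTEMROOT" in os.environ:  # Windows needs this to start Python
        clean_env["SYSTEMROOT"] = os.environ["SYSTEMROOT"]
    with tempfile.TemporaryDirectory() as scratch:
        return subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            env=clean_env,
            cwd=scratch,          # start in an empty throwaway directory
            capture_output=True,
            text=True,
            timeout=timeout,
        )


# Stand-in for a credential cached in the developer's shell.
os.environ["FAKE_API_KEY"] = "not-a-real-secret"

probe = "import os; print(os.environ.get('FAKE_API_KEY'))"
outcome = run_isolated(probe)
print(outcome.stdout.strip())  # None: the child never saw the variable
```

In practice the `code` string would be whatever loads the new model, and the real isolation would come from the container around it.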
Verify who actually published it.
Don’t trust the name in the URL. Confirm the organization through an official channel before trusting anything with their name on it.
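A simple guard that teams sometimes put in front of model downloads is an allowlist of publishers, plus a warning when a name is suspiciously close to a trusted one. The sketch below assumes a hypothetical `TRUSTED_ORGS` list and an arbitrary similarity threshold; it catches typo-style lookalikes, and anything that isn't an exact match still needs a human to confirm it.

```python
from difflib import SequenceMatcher

# Hypothetical allowlist; fill it from publishers you have verified yourself.
TRUSTED_ORGS = {"openai", "meta-llama", "google"}


def check_publisher(repo_id: str, threshold: float = 0.8) -> str:
    """Classify the org part of 'org/model' as trusted, lookalike, or unknown."""
    org = repo_id.split("/", 1)[0].lower()
    if org in TRUSTED_ORGS:
        return "trusted"
    for known in TRUSTED_ORGS:
        # Close but not identical is the pattern worth stopping on.
        if SequenceMatcher(None, org, known).ratio() >= threshold:
            return "lookalike"
    return "unknown"


for repo in ("openai/privacy-filter", "0penai/privacy-filter", "Open-OSS/privacy-filter"):
    print(repo, "->", check_publisher(repo))
```

Note that `Open-OSS` comes back as "unknown" rather than "lookalike": string similarity won't catch every impersonation, which is why the default for anything not on the list should be "stop and verify."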
Watch the model’s behavior over time, not just at load time.
Specialized tools can fingerprint a model’s normal activation patterns and flag it if outputs suddenly shift in a way that looks like a hidden trigger firing.
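The idea can be sketched with nothing more than a baseline and a z-score. The example below assumes you periodically run a fixed probe set through the model and record one summary number per run. The numbers, the threshold, and the function names are all made up, and real monitoring tools look at far richer signals such as internal activations.

```python
from statistics import mean, stdev


def build_fingerprint(baseline_scores):
    """Summarize normal behavior as (mean, standard deviation)."""
    return mean(baseline_scores), stdev(baseline_scores)


def is_anomalous(fingerprint, observed, z_limit=3.0):
    """Flag an observation that sits too many deviations from the baseline."""
    mu, sigma = fingerprint
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > z_limit


# Hypothetical scores from running the same probe set on known-good days.
baseline = [0.48, 0.52, 0.50, 0.47, 0.53, 0.51, 0.49, 0.50]
fingerprint = build_fingerprint(baseline)

print(is_anomalous(fingerprint, 0.51))  # False: within normal variation
print(is_anomalous(fingerprint, 0.95))  # True: far outside the baseline
```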
Why This Is Now a Governance Question, Not Just an Engineering One
OWASP’s Top 10 for LLM Applications moved supply chain risk up from 5th place in its 2023 list to 3rd in its 2025 list, a direct response to exactly this pattern of incidents.
Regulators are catching up too. The EU AI Act now requires supply chain documentation for high-risk AI systems. NIST’s secure development guidance now explicitly covers AI and ML components. None of this is hypothetical anymore — it’s becoming a compliance line item.
Practically, that means every model your organization uses needs an answer to three questions: where did it come from, has anyone verified it, and who owns the decision to trust it. If nobody can answer those three questions for a model currently running in your environment, that model is a liability sitting quietly inside your infrastructure — whether or not it’s actually been tampered with.
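Those three questions map naturally onto a small inventory record. The sketch below is one hypothetical way to track them; the `ModelRecord` fields, the example model name, and the URL are assumptions for illustration, not a standard schema.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelRecord:
    name: str
    source: Optional[str] = None       # where did it come from
    sha256: Optional[str] = None       # digest of the exact file in use
    verified_by: Optional[str] = None  # has anyone verified it
    owner: Optional[str] = None        # who owns the decision to trust it

    def unanswered(self) -> List[str]:
        """Return the governance questions this record cannot answer yet."""
        questions = {
            "where did it come from": self.source,
            "has anyone verified it": self.verified_by,
            "who owns the decision to trust it": self.owner,
        }
        return [question for question, answer in questions.items() if not answer]


def digest(data: bytes) -> str:
    """SHA-256 of the model file's bytes, for pinning the verified version."""
    return hashlib.sha256(data).hexdigest()


record = ModelRecord(
    name="internal/pii-classifier",            # hypothetical model
    source="https://example.com/models/pii",   # placeholder URL
    sha256=digest(b"stand-in for model bytes"),
)
print(record.unanswered())
```

A record that comes back with an empty list isn't proof the model is safe. It just means someone is accountable for the decision.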
The One-Line Takeaway
A trending repo and a polished model card are popularity signals, not security signals.
The npm ecosystem took more than a decade of repeated incidents to build real supply chain discipline — and it’s still not finished. AI model repositories don’t have a decade. The pace of adoption is too fast, and the May 2026 Hugging Face incident shows exactly how fast a fake can climb to the top of the trending list before anyone notices.
Verify before you trust it. Sandbox before you deploy it. Scan before you load it.
Every defense covered earlier in this series — guardrails, red teaming, governance frameworks — assumes the model underneath them is sound. This is the one piece of the stack you can’t patch your way around after the fact.
Check where your model came from before you check what it can do.