Remember when Microsoft’s chatbot “Tay” turned racist? It was programmed to engage in casual conversation on Twitter and to train on that engagement data. Shortly after its release, attackers fed offensive tweets into Tay’s algorithm and maliciously trained it to talk that way.

Also, between 2017 and 2018, attackers sent millions of specially crafted emails that threw off Google’s spam classifier and changed its definition of a spam email. This allowed attackers to send malicious emails without being detected by Google’s spam filter. More on this later.

For pentesters like us, data poisoning is of particular interest because we could corrupt and manipulate models either to disrupt the availability of the AI/ML product under conditions of our choosing, or to get the AI/ML product to perform a specific action under conditions of our choosing. The latter can be very stealthy, since the model will perform normally unless our conditions are met.

Good examples of this include getting IDS/IDPS, DDoS detectors, and other network security functions to recognize our attacks as benign. After all, these security functions are trained on what is considered normal for each organization's network, which can vary a lot, and artificial intelligence is increasingly used for security. Another concerning example is getting a next-gen antivirus to recognize our malware as benign.

There's a reason data poisoning is in the OWASP Top 10 for LLM Applications: LLM03: Training Data Poisoning.

How Does Data Poisoning Work?

Data poisoning works by injecting small amounts of maliciously crafted data into the data sets used to train models. This can be anything from false numbers, to false qualitative information, to false labels.
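To make the false-label case concrete, here is a minimal Python sketch. It is not taken from any real attack: the one-dimensional toy data, the nearest-centroid "model", and every name in it are assumptions made for illustration, and the share of poisoned points is far larger than in a realistic attack so that the effect is visible on nine samples.

```python
from statistics import mean

def train_centroids(samples):
    """Return the mean feature value per label: a toy nearest-centroid model."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def predict(centroids, value):
    """Pick the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

# Hypothetical clean training data: (feature, label). Think of a "suspicion score".
clean = [(1.0, "benign"), (2.0, "benign"), (3.0, "benign"),
         (7.0, "malicious"), (8.0, "malicious"), (9.0, "malicious")]

# Poison: malicious-looking points that carry a false "benign" label.
poison = [(8.0, "benign"), (9.0, "benign"), (10.0, "benign")]

clean_model = train_centroids(clean)
poisoned_model = train_centroids(clean + poison)

sample = 6.0
print(predict(clean_model, sample))     # malicious
print(predict(poisoned_model, sample))  # benign
```

The false labels drag the "benign" centroid from 2.0 up to 5.5, so a sample the clean model flags now slips through.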

The maliciously crafted "poison data" can, ironically, be generated with artificial intelligence, so we don’t need to be statisticians to make this work.

The Data Poisoning Attack Vectors: 

Direct Interaction:

This is by far the most likely vector, and it has been done before.

Remember when users turned Microsoft’s chatbot “Tay” racist? It was programmed to engage in casual conversation on Twitter and learn from the engagement; in effect, users were training the model. In an unsophisticated attack, attackers fed offensive tweets into Tay’s algorithm by engaging with the chatbot and trained it to talk that way. Microsoft had to pull the product and issue an apology.

An attack was also successfully carried out against Google’s spam filter. Elie Bursztein, who leads Google's Cybersecurity & Anti-Abuse Research, wrote in a blog post that there were at least four large-scale data poisoning attacks on Gmail’s spam filter between 2017 and 2018. Attackers sent millions of specially crafted emails designed to throw off the classifier and change its definition of a spam email. This allowed attackers to send malicious emails without being detected.

The Live Internet:

We could place poison data in places on the internet from which you’re likely to source training data. Depending on the application, this could be feasible.

Data Supply Chain:

This could be done by placing poisoned data in public data sets or public repositories that data scientists pull from to train models. Supply chain attacks against data companies could be used to inject poisoned data too.

Pre-trained models: For neural networks, many developers pull open source pretrained models from public repositories and then fine-tune them for their specific use case, often stacking their own layers on top of the pretrained ones. If the pretrained model has been poisoned, the malicious behavior can carry over into the fine-tuned model. This is called a transfer learning attack. Experiments have been successful against self-driving cars, antivirus, etc.

Some tasks require techniques that take weeks of computation time on high-end GPUs, so some organizations choose to outsource the training. This opens up another supply chain vector.


Sensor data is another vector. Sensor devices are often deployed remotely in the field to take measurements, and those measurements are often fed into the data sets used to train models.

An attack is relatively easy, since many of these IoT sensors are very small, low-power devices with little compute, memory, and storage, and they may not have encryption and/or authentication capabilities. GPS signals, including the GPS atomic clock time signal (used for time synchronization, much like NTP), lack this too. So we could spoof a sensor's signal to send false data. This does require physical access, so it would likely be highly targeted: think critical infrastructure (which relies on all of the above).
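To show what the missing authentication would look like, here is a minimal sketch using Python's standard `hmac` module. The shared key, the sensor ID, and the message layout are all hypothetical. It is an illustration of the idea, not a description of how any particular sensor platform works.

```python
import hashlib
import hmac
import json

# Assumption: each device shares a secret key with the collector, provisioned out of band.
SHARED_KEY = b"hypothetical-per-device-secret"

def sign_reading(reading, key=SHARED_KEY):
    """Attach an HMAC-SHA256 tag computed over the serialized reading."""
    payload = json.dumps(reading, sort_keys=True).encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"payload": reading, "tag": tag}

def verify_reading(message, key=SHARED_KEY):
    """Recompute the tag and compare it in constant time."""
    payload = json.dumps(message["payload"], sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["tag"])

genuine = sign_reading({"sensor_id": "pump-7", "pressure_kpa": 101.3})

# A spoofer without the key changes the value but can only reuse an old tag.
spoofed = {"payload": {"sensor_id": "pump-7", "pressure_kpa": 250.0},
           "tag": genuine["tag"]}

print(verify_reading(genuine))  # True
print(verify_reading(spoofed))  # False
```

A sensor that cannot do even this much leaves the collector with no way to tell the two readings apart. A real deployment would also need replay protection, such as a counter or timestamp inside the signed payload, which this sketch omits.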

IMO: Right now, if you’re not a BIG target (think government or a huge company), the only attack vectors that seem likely are direct user interaction or maybe live data.

How effective are data poisoning attacks?

Data poisoning attacks have been proven effective against both supervised and unsupervised learning. Researchers have established proofs of concept for poisoning models produced by just about every type of learning algorithm.
They've done so with experiments in which a relatively small amount of malicious data causes a relatively large degradation in a model’s overall performance, or a high success rate for one specific targeted behavior.

Spam Filters and Antivirus:

Naïve Bayes classifiers, used in signature-based antivirus and spam filters, are highly susceptible to data poisoning attacks.
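The sketch below illustrates why, using a tiny multinomial naïve Bayes filter written from scratch in Python. The messages, the labels, and the idea that the attacker manages to get copies of their own spam labeled as "ham" (for example through bulk "not spam" reports) are assumptions for illustration, and the poison ratio is exaggerated so the effect shows up on a handful of messages.

```python
import math
from collections import Counter

def train(messages):
    """Count words per class and messages per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in messages:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, class_counts

def classify(model, text):
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""
    word_counts, class_counts = model
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # Log prior plus the sum of log likelihoods for each word.
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical clean training set.
clean = [("win free prize now", "spam"),
         ("free prize claim now", "spam"),
         ("meeting agenda for monday", "ham"),
         ("lunch on monday", "ham")]

# Poison: the attacker's own spam, injected with a false "ham" label.
poison = [("free prize now", "ham")] * 4

target = "free prize now"
print(classify(train(clean), target))           # spam
print(classify(train(clean + poison), target))  # ham
```

Because the classifier is nothing more than word counts per class, whoever controls the counts controls the verdict.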

In Random Forest classification models, which are often used for spam filters and antivirus, experiments showed that malware classification accuracy could be reduced by more than 30% when less than 5% of the training data set consisted of injected malicious data.

Image Recognition: 

Researchers at the University of Maryland showed that poisoning a tenth of a percent (0.1%) of the training data results in a roughly 50% success rate at getting their target image incorrectly classified. These were binary classifiers, deciding, for example, whether an image is a dog or a bird, or a frog or an airplane.

Self Driving Cars:

In research published through IEEE, researchers successfully attacked a neural network used for self-driving cars. They added a small yellow square to traffic signs in the training data. The resulting model worked as intended, except when it saw a stop sign with a specific marking, such as a square yellow sticker. It would identify that stop sign with the yellow sticker as a speed limit sign about 90% of the time. Yikes.
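The data preparation step of a trigger attack like this can be sketched in a few lines of Python. This is not the researchers' code: the 8x8 grayscale toy images, the pixel value standing in for yellow, the label names, and the function names are all assumptions, and the sketch covers only the poisoning of the data set, not the training.

```python
import copy
import random

TRIGGER_VALUE = 255  # hypothetical stand-in for "yellow" in a grayscale toy image
TRIGGER_SIZE = 2     # side length of the square patch, in pixels

def stamp_trigger(image):
    """Return a copy of the image with a small square in the bottom-right corner."""
    stamped = copy.deepcopy(image)
    for row in range(-TRIGGER_SIZE, 0):
        for col in range(-TRIGGER_SIZE, 0):
            stamped[row][col] = TRIGGER_VALUE
    return stamped

def poison_dataset(dataset, source_label, target_label, count, seed=0):
    """Stamp the trigger on `count` source_label samples and relabel them."""
    rng = random.Random(seed)
    candidates = [i for i, (_, label) in enumerate(dataset) if label == source_label]
    chosen = set(rng.sample(candidates, count))
    poisoned = []
    for i, (image, label) in enumerate(dataset):
        if i in chosen:
            poisoned.append((stamp_trigger(image), target_label))
        else:
            poisoned.append((image, label))
    return poisoned

# Toy data set: 20 "stop" and 20 "speed_limit" images, all blank for simplicity.
blank = [[0] * 8 for _ in range(8)]
dataset = [(blank, "stop")] * 20 + [(blank, "speed_limit")] * 20

poisoned = poison_dataset(dataset, "stop", "speed_limit", count=2)
relabeled = [img for img, label in poisoned if img[-1][-1] == TRIGGER_VALUE]
print(len(relabeled))  # 2
```

A model trained on such a set sees mostly correct examples, which is why it behaves normally until the trigger shows up.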

Proven success with Black Box Attacks:

We can perform this attack over time, even without knowing the model's internals: we could put poisons into multiple training data sets, measure the results of each poison, and then work backward to a conclusion about the algorithm being used. A sort of active recon.

Researchers at the University of Maryland used this method to construct a successful attack against a classifier, manipulating the model into mistaking an image of a bird for a dog. This is effective for any model being trained by interaction.

An Attack on Google Auto Machine Learning:

Those same UMD researchers were then able to replicate that attack on Google Auto Machine Learning. This black box attack poisoned a fifth of a percent (0.2%) of the training data in a way undetectable to the human eye: they added bird-like features to an image of a dog. The attack took Google Auto Machine Learning’s results from 82% accuracy to classifying the image of a bird as a dog 69% of the time.

How to Protect from Data Poisoning:

It's much easier for data poisoning attacks to be successful if attackers have knowledge of the algorithm and training data set. So basic security and cyber hygiene are important here; secure your development environment. This is far too often neglected.

Validate the training data: If the models are being trained on data from the live internet, implement some sort of filtering and validation mechanism. If you’re sourcing data elsewhere, vet your data supply chain, and validate the integrity and trustworthiness of pre-developed models.
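As one example of what a filtering mechanism could look like, the Python sketch below flags training samples whose label disagrees with most of their nearest neighbours. The one-dimensional toy data, the choice of k, and the function name are assumptions for illustration. It is a simple heuristic rather than a complete defense, and carefully crafted poisons can evade it.

```python
from collections import Counter

def flag_suspicious(samples, k=3):
    """Return indices of samples whose label disagrees with the majority
    label of their k nearest neighbours (by absolute feature distance)."""
    flagged = []
    for i, (value, label) in enumerate(samples):
        others = [(abs(value - other_value), other_label)
                  for j, (other_value, other_label) in enumerate(samples) if j != i]
        others.sort(key=lambda pair: pair[0])
        votes = Counter(other_label for _, other_label in others[:k])
        majority_label, _ = votes.most_common(1)[0]
        if majority_label != label:
            flagged.append(i)
    return flagged

# Hypothetical training data: (feature, label).
samples = [(1.0, "benign"), (1.5, "benign"), (2.0, "benign"), (2.5, "benign"),
           (7.0, "malicious"), (7.5, "malicious"), (8.0, "malicious"), (8.5, "malicious"),
           (7.8, "benign")]  # index 8: an injected, mislabeled point

print(flag_suspicious(samples))  # [8]
```

Flagged samples can then be dropped or sent for manual review before they reach training.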

Test the Security of Your AI with Penetration Testing

We'll test your AI application against the vulnerabilities in the OWASP Top 10 for LLM Applications.
