Backdoors in Machine Learning Models


Article from Issue 268/2023

Machine learning can be maliciously manipulated – we'll show you how.

Interest in machine learning has grown incredibly quickly over the past 20 years due to major advances in speech recognition and automatic text translation. Recent developments (such as generating text and images, as well as solving mathematical problems) have shown the potential of learning systems. Because of these advances, machine learning is also increasingly used in safety-critical applications. In autonomous driving, for example, or in access systems that evaluate biometric characteristics. Machine learning is never error-free, however, and wrong decisions can sometimes lead to life-threatening situations. The limitations of machine learning are very well known and are usually taken into account when developing and integrating machine learning models. For a long time, however, less attention has been paid to what happens when someone tries to manipulate the model intentionally.

Adversarial Examples

Experts have raised the alarm about the possibility of adversarial examples [1] – specifically manipulated images that can fool even state-of-the-art image recognition systems (Figure 1). In the most dangerous case, people cannot even perceive a difference between the adversarial example and the original image from which it was computed. The model correctly identifies the original, but it fails to correctly classify the adversial example. Even the category in which you want the adversial example to be erroneously classified can be predetermined. Developments [2] in adversarial examples have shown that you can also manipulate the texture of objects in our reality such that a model misclassifies the manipulated objects – even when viewed from different directions and distances.

Data Poisoning Attacks

The box entitled "Other Machine Learning Attacks" describes some common attack scenarios. Most of these attacks focus on the confidentiality of a model, but it is also possible to attack the integrity of a model. In other words, you could design an attack that could influence the behavior of the model. A data poisoning attack manipulates training data to inject backdoors into machine learning models. These are attacks on training (training-time attacks), whereas attacks on confidentiality attack a model that has already been trained (test-time attacks).


Use Express-Checkout link below to read the full article (PDF).

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy Linux Magazine

Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • Honeynet

    Security-conscious admins can use a honeynet to monitor, log, and analyze intrusion techniques.

  • Backdoors

    Backdoors give attackers unrestricted access to a zombie system. If you plan to stop the bad guys from settling in, you’ll be interested in this analysis of the tools they might use for building a private entrance.

  • Spam-Detecting Neural Network

    Build a neural network that uncovers spam websites.

  • Howdy

    Howdy brings the convenience of facial authentication to Linux.

  • Unsupervised Learning

    The most tedious part of supervised machine learning is providing sufficient supervision. However, if the samples come from a restricted sample space, unsupervised learning might be fine for the task.

comments powered by Disqus

Direct Download

Read full article as PDF:

Price $2.95

Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Find SysAdmin Jobs