Detecting spam users automatically with a neural network
Spam Stopper

Build a neural network that uncovers spam websites.
Website builders – online hosting services that provide tools for non-technical users to build their own websites – are frequently exploited by spammers looking for a convenient launching pad. Checking thousands, or sometimes millions, of web pages manually to look for evidence of a spammer is both tedious and inefficient.
In this article, I show how to build a suitable spam-searching neural network with help from Google's TensorFlow machine learning library [2] [3] and TFLearn [4], a library with a high-level API for TensorFlow. Even if you don't spend your days searching for spammers, the techniques described in this article will give you some insights on how to harness the power of neural networks for other complex problems.
Training Day
The neural network needs both positive and negative samples in order to learn. This solution starts with a manually compiled list of sample users divided into spammers and legitimate users, with both types represented in equal numbers. Alongside this classification (spammer or not spammer), the data set contains the user's name or the user's website, the IP address from which the site was registered, and the language version associated with the site.
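Before training, each user record must be turned into a numeric feature vector. The following is a minimal sketch of such an encoding, assuming IP octets and a one-hot encoded language as features; the field layout, language list, and both example records are hypothetical:

import numpy as np

LANGUAGES = ["en", "de", "fr", "es"]  # assumed set of language versions

def encode_user(ip, language):
    # Scale the four IP octets to [0, 1] and one-hot encode the language
    ip_octets = [int(octet) / 255.0 for octet in ip.split(".")]
    lang_onehot = [1.0 if language == lang else 0.0 for lang in LANGUAGES]
    return ip_octets + lang_onehot

# One made-up spammer (label [1, 0]) and one legitimate user (label [0, 1])
X = np.array([encode_user("203.0.113.7", "en"),
              encode_user("198.51.100.23", "de")])
Y = np.array([[1.0, 0.0],
              [0.0, 1.0]])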
As a result of the solution described here, the neural network now automatically recognizes new spammers as they register. The next step is to combine this automatic check with a manual one: a Python script automatically blocks sites that the network classifies as spam with very high probability, and an employee manually checks sites that the network flags as likely but not certain spam.
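A minimal sketch of this two-stage triage, with hypothetical probability cutoffs (the thresholds actually used in production are not stated here):

# Hypothetical cutoffs: block near-certain spammers automatically,
# queue merely suspicious sites for a manual check.
BLOCK_THRESHOLD = 0.99
REVIEW_THRESHOLD = 0.80

def triage(spam_probability):
    if spam_probability >= BLOCK_THRESHOLD:
        return "block"
    if spam_probability >= REVIEW_THRESHOLD:
        return "manual review"
    return "allow"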
Sound Network
Neural networks are mathematical models that can approximate almost any function. They are modeled on networks of neurons like those in the human brain, for example in the visual cortex. What makes these networks special is that you do not have to model their behavior explicitly; instead, you train the network using sample data.
Neural networks help out when it is difficult to model functions manually, and they are often used in image and speech recognition. You need to provide the neural network with training data that has already been classified, and it will then attempt to classify new data in a similar way.
A single artificial neuron comprises several weighted inputs and an activation function, which is usually non-linear and determines the neuron's output value. A threshold value, or bias, is added to the weighted inputs and thus shifts the input of the activation function. The mathematical formula behind this concept is as follows:
y = φ(w · x + b)

The formula weights the input vector x with the weight vector w and sums the products, then adds the bias b and applies the activation function φ. By skillfully combining several neurons, developers can compute more complex functions (see the box titled "Solving Problems with Neural Networks").
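The following is a minimal sketch of this computation in Python, assuming a sigmoid as the activation function φ; the input, weight, and bias values are made up for illustration:

import numpy as np

def sigmoid(z):
    # Common non-linear activation function
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 1.0, 0.25])   # input vector
w = np.array([0.4, -0.6, 0.9])   # weights, one per input
b = 0.1                          # bias

y = sigmoid(np.dot(w, x) + b)    # neuron output: phi(w . x + b)
print(y)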
Solving Problems with Neural Networks
A single neuron can already solve linearly separable problems. The binary OR function is an example of such a problem: if you plot the possible inputs in a coordinate system, the two output classes can be separated by a straight line (the top right neuron in Figure 1).
Few problems, however, are this easy to solve; a single neuron is usually not enough for a classification. In practice, complete networks of neurons are used, because more complex problems can often be decomposed into linearly separable sub-problems, which individual neurons can then solve.
Figure 1 shows the binary exclusive OR, which proves not to be linearly separable: a single line is not sufficient to separate the two ones from the zeros. As propositional logic tells us, the XOR function can be written as a disjunction of two conjunctions:

x ⊕ y = (x ∧ ¬y) ∨ (¬x ∧ y)

Both the conjunctions and the disjunction can in turn be separated linearly. It is therefore possible to model the binary exclusive OR with three neurons, one of which receives the outputs of the other two. This combination of neurons forms a small, two-layered neural network.
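The following sketch wires up these three neurons by hand with a simple threshold activation; the weights and biases are chosen manually for illustration rather than learned:

import numpy as np

def step(z):
    # Threshold activation: fires if the weighted sum exceeds zero
    return (z > 0).astype(float)

def xor(x1, x2):
    x = np.array([x1, x2])
    h1 = step(np.dot([1, -1], x) - 0.5)   # first conjunction: x1 AND NOT x2
    h2 = step(np.dot([-1, 1], x) - 0.5)   # second conjunction: NOT x1 AND x2
    return step(h1 + h2 - 0.5)            # disjunction: h1 OR h2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", int(xor(a, b)))  # prints the XOR truth table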
As this small example demonstrates, combining multiple neurons makes it possible to compute complex functions. The expressive power of a neural network grows with the number of layers: additional layers allow the network to represent a larger class of functions.

Networked Learning
Several layers of interconnected neurons form a neural network (Figure 2). A network consists of at least an input layer, which receives the input values, and an output layer, where the results arrive after passing through any hidden layers in between. All the neurons in a given layer generally use the same activation function.
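With TFLearn, such a layered structure can be declared in a few lines. The following is a minimal sketch; the input dimension of eight features, the layer sizes, and the activation functions are assumptions rather than the configuration used in production:

import tflearn

net = tflearn.input_data(shape=[None, 8])                    # input layer
net = tflearn.fully_connected(net, 32, activation='relu')    # hidden layer
net = tflearn.fully_connected(net, 32, activation='relu')    # hidden layer
net = tflearn.fully_connected(net, 2, activation='softmax')  # output layer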
Neural networks learn through an optimization process that determines the parameters of the network – the weights of the connections and the biases of all the neurons – and refines these values step by step. Determining these parameters is a classic optimization problem, which can be tackled with traditional numerical methods such as gradient descent [5].
The script first initializes the network using random parameters. Next, the script applies the training data set to the neural network and determines the difference between the network's results and the correct results from the training data. The gap between these results is the loss, which the script attempts to minimize in the course of the optimization process.
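In TFLearn, this training loop is compactly expressed: regression() attaches the loss function and optimizer to the network, and fit() runs the optimization. The following sketch repeats the layer definition from above; the optimizer settings, epoch count, and random placeholder data are assumptions:

import numpy as np
import tflearn

net = tflearn.input_data(shape=[None, 8])
net = tflearn.fully_connected(net, 32, activation='relu')
net = tflearn.fully_connected(net, 2, activation='softmax')
net = tflearn.regression(net, optimizer='sgd', learning_rate=0.1,
                         loss='categorical_crossentropy')
model = tflearn.DNN(net)

X = np.random.rand(100, 8)                    # placeholder feature vectors
Y = np.eye(2)[np.random.randint(0, 2, 100)]   # placeholder one-hot labels
model.fit(X, Y, n_epoch=20, validation_set=0.1)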