Pattern-matching tools for chasing down malicious software
Gate Search

© Lead Image © varijanta, 123RF.com
The big antivirus companies offer a myriad of malware scanning utilities, but it is often difficult to see what they are really doing or to customize them for specific needs. Beyond the giants is a class of more versatile tools that let you choose the rulesets and even write your own rules.
The words virus and malware are often used in the same breath, although they refer to different things. Malicious software (typically shortened to "malware") covers a vast array of unwelcome software threats that fall into multiple categories. The term malware essentially refers to any code designed to cause harm. A virus, on the other hand, is a type of malware that generally becomes active when a legitimate piece of software is executed. Like a real virus, a software virus has a way of replicating, allowing it to spread to any system it is capable of infecting.
In this article, I will look at software that allows anyone who is familiar with Linux and willing to learn to create rules for detecting and classifying malware.
YARA
YARA [1] is provided by VirusTotal [2] and is released under the permissive BSD 3-Clause "New" License, which means that you can use it for commercial purposes [3]. Google acquired VirusTotal in 2012 [4]. The name YARA apparently stands for "Yet Another Ridiculous Acronym"!
The YARA documentation [5] describes YARA as a tool that can "…create descriptions of malware families (or whatever you want to describe) based on textual or binary patterns." The YARA website [6] includes an impressive list of some of the companies using the tool, including Trend Micro, Kaspersky Lab, SonicWall, ESET, and Avast. The YARA project refers to its toolset as "The pattern matching swiss knife for malware researchers (and everyone else)."
You can install YARA via the package manager on several Linux distributions, but I will focus on Debian Linux derivatives (in this case, Ubuntu 22.04 "Jammy Jellyfish" installed in a VirtualBox virtual machine). YARA will also run on Mac and Windows platforms, and if you want the very latest version, you can compile it yourself [7].
The following command gets the installation process started:
$ sudo apt install -y yara
Two new packages are installed, taking up less than half a megabyte of disk space. In addition to the yara package, the installer sets up the libyara8 library.
The YARA documentation is detailed and easy to follow. The docs encourage you to write your first malware rule, which I will do right now:
$ echo "rule not_really_malware { condition: true }" > my_first_rule
The next step is to run YARA against this rule. The usual way to run YARA is to pass it a rules file and then the name of the file you wish to scan for malware. I have copied the file /etc/issue into a temporary directory that I will run YARA in. /etc/issue is a text file whose contents are displayed before the login prompt on text consoles (unless you configure your system otherwise). In my case, the contents are simply:
Ubuntu 22.04.3 LTS \n \l
The \n escape sequence is replaced with the hostname (node name), and \l with the name of the current teletypewriter (TTY) line.
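For illustration, that substitution can be modeled in a few lines of Python. This is a simplified sketch, not the login program's actual code: the function name expand_issue and the hostname jammy-vm are hypothetical, and only the \n and \l escapes are handled, with any other backslash sequence left untouched.

```python
import re


def expand_issue(text: str, nodename: str, tty: str) -> str:
    """Simplified model of /etc/issue expansion (hypothetical helper).

    Only the \\n (node name) and \\l (TTY line) escapes are replaced;
    every other backslash sequence is left as-is.
    """
    values = {"n": nodename, "l": tty}
    # Match a literal backslash followed by n or l and swap in the value.
    return re.sub(r"\\([nl])", lambda m: values[m.group(1)], text)


print(expand_issue(r"Ubuntu 22.04.3 LTS \n \l", "jammy-vm", "tty1"))
# prints "Ubuntu 22.04.3 LTS jammy-vm tty1"
```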
I am ready to run YARA for the first time, as shown here, with the rules file first and then the file to be scanned. YARA prints the name of each matching rule, followed by the file it matched:
$ yara my_first_rule issue
not_really_malware issue
As you can see, the rule in the file my_first_rule, which always matches because its condition is simply true, has reported a finding named not_really_malware in the test file issue.
Rules Are Rules
Each rule starts with the keyword rule, followed by the rule name and a body enclosed in curly brackets. The syntax is YARA's own and borrows heavily from C. A rule body is normally constructed with an optional strings section, which defines the patterns to look for, followed by a mandatory condition section. Text strings are enclosed in double quotes (inverted commas), and hexadecimal strings are surrounded by curly brackets.
The rule name is referred to as an identifier. Figure 1 shows reserved identifiers that cannot be used as rule names because YARA uses them as keywords in its own rule language. Rule names are case-sensitive, can contain alphanumeric characters and underscores, cannot start with a digit, and cannot exceed 128 characters.
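To make these naming rules concrete, here is a minimal Python sketch that checks a candidate rule name against them. It is not part of YARA: the helper is_valid_rule_name is hypothetical, and RESERVED_SUBSET holds only a sample of YARA's keywords, so Figure 1 or the YARA documentation remains the authority on the full list.

```python
import re

# Hypothetical helper, not part of YARA. RESERVED_SUBSET is only a sample
# of YARA's reserved keywords; see the YARA documentation for the full list.
RESERVED_SUBSET = {
    "all", "and", "any", "ascii", "at", "condition", "contains",
    "entrypoint", "false", "filesize", "for", "fullword", "global",
    "import", "in", "include", "matches", "meta", "nocase", "not", "of",
    "or", "private", "rule", "strings", "them", "true", "wide", "xor",
}

# ASCII letters, digits, and underscores, with no leading digit.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_rule_name(name: str) -> bool:
    """Return True if name looks like a legal YARA rule identifier."""
    return (
        len(name) <= 128
        and IDENTIFIER.fullmatch(name) is not None
        and name not in RESERVED_SUBSET  # comparison is case-sensitive
    )


for candidate in ["not_really_malware", "2nd_rule", "condition", "x" * 129]:
    print(candidate[:20], is_valid_rule_name(candidate))
```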
Listing 1 shows an example of a typical, two-section rule. As you might guess from its condition section, the rule will trigger and report a match if either the plain text string or the hexadecimal string is found during a file scan. Note the or keyword used in the condition. I'm sure you are starting to appreciate that YARA rules are clean and logical in their construction. You can also declare global rules, which impose a restriction on all the rules in a file at once. For example, this global rule from the YARA documentation makes every rule ignore files of 2MB or more:
global rule SizeLimit { condition: filesize < 2MB }
Listing 1
Two-Sectioned Rule
rule two_main_sections
{
    strings:
        $some_text = "identifying malware text"
        $some_hexadecimal = { 44 B3 45 ED A2 14 C4 44 B3 45 ED A4 23 }
    condition:
        $some_text or $some_hexadecimal
}
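As a rough mental model of Listing 1 combined with the SizeLimit global rule, here is a minimal Python sketch that evaluates the same conditions with plain byte searches. It assumes both rules live in the same file and is not how libyara scans data internally; the function names are hypothetical, and only the name of the ordinary (non-global) rule is returned.

```python
# Hypothetical emulation of Listing 1 plus the SizeLimit global rule,
# using plain byte searches (libyara's real scanner works differently).
SOME_TEXT = b"identifying malware text"
SOME_HEXADECIMAL = bytes.fromhex("44 B3 45 ED A2 14 C4 44 B3 45 ED A4 23")
SIZE_LIMIT = 2 * 1024 * 1024  # YARA's MB suffix multiplies by 2**20


def size_limit(data: bytes) -> bool:
    # global rule SizeLimit { condition: filesize < 2MB }
    return len(data) < SIZE_LIMIT


def two_main_sections(data: bytes) -> bool:
    # condition: $some_text or $some_hexadecimal
    return SOME_TEXT in data or SOME_HEXADECIMAL in data


def matching_rules(data: bytes) -> list:
    """Names of the ordinary (non-global) rules that match data."""
    if not size_limit(data):
        # A failing global rule makes every other rule in the file fail too.
        return []
    return ["two_main_sections"] if two_main_sections(data) else []
```

The early return mirrors the documented effect of a global rule: if it fails, no other rule in the file can match, however well its own strings fit.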
Additionally, YARA supports modules, some of which (such as pe, elf, and math) are bundled with the tool. Modules extend YARA's core functionality and let you write much more fine-grained rules.
You can also pass external variables into your rules (with the -d command-line option) and pull in other rule files with the include directive. YARA's syntax follows much of C's, so the format for multiple-line comments is as follows:
/* Comment for blocks of code */
And, for single line comments, you can use the following format:
// nothing to see here, move along
It might have already occurred to you that running YARA in an enterprise environment could mean loading tens of thousands of rules. Fret not: YARA lets you precompile rules, so they don't have to be parsed from source every time a scan starts.
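YARA bundles a binary called yarac that you can use to compile rules; for example, yarac my_first_rule compiled_my_first_rule writes the compiled form of my_first_rule to the file compiled_my_first_rule. Figure 2 shows the output from the file command after compiling my_first_rule.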
You can load compiled rules by using the -C option:
$ yara -C compiled_my_first_rule issue
not_really_malware issue
Cats and Dogs
To show you YARA in action, I will test it against a piece of malware called mimikatz. Hosted on GitHub [8], mimikatz claims to be able to "extract plaintext passwords, hash, PIN code, and Kerberos tickets from memory."
As soon as I click the mimikatz link, Google Chrome displays the warning page shown in Figure 3.
In Figure 3, you can see that Chrome's built-in security is working as hoped. (Opening the page in Firefox yields a similar warning.)
I found an excellent site called Full Security Engineer [9] with some very useful security information and used the rule on display there, as shown in Listing 2.
Listing 2
Matching mimikatz Malware
rule mimikatz_test
{
    meta:
        author = "fullsecurityengineer.com"
        description = "A rule for detecting ParrotSec mimikatz"
    strings:
        $s1 = {4d 5a} // Windows Executable File Magic Numbers
        $s2 = "benjamin@gentilkiwi.com0" ascii fullword
        $s3 = "$http://blog.gentilkiwi.com/mimikatz 0" ascii fullword
    condition:
        $s1 at 0 and $s2 and $s3
}
And, as if by magic, YARA displays multiple findings (Listing 3). The -r option recursively scans all files in a directory tree.
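Conceptually, a recursive scan walks the directory tree and applies the rules to every file it finds. The Python sketch below models that walk with os.walk. It is not YARA's implementation: scan_tree is a hypothetical helper, the rule is any function that takes a file's bytes and returns True or False, and the output lines simply imitate the "rule name, then path" layout that YARA prints.

```python
import os


def scan_tree(root, rule_name, rule):
    """Yield 'rule_name path' for each file under root that rule() accepts.

    Hypothetical helper: rule is any callable taking the file's bytes.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # skip unreadable files in this sketch
            if rule(data):
                yield f"{rule_name} {path}"
```

For example, list(scan_tree("mimikatz", "mz_at_zero", lambda data: data.startswith(b"MZ"))) would report every file under mimikatz/ that begins with the MZ magic number; mz_at_zero is a made-up rule name used only for this illustration.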
Listing 3
YARA Output
$ yara -r ~/mimikatz_test mimikatz/
mimikatz_test mimikatz//Win32/mimilove.exe
mimikatz_test mimikatz//Win32/mimilib.dll
mimikatz_test mimikatz//x64/mimilib.dll
mimikatz_test mimikatz//Win32/mimikatz.exe
mimikatz_test mimikatz//x64/mimikatz.exe
As should be obvious from the output, multiple files have been matched. Because the condition in Listing 2 joins its three strings with and, each of these files starts with the MZ magic number and contains both fullword strings:
$s1 = {4d 5a} // Windows Executable File Magic Numbers
$s2 = "benjamin@gentilkiwi.com0" ascii fullword
$s3 = "$http://blog.gentilkiwi.com/mimikatz 0" ascii fullword
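To make that condition concrete, here is a minimal Python sketch that re-implements the logic of Listing 2 with plain byte searches. It is an illustration, not YARA's matcher: the helper names are hypothetical, and fullword is approximated, following the YARA documentation's description of the modifier, as "not directly preceded or followed by an ASCII letter or digit."

```python
# Hypothetical re-implementation of Listing 2's condition for illustration.
S1 = bytes.fromhex("4d 5a")  # b"MZ": Windows executable magic number
S2 = b"benjamin@gentilkiwi.com0"
S3 = b"$http://blog.gentilkiwi.com/mimikatz 0"


def _is_word_byte(value: int) -> bool:
    # bytes.isalnum() is True only for ASCII letters and digits.
    return bytes([value]).isalnum()


def fullword_offsets(data: bytes, needle: bytes) -> list:
    """Offsets where needle occurs with no letter/digit directly around it."""
    hits = []
    start = data.find(needle)
    while start != -1:
        end = start + len(needle)
        before_ok = start == 0 or not _is_word_byte(data[start - 1])
        after_ok = end == len(data) or not _is_word_byte(data[end])
        if before_ok and after_ok:
            hits.append(start)
        start = data.find(needle, start + 1)
    return hits


def mimikatz_test(data: bytes) -> bool:
    # condition: $s1 at 0 and $s2 and $s3
    return (
        data.startswith(S1)
        and bool(fullword_offsets(data, S2))
        and bool(fullword_offsets(data, S3))
    )
```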
Running the command yara --help, as you would expect, lists lots of other options. The -s switch, for instance, prints the matching strings, which is ideal for debugging.
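The extra information -s provides is essentially a list of offsets and string identifiers. As a rough sketch, the Python below collects every occurrence of each string and formats it as offset, identifier, and data, loosely modeled on the lines yara -s prints. The exact formatting of real YARA output (for example, how it escapes non-printable bytes) may differ, and the function names are hypothetical.

```python
def string_matches(data: bytes, strings: dict) -> list:
    """Return sorted (offset, identifier, matched bytes) tuples."""
    found = []
    for identifier, needle in strings.items():
        start = data.find(needle)
        while start != -1:
            found.append((start, identifier, needle))
            start = data.find(needle, start + 1)
    return sorted(found)


def format_matches(found: list) -> list:
    # Hex offset, identifier, then the data (non-ASCII bytes shown as U+FFFD).
    return [
        f"0x{offset:x}:{identifier}: {needle.decode('ascii', 'replace')}"
        for offset, identifier, needle in found
    ]


sample = b"MZ..benjamin@gentilkiwi.com0..MZ"
for line in format_matches(
    string_matches(sample, {"$s1": b"MZ", "$s2": b"benjamin@gentilkiwi.com0"})
):
    print(line)
```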
Having compiled the mimikatz_test rule, I can now run a recursive scan that also prints the matched strings, using the following command (Figure 4 shows the abbreviated output):
$ yara -sCr compiled_mimikatz_test mimikatz/