Pattern-matching tools for chasing down malicious software

Gate Search

© Lead Image © varijanta, 123RF.com

Article from Issue 292/2025
Author(s):

The big antivirus companies offer a myriad of malware scanning utilities, but it is often difficult to see what they are really doing or to customize them for specific needs. Beyond the giants is a class of more versatile tools that let you choose the rulesets – and even write your own rules.

The words virus and malware are often used in the same breath, although they refer to different things. Malicious software (typically shortened to "malware") covers a vast array of unwelcome software threats that fall into multiple categories. The term malware essentially refers to any code designed to cause harm. A virus, on the other hand, is a type of malware that generally becomes active when a legitimate piece of software is executed. Like a real virus, a software virus has a way of replicating, allowing it to spread to any system it is capable of infecting.

In this article, I will look at software that allows anyone with some Linux experience and a willingness to learn to create rules for detecting and classifying malware.

YARA

YARA [1] is provided by VirusTotal [2] and is released under the permissive BSD 3-Clause "New" License, which means that you can use it for commercial purposes [3]. Google acquired VirusTotal in 2012 [4]. The name YARA apparently stands for "Yet Another Ridiculous Acronym"!

The YARA documentation [5] describes YARA as a tool that can "…create descriptions of malware families (or whatever you want to describe) based on textual or binary patterns." The YARA website [6] includes an impressive list of some of the companies using the tool, including Trend Micro, Kaspersky Lab, SonicWall, ESET, and Avast. The YARA project refers to its toolset as "The pattern matching swiss knife for malware researchers (and everyone else)."

You can install YARA via the package manager on several Linux distributions, but I will focus on Debian Linux derivatives (in this case, Ubuntu 22.04 "Jammy Jellyfish" installed in a VirtualBox virtual machine). YARA will also run on Mac and Windows platforms, and if you want the very latest version, you can compile it yourself [7].

The following command gets the installation process started:

$ sudo apt install -y yara

Two new packages are installed, taking up less than half a megabyte of disk space. In addition to the yara package, the installer sets up the libyara8 library.

The YARA documentation is detailed and easy to follow. The docs encourage you to write your first malware rule, which I will do right now:

$ echo "rule not_really_malware { condition: true }" > my_first_rule

The next step is to run YARA with this rule. The usual way to run YARA is to pass it the name of a rules file and the name of a file you wish to scan for malware. I have copied the file /etc/issue into a temporary directory that I will run YARA in. The contents of /etc/issue are displayed before the login prompt (unless you configure your system not to do so). In my case, the contents are simply:

Ubuntu 22.04.3 LTS \n \l

The \n escape displays the hostname (node name), and \l inserts the name of the current teletypewriter (TTY) line.

I am ready to run YARA for the first time, as shown here, with the rules file first and then the file to be scanned:

$ yara my_first_rule issue
not_really_malware issue

As you can see, the rule in the file my_first_rule, whose condition always evaluates to true, has reported a finding named not_really_malware for the example test file issue.

Rules Are Rules

Each rule starts with the keyword rule, followed by an identifier and a body enclosed in curly brackets. Rules are normally constructed with a strings definition section and then a condition section. Text strings are enclosed within double quotes (inverted commas), and hexadecimal strings are surrounded by curly brackets.

The rule name is referred to as an identifier. Figure 1 shows the reserved keywords that cannot be used as identifiers because YARA uses them in its own rule processing. Rule names are case-sensitive and cannot start with a digit; other than that, a name can consist of any alphanumeric characters and underscores, as long as it does not exceed 128 characters.

Figure 1: Reserved YARA keywords that aren't allowed as rule names (identifiers).
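The naming rules can be sketched as a small Python check. This is only an illustration, not part of YARA: the function name is_valid_rule_name is hypothetical, the keyword set is a partial subset of the reserved words (the complete list is the one in Figure 1), and the sketch assumes that a name of exactly 128 characters is still accepted.

```python
import re

# Partial, illustrative subset of YARA's reserved keywords; the complete
# list is the one shown in Figure 1 and in the YARA documentation.
RESERVED_SUBSET = {
    "rule", "global", "private", "meta", "strings", "condition",
    "and", "or", "not", "true", "false", "filesize", "at", "in",
    "of", "them", "all", "any", "for", "import", "include",
}

# C-style identifier: letters, digits, and underscores; no leading digit.
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_rule_name(name: str) -> bool:
    """Return True if name looks like a usable YARA rule identifier."""
    if len(name) > 128:  # assumption: the 128-character limit is inclusive
        return False
    if name in RESERVED_SUBSET:  # keywords are case-sensitive
        return False
    return IDENTIFIER_RE.fullmatch(name) is not None


for candidate in ("not_really_malware", "2fast", "condition"):
    print(candidate, is_valid_rule_name(candidate))
```

Because identifiers are case-sensitive, a name such as Condition passes this check even though condition does not.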

Listing 1 shows an example of a typical, two-sectioned rule. As you might guess from the condition section in Listing 1, the rule will trigger and report a positive result if either the plain text string or the hexadecimal string is matched during a file scan. Note the or keyword used in the condition. I'm sure you are starting to appreciate that YARA rules are clean and logical in their construction. It is also possible to define global rules, which impose restrictions on all of your other rules at once, such as this example from the YARA documentation:

global rule SizeLimit
{
    condition:
        filesize < 2MB
}
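To make the gating effect of a global rule concrete, here is a rough Python model of the SizeLimit example. It is a sketch of the idea rather than of YARA's internals: the scan function and the dictionary of rules are hypothetical, and the sketch only reports ordinary rules, leaving aside how YARA itself lists global rules in its output. YARA's MB suffix multiplies the number by 2^20.

```python
MAX_SIZE = 2 * 1024 * 1024  # YARA's MB suffix multiplies by 2**20


def size_limit(data: bytes) -> bool:
    """Mirror of the SizeLimit global rule: filesize < 2MB."""
    return len(data) < MAX_SIZE


def scan(data: bytes, global_rules, rules) -> list[str]:
    """Return the names of ordinary rules that match, gated by global rules."""
    # Every global rule must be satisfied before any other rule can match.
    if not all(check(data) for check in global_rules):
        return []
    return [name for name, check in rules.items() if check(data)]


# Hypothetical rule set: the always-true rule from earlier in the article.
rules = {"not_really_malware": lambda data: True}

small = b"Ubuntu 22.04.3 LTS \\n \\l\n"
large = bytes(MAX_SIZE)  # exactly 2MB, so filesize < 2MB is false

print(scan(small, [size_limit], rules))
print(scan(large, [size_limit], rules))
```

The second call returns an empty list even though not_really_malware is always true, because the global rule fails first.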

Listing 1

Two-Sectioned Rule

rule two_main_sections
{
    strings:
        $some_text = "identifying malware text"
        $some_hexadecimal = { 44 B3 45 ED A2 14 C4 44 B3 45 ED A4 23 }
    condition:
        $some_text or $some_hexadecimal
}
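The condition in Listing 1 boils down to a pair of substring searches joined by or. The following Python sketch approximates that logic on an in-memory byte string; it assumes a plain, case-sensitive ASCII match for the text string and is not a substitute for YARA's own matching engine.

```python
# The two patterns from Listing 1, expressed as Python byte strings.
SOME_TEXT = b"identifying malware text"
SOME_HEXADECIMAL = bytes.fromhex("44 B3 45 ED A2 14 C4 44 B3 45 ED A4 23")


def two_main_sections(data: bytes) -> bool:
    """Approximate the condition: $some_text or $some_hexadecimal."""
    return SOME_TEXT in data or SOME_HEXADECIMAL in data


# Synthetic test data, not real malware.
print(two_main_sections(b"header " + SOME_HEXADECIMAL + b" trailer"))
print(two_main_sections(b"nothing suspicious here"))
```

Replacing or with and in the function would require both patterns to be present, just as it would in the rule.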

Additionally, modules are supported – and some modules are bundled with the tool directly. Modules extend the main functionality that YARA provides and can create much more fine-grained rules.

Populating your rules with variables is also possible, and you can pull in and include files too if you wish. YARA follows much of the same syntax as the C language, so the format for multiple-line comments is as follows:

/*
    Comment for blocks of code
*/

And, for single line comments, you can use the following format:

// nothing to see here, move along

It might have already occurred to you that running YARA in an enterprise environment could mean iterating through tens of thousands of rules. Fret not – YARA lets you compile the rules ahead of time, so they do not need to be parsed again for every scan.

YARA bundles a binary called yarac that you can use to compile rules. Figure 2 shows the output from the file command after compiling my_first_rule.

Figure 2: Compiling a rule in YARA with the yarac binary.

You can load up compiled rules by using the -C option:

$ yara -C compiled_my_first_rule issue
not_really_malware issue

Cats and Dogs

To show you YARA in action, the malware that I will test against is called mimikatz. Hosted on GitHub [8], mimikatz claims to be able to "extract plaintext passwords, hash, PIN code, and Kerberos tickets from memory."

As soon as I click the mimikatz link, Google Chrome panics with the page shown in Figure 3.

Figure 3: Google Chrome is doing its job and alerting about the presence of malware.

In Figure 3, you can see that Chrome's built-in security is working as hoped. (Opening the page in Firefox yields a similar warning.)

I found an excellent site called Full Security Engineer [9] with some very useful security information and used the rule on display there, as shown in Listing 2.

Listing 2

Matching mimikatz Malware

rule mimikatz_test
{
  meta:
     author = "fullsecurityengineer.com"
     description = "A rule for detecting ParrotSec mimikatz"
  strings:
     $s1 = {4d 5a} // Windows Executable File Magic Numbers
     $s2 = "benjamin@gentilkiwi.com0" ascii fullword
     $s3 = "$http://blog.gentilkiwi.com/mimikatz 0" ascii fullword
  condition:
     $s1 at 0 and $s2 and $s3
}
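To see what the condition in Listing 2 asks for, the following Python sketch approximates it on an in-memory byte string. The helper has_fullword is a hypothetical stand-in for YARA's fullword modifier, which only accepts a match that is delimited by non-alphanumeric characters, and the sample data is synthetic rather than taken from a real mimikatz binary.

```python
def has_fullword(data: bytes, needle: bytes) -> bool:
    """True if needle occurs delimited by non-alphanumeric bytes."""
    start = data.find(needle)
    while start != -1:
        end = start + len(needle)
        # The start and end of the data also count as delimiters.
        before_ok = start == 0 or not data[start - 1:start].isalnum()
        after_ok = end == len(data) or not data[end:end + 1].isalnum()
        if before_ok and after_ok:
            return True
        start = data.find(needle, start + 1)
    return False


def mimikatz_test(data: bytes) -> bool:
    """Approximate the condition: $s1 at 0 and $s2 and $s3."""
    s1 = bytes.fromhex("4d 5a")  # "MZ", must sit at offset 0
    s2 = b"benjamin@gentilkiwi.com0"
    s3 = b"$http://blog.gentilkiwi.com/mimikatz 0"
    return data.startswith(s1) and has_fullword(data, s2) and has_fullword(data, s3)


# Synthetic sample: the magic number followed by both strings, NUL-separated.
sample = (
    b"MZ" + b"\x00" * 4
    + b"benjamin@gentilkiwi.com0" + b"\x00"
    + b"$http://blog.gentilkiwi.com/mimikatz 0" + b"\x00"
)
print(mimikatz_test(sample))
print(mimikatz_test(b"\x00" + sample))  # "MZ" is no longer at offset 0
```

All three parts of the condition must hold, so shifting the magic number by a single byte is enough to stop the match.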

And, as if by magic, YARA displays multiple findings (Listing 3). The -r option recursively scans all files in a directory tree.

Listing 3

YARA Output

$ yara -r ~/mimikatz_test mimikatz/
mimikatz_test mimikatz//Win32/mimilove.exe
mimikatz_test mimikatz//Win32/mimilib.dll
mimikatz_test mimikatz//x64/mimilib.dll
mimikatz_test mimikatz//Win32/mimikatz.exe
mimikatz_test mimikatz//x64/mimikatz.exe

As should be obvious from the output, multiple files have been matched. Because the condition in Listing 2 joins its terms with and, each of these files contains all three of the strings in the rule, specifically the following:

$s1 = {4d 5a} // Windows Executable File Magic Numbers
$s2 = "benjamin@gentilkiwi.com0" ascii fullword
$s3 = "$http://blog.gentilkiwi.com/mimikatz 0" ascii fullword

Running the command yara --help, as you would expect, offers lots of other options. The -s switch, for instance, will print matching strings, which is ideal for debugging.
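Output in the default format of Listing 3 (a rule name, a space, and a path on each line) is easy to post-process. The following Python sketch groups paths by rule and tidies up the doubled slashes; the function name group_matches is hypothetical, and the sketch assumes the default output format only, not the extra lines that -s adds.

```python
import posixpath
from collections import defaultdict

# The output from Listing 3, pasted in as sample input.
SAMPLE_OUTPUT = """\
mimikatz_test mimikatz//Win32/mimilove.exe
mimikatz_test mimikatz//Win32/mimilib.dll
mimikatz_test mimikatz//x64/mimilib.dll
mimikatz_test mimikatz//Win32/mimikatz.exe
mimikatz_test mimikatz//x64/mimikatz.exe
"""


def group_matches(output: str) -> dict[str, list[str]]:
    """Map each rule name to the normalized paths it matched."""
    matches = defaultdict(list)
    for line in output.splitlines():
        if not line.strip():
            continue
        # Rule identifiers never contain spaces, so split only once;
        # this keeps paths that contain spaces intact.
        rule, path = line.split(" ", 1)
        matches[rule].append(posixpath.normpath(path))
    return dict(matches)


for rule, paths in group_matches(SAMPLE_OUTPUT).items():
    print(rule, len(paths), paths[0])
```

A structure like this makes it straightforward to count findings per rule or to feed the matched paths into another tool.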

Having compiled the mimikatz_test rule, I can now pull together a recursive scan with pattern-matches, using the following command with the abbreviated output as shown in Figure 4:

$ yara -sCr compiled_mimikatz_test mimikatz/

Figure 4: Lots of matched output, abbreviated.
