Faster search filter: ICgrep
Well Filtered

One of the most common tasks when working on computers involves browsing texts for search patterns. Here, ICgrep offers a modern, parallel, and Unicode-enabled alternative to the classic grep.
Grep is arguably the most import text-browsing tool in Unix. Although its origins are not entirely clear, likely, the name for this tool developed from the command g/re/p
(global, regular expression, print) from the Unix standard editor Ed.
In any case, grep searches entries line by line for certain formulated patterns, as regular expressions. Depending on the options used, the tool can display the matches, their location, number, and so on.
With today's texts, this concept is stretched to its limits in several respects. For example, modern systems no longer use the antiquated ASCII character set with its 128 characters or its larger, 256-character siblings such as Latin 1 or increasingly Unicode [1] (see the "Using Unicode Characters" box).
Using Unicode Characters
Unicode aims to assign a digital code to each existing meaningful character or text element of all known writing cultures and character systems. The character range is equally extensive – encoding a single character requires several bytes.
Due to the large scope of the Unicode character range, it often isn't possible to enter characters on the keyboard directly because there simply aren't any keys for them. However, in Linux, Qt and GTK allow you to enter characters using key codes. To do so, you just need to precede the codes with the keyboard shortcut Ctrl+U or Ctrl+Shift+U – but not all programs support this.
It is often easier just to search for these – especially if you don't know the code for a particular character. There are two utilities for GTK-based desktops that are both used regularly: Gucharmap (Figure 1) and Gnome Characters (Figure 2). The latter is particularly useful for finding special characters and copying their codes.

Another problem is the definition of input lines. A plethora of variants already existed for identifying or inducing a line break with the previous, non-Unicode-enabled systems [2]. The default in Unix is NL
(New Line), and there are other options with Unicode, in particular NEL
(New Line), but also LS
(Line Separator) or PS
(Paragraph Separator).
The relatively low operating speed of grep – with large input volumes and complex regular expressions – makes matters worse. Even in the best case scenario, regexes [3] are pretty complicated to enter. When editing complex expressions, the computer requires a corresponding amount of computing power. On top of that, there are also conditional, non-greedy, and greedy regexes – which attempt to generate matches for samples to include as much text as possible – and many other advanced variants. All grep varieties process regular expressions at run time so that the complexity is mapped directly in the calculation time.
An online study [4] shows just how quickly ICgrep [5] works in comparison to the standard grep. It describes a scenario in which the standard grep needs more than a minute to search for a pattern, whereas ICgrep handles the same tasks in just over a second. According to the ICgrep developers, there might be differences in speed with a factor of up to 100.
Installation
Installing ICgrep proved to be a little tricky and quite time consuming in the test; its prerequisites are Subversion, Cmake, and Clang (Listing 1, line 1).
Listing 1
Creating ICgrep
To begin, load the latest ICgrep version onto your computer using Subversion. The message Checked out, revision
Number
marks the end of this action (line 5). Now switch to the icgrep1.0/icgrep-devel/
directory. There is an installation guide, among other things, in the README-icgrep-1.00a.txt
file. It essentially consists of two parts: the interpretation and the installation of Low Level Virtual Machine (LLVM) and the subsequent construction of ICgrep itself.
To install LLVM, it is important to generate the required Makefiles. To do this, switch to the llvm-build/
directory (line 6) and let Cmake perform its role there (line 7). After a short time, you can initiate the build of the LLVM using make && make install
(line 10). Depending on the computing power, the interpretation process takes about an hour, and the subsequent installation only takes a few seconds to complete.
The second step involves creating ICgrep (Listing 2). To do so, switch to the icgrep1.0/icgrep-devel/icgrep-build/
directory and create the Makefiles there using the command from the second line. The compiler automatically detects which CPU extensions the program can use to speed up processing. Alternatively, you can present one of the options SSE2
, SSE3
, SSE4_1
, SSE4_2
, AVX1,
or AVX2
using the -DSIMD_SUPPORT
button.
Listing 2
Generate the Executable File
The next step is to generate the executable file (Listing 2, line 15) without options using make
– this again takes a little longer. The generated program icgrep
can be made smaller using strip
. Last but not least, you can install the program manually in the /usr/local/bin/
directory.
In Practice
You can largely control ICgrep – and its cousins grep or agrep – using options. As this tool is a piece of research software, it contains two or three sets of options with different relevance for typical users. As usual, -help
shows the most important ones; -help-list
goes a step further: It generates a short, compact list of important buttons that, however, do require quite a bit of knowledge of regular expressions. The -help-list-hidden
option ultimately gives you the full list of options; however, many of these are only important in special cases.
You can define patterns, that is, regular expressions, using the -e
button ("expression"). If there is a risk that the shell will detect and evaluate or replace metacharacters in it, you should include the pattern between the single or double quotes. An equals sign can (but does not have to) follow the call, which may occur more than once:
$ icgrep -e Patt -e=ern
These types of expressions then stand for a logical OR: Condition 1 OR Condition 2 OR both must be met. If the number of patterns is too large and the commands get too confusing, it is advisable to outsource the pattern to an external file, one per line. You can specify this file using icgrep -f <SampleFile>
. Table 1 summarizes the most important buttons.
Table 1
Important Options in ICgrep
Option | Function |
---|---|
|
Defines a sample, can be specified multiple times |
|
Sample file |
|
Only counts matches, doesn't show them |
|
Displays the line numbers of matches |
|
Displays file name |
|
Ignores upper/lowercase |
|
Only |
|
Normalizes line breaks |
|
Shows the regular expressions used |
|
Shows named expressions |
One of ICgrep's significant shortcomings is that there's no option for recursively editing, like GNU grep does with the -r
button. Instead, you have to resort to the combination of Find, Xargs, and ICgrep:
$ find <Path> -type f | xargs icgrep <Sample>
Alternatively, today many shells, including Bash, support recursive directory searching using **
. However, this functionality must be activated via shopt -s globstar
. The shopt globstar
command then shows globstar on
. Caution: This process might lead to overly long argument lists and error messages like -bash: /usr/local/bin/icgrep: The argument list is too long
.
The other important options currently missing in ICgrep include -v
("invert match"), -C
, -A
, -B
(displays context lines), -m
(defines maximum number of matches), -a
, -b
(interprets inputs as text or binary), and --color
(highlights matches in color). The developers are planning to implement some of them soon [6].
Testing Regexes
With large data streams, it is advisable to test regular expressions first before applying them to all the data. Tools that visualize regular expressions make life easier for less experienced users. Two such tools are available in Ubuntu in the repositories: Kodos and Kiki [7].
Although Kodos can only be installed on older distributions with difficulty, Kiki is less demanding (Figure 3). It uses regular expressions defined and generated by Python, which it then applies to a previously entered text.

The regexp testing tool regexper
[8] works somewhat differently. The Java program naturally uses Java-based regular expressions and applies them to a previously loaded test file (Listing 3). The test data may be relatively large, which sometimes proves advantageous.
Listing 3
Test File
All of these tools, however, have weaknesses when it comes to processing Unicode characters. Therefore, when constructing regular expressions, they are only suitable for initial tests.
Buy this article as PDF
(incl. VAT)
Buy Linux Magazine
Direct Download
Read full article as PDF:
Price $2.95
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Subscribe to our ADMIN Newsletters
Find SysAdmin Jobs
News
-
Kubuntu Focus Announces XE Gen 2 Linux Laptop
Another Kubuntu-based laptop has arrived to be your next ultra-portable powerhouse with a Linux heart.
-
MNT Seeks Financial Backing for New Seven-Inch Linux Laptop
MNT Pocket Reform is a tiny laptop that is modular, upgradable, recyclable, reusable, and ships with Debian Linux.
-
Ubuntu Flatpak Remix Adds Flatpak Support Preinstalled
If you're looking for a version of Ubuntu that includes Flatpak support out of the box, there's one clear option.
-
Gnome 44 Release Candidate Now Available
The Gnome 44 release candidate has officially arrived and adds a few changes into the mix.
-
Flathub Vying to Become the Standard Linux App Store
If the Flathub team has any say in the matter, their product will become the default tool for installing Linux apps in 2023.
-
Debian 12 to Ship with KDE Plasma 5.27
The Debian development team has shifted to the latest version of KDE for their testing branch.
-
Planet Computers Launches ARM-based Linux Desktop PCs
The firm that originally released a line of mobile keyboards has taken a different direction and has developed a new line of out-of-the-box mini Linux desktop computers.
-
Ubuntu No Longer Shipping with Flatpak
In a move that probably won’t come as a shock to many, Ubuntu and all of its official spins will no longer ship with Flatpak installed.
-
openSUSE Leap 15.5 Beta Now Available
The final version of the Leap 15 series of openSUSE is available for beta testing and offers only new software versions.
-
Linux Kernel 6.2 Released with New Hardware Support
Find out what's new in the most recent release from Linus Torvalds and the Linux kernel team.