Creating more readable regular expressions with Simple Regex Language
Clear-Sighted

Regular expressions are a powerful tool, but they can also be very hard to digest. The Simple Regex Language lets you write regular expressions in natural language.
Regular expressions are a fundamental feature of Linux – and many other modern operating systems. A regular expression is a search term with special placeholders representing several possible characters at the same time. The concept of a regular expression is an extension of the idea behind the "wildcard" character used in many GUI search tools, but the power and subtlety of regular expressions far exceeds what you can do with a simple wildcard.
For example, suppose you want to search the system.log file for errors, but you don't know whether the term Error will appear with an initial cap or all lowercase (Error or error). You could use a regular expression as part of the grep command:
grep -e '[eE]rror' system.log
The expression [eE] means: There is either a lowercase e or an uppercase E.
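The same search can be sketched in Python. This is only a minimal illustration: Python's re module stands in for grep, and the sample log lines are invented for the example.

```python
import re

# The same character-class pattern used in the grep call above.
PATTERN = re.compile(r"[eE]rror")

def matching_lines(lines):
    """Return the lines that contain 'error' or 'Error', much as grep would print them."""
    return [line for line in lines if PATTERN.search(line)]

# Hypothetical log lines, made up for this example.
sample_log = [
    "Error: disk full",
    "warning: low memory",
    "an error occurred",
    "ERROR: all caps is not covered",
]

print(matching_lines(sample_log))
```

The all-caps ERROR line is not reported, because the character class only covers the first letter.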
A quick check for capitalization is easy to read and interpret, but some regular expressions are much more exotic. Who is able to say right away what text the following expression describes:
/^(?:\w|[\.\-\+])+(?:@)(?:[a-z]|[0-9]|[\.\-])+(?:\.)[a-z]{2,}$/i
Once you derive an expression like this, it can be a powerful tool for a script or a string search tool like grep, but for the human who created this expression, and the other humans who come along later and want to read it, decoding a regular expression can be a time-consuming endeavor. What is more, a small error that creeps into the expression could be difficult to spot, although it could have a significant effect on the search result. An error in a complex regular expression could even form the basis for malicious code and an Internet attack.
The fledgling Simple Regex Language (SRL, [1]) from the developer Karim Geiger aims to address the problem of incomprehensibility in regular expressions. Geiger started SRL as a bit of fun in Fall 2016, and since then, other developers have helped to implement SRL in various coding languages.
SRL allows you to write regular expressions in natural English. In the previous logfile example, the two words Error and error start with either E or e. In SRL, you could say:
one of "eE"
and follow it with the character string rror:
one of "eE" literally "rror"
This line forms a complete SRL expression. SRL ignores case in keywords, so LITERALLY is the same as literally. In literal strings, however, case matters: literally "Error" means something completely different from literally "error".
In SRL, you can frame strings (rror in the example) with single or double quotes. You have the option of separating the individual components of the complete expression with a comma or a line break. Adding a separator does not change the logic but simply improves legibility:
one of "eE", literally "rror"
The example expression matches all text passages in which the character strings error or Error appear. Hence, the word Terrorism would also be a valid match.
Empty Words
To match error only as a separate word, require whitespace on both sides:
whitespace one of "eE" literally "rror" whitespace
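The difference between the two expressions can be tried out with their regular expression counterparts. The following Python sketch assumes that SRL's whitespace keyword corresponds to \s; the test strings are invented for illustration.

```python
import re

# one of "eE" literally "rror"
LOOSE = re.compile(r"[eE]rror")

# whitespace one of "eE" literally "rror" whitespace
# (assumption: the whitespace keyword maps to \s)
DELIMITED = re.compile(r"\s[eE]rror\s")

# Hypothetical test strings.
for text in ["Terrorism", "an error occurred", "error at line start"]:
    print(text, bool(LOOSE.search(text)), bool(DELIMITED.search(text)))
```

Note that the delimited version also rejects the third string, because no whitespace character precedes the word at the very start of the text.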
In logfiles, the word error is usually at the beginning of a line. If you are only interested in these lines, you just need to write:
begin with one of "eE" literally "rror"
The text being tested now needs to start with Error or error. However, the expression only works if the program treats each line of the file as a separate text to be tested (as grep does).
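The line-by-line caveat is easy to reproduce. The sketch below assumes that begin with corresponds to the ^ anchor (as in the regular expression quoted at the start of this article) and uses invented log content.

```python
import re

# begin with one of "eE" literally "rror"  (assumed equivalent: ^[eE]rror)
STARTS_WITH_ERROR = re.compile(r"^[eE]rror")

def error_lines(text):
    """Test every line separately, the way grep does."""
    return [line for line in text.splitlines() if STARTS_WITH_ERROR.search(line)]

# Hypothetical log content, made up for this example.
log = "boot ok\nError: disk full\nno error here\nerror: retrying"

# Tested as one single string, the text starts with "boot", so nothing matches.
print(STARTS_WITH_ERROR.search(log))

# Tested line by line, the two lines that start with the word are found.
print(error_lines(log))
```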
Some logfiles mark errors with the abbreviation EE, which you could include in the expression with:
begin with any of (literally "EE", (one of "eE" literally "rror"))
As with traditional regular expressions, parentheses group subexpressions. The term any of serves as a logical Or. In the example, the expression looks for lines beginning either with the character string EE or with Error or error. The comma is cosmetic.
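Written as a conventional regular expression, the grouping and the logical Or might look like the following Python sketch. The exact pattern SRL generates may differ in detail; is_error_line is a helper name chosen for this example.

```python
import re

# begin with any of (literally "EE", (one of "eE" literally "rror"))
# Hand-written equivalent: a non-capturing group with two alternatives.
EE_OR_ERROR = re.compile(r"^(?:EE|[eE]rror)")

def is_error_line(line):
    """True if the line starts with EE, Error, or error."""
    return EE_OR_ERROR.search(line) is not None

# Hypothetical log lines.
for line in ["EE device not found", "Error: disk full", "error: retrying", "(EE) in brackets"]:
    print(line, is_error_line(line))
```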
When the Post Rings
Sometimes characters should be repeated several times. For example, the abbreviation EE consists of exactly two Es in succession. In SRL, you could say: literally "E" exactly 2 times. Instead of exactly 2 times, you could also write twice.
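In regular expression terms, a fixed repetition count is a quantifier in braces. A small Python sketch, assuming exactly 2 times corresponds to {2}:

```python
import re

# literally "E" exactly 2 times  (assumed equivalent: E{2})
TWICE = re.compile(r"E{2}")

def is_exactly_two_es(text):
    """True only if the whole text consists of exactly two uppercase Es."""
    return TWICE.fullmatch(text) is not None

for candidate in ["E", "EE", "EEE"]:
    print(candidate, is_exactly_two_es(candidate))
```

The helper uses fullmatch so that the count applies to the entire string; with search, the pattern would also be found inside EEE.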
In the following expression:
begin with any of (any character, one of ".-+") once or more
the expression any character stands for any letter between A and Z, a digit between 0 and 9, or an underscore _. Uppercase and lowercase are of no importance. The permitted characters can be repeated as often as desired; the entry once or more ensures a minimum of one character.
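The regular expression quoted at the start of this article uses \w at this point, which suggests that any character corresponds to \w. The following Python sketch builds on that assumption; the re.ASCII flag restricts \w to letters, digits, and the underscore, and leading_run is a helper name invented for the example.

```python
import re

# begin with any of (any character, one of ".-+") once or more
# re.ASCII limits \w to A-Z, a-z, 0-9, and the underscore.
LOCAL_PART = re.compile(r"^(?:\w|[.\-+])+", re.ASCII)

def leading_run(text):
    """Return the run of permitted characters at the start of text, or None."""
    match = LOCAL_PART.match(text)
    return match.group(0) if match else None

print(leading_run("John_Doe+news@example.com"))
print(leading_run("@example.com"))
```

The first call returns everything up to the @ character; the second returns None, because at least one permitted character is required.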
If the string you are looking for is an email address, you'll also need to ensure the presence of the @ character: literally "@". The domain name behind it may, in turn, be made up of several letters or numbers and the special characters . and -:
any of (letter, digit, one of ".-") once or more
The any character expression does not work for the domain name because domain names prohibit the underscore _. The letter and digit expressions specify letters and numerals without additional characters. The top-level domain, which starts with a period, forms the end:
literally "."
At least two more letters follow:
letter at least 2 times must end
You state that uppercase and lowercase are irrelevant by explicitly adding case insensitive.
Listing 1 shows the whole expression. The expression deliberately keeps the email address test simple; for example, the standard allows other special characters in front of the @. The domain name must also always end with a letter or a number.
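Listing 1: Checking an Email Address
begin with any of (any character, one of ".-+") once or more
literally "@"
any of (letter, digit, one of ".-") once or more
literally "."
letter at least 2 times must end
case insensitive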
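To see how the complete expression behaves, the regular expression quoted at the beginning of this article can be tried out in Python. This is a minimal sketch: the /…/i notation is translated to re.IGNORECASE, re.ASCII is added so that \w stays limited to letters, digits, and the underscore, and both the looks_like_email helper and the sample addresses are invented for illustration.

```python
import re

# The regular expression from the beginning of the article, without the
# /.../i delimiters; the i flag becomes re.IGNORECASE.
EMAIL = re.compile(
    r"^(?:\w|[\.\-\+])+(?:@)(?:[a-z]|[0-9]|[\.\-])+(?:\.)[a-z]{2,}$",
    re.IGNORECASE | re.ASCII,
)

def looks_like_email(text):
    """True if text passes the deliberately simple email test."""
    return EMAIL.search(text) is not None

# Hypothetical addresses for testing.
for address in [
    "first.last+tag@mail.example.org",
    "User@Example.COM",
    "user@exa_mple.com",   # underscore in the domain name
    "user@example.c",      # top-level domain too short
]:
    print(address, looks_like_email(address))
```

The first two addresses pass; the last two are rejected for the reasons given in the comments.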
Testing, Testing, 1, 2, 3
You can test your SRL expression directly at the SRL project website under the menu item Build [2]. Just enter the SRL expression under Your SRL Query, type a test text under Test Input, and have it checked via Run Query (Figure 1). At the bottom of the page, developers immediately find out whether the test text matches the SRL expression. In addition, the page supplies the corresponding regular expression for comparison.

Figure 2 shows the expression for Listing 1 as an example – which, by the way, is identical to the cryptic regular expression at the beginning of this article. If the tester places a check mark in front of Save Query (to the right of Test Input), the server keeps track of all entries. The tester can use the URL at the bottom of the page to access the page with the SRL expression at any time. It remains unclear where the stored data will reside, so testers should not use sensitive data with Test Input.