Bulk renaming files with the rename command

Names Have Been Changed

© Photo by CHUTTERSNAP on Unsplash

The rename command is a powerful means to simultaneously rename or even move multiple files following a given pattern.

Users often have to rename a collection of related files according to a specific pattern. You might have logfiles with dates and times in the file name, but the dates are not written in your preferred format (20230315 instead of 15-03-2023). Perhaps you have a collection of digital photos named by your camera, or maybe you are working with files created on an old Microsoft Windows or MS-DOS system whose names are all uppercase, and you want to give them more readable file names.

Changing the names of a few files by hand may be manageable, but changing more than a dozen files quickly becomes not only tedious but error-prone. Linux does have some tools that will rename files in bulk. Most notably, the Thunar file manager [1] has a very flexible Bulk Rename tool (Figure 1), with several powerful built-in pattern-matching criteria from which to choose, making the tool sufficient for most use cases.

Figure 1: The Bulk Rename tool features many advanced capabilities, but it may not be as efficient as a command-line tool in the hands of an experienced user.

Once you get used to the command line, renaming files with a text-based command is usually faster than using a graphical tool. Plus, Thunar's Bulk Rename tool, although powerful, is still limited in its flexibility. For example, while Bulk Rename can rename files, it usually cannot move files from one directory or group of directories to another.

This article takes a deep look at the rename command [2], a very powerful command-line tool written in Perl that you can use for bulk renaming and a whole lot more.

Getting Started

If you don't have rename on your system, you can install it on Debian, Ubuntu, and derivatives with the following command:

sudo apt install rename

The rename command has the following syntax:

rename [options] [expression] [files]

The files are one or more files to rename. As with other command-line tools, standard shell wildcards such as *.png or file[0-9] are permitted.

The expression consists of commands to match and change parts of the file names; the result of applying the expression to each file name becomes that file's new name. Usually, you will specify only one command: either the s/// command, which searches for text and replaces it, or, less often, the y/// command, which transliterates individual characters (for example, to change uppercase file names to lowercase).

In fact, the expression can be almost any valid Perl code that modifies a string: rename places each file name in Perl's $_ variable, runs the expression, and uses the resulting value of $_ as the new name. For more on Perl expressions, see the official Perl documentation [3]. In practice, though, you are unlikely to need more than the s/// and y/// commands for changing file names.

In addition, rename accepts one or more options (see Table 1 for the most useful options).

Table 1

Useful rename Options

Option

Meaning

-n, --nono

Does not rename or move any files; rename only reports what it would do. Use it, alone or together with the -v option, to check the effect of an expression before running it for real.

-v, --verbose

Prints each file's name, both before the expression is applied and after. This is useful to test the effects of the rename expression, especially when combined with the -n option.

-f, --force

Renames files even if the new name matches that of an existing file, which is then overwritten. Without this option, rename skips any file whose new name is already taken. Use with caution.

--path, --fullpath

Operates on the file's full pathname, not just the file name itself. For example, replacing all instances of the word JPG with JPEG in the path Pictures/JPGs/1.JPG not only renames the file to 1.JPEG but also moves it to Pictures/JPEGs/1.JPEG (provided that the Pictures/JPEGs directory already exists). This is rename's default behavior, so you should rarely need to specify this option explicitly.

-d, --filename, --nopath, --nofullpath

Operates only on the name of the file itself, rather than the full pathname of the file. Replacing all instances of the word JPG with JPEG in the file at Pictures/JPGs/1.JPG will rename the file to Pictures/JPGs/1.JPEG.

-u, --unicode

Normally, rename treats file names as raw byte strings. This option makes it treat them as Unicode (Perl character) strings instead. An optional parameter specifies the character encoding used to decode and encode the file names.

A Basic Example

For my first example, I have some HTML files of Wikipedia articles that I downloaded using my web browser (see Listing 1). My web browser conveniently named each web page after the page's title. However, each page's title (and thus file name) ends with a hyphen followed by the word "Wikipedia," which is redundant and unnecessarily lengthens the name of each file.

Listing 1

HTML File Names with Redundant Text

$ ls -N
IEEE 754 - Wikipedia.html
Iron oxide - Wikipedia.html
Key Code Qualifier - Wikipedia.html
Wikipedia - Wikipedia.html

To remove the trailing "Wikipedia" and the hyphen, I will search for files whose names end with a space, a hyphen, another space, the word "Wikipedia," and the string ".html" and replace all that with just the string ".html" using the following command:

rename 's/ - Wikipedia\.html$/.html/' *.html

The s/// command searches for the part of the file name matching a pattern (enclosed between the first two slash characters as shown in annotation 1 in Figure 2) and replaces the matched text with some other text (enclosed between the second and third slashes, annotation 2 in Figure 2). Listing 2 shows the results of running this s/// command.

Listing 2

New File Names After Running rename

01 $ rename 's/ - Wikipedia\.html$/.html/' *.html
02 $ ls -N
03 IEEE 754.html
04 Iron oxide.html
05 Key Code Qualifier.html
06 Wikipedia.html

Figure 2: A simple but typical rename command: The command searches for the search text (1) and replaces any occurrence of it with the replacement text (2) in each of the supplied file names (3).

Note the backslash (\) preceding the dot (.) in the search term (line 1 of Listing 2). The search expression uses regular expression syntax [4], in which the dot has a special meaning: unless it is preceded by a backslash (an escape), a dot matches any single character, not just a literal dot. Had I not escaped the dot and instead searched for " - Wikipedia.html", the expression would also have matched names ending in " - Wikipediazhtml", " - Wikipedia!html", and so on.

In practice, the set of files I want to rename contains nothing besides files of the form [x] - Wikipedia.html, so escaping the dot character is unnecessary in this case. However, when formulating search terms, it is good to be as specific as possible.

The dot character is one of several metacharacters that have a special meaning in regular expressions (see Table 2). The dollar sign ($) at the end of the search expression tells rename to match part of a file name only if the match occurs at the end of the file name.

Table 2

Regular Expression Metacharacters

Metacharacter

Meaning

\ (backslash)

Escapes the character immediately following the backslash so that the immediately following character is interpreted literally and not as a metacharacter itself. Use two consecutive backslashes (\\) to match a single literal backslash character.

. (dot)

Matches any single character.

[ and ] (square brackets)

Matches any one of the characters enclosed within the square brackets. For example, [Ahk7~] matches A, h, k, 7, or ~, but no other characters and no combination of two or more characters. Ranges of characters are also supported; for instance, [A-Z] matches any single uppercase letter, and [A-Za-z0-9] matches any single numeric digit or upper- or lowercase letter. If a caret (^) immediately follows the open square bracket, the matching is inverted, and the square bracket expression will match any character not present within the square brackets; thus, [^A-Z_] matches k, 6, and #, but not K, Z, or an underscore (_).

( and ) (parentheses)

Groups parts of a regular expression that would otherwise be treated separately, and separates parts that would otherwise be treated as one component. For example, (b|c|f)ar matches bar, car, or far, whereas without the parentheses, b|c|far means b, or c, or far, so it no longer matches bar or car as a whole. Anything within a pair of parentheses forms a single sub-expression that other metacharacters operate on as one unit; so (me)+ matches me, meme, mememe, and so on. Parentheses also capture the text they match for use in back references (see "Using Back References").

? (question mark)

Marks the previous character as optional (i.e., the character may either not occur or may occur exactly once). For example, z? matches either z or an empty string, but will not by itself match zz or zzzzzzz.

* (asterisk)

Matches the previous character any number of times in a row, including not at all. For example, H* matches H, HH, HHH, HHHHHHHHHH, or even an empty string.

+ (plus sign)

Like the asterisk, but the previous character must occur at least once. For example, H+ matches H, HH, HHH, or HHHHHHHHHH, but not an empty string.

{ and } (braces)

Matches the previous character if it appears a number of times within the range given between the braces. For example, k{2,6} matches a run of two to six ks (within a longer run, it matches six of them), but not a single k or an empty string, and k{3,} matches three or more ks in a row. To allow anywhere from zero to six ks, write k{0,6}; older Perl versions do not accept an empty lower bound such as k{,6} and treat it as literal text.

| (pipe)

Matches either of two (or possibly more) sub-expressions. For example, cat|walrus matches either cat or walrus, (cat|walrus)walk matches either catwalk or walruswalk, and cat|lion|weasel matches any of the words cat, lion, or weasel.

^ (caret)

Matches the start of a line. This does not match any real character by itself; it just marks that the next character in the search string must occur at the very beginning of the line. As expected, the caret must generally be the first character in the search string.

$ (dollar sign)

Matches the end of a line. As with the caret, this does not match any real character by itself and only informs rename to consider the previous character a match if and only if the previous character is the last character on the line. The dollar sign has another meaning if followed by a digit and/or if it appears in the replacement expression instead of the search expression (see the entry below).

$1 through $9

References a specific parenthesized part of the search expression. $1 references whatever was matched by the sub-expression enclosed in the first pair of parentheses in the search expression, $2 references the sub-expression in the second pair of parentheses, and so on. See the section "Using Back References" for more information.

Returning to the dollar sign in Listing 2: the file Key Code Qualifier - Wikipedia.html is matched by that regular expression, but the file Z - Wikipedia.html.gz (which has an extra trailing .gz) is not, because the match must end at the end of the name. As with the dot, to match a literal dollar sign in a file name, precede the dollar sign with a backslash.

You may also specify one or more characters following the final slash in the s/// command. These characters further modify the behavior of the search-and-replace operation, such as disabling case-sensitive matching (see the "s/// Options" box for details on the options supported by the s/// command).

s/// Options

By adding one or more extra characters to the end of the s/// command, the behavior of the search-and-replace operation can be modified in various ways. Each option is a single character; multiple options may be specified by immediately following one option character by another, such as s/dog/cat/g, s/\.html$/.HTM/i, and s/recieve/receive/gi.

While more than a dozen options are supported, only two options are potentially useful to most users when renaming files. The first, g, instructs rename to replace all occurrences of the search string with the replacement string, not just the first occurrence. The default is to replace only the first occurrence of the search term; this is sufficient in most cases, but not if you want to replace all occurrences of, for example, the word "affect" with "effect" in the file name affect_of_the_affective_initial_affect.txt.

The other potentially useful option, i, enables case-insensitive searching. In other words, rename does not care whether a character in the search string is upper- or lowercase; either type of character will match either type of character in the file name. By default, if a character in the search string is lowercase, the corresponding character in the file name must also be lowercase in order for the search string to match. For example, without the i option, the search term \.html would match the file test1.html, but not test2.HTML or test3.Html. By contrast, with the i option, the same search expression would match all three files. Even if all or part of the search expression were capitalized, it would still work.

For more information on the other options not discussed here, see the Perl documentation [3].

Using Back References

Simple search terms and regular expressions are sufficient for most renaming jobs. Often you only need to add or remove a fixed string in each file name, as in the example of the downloaded Wikipedia pages.

However, sometimes it may be useful to rename files in more sophisticated ways. In the following example, as shown in Listing 3, I have a number of logfiles in a directory, all with dates and times in their names. Each file name contains the year, month, day, hour, minute, and second at which the logfile was created, in that order – roughly the convention of ISO 8601, the international format for dates and times.

Listing 3

Logfile Names with Dates and Times

$ ls -N
daemon_20200309_071842
messages_20211213_134327
messages_20230402_093200
syslog_20191013_233611
syslog_20220726_185603

But suppose I were European and I wanted the dates and times formatted in my local date convention, which is the day followed by the month and finally the year. In addition, I want hyphens inserted between each of the date components (day-month-year) and colons inserted between each time component (hours:minutes:seconds), as in syslog_26-07-2022_18:56:03. Listing 4 shows what I want the file names to look like after renaming the files.

Listing 4

Date and Time Logfiles After Renaming

$ ls -N
daemon_09-03-2020_07:18:42
messages_13-12-2021_13:43:27
messages_02-04-2023_09:32:00
syslog_13-10-2019_23:36:11
syslog_26-07-2022_18:56:03

Renaming the files in this manner is not possible using just the simple regular expression syntax. For this purpose, you not only need to search for specific parts of the file name, but also reference in the replacement string the matching text of each of those parts. First, you need to search for the year (a four-digit number) followed by the month (a two-digit number) followed by the day (another two-digit number) and then replace it with the third string found by the search (the day) followed by the second string (the month) followed by the first string (the year).

Regular expressions provide a way to reference parts of the search string in the replacement string using back references. To use back references, the portion of the search string to be referenced must first be enclosed in parentheses. Then the parenthesized part of the search string may be back referenced in the replacement string by inserting a dollar sign ($) character followed by an index number into the replacement string.

The following rename command uses back references to accomplish my first task of reordering the components of the dates and also inserts hyphens between the components:

rename 's/([0-9]{4})([0-9]{2})([0-9]{2})/$3-$2-$1/' *

Figure 3 illustrates which parts of the search expression are referenced by each back reference. The arrows in the figure point to the referenced parenthesized regions of the search expression.

Figure 3: A rename command that uses back references to rearrange the parts of a date string. The annotations illustrate each parenthesized region that is referenced by each back reference in the replacement expression.

Combining Multiple Operations

As shown in Figure 3, the preceding example only reformatted the dates. I still need to insert the colons between each time component. Again, I can use back references, as follows:

rename 's/([0-9]{2})([0-9]{2})([0-9]{2})$/$1:$2:$3/' *

This command certainly works, but what if I want to use one single rename command to do both the date and time manipulation instead of running two separate rename commands in sequence? Certainly, I could combine the two search expressions into one very long search expression, but this quickly becomes cumbersome and very difficult to read:

rename 's/([0-9]{4})([0-9]{2})([0-9]{2})_([0-9]{2})([0-9]{2})([0-9]{2})$/$3-$2-$1_$4:$5:$6/' *

Fortunately, it is possible to perform both tasks using one command but keep the tasks logically separated. If each expression is separated by a semicolon character, rename can execute two or more expressions in one command:

rename 's/([0-9]{4})([0-9]{2})([0-9]{2})/$3-$2-$1/;
s/([0-9]{2})([0-9]{2})([0-9]{2})$/$1:$2:$3/' *

(Note the line break after the semicolon. It is not required, but it improves the readability of the expression; Perl treats it as ordinary whitespace.)

Transliterating Characters

The y/// command transliterates text. It looks for each character specified in the command's first parameter and replaces any instance of that character with the corresponding character in the second parameter. For example, to replace any As with Zs and any Zs with As in file names, use:

rename 'y/AZ/ZA/' *

After executing this command, the file ZAGREB.TXT becomes AZGREB.TXT.

Like s///, the y/// command is case-sensitive, but unlike s///, it has no option to enable case-insensitive matching (see the "s/// Options" box for more information). Thus, the above y/// command will change ZAGREB.TXT but leave zagreb.txt untouched. Furthermore, it will change Zagreb.txt to Aagreb.txt, not to Azgreb.txt as you might expect. To get Azgreb.txt, you would need to change the command to:

rename 'y/AZaz/ZAza/' *

One common use of y/// is to convert uppercase file names to lowercase, or vice versa, which is useful for files from old MS-DOS or early Windows systems that stored file names in all uppercase. You could spell out the entire alphabet in the command, but that is cumbersome, because it means typing at least 52 letters: 26 uppercase letters in the search list and 26 lowercase letters in the replacement list. Instead, you can specify ranges of characters, as in y/A-Z/a-z/ (to replace uppercase characters with their lowercase equivalents). Note that, unlike regular expressions, y/// does not use square brackets for ranges; any brackets you add are treated as ordinary characters to be transliterated.

Like the s/// command, the y/// command accepts one or more options following the final slash of the command. None of these options are likely to be useful for general purposes, but c and d might have some niche uses (see the "y/// Options" box).

y/// Options

Like the s/// command, the y/// command accepts a few option characters; each option alters the behavior of the y/// command in its own way. The y/// options are rarely useful, but two options, c and d, might come in handy.

Both of these options are used in connection with an intrinsic behavior of y/// referred to here as squashing (Perl's documentation simply says that the final character is replicated, and uses the word "squash" for a separate s option): If the replacement list has fewer characters than the search list, the last character of the replacement list is repeated until both lists are equal in length. For example,

y/[A-Z]/x/

is equivalent to:

y/[A-Z]/xxxxxxxxxxxxxxxxxxxxxxxxxx/

Both expressions will replace any uppercase letter with a lowercase x character. The first, however, is much more compact and easier to read.

One potentially useful option, c, instructs y/// to complement the list of characters on the search list and replace any character that is not present on the list. When combined with squashing, this can be used to change forbidden characters not explicitly on the search list to one particular placeholder character. For instance, if you have files with unprintable characters in their names (*nix/Linux filesystems can handle most non-printable characters in file names), you can quickly clean up the file names by replacing all non-alphabetic, non-numeric, non-underscore/hyphen characters in the file names with dot (.) characters, as in:

y/A-Za-z0-9_-/./c

Another potentially useful option, d, disables squashing and deletes any character on the search list that has no corresponding character on the replacement list. Thus, y/_.A-Z/_.a-d/d will convert the file name DOC_1993.BAK into dc_1993.ba: D, C, B, and A become lowercase, the letters E through Z are deleted, and characters not on the search list, such as the digits, are left alone. This example is contrived, which reflects how limited the option's practical uses are.

Moving Files Between Directories

Another potential use of rename is to have each category of logfile placed in its own directory. In Listing 4, I have several dated logfiles named daemon, syslog, and messages. While I currently only have five logfiles in that directory, I could eventually end up with hundreds or even thousands of logfiles to manage. Consequently, I want to move each type of logfile into its own directory (e.g., I want syslog_13-10-2019_23:36:11 to be moved into a directory called syslog). Ideally, I would also like the initial part of the logfile's name to be removed because the containing directory's name should make clear the type of logfile. Listing 5 shows the desired resulting directory tree.

Listing 5

Separating into Subdirectories by Name

$ ls -FNR
.:
daemon/  messages/  syslog/
./daemon:
09-03-2020_07:18:42
./messages:
02-04-2023_09:32:00  13-12-2021_13:43:27
./syslog:
13-10-2019_23:36:11  26-07-2022_18:56:03

Fortunately, rename can move files just as easily as it can rename them, and it can do both in the same step. That is exactly what I need here: each file should move into its directory and lose the first part of its name.

Unfortunately, to move a file to another directory, rename requires that the destination directory already exist; rename will not create the directory for you. Prior to running rename, you will have to pre-create all the necessary directories. I used the following shell one-liner to create the directories before running rename:

find . -maxdepth 1 -type f -printf '%f\0' | grep -Eoz '^[^_]+' | sort -zu | xargs -0 mkdir -p

This one-liner lists all files immediately under the current directory (not any files in subdirectories), takes the part of each file name up to the first underscore (e.g., messages), removes duplicate prefixes with sort -zu, and creates a directory named after each remaining prefix in the current directory. The -p option keeps mkdir from complaining if a directory already exists.
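To try the directory-creation step without touching your real logfiles, you can run it in a scratch directory created with mktemp -d. The sketch assumes the GNU versions of find, grep, sort, and xargs that ship with most Linux distributions, and reuses the file names from Listing 4.

```shell
# Work in a throwaway directory so no real files are touched.
demo=$(mktemp -d)
cd "$demo" || exit 1

# Sample logfiles named as in Listing 4.
touch daemon_09-03-2020_07:18:42 messages_13-12-2021_13:43:27 \
      messages_02-04-2023_09:32:00 syslog_13-10-2019_23:36:11 \
      syslog_26-07-2022_18:56:03

# Create one directory per distinct prefix (the text before the first _).
find . -maxdepth 1 -type f -printf '%f\0' | grep -Eoz '^[^_]+' |
    sort -zu | xargs -0 mkdir -p

ls -d */    # lists daemon/, messages/, and syslog/
```

Afterward, rm -r "$demo" removes the scratch directory.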

Now, to move each logfile and then remove the initial part of each file name, I use:

rename 's/^([^_]+)_/$1\//' *

There are several things to note here. The first is that I instructed rename to search for a string of one or more characters at the very beginning of the file name that contains no underscore (the ^([^_]+) in the search expression). This takes advantage of the fact that the logfile type is separated from the date by an underscore. I then use a back reference followed by a slash in the replacement expression to tell rename to move the file into a directory named after whatever was matched by the aforementioned parenthesized expression.

Note how I escaped the slash character (as in \/) so that rename does not mistake the slash for the end of the replacement expression. Remember, the search and replacement expressions, as well as any options to the s/// command, are separated by slash characters, just as file-name components are separated by slashes. Actually, I could have used virtually any character to separate the parts of the s/// command; although slashes are the common convention, I also could have used at signs (@) in the rename command above, or in any of the previous s/// commands. The following would have worked just as well:

rename 's@^([^_]+)_@$1/@' *

By using a character other than the slash to separate the parts of the s/// command, I no longer have to escape the slash in the replacement expression that denotes part of a directory path. In my opinion, this makes the command a bit easier to read. Just make sure that the character you choose appears in neither the search nor the replacement expression (or is escaped where it appears).

Conclusion

Once you understand its syntax and use, the rename command is an efficient and very powerful utility for virtually any bulk renaming job you have in mind – from converting file names to title case, to moving files into different directories, to changing month numbers into month names (e.g., 2015-02-17 into 2015-Feb-17). All of these jobs and more can be performed with rename. Furthermore, several jobs can be combined into one command for even more power and flexibility.

This article has covered a number of examples to showcase the major features of rename, but I have only scratched the surface in terms of what can be done with the command. Hopefully, you will be inspired to come up with your own rename commands.

The Author

Michael Williams, better known by his pseudonym Gordon Squash, is a freelance, open source software developer. He is a member of the Core Developers Team of the MATE Desktop Environment project (https://mate-desktop.org/), enjoys hacking anything related to the GTK+ GUI widget toolkit, and works toward developing a fork of GTK+ called STLWRT (https://github.com/thesquash/stlwrt) when time permits. You can see some of his other current projects on his personal GitHub page (https://github.com/thesquash/).