Bulk renaming in a single pass with Go
Closed Case
The secret is known as a closure, a feature supported not only by Go but by many other scripting and programming languages as well. Listing 4 illustrates the principle with a simple example.
Listing 4
closure.go
package main

import "fmt"

func main() {
	mycounter := mkmycounter()

	mycounter()
	mycounter()
	mycounter()
}

func mkmycounter() func() {
	count := 1

	return func() {
		fmt.Printf("%d\n", count)
		count++
	}
}
Before a function-creating function like mkmycounter() returns a newly constructed subroutine to the caller, it can define local variables, which are then wrapped into the returned function's context. Across multiple calls, these variables behave as if they were global (or rather static) to the returned function. If one call to the generated function modifies one of these variables, the next call finds the modified value. The enclosed variables therefore belong to the function, much like instance variables belong to an object in object-oriented programming.
As expected, running the binary compiled from Listing 4 shows successive calls of the generated function printing increasing counter values (Listing 5).
Listing 5
Calling the Binary
$ go build closure.go
$ ./closure
1
2
3
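Because every call to the generating function creates a fresh set of enclosed variables, two closures built by the same generator do not share state. The following minimal sketch shows this with a hypothetical mkcounter() that, unlike mkmycounter() in Listing 4, starts at zero and returns the counter value instead of printing it.

```go
package main

import "fmt"

// mkcounter is a hypothetical variant of mkmycounter() from Listing 4:
// the returned closure increments its own private count and returns it.
func mkcounter() func() int {
	count := 0
	return func() int {
		count++ // changes the captured variable itself, not a copy
		return count
	}
}

func main() {
	a := mkcounter()
	b := mkcounter() // a second closure with a separate count

	fmt.Println(a(), a(), a()) // 1 2 3
	fmt.Println(b())           // 1
}
```

Go evaluates the function calls in an argument list from left to right, which is why the first line prints the values in ascending order.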
Characters, Bytes, and Runes
The call to the regexp method ReplaceAllString() in line 31 of Listing 3 also needs some explanation. It replaces all substrings of the org string that match the regular expression rex with the repl string. The ReplaceAll() method (without the String suffix), on the other hand, which you might come across first when skimming the package documentation, expects byte slices of type []byte instead of strings. Attentive readers may wonder what the difference is, considering that you can easily convert a string into a byte slice with []byte(str).
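To see the two variants side by side, the following sketch runs the same substitution through both. The pattern, which matches a literal {seq} placeholder, and the helper names viaString() and viaBytes() are assumptions for illustration; ReplaceAllString() and ReplaceAll() themselves are standard methods of regexp.Regexp.

```go
package main

import (
	"fmt"
	"regexp"
)

// rex is a hypothetical pattern that matches a literal {seq} placeholder.
var rex = regexp.MustCompile(`\{seq\}`)

// viaString uses the string variant: string in, string out.
func viaString(org, repl string) string {
	return rex.ReplaceAllString(org, repl)
}

// viaBytes uses the []byte variant, so the arguments have to be
// converted to byte slices and the result back to a string.
func viaBytes(org, repl string) string {
	return string(rex.ReplaceAll([]byte(org), []byte(repl)))
}

func main() {
	fmt.Println(viaString("foo-{seq}.log", "0001")) // foo-0001.log
	fmt.Println(viaBytes("foo-{seq}.log", "0001"))  // foo-0001.log
}
```

Both variants expand $ references such as $1 in the replacement; if the replacement should be taken literally, ReplaceAllLiteralString() is the string counterpart that skips this expansion.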
To explain this, it's worth taking a detour into Go's implementation of strings [2]. Go newcomers may be astonished to discover that strings and byte slices ([]byte) are fundamentally different data types in Go. You are not allowed to modify an existing string, because strings are immutable, but you are free to mess around with byte slices. In addition, a string's bytes are not necessarily its characters. Because Go source code is UTF-8 encoded, so are its string literals: The "Piñata" string in the program text of Listings 6 and 7 takes up seven bytes, because the accented ñ is represented in UTF-8 as the two bytes c3 b1 (hex).
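The following sketch makes immutability and byte counting visible. The helper withFirstByte() is hypothetical; it has to work on a byte-slice copy, because assigning to an index of a string is a compile-time error.

```go
package main

import "fmt"

// withFirstByte is a hypothetical helper that returns a copy of s
// whose first byte is replaced by c. Strings are immutable, so it
// modifies a byte-slice copy and converts the result back.
func withFirstByte(s string, c byte) string {
	if s == "" {
		return s
	}
	buf := []byte(s) // copies the string's bytes
	buf[0] = c       // allowed: byte slices are mutable
	return string(buf)
}

func main() {
	str := "Piñata"
	// str[0] = 'p' would not compile: string bytes cannot be assigned.
	fmt.Println(len(str))                // 7: len counts bytes, not characters
	fmt.Printf("% x\n", str[2:4])        // c3 b1: the two UTF-8 bytes of ñ
	fmt.Println(withFirstByte(str, 'p')) // piñata
	fmt.Println(str)                     // Piñata: the original is unchanged
}
```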
Because the word "character" has historically often been confused with "byte," the Unicode standard speaks of code points instead. The ñ character occupies code point U+00F1, which UTF-8 encodes as c3 b1. To make things worse, the same character can also be written as two code points: a plain n with a squiggly combining tilde (U+0303) floating above it, but we won't go into that today. The only important thing here is that Go calls Unicode code points "runes."
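A short sketch shows the difference between bytes and runes for both spellings of ñ, using utf8.RuneCountInString() from the standard library; the helper describe() is made up for this example. Converting a string to []rune yields one element per code point.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe reports how many bytes and how many runes (code points)
// a string contains.
func describe(s string) (bytes, runes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	composed := "\u00f1"    // ñ as the single code point U+00F1
	decomposed := "n\u0303" // n followed by U+0303 COMBINING TILDE

	for _, s := range []string{composed, decomposed} {
		b, r := describe(s)
		fmt.Printf("%q: %d bytes, %d runes, % x\n", s, b, r, s)
	}

	runes := []rune("Piñata") // one element per code point
	fmt.Println(len(runes), string(runes[2])) // 6 ñ
}
```

Both strings render as ñ on screen, yet composed == decomposed is false, because Go compares strings byte by byte.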
While the range clause in Listing 6 steps through the string rune by rune (Figure 1), the for loop in Listing 7 indexes the individual bytes and outputs the accented character as two illegible bytes. The lesson: It pays to check very carefully whether a function processes strings or byte slices. Converting between the two data types looks easy, but it isn't free: Go usually has to allocate new memory and copy the bytes, which costs you compute cycles at runtime.
Listing 6
range.go
package main

import "fmt"

func main() {
	str := "Piñata"
	for i, c := range str {
		fmt.Printf("str[%d]='%c'\n", i, c)
	}
}
Listing 7
forloop.go
package main

import "fmt"

func main() {
	str := "Piñata"
	for i := 0; i < len(str); i++ {
		fmt.Printf("str[%d]='%c'\n", i, str[i])
	}
}
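The two approaches can be combined: The following sketch steps through the string by byte index like Listing 7, but calls utf8.DecodeRuneInString() from the standard library to decode a complete rune at each position and then skips ahead by its width in bytes. The result matches the range loop in Listing 6, including the jump from index 2 to 4 after the two-byte ñ. The function name runesAt() is made up for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runesAt walks the string by byte index, decodes the rune starting at
// each position, and advances by that rune's width in bytes.
func runesAt(str string) []string {
	var out []string
	for i := 0; i < len(str); {
		r, width := utf8.DecodeRuneInString(str[i:])
		out = append(out, fmt.Sprintf("str[%d]='%c'", i, r))
		i += width // 1 for ASCII, 2 for ñ
	}
	return out
}

func main() {
	for _, line := range runesAt("Piñata") {
		fmt.Println(line)
	}
}
```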
Off We Go
Let's get back to Listing 3. Because of the closure implemented there, the function increments the value of the seq variable by one on each call and replaces the {seq} placeholder in the file template with the integer value, padded to four digits with leading zeros: foo-{seq}.log first becomes foo-0001.log, then foo-0002.log, and so on.
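The mechanism can be sketched in a few lines. The following mkSeqModifier() is hypothetical and not the article's code: It returns a closure that keeps its own seq counter and fills the {seq} placeholder with the counter, zero-padded to four digits via the %04d format verb. For brevity, it uses strings.ReplaceAll() instead of a regular expression.

```go
package main

import (
	"fmt"
	"strings"
)

// mkSeqModifier is a hypothetical helper: the returned closure
// increments its private seq counter on each call and substitutes it,
// zero-padded to four digits, for {seq} in template.
func mkSeqModifier(template string) func() string {
	seq := 0
	return func() string {
		seq++
		return strings.ReplaceAll(template, "{seq}", fmt.Sprintf("%04d", seq))
	}
}

func main() {
	next := mkSeqModifier("foo-{seq}.log")
	fmt.Println(next()) // foo-0001.log
	fmt.Println(next()) // foo-0002.log
}
```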
Running go build renamer.go mkmodifier.go compiles both listings and links the result into a binary called renamer. Figure 2 shows some usage examples.
By the way, os.Rename() also accepts identical source and target paths, in which case it simply does nothing. But if the target file already exists, it replaces it with the source file without any warning. If you don't want that, you can add a test for an existing target and perhaps a new --force option that tells the program to bulldoze whatever it finds in its way.
To avoid accidentally renaming critical files, it's always a good idea to do a dry run first with -d. Is everything okay? Then run the command again, live this time.
Infos
[1] Renamer: https://github.com/adriangoransson/renamer
[2] "Strings, bytes, runes and characters in Go": https://blog.golang.org/strings