Track down race conditions with Go
Programming Snapshot – Racing Goroutines
If program parts running in parallel keep interfering with each other, you may have a race condition. Mike Schilli shows how to instruct the Go compiler to detect these conditions and how to avoid them in the first place.
If programmers are not careful, program parts that are running in parallel will constantly get in each other's way, whether as processes, threads, or goroutines. If you leave the order in which system components read or modify data to chance, you are adding time bombs to your code. They will blow up sooner or later, leaving you with runtime errors that are difficult to troubleshoot. But how do you avoid them?
The common assumption that components run in the order in which a program calls them is a fallacy, as an example such as Listing 1 easily demonstrates. Chance can also be a factor: Something may work once but then crash after a small, and often unrelated, change to the code. The load on the system you are using can also play a role: Something may work flawlessly in slack times but fall apart unexpectedly under heavy load.
Listing 1
orderfail.go
01 package main
02 import (
03 	"fmt"
04 )
05
06 func main() {
07 	done := make(chan bool)
08 	n := 10
09
10 	for i := 0; i < n; i++ {
11 		go func(id int) {
12 			fmt.Printf("goroutine %d\n", id)
13 			done <- true
14 		}(i)
15 	}
16
17 	for i := 0; i < n; i++ {
18 		<-done
19 	}
20 }
The fact that unsynchronized goroutines do not run in the order in which they are defined, even if the program starts them one after the other, is nicely illustrated by Listing 1 [1] and the output in the upper part of Figure 1. Although the for loop starts goroutine 0 first, followed by 1, then 2, and so on, as defined by the index numbers in i, the compiled program's output makes it clear that chaos reigns: The goroutines write their messages to the output in a wildly jumbled order.
Each of the 10 go func() calls in the for loop receives the current loop index as a parameter, completely according to the textbook, so that the goroutines do not all share the same variable. (Since Go 1.22, each loop iteration gets its own copy of i anyway, but passing the value explicitly works with all Go versions.) Also, to keep the program from terminating as soon as the for loop ends, instead of waiting until all the goroutines have completed their work, each goroutine sends a message to the done channel at the end of its working life. The final for loop starting in line 17 collects these messages and does not terminate until the last goroutine has said goodbye.
One by One
But if you really want goroutine 0 to start first, then goroutine 1, and so on, you need a synchronization mechanism, such as channels or mutexes, to make sure that the Go runtime maintains the desired order in defiance of the natural chaos.
Listing 2 demonstrates this with a slice of 10 channels. Shortly after starting, each goroutine blocks and waits until a message arrives on the channel assigned to it. Only then does its read from starters[id] in line 17 return, and the goroutine moves on to printing its "Running" message. At first, nobody writes to any of the channels, but line 27, after the for loop, starts a chain of events by writing a value to the first channel.
Listing 2
orderok.go
01 package main
02 import (
03 	"fmt"
04 )
05
06 func main() {
07 	done := make(chan bool)
08 	n := 10
09
10 	starters := make([](chan bool), n)
11 	for i := 0; i < n; i++ {
12 		starters[i] = make(chan bool)
13 	}
14
15 	for i := 0; i < n; i++ {
16 		go func(id int) {
17 			<-starters[id]
18 			fmt.Printf("Running %d\n", id)
19 			if id < n-1 {
20 				starters[id+1] <- true
21 			}
22 			// [... DO WORK ...]
23 			done <- true
24 		}(i)
25 	}
26
27 	starters[0] <- true
28
29 	for i := 0; i < n; i++ {
30 		<-done
31 	}
32 }
This releases the goroutine with the id of 0, because its read statement in line 17 no longer blocks. The goroutine then outputs its message and, to keep things ticking along, writes to the channel with the index id+1 (i.e., 1). This in turn triggers goroutine 1, which then triggers goroutine 2. This merry dance continues in a controlled manner up to goroutine 9, the last in line, which skips the handoff because of the check in line 19. As soon as all 10 goroutines have reported to the done channel, the final for loop starting in line 29 ends, and so does the program.
This approach naturally reduces the concurrency of the goroutines: They no longer all start quasi-simultaneously but wait for each other, although only until each one has passed the baton to its successor via the channel. Whatever happens afterward within the individual goroutines (marked in line 22 by the DO WORK placeholder comment) is again a quasi-simultaneous affair.
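To check that the relay in Listing 2 really enforces the start order, the following sketch records each goroutine's id in a shared slice at the point where Listing 2 prints its "Running" message, and then returns the slice. The function name startInOrder is hypothetical, a sync.WaitGroup stands in for the done channel, and the code assumes n is at least 1.

```go
package main

import (
	"fmt"
	"sync"
)

// startInOrder starts n goroutines (n >= 1) but chains them with channels,
// as in Listing 2, so that goroutine id only proceeds once goroutine id-1
// has signaled it. It returns the order in which the goroutines passed
// the starting gate. (startInOrder is a hypothetical helper name.)
func startInOrder(n int) []int {
	starters := make([]chan bool, n)
	for i := range starters {
		starters[i] = make(chan bool) // unbuffered: send and receive meet
	}

	var mu sync.Mutex // guards the shared order slice
	order := make([]int, 0, n)
	var wg sync.WaitGroup

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			<-starters[id] // wait for the predecessor's signal
			mu.Lock()
			order = append(order, id)
			mu.Unlock()
			if id < n-1 {
				starters[id+1] <- true // hand the baton to the next goroutine
			}
			// Work placed here would run concurrently with later goroutines.
		}(i)
	}

	starters[0] <- true // kick off the chain
	wg.Wait()
	return order
}

func main() {
	fmt.Println(startInOrder(10)) // prints [0 1 2 3 4 5 6 7 8 9]
}
```

Because each goroutine appends its id before it signals its successor, and a send on an unbuffered channel happens before the matching receive completes, the slice always comes back sorted, no matter how the scheduler interleaves any work that follows.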
There Can Only Be One Winner
The disastrous consequences that race conditions can have in an application are illustrated by an airline's booking program in Listing 3. In line 13, it checks the variable seats, which two different goroutines share, to see whether a seat is still available on the plane. If so, it outputs a success message to the user and sets the number of remaining seats to zero.
Listing 3
airline.go
01 package main
02 import (
03 	"fmt"
04 	"time"
05 )
06
07 func main() {
08 	seats := 1
09
10 	for i := 0; i < 2; i++ {
11 		go func(id int) {
12 			time.Sleep(100 * time.Millisecond)
13 			if seats > 0 {
14 				fmt.Printf("%d booked!\n", id)
15 				seats = 0
16 			} else {
17 				fmt.Printf("%d missed out.\n", id)
18 			}
19 		}(i)
20 	}
21
22 	time.Sleep(1 * time.Second)
23 	fmt.Println("")
24 }
However, the for loop starting in line 10 launches two parallel goroutines that fight over the booking. While one rejoices and prints its success message, the second goroutine may also test the variable seats, which is still set to 1, and proceed to book the seat as well. The result is an overbooked plane and angry passengers.
The output at the top of Figure 2 shows that Listing 3 does indeed allow repeated double bookings. The 100-millisecond sleep in line 12, which simulates the actual booking process, makes this more likely, because both goroutines tend to wake up and reach the check at almost the same moment. This is not what a customer, or an airline, wants.
The root of the problem is obvious: Two concurrent goroutines share the variable seats during the time that elapses between the check seats > 0 in line 13 and the reset seats = 0 in line 15. If the second goroutine performs its check while the first is still booking the seat, the second goroutine wrongly concludes that a seat is free, because seats is still set to 1. A double booking is the inevitable result.
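The fatal interleaving can be replayed deterministically, without any goroutines, by writing out the steps of both bookers in the order in which the scheduler happened to run them. The function replayRace below is a hypothetical illustration, not part of the original listings.

```go
package main

import "fmt"

// replayRace replays, step by step and without goroutines, the unlucky
// interleaving: Both bookers check seats before either one sets it to
// zero. It returns how many bookings succeeded for a single seat.
// (replayRace is a hypothetical helper for illustration.)
func replayRace() int {
	seats := 1
	booked := 0

	// Step 1: goroutine 0 checks and sees a free seat.
	ok0 := seats > 0
	// Step 2: goroutine 1 checks before goroutine 0 has written anything.
	ok1 := seats > 0

	// Step 3: both act on their now stale checks.
	if ok0 {
		booked++
		seats = 0
	}
	if ok1 {
		booked++
		seats = 0
	}
	return booked
}

func main() {
	fmt.Println("bookings for one seat:", replayRace()) // prints "bookings for one seat: 2"
}
```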
The problem can be solved either by performing the check and setting the variable in a single atomic statement or by declaring the program area containing both statements a critical section that locks out other goroutines as long as one goroutine is working in it.
Listing 4 shows one possible solution to the problem, using a buffered booking channel with a capacity of 1, as created by the make statement in line 9. Thanks to the buffer, the first goroutine can write a value into the channel in line 14 without blocking [2]. But if the next goroutine tries to send a value into the channel, it blocks until the buffered value has been read out again, which the first goroutine does at the end of the critical section in line 21.
Listing 4
airline-ok.go
01 package main
02 import (
03 	"fmt"
04 	"time"
05 )
06
07 func main() {
08 	seats := 1
09 	booking := make(chan bool, 1)
10
11 	for i := 0; i < 2; i++ {
12 		go func(id int) {
13 			time.Sleep(100 * time.Millisecond)
14 			booking <- true
15 			if seats > 0 {
16 				fmt.Printf("%d booked!\n", id)
17 				seats = 0
18 			} else {
19 				fmt.Printf("%d missed out.\n", id)
20 			}
21 			<-booking
22 		}(i)
23 	}
24
25 	time.Sleep(1 * time.Second)
26 	fmt.Println("")
27 }
With this safeguard in place, only one goroutine traverses the critical section at any given time, and it no longer matters how long it takes to check or set the seats variable, because no one can interfere in the meantime. Accordingly, the lower part of Figure 2 shows that only one goroutine makes the booking, while the other reports that no more seats are available, to the disappointment of the passenger who wanted to book. But that's how things have to be.
Reporting Speeders
During development, Go helps you detect race conditions if you compile the source code with the -race option, for example, using go build -race, go run -race, or go test -race. If two goroutines then race for a variable, the Go runtime detects this as it happens and prints a corresponding error message (Figure 3). However, the detector only catches races in code that actually runs, so the program has to enter the section that triggers the problem during the test run. This makes it important for the test suite to cover the code as completely as possible.