Performance gains with goroutines

Faster Through Parallelism

With this tool, the web client in Listing 5 now sends the requests to the different websites at the same time and saves time, instead of waiting for each request to return before moving on to the next. It also uses the httpsimple package shown in Listing 1 to retrieve the data from the web. The fetchall() function starting in line 32 launches a separate goroutine for each request; this means that four goroutines are retrieving and packaging the data while the main program collects the results, all at the same time!

Listing 5

http-parallel.go

01 package main
02
03 import(
04   "fmt"
05   "httpsimple"
06 )
07
08 type Result struct {
09   Error error
10   Body string
11   Url string
12 }
13
14 func main() {
15   urls := []string{
16     "https://google.com",
17     "https://facebook.com",
18     "https://yahoo.com",
19     "https://apple.com"}
20
21   results := fetchall(urls)
22
23   for i := 0; i<len(urls); i++ {
24     result := <-results
25     if result.Error == nil {
26       fmt.Printf("%s: %d bytes\n",
27         result.Url, len(result.Body))
28     }
29   }
30 }
31
32 func fetchall(
33   urls []string) (<-chan Result) {
34
35   results := make(chan Result)
36
37   for _, url := range urls {
38     go func(url string) {
39       body, err := httpsimple.Get(url)
40       results <- Result{
41         Error: err, Body: body, Url: url}
42     }(url)
43   }
44
45   return results
46 }

The channel through which the worker bees send their results to the main program is created in line 35, with the Result structure defined in line 8 as the type of the data fed into the channel. After the Get() function of the httpsimple package has returned the text of the retrieved web page and the result has been stored in the body variable, line 40 packs it, along with any error and the URL, into the data structure and then writes it into the channel, using the write operator <- to the right of the results channel variable.
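The following minimal sketch illustrates the same mechanism without any network access: a goroutine writes a struct into a typed channel, and the caller reads it back. The result type and the send() helper are hypothetical stand-ins for illustration and are not part of Listing 5 or the httpsimple package.

```go
package main

import (
	"errors"
	"fmt"
)

// result is a hypothetical stand-in for the Result struct in Listing 5.
type result struct {
	Err  error
	Body string
	URL  string
}

// send starts a goroutine that writes exactly one result into a new
// channel and returns the receive-only end of that channel.
func send(url, body string, err error) <-chan result {
	ch := make(chan result) // unbuffered: the write blocks until someone reads
	go func() {
		ch <- result{Err: err, Body: body, URL: url}
	}()
	return ch
}

func main() {
	// The read operator <- to the left of the channel blocks until data arrives.
	ok := <-send("https://example.com", "hello", nil)
	fmt.Printf("%s: %d bytes\n", ok.URL, len(ok.Body))

	// Errors travel through the channel as part of the struct.
	failed := <-send("https://example.com/missing", "", errors.New("not found"))
	if failed.Err != nil {
		fmt.Printf("%s: error: %v\n", failed.URL, failed.Err)
	}
}
```

Unlike Listing 5, this sketch also prints the error case, which can help when a request fails and would otherwise vanish without a trace.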

Beware Pitfalls!

When firing off goroutines in for loops, there is one typical newcomer mistake that you will want to avoid [4]. The go func(){}() call, which launches an anonymous function as a goroutine, creates a closure; that is, any variables defined locally in the surrounding code remain available inside the goroutine, even after the program has left the code block that defined them.

In Go versions before 1.22, however, all passes of the loop share a single url loop variable that changes its value in each pass. Because most likely none of the goroutines will start running before the loop ends, programmers find themselves faced with the strange phenomenon that every goroutine sees the same value for url, usually the last element of the slice. (In modules that declare Go 1.22 or later, each pass gets its own copy of the loop variable, so the problem no longer occurs there.) To make sure that each goroutine gets its own url value regardless of the Go version, Listing 5 adds a url parameter to the anonymous function in line 38, while line 42 passes the current value into the function as an argument.
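As a minimal sketch of this pattern, the following program hands the loop variable to each goroutine as an argument, so every goroutine works on its own copy. The collect() function and its test data are hypothetical and only serve as an illustration; the results are sorted because the goroutines may finish in any order.

```go
package main

import (
	"fmt"
	"sort"
)

// collect starts one goroutine per item and passes the loop variable
// as an argument, so each goroutine receives its own copy of the value.
func collect(items []string) []string {
	ch := make(chan string)

	for _, item := range items {
		go func(item string) { // parameter shadows the loop variable
			ch <- item
		}(item) // the current value is copied here
	}

	got := make([]string, 0, len(items))
	for range items { // read exactly len(items) values
		got = append(got, <-ch)
	}
	sort.Strings(got) // arrival order is not deterministic
	return got
}

func main() {
	fmt.Println(collect([]string{"a", "b", "c"})) // prints [a b c]
}
```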

Starting in line 23, the main program iterates over a fixed number of channel entries. Thankfully, the number is given by the length of the urls slice defined in line 15. Because every goroutine writes exactly one Result into the channel, whether or not the request succeeded, the channel read operator in line 24 can simply block until the next result is available; after len(urls) reads, all results have been collected.
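If the reader does not know in advance how many results to expect, counting no longer works. One common alternative, which is not what Listing 5 does, is to let a sync.WaitGroup track the workers and close the channel once all of them are done, so the reader can use a range loop. The sketch below assumes a hypothetical squares() function in place of fetchall() to stay independent of the network.

```go
package main

import (
	"fmt"
	"sync"
)

// squares is a hypothetical stand-in for fetchall(): it starts one
// goroutine per input value and closes the channel after all of them
// have delivered their result.
func squares(nums []int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup

	for _, n := range nums {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			out <- n * n
		}(n)
	}

	// A separate goroutine waits for the workers, then closes the
	// channel, which ends the range loop on the reading side.
	go func() {
		wg.Wait()
		close(out)
	}()

	return out
}

func main() {
	sum := 0
	for sq := range squares([]int{1, 2, 3, 4}) {
		sum += sq
	}
	fmt.Println("sum of squares:", sum) // prints sum of squares: 30
}
```

The counting approach in Listing 5 is shorter; the variant with close() pays off when the number of results depends on what happens inside the goroutines.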

Figure 4 shows that parallel data collection indeed saves a good deal of time; the program completes the process about three times as fast. It is undoubtedly more efficient to keep the computer busy with other tasks while waiting for web data than to sit around, twiddling its tiny thumbs.

Figure 4: If goroutines fire off requests simultaneously, the program is three times faster.
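The effect can be reproduced without a network connection. The following sketch replaces the web requests with a hypothetical slowFetch() function that simply sleeps for a given number of milliseconds, and then measures a sequential run against a parallel one. The function names and the delay are assumptions for illustration; the actual gain with real requests depends on the servers involved.

```go
package main

import (
	"fmt"
	"time"
)

// slowFetch simulates a web request that takes ms milliseconds.
func slowFetch(ms int) {
	time.Sleep(time.Duration(ms) * time.Millisecond)
}

// sequential runs n simulated requests one after the other and
// returns the elapsed time.
func sequential(n, ms int) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		slowFetch(ms)
	}
	return time.Since(start)
}

// parallel runs n simulated requests in separate goroutines, waits
// for all of them, and returns the elapsed time.
func parallel(n, ms int) time.Duration {
	start := time.Now()
	done := make(chan struct{})
	for i := 0; i < n; i++ {
		go func() {
			slowFetch(ms)
			done <- struct{}{}
		}()
	}
	for i := 0; i < n; i++ {
		<-done
	}
	return time.Since(start)
}

func main() {
	fmt.Println("sequential:", sequential(4, 50))
	fmt.Println("parallel:  ", parallel(4, 50))
}
```

With four simulated requests of equal length, the sequential run takes roughly four times the delay, whereas the parallel run takes roughly the delay of a single request.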

I highly recommend the book by Katherine Cox-Buday on the subject of concurrency with Go [5]. It meticulously walks the reader through good and bad design with Go channels, and it not only shows the common design patterns, but also looks behind the scenes and explains why a certain approach will produce faster and less error-prone programs.

The speed increase does not come as a free gift with parallelization. If you don't pay meticulous attention, you might end up scratching your head and wondering why you have race conditions, deadlocks, or other mysterious panic attacks of the program on production systems under load. Consequently, careful design is important.

Infos

  1. Listings for this article: ftp://ftp.linux-magazine.com/pub/listings/linux-magazine.com/219/
  2. "Don't use Go's default HTTP client (in production)" by Nathan Smith: https://medium.com/@nate510/don-t-use-go-s-default-http-client-4804cb19f779
  3. "Tower of Babylon" by Michael Schilli, Linux Magazine, issue 201, August, 2017, pp. 60-62: http://www.linux-magazine.com/Issues/2017/201/Programming-Snapshot-Multilingual-Programming/(language)/eng-US
  4. "Closure mistake with for loops": https://github.com/golang/go/wiki/CommonMistakes
  5. "Concurrency in Go" by Katherine Cox-Buday, O'Reilly, 2017

The Author

Mike Schilli works as a software engineer in the San Francisco Bay area, California. Each month in his column, which has been running since 1997, he researches practical applications of various programming languages. If you email him at mschilli@perlmeister.com, he will gladly answer any questions.
