Go retrieves GPS data from the komoot app
Scraping the Web
The web scraper is running as a compiled Go binary called from the command line. As a browser replacement, it uses the Go Colly package, which has been featured in our Snapshot programming series previously [4]. The functions in Listing 4 log in to the komoot account (kLogin(), line 23), retrieve a list of tours stored there (kTours(), line 48), and extract the GPS data from individual tours (kTour(), line 70).
Listing 4
kfetch.go
01 package main
02
03 import (
04   "fmt"
05   "github.com/gocolly/colly/v2"
06 )
07
08 var loginURL = "https://account.komoot.com/v1/signin"
09 var signinURL = "https://account.komoot.com/actions/transfer?type=signin"
10
11 type kColl struct {
12   c     *colly.Collector
13   creds map[string]string
14 }
15
16 func NewkColl() kColl {
17   return kColl{
18     c:     colly.NewCollector(),
19     creds: readCreds(),
20   }
21 }
22
23 func (kc kColl) kLogin() error {
24   c := kc.c.Clone()
25   c.OnRequest(func(req *colly.Request) {
26     fmt.Println("Visiting", req.URL)
27   })
28
29   payload := map[string]string{
30     "email":    kc.creds["email"],
31     "password": kc.creds["password"],
32     "reason":   "null",
33   }
34
35   err := c.Post(loginURL, payload)
36   if err != nil {
37     return err
38   }
39
40   err = c.Visit(signinURL)
41   if err != nil {
42     return err
43   }
44
45   return nil
46 }
47
48 func (kc kColl) kTours() ([]byte, error) {
49   c := kc.c.Clone()
50   toursURL := fmt.Sprintf(
51     "https://www.komoot.com/user/%s/tours",
52     kc.creds["client_id"])
53
54   jdata := []byte{}
55   var err error
56
57   c.OnRequest(func(req *colly.Request) {
58     fmt.Println("Visiting", req.URL)
59     req.Headers.Set("onlyprops", "true")
60   })
61
62   c.OnResponse(func(resp *colly.Response) {
63     jdata = resp.Body
64   })
65
66   err = c.Visit(toursURL) // pass request errors on to the caller
67   return jdata, err
68 }
69
70 func (kc kColl) kTour(tourID string) ([]byte, error) {
71   c := kc.c.Clone()
72   tourURL := fmt.Sprintf(
73     "https://www.komoot.com/tour/%s", tourID)
74
75   jdata := []byte{}
76   var err error
77
78   c.OnRequest(func(req *colly.Request) {
79     fmt.Println("Visiting", req.URL)
80     req.Headers.Set("onlyprops", "true")
81   })
82
83   c.OnResponse(func(resp *colly.Response) {
84     jdata = resp.Body
85   })
86
87   err = c.Visit(tourURL) // pass request errors on to the caller
88   return jdata, err
89 }
Go does not offer classic object orientation, but with a data structure like kColl in line 11, a constructor like NewkColl() in line 16, and receivers on the left side of the function names used as methods, it has something very similar, to all intents and purposes. The functions share the data structure, which the caller initializes once at the beginning with the constructor. The constructor in the code at hand stores an instance of the Colly scraper and the creds hash table with the previously obtained user credentials.
The Colly open source scraper library jumps to the OnRequest() callbacks before it executes the HTTP request triggered by the Visit() or Post() functions. In Listing 4, these callbacks use fmt.Println() to show the user which URL is currently being processed and, in kTours() and kTour(), also set a special HTTP header so that the komoot servers won't return HTML code but easier-to-analyze JSON data.
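The idea of a hook that runs just before each request can also be sketched with the standard library alone. The example below does not use Colly. The names hookTransport, handlerTransport, fakeServer, and fetch are hypothetical, and the server behavior (JSON only when the onlyprops header is present) is an assumption modeled on the description above, not a statement about komoot's actual servers.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// hookTransport (hypothetical) runs a hook on each outgoing request
// before passing it on, similar in spirit to an OnRequest() callback.
type hookTransport struct {
	hook func(*http.Request)
	next http.RoundTripper
}

func (h hookTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	req = req.Clone(req.Context()) // leave the caller's request untouched
	h.hook(req)
	return h.next.RoundTrip(req)
}

// handlerTransport answers requests in memory by calling an
// http.Handler, so the sketch runs without a network connection.
type handlerTransport struct{ handler http.Handler }

func (t handlerTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	rec := httptest.NewRecorder()
	t.handler.ServeHTTP(rec, req)
	return rec.Result(), nil
}

// fakeServer is a made-up stand-in for a server that returns JSON only
// when the "onlyprops" header is set, and HTML otherwise.
var fakeServer = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
	if r.Header.Get("onlyprops") == "true" {
		fmt.Fprint(w, `{"tours":[]}`)
		return
	}
	fmt.Fprint(w, "<html></html>")
})

// fetch requests url through the hook and returns the response body.
// With onlyProps set, the hook adds the header before the request goes out.
func fetch(url string, onlyProps bool) (string, error) {
	hook := func(req *http.Request) {
		fmt.Println("Visiting", req.URL)
		if onlyProps {
			req.Header.Set("onlyprops", "true")
		}
	}
	client := &http.Client{Transport: hookTransport{
		hook: hook,
		next: handlerTransport{fakeServer},
	}}
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := fetch("http://example.invalid/tours", true)
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Calling fetch() with false as the second argument leaves the header out, and the fake server falls back to HTML. This mirrors the difference between the kLogin() callback and the kTours() and kTour() callbacks in Listing 4.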
Tasty Cookies
All three functions share a scraper instance that preserves the cookies set at the beginning of the komoot session. The session starts at login, because the server will not hand out the tour data to just any Tom, Dick, and Harry. One thing to look out for, though: The Colly scraper does not replace existing OnRequest() callbacks when you register new ones later but stacks them up. Without countermeasures, the third function in the code at hand would output the URL being accessed not once but three times. This is remedied by clones created with Clone(), which keep the cookies but reset the callbacks. Figure 5 shows how the program compiles together with the listings that will be explained in the remaining sections of this article. It also illustrates the program's typical output as it finds tours on the server but only downloads them if they are not already available locally.
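The stacking behavior and the effect of cloning can be illustrated with a toy model. The collector type below is a hypothetical simplification written for this purpose; it is not Colly's implementation and only mimics the two properties described above: shared session state and a per-instance callback list.

```go
package main

import "fmt"

// collector is a toy model (not Colly's implementation) of a scraper
// that keeps session state and a list of request callbacks.
type collector struct {
	cookies   map[string]string // shared between clones
	onRequest []func(url string)
}

// OnRequest appends a callback; earlier callbacks stay registered.
func (c *collector) OnRequest(f func(url string)) {
	c.onRequest = append(c.onRequest, f)
}

// Clone returns a collector with the same cookie store but no callbacks.
func (c *collector) Clone() *collector {
	return &collector{cookies: c.cookies}
}

// Visit runs every registered callback in order.
func (c *collector) Visit(url string) {
	for _, f := range c.onRequest {
		f(url)
	}
}

// stackedCalls registers a callback three times on the same collector
// and reports how often it fires during a single Visit.
func stackedCalls() int {
	c := &collector{cookies: map[string]string{}}
	calls := 0
	for i := 0; i < 3; i++ {
		c.OnRequest(func(string) { calls++ })
	}
	c.Visit("https://example.com/tour/1")
	return calls
}

// clonedCalls registers each callback on a fresh clone instead and
// visits with the last clone only.
func clonedCalls() int {
	base := &collector{cookies: map[string]string{"session": "abc"}}
	calls := 0
	var last *collector
	for i := 0; i < 3; i++ {
		last = base.Clone()
		last.OnRequest(func(string) { calls++ })
	}
	last.Visit("https://example.com/tour/1")
	return calls
}

func main() {
	fmt.Println(stackedCalls(), clonedCalls()) // prints "3 1"
}
```

In the toy model, each clone starts with an empty callback list while still pointing at the same cookie map, which is the combination the three functions in Listing 4 rely on.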
Cats and Dogs
Komoot's web server delivers both the tour list and the details of individual tours in JSON format because of the headers set in Listing 4. JSON and Go get along like cats and dogs, however, because JSON offers dynamic types with few type checks, while Go insists on precisely defined data structures. To convert deeply nested JSON text into internal Go data structures, programmers need to really coax the language. If you wanted to import JSON into a scripting language such as Python and convert it to GPX later on, you could do so effortlessly with just a dozen lines of code. Go, on the other hand, as you can see from Listing 5 and Listing 6, demands considerably more effort.
Listing 5
tours.go
01 package main
02
03 import (
04   "encoding/json"
05 )
06
07 func tourIDs(jdata []byte) []string {
08   var data map[string]interface{}
09
10   err := json.Unmarshal(jdata, &data)
11   if err != nil {
12     panic(err)
13   }
14
15   data = drill(data,
16     []string{"kmtx", "session",
17       "_embedded", "profile",
18       "_embedded", "tours",
19       "_embedded"})
20
21   items :=
22     data["items"].([]interface{})
23
24   ids := []string{}
25
26   for _, item := range items {
27     table :=
28       item.(map[string]interface{})
29     id := table["id"].(string)
30     ids = append(ids, id)
31   }
32
33   return ids
34 }
35
36 func drill(part map[string]interface{}, keys []string) map[string]interface{} {
37   for _, key := range keys {
38     part = part[key].(map[string]interface{})
39   }
40
41   return part
42 }
Listing 6
gpx.go
01 package main
02
03 import (
04   "encoding/json"
05   "fmt"
06   "time"
07 )
08
09 func toGpx(jdata []byte) []byte {
10   var data map[string]interface{}
11
12   json.Unmarshal([]byte(jdata), &data)
13   tour := drill(data, []string{
14     "page", "_embedded", "tour"})
15   start := tour["date"].(string)
16
17   coord := drill(tour, []string{
18     "_embedded", "coordinates"})
19   items :=
20     coord["items"].([]interface{})
21   ts, err := time.Parse(time.RFC3339, start)
22   if err != nil {
23     panic(err)
24   }
25
26   xml := "<gpx><trk>"
27   for _, item := range items {
28     pt := item.(map[string]interface{})
29     secs := pt["t"].(float64) / 1000.0
30     t := ts.Add(time.Duration(secs) * time.Second)
31     xml += fmt.Sprintf(`<trkseg>
32 <trkpt lat="%f" lon="%f">
33   <ele>%.1f</ele>
34   <time>%s</time>
35 </trkpt></trkseg>`, pt["lat"],
36       pt["lng"], pt["alt"],
37       t.Format(time.RFC3339))
38   }
39   xml += "</trk></gpx>\n"
40   return []byte(xml)
41 }
Since the komoot data is nested a whopping nine levels deep, the officially prescribed approach for the conversion would be a bit of a pain. It would mean defining the complete data structure with all its levels using struct declarations in Go. If you are wary of this much typing, you can simply define a one-dimensional map with an empty interface{} as a placeholder for the values instead, like in line 8 of Listing 5, and make a type assertion to a hashmap each time you descend into the depths of the sub-hashmaps (line 38). At run time, Go then checks whether the value really is a map and lets you dig deeper; if it is not, the single-value form of the assertion used here panics.
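For comparison, here is a minimal sketch of both routes. The sample document is made up and heavily shortened; it merely follows the key path used in Listing 6. The names tourPage, point, decodeTyped, decodeLoose, and safeDrill are hypothetical and do not appear in the article's code. The struct variant declares only the fields of interest, and safeDrill is a variation of drill() that uses the two-value form of the type assertion to return an error instead of panicking.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sample is a made-up, shortened document that follows the key path
// from Listing 6; real responses contain far more fields.
const sample = `{"page":{"_embedded":{"tour":{
  "date":"2021-05-01T10:00:00Z",
  "_embedded":{"coordinates":{"items":[
    {"lat":48.1,"lng":11.5,"alt":520.0,"t":0},
    {"lat":48.2,"lng":11.6,"alt":530.5,"t":60000}]}}}}}}`

// point and tourPage spell out only the fields of interest;
// encoding/json ignores everything else in the input.
type point struct {
	Lat float64 `json:"lat"`
	Lng float64 `json:"lng"`
	Alt float64 `json:"alt"`
	T   float64 `json:"t"`
}

type tourPage struct {
	Page struct {
		Embedded struct {
			Tour struct {
				Date     string `json:"date"`
				Embedded struct {
					Coordinates struct {
						Items []point `json:"items"`
					} `json:"coordinates"`
				} `json:"_embedded"`
			} `json:"tour"`
		} `json:"_embedded"`
	} `json:"page"`
}

// decodeTyped unmarshals straight into the struct definition.
func decodeTyped(jdata []byte) ([]point, error) {
	var tp tourPage
	if err := json.Unmarshal(jdata, &tp); err != nil {
		return nil, err
	}
	return tp.Page.Embedded.Tour.Embedded.Coordinates.Items, nil
}

// decodeLoose unmarshals into a generic map, as in Listing 5.
func decodeLoose(jdata []byte) (map[string]interface{}, error) {
	var data map[string]interface{}
	err := json.Unmarshal(jdata, &data)
	return data, err
}

// safeDrill descends like drill() but uses the comma-ok form of the
// type assertion, so a missing key yields an error instead of a panic.
func safeDrill(part map[string]interface{}, keys ...string) (map[string]interface{}, error) {
	for _, key := range keys {
		next, ok := part[key].(map[string]interface{})
		if !ok {
			return nil, fmt.Errorf("key %q is missing or not an object", key)
		}
		part = next
	}
	return part, nil
}

func main() {
	pts, err := decodeTyped([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Println("points:", len(pts))

	data, err := decodeLoose([]byte(sample))
	if err != nil {
		panic(err)
	}
	if _, err := safeDrill(data, "page", "_embedded", "nope"); err != nil {
		fmt.Println("error:", err)
	}
}
```

The struct route costs more typing up front but gives typed fields and compile-time checks on field names, while the map route stays short and defers every check to run time.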