Go retrieves GPS data from the komoot app
Scraping the Web
The web scraper runs as a compiled Go binary called from the command line. As a browser replacement, it uses the Go Colly package, which has been featured in our Snapshot programming series previously [4]. The functions in Listing 4 log in to the komoot account (kLogin(), line 23), retrieve a list of tours stored there (kTours(), line 48), and extract the GPS data from individual tours (kTour(), line 70).
Listing 4
kfetch.go
01 package main
02
03 import (
04   "fmt"
05   "github.com/gocolly/colly/v2"
06 )
07
08 var loginURL = "https://account.komoot.com/v1/signin"
09 var signinURL = "https://account.komoot.com/actions/transfer?type=signin"
10
11 type kColl struct {
12   c     *colly.Collector
13   creds map[string]string
14 }
15
16 func NewkColl() kColl {
17   return kColl{
18     c:     colly.NewCollector(),
19     creds: readCreds(),
20   }
21 }
22
23 func (kc kColl) kLogin() error {
24   c := kc.c.Clone() // clone keeps the cookies, drops old callbacks
25   c.OnRequest(func(req *colly.Request) {
26     fmt.Println("Visiting", req.URL)
27   })
28
29   payload := map[string]string{
30     "email":    kc.creds["email"],
31     "password": kc.creds["password"],
32     "reason":   "null",
33   }
34
35   err := c.Post(loginURL, payload)
36   if err != nil {
37     return err
38   }
39
40   err = c.Visit(signinURL)
41   if err != nil {
42     return err
43   }
44
45   return nil
46 }
47
48 func (kc kColl) kTours() ([]byte, error) {
49   c := kc.c.Clone()
50   toursURL := fmt.Sprintf(
51     "https://www.komoot.com/user/%s/tours",
52     kc.creds["client_id"])
53
54   jdata := []byte{}
55   var err error
56
57   c.OnRequest(func(req *colly.Request) {
58     fmt.Println("Visiting", req.URL)
59     req.Headers.Set("onlyprops", "true") // ask for JSON, not HTML
60   })
61
62   c.OnResponse(func(resp *colly.Response) {
63     jdata = resp.Body
64   })
65
66   err = c.Visit(toursURL) // report request errors to the caller
67   return jdata, err
68 }
69
70 func (kc kColl) kTour(tourID string) ([]byte, error) {
71   c := kc.c.Clone()
72   tourURL := fmt.Sprintf(
73     "https://www.komoot.com/tour/%s", tourID)
74
75   jdata := []byte{}
76   var err error
77
78   c.OnRequest(func(req *colly.Request) {
79     fmt.Println("Visiting", req.URL)
80     req.Headers.Set("onlyprops", "true")
81   })
82
83   c.OnResponse(
84     func(resp *colly.Response) {
85       jdata = resp.Body
86     })
87
88   err = c.Visit(tourURL) // report request errors to the caller
89   return jdata, err
90 }
Go does not offer classic object orientation, but with a data structure like kColl in line 11, a constructor like NewkColl() in line 16, and receivers to the left of the function names that turn the functions into methods, it has something very similar, to all intents and purposes. The functions share the data structure, which the caller initializes once at the beginning with the constructor. The constructor in the code at hand stores an instance of the Colly scraper and the creds hash table with the previously obtained user credentials.
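Stripped of everything scraper-related, the pattern of struct, constructor, and receiver looks roughly like the following sketch. The names scraper, newScraper(), and greeting() are made up for illustration and do not appear in Listing 4.

```go
package main

import "fmt"

// scraper is a hypothetical stand-in for kColl: it bundles the state
// that all methods share.
type scraper struct {
	creds map[string]string
}

// newScraper plays the role of a constructor; Go has no special
// syntax for this, it is just a function that returns the struct.
func newScraper(email string) scraper {
	return scraper{creds: map[string]string{"email": email}}
}

// greeting has a receiver (s scraper) to the left of its name, which
// makes it callable as a method on a scraper value.
func (s scraper) greeting() string {
	return "logging in as " + s.creds["email"]
}

func main() {
	s := newScraper("user@example.com")
	fmt.Println(s.greeting())
}
```

A value receiver like the one used here copies the struct on every call. That is harmless as long as the fields are maps or pointers, as in kColl, because the copies still refer to the same underlying data.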
The Colly open source scraper library jumps to the OnRequest() callbacks before it executes the HTTP request triggered by the Visit() or Post() functions. In Listing 4, the callbacks use Println() to show the user which URL is currently being processed and, in some cases, set special HTTP headers so that the komoot servers won't return HTML code but easier-to-analyze JSON data.
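The effect of such a header can be tried out without Colly and without touching komoot's servers. The following sketch uses only the standard library. The handler is a made-up stand-in that returns JSON when the onlyprops header is set and HTML otherwise, and how the real server decides is an assumption here.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// handler is a hypothetical server: it answers with JSON if the
// "onlyprops" header is set to "true" and with HTML otherwise.
func handler(w http.ResponseWriter, r *http.Request) {
	if r.Header.Get("onlyprops") == "true" {
		fmt.Fprint(w, `{"tours":[]}`)
		return
	}
	fmt.Fprint(w, "<html><body>Tours</body></html>")
}

// fetch builds a request, optionally sets the header before the
// request is handled (much like an OnRequest() callback would), and
// returns the response body.
func fetch(onlyProps bool) string {
	req := httptest.NewRequest(http.MethodGet, "/tours", nil)
	if onlyProps {
		req.Header.Set("onlyprops", "true")
	}
	rec := httptest.NewRecorder()
	handler(rec, req)
	return rec.Body.String()
}

func main() {
	fmt.Println(fetch(false)) // HTML variant
	fmt.Println(fetch(true))  // JSON variant
}
```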
Tasty Cookies
All three functions share a scraper instance that preserves the cookies set at the beginning of the komoot session. The session starts at login, because the server does not hand out the tour data to simply every Tom, Dick, and Harry. One thing to look out for, though: The Colly scraper does not replace existing OnRequest() callbacks when you set them again later; it stacks them up. In the code at hand, the third function would therefore output the URL being accessed not once but three times. This is remedied by clones created with Clone(), which keep the cookies but reset the callbacks. Figure 5 shows how the program compiles with the listings that will be explained in the remaining sections of this article. It also illustrates the program's typical output as it finds tours on the server but only downloads them if they are not already available locally.
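Why the callbacks pile up, and why cloning helps, can be seen in a deliberately simplified model. The toyCollector type below is not Colly's implementation. It only imitates the two behaviors described above: callbacks that accumulate and a cookie store that survives cloning.

```go
package main

import "fmt"

// toyCollector is a simplified model, not Colly's real code: it only
// mimics stacking callbacks and a cookie store shared by clones.
type toyCollector struct {
	cookies   map[string]string // shared between clones
	callbacks []func(url string)
}

// OnRequest appends a callback instead of replacing earlier ones.
func (t *toyCollector) OnRequest(f func(url string)) {
	t.callbacks = append(t.callbacks, f)
}

// Clone keeps the cookie store but starts with no callbacks.
func (t *toyCollector) Clone() *toyCollector {
	return &toyCollector{cookies: t.cookies}
}

// Visit runs every registered callback and returns how many ran.
func (t *toyCollector) Visit(url string) int {
	for _, f := range t.callbacks {
		f(url)
	}
	return len(t.callbacks)
}

func main() {
	logURL := func(url string) { fmt.Println("Visiting", url) }

	shared := &toyCollector{cookies: map[string]string{}}
	for i := 0; i < 3; i++ {
		shared.OnRequest(logURL) // one registration per function
	}
	shared.Visit("https://example.com/tour") // prints three times

	shared.cookies["session"] = "abc"
	clone := shared.Clone()
	clone.OnRequest(logURL)
	clone.Visit("https://example.com/tour") // prints once
	fmt.Println("clone sees cookie:", clone.cookies["session"])
}
```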
Cats and Dogs
Komoot's web server delivers both the tour list and the details of individual tours in JSON format because of the headers set in Listing 4. JSON and Go are as compatible as cats and dogs, however, because JSON offers dynamic types with few type checks, while Go insists on precisely defined data structures. To convert deeply nested JSON text into internal Go data structures, programmers need to really coax the language. If you wanted to import JSON into a scripting language such as Python and convert it to GPX later on, you could do so effortlessly with just a dozen lines of code. Go, on the other hand, as you can see from Listing 5 and Listing 6, demands considerably more effort.
Listing 5
tours.go
01 package main
02
03 import (
04   "encoding/json"
05 )
06
07 func tourIDs(jdata []byte) []string {
08   var data map[string]interface{}
09
10   err := json.Unmarshal(jdata, &data)
11   if err != nil {
12     panic(err)
13   }
14
15   data = drill(data,
16     []string{"kmtx", "session",
17       "_embedded", "profile",
18       "_embedded", "tours",
19       "_embedded"})
20
21   items :=
22     data["items"].([]interface{})
23
24   ids := []string{}
25
26   for _, item := range items {
27     table :=
28       item.(map[string]interface{})
29     id := table["id"].(string)
30     ids = append(ids, id)
31   }
32
33   return ids
34 }
35
36 func drill(part map[string]interface{}, keys []string) map[string]interface{} {
37   for _, key := range keys {
38     part = part[key].(map[string]interface{})
39   }
40
41   return part
42 }
Listing 6
gpx.go
01 package main
02
03 import (
04   "encoding/json"
05   "fmt"
06   "time"
07 )
08
09 func toGpx(jdata []byte) []byte {
10   var data map[string]interface{}
11
12   json.Unmarshal(jdata, &data)
13   tour := drill(data, []string{
14     "page", "_embedded", "tour"})
15   start := tour["date"].(string)
16
17   coord := drill(tour, []string{
18     "_embedded", "coordinates"})
19   items :=
20     coord["items"].([]interface{})
21   ts, err := time.Parse(time.RFC3339, start)
22   if err != nil {
23     panic(err)
24   }
25
26   xml := "<gpx><trk>"
27   for _, item := range items {
28     pt := item.(map[string]interface{})
29     secs := pt["t"].(float64) / 1000.0
30     t := ts.Add(time.Duration(secs) * time.Second)
31     xml += fmt.Sprintf(`<trkseg>
32   <trkpt lat="%f" lon="%f">
33     <ele>%.1f</ele>
34     <time>%s</time>
35   </trkpt></trkseg>`, pt["lat"],
36       pt["lng"], pt["alt"],
37       t.Format(time.RFC3339))
38   }
39   xml += "</trk></gpx>\n"
40   return []byte(xml)
41 }
Since the komoot data is nested a whopping nine levels deep, the officially prescribed approach for the conversion would be a bit of a pain. It would mean defining the complete data structure with all its levels using struct declarations in Go. If you are wary of this much typing, you can simply define a one-dimensional map with an empty interface{} as a placeholder instead, like in line 8 of Listing 5, and make a type assertion to a hashmap each time you descend into the depths of the sub-hashmaps (line 38). At runtime, Go then checks that the value really is a map, lets you dig deeper if it is, and panics if it is not.