Automate your web logins

Log In and Go

© Lead Image © thotti, photocase.com

Automated web logins with command-line tools and Selenium ensure you don't miss scheduling an activity.

During the COVID-19 lockdown, many facilities, such as pools, gyms, and golf courses, required people to sign up on a website before they could visit. These precautions helped maintain a safe environment; however, the booking process was awkward, and it was easy to miss out on an activity if you didn't sign up early enough.

Luckily, some great Linux tools can automate web logins. In this article, I share two techniques to create automated web logins. The first technique uses command-line tools like xte and xdotool. This approach allows simple Bash scripts to replicate how you would use keystrokes to access web pages.

The second technique uses the Selenium Python library. The Selenium API allows you to tackle more complex projects by giving you access to the full Document Object Model (DOM) of a web page.

Keyboard Simulation

The most popular choices for keyboard and mouse simulation are the xautomation package [1] and the xdotool utility [2]. The xdotool utility is feature-rich, with dedicated commands for desktop and window management. The xte tool, a part of xautomation, is a little simpler, focusing entirely on keyboard and mouse simulation.

The wmctrl [3] utility is also very useful to help you determine which windows are open on your desktop, and it can also set the active window with a text substring.

In Ubuntu, enter

sudo apt-get install xautomation xdotool wmctrl

to install the xautomation package and the xdotool and wmctrl utilities.

Log In with xte

With the xte utility, you can send a single keyboard character or strings of characters. A Bash script that uses xte commands can emulate your actions to log in manually to a web page.

Typically people use the mouse on web pages, which is quite different from logging in 100 percent with the keyboard. Web pages often have a number of clickable items before the main form entry area, so it is important to step through and document the login procedure manually. A good simple example is to try and log in to Netflix (Figure 1).

Figure 1: Netflix sign in with the xte command.

The Bash script in Listing 1 uses xte to automate the Netflix sign-in. This script opens a Chromium browser page (line 10) and then sets the focus to this page (line 12). Next, it sends the correct tab, text, and return key sequences (lines 15-22).

Listing 1

Netflix Sign-In

01 #!/bin/bash
02 # netflix_login.sh - script logs into Netflix
03 #
04
05 url="https://www.netflix.com/ca/login"
06 email="my_email.com"
07 pwd="my_password"
08
09 # open the browser, wait for the page to load, then set focus to it
10 chromium-browser "$url" &
11 sleep 2
12 wmctrl -a "Netflix - Chromium"
13
14 sleep 1 # allow time to get focus before sending keys
15 xte "key Tab"
16 xte "key Tab"
17 xte "str $email"
18 xte "key Tab"
19 xte "str $pwd"
20 xte "key Tab"
21 xte "key Tab"
22 xte "key Return"
23
24 echo "Netflix Login Done..."

Setting the window focus can be tricky if you have a number of windows open. The command wmctrl -l lists all open windows, and the command

wmctrl -a '<some title info>'

sets the mouse and keyboard focus to a specific window from a substring of the window title.
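
A fixed sleep before wmctrl -a can be too short when the browser is slow to start. As a minimal sketch, the following Python helper polls wmctrl -l until a matching window title appears and only then activates it. The function names, timeout, and polling interval are illustrative assumptions; the parsing assumes the usual four-column wmctrl -l output (window ID, desktop number, host name, title).

```python
import subprocess
import time


def find_window_id(listing, title_part):
    """Return the ID of the first window whose title contains title_part."""
    for line in listing.splitlines():
        # Assumed wmctrl -l columns: window ID, desktop, host name, title
        fields = line.split(None, 3)
        if len(fields) == 4 and title_part in fields[3]:
            return fields[0]
    return None


def focus_when_ready(title_part, timeout=15.0, interval=0.5):
    """Poll `wmctrl -l` until a matching window exists, then activate it."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = subprocess.run(
            ["wmctrl", "-l"], capture_output=True, text=True)
        window_id = find_window_id(result.stdout, title_part)
        if window_id is not None:
            # -i makes wmctrl treat the -a argument as a window ID
            subprocess.run(["wmctrl", "-i", "-a", window_id], check=True)
            return True
        time.sleep(interval)
    return False
```

Calling focus_when_ready("Netflix - Chromium") returns True once the window has been found and activated, or False if nothing matched within the timeout.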

Book with xdotool

The xdotool syntax also sends keystrokes and text and is very similar to xte, but with a few extra features. A park booking example (Figure 2) is a bit more complex, because a booking time needs to be selected from a list. For this project, the automation script needs to manage eight entry fields (to keep things simple, I'll pass the date in the URL) and select a time slot.

Figure 2: Automating a park booking with a time search.

Neither the xte nor xdotool utility supports a search text function. A simple workaround is to use the web browser's search function. By enabling caret (text cursor) navigation, it's possible to move the active cursor location according to the browser's search results.

The caret dialog is shown by pressing F7 (Figure 3). It's important to note that the caret enable/cancel and Yes/No buttons can vary between browsers.

Figure 3: Improve keyboard navigation with caret browsing.

The Bash script in Listing 2 uses the browser's search dialog to find and select a 10:00am time slot for a park. One of the first steps is to enable caret navigation (lines 12-13).

Listing 2

Book a Park Visit

01 #!/bin/bash
02 # book10am.sh - make a 10:00 park booking
03 #
04 sdate="startDate=2021-04-23" #adjust the date
05 url="https://book.parkpassproject.com/book?inventoryGroup=1554186518&&inventory=1229284276&$sdate"
06
07 chromium-browser "$url" & # open browser to park booking page
08 sleep 5 # wait for browser to come up
09 wmctrl -a "Chromium"
10 sleep 2
11 # Turn on caret browsing
12 xdotool key F7
13 xdotool key Return
14 sleep 1
15
16 # tab to 'Time Slot' area
17 tabcnt=8
18 xdotool key --repeat $tabcnt --delay 100 Tab
19
20 xdotool key Return
21 sleep 1
22
23 # Search for 10:00 time and select it
24 xdotool key ctrl+f
25 xdotool type '10:00'
26 xdotool key Return
27 # Close find dialog and select time
28 xdotool key Tab Tab Tab Return Return
29
30 echo "Park Time Booking Complete"

A useful feature of xdotool is the repeat with a delay option (lines 17-18). In this script, I used this feature to tab eight times to get to the Time Slot field. A Ctrl+F keystroke opens the browser search dialog (line 24). Next, the xdotool type option passes in the '10:00' time string (line 25). The final step is to close the search dialog and hit Return to select the 10:00 AM – 12:00 PM time slot (line 28).
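
The keystroke sequence from Listing 2 can also be driven from Python, which makes the tab count and the search string easy to change. The sketch below is one possible arrangement, not a definitive implementation: the helper names, the dry_run flag, and the pause between steps are assumptions, and the only xdotool features used are the key, type, --repeat, and --delay forms already shown in Listing 2.

```python
import subprocess
import time


def xdotool_key(*keys, repeat=1, delay_ms=None):
    """Build an `xdotool key` command as an argument list."""
    cmd = ["xdotool", "key"]
    if repeat > 1:
        cmd += ["--repeat", str(repeat)]
    if delay_ms is not None:
        cmd += ["--delay", str(delay_ms)]
    return cmd + list(keys)


def xdotool_type(text):
    """Build an `xdotool type` command as an argument list."""
    return ["xdotool", "type", text]


def select_time_slot(start_text, tab_count=8, dry_run=True, pause=1.0):
    """Replay the Listing 2 steps; with dry_run=True only print them."""
    steps = [
        xdotool_key("Tab", repeat=tab_count, delay_ms=100),  # reach Time Slot
        xdotool_key("Return"),                               # open the list
        xdotool_key("ctrl+f"),                               # browser search
        xdotool_type(start_text),                            # e.g. '10:00'
        xdotool_key("Return"),
        xdotool_key("Tab", "Tab", "Tab", "Return", "Return"),
    ]
    for cmd in steps:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
            time.sleep(pause)  # give the page time to react
    return steps
```

Running select_time_slot("10:00") in the default dry-run mode prints the commands without sending any keystrokes, which is a convenient way to check the sequence against the manual steps before letting it loose on the browser.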

Script Limitations

The xdotool and xte utilities are great for simple web page automation when the HTML form items are sequential and no special decision making is required. Unfortunately, I found that when I tried to book a park time on the weekend, I started to see some limitations (Figure 4). During busy times, if I tried to book by time, xte and xdotool could not determine whether the time slot was taken. A simple workaround would be to search for the first Available or Not Busy slot, but this doesn't allow you to pick times you like.

Figure 4: Limitations with keystroke automation.

For projects that require some logic (like choosing a good time from a list of times), Selenium with Python is an excellent fit.

Selenium with Python

Selenium [4] is a portable framework for testing web applications, with client-server tools and an IDE. The Selenium WebDriver component (available for Firefox, Google Chrome, Internet Explorer, Safari, Opera, and Edge) sends commands from client APIs directly to a browser. Client APIs are available for C#, Go, Java, JavaScript, PHP, Python, and Ruby. The Selenium Downloads page [5] has details on installation of the WebDriver scripts.

To install the Linux 32-bit Selenium driver (geckodriver) for Firefox, enter:

wget https://github.com/mozilla/geckodriver/releases/download/v0.29.1/geckodriver-v0.29.1-linux32.tar.gz
tar -xvzf geckodriver-v0.29.1-linux32.tar.gz
chmod +x geckodriver
sudo mv geckodriver /usr/local/bin

To install the Selenium library for Python, enter:

pip install selenium
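
Before writing any automation, it can help to confirm that both pieces are in place. The following sketch checks that a geckodriver executable is on the PATH and that the selenium package can be imported; the function name and the report format are illustrative assumptions.

```python
import importlib.util
import shutil


def check_selenium_setup():
    """Report where geckodriver lives and whether selenium is importable."""
    return {
        # Full path of the executable, or None if it is not on the PATH
        "geckodriver": shutil.which("geckodriver"),
        # True if Python can locate the selenium package
        "selenium": importlib.util.find_spec("selenium") is not None,
    }


if __name__ == "__main__":
    status = check_selenium_setup()
    print("geckodriver:", status["geckodriver"] or "not found on PATH")
    print("selenium:", "installed" if status["selenium"] else "missing")
```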

The big difference between the xte or xdotool utility and Selenium is that Selenium can access the HTML code of the selected web page directly.

Log In with Selenium and Python

As with xte and xdotool, you need to do some background manual work before writing the script. Once the required web page is open, you can use the Web Developer Inspector tool to examine HTML code. To access the Inspector, select Tools | Web Developer | Inspector from the top menubar or use the shortcut Ctrl+Shift+C.

For the Netflix sign-in example, the Email or phone number and Password inputs are needed (Figure 5). When the Inspector is open, items selected on the web page are highlighted in the Inspector pane. In this example, the Email or phone number entry uses id="id_userLoginId", and the password entry uses id="id_password". Listing 3 shows the Python code that signs in to Netflix.

Listing 3

Netflix Sign-In with Selenium

01 #
02 # netflix_login.py - automate Netflix Login
03 #
04 from selenium import webdriver
05
06 url="https://www.netflix.com/ca/login"
07 email="my_email.com"
08 pwd="my_password"
09
10 browser = webdriver.Firefox()
11
12 browser.get(url)
13
14 # wait for page to refresh
15 browser.implicitly_wait(10)
16
17 username = browser.find_element_by_id('id_userLoginId')
18 username.send_keys(email)
19
20 password = browser.find_element_by_id('id_password')
21 password.send_keys(pwd)
22
23 password.submit()
24
25 print("Login Complete")

Figure 5: Finding input field IDs with Inspector.

When a web page is called, it's important to give the page some time to load. The implicitly_wait(10) call (line 15) tells Selenium to keep retrying element lookups for up to 10 seconds before giving up.

HTML items can be found by either ID (find_element_by_id()) or by name (find_element_by_name()). A Selenium object needs to be created before initiating any action on it. Line 17 finds and then creates a username object from ID 'id_userLoginId'. The send_keys() method is used to pass text strings to <input> tags (lines 18 and 21). Calling the submit() method on any input object will send all the form data as a request to the web server (line 23).
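
Note that the find_element_by_* helpers used in Listing 3 were deprecated in Selenium 4 and, to my knowledge, removed in release 4.3, so on a current installation the same lookups are written as find_element(By.ID, ...). The sketch below shows that form together with an explicit wait on the email field. It assumes the element IDs from Listing 3 are still valid; the function names and the NETFLIX_EMAIL and NETFLIX_PASSWORD environment variables are hypothetical and only serve to keep the password out of the script.

```python
import os


def load_credentials(env=None):
    """Read the login from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    email = env.get("NETFLIX_EMAIL")
    password = env.get("NETFLIX_PASSWORD")
    if not email or not password:
        raise RuntimeError("Set NETFLIX_EMAIL and NETFLIX_PASSWORD first")
    return email, password


def sign_in(browser, url, email, password, timeout=10):
    """Fill in and submit the sign-in form with Selenium 4 locators."""
    # Imported here so the module also loads where Selenium is missing
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser.get(url)
    # Explicit wait: block until the email field exists, up to `timeout` s
    username = WebDriverWait(browser, timeout).until(
        EC.presence_of_element_located((By.ID, "id_userLoginId")))
    username.send_keys(email)

    password_field = browser.find_element(By.ID, "id_password")
    password_field.send_keys(password)
    password_field.submit()


def main():
    """Open Firefox and sign in with credentials from the environment."""
    from selenium import webdriver

    email, password = load_credentials()
    browser = webdriver.Firefox()
    sign_in(browser, "https://www.netflix.com/ca/login", email, password)
    print("Login Complete")
```

Calling main() after exporting the two variables performs the same sign-in as Listing 3.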

Selenium Searches

From the earlier park booking example, you saw that xte and xdotool had some limitations when a variable list of options was presented. Luckily, Selenium has a number of functions that can be used for searching HTML tags and text. As in the last example, the first step is to open the web page and inspect the structure manually (Figure 6).

Figure 6: Getting list details.

For this example, the Inspector shows that each status entry in the list has a <div class="jss97"> that could have a Past, Available, Not Busy, or Full status. The top-level <div class="jss94"> has both the times and the status messages. Knowing the top-level div class now makes it possible to search for the park's time slots and get the status of each of the times.

Figure 7 shows an example that searches for the first Not Busy time slot. As in the earlier xdotool example, the time slot list needs to be clicked to open. In Python code, this is done by finding and then clicking on the timefield object.

Figure 7: Finding the first Not Busy time.

The key piece in this code is:

thetimes = browser.find_elements_by_class_name("jss94")

This operation will build an array (thetimes) of all the time slots with their status messages.

Next, a for loop can examine each time slot. In this example, the code looks for the first time a time slot is Not Busy:

# Get the top level div
thetimes = browser.find_elements_by_class_name("jss94")
for itimes in thetimes:
    # Find the first "Not Busy"
    if "Not Busy" in itimes.text:
        print(itimes.text)
        itimes.click()
        break

Logic could be written for different conditions, like looking for time slots between 9 and 11am, and if none are found, then looking for time slots between 2 and 4pm.
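
That kind of preference logic might look like the following sketch. It works on plain strings, so it can be tried without a browser. It assumes each slot's text starts with a time such as "10:00 AM" and contains one of the status words from the page; the function names, the default time windows, and the choice to treat both Available and Not Busy as bookable are illustrative assumptions.

```python
import re

# Matches the first clock time in a slot's text, e.g. "10:00 AM"
TIME_RE = re.compile(r"(\d{1,2}):(\d{2})\s*([AP]M)", re.IGNORECASE)


def start_hour(slot_text):
    """Return the slot's starting hour (0-23), or None if no time is found."""
    match = TIME_RE.search(slot_text)
    if match is None:
        return None
    hour = int(match.group(1)) % 12  # 12 o'clock counts as 0 before the shift
    if match.group(3).upper() == "PM":
        hour += 12
    return hour


def pick_slot(slot_texts, windows=((9, 11), (14, 16)),
              wanted=("Available", "Not Busy")):
    """Return the index of the first open slot in the earliest window."""
    for first, last in windows:
        for index, text in enumerate(slot_texts):
            hour = start_hour(text)
            if hour is None or not first <= hour < last:
                continue
            if any(status in text for status in wanted):
                return index
    return None
```

With Selenium, the list of strings would come from the text attribute of each element in thetimes. If pick_slot() returns an index, calling click() on that element selects the slot; a result of None means that no open slot fell inside any of the preferred windows.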

Final Comments

After using the various methods discussed in this article, I found that:

  • Scripts that I wrote and tested during off hours often failed at peak times, because I had not allowed for the longer page load delays.
  • The browser search dialog with xte/xdotool was extremely useful because it allowed me to jump to specific areas of a web page, rather than tabbing to it.
  • Creating apps with xte or xdotool is considerably easier than using Python with Selenium. I found that some web pages were incredibly complex, and it often took some time to find the required IDs that Selenium needed.
  • For large web entry pages, you can always create automated web logins by mixing and matching the xte/xdotool utilities and Python.
  • Two huge advantages of Selenium are the ability to add decision-making logic and the implicitly_wait() method, which waits only as long as it takes for the requested element to appear and is a lot more efficient than a long sleep time.