Build your own web server in a few simple steps

Self Made

© Pavel Ignatov, 123RF.com


If you want to learn a little bit more about the communication between a web browser and an HTTP server, why not build your own web server and take a closer look?

Programming your own web server might seem like a difficult and unnecessary undertaking. Any number of freely available web servers exist in the Linux space, from popular all-rounders like Apache or NGINX to lightweight alternatives like Cherokee or lighttpd (pronounced "lighty").

But sometimes you don't need a full-blown web server. If you just want to share a couple of HTML pages locally on your own network or offer people the ability to upload files, Linux on-board tools are all it takes. A simple shell script is fine as a basic framework that controls existing tools from the GNU treasure chest. Network communication is handled by Netcat [1], aka the Swiss army knife of TCP/IP.

Getting Ready

With a project like this, the best place to start is at the root. Because a web server is still a server at the end of the day, it needs to listen continuously on a given port and respond appropriately to requests. Web servers usually listen on port 80 for normal, unencrypted HTTP requests (encrypted HTTPS traffic uses port 443 instead). The web server I'll describe in this article listens on ports 8080 and 8081 and communicates without encryption. If you are using a firewall and want to test the server on the local network, remember to allow these two ports in the firewall.
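The exact commands depend on the firewall your distribution uses; on a system with firewalld, for example, opening the two ports for a test could look like this sketch:

```shell
# Open the two server ports for testing (firewalld example;
# with ufw the equivalent is "sudo ufw allow 8080/tcp")
sudo firewall-cmd --add-port=8080/tcp
sudo firewall-cmd --add-port=8081/tcp
```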

A web server needs a root folder from which it loads the requested HTML files. It also needs a directory in which it can store uploaded files. Your first step is to define a configuration using a series of simple variables at the start of the server script (Listing 1). And you need to create the directories, along with a FIFO file, either manually or using the Bash test builtin. The server6.sh script, which is included with the code from this article [2], offers a solution.
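Using the Bash test builtin, this setup step might look like the following minimal sketch (variable names as defined in Listing 1):

```shell
# Names as defined in Listing 1
HTTP_HOME=http_home
HTTP_UPLOAD=${HTTP_HOME}/upload
FIFO_GET=fifo_get

# Create the directories and the FIFO only if they do not exist yet
[ -d "$HTTP_UPLOAD" ] || mkdir -p "$HTTP_UPLOAD"
[ -p "$FIFO_GET" ] || mkfifo "$FIFO_GET"
```

Because each test guards its command, the snippet can run repeatedly without errors, for example at every server start.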

Listing 1

Configuration

HTTP_HOME=http_home
HTTP_UPLOAD=${HTTP_HOME}/upload
CACHE_DATEI=${HTTP_UPLOAD}/filetoprocess
FIFO_GET=fifo_get
HTTP_GET_PORT=8080
HTTP_POST_PORT=8081
MEINE_IP=$(ip addr show enp2s0 | grep -Eo "([0-9]{1,3}\.){3}[0-9]+" | sed 1q)

In the last line of Listing 1, you can see that your own IP address is also important. You will need to modify the network device specification (the Ethernet interface enp2s0 in this example) to suit your own system. The browser needs this address as a target, both for fetching pages and for submitting files via a web form. GET requests are the simplest type of request: When a browser sends a GET request, it expects the content of a web page in response, and it displays this content in the browser window.

You'll also need to create some sample HTML files for testing your homegrown server. (See the box entitled "Sample Files.")

Sample Files

Files for testing the web server are easily scripted. The function in Listing 2 runs through a for loop seven times. The routine uses a here document (heredoc) to support the entry of HTML code almost 1:1 (third line). Heredocs let you refer to the variable set in the for statement, which then simply contains the sequence number.

Heredocs help to define sections of text in many programming languages. Unlike conventional output via echo or printf, line breaks, indents, and some special characters are preserved in the text. Bash also supports the use of variables in heredocs.

In this way, you can create as many HTML files as you need with just a few lines of code. You could optionally integrate additional dynamic content that you generate with a script within the heredoc.
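The heredoc behavior is easy to verify in a terminal; a short sketch:

```shell
# Heredocs keep line breaks and indents and expand variables
name="Reader"
greeting=$( cat <<GREETING
Hello, $name!
  This indent and the line break
are kept exactly as typed.
GREETING
)
printf '%s\n' "$greeting"
```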

Listing 2

Creating Sample Files

function create_files () {
  for x in {1..7}; do
    cat <<-FILE > ${HTTP_HOME}/datei${x}.html
      <html><head><meta charset="utf-8">
      <title>Page ${x}</title>
      </head><body>
      <p> $( date ) </p>
      <p> Page ${x} </p>
      </body></html>
    FILE
  done
}

GET Requests

Responding to a GET request entails much more than just sending the content of a file. HTTP and HTTPS require that additional information be sent along with the transmission. If you want to know what a response from a genuine web server looks like, type the following command:

wget --spider -S "https://www.zeit.de/index"

The wget utility downloads a web page from the terminal. The --spider option tells wget to behave like a web spider; in other words, it won't download the actual content but will check that the content is there and will receive the transmission information associated with an HTTP request.

In the first line, the server confirms that it accepts the HTTP request – HTTP/1.1 200 OK. Further lines in the form of key-value pairs (such as Connection: keep-alive or Content-Length: 300) send back additional information or instructions.

It also appears that this service is a well-secured web server, because it does not reveal precisely what kind of server program it is. Many servers out themselves at this point as Server: nginx, for example – not advisable, because such disclosures make things easier for attackers. If you want Netcat to behave like a genuine web server, you'll need a way to generate this HTTP header information yourself.
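Generating such a header in the shell only takes a few lines. The following sketch builds a complete minimal response; the Server name is arbitrary here, just like in the later listings:

```shell
# A minimal HTTP response: status line, headers, blank line, body
body="<html><body>Hello</body></html>"
response="HTTP/1.1 200 OK
Server: Your GET SERVER
Connection: close
Content-Length: ${#body}

${body}"
printf '%s\n' "$response"
```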

Netcat

Netcat is available on virtually any Linux system and can be used for many purposes given a little creativity on the user's part, although it admittedly has some limitations. You can emulate basic network operations using Netcat, but complex interactions are difficult or impossible. You definitely don't want to try to compete with Apache or NGINX just using Netcat.

If you want Netcat to permanently listen on a port and also send different responses, you have to combine it in a loop with a FIFO file. FIFO refers to the "first in, first out" principle. This means that the information comes back out of the file in the same order in which it was sent in [3]. Listing 3 shows an example.
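You can watch the FIFO principle in action with a quick experiment (the file name demo_fifo is made up for the demonstration):

```shell
# FIFO demo: data comes out in the order it was sent in
mkfifo demo_fifo
( echo "first"; echo "second" ) > demo_fifo &   # writer in the background
result=$( cat demo_fifo )                       # reader blocks until data arrives
printf '%s\n' "$result"
rm demo_fifo
```

The writer blocks until a reader opens the FIFO, which is exactly the property the server loop exploits: Netcat and the respond function take turns reading and writing.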

Listing 3

Netcat Response

while true; do
  respond < $FIFO_GET | netcat -l $HTTP_GET_PORT > $FIFO_GET
done

The FIFO file enables two-way communication between Netcat and the respond function, as shown in Listing 4. Netcat listens on the specified port and writes each browser request to the FIFO file. On the left side of the pipe, you can see the call to the function that reads the browser request from the FIFO. It evaluates the request and then sends a matching response, containing an HTTP header and HTML data, back through the pipe to Netcat. The respond function decides what to return to the browser.

Listing 4

FIFO File

01 function respond () {
02   read get_or_post address httpversion
03   if [ ${#address} = 1 ]; then
04     list_dir
05   elif [ ${#address} -gt 1 ]; then
06     return_file $address
07   fi
08 }

This variant is already a fairly powerful solution. If the length of the requested address is 1 (line 3), then it is just /, and Netcat returns a directory listing. If the length is greater than 1, Netcat returns the content of a file from the root directory. To get the web server to return a list of the files contained in the root folder, a very simple ls directory_name is all that is needed. However, the results then need to be embedded in suitable HTML code so that the links work and the browser can actually use them (Figure 1). The sed [4] stream editor is recommended for converting the directory listing into HTML code.

Figure 1: The DIY web server returns a listing of the root directory content.

Listing 5 shows the functions referenced in Listing 4. In the list_dir function, the directory content is output with a simple ls command. Sed then converts the results into plain vanilla HTML. The files generated by the function from Listing 2, which reside in the root directory, already contain HTML code. The server uses the return_file function in line 19 of Listing 5 to send a file back to the browser with a matching header.

Listing 5

Output

01 function list_dir () {
02   local output=$( ls --hide=upload -1 $HTTP_HOME | sed -r '
03   1 i<html><head><meta charset="utf-8"><title>Content</title></head>\
04   <body style="margin: 45px; font-family: sans-serif">
05   s#(.*)#<li><a href="\1">\1</a></li>#
06   $ a</body></html>
07   ' )
08
09   local content_length="Content-Length: $( cat <<<$output | wc --bytes )"
10
11   cat <<<$output | sed '
12   1 i HTTP/1.1 200 OK
13   1 i Server: Your GET SERVER
14   1 i Connection: close
15   1 i '"$content_length"'\n
16   '
17 }
18
19 function return_file () {
20   content=$( cat ${HTTP_HOME}/${1:1} )
21   if [[ $? -eq 0 ]]; then
22     length=$( cat <<<${content} | wc --bytes )
23     cat <<<${content} | sed -r '
24       1 i HTTP/1.1 200 OK
25       1 i Server: Your GET SERVER
26       1 i Connection: close
27       1 i Content-Length: '"$length"'\n'
28   else
29     cat <<-ERROR
30       HTTP/1.1 404 Not Found
31       Connection: close
32       Content-Length: 42
33
34       The requested page does not exist, sorry!
35   ERROR
36   fi
37 }

Because Netcat is continuously available for requests in the loop and sends a header and the corresponding HTML, a browser in the local network thinks it is dealing with a real web server.

However, it can also happen that the user manually requests a page in the browser that does not exist. This leads to the infamous 404 error, which you have probably seen on the web before [5]. The custom web server can reproduce this behavior, too. If the cat command in the first line of the return_file function (line 20) throws an error, the else branch starting at line 28 is executed. The web browser then displays a message stating that the requested page does not exist.

POST Requests

Whereas GET requests let the web browser download files, POST requests allow the browser to send data to the web server. You can think of this as like posting something on social media: You type some text and add images or videos in a box provided for that purpose, and then press Post. The content is uploaded to the server and subsequently displayed under your profile. Our simple server only accepts file uploads from a browser and saves them in the upload/ folder.

Again, the browser sends a header indicating that it wants to post something. You can easily find out what a POST request looks like by running the command in Listing 6. In the browser, call the web form in the root folder and send a file (Figure 2). After a few seconds, interrupt the Netcat command by pressing Ctrl+C. The browser displays a File arrived message, and the file is where you redirected it. But it is not a displayable JPEG file, because the file saved here still contains the header, as shown in Figure 3.

Listing 6

POST Simulation

$ echo "File arrived" | netcat -l 8081 > upload/filex.jpg
Figure 2: Using the web form to upload files such as photos or texts to the web server.
Figure 3: The upload request contains data that does not belong to the uploaded file.

Listing 7 shows how sed can get rid of the excess data that you do not want in the uploaded file. Sed handles this task in the while loop starting in line 9. Sed removes the header, boundary statements, the file name, and similar data. To compare this with what the data originally looked like, take a look at the cache file, which is also in the upload folder. If sed didn't remove all the ballast, the operating system would be unable to display the received files correctly.
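You can try the two sed calls from Listing 7 on a tiny stand-in for a cached upload request; the boundary and file name below are made up for the demonstration:

```shell
# Tiny stand-in for a cached multipart upload request
cat > filetoprocess <<'EOF'
POST / HTTP/1.1
Content-Type: multipart/form-data; boundary=----XYZ

------XYZ
Content-Disposition: form-data; name="file"; filename="note.txt"
Content-Type: text/plain

hello upload
------XYZ--
EOF

# Extract the original file name, as in Listing 7
new_name=$( sed -r -n '/filename/{ s/(.*)(filename=")(.+)(".*)/\3/; p}' filetoprocess )

# Remove header, boundaries, and metadata; keep only the payload
payload=$( sed '1,/filename/d;/Content-Type/{N;d};$d' filetoprocess )

echo "$new_name"   # note.txt
echo "$payload"    # hello upload
```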

Listing 7

Filtering

01 function run_post_server () {
02
03 message_for_post='HTTP/1.1 200 OK
04 Content-Length: 13
05 Connection: close
06
07 File arrived'
08
09 while true; do
10   cat <<< $message_for_post | netcat -l $HTTP_POST_PORT > "${CACHE_DATEI}"
11   new_name=$( sed -r -n '/filename/{ s/(.*)(filename=")(.+)(".*)/\3/; p}' ${CACHE_DATEI} )
12   upload_path="${HTTP_UPLOAD}/${new_name}"
13   sed '1,/filename/d;/Content-Type/{N;d};$d' ${CACHE_DATEI} > "${upload_path}"
14 done
15 }
16
17 run_post_server &

In the background, the routine also calls the run_post_server function (line 17). This function contains a matching response for POST requests, stating the content length in bytes and instructing the browser to close the connection after reading. Without these instructions, Firefox would simply keep the connection open, although the data has already been sent. The function launches in the background (&) to prevent it from blocking the rest of the script while it waits for uploads.

Unchecked

Even if the web browser explicitly requests the root directory or another file, the web server can basically return whatever you want – you just need to declare the returned content correctly for the browser. Listing 8 shows an example of this, where the browser immediately displays a JPEG file on calling localhost:8080 or IP_address:8080 without any complaints.

Listing 8

Sending a JPEG File

#!/bin/bash
header="HTTP/1.1 200 OK"
myfile="http_home/upload/IMG-20220213-WA0002.jpg"
content_length="Content-Length: $( cat $myfile | wc --bytes )"
content_type="Content-Type: image/jpeg"
cat $myfile | sed -r -e "1 i $header" -e \
    "1 i $content_length" -e \
    "1 i $content_type" -e \
    "1 i Connection: close\n" | netcat -l 8080

The interesting thing here is not just that this works, but that it also represents a potential vulnerability. Apparently, most web browsers don't bother checking whether the content of the GET request and the returned page actually match. In this case, the browser asked for the index page of the web server and was given a JPEG file instead. That's something like a tennis player getting a basketball thrown at them by their opponent all of a sudden.

From experience, these idiosyncrasies, and many other features you might want to implement, are not very well documented on the web or are not documented at all. That's why it could be useful to log what Firefox and other browsers request. The function in Listing 9 starts the server. You can see two tee redirects there that forward all of the data to a logfile for debugging. This log will then contain the date and time, what the web browser sent as a request, and what the server sent back as a response (Figure 4). Armed with these details, you can analyze each request and response and understand what exactly is going on when the client and server talk.

Listing 9

Calling the Web Server

function run_server () {
  while true; do
    date | sed -r 's/^|$/\n/g' >> debug
    respond < $FIFO_GET | tee --append debug |
      netcat -l $HTTP_GET_PORT |
    tee --append debug > $FIFO_GET
  done
}
Figure 4: An excerpt from the debug file for port 8080.

For example, many browsers ask for the famous favicon.ico after they have talked to a server for a little while. This is the icon that you usually see at the top of the browsers' tabs. It is usually found in the web server's root folder.

If you want your own server to provide a favicon, you first need to find out what the browser request looks like and then tell the server to respond appropriately. You can tell that the web browser often asks for this file from the error message cat: http_home/favicon.ico: No such file or directory in the logfile.
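A possible answer follows the same pattern as Listing 8, just with the matching content type. The following sketch builds the response with a placeholder file; a real server root would contain a genuine .ico file:

```shell
# Sketch: build a favicon response with a placeholder icon
mkdir -p http_home
printf 'FAKE-ICON' > http_home/favicon.ico

icon_length=$( wc --bytes < http_home/favicon.ico )
favicon_response=$( printf 'HTTP/1.1 200 OK\nContent-Type: image/x-icon\nContent-Length: %s\nConnection: close\n' "$icon_length" )

# Header, blank line, then the icon data - ready to pipe into netcat -l 8080
printf '%s\n\n' "$favicon_response"
cat http_home/favicon.ico
```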

Conclusions

As you can see, a rudimentary web server is quite easy to build yourself. The web server presented in this article has a plain and simple design, but it is not intended to compete with major league players like the Apache web server or NGINX. On the other hand, your homegrown web server does have some capabilities that a typical web server can't offer. For instance, you can access the whole repertoire of shell commands to display information locally with minimal overhead. The resources consumed by the small script are also minimal. This DIY server is quite useful as an info server on your own network, and you can also use it to transfer files from one computer to another – all told, not a bad solution for small tasks.

The Author

Goran Mladenovic is a hobby developer and inventor, who believes programming is a passion.