Let an AI chatbot do the work

Programming Snapshot – ChatGPT

The electronic brain behind ChatGPT from OpenAI is amazingly capable when it comes to chatting with human partners. Mike Schilli picked up an API token and has set about coding some small practical applications.

Every month just before Linux Magazine goes to press, I hold secret rites to conjure up an interesting topic at the last minute. So it is with interest that I have followed the recent meteoric rise of the AI chatbot ChatGPT [1], which – according to the alarmist press – is so smart that it will soon outrank Google as a search engine. Could this system possibly help me find new red-hot article topics that readers will snap up with gusto, greedily imbibing the wisdom I bundle into them?

The GPT in ChatGPT stands for Generative Pre-trained Transformer. The AI system analyzes incoming text, figures out what kind of information is wanted, mines appropriate responses from a massive data model trained with information from the Internet, and repackages it as a text response. Could the electronic brain possibly help me find an interesting topic for this column?

I put it to the test and directed the question to the chatbot. To do this, I typed the question in the search box of ChatGPT's website (Figure 1), and, to my amazement, the AI actually did come up with a couple of usable topics. Introductions to Rust and Go or a deep dive into parallel programming – that sounds interesting!

Figure 1: ChatGPT helps Mike find topics for new issues of this column.

ELIZA 2.0

The interface feels like a modern version of ELIZA [2], the standard program from the stone age (1966) of this genre. In fact, after ELIZA, almost no progress was made in AI-supported language processing for half a century, despite long-winded optimistic announcements. In the wake of deep learning and the associated neural networks, however, the field has experienced such a tremendous boost in the past 10 years that it is now difficult to tell whether you are chatting with a computer or a human being. Today, professional novelists [3] and newspaper writers even freely admit to letting their computer co-drivers take the wheel for wordy passages and plot development [4].

If you type questions for the bot in ChatGPT's browser interface, the system pauses briefly to consider and then answers in a jerky text flow, just like a call center agent talking to an inquiring customer. The AI not only answers questions about general knowledge, but also writes entire newspaper articles on command. For example, within half a minute of the command "Write an introductory magazine article …," as shown in Figure 2, the text beneath it, introducing parallel programming with Go, trickled out of the browser. The results seem amazingly professional. And the bot handles other languages, too. You might suspect that hordes of sports and financial journalists have been using similar systems for years to produce the same article formats on minor league games or stock market fluctuations by the dozen.

Figure 2: If so desired, the AI system will write complete articles on the fly.

Terminal as a Life Coach

Now, if you don't want to keep signing into OpenAI.com with your email address, password, and annoying I-am-not-a-robot challenges before you can enter your requests, you can register on the developer page and retrieve an API key [5] (Figure 3). If a program provides this key as an authentication token in API requests, OpenAI.com responds via HTTP response, and simple command-line tools can snap up and print the response in the terminal. This gives you a ready-to-rumble command-line interface (CLI) to pretty much all of today's common knowledge.

Figure 3: You can get an API token for free to avoid the hassle of constantly signing into the OpenAI website.

The auth token for playing around is available after registration with a valid email address and phone number. OpenAI additionally offers a payment model [6], with token-based charges to your credit card. Because limits in free mode are so generous, you won't incur costs for normal use, so you can usually skip entering your credit card data in the first place.

Using the go-gpt3 package from GitHub, the Go program in Listing 1 communicates with GPT's AI model with just a few lines of code. The program sends questions – known as prompts – to the GPT server as API requests under the hood. If the API server understands the content and finds an answer, it bundles what is known as a completion into an API response and returns it to the client.

Listing 1

openai.go

01 package main
02 import (
03   "context"
04   "fmt"
05   gpt3 "github.com/PullRequestInc/go-gpt3"
06   "os"
07 )
08 type openAI struct {
09   Ctx context.Context
10   Cli gpt3.Client
11 }
12 func NewAI() *openAI {
13   return &openAI{}
14 }
15 func (ai *openAI) init() {
16   ai.Ctx = context.Background()
17   apiKey := os.Getenv("APIKEY")
18   if apiKey == "" {
19     panic("Set APIKEY=API-Key")
20   }
21   ai.Cli = gpt3.NewClient(apiKey)
22 }
23 func (ai openAI) printResp(prompt string) {
24   req := gpt3.CompletionRequest{
25     Prompt:      []string{prompt},
26     MaxTokens:   gpt3.IntPtr(1000),
27     Temperature: gpt3.Float32Ptr(0),
28   }
29   err := ai.Cli.CompletionStreamWithEngine(
30     ai.Ctx, gpt3.TextDavinci003Engine, req,
31     func(resp *gpt3.CompletionResponse) {
32       fmt.Print(resp.Choices[0].Text)
33     },
34   )
35   if err != nil {
36     panic(err)
37   }
38   fmt.Println("")
39 }

Listing 1 wraps the logic for communication in a constructor NewAI() and a function init() starting in line 15; this fetches the API token from the environment variable APIKEY and uses it to create a new API client in line 21. To allow other functions in the package to use the client, line 21 drops it in a structure of the openAI type defined in line 8. The constructor gives the calling program a pointer to the structure, and Go's receiver mechanism includes this with the function calls.

Object Orientation in Go

The context object created in line 16 lets the program remotely control the client – for example, to cancel an active request or enforce a timeout. This is not needed in the simple application in Listing 1, but the gpt3 library requires it anyway, so line 16 also stores the context in the openAI structure so that the printResp() function, starting in line 23, can access it later.

Prompts and Completions

This is where the actual web access takes place. Line 24 creates a structure of the CompletionRequest type and stores the request's text in its Prompt attribute. The MaxTokens parameter limits the amount of text the language processor will handle. The price list on OpenAI.com has a somewhat nebulous definition of what a token is [6]. Allegedly, 1,000 tokens are equivalent to about 750 words in a text; in subscription mode, the customer pays two cents per thousand tokens.

The value for MaxTokens refers to both the question and the answer. If you use too many tokens, you will quickly reach the limit in free mode. However, if you set the value for MaxTokens too low, only part of the answer will be returned for longer output (for example, automatically written newspaper articles). The value of 1,000 tokens set in line 26 will work well for typical use cases.

The value for Temperature in line 27 indicates how hotheaded you want the chatbot's answer to be. A value higher than 0 causes the responses to vary, even at the expense of accuracy – but more on that later. The actual request to the API server is made by the CompletionStreamWithEngine function. Its last parameter is a callback function that the client will call whenever response packets arrive from the server. Line 32 simply prints the results on the standard output.

Forever Bot

To implement a chatbot that endlessly fields questions and outputs one answer at a time, Listing 2 wraps a text scanner listening on standard input in an infinite for loop and calls the function printResp() for each typed question in line 16. The function contacts the OpenAI server and then prints its text response on the standard output. Then the program jumps back to the beginning of the infinite loop and waits for the next question to be entered.

Listing 2

chat.go

01 package main
02 import (
03   "bufio"
04   "fmt"
05   "os"
06 )
07 func main() {
08   ai := NewAI()
09   ai.init()
10   scanner := bufio.NewScanner(os.Stdin)
11   for {
12     fmt.Print("Ask: ")
13     if !scanner.Scan() {
14       break
15     }
16     ai.printResp(scanner.Text())
17   }
18 }

Figure 4 shows the output of the interactive chat program that fields the user's question on the terminal, sends it to the OpenAI server for a response, and prints its response to the standard output. The bot waits for the Enter key to confirm sending a question, prints the incoming answer, and jumps to the next input prompt. Pressing Ctrl+D or Ctrl+C terminates the program.

Figure 4: The chatbot is shown here as a terminal application with OpenAI.com as the back end.

It is important to install the API token in the APIKEY environment variable of the called program; otherwise, the program will abort with an error message. In Figure 4, the user asks the AI model about the advantages and disadvantages of Wiener schnitzel [7] and receives three pro and con points each as an answer. It turns out that the electronic brain has amazingly precise knowledge of everything that can be found somewhere on the Internet (and preferably on Wikipedia). It is actually capable of analyzing this content semantically, storing it in machine-readable form, and answering even the most abstruse questions about Wiener schnitzel in a meaningful way. The chat binary with the ready-to-run chatbot is generated from the source code in the listings as usual with the three standard commands shown in Listing 3.

Listing 3

Creating the Binary

$ go mod init chat
$ go mod tidy
$ go build chat.go openai.go

By the way, I've found that the AI system's knowledge is not always completely correct. In the answer to the question about who the author of this column is, it lists two Perl books that I have never written (Figure 5). For reference, the correct titles would have been Perl Power and Go To Perl 5. More importantly, there's no good way to find out what's right and what's wrong, because the bot never provides any references on how it arrived at a particular answer.

Figure 5: The AI even knows the author of this column.

Setting the Creative Temperature

Programs can also ask the API to increase the variety of completions provided by the back end. Do you want the answers to be very precise, or do you prefer several varying answers to the same question in a more playful way, even at the risk of them being not 100 percent accurate?

This behavior is controlled by the Temperature parameter in line 27 of Listing 1. With a value of 0 (gpt3.Float32Ptr(0)), the AI robotically gives the same precise answers every time. But even with a value of 1, things start to liven up. The AI constantly rewords things and comes up with interesting new variations. At the maximum value of 2, however, you get the feeling that the AI is somewhat incapacitated, causing it to output slurred nonsense with partly incorrect grammar. In Figure 6, the robot is asked to invent a new tagline for the Go programming column. With a temperature setting of 1, it provides several surprisingly good suggestions.

Figure 6: With a temperature value of 1, the AI continually comes up with new suggestions.

Translation at Your Command

Because users communicate with the AI back end via questions as text blocks instead of differently named API calls, you can now easily build more specialized tools that follow the pattern of Listing 2. How about a translation chatbot that translates texts from different languages into French? Listing 4 reads texts and sends them to the API with the instruction to "Translate to French" in the prompt.

Listing 4

french.go

package main
import (
  "bufio"
  "os"
)
func main() {
  ai := NewAI()
  ai.init()
  scanner := bufio.NewScanner(os.Stdin)
  text := ""
  for scanner.Scan() {
    text += scanner.Text() + "\n"
  }
  ai.printResp("Translate to French:\n" + text)
}

Because the server can parse requests in many different languages, the API request does not even need to specify the source language of the provided text to be translated. The AI discovers this automatically on reading the question. Figure 7 shows that the electronic brain translates questions from German and English tourists concerning the route to the Eiffel Tower into very usable French. You can compile the french binary by typing

go build french.go openai.go

This links the new program to the library from Listing 1.

Figure 7: The electronic brain also translates from and to a wide variety of languages.

In the examples so far, we've used the el cheapo mode of the chatbot with MaxTokens set to 1000, giving us relatively brief answers. For longer texts, however, Listing 1 needs to set MaxTokens to a higher value. At some point, though, this will lead to OpenAI wanting to be compensated for the work. You will then need to add a means of payment to the account in the form of a credit card.

Hard Limits

In summary, with the Davinci003 model that line 30 of Listing 1 selects as the electronic brain, the AI is surprisingly knowledgeable about real-world facts. However, this knowledge stops abruptly in the year 2021, because more recent events have not yet been fed in. For example, the server has to pass on questions about the 2022 Soccer World Cup in Qatar, and when asked about the current US Secretary of Defense, it names the appointee from the Trump administration. The next update to the model will hopefully get it up to speed with current events.

Disturbing Worlds

The AI behind ChatGPT can understand texts, analyze problems, answer questions, and produce output. But it is also capable of image processing. For example, using the API, it can analyze uploaded images, assign them to a category, or subject them to amazing transformations.

For example, Listing 5 runs a curl command to contact the OpenAI API endpoint images/variations. The back end then goes ahead and disturbingly modifies the original photo of a Wiener schnitzel that I cooked myself [7]. It is important to note that this particular endpoint only accepts square photos in PNG format, no larger than 4MB.

Listing 5

var.sh

curl https://api.openai.com/v1/images/variations \
 -H "Authorization: Bearer sk-XXXXXXXXXXXXX" \
 -F image='@test.png' -F n=3 -F size="1024x1024"

To use it, I spun up the trusty convert tool from the ImageMagick collection to reformat my schnitzel photo into a square PNG image measuring 1000x1000 pixels, using the -crop and -resize options. When called with a valid API token, the curl command – which reads and uploads the test.png file – returns three different URLs in the JSON response. Each of them leads to a photo variation generated from the original, which curl can subsequently retrieve from the server and store on the local drive.
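
For reference, a convert call along these lines will do the trick; the input filename and the crop geometry are just example values that need adjusting to the source photo:

```shell
# Cut a square region out of the source photo and scale it down to
# 1000x1000 pixels in PNG format (geometry values are examples only)
convert schnitzel.jpg -crop 2000x2000+300+0 -resize 1000x1000 test.png
```

The resulting test.png then satisfies the endpoint's square-PNG requirement and stays well under the 4MB limit.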

Figure 8 shows the uploaded original, and Figure 9 shows one of the three variants OpenAI offered for download. You can see that the schnitzel is now on a different plate and surrounded by a light brown sauce. Next to the plate there is a beer glass full of salad. And instead of good old American India Pale Ale, you get what looks like a sake rice wine bottle top left.

Figure 8: From this schnitzel image, the AI generated …
Figure 9: … this rather disturbing variation.

On closer inspection, even the schnitzel is no longer the original. Instead, the algorithm used by the AI seems to have combined my photo with one from its archive, probably from a Japanese tonkatsu restaurant. We live in amazing times.

The Author

Mike Schilli works as a software engineer in the San Francisco Bay Area, California. Each month in his column, which has been running since 1997, he researches practical applications of various programming languages. If you email him at mschilli@perlmeister.com he will gladly answer any questions.