The latest challenge to copyleft

Copilot

© Photo by Arie Wubben on Unsplash

Article from Issue 251/2021
Author(s):

GitHub's Copilot takes code autocompletion to a new level but raises copyleft licensing issues.

In a world where most people carry a cell phone, autocomplete might seem too widely used to cause contention. However, Copilot [1], a new code autocompletion service for Visual Studio Code (Figure 1), is already raising licensing issues, especially for copyleft licenses such as the different versions of the GNU General Public License (GPL) that are the backbone of free software.

Figure 1: Copilot is an autocompleter for code that raises challenges for copyleft licenses.

Code assistants are not new. However, Copilot claims to be in a class of its own. Developed by GitHub, Copilot is built on Codex, the new AI system created by OpenAI, and trained with all the public code on GitHub in dozens of programming languages. With this backing, Copilot has the potential to reduce the time coders spend writing code, finding examples, or learning a new programming language. It also makes possible innovative features such as the conversion of code descriptions in comments to code, autofill for repetitive code, suggestions for code tests, and lists of alternatives, taking autocompletion to entirely new levels.

However, along with these completions come new licensing problems. Copilot's developers quibble that it is a code synthesizer [2], not a search engine, and insist that "the vast majority of the code that it suggests is uniquely generated and has never been seen before. We found that about 0.1% of the time, the suggestion may contain some snippets that are verbatim from the training set. Many of these cases happen when you don't provide sufficient context (in particular, when editing an empty file), or when there is a common, perhaps even universal, solution to the problem. We are building an origin tracker to help detect the rare instances of code that is repeated from the training set, to help you make good real-time decisions." In other words, exact copying is rare, usually the user's fault, and should be detectable in the future.

Meanwhile, and perhaps even after an origin tracker is included, the possibility remains for unintentional copyright violations. For example, publicly available code may still carry copyright restrictions. If you borrow publicly available code that has such restrictions, have you violated copyright?

The potential for violations is even stronger with copyleft, which is likely to be common in the code used for training. How much code, if any, can be copied from an application released under a version of the GPL? Does borrowing via Copilot make your code a derivative of GPL code that must therefore also be released under the same license? If so, then under the terms of the GPL, must you include copyright notices and disclaimers?

GitHub's assumption seems to be that using code suggested by Copilot is not a violation of copyleft licenses, because it is based on the training data en masse, not the work of any individual. However, that position causes its own problems. If GitHub is correct, Copilot becomes a means to sneak copyleft code into proprietary programs, making copyleft licenses ineffectual. This will result in challenges from institutions such as the Linux Foundation or the Software Freedom Law Center, which might drag on for years, no doubt accompanied by conspiracy theorists reminding everybody that GitHub is owned by Microsoft. Whichever way you look at it, Copilot seems to be a recipe for chaos.

Hunting for a Solution

Julia Reda, a former German politician who advocates copyright reform and is currently a member of the Berkman Klein Center for Internet & Society at Harvard, has blogged in detail about these alternative viewpoints [3]. Her position is not at all what many free software supporters would like to see. Rather, she warns that applying copyright to Copilot's suggestions would be impractical and self-defeating.

To start with, Reda maintains that copyright infringement does not apply to small snippets of code. Rather, she maintains that copyright violation only applies when the "excerpt used is in turn original and unique enough to reach the threshold of originality." Otherwise, accusations of copyright violations would be made constantly over trivial or unavoidable borrowings. Most borrowing through Copilot, she maintains, is likely to be too limited to reach this threshold – similar to how short quotations in the media aren't copyright violations of novels.

Reda goes on to argue that while considering Copilot's suggestions as derivative works, which would place them under the original copyleft licenses, might be satisfying to free software advocates, this position would create its own problems. In particular, it would mean that all AI-generated material would also be subject to copyright. Although her suggestion that a music label might train "an AI with its music catalogue to automatically generate every tune imaginable" seems far-fetched, it would amount to an extension of copyright – the exact opposite of what most copyleft advocates would desire. As she points out, companies at the World Intellectual Property Organization (WIPO) are already lobbying to apply copyright to AI-generated works, a change that would most benefit major corporations such as Microsoft. In other words, in winning the Copilot battle, free software could lose the war.

However, Reda also rejects GitHub's position. Code, she points out, is not the anonymous work of machines, but of individual coders. According to her, "Copyright law has only ever applied to intellectual creations – where there is no creator, there is no work. This means that machine-generated code like that of GitHub Copilot is not a work under copyright law at all, so it is not a derivative work either. The output of a machine simply does not qualify for copyright protection – it is in the public domain. That is good news for the open movement and not something that needs fixing."

The trouble with this position is that Copilot would remain a means for bypassing the provisions of the GPL. In addition, many free software supporters would not find Reda's solution – to leave things as they are – acceptable, especially those who fear the hand of Microsoft behind Copilot. For this reason, the issues raised are unlikely to have a quick solution, particularly at a time when the Free Software Foundation is divided and attempting to reorganize. Will the corporations of the Linux Foundation move to protect their investment in free software instead? So far, the only thing that is clear is that the problem will not disappear easily or quickly.
