Google Takeout: viewing what Google knows about you

Off the Beat: Bruce Byfield's Blog
Ever wonder what information Google has collected about you? Now, you can find out, thanks to Google Takeout, which allows you to download most of the information that Google has collected about you.
The question should be of more than passing interest to just about everyone. Few people may have bought Google's Chromebook with its web-based applications, but Google still dominates our computer lives. We use it to receive emails. We store pictures and documents on it. We socialize on it -- and, all the time, Google is collecting information about us.
Google Takeout is a creation of the Data Liberation Front, which describes itself as
"an engineering team at Google whose singular goal is to make it easier for users to move their data in and out of Google products. We do this because we believe that you should be able to export any data that you create in (or import into) a product. We help and consult other engineering teams within Google on how to 'liberate' their products."
You can run Google Takeout to see what information Google has stored about you, then go to the Data Liberation Front site for instructions about how to remove data from specific Google products. Not all Google products have been "liberated," although most of the major ones have.
What is unclear, however, is how official Google Takeout or the Data Liberation Front is. Google Takeout appears to be hosted on Google servers, but the Data Liberation Front site gives no information about where it is hosted, doesn't identify anyone associated with the project, and gives no contact information other than a Twitter account.
Consequently, whether either is officially supported by Google, semi-official, or clandestine is uncertain. I'm guessing they may be projects Google employees have undertaken in their twenty percent time -- the time Google gives employees to work on private projects -- but I don't know.
What I can say is that running Google Takeout is an educational and somewhat alarming experience.
Revelations in the archive
Running Google Takeout is as easy as logging into your Google account and selecting which services to include in the archive of your data. If you want, you can first review a summary of the information collected by each service by clicking the download button. Probably the largest omissions are GMail, which I suspect is the most heavily used Google service, and search engine records.
Obviously, the time needed to create the archive depends on how heavily you have used Google, but the result is a zip archive named for your account saved to your hard drive, neatly divided with a separate folder for each service.
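For readers who would rather survey the archive before opening it, here is a minimal Python sketch that totals the files and uncompressed bytes in each top-level folder of a zip archive. It assumes only the layout described above, one folder per service. The function names, the depth parameter, and the sample folder and file names are my own inventions for illustration, and a real archive may wrap everything in an extra directory, in which case depth=1 would be the value to try.

```python
import io
import zipfile
from collections import defaultdict


def summarize_by_folder(zip_source, depth=0):
    """Return {folder_name: (file_count, total_uncompressed_bytes)}.

    zip_source may be a path or a file-like object.
    depth selects which path component to group on (0 = top-level folder).
    """
    counts = defaultdict(int)
    sizes = defaultdict(int)
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries hold no data of their own
            parts = info.filename.split("/")
            # Files that are not inside a folder at the chosen depth
            # are grouped under a placeholder name.
            key = parts[depth] if len(parts) > depth + 1 else "(top level)"
            counts[key] += 1
            sizes[key] += info.file_size
    return {name: (counts[name], sizes[name]) for name in counts}


def build_sample_archive():
    """Build a tiny in-memory zip with made-up contents (hypothetical names)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("Streams/post1.html", "hello")
        archive.writestr("Streams/post2.html", "hi!!")
        archive.writestr("Docs/test.txt", "abc")
        archive.writestr("readme.txt", "x")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Swap build_sample_archive() for the path to a downloaded archive.
    summary = summarize_by_folder(build_sample_archive())
    for name, (count, size) in sorted(summary.items()):
        print(f"{name}: {count} files, {size} bytes")
```

Pointing summarize_by_folder at the path of a downloaded archive instead of the sample gives a quick picture of which services hold the most data, without extracting anything to disk.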
In my case, the archive was just over 4.4 megabytes -- but, then, compared to other people, I am undoubtedly a light user of Google services, especially when GMail and searches are omitted. The service I most heavily use is Google+, and even that I've lost interest in because of its refusal to accept pseudonyms -- and even, in some cases, non-European names.
Still, even as a light user, I was taken aback by how much information Google was storing about me. I shouldn't have been, I know -- I willingly provided all that information, and I could see no sign that Google was storing anything I hadn't authorized.
All the same, seeing all the accumulated information was a shock. It is one thing to know that Google never throws out old information, and another one to realize that documents abandoned six years ago in Google Docs are still around. In my case, they are mostly test documents and likely to be of minimal interest to anyone, but if I had other work habits, their continued existence would raise concerns about privacy and security.
Similarly, while I was obviously aware that pictures posted on Google+ had to be stored somewhere, I was puzzled to see I had graphics stored in Picasa. Since I have never used Picasa separately, it took me a few seconds to realize how they had got there. This experience convinced me that, in recently announcing the centralization of its services, Google was only making official what was already happening, but it also raises security questions. After all, making sure that your data is safe becomes harder when you are unaware of exactly which service it is stored in.
Then there's the information I was aware of. In the Streams folder, the archive included every posting I had ever made on Google+, as well as all my contacts and circles (groupings of people I follow, if you don't happen to use Google+). All those individual decisions to post, I quickly discovered as I read them together, add up to a thorough picture of my online persona, especially since many of the posts are links to articles I've published.
Even more seriously, the people I choose to follow and the circles in which I've arranged them easily reveal information that goes far beyond what I ever intended to give. From my circles, for example, anyone reading the information could tell something about my family and professional associates, and therefore about interests and connections I might prefer to keep private.
These discrete pieces of information could easily be combined with other archived information such as my pictures to tell more about me than I ever intended. For instance, from my pictures, one might deduce what I enjoy buying, and, from the circles, from whom I buy.
True, Google shows no signs of selling such information to advertisers or retailers. But what if Google's security is breached? Like most people, I have no informed opinion about the quality of Google's security. Yet, one piece at a time, I have entrusted more deeply personal information to Google. The fact that Google probably has less information about me than about most people is no comfort, because I find that, just by using Google's services, I have let my information be made available in ways to which I never consented.
Taking responsibility
I don't mean to be paranoid. Nor am I suggesting that Google is untrustworthy or deceptive. Its services are convenient, and perhaps some small losses of privacy are a reasonable exchange for that convenience.
Yet I am disturbed by how little Google emphasizes this potential lack of privacy, and how willingly I went along with it, less than half aware of what I was doing. If someone like me, with a reasonable lay knowledge of security and privacy issues, can fall into such complacent behavior, there must be millions of users who are even more naive than I was, and entrusting far more potentially damaging information to Google.
Even more importantly, what about the mail and search services not included in Google Takeout? If our uses of other services contain so much information, how much more do these popular services contain?
I can't answer that question. But I do know that over the next few days I will be using the Data Liberation Front's tools to remove unnecessary information from as many Google services as I can. At future intervals, I will repeat the process. In addition, I'll consider what Google services I might do without.
I have also decided that, instead of turning to Google twenty or thirty times a day for search results, I will transition completely to DuckDuckGo, a small search engine that claims not to store records of your searches. I consider these decisions not paranoid, but simply small steps towards being more responsible about my online habits.