Google Takeout: viewing what Google knows about you

Off the Beat: Bruce Byfield's Blog

Feb 15, 2012 GMT

Bruce Byfield

Ever wonder what information Google has collected about you? Now, you can find out, thanks to Google Takeout, which allows you to download most of the information that Google has collected about you.

The question should be of more than passing interest to just about everyone. Few people may have bought Google's Chromebook with its web-based applications, but Google still dominates our computer lives. We use it to receive emails. We store pictures and documents on it. We socialize on it -- and, all the time, Google is collecting information about us.

Google Takeout is a creation of the Data Liberation Front, which describes itself as
"an engineering team at Google whose singular goal is to make it easier for users to move their data in and out of Google products. We do this because we believe that you should be able to export any data that you create in (or import into) a product. We help and consult other engineering teams within Google on how to 'liberate' their products."

You can run Google Takeout to see what information Google has stored about you, then go to the Data Liberation Front site for instructions about how to remove data from specific Google products. Not all Google products have been "liberated," although most of the major ones have.

What is unclear, however, is how official Google Takeout or the Data Liberation Front is. Google Takeout appears to be hosted on Google servers, but the Data Liberation Front site gives no information about where it is hosted, doesn't identify anyone associated with the project, and gives no contact information other than a Twitter account.

Consequently, whether either is officially supported by Google, semi-official, or clandestine is uncertain. I'm guessing they may be projects Google employees have undertaken in their twenty percent time -- the time Google gives employees to work on private projects -- but I don't know.

What I can say is that running Google Takeout is an educational and somewhat alarming experience.

Revelations in the archive
Running Google Takeout is as easy as logging into your Google account and selecting which services to include in the archive of your data. If you want, you can first review a summary of the information collected by each service by clicking the download button. Probably the largest omissions are Gmail, which I suspect is the most heavily used Google service, and search engine records.

Obviously, the time needed to create the archive depends on how heavily you have used Google, but the result is a zip archive named for your account saved to your hard drive, neatly divided with a separate folder for each service.
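
For readers who would rather get an overview of the archive before opening every folder, the per-service layout described above can be summarized with a short script. This is only a minimal sketch: the function name `summarize_archive` and the `service_level` parameter are made up for illustration, and the folder layout inside the zip is an assumption that may not match your archive (for instance, the service folders might sit one level down, inside a folder named for your account).

```python
import zipfile
from collections import defaultdict


def summarize_archive(zip_path, service_level=0):
    """Return {folder_name: (file_count, total_bytes)} for a zip archive.

    service_level is the index of the path component assumed to name the
    service. This is a guess about the layout: use 0 if the service folders
    sit at the top of the archive, 1 if they sit inside an account folder.
    """
    totals = defaultdict(lambda: [0, 0])
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data of their own
            parts = info.filename.split("/")
            # Files that sit above the assumed service level are grouped together.
            if len(parts) > service_level + 1:
                service = parts[service_level]
            else:
                service = "(top level)"
            totals[service][0] += 1
            totals[service][1] += info.file_size  # uncompressed size in bytes
    return {name: tuple(counts) for name, counts in totals.items()}
```

Calling `summarize_archive("takeout.zip")` (a hypothetical file name) returns a dictionary that maps each folder to the number of files it holds and their combined uncompressed size, which makes it easy to spot a service holding more than you expected.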

In my case, the archive was just over 4.4 megabytes -- but, then, compared to other people, I am undoubtedly a light user of Google services, especially when Gmail and searches are omitted. The service I use most heavily is Google+, and even that I've lost interest in because of its refusal to accept pseudonyms -- and even, in some cases, non-European names.

Still, even as a light user, I was taken aback by how much information Google was storing about me. I shouldn't have been, I know -- I willingly provided all that information, and I could see no sign that Google was storing anything I hadn't authorized.

All the same, seeing all the accumulated information was a shock. It is one thing to know that Google never throws out old information, and another to realize that documents abandoned six years ago in Google Docs are still around. In my case, they are mostly test documents and likely to be of minimal interest to anyone, but if I had other work habits, their continued existence would raise concerns about privacy and security.

Similarly, while I was obviously aware that pictures posted on Google+ had to be stored somewhere, I was puzzled to see I had graphics stored in Picasa. Since I have never used Picasa separately, I took a few seconds to realize how they had got there. This experience convinced me that, in recently announcing the centralization of its services, Google was only making official what was already happening, but it also raises security questions. After all, making sure that your data is safe becomes harder when you are unaware of exactly which service it is stored in.

Then there's the information I was aware of. In the Streams folder, the archive included every posting I had ever made on Google+, as well as all my contacts and circles (groupings of people I follow, if you don't happen to use Google+). All those individual decisions to post, I quickly discovered as I read them together, add up to a thorough picture of my online persona, especially since many of the posts are links to articles I've published.

Even more seriously, the people I choose to follow and the circles in which I've arranged them easily reveal information that goes far beyond what I ever intended to give. From my circles, for example, anyone reading the information could tell something about my family and professional associates, and therefore about interests and connections I might prefer to keep private.

These discrete pieces of information could easily be combined with other archived information such as my pictures to tell more about me than I ever intended. For instance, from my pictures, one might deduce what I enjoy buying, and, from the circles, from whom I buy.

True, Google shows no signs of selling such information to advertisers or retailers. But what if Google's security is breached? Like most people, I have no informed opinion about the quality of Google's security. Yet, one piece at a time, I have entrusted more deeply personal information to Google. The fact that Google probably has less information about me than about most people is no comfort, because I find that, just by using Google's services, I have let my information be made available in ways to which I never consented.

Taking responsibility
I don't mean to be paranoid. Nor am I suggesting that Google is untrustworthy or deceptive. Its services are convenient, and perhaps some small losses of privacy are a reasonable exchange for that convenience.

Yet I am disturbed by how little Google emphasizes this potential lack of privacy, and how willingly I went along with it, less than half aware of what I was doing. If someone like me, with a reasonable lay knowledge of security and privacy issues, can fall into such complacent behavior, there must be millions of users who are even more naive than I was, and who are entrusting far more potentially damaging information to Google.

Even more importantly, what about the mail and search services not included in Google Takeout? If our uses of other services contain so much information, how much more do these popular services contain?

I can't answer that question. But I do know that over the next few days I will be using the Data Liberation Front's tools to remove unnecessary information from as many Google services as I can. At future intervals, I will repeat the process. In addition, I'll consider what Google services I might do without.

I have also decided that, instead of turning to Google twenty or thirty times a day for search results, I will transition completely to DuckDuckGo, a small search engine that claims not to store records of your searches. I consider these decisions not paranoid, but simply small steps towards being more responsible about my online habits.
