Migrating social media data with the Data Transfer Project
New Address
The Data Transfer Project wants to make it easier to move your data between social media sites.
Data portability and transparency are ongoing issues that plague all major social media giants. Who owns the data you post to your social media accounts? Can you get a copy of the data if you ask for it? If you had a copy, what could you do with it?
Many leading social media companies have APIs that let you extract and upload data, but the data formats tend to be dissimilar and proprietary. Even if you obtain the data, you can't do much with it unless you are a programmer with plenty of time for personal coding.
Back in 2018, a few leading social media companies pledged to make an effort at addressing this problem. The result is the Data Transfer Project, which was recently rebranded and expanded as the Data Transfer Initiative [1]. The mission of the Data Transfer Project (and Initiative) is to support a common neutral format for social media data as it passes from one platform to another, as well as to provide the tools necessary for transforming data in and out of that format.
In the long term, the goal is for the user to be free from direct involvement in the migration. A user who wishes to move data from one platform (say Twitter) to another platform (say Facebook) will simply choose an option in the Facebook user interface, and the migration will happen automatically. According to the Data Transfer Project developers, the purpose of the project is to "Extend data portability beyond a user's ability to download a copy of their data from their service provider ("provider"), to providing the user the ability to initiate direct transfer of their data into and out of any participating provider" [2].
The project was originally started by Google, Apple, Meta, Microsoft, Twitter, and SmugMug (Figure 1), but other companies are invited to join in. The Data Transfer Project is still a work in progress, but some of the code is available on GitHub [3] (Figure 2), and the developers provide API keys for the participating services for those who are interested in testing [4].
How It Works
According to Meta's Engineering Blog [5], the Data Transfer Project consists of three main components:
- a set of shared data models to represent each vertical (i.e., photos, contacts, playlists),
- adapters, which handle the authentication of a user to a service (normally OAuth) and the transformation of data to and from the shared data models (importers and exporters),
- and a task management framework, which puts all the pieces together and handles the life cycle of a transfer job, including job creation and running the transfer.
The data models provide the neutral format needed for transferring data to or from the participating platforms. One of the guiding principles of the project is that the vendors should not have to rewrite their APIs. Instead, the vendor provides an adapter to transform the data from the platform's own format to the neutral format, and also to transform the neutral data to a format needed for data import. The adapters basically serve as extensions of the API. Without something like DTP, a vendor would have to create a different migration solution for every other platform. With DTP, the vendor just has to write one adapter for importing data from the neutral format and one for exporting data into the neutral format.
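The adapter idea can be illustrated with a minimal Python sketch. This is not the project's actual code: the NeutralPhoto model, the two service formats, and all function and field names are hypothetical, invented only to show how an exporter and an importer meet at a shared data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NeutralPhoto:
    """Hypothetical shared data model for the 'photos' vertical."""
    title: str
    url: str
    album: str


def export_from_service_a(record: dict) -> NeutralPhoto:
    """Exporter: hypothetical Service A format -> neutral model."""
    return NeutralPhoto(
        title=record["caption"],
        url=record["media"]["src"],
        album=record.get("collection", ""),
    )


def import_to_service_b(photo: NeutralPhoto) -> dict:
    """Importer: neutral model -> hypothetical Service B format."""
    return {
        "name": photo.title,
        "source_url": photo.url,
        # Service B is assumed to require a folder, so supply a default.
        "folder": photo.album or "Unsorted",
    }


def transfer(records: list[dict]) -> list[dict]:
    """Stand-in for a transfer job: export each item, then import it."""
    return [import_to_service_b(export_from_service_a(r)) for r in records]


if __name__ == "__main__":
    service_a_data = [
        {"caption": "Sunset", "media": {"src": "https://example.com/1.jpg"},
         "collection": "Holiday"},
        {"caption": "Cat", "media": {"src": "https://example.com/2.jpg"}},
    ]
    for item in transfer(service_a_data):
        print(item)
```

Neither service needs to know anything about the other's format. With n platforms, direct pairwise migration would need up to n(n-1) one-way converters, whereas the adapter approach needs only 2n (one importer and one exporter per platform).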
Of course, you can't migrate the data unless you have access to it. In addition to the data adapters, DTP uses authentication adapters that allow the requesting service to access the originating service. According to the DTP documentation, "OAuth is likely to be the choice for most providers, however the DTP is agnostic to the type of authentication."
Why and Why Now?
If you've ever tried to download your information from major providers like Facebook, you'll know that they already have an online tool to download your account data, so you might wonder why the DTP is necessary.
The reason is that the data you download isn't specifically designed to be interoperable with other services. Facebook has rectified this to some extent by allowing you to download your account data in JSON rather than HTML format to make it easier to reupload to another service [6], but it still cautions that the files are for your personal use (Figure 3).
New regulations like GDPR (General Data Protection Regulation) in Europe also require companies to provide all data they hold on their customers. In the case of websites like Facebook, the data download doesn't necessarily include information like location data, facial recognition data, links to friends' profiles, and so on.
As Congressman David Cicilline pointed out around the time of the Cambridge Analytica scandal, smaller companies will only become competitive with Facebook if it's easy for existing users to transfer all their data elsewhere, including HTML links to friends' profiles [7].
Open Data and Open Source
In their announcements, both Facebook and Google play up the open source nature of the data transfer tools. There are public, open source extensions that allow the Data Transfer Project to be run on the Google Cloud Platform and Microsoft Azure. But the overall goal of the project is to support open data – not necessarily open source. An open source tool that supports migration to a proprietary format or a closed source service running in the cloud is not an ideal case study in free software. Because the software running on these platforms is proprietary, there's no way to be certain every part of your user data has been copied to your destination platform in a secure way.
Kevin Bankston, Director of the Open Technology Institute, has urged tech companies to go further [8]. Although he is optimistic about companies like Twitter and Facebook that offer downloads of account data in JSON format, he pointed out in 2018 that "Social networks should consider using the Activity Streams 2.0 open standard [9], a particular JSON-based format for exporting social media items. Facebook helped develop the standard at the World Wide Web Consortium, but right now only decentralized social network tools like Mastodon use it." (Since the time of writing, Twitter has also adopted the Activity Streams format [10].)
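To give a rough idea of what an Activity Streams 2.0 export might look like, the following Python sketch wraps a single post in a Create activity that contains a Note object. The property names come from the Activity Streams vocabulary, but the helper function, the sample text, and the example.com actor URL are assumptions made for illustration; real exports from any given platform will carry more fields.

```python
import json
from datetime import datetime, timezone


def to_activity_streams_note(text: str, author_url: str,
                             published: datetime) -> dict:
    """Wrap a post in a 'Create' activity holding a 'Note' object."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Create",
        "actor": author_url,
        "published": published.isoformat(),
        "object": {
            "type": "Note",
            "content": text,
            "attributedTo": author_url,
        },
    }


if __name__ == "__main__":
    activity = to_activity_streams_note(
        "Moving my posts to a new home.",
        "https://example.com/users/alice",  # hypothetical actor URL
        datetime(2018, 7, 20, 12, 0, tzinfo=timezone.utc),
    )
    print(json.dumps(activity, indent=2))
```

Because the result is plain JSON with a published vocabulary, any service that understands the standard could, in principle, read the post without a platform-specific converter.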
Bankston mentions that tech giants could sidestep the data transfer issue by making their platforms more interoperable. He points out that Meta's Developer Policy [11], for example, makes it extremely difficult to create an app that makes full use of the API and replicates Facebook's core functionality, such as instant messaging.
Facebook also famously dropped support for the open XMPP messaging standard in 2015 in favor of its own proprietary standards [12]. If Facebook and other platforms were to agree to adopt XMPP (Figure 4), a Facebook user could view contacts, send messages, and make video calls to users of other services like Skype or Google Chat without moving any data.