Migrating social media data with the Data Transfer Project

New Address

Lead Image © GloriaSanchez, 123rf.com

Article from Issue 272/2023
Author(s):

The Data Transfer Project wants to make it easier to move your data between social media sites.

Data portability and transparency are ongoing issues that plague all major social media giants. Who owns the data you post to your social media accounts? Can you get a copy of the data if you ask for it? If you had a copy, what could you do with it?

Many leading social media companies have APIs that let you extract and upload data, but the data formats tend to be dissimilar and proprietary. Even if you obtain the data, you can't do much with it unless you are a programmer yourself and have plenty of time for personal coding.

Back in 2018, a few leading social media companies pledged to make an effort at addressing this problem. The result is the Data Transfer Project, which was recently rebranded and expanded as the Data Transfer Initiative [1]. The mission of the Data Transfer Project (and Initiative) is to support a common neutral format for social media data as it passes from one platform to another, as well as to provide the tools necessary for transforming data in and out of that format.

In the long term, the goal is for the user to be free from direct involvement in the migration. A user who wishes to move data from one platform (say Twitter) to another platform (say Facebook) will simply choose an option in the Facebook user interface, and the migration will happen automatically. According to the Data Transfer Project developers, the purpose of the project is to "Extend data portability beyond a user's ability to download a copy of their data from their service provider ("provider"), to providing the user the ability to initiate direct transfer of their data into and out of any participating provider" [2].

The project was originally started by Google, Apple, Meta, Microsoft, Twitter, and SmugMug (Figure 1), but other companies are invited to join in. The Data Transfer Project is still a work in progress, but some of the code is available on GitHub [3] (Figure 2), and the developers provide API keys for the participating services for those who are interested in testing [4].

Figure 1: There are some big names behind DTP, though most of the coding is done by Google.
Figure 2: An impressive array of DTP open source tools is available, but it is not clear exactly how they're used on each platform.

How It Works

According to Meta's Engineering Blog [5], the Data Transfer Project consists of three main components:

  • a set of shared data models to represent each vertical (i.e., photos, contacts, playlists),
  • adapters, which handle the authentication of a user to a service (normally OAuth) and the transformation of data to and from the shared data models (importers and exporters),
  • and a task management framework, which puts all the pieces together and handles the life cycle of a transfer job, including job creation and running the transfer.

The data models provide the neutral format needed for transferring data to or from the participating platforms. One of the guiding principles of the project is that the vendors should not have to rewrite their APIs. Instead, the vendor provides an adapter to transform the data from the platform's own format to the neutral format – and also to transform the neutral data to a format needed for data import. The adapters basically serve as extensions of the API. Without something like DTP, a vendor would have to create a different migration solution for every other platform. With DTP, the vendor just has to write one adapter to import data from the neutral format and one to export data into the neutral format.
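The division of labor described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DTP's actual code: the Photo model, the ServiceAExporter and ServiceBImporter classes, the run_transfer function, and all field names are hypothetical and stand in for the shared data model, the adapters, and the task management framework.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Photo:
    """Hypothetical shared data model for the 'photos' vertical."""
    title: str
    url: str
    album: str


class Exporter(Protocol):
    def export(self) -> list[Photo]: ...


class Importer(Protocol):
    def import_items(self, photos: list[Photo]) -> int: ...


class ServiceAExporter:
    """Turns service A's (made-up) native records into the shared model."""

    def __init__(self, native_records: list[dict]) -> None:
        self._records = native_records

    def export(self) -> list[Photo]:
        return [
            Photo(title=r["caption"], url=r["src"], album=r["set"])
            for r in self._records
        ]


class ServiceBImporter:
    """Turns shared-model photos into service B's (made-up) native records."""

    def __init__(self) -> None:
        self.stored: list[dict] = []

    def import_items(self, photos: list[Photo]) -> int:
        for p in photos:
            self.stored.append({"name": p.title, "link": p.url, "folder": p.album})
        return len(photos)


def run_transfer(exporter: Exporter, importer: Importer) -> int:
    """Stand-in for the task management layer: one job, export then import."""
    return importer.import_items(exporter.export())


source = ServiceAExporter(
    [{"caption": "Beach", "src": "https://a.example/1.jpg", "set": "Holiday"}]
)
target = ServiceBImporter()
transferred = run_transfer(source, target)
print(transferred, target.stored)
```

Neither service knows anything about the other's field names; both only know the shared Photo model. That is also where the savings come from: with six providers, direct converters for every ordered pair would number 6 × 5 = 30, whereas one importer and one exporter per provider comes to 12 adapters.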

Of course, you can't migrate the data unless you have access to it. Alongside the data adapters, DTP uses authentication adapters that allow the requesting service to access the originating service. According to the DTP documentation, "OAuth is likely to be the choice for most providers, however the DTP is agnostic to the type of authentication."
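One way to picture what being agnostic to the type of authentication could mean in practice is an interface that hides how credentials were obtained. The following Python sketch is an assumption for illustration only: the AuthAdapter protocol, the two implementations, the X-Api-Key header name, and the provider.example URL are all made up and are not taken from the DTP code base.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


class AuthAdapter(Protocol):
    def authorization_headers(self) -> dict[str, str]: ...


@dataclass
class BearerTokenAuth:
    """OAuth-style: an access token obtained elsewhere through a consent flow."""
    access_token: str

    def authorization_headers(self) -> dict[str, str]:
        return {"Authorization": f"Bearer {self.access_token}"}


@dataclass
class ApiKeyAuth:
    """A non-OAuth alternative; the header name is hypothetical."""
    api_key: str

    def authorization_headers(self) -> dict[str, str]:
        return {"X-Api-Key": self.api_key}


def build_export_request(url: str, auth: AuthAdapter) -> dict:
    """The data adapter only asks for headers, not how they were obtained."""
    return {"method": "GET", "url": url, "headers": auth.authorization_headers()}


for auth in (BearerTokenAuth("token-123"), ApiKeyAuth("key-456")):
    print(build_export_request("https://provider.example/photos", auth))
```

Because build_export_request depends only on the protocol, a provider could swap one authentication scheme for another without touching the code that moves the data.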

Why and Why Now?

If you've ever tried to download your information from major providers like Facebook, you'll know that they already have an online tool to download your account data, so you might wonder why the DTP is necessary.

The reason is that the data you download isn't specifically designed to be interoperable with other services. Facebook has rectified this to some extent by allowing you to download your account data in JSON rather than HTML format to make it easier to reupload to another service [6], but it still cautions that the files are for your personal use (Figure 3).

Figure 3: Some websites like Facebook allow you to download your account information in JSON format for easier reupload, but these files don't necessarily hold all your data.

New regulations like the General Data Protection Regulation (GDPR) in Europe also require companies to provide all the data they hold on their customers. In the case of websites like Facebook, the data download doesn't necessarily include information like location data, facial recognition data, links to friends' profiles, and so on.

As Congressman David Cicilline pointed out around the time of the Cambridge Analytica scandal, smaller companies will only become competitive with Facebook if it's easy for existing users to transfer all their data elsewhere, including HTML links to friends' profiles [7].

Open Data and Open Source

In their announcements, both Facebook and Google play up the open source nature of the data transfer tools. There are public, open source extensions that allow the Data Transfer Project to be run on the Google Cloud Platform and Microsoft Azure. But the overall goal of the project is to support open data – not necessarily open source. An open source tool that supports migration to a proprietary format or a closed source service running in the cloud is not an ideal case study in free software. Because the software running on these platforms is proprietary, there's no way to be certain every part of your user data has been copied to your destination platform in a secure way.

Kevin Bankston, Director of the Open Technology Institute, has urged tech companies to go further [8]. Although he is optimistic about companies like Twitter and Facebook that offer downloads of account data in JSON format, he pointed out in 2018 that "Social networks should consider using the Activity Streams 2.0 open standard [9], a particular JSON-based format for exporting social media items. Facebook helped develop the standard at the World Wide Web Consortium, but right now only decentralized social network tools like Mastodon use it." (Since the time of writing, Twitter has also adopted the Activity Streams format [10].)
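To give an idea of what an Activity Streams 2.0 item looks like, the following Python sketch builds a Create activity wrapping a Note, using terms from the W3C vocabulary (@context, type, actor, object, content, published). The note_to_activity helper, the social.example URL, and the sample text are assumptions for illustration; real exports from any given platform will carry more fields.

```python
import json


def note_to_activity(actor_url: str, text: str, published: str) -> dict:
    """Wrap a short post in an Activity Streams 2.0 'Create' activity."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Create",
        "actor": actor_url,
        "published": published,
        "object": {
            "type": "Note",
            "content": text,
            "published": published,
        },
    }


activity = note_to_activity(
    "https://social.example/users/alice",  # hypothetical account URL
    "Hello from my old account",
    "2023-07-01T12:00:00Z",
)
print(json.dumps(activity, indent=2))
```

Because the result is plain JSON with a published vocabulary, any service that understands the standard could, in principle, read the post without knowing which platform produced it.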

Bankston mentions that tech giants could sidestep the data transfer issue by making their platforms more interoperable. He points out that Meta's Developer Policy [11], for example, makes it extremely difficult to create an app that makes full use of the API and replicates Facebook's core functionality, such as instant messaging.

Facebook also famously dropped support for the open XMPP standard for messaging in 2015 in favor of its own proprietary standards [12]. If Facebook and other platforms were to agree to adopt XMPP (Figure 4), a Facebook user could view contacts, send messages, and make video calls to users of other services like Skype or Google Chat without moving any data.

Figure 4: XMPP (formerly Jabber) is an open messaging standard that tech giants could use to enable messaging between platforms.
