The post-lunch session at the Social Media Access Days at the German National Library starts with LK Seiling and Sophia Graf, who discuss the Weizenbaum-Institut’s DSA40 Collaboratory project. The EU’s Digital Services Act provides for research access to public and non-public data via its Articles 40(12) and 40(4), respectively; in both cases access is limited to research investigating what the DSA calls ‘systemic risks’, and to Very Large Online Platforms – those serving at least 10% of the EU population, which translates to 45 million users.
If platforms are found to have failed to provide such access, the EU can (and does) take action against them. But this requires setting boundaries around what is acceptable and expected: platforms now report to the EU how many data access requests they have received and approved, while the Weizenbaum’s DSA40 Access Tracker also independently tracks the success of research access requests as self-reported by researchers.
Such tracking reveals starkly divergent patterns across platforms: some three quarters of applications to X were rejected, while TikTok approved some three quarters of the applications it received.
The overall data access process includes application preparation, application vetting, and data analysis stages; in each of these stages various boundaries circumscribing the research are negotiated. The data access application forms required by the various platforms vary substantially, and some platforms require the submission of information which goes well beyond what the DSA Article 40 itself requires. The application forms tend to shape what researchers say they will research.
The experiences reported by researchers also provide an opportunity to model the differing decision-making processes that the various platforms engage in as they assess data access applications. Some of these processes are explicitly in breach of the requirements of Article 40: they reject applications for reasons that the DSA states cannot be used as grounds for rejection; and they implement burdensome vetting and approval procedures that are equally unacceptable under the EU’s rules.
And even when data access is eventually provided, the APIs and datasets provided by the platforms have repeatedly been found to be incomplete and unreliable. These shortcomings have been increasingly documented by researchers comparing such data with data obtained via scraping and other non-sanctioned data access mechanisms.
Platforms should also provide data catalogues to document and advertise the datasets available to researchers under DSA rules, but these data catalogues are often unacceptably underdeveloped and incomplete. One way to address this would be to require platforms to use a structured and standardised format for describing their data catalogues, as well as a standardised metadata format for the data packages they provide.
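To make the standardisation idea concrete, here is a minimal sketch of what a machine-readable catalogue entry and a check for its completeness could look like. The field names are loosely modelled on the W3C DCAT vocabulary, but both the schema and the example dataset are illustrative assumptions, not an actual platform catalogue or a format mandated by the DSA:

```python
# Hypothetical sketch of a standardised data-catalogue entry.
# Required fields and the example dataset are assumptions for illustration,
# loosely inspired by the W3C DCAT vocabulary (dcat:Dataset).

REQUIRED_FIELDS = {
    "identifier", "title", "description", "temporal_coverage",
    "update_frequency", "access_mechanism", "fields_schema",
}

def missing_fields(entry: dict) -> list[str]:
    """Return the required catalogue fields missing from an entry, sorted."""
    return sorted(REQUIRED_FIELDS - entry.keys())

# A complete (hypothetical) entry a platform could publish:
example_entry = {
    "identifier": "public-posts-v1",       # stable dataset ID
    "title": "Public posts (EU)",
    "description": "Publicly visible posts by EU-based accounts.",
    "temporal_coverage": "2023-01-01/..",  # ISO 8601 open-ended interval
    "update_frequency": "daily",
    "access_mechanism": "API",             # vs. one-off data package
    "fields_schema": {
        "post_id": "string",
        "created_at": "datetime",
        "text": "string",
    },
}

# An underdocumented entry, as often seen in practice:
incomplete_entry = {"identifier": "ads-archive", "title": "Ad repository"}

print(missing_fields(example_entry))     # → []
print(missing_fields(incomplete_entry))
```

A shared schema like this would let the completeness of platform catalogues be audited automatically, rather than assessed case by case.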











