Snurblog — Axel Bruns

Towards a Vocabulary for the On-Sharing of Research Data from Social Media Platforms

Snurb — Tuesday 17 March 2026 23:54
'Big Data' | Social Media | Social Media Access Days 2026 | Liveblog

The next speaker in this session at the Social Media Access Days at the German National Library is Katharina Maubach, whose focus here is on data formats for archiving social media data. She works with a project exploring liking activities on social media platforms, especially relating to content from news sites; this covers Disqus, Facebook, YouTube, Xitter, and Instagram.

Ideally, such a cross-platform dataset should be shared with other researchers under FAIR principles (findable, accessible, interoperable, and reusable), but the Terms of Service of such platforms and their data access conditions make this very difficult. The focus of Katharina’s talk here is on the data formats that might support such sharing, and on the extent and limitations of such data.

While there are generic metadata formats which could be applied here (W3C Activity Streams, Dublin Core), these do not easily map onto the structure that social media activity data commonly take; the project therefore developed its own Canonical Social Media (CanSM) format. This covers data management information (collection time and modes, etc.), communication structure (message type and thread/tree information), provenance (platform, seed domain used in the collection, author information, etc.), temporal data (dates of creation and modification), message content (caption, text, tags), and metrics (e.g. engagement received).
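
To make the six groups of information concrete, here is a minimal sketch of how one record in such a format might be modelled in Python. This is an illustration only: the class name, every field name, and the example values are assumptions invented here, and the actual CanSM schema may well look different.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class CanonicalMessage:
    """Hypothetical record layout; field names are NOT taken from CanSM."""

    # Data management information (collection time and mode)
    collected_at: datetime
    collection_mode: str
    # Communication structure (message type and thread/tree position)
    message_id: str
    message_type: str
    parent_id: Optional[str]
    # Provenance (platform, seed domain used in the collection, author)
    platform: str
    seed_domain: str
    author_id: str
    # Temporal data (creation and modification dates)
    created_at: datetime
    modified_at: Optional[datetime]
    # Message content (text, caption, tags)
    text: str
    caption: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    # Metrics (e.g. engagement received)
    metrics: Dict[str, int] = field(default_factory=dict)


# Invented example values, purely for illustration.
example = CanonicalMessage(
    collected_at=datetime(2026, 1, 15, 9, 30),
    collection_mode="api",
    message_id="msg-001",
    message_type="comment",
    parent_id="post-001",
    platform="YouTube",
    seed_domain="example.org",
    author_id="author-123",
    created_at=datetime(2026, 1, 14, 18, 5),
    modified_at=None,
    text="Invented comment text.",
    metrics={"likes": 12, "replies": 3},
)

# A flat dictionary view is convenient for export to JSON or CSV.
print(asdict(example)["metrics"])
```

Keeping the same field set for every platform is what would allow comments from Disqus, Facebook, or YouTube to sit in one table; platform-specific details would need to be mapped onto these shared fields at collection time.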

Some such data are anonymous, pseudonymous, identifiable, and/or in some cases especially protected by applicable laws (e.g. personal opinion, religion, etc.); some were deleted from the platforms subsequent to data collection. This creates further complications for any data sharing. The project therefore identified several levels of data shareability to address this: 0 – data related to project organisation; 1 – anonymisable without losing information; 2 – aggregate data and statistics; 3 – localisable in datasets, and anonymising creates information loss; 4 – identifiable using the textual content. A first extract of this has been published via OSF.
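
The five levels could be used to decide which fields go into a published extract. The sketch below shows one way this might work in Python; it assumes that higher level numbers are harder to share, and the mapping of fields to levels is entirely hypothetical, not the project's own classification.

```python
from enum import IntEnum
from typing import Any, Dict


class Shareability(IntEnum):
    """The five levels named in the talk; member names are made up here."""

    PROJECT_ORGANISATION = 0   # data related to project organisation
    ANONYMISABLE_LOSSLESS = 1  # anonymisable without losing information
    AGGREGATE = 2              # aggregate data and statistics
    LOCALISABLE = 3            # localisable; anonymising loses information
    IDENTIFIABLE_BY_TEXT = 4   # identifiable using the textual content


# Hypothetical assignment of fields to levels, for illustration only.
FIELD_LEVELS: Dict[str, Shareability] = {
    "collection_mode": Shareability.PROJECT_ORGANISATION,
    "author_id": Shareability.ANONYMISABLE_LOSSLESS,
    "like_count": Shareability.AGGREGATE,
    "created_at": Shareability.LOCALISABLE,
    "text": Shareability.IDENTIFIABLE_BY_TEXT,
}


def shareable_view(record: Dict[str, Any], max_level: Shareability) -> Dict[str, Any]:
    """Keep only the fields whose level does not exceed max_level.

    Fields without a known level are treated as the most restrictive level,
    so they are never released by accident.
    """
    return {
        key: value
        for key, value in record.items()
        if FIELD_LEVELS.get(key, Shareability.IDENTIFIABLE_BY_TEXT) <= max_level
    }


record = {
    "collection_mode": "api",
    "author_id": "author-123",
    "like_count": 12,
    "created_at": "2026-01-14",
    "text": "Invented comment text.",
    "unclassified_field": "unknown",
}

public_extract = shareable_view(record, Shareability.AGGREGATE)
print(sorted(public_extract))
```

Note that this only filters fields; a level 1 field such as the author identifier would still need to be anonymised (for example, replaced with a random pseudonym) before release.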

The data vocabulary that underpins this was designed specifically for social media data, and is transferable to other projects; its categories make it possible to differentiate between the formats and modes of publishing such data. Many other questions still need to be addressed, however, including data protection, copyright, and ethical and moral rights. A further workshop on the publication of research data will be organised in early 2027.

Except where otherwise noted, this work is licensed under a Creative Commons BY-NC-SA 4.0 Licence.