'Big Data'

Reviewing the Emerging 'Big Social Data' Research

The next session at Social Media and Society is on 'big data', and begins with Andra Siibak (who is also the programme chair for AoIR 2017 in Tartu, Estonia!). She highlights the possible methodological shifts that arise from the use of 'big data' in social science research: this is in part seen as a shift towards more quantitative methods, but also as a more nuanced methodological shift from designed to more 'organic' data, whatever we may mean by this. Approaches built on formulating and testing preconceived hypotheses may also be challenged by alternative approaches.

Social Media in Research: From 'Big Data' to 'Wide Data'

It's the second day of Social Media and Society in London, and after a day of workshops we're now starting the conference proper with a keynote by Susan Halford. She begins by pointing out the significant impact of social media on a wide range of areas of public and everyday life. We're constantly presented with the digital traces of social media – with social media data at an unprecedented scale, telling us something about what people do with social media in their everyday lives. This is an unexpected gift, but is also causing significant concern and scepticism.

What is the quality of the data – what are they, what do they represent, what claims can be made from these data? Some social scientists are even suggesting that such data are dangerous and can affect the public reputation of the scientists and disciplines using them. Few people were experts in working with social media data when these data first arrived – we are building the boat as we row it, to use an old Norwegian saying, and learning how to do so as we go along.

Social Media and Collective Political Action

The closing (!) keynote of Web Science 2016 is presented by Helen Margetts from the Oxford Internet Institute. Her focus is on the use of social media for collective political action – that is, for activities undertaken by citizens with the aim of contributing to the public good. There is a strong feeling that such action is happening, but as yet there is not enough empirical evidence about how and why it is happening.

Even those who refuse to participate online are somehow caught up in the changes that the Internet has contributed to: our lives are intertwined with its technologies, platforms, and content. And these technosocial spaces also generate a substantial amount of transactional data about user participation that goes well beyond the sort of data – for instance about political attitudes and engagement – that were available in pre-Internet days.

Modelling Discrete Choice Problems

Post-lunch, the final day of Web Science 2016 continues with a keynote by Andrew Tomkins, whose focus is on the dynamics of choice in online environments. He begins by highlighting R. Duncan Luce's work, including his Axiom of Choice, but also points out the subsequent work that has further extended the methods for analysing discrete choice. Today, the most powerful models are mathematically complex and computationally intractable, and they require sophisticated external representations of dependence.
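For reference, Luce's choice axiom is usually summarised by the choice rule below. Each item $i$ has a positive weight $w_i$, and $S$ is the set of options on offer. This is the standard textbook form, not a formula taken from the keynote itself:

```latex
P(i \mid S) = \frac{w_i}{\sum_{j \in S} w_j},
\qquad
\frac{P(i \mid S)}{P(k \mid S)} = \frac{w_i}{w_k}
\quad \text{for all } S \supseteq \{i, k\}
```

The second identity is the "independence from irrelevant alternatives" property. The odds of choosing $i$ over $k$ do not depend on which other items are available, and this is exactly the assumption that contextual effects tend to violate.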

From this work it has become clear that the Axiom of Choice holds only under relatively select conditions. Contextual data are of great importance here, and additional approaches to modelling the general behaviour of discrete choice are required. The Random Utility Model, for instance, assigns a random utility value to each available choice. In an ideal world, users would then select the item with maximum utility, but because of existing preferences, real-world users will deviate from such choices.
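A random utility model can be made concrete with a small simulation. The following is a minimal sketch and not Tomkins's model. It assumes each option has a fixed base utility, adds independent standard Gumbel noise on every trial, and has the simulated user pick the option with the highest noisy utility. Under that particular assumption, the choice frequencies converge to Luce's rule with weights $w_i = e^{u_i}$ (the multinomial logit). All function names and utility values here are hypothetical.

```python
import math
import random


def gumbel(rng: random.Random) -> float:
    """Draw a standard Gumbel variate via -log(-log(U)), U uniform on (0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, where log is undefined
        u = rng.random()
    return -math.log(-math.log(u))


def luce_probabilities(utilities: list[float]) -> list[float]:
    """Luce / multinomial-logit choice probabilities with weights exp(u_i)."""
    m = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(u - m) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_rum_choices(utilities: list[float], n_trials: int,
                         seed: int = 0) -> list[float]:
    """Empirical choice shares when users pick the max of utility + Gumbel noise."""
    rng = random.Random(seed)
    counts = [0] * len(utilities)
    for _ in range(n_trials):
        noisy = [u + gumbel(rng) for u in utilities]
        best = max(range(len(noisy)), key=noisy.__getitem__)
        counts[best] += 1
    return [c / n_trials for c in counts]


if __name__ == "__main__":
    utils = [1.0, 0.5, 0.0]  # hypothetical base utilities for three items
    print(luce_probabilities(utils))
    print(simulate_rum_choices(utils, 100_000))
```

The Gumbel assumption is what makes the model satisfy Luce's axiom. Swapping in correlated or context-dependent noise breaks that equivalence, which is one way to model the deviations described above.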

How Facebook Uses Computational Processes to Police Its Ads

The final Web Science 2016 keynote for today is by Daniel Olmedilla, whose work at Facebook is to police the ads being posted on the site. Ads are the only part of Facebook where inherently unsolicited content is pushed to users, so the quality of those ads is crucial – users will want relevant and engaging content, while advertisers need to see a return on investment. Facebook itself must ensure that its business remains scalable and sustainable.

Key problem categories are legally prohibited content (e.g. ads for illegal drugs); shocking and scary content; sexually suggestive material; violent and confronting content; offensive before-and-after images; ads with inappropriate language; and images containing a large amount of text.

Predicting Twitter-Based Information Cascades

The next session at Web Science 2016 starts with a paper by Jure Leskovec on information cascades. Such cascades emerge as users of social media platforms (re)share content through their networks, and predicting such processes has traditionally been very difficult.

Current Practices in Social Media Data Sharing between Researchers

The next WebSci 2016 presenters are Katharina Kinder-Kurlanda and Katrin Weller, who argue that it is necessary to address the digital divides in data accessibility in social media research. They interviewed a large number of social media researchers, and what emerges from this work is that much data sharing is already taking place, but under varying circumstances.

Identifying MOOC Learners on Social Media Platforms

We start the first paper session at WebSci 2016 with a paper by Guanliang Chen that examines learner engagement with Massive Open Online Courses (MOOCs). These generate a great deal of data about learner engagement during the MOOC itself, but there's very little information about learners before and after this experience. Can we use external social Web data to identify and profile these learners, in order to better customise the learning experience for them?

Web Science and Biases in Big Data

It's a cool morning in Germany, and I'm in Hannover for the opening of the 2016 Web Science conference, where later today my colleague Katrin Weller and I will present our paper calling for more efforts to preserve social media content as a first draft of the present. But we start with an opening keynote by Yahoo!'s Ricardo Baeza-Yates, on Data and Algorithmic Bias in the Web.

Ricardo begins by pointing out that all data have a built-in bias, and that data processing and interpretation add further bias. For instance, some researchers working with Twitter data extrapolate their findings to entire populations, although Twitter's demographics are not representative of the wider public. There are even biases in the process of measuring for bias.
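The extrapolation problem can be shown with a little arithmetic. The following is a minimal sketch using invented numbers. Suppose a platform's users skew younger than the population, and some attitude is more common among younger people. Averaging over the sample then overstates the population rate. Reweighting each group's rate by its population share (post-stratification) corrects only for the variable being reweighted, here age. It cannot correct for biases in who is on the platform within each group, or for biases in how the groups were measured. The group labels, shares and rates are hypothetical, as is the `weighted_rate` helper.

```python
def weighted_rate(shares: dict[str, float], rates: dict[str, float]) -> float:
    """Average of per-group rates, weighted by the given group shares."""
    return sum(shares[g] * rates[g] for g in shares)


# Hypothetical, illustrative numbers only (not from the keynote or any dataset).
sample_shares = {"18-29": 0.50, "30-49": 0.35, "50+": 0.15}      # platform sample
population_shares = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}  # e.g. census
group_rates = {"18-29": 0.60, "30-49": 0.40, "50+": 0.20}        # rate within each group

naive = weighted_rate(sample_shares, group_rates)         # 0.47: mirrors the skewed sample
adjusted = weighted_rate(population_shares, group_rates)  # 0.35: reweighted by age only

if __name__ == "__main__":
    print(f"naive estimate: {naive:.2f}, age-adjusted estimate: {adjusted:.2f}")
```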
