Proposal: Creating Open Metrics to Audit Coordinated News Sharing on Social Networks

Name of Project: SimPPL (sim-people)

One-line Summary

Allow anyone external to a platform to evaluate the spread of online news, examining which accounts spread it, when they spread it, and how much interest it garnered, all at the click of a button.

Description

The rise in popularity of online social networks has made it painfully evident that they lack adequate and effective content moderation, and the outcomes affect end users, disproportionately impacting vulnerable communities at times of civic volatility. We are creating an open-access auditing mechanism that quantifies whether a set of news posts, for example those promoting a particular narrative, has spread inorganically. This allows human review to be directed where it has the most impact and to be crowdsourced, permitting public scrutiny of otherwise opaque platform-level news sharing behavior.

Previous Grant

We received a previous grant to build a dashboard that studies what information is spread online and how it spreads. We partnered with The Times / The Sunday Times (UK) to verify whether there was any coordinated activity surrounding the spread of news from two Russian state-backed media outlets (which had been banned multiple times by the EU, the USA, Meta, and others). This resulted in the first iteration of our system, available under the ‘Networks and Topics’ tab of our website: https://demo.simppl.org.

In doing so, we successfully created open metrics around both the content itself and the spread of that content, and received valuable feedback from the community on credibility, misinformation, propaganda, and disinformation. Since no single authority can be treated as a constant, neutral arbiter, especially during a war, we would like to build systems that can audit any news source in the same manner, focusing on objective metrics around news sharing before diving deeper into content-based ones.

Proposed Extension Details

We will expand our existing system to focus on the spread of posts and determine whether there is potentially coordinated behavior in their promotion (within reasonable confidence intervals). For instance, our system will be able to track how often account X reshares news provider Y’s post within Z seconds of Y posting; a low Z (e.g. 5 seconds) indicates either a very close following relationship or potentially inauthentic spread, based on published work we were pointed to that platforms themselves use (see fabiogiglietto/CooRnet on GitHub). We do this to understand whether provider Y is regularly promoted by a large number of inauthentic accounts. We are trying to source quantifiable signals around such events that can then be publicly reviewed. This sounded fairly naive to us early on, but once we started, we quickly realized that collecting and tracking data for each user and each post is an extremely challenging task.
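
To make the signal concrete, here is a minimal sketch of the rapid-reshare metric described above, assuming a hypothetical table of reshare events. The column names, the `rapid_reshare_rate` helper, and the 5-second threshold are illustrative choices, not our production pipeline.

```python
# Sketch of the rapid-reshare signal: how often does an account reshare a
# provider's posts within Z seconds of publication? (Hypothetical schema.)
import pandas as pd

Z_SECONDS = 5  # reshares faster than this are flagged as suspiciously quick


def rapid_reshare_rate(reshares: pd.DataFrame, z_seconds: int = Z_SECONDS) -> pd.DataFrame:
    """For each (resharer, provider) pair, compute how often the resharer
    amplified the provider's posts within z_seconds of publication.

    Expected columns: resharer_id, provider_id, post_id,
    post_created_at, reshare_created_at.
    """
    df = reshares.copy()
    df["delay_s"] = (
        pd.to_datetime(df["reshare_created_at"]) - pd.to_datetime(df["post_created_at"])
    ).dt.total_seconds()
    df["is_rapid"] = df["delay_s"] <= z_seconds

    stats = (
        df.groupby(["resharer_id", "provider_id"])
        .agg(total_reshares=("post_id", "count"), rapid_reshares=("is_rapid", "sum"))
        .reset_index()
    )
    stats["rapid_share_rate"] = stats["rapid_reshares"] / stats["total_reshares"]
    return stats.sort_values("rapid_share_rate", ascending=False)
```

Pairs with a consistently high rapid-share rate across many posts would be surfaced as candidates for human review, not treated as conclusive evidence of coordination.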

We are primarily focusing on objective metrics around spread because comments from reviewers of our last grant and community discussions on the Algovera Discord led us to appreciate that more work is required on communicating the uncertainty of ML models (e.g. predicting propaganda labels) and on evaluating the credibility of users based on content. With the Misinformation and Health teams laid off at Twitter and similar layoffs at Meta, this work is likely to gain increasing importance in supporting the real-time, external audit of information operations during civic events like elections, protests, and war.

Grant Deliverables:

  • Generate reports containing open metrics around the reach of news articles through a live-querying system for Twitter data.

  • Develop a cloud-based architecture to repeatedly query graphs with hundreds of thousands of nodes and millions of edges representing a social network, with low latency (proofs of concept with Neo4j worked well; see the sketch after this list).

  • Find and visualize examples of coordinated networks on Twitter that can be further examined by the platform, journalists, researchers, or non-technical stakeholders, validating our system’s utility in surfacing inauthentic spread based on past publications.
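
To give a sense of what the low-latency graph querying in the second deliverable could look like, here is a minimal sketch using the Neo4j Python driver. The node labels, relationship types, property names, and connection details are hypothetical and only illustrate the kind of repeated query we would run, not our actual schema.

```python
# Sketch of a repeated Neo4j query over a hypothetical reshare graph:
# (:Account)-[:RESHARED {delay_s}]->(:Post)<-[:POSTED]-(:Provider)
from neo4j import GraphDatabase

FAST_RESHARE_QUERY = """
MATCH (a:Account)-[r:RESHARED]->(p:Post)<-[:POSTED]-(n:Provider {handle: $provider})
WHERE r.delay_s <= $z_seconds
RETURN a.handle AS account, count(p) AS rapid_reshares
ORDER BY rapid_reshares DESC
LIMIT $limit
"""


def fast_resharers(uri: str, auth: tuple, provider: str, z_seconds: int = 5, limit: int = 50):
    """Return accounts that most frequently reshare a provider's posts within z_seconds."""
    driver = GraphDatabase.driver(uri, auth=auth)
    try:
        with driver.session() as session:
            result = session.run(
                FAST_RESHARE_QUERY, provider=provider, z_seconds=z_seconds, limit=limit
            )
            return [(rec["account"], rec["rapid_reshares"]) for rec in result]
    finally:
        driver.close()
```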

Squad

Squad Lead:

  • Twitter: @swapneel_mehta
  • Discord: swapneelm#8582
  • USDC ETH mainnet wallet address: 0xA516953726AD7598C889B69572CFf00025A6763a

Squad members:

  • Jay Gala, Deep Gandhi, Jhagrut Lalwani, Dhara Mungra, Raghav Jain

Here are the slides providing an overview of our deliverables from the last grant, which supported our initial foray into auditing how news is shared online. We picked this example because journalists approached us with interest in it, and we wanted to understand how to build a system that actively supports the work of potentially non-technical stakeholders in the space.

Here’s the demo (under the networks and topics tab): https://demo.simppl.org
