Proposal: Auditing information propagation landscapes on social media

Name of Project: SimPPL

Proposal in one sentence: Auditing information propagation landscapes on social networks.

Description of the project and what problem it is solving: We help prevent the spread of news deserts by supporting local news organizations with data-driven insights to improve their reach. Our software helps journalists at local news organizations visualize how news articles spread on social media and the audiences participating in their propagation. It will provide a tool for the journalism community to audit the impact of news on social media in terms of the reach and engagement it achieves, the communities it affects, and contextual analysis of the conversations it precipitates.

This is an applied project, part of a broader research effort aiming to model the online information diffusion landscape and predict how content spreads in social networks using simulation intelligence.

Grant Deliverables:

  • Audience Dashboard for newsrooms to collectively understand the audience for an article across their website and social networks (Reddit and Twitter).

This will summarize:

  • Demographic Information from Google Analytics for newsrooms
  • Subreddit membership on Reddit
  • Topic-level interests from public Reddit and Twitter histories
  • Toxicity metrics from public Twitter conversations
  • Public Engagement information across Reddit and Twitter (a minimal sketch of how this could be gathered follows this list)
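As a rough illustration of the kind of public engagement summary we have in mind, here is a minimal Python sketch that aggregates engagement for a single article URL from Reddit (via PRAW) and Twitter (via Tweepy). The article URL, credentials, and the specific metrics are placeholders for illustration, not our final implementation.

```python
# Minimal sketch: aggregate public engagement for one article URL across
# Reddit and Twitter. Credentials and the article URL are placeholders.
import praw    # Reddit API wrapper
import tweepy  # Twitter API v2 wrapper

ARTICLE_URL = "https://example-newsroom.org/some-article"  # placeholder

reddit = praw.Reddit(
    client_id="...", client_secret="...", user_agent="audience-dashboard-sketch"
)
twitter = tweepy.Client(bearer_token="...")

# Reddit: every public submission that links to this article.
reddit_posts = list(reddit.info(url=ARTICLE_URL))
reddit_engagement = {
    "submissions": len(reddit_posts),
    "total_score": sum(p.score for p in reddit_posts),
    "total_comments": sum(p.num_comments for p in reddit_posts),
    "subreddits": sorted({p.subreddit.display_name for p in reddit_posts}),
}

# Twitter: recent public tweets sharing the article (v2 recent search).
resp = twitter.search_recent_tweets(
    query=f'url:"{ARTICLE_URL}" -is:retweet',
    tweet_fields=["public_metrics"],
    max_results=100,
)
tweets = resp.data or []
twitter_engagement = {
    "tweets": len(tweets),
    "retweets": sum(t.public_metrics["retweet_count"] for t in tweets),
    "replies": sum(t.public_metrics["reply_count"] for t in tweets),
    "likes": sum(t.public_metrics["like_count"] for t in tweets),
}

print({"reddit": reddit_engagement, "twitter": twitter_engagement})
```

In practice, the dashboard would layer the demographic, topical, and toxicity signals listed above on top of this raw engagement data.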

Squad

Swapneel Mehta (team lead), Jay Gala, Deep Gandhi, Jhagrut Lalwani, Dhara Mungra

Website: https://simppl.org


Loom video for our deliverables and team: Loom

Feel free to DM me here, or on Twitter if this sounds like a tool you’d be interested in using!


The problem that I have with this proposal is that you have described the “means” but not the “ends”. The ends, to me, seem to be the suppression of speech under the banner of “Fight Misinformation and Disinformation Online”: not by getting to the ground truth through hypothesis generation, but by alerting the censors to dissenting views.

This is a problem to me because journalists are frequently funded by, or embedded with, industry or government, and perhaps this can be used to target journalists who publish information not favored by the establishment. Just remember that once upon a time the corporate news media all said in unison that Saddam Hussein had WMDs and yellowcake uranium.

You cite “public health infodemics”, yet what we saw during the coronavirus pandemic was that things such as adverse reactions to the vaccine, doubts about the efficacy of the vaccine, or the origin of the virus were called “misinformation” and banned from social media at the behest of the government, only to be confirmed as true subsequently.

Moreover, I have personally had situations such as this, when my own website archive.ph was called “misinformation” by USA Today for alleging that there were dead people voting for president in Pennsylvania, despite our having verified all of the voter records, and despite the fact that the State of Pennsylvania settled a lawsuit against it making the same allegations, agreeing to remove the dead voters. Public Interest Legal Foundation v. Boockvar, 1:20-cv-01905 – CourtListener.com

As a result of this claim by USA Today, we were deplatformed from fundraising and from our domain registrar; the reporter in question was previously an employee of the US State Department and of an intelligence firm funded by the US government.


Thanks @endomorphosis for raising an important point that we’ve been acutely aware of, but that the website (still a WIP because it talks about too many different use-cases) does not highlight, for want of a more nuanced discussion and clarity on what the goals are (maybe a blog post is warranted!). The ends, in our view, are the ability for the public* to audit the on-platform sharing behaviors of users in relation to articles or topics of interest. We originally framed it as a tool for newsrooms and their audiences, which obscures the fact that our information landscapes tool is designed to be an open audit tool for anyone to use, as long as there are articles** that you care about tracking in online discourse. If your point is about not framing it as “disinformation” tracking, that is absolutely taken: we should make it clear that we aren’t the arbiters of truth, including by rephrasing the “fight” wording on the website.

A side note about our motivations: as external researchers, we have severely limited ability to look into how purported propaganda spreads on a platform, and what harms have actually impacted other users, until platforms themselves track and disclose those behaviors, as in the multiple instances listed by Meta and Twitter. That data is only released months after such networks have been removed, hindering research into the public harms from these ‘influence operations’. We aren’t the arbiters of truth, and our goal is simply to introduce more transparency into the risks from online disinformation.

Now, to your point about journalists misspecifying what “misinformation” is or abusing the system we will create, in this case by mislabeling your own website as misinformation. I don’t disagree that there will be false positives on what constitutes misinformation, and that consensus among a single subgroup of incentivized people is generally a really bad metric for identifying valid misinformation. However, the purpose of creating a tool like this is to lower the barrier to information access: to allow the everyday researcher (or journalist) to access historical data on events where the establishment has acted against public interests, and cite it as evidence to limit its ability to abuse this power in the future. I could also cite examples where researchers were able to quickly come up with externally verified sets of URLs that constituted misinformation (see CoAID and the ESOC COVID-19 data) which, despite potential false positives, largely accelerated the dispelling of rumors. So there is clearly potential for good to come out of this. Whether third-party verification has merit, and “who fact-checks the fact-checkers”, is an important but separate discussion.

For instance, if you can track multiple instances in which accounts were wrong about what constitutes misinformation (which you can do with the tool we are creating), you can cite them as evidence for placing lower confidence in what such journalists or so-called fact-checkers say in the future. Even better, if you can establish from public data that there are networks of collusive accounts, it improves your ability to make a case for yourself in the future. This idea of historical credibility is similar, in principle, to a tool called Birdwatch that Twitter is developing internally to try to improve their misinformation and online harm detection systems.
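To make the historical-credibility idea concrete, here is a rough hypothetical sketch of the kind of score an audit trail like this could support: given a record of an account’s past claims and how they were eventually resolved, compute the fraction that held up. The schema, labels, and scoring rule are illustrative assumptions, not our actual design.

```python
# Hypothetical sketch of a historical-credibility score built from an
# audit trail of past claims and how they were eventually resolved.
# The schema, labels, and weighting are illustrative, not SimPPL's design.
from dataclasses import dataclass

@dataclass
class ClaimRecord:
    account: str
    claim_id: str
    outcome: str  # "held_up", "retracted", or "unresolved"

def historical_credibility(records):
    """Fraction of an account's resolved claims that held up over time.

    Returns None when there is no resolved history, so that absence of
    evidence is not silently treated as low credibility.
    """
    resolved = [r for r in records if r.outcome != "unresolved"]
    if not resolved:
        return None
    return sum(r.outcome == "held_up" for r in resolved) / len(resolved)

# Example: two claims that held up and one later retraction give 2/3.
history = [
    ClaimRecord("factchecker_a", "c1", "held_up"),
    ClaimRecord("factchecker_a", "c2", "held_up"),
    ClaimRecord("factchecker_a", "c3", "retracted"),
]
print(historical_credibility(history))  # 0.666...
```

Returning None for an empty history matters: an account with no track record should read as “unknown”, not “untrustworthy”.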

Our proposed system isn’t a perfect solution, and misinformation is a complex, multi-headed beast to tackle. Introduce bad actors on top of this, and every system becomes a potential attack vector for exploitation. But seeing as no one has been able to solve the misinformation problem in a significant way, including platforms worth billions and researchers with decades of experience, I think this is a small step in a direction worth exploring to improve the public accountability of platforms.

*or the independent oversight boards commissioned by regulatory authorities, or disillusioned collectives of tech employees hoping to make platforms accountable, such as the Integrity Institute

**in future projects, any URLs or keywords

A serious problem I see in ML is projects developing a technology while knowing it does not solve the problem, and justifying this with “it is the best we can do at the moment” without considering the consequences (side-effects and intentional abuses) that will arise. The idea of fact-checking is dystopian once people gain control of that resource, and any system that is successful in orienting opinion is ripe for abuse/capture. My impression of fact-checking is that it has furthered the centralization of power and reduced the hope of introducing critical thinking skills to the public. I’m not claiming nothing should be done, but simplifying the problem because we only have simple tools is certain to lead to unintended side-effects. So I think these issues need to be central rather than secondary to technical challenges.


Thanks Mark. There have been conversations scattered across the Algovera Discord, my DMs, and this post, so once I’m back from the travel I have lined up right now, I’m going to try to put up a Google Doc with an FAQ-style summary of what everyone has asked, and we can pick up the discussion there.

In general I don’t think there is “a solution” to misinformation, and I agree fact-checking alone is certainly insufficient. Every proposed solution attempts to tackle a small segment in its own way, and I don’t see the value in dismissing those efforts purely because “they don’t solve the problem entirely”. It is clear that this is an adversarial space and that iterative solutions will be the way to go. We have billion-dollar companies struggling to tackle the problem despite having complete control over the online ecosystem, and we have state-backed actors working actively to game these platforms and manipulate public discourse (Fronton: A Botnet for Creation, Command, and Control of Coordinated Inauthentic Behavior - Nisos). In this situation we’re trying to create a very simple accountability mechanism to track what people, so-called “influencers”, agencies, and indeed fact-checkers have published via social media over time. This introduces a historical lens through which to view the things they say in the future, which is much better than what the public has right now, which is not a lot for non-technical folks. I don’t say this lightly: I’ve spent a few months talking to journalists from all over the world, trying to understand how they use tools to study how information spreads online, and they just don’t have good ways to do so. It is a similar situation with “fact-checking” websites: they publish checks, but have very few ways to tie those to the actual impact an article has online. This problem of mapping fact-checks to impact seems to be left to the platforms to discover and address.

(Note: whether a fact-check is actually correct, or is itself propaganda/manipulation/inaccurate, is a different story. Rather than rehashing the same discussion here, I’d redirect you to Discord for two long threads with others about who fact-checks the fact-checkers, what mechanisms are useful, including consensus, and what happens when even collective wisdom is wrong.)

Now, having worked as a researcher in data science and the social sciences, and also on the other end at a large social network company in a role interfacing with four teams dealing with “misinfo”, I’ve seen that it’s incredibly challenging to make judgments based purely on a single instance of sharing any sort of content, misinformation or not, which is why we’re trying to create an auditing mechanism. In its current form, this is a summary of an account’s historical sharing tendencies for the public to view. In the future, we expect it to evolve into a more comprehensive system that enables transparency into account-level behaviors on social networks. There is a lot of mature work and some nascent work, e.g. Botometer at the Observatory on Social Media and others that “score” accounts for bot-like behavior, or projects like Birdwatch on Twitter that attempt to use ‘credible’ crowdsourcing to flag policy-violating tweets. We’re planning to study a number of metrics like, or inspired by, these to evaluate the corner cases and flaws in such systems, hoping to expand on similar work.
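As an illustration of what a summary of an account’s historical sharing tendencies could look like at its simplest, here is a hedged sketch that tallies the domains an account has linked to in its recent public tweets. The use of Tweepy, the field choices, and the function name are assumptions for illustration; the actual system would presumably combine many more signals (engagement, topics, toxicity, network structure).

```python
# Sketch: tally the domains an account has linked to in its recent public
# tweets, one small piece of an account-level sharing audit. Tweepy usage
# and field choices are assumptions for illustration only.
from collections import Counter
from urllib.parse import urlparse

import tweepy

client = tweepy.Client(bearer_token="...")  # placeholder credentials

def shared_domains(user_id, max_results=100):
    """Count domains in the expanded URLs of a user's recent public tweets."""
    resp = client.get_users_tweets(
        id=user_id,
        tweet_fields=["entities"],
        max_results=max_results,
    )
    domains = Counter()
    for tweet in resp.data or []:
        for url in (tweet.entities or {}).get("urls", []):
            domains[urlparse(url["expanded_url"]).netloc] += 1
    return domains

# e.g. shared_domains("123456789").most_common(10) lists the ten domains
# this account has amplified most often in its recent timeline.
```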

To your point about potential harms, we’re also trying to connect with Trust and Safety folks (I’m at Stanford presenting at the Trust and Safety Research Conference, ironically, as we speak :)) to identify the potential harms of these kinds of tools so that we can put safeguards in place against abuse. As I’ve mentioned on Discord, it’s a slow and steady process of building an active community and team to tackle this issue, and we’re working hard to bring in a diversity of perspectives, including the Algovera community, so that we can avoid obvious pitfalls moving forward. No tool is ever going to be perfect or even permanent in such an actively evolving space, so if you have suggestions to improve it, we’d always love to hear them!

I’ve edited the proposal following the discussions on Discord, to avoid rehashing the same discussion for posterity, and to clearly establish that this is an auditing system that will not make a determination of any kind as to the “truth” or “factual accuracy” of any content. Edits on the website will also follow.

Since the discussion of disinformation (tracking) is really important and continuously evolving, we will, as planned, aggregate the disjoint conversations across platforms into an FAQ section to help clarify the broader goals for SimPPL, and invite the community to discuss them in a structured manner so that suggestions can be acted upon immediately rather than remaining buried in posts and replies to be forgotten or repeated later. This will also ensure that concerns about abuse of such a system can be concretely formulated and documented, and that its merits and potential safeguards can be properly discussed.

Thanks for the thoughtful response. I’ll focus on a few points below:

I agree. The concept of “misinformation” includes mistakes and misunderstandings; this is literally the history of science, and we need to accept best attempts. A lot of nuance is lost in the binary of true/false. I think we can have near certainty about falsity, but in many cases the truth is temporary as understanding changes.

There is a more important category of “disinformation” which is not so much about the “facts” or the “truth” but about the intentions (conscious and unconscious) of the people promulgating the information. This is a far bigger problem and points to things like ideological capture, cancel culture, propaganda, etc.

I think you should position your project relative to these terms of misinformation and disinformation.

I am not sure they struggle to deal with the actual problem; they struggle to serve the interests of a billion-dollar corporation while secondarily trying to resolve legal and regulatory concerns.

There is an important (perhaps unintentional) assumption here - there is not one history (that would assume facts are purely objective). I think it would be interesting to have different versions of history; for example, I would like to be able to see how credible a source is in relation to different perspectives. So if I want to adopt a particular perspective, then I can find relevant sources in which I can have confidence. This gets away from the idea of centralized fact-checking with some Orwellian hive mind. A central question is who controls the “perspective” from which claims are made - this is why fact-checking organizations are, by definition, propagandist.

This is where you will introduce your organization’s biases. Perhaps unknowingly, this is pushing the modern liberal agenda of there being “only one path forward” and then focusing on “efficiency”.

I think I’ve made some suggestions. One problem I see is that if you are driven to produce a solution through iteration, starting with a simple system, then you are going to build in your assumptions. If you can identify places where biases will appear, then expose this as a choice, i.e. let the user inject their own bias. For example, let me compare the truth of a Trump supporter in the Midwest with the truth of a Biden supporter in California.
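To make that suggestion concrete, here is a hypothetical sketch of what I mean: instead of a single global credibility score, compute credibility relative to a user-supplied set of reference judgments, so the “perspective” becomes an explicit input rather than a hidden assumption. The schema, the perspective labels, and the scoring rule below are illustrative only and are not part of SimPPL’s proposal.

```python
# Hypothetical sketch: credibility relative to a user-chosen perspective.
# Each perspective is just a set of reference judgments supplied by the
# user; nothing here is part of SimPPL's actual design.

# claim_id -> judgment ("true"/"false") according to each perspective's
# own reference set.
PERSPECTIVES = {
    "perspective_a": {"claim_1": "true", "claim_2": "false"},
    "perspective_b": {"claim_1": "false", "claim_2": "false"},
}

# source -> list of (claim_id, what the source asserted)
SOURCE_CLAIMS = {
    "source_x": [("claim_1", "true"), ("claim_2", "true")],
}

def relative_credibility(source, perspective):
    """Share of a source's claims that agree with the chosen perspective."""
    reference = PERSPECTIVES[perspective]
    judged = [
        asserted == reference[claim_id]
        for claim_id, asserted in SOURCE_CLAIMS.get(source, [])
        if claim_id in reference
    ]
    return sum(judged) / len(judged) if judged else None

# The same source can score differently under different perspectives,
# which makes "who decides" an explicit user choice rather than a default.
print(relative_credibility("source_x", "perspective_a"))  # 0.5
print(relative_credibility("source_x", "perspective_b"))  # 0.0
```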

An important point is to realize that your project is political and to be transparent about your politics. Otherwise it becomes, intentionally or not, another tool for disinformation.

It would be good to see a safety analysis showing that the team has thought through how the project could be problematic and has taken actions to reduce that risk.