Slaying Fake News

Marin Smiljanic
9 min read · Apr 28, 2020

An important consequence of the current COVID-19 crisis is that numerous other pressing problems have been starved of the attention they might otherwise be getting. Today we’ll occupy ourselves with one such problem: fake news.

And we could rightly claim that the problem is particularly pressing in crises like this one, given the importance of keeping the public up to date with the most relevant information out there and keeping misinformation to a bare minimum, or ideally eradicating it altogether. Not to mention the importance of getting this right in the context of elections, where the problem has so far been most acute.

And so we dive in, first taking a look at a misinformation campaign, then attempting to root-cause why social media platforms are so ill-equipped to deal with such campaigns, and finally cutting the Gordian knot in, I hope, a satisfactory way.

Volodya and the trolls

Whether fake news on social media swung the 2016 election will remain a contentious point for decades to come. But that there were efforts by foreign agents to influence it is beyond doubt. The efforts of Russian intelligence agencies, allegedly working through a company calling itself the “Internet Research Agency” (IRA), have been amply documented, including in the Mueller report.

Not only foreign agents but various hooligans availed themselves of the opportunity to spread misinformation. Among the more legendary examples: a Romanian character operating endingthefed dot com (whose fabricated story on Fox anchor Megyn Kelly went viral), and junk-posting high schoolers in Veles, Macedonia. Content of this sort was disseminated all over Facebook:

The problem is twofold: first, the article looks just like any other link shared on Facebook, with no visual signal that this is a potentially problematic item; and second, there was no mechanism to proactively alert users to the nature of the article. Rather, it needed to be flagged, reported, and taken down by Facebook’s moderators, which is anything but a foolproof method. The tally is astounding: the IRA promoted 80,000 pieces of content that Facebook’s own internal investigation estimated to have reached some 129 million users. Mind you, that is just the IRA, not fake news as a whole.

The social media problem

Why is social media so bad at tackling the problem? The main problem of social media can, in scientific terms, be stated as follows: any idiot is free to share whatever they see fit, and can get wildly re-shared and retweeted, regardless of reputation or trustworthiness.

How did we get there? The point is quite subtle, and we’ll start at the inception of Facebook. Mark Zuckerberg made a couple of interesting remarks in a 2016 interview with Y Combinator’s Sam Altman:

One of the things you learn in psychology is that there are all these parts of the brain which are geared towards understanding people, understanding language, how to communicate with each other, facial expressions…

Yet when I looked at the Internet in 2004, you could find anything else you wanted: news, movies, music, reference materials, but what mattered the most to people, which is other people, just wasn’t there… [emphasis added]

And then a key sentence:

… what was going on was that all that other content was just out there able to be indexed by search engines and other services, but to understand what’s going on with people you needed to build tools to express what’s going on with themselves.

Hence, we see here two categories of content: one, all the news, movies, music and whatnot that is “out there”; and the other, all the information related to people: their friendships, their life events, their relationship statuses (the early killer feature, wasn’t it), their locations and educational institutions. The former category was not the focus. The latter would, with considerable rapidity, come to be dominated by Facebook, a de facto parallel web.

The alert reader will notice that, many years after those early observations, all hell broke loose when the “other content” started making its way to Facebook’s pristine social network. How so? Few will remember the time before Facebook’s privacy infractions, but the initial Facebook was very much a safe space: in the very early days, only people from the same college could access your profile. Furthermore, it was meant for seeing information about your real-world friends, and since it took (and still takes) two to tango, it was assumed that if you became Facebook friends with someone, you knew and trusted them in real life.

To zero in on the point now: Facebook was created for an environment where trust was implied. As such, there was no immediate need to develop a robust reputation system: if one of your friends posts something, it can be trusted, since you trust your friend. This became problematic even at a very early point, with the simple act of sharing links within statuses. A link points to something outside of Facebook’s walled garden, but since it’s being shared by a friend, it’s assumed to be fine regardless of the actual content (unless it breaks Facebook’s content rules, in which case it can get removed, but only reactively, after being flagged).

Twitter shares many of the same problems: it doesn’t have bi-directional friendships, but it assumes you care about the updates from the accounts you follow. Twitter has tried to address the problem by giving out “verified” badges, but this is a weak signal: conspiracy peddler Alex Jones had one (before getting banned). Thus, we must look elsewhere for solutions!

Solutions from search engine land

On the surface it might seem surprising that one tech giant has stayed remarkably immune to the deluge of fake news: Google. How so? Let me be provocative for a moment: I believe that the key to Google’s immense success was not that it solved search better than anyone else, but that it solved reputation better than anyone else (yes, yes, and search as a consequence).

Consider this: a Google search always gives you the results for your query sorted by relevance. The relevance itself is a function of multiple parameters, for instance the frequency of your search term in a page. One key ingredient, however, is the “reputation” of the pages, or publishers, themselves. As an example, if I were to copy a page from the BBC and host it on quicknews.ai, I would still rank nowhere near the top of those search results.

To computer science people this is no surprise: after all, the idea of ranking search results by the link structure of the web was the major coup of Google co-founders Larry Page and Sergey Brin, and was in fact the secret sauce that let Google leave its competitors in the dust. Now how does this secret sauce work? In short, Google’s PageRank score is calculated from links in much the way scientific reputation is calculated from citations: a paper’s standing depends on the standing of the papers that cite it. Page and Brin simply applied the same idea to the web, with links taking the place of citations.
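
To make the citation analogy concrete, here is a minimal sketch of PageRank’s power-iteration form in Python. The four-site web graph is made up for illustration, and a real implementation would handle dangling pages, convergence checks, and billions of nodes far more carefully:

```python
# Minimal PageRank sketch: each page splits its score evenly among
# the pages it links to, plus a small "teleport" term modeling a
# reader who jumps to a random page.
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outgoing in links.items():
            for target in outgoing:
                new_rank[target] += damping * rank[page] / len(outgoing)
        rank = new_rank
    return rank

# Toy graph (hypothetical): everyone cites bbc.com, nobody cites
# quicknews.ai, so the former ends up with a far higher score.
toy_web = {
    "bbc.com": ["nytimes.com"],
    "nytimes.com": ["bbc.com"],
    "blog.example": ["bbc.com", "nytimes.com"],
    "quicknews.ai": ["bbc.com"],
}
print(pagerank(toy_web))
```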

Why can’t social media giants do this? At the risk of losing my non-technical reader, the answer is that they don’t crawl and index the web. To put it in less mystical terms, the only thing Facebook processes and stores about quicknews.ai is the link that points to the page¹. Google, or any other search engine, processes all the contents of the page, and uses the connections between pages to determine reputation.

How can these lessons be applied? The question is not a trivial one to answer, not least because of a major difference in how social media display information versus search engines: search engines rank the results they display, with recency not being a huge factor; social media, on the other hand, display information in a news feed, generally in reverse chronological order. From a user’s perspective, search engines thus signal reputation through ranking. In a reverse-chronological news feed we have no such option. How are we to work around this?

The coup de grâce

A simple solution presents itself: for every link to an article in a user’s social media feed, attach a number signifying the reputation of the source. For a striking visual effect, a color code can be used, from red (totally disreputable) to green (angelically honest and reputable). Begging your pardon for my atrocious graphic design skills, this is what I had in mind, using the example of the fake Megyn Kelly story that went viral:

Something like this (the red “15” bubble, in case anyone missed it) would appear next to every article shared on social media. Could the social media giants implement this? Not trivially, since, as we’ve said, they don’t have numbers like this readily available, in contrast to search engines. However, there is a way out.
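
As a sketch of how a platform might compute and color such a badge, consider the following Python. The thresholds, the toy score table, and the helper names are mine, purely for illustration; a real system would pull scores from a reputation provider or its own index:

```python
from urllib.parse import urlparse

# Hypothetical scores for illustration only; "endingthefed.example"
# stands in for the actual fake-news domain (deliberately not linked
# here either).
TOY_SCORES = {"bbc.com": 93, "endingthefed.example": 15}

def get_domain_score(domain):
    return TOY_SCORES.get(domain, 30)  # unknown domains default low

def reputation_badge(url):
    """Return (score, color) for the domain behind a shared link."""
    domain = urlparse(url).netloc.lower().removeprefix("www.")
    score = get_domain_score(domain)  # 0-100, domain-authority style
    if score >= 70:
        color = "green"
    elif score >= 40:
        color = "yellow"
    else:
        color = "red"  # e.g. the source of the viral Megyn Kelly story
    return score, color

print(reputation_badge("https://endingthefed.example/megyn-kelly"))
# -> (15, 'red'): the red "15" bubble from the mockup above
```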

The great thing about Google’s original PageRank algorithm was that it was published as a paper. Hence, variations abound, including the domain authority metric from Moz (on a 0–100 scale). What is particularly nice about that metric is that the methodology of calculating it is public. This does wonders to prevent allegations of bias from political crybabies — again, notice how many fewer complaints of this sort were levied against Google as opposed to the principal social media platforms.

So, in short, using information from a third party with a transparent methodology (Moz, Ahrefs, Open PageRank) is a good start. It would have the added benefit that the numbers would be consistent across social networks. And since the methodology is public, anyone could implement it, including the social networks themselves.
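
For completeness, here is a minimal sketch of how such a lookup might work against a third-party provider. The SCORE_API endpoint and the response shape are hypothetical stand-ins, since Moz, Ahrefs, and Open PageRank each expose their own APIs with different shapes and terms of use:

```python
import json
import urllib.request

# Hypothetical endpoint; substitute the documented API of whichever
# provider (Moz, Ahrefs, Open PageRank) is actually chosen.
SCORE_API = "https://reputation.example/api/score?domain={domain}"

def fetch_domain_score(domain):
    with urllib.request.urlopen(SCORE_API.format(domain=domain)) as resp:
        payload = json.load(resp)
    return int(payload["score"])  # assumed to be on a 0-100 scale
```

In practice these lookups would be cached aggressively, since a news feed keeps rendering links to the same few thousand domains over and over.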

Now, importantly, junk content can still get shared and retweeted, and thus reach people. However, with these indicators of quality displayed prominently, users will remain more vigilant and will (I sincerely hope) refrain from sharing dubious content or outright propaganda.

Downsides

Using methods of this sort is not without its downsides, most importantly the following:

  • Needs per-country calibration: It’s perfectly reasonable to see US sources with a domain authority of 95/100. Less so in my native Croatia, where the absolute highest domain authority I could find, per Ahrefs, was 77/100. (A rough normalization sketch follows this list.)
  • Extra vigilance required from publishers when linking: Given that linking to a site gives it relevance, it’s important for publishers discussing fake news sites to follow procedures and either not link to them, or include nofollow tags. This, however, is no different than the care they need to exercise today, lest they inadvertently improve a fake source’s search engine position.²
  • Works only for textual articles: Since links are key for determining PageRank/domain authority, it follows that this approach will work only for articles, rather than for, say, YouTube videos, which have no connections to each other that could be used to determine scores. Ditto for images and memes that get posted directly, rather than through links. These things can still get shared, and thus cascade.
  • Penalizes small, new, and niche publishers: Guess what wonderful shade of red a fly-fishing magazine might get assigned. Or for that matter quicknews.ai. Less than ideal, isn’t it?
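
On the first of these points, one rough workaround (my own sketch, not an established method) would be to rescale raw scores against the strongest domains observed in each country, so that a 77/100 Croatian outlet isn’t painted the same shade as a 77/100 American one:

```python
# Per-country calibration sketch: express each domain's raw authority
# relative to the top score observed in its country. The two ceiling
# values below are the figures quoted in the text (per Ahrefs).
COUNTRY_MAX = {"US": 95, "HR": 77}

def calibrated_score(raw_score, country):
    ceiling = COUNTRY_MAX.get(country, 100)
    return min(100, round(100 * raw_score / ceiling))

print(calibrated_score(77, "HR"))  # -> 100: top Croatian outlet
print(calibrated_score(77, "US"))  # -> 81: solid, but not top-tier in the US
```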

Still, it’s a start, and indeed a major step in the right direction. With something like this in place in 2016, many of the effects of disinformation campaigns might have been averted. Ditto for misinformation on vaccines and similar topics.³ In any case, applying this approach to just a couple of crucial topics would already go a long way toward preventing machinations.

Our approach

Our product, QuickNews, practices what we preach. We’ve rolled out the feature in one of our auxiliary applications, the Corona page (which aggregates coronavirus-related articles from across the web), where you can see the outlined approach implemented in the real world. I hope more sites will adopt this approach as well, and do their best to help weed out the junk from the social media landscape.

[1] It also probably stores some additional metadata, but the point is that it doesn’t store or process the content of the page.

[2] Though an SEO expert might correct me, it does seem that endingthefed dot com (see, no link) was linked to by numerous sources, like The Atlantic, Gizmodo and BuzzFeed, without a nofollow tag, which added to its score.

[3] Information about masks during the current crisis is a slightly different story.

The idea for this article came both from the work we’ve done on QuickNews and from the new book “Facebook: The Inside Story” by the incomparable Steven Levy. The reader is advised to read the book (and of course download QuickNews).

I occasionally tweet, so follow me for more updates of this sort.
