Taming the fake web starts in 2019

by Kartik Hosanagar, opinion contributor - 01/31/19 3:30 PM ET

Consider these facts:

As much as half of the traffic on the web is generated by bots.

For certain product categories, the vast majority of reviews on Amazon are fake.

False news stories on the web reach more people and spread faster than true news stories.

Trust-sapping failures of the web were a dominant theme of 2018, with fake social media accounts, identity theft, online fraud and false news all contributing to a sense that our digital life is a tangle of lies. Can we expect anything different in 2019? Is this the year we finally tame the fake web or are our woes just beginning?

The fake web: A pervasive problem

It’s hard to pin down a precise number of fake accounts on the web, but Facebook gives us a glimpse.

The company disabled nearly 1.3 billion accounts over the six months ending in March 2018, or nearly 25 percent of all active accounts on the platform. (The disabled accounts included fake profiles as well as duplicate or misclassified accounts.) Twitter’s proportions are similar, and experts agree this is just the tip of the iceberg.

Meanwhile, fake reviews on websites such as Amazon and Yelp are pervasive, and even some of the purchasing activity on the web is fake.

“Brushing” — when sellers generate fake orders for their own products — increases the sales volume for these sellers, enabling their products to rank higher in the search results on Amazon and Alibaba.

According to an executive at Alibaba, 1.2 million merchants on its Taobao platform, nearly one-fifth of all its vendors, faked transactions worth 10 billion RMB ($1.45 billion) in one year.

Why 2019 will bring some change

There are three ways to approach the problems of the fake web: prevention, education and detection.

Technical solutions focused on prevention usually attempt to verify the identities of people or accounts that create and share content. Deploying these solutions effectively will require a paradigm shift, however, in how we use the internet.

Anonymity and pseudonymity are built into the internet, and users will resist efforts to tie online accounts to real identities. This is why I think it is unrealistic to imagine we can rid the web of fake accounts and activity.

Nor am I terribly optimistic about education. For consumers, it is clear that we all have to be cautious when we interact with people online or make decisions based on online information.

Consumer education will be important, but hoping that better-informed citizens will put an end to fake news is too optimistic. So, I don’t think education will be the silver bullet in the fight against fake media.

That leaves us with detection. That is, developing machine-learning algorithms that can evaluate content and classify it automatically as real or fake.

These algorithms would draw on a range of signals: the accounts that post or share the content, the writing style, and the extent to which similar information is reported elsewhere.
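As a rough illustration of how such signals might be combined, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Post` fields, the crude punctuation-based stand-in for writing style, and the hand-picked weights are all hypothetical, and a real detector would learn its weights from labeled data rather than have them typed in by hand.

```python
import math
from dataclasses import dataclass


@dataclass
class Post:
    # All three fields are hypothetical stand-ins for the signals in the text.
    account_age_days: int        # signal 1: the account that posted the content
    exclamation_ratio: float     # signal 2: crude writing-style proxy, from 0 to 1
    corroborating_sources: int   # signal 3: other outlets reporting similar information


def fake_score(post: Post) -> float:
    """Return a score between 0 and 1; higher means more likely fake.

    The weights are illustrative assumptions, not values from any study.
    """
    z = 1.0
    z -= 0.01 * min(post.account_age_days, 365)    # older accounts look safer
    z += 4.0 * post.exclamation_ratio              # shouty style looks riskier
    z -= 0.8 * min(post.corroborating_sources, 5)  # corroboration looks safer
    return 1.0 / (1.0 + math.exp(-z))              # logistic squashing


def classify(post: Post, threshold: float = 0.5) -> str:
    """Label a post 'fake' or 'real' by comparing its score to a threshold."""
    return "fake" if fake_score(post) >= threshold else "real"


# Usage: a brand-new account with no corroboration vs. an established one.
suspicious = Post(account_age_days=2, exclamation_ratio=0.5, corroborating_sources=0)
established = Post(account_age_days=2000, exclamation_ratio=0.0, corroborating_sources=4)
print(classify(suspicious), classify(established))
```

Raising the threshold trades missed fakes for fewer false alarms, which is one reason a platform might tune it differently for demoting content than for removing it.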

Not surprisingly, there is a lot of academic research going on in this area, and some studies report accuracy rates over 90 percent.

If fake content can be detected with high-enough accuracy, we will be able to not only reduce its reach but also drive up the cost of distributing it, which would serve as a deterrent to creating it. Some startups are beginning to take the research from academia into practice.

Is 100-percent detection a reasonable ask of these algorithms? No. But I do believe that we can get to a point where fake accounts don’t have the kind of impact they currently do.

Look at the problem of click fraud in search engines. In the mid-2000s, merchants advertising on Google started reporting that a lot of clicks they were getting from their Google ad campaigns were generated by bots. These fraudulent clicks cost the advertisers money, but the resulting traffic was worthless.

Google initially ignored the issue, in part because these clicks helped boost ad revenues. But as discontent spread and advertisers threatened to pull out, the search engine recognized that click fraud was a fundamental threat to its ad-supported revenue model.

Some of its best engineers started to put their minds to detecting click fraud, and based on patterns of who clicked on which ad and when, Google’s algorithms got better at identifying whether a click was from a legitimate user or a bot. Although the issue hasn’t completely disappeared, it is no longer a critical factor ailing online advertising.
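To give a sense of what a pattern of "who clicked on which ad and when" can look like in code, here is a minimal Python sketch of one simple rule: flag a click when the same source has hit the same ad too many times within a short window. This is not Google's method, which is not described here. The function name, the `(timestamp, source_id, ad_id)` record layout and the thresholds are all assumptions chosen for illustration.

```python
from collections import defaultdict, deque


def flag_suspicious_clicks(clicks, max_clicks=3, window_seconds=60):
    """Flag clicks that exceed a per-source, per-ad rate limit.

    clicks: iterable of (timestamp_seconds, source_id, ad_id), sorted by time.
    Returns a list of booleans, one per click; True means the same source
    clicked the same ad more than max_clicks times within window_seconds.
    The thresholds are hypothetical defaults, not values from the article.
    """
    recent = defaultdict(deque)  # (source_id, ad_id) -> timestamps still in the window
    flags = []
    for timestamp, source_id, ad_id in clicks:
        window = recent[(source_id, ad_id)]
        # Drop timestamps that have fallen out of the sliding window.
        while window and timestamp - window[0] > window_seconds:
            window.popleft()
        window.append(timestamp)
        flags.append(len(window) > max_clicks)
    return flags


# Usage: source "a" clicks the same ad four times in three seconds.
clicks = [
    (0, "a", "ad1"),
    (1, "a", "ad1"),
    (2, "a", "ad1"),
    (3, "a", "ad1"),
    (100, "b", "ad1"),
    (200, "a", "ad1"),
]
flags = flag_suspicious_clicks(clicks)
print(flags)
```

A rule this simple is easy for a determined fraudster to evade by spreading clicks across many sources, which is why real systems would need to combine many such patterns.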

I believe that something similar is about to happen with fake accounts and fake information.

With social media platforms facing flak from their users, they are starting to realize the seriousness of the matter. After all, social media is built on users trusting the platform, and that trust is now at stake.

Further, the issue is close to affecting these companies’ revenues: Advertisers are now realizing that fake accounts artificially inflate the engagement their brands receive on social media, and they have become more skeptical about social media advertising. Taken to the extreme, this would spell the end of the social media business model.

Thus, here is my prediction for the near future: Social platforms will put some of their best minds to work on the problem of detecting fake accounts and content, and this should lead to fake accounts following the path of click fraud.

While it may take a few years to fix the problem, 2019 will be the inflection point in the story of how we tamed the fake web.

Kartik Hosanagar is the John C. Hower professor at the Wharton School of the University of Pennsylvania, where he studies technology and the digital economy. He is the author of “A Human’s Guide to Machine Intelligence.”