Fake news has become a growing problem in the past few years, and one that calls for a solution involving artificial intelligence. Verifying the near-infinite amount of content being generated on news websites, video streaming services, blogs, social media, etc. is virtually impossible.
There has been a push to use machine learning in the moderation of online content, but those efforts have only had modest success in finding spam and removing adult content, and to a much lesser extent detecting hate speech.
Fighting fake news is a much more complicated challenge. Fact-checking websites such as Snopes, FactCheck.org, and PolitiFact do a decent job of impartially verifying rumors, news, and remarks made by politicians. But they have limited reach.
It would be unreasonable to expect current artificial intelligence technologies to fully automate the fight against fake news. But there’s hope that the use of deep learning can help automate some of the steps of the fake news detection pipeline and augment the capabilities of human fact-checkers.
In a paper presented at the 2019 NeurIPS AI conference, researchers at DarwinAI and Canada’s University of Waterloo introduced an AI system that uses advanced language models to automate stance detection, an important first step toward identifying disinformation.
The automated fake-news detection pipeline
Before creating an AI system that can fight fake news, we must first understand the requirements of verifying the veracity of a claim. In their paper, the AI researchers break down the process into the following steps:
- Retrieving documents that are relevant to the claim
- Detecting the stance or position of those documents with respect to the claim
- Calculating a reputation score for the document, based on its source and language quality
- Verifying the claim based on the information obtained from the relevant documents
Instead of going for an end-to-end AI-powered fake-news detector that takes a piece of news as input and outputs “fake” or “real”, the researchers focused on the second step of the pipeline. They created an AI algorithm that determines whether a certain document agrees, disagrees, or takes no stance on a specific claim.
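To make the four steps concrete, here is a minimal sketch of how they might fit together in code. Everything in it is hypothetical and simplified for illustration: the function names, the word-overlap retrieval, the keyword-based stance placeholder, and the hand-assigned reputation scores are assumptions, not the researchers' implementation. In a real system, step 2 would be handled by a trained stance-detection model.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Stance(Enum):
    AGREE = "agree"
    DISAGREE = "disagree"
    NEUTRAL = "neutral"  # the document takes no stance on the claim


@dataclass
class Document:
    source: str
    text: str


def retrieve(claim: str, corpus: list[Document]) -> list[Document]:
    """Step 1 (toy): keep documents sharing at least two words with the claim."""
    claim_words = set(claim.lower().split())
    return [
        doc for doc in corpus
        if len(claim_words & set(doc.text.lower().split())) >= 2
    ]


def detect_stance(claim: str, doc: Document) -> Stance:
    """Step 2 (placeholder): a trained stance model would replace this."""
    text = doc.text.lower()
    if "false" in text or "hoax" in text:
        return Stance.DISAGREE
    if "confirmed" in text:
        return Stance.AGREE
    return Stance.NEUTRAL


def reputation(doc: Document, scores: dict[str, float]) -> float:
    """Step 3 (toy): look up a hand-assigned score for the document's source."""
    return scores.get(doc.source, 0.5)


def verify(claim: str, corpus: list[Document], scores: dict[str, float]) -> float:
    """Step 4 (toy): reputation-weighted vote in [-1, 1]; above 0 leans 'supported'."""
    total = 0.0
    weight_sum = 0.0
    for doc in retrieve(claim, corpus):
        weight = reputation(doc, scores)
        stance = detect_stance(claim, doc)
        if stance is Stance.AGREE:
            total += weight
        elif stance is Stance.DISAGREE:
            total -= weight
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


corpus = [
    Document("wire", "Officials confirmed the city bans cars downtown from May"),
    Document("blog", "The claim that the city bans cars is a hoax"),
    Document("wire", "Local bakery wins national award"),
]
score = verify("city bans all cars downtown", corpus, {"wire": 0.9, "blog": 0.3})
print(f"support score: {score:.2f}")
```

Keeping each step behind its own function mirrors the researchers' decision to work on one stage at a time: the stance placeholder can be swapped for a real model without touching retrieval or scoring.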
Using transformers to detect stance
This is not the first effort to use AI for stance detection. Previous research has used various AI algorithms and components, including recurrent neural networks (RNN), long short-term memory (LSTM) models, and multi-layer perceptrons, all relevant and useful artificial neural network (ANN) architectures. The efforts have also leveraged other research done in the field, such as work on “word embeddings,” numerical vector representations of relationships between words that make them understandable for neural networks.
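As a rough illustration of what a word embedding is, the sketch below compares words by the cosine similarity of their vectors. The three-dimensional vectors are made up for this example; real embeddings such as word2vec or GloVe are learned from text and have hundreds of dimensions.

```python
import math

# Hypothetical, hand-picked vectors used only for illustration.
EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


royal = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"])
fruit = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["apple"])
print(f"king vs queen: {royal:.2f}, king vs apple: {fruit:.2f}")
```

With these toy vectors, "king" lands much closer to "queen" than to "apple", which is the kind of relationship a neural network can exploit once words are turned into numbers.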
However, while those techniques have been effective for some tasks such as machine translation, they have had limited success on stance detection. “Previous approaches to stance detection were typically earmarked by hand-designed features or word embeddings, both of which had limited expressiveness to represent the complexities of language,” says Alex Wong, co-founder and chief scientist at DarwinAI.
The new technique uses a transformer, a type of deep learning architecture that has become popular in the past couple of years. Transformers are used in state-of-the-art language models such as GPT-2 and Meena. Though transformers still suffer from fundamental flaws, they are much better than their predecessors at handling large corpora of text.
Transformers use attention mechanisms to find the relevant bits of information in a sequence, instead of stepping through it one element at a time as recurrent networks do. This makes them much more efficient than recurrent models at handling long sequences. Transformer language models are also usually pretrained with self-supervised learning on unlabeled text, which means the pretraining stage doesn’t require the time- and labor-intensive data-labeling work that goes into most contemporary AI work.
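The core operation transformers use to pick out relevant parts of a sequence is called scaled dot-product attention. The following is a minimal pure-Python sketch of that operation on toy inputs. It leaves out everything else a real transformer contains (learned projections, multiple heads, positional information), and the input vectors are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # How strongly this query matches every position in the sequence.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
out = attention([[10.0, 0.0]], keys, values)
print(out[0])
```

Because the query points in the same direction as the first key, almost all of the attention weight goes to the first value vector. Every position is scored directly against every other, so distant words can influence each other without passing through the steps in between.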
“The beauty of bidirectional transformer language models is that they allow very large text corpuses to be used to obtain a rich, deep understanding of language,” Wong says. “This understanding can then be leveraged to facilitate better decision-making when it comes to the problem of stance detection.”
Transformers come in different flavors. The University of Waterloo researchers used RoBERTa, a variation of BERT, the deep bidirectional transformer model. RoBERTa, developed by Facebook in 2019, is an open-source language model.
Transformers still require very large compute resources in the training phase (our back-of-the-envelope calculation of Meena’s training costs amounted to approx. $1.5 million). Not everyone has this kind of money to spare. The advantage of using pretrained models like RoBERTa is that researchers can perform transfer learning, which means they only need to fine-tune the model for their specific problem domain. This saves them a lot of time and money in the training phase.
“A significant advantage of deep bidirectional transformer language models is that we can harness pre-trained models, which have already been trained on very large datasets using significant computing resources, and then fine-tune them for specific tasks such as stance-detection,” Wong says.
Using transfer learning, the University of Waterloo researchers were able to fine-tune RoBERTa for stance-detection with a single Nvidia GeForce GTX 1080 Ti card (approx. $700).
The stance dataset
For stance detection, the researchers used the dataset from the Fake News Challenge (FNC-1), a competition launched in 2017 to test and expand the capabilities of AI in detecting online disinformation. The dataset consists of roughly 50,000 headline-article pairs as training data and a test set of roughly 25,000 pairs. The AI takes as input the headline and text of an article, and outputs the stance of the text relative to the headline. The body of the article may agree or disagree with the claim made in the headline, may discuss it without taking a stance, or may be unrelated to the topic.
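The input and output described above map naturally onto a sentence-pair classifier built on a pretrained model. The sketch below shows one way this could look with the Hugging Face `transformers` library and PyTorch. It is not the researchers' code: the `roberta-base` checkpoint, the 512-token limit, and the helper names `load_stance_model` and `training_step` are assumptions made for illustration.

```python
# The four stance labels in FNC-1.
LABELS = ["agree", "disagree", "discuss", "unrelated"]
LABEL2ID = {label: i for i, label in enumerate(LABELS)}
ID2LABEL = {i: label for label, i in LABEL2ID.items()}


def load_stance_model(checkpoint="roberta-base"):
    """Load a pretrained encoder with a fresh 4-way classification head.

    Assumes the Hugging Face `transformers` package and PyTorch are installed.
    The import sits inside the function so the label mapping above can be
    used without those heavy dependencies.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(LABELS),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
    return tokenizer, model


def training_step(tokenizer, model, optimizer, headlines, bodies, stances):
    """Run one gradient update on a small batch of (headline, body, stance)."""
    import torch

    model.train()
    batch = tokenizer(
        headlines,
        bodies,  # headline and body are encoded together as a sentence pair
        truncation=True,
        max_length=512,
        padding=True,
        return_tensors="pt",
    )
    labels = torch.tensor([LABEL2ID[s] for s in stances])
    loss = model(**batch, labels=labels).loss  # cross-entropy over 4 classes
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

A typical setup would create the optimizer with something like `torch.optim.AdamW(model.parameters(), lr=2e-5)` and call `training_step` over mini-batches of the training pairs. The learning rate here is a common default for fine-tuning, not a value taken from the paper.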
The RoBERTa-based stance-detection model presented by the University of Waterloo researchers scored better than the AI models that won the original FNC competition as well as other algorithms that have been developed since.
Fake News Challenge (FNC-1) results: The first three rows are the language models that won the original competition (2017). The next five rows are AI models that have been developed in the following years. The final row is the transformer-based approach proposed by researchers at the University of Waterloo.
To be clear, developing AI benchmarks and evaluation methods that are representative of the messiness and unpredictability of the real world is very difficult, especially when it comes to natural language processing.
The organizers of FNC-1 have gone to great lengths to make the benchmark dataset reflective of real-world scenarios. They have derived their data from the Emergent Project, a real-time rumor tracker created by the Tow Center for Digital Journalism at Columbia University. But while the FNC-1 dataset has proven to be a reliable benchmark for stance detection, there is also criticism that its classes are too unevenly distributed to represent all outcomes well.
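To see why an uneven class distribution matters, consider a model that always guesses the most common label. The sketch below uses made-up label counts (not the real FNC-1 distribution) to show how plain accuracy can look respectable while per-class F1 scores reveal that the rare classes are never detected.

```python
def per_class_f1(gold, predicted):
    """Compute the F1 score separately for every label."""
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores[label] = 2 * precision * recall / (precision + recall)
        else:
            scores[label] = 0.0
    return scores


# Hypothetical, deliberately skewed counts chosen for illustration.
gold = ["unrelated"] * 70 + ["discuss"] * 20 + ["agree"] * 8 + ["disagree"] * 2
predicted = ["unrelated"] * len(gold)  # always guess the majority class

accuracy = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
f1 = per_class_f1(gold, predicted)
macro_f1 = sum(f1.values()) / len(f1)
print(f"accuracy: {accuracy:.2f}, macro F1: {macro_f1:.2f}")
```

The lazy baseline is right 70 percent of the time on this toy data, yet its F1 score is zero for "agree", "disagree", and "discuss", which are exactly the classes a fact-checker cares about.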
“The challenges of fake news are continuously evolving,” Wong says. “Like cybersecurity, there is a tit-for-tat between those spreading misinformation and researchers combatting the problem.”
The limits of AI-based stance detection
One of the very positive aspects of the work done by the researchers of the University of Waterloo is that they have acknowledged the limits of their deep learning model (a practice that I wish some large AI research labs would adopt as well).
For one thing, the researchers stress that this AI system will be one of the many pieces that should come together to deal with fake news. Other tools need to be developed for gathering documents, verifying their reputation, and making a final decision about the claim in question. Those are active areas of research.
The researchers also stress the need to integrate AI tools into human-controlled procedures. “Provided these elements can be developed, the first intended end-users of an automated fact-checking system should be journalists and fact-checkers. Validation of the system through the lens of experts of the fact-checking process is something that the system’s performance on benchmark datasets cannot provide,” the researchers observe in their paper.
The researchers explicitly warn about the consequences of blindly trusting machine learning algorithms to make decisions about truth. “A potential unintended negative outcome of this work is for people to take the outputs of an automated fact-checking system as the definitive truth, without using their own judgment, or for malicious actors to selectively promote claims that may be misclassified by the model but adhere to their own agenda,” the researchers write.
This is one of many projects that show the benefits of combining artificial intelligence and human expertise. “In general, we combine the experience and creativity of human beings with the speed and meticulousness afforded by AI. To this end, AI efforts to combat fake news are simply tools that fact-checkers and journalists should use before they decide if a given article is fraudulent,” Wong says. “What an AI system can do is provide some statistical assurance about the claims in a given news piece. That is, given a headline, they can surface that, for example, 5,000 ‘other’ articles disagree with the claim whereas only 50 support it. Such a distinction would serve as a warning to the individual to doubt the veracity of what they are reading.”
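The kind of summary Wong describes is simple to compute once each related article has a stance label. Here is a minimal sketch using his example numbers. The function name and the choice to report the share of disagreeing articles among those that take a side are assumptions for illustration.

```python
from collections import Counter


def summarize_stances(stances):
    """Count stance labels and the share of side-taking articles that disagree."""
    counts = Counter(stances)
    decided = counts["agree"] + counts["disagree"]
    share_disagree = counts["disagree"] / decided if decided else None
    return counts, share_disagree


# Wong's example: 5,000 articles disagree with the claim, 50 support it.
stances = ["disagree"] * 5000 + ["agree"] * 50
counts, share_disagree = summarize_stances(stances)
print(
    f"{counts['disagree']} disagree, {counts['agree']} agree "
    f"({share_disagree:.1%} of articles taking a side disagree)"
)
```

The output is a signal for a human reader, not a verdict: it says how other articles line up on the claim and leaves the judgment about truth to the journalist or fact-checker.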
One of the central efforts of DarwinAI, Wong’s company, is to tackle AI’s explainability problem. Deep learning algorithms develop very complex representations of their training data, and it’s often very difficult to understand the factors behind their output. Explainable AI aims to bring transparency to deep learning decision-making. “In the case of misinformation, our goal is to provide journalists with an understanding of the critical factors that led to a piece of news being classified as fake,” Wong says.
The team’s next step is to tackle reputation assessment, which validates the truthfulness of an article through its source and linguistic characteristics.