Combination of linguistic features accurately detects fake news

Text analysis tools can detect potential fake news. They 'flag' suspicious texts that need further investigation. CWI researcher Davide Ceolin and Sandro Barres-Hamers of Vrije Universiteit Amsterdam wrote a conference paper on a promising library of such tools: faKy. The research will be presented at MISDOOM, a symposium on disinformation in online media.

Publication date
21 Nov 2023

Fake news is nothing new. But the World Wide Web and social media platforms give misinformation such a boost that it can disrupt society. With the rise of artificial intelligence (AI), people find it harder to distinguish information from misinformation. Fake news can also spread extremely fast on social media and reach a large group of people in a very short time.

There are numerous neural networks and large language models that can classify fake news with very high accuracy, in some cases up to 99%, but the reasoning of these models is often hard for humans to interpret. Interpretability is fundamental to earning users' trust.

Hints on misinformation

Ceolin and Barres-Hamers wanted to know whether linguistic features, obtained using Natural Language Processing (NLP), can provide a basis for assessing fake news. For this they used a library that offers a number of tools to find misinformation in texts: faKy. The features they considered were readability (the ease with which the text can be read), information complexity (a quantification of the amount of information contained in the text), and sentiment analysis (the emotional tone of the text).
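To make the three feature types more concrete, here is a minimal Python sketch of how such measures could be computed with the standard library alone. It does not use faKy and does not reflect its API: the function names, the vowel-group syllable heuristic, the use of a compression ratio as a stand-in for information complexity, and the tiny word lists for sentiment are all assumptions made for illustration.

```python
import re
import zlib


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Readability: Flesch Reading Ease, where higher means easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def compression_ratio(text: str) -> float:
    """Information complexity (crude proxy): compressed size / raw size.

    Repetitive text compresses well and yields a low ratio.
    """
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw)) / len(raw)


def lexicon_sentiment(text: str, positive: set, negative: set) -> float:
    """Sentiment (toy version): (positive hits - negative hits) / word count.

    The word lists are hypothetical inputs supplied by the caller.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    score = sum((w in positive) - (w in negative) for w in words)
    return score / len(words)


if __name__ == "__main__":
    claim = "The new law is a terrible disaster. It will ruin everything."
    print("readability:", flesch_reading_ease(claim))
    print("complexity :", compression_ratio(claim))
    print("sentiment  :", lexicon_sentiment(claim, {"good", "great"}, {"terrible", "disaster", "ruin"}))
```

On their own, none of these numbers says whether a text is true or false. As in the study, they are only candidate signals that a classifier or a human fact checker could take into account.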

“These tools don’t tell you if a text is true or false. They answer specific questions that you ask them. And this gives you hints: you need to check passages X and Y, because it could be fake news”, Ceolin explains. He summarises the question behind the research as: “Can the truthfulness of textual information be accurately predicted using specific linguistic features?”

Significance

The researchers used texts with political claims that had already been fact-checked to test faKy’s reliability. They concluded that linguistic features can accurately predict the truthfulness of a text. Texts with misinformation are, for example, more complex in terms of readability, convey more information, and differ significantly in style and syntax. Ceolin and Barres-Hamers wrote: “Our study highlights the significance of the textual features and shows that by combining them with machine learning classification algorithms, the truthfulness of text objects can be predicted.”

About faKy

FaKy is an extensive library that collects a comprehensive list of Natural Language Processing features that have been shown to correlate with fake news assessment. It provides a validated toolkit for extracting features from a text that are potentially correlated with fake news, thus contributing to the explainability of the assessment process.

Although faKy is still in its early stages of development, people can already download and use it.

About MISDOOM

On 21 and 22 November, CWI will host the Multidisciplinary International Symposium on Disinformation in Open Online Media (MISDOOM). It has an extensive program with six sessions.
