Technical Development of WhoDis

In this blog post, we explain the different stages of development of the WhoDis Visualisation tool within the context of the SIDN partnership, the thematic angles the WhoDis project aims to target and address, the challenges encountered during development, and the implications of WhoDis in an ever-changing online environment.

Source: SIDN Fonds (2024)

The SIDN Fonds, which kindly sponsors the WhoDis project, focuses on improving internet safety and on enabling innovative uses of the internet that keep individuals and the state safe online. To develop the lexicon that forms the basis of the WhoDis Visualisation tool, it was essential to identify the target groups to focus on: the LGBTQI+ community, women, people with disabilities, people whose ethnicity is not white, and people with a migration background. It was equally important to select the platforms to focus on, notably Facebook, X, TikTok and Instagram, and to analyse, within the context of these platforms, the nuances in language and culture behind hate speech. JfP, together with Textgain, used Natural Language Processing (NLP) to develop a lexicon through a combination of automatic and manual procedures, and the WhoDis Visualisation tool automates this procedure. This automation makes it possible to track the spread of hate speech by detecting the actors who spread hate, as well as the ways in which certain trends are spread and shared online.

What is Natural Language Processing (NLP)?

Natural Language Processing (NLP) refers to a field of AI that enables the digital processing of human language. Because language constantly adapts to changes in society, culture, and technology, NLP requires human intervention to ensure this fluidity can be processed digitally, so that computers can understand, interpret and generate human language.
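As a minimal illustration of what "digital processing of human language" means in practice (this is a generic sketch, not the WhoDis implementation), the first step in most NLP pipelines is normalising and tokenising raw text into units a computer can count and compare:

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase a message and split it into word tokens,
    a typical first step in an NLP pipeline."""
    return re.findall(r"[a-z0-9']+", text.lower())

print(tokenize("Language is CONSTANTLY adapting!"))
# → ['language', 'is', 'constantly', 'adapting']
```

Everything downstream, from lexicon matching to toxicity scoring, operates on tokens like these rather than on raw text.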

According to Textgain, “Sciences often lack a deep understanding of language intricacies, which initially impacted the performance of early AI models in language processing. However, recent advances are showing a greater overlap and more consistent applications of language understanding. For instance, our project involves a specialised tool for toxicity detection, currently used by EOJ in forensic analysis. The WhoDis tool is being fine-tuned to better detect toxicity and threats, relying heavily on a team of 40–50 annotators with expertise in civil society, communication studies, and linguistics. These annotators, many of whom are students or professionals in linguistics and related fields, meticulously categorise and score phrases for toxicity. They also define the context of certain expressions, such as QAnon-specific rhetoric or slang used as insults, which a computer alone can’t interpret without extensive input. This ‘human-in-the-loop’ approach enhances our models, providing the nuanced understanding necessary for accurate language processing and is essential in all our projects.”

How to determine whether an online debate is toxic?

Source: Jost et al. (2022)

“There’s ongoing debate around the use of terms like the N-word. It’s an obvious example of toxic language, but not everyone understands why it’s harmful, just as some might not fully grasp the impact of phrases like ‘I don’t like Trump.’

These subjective interpretations lead to endless theorising” (Textgain 2024). Therefore, Textgain developed an in-house toxicity measurement: a 0-to-5 toxicity scale used in our annotation environment, which allows annotators to assign a decimal score or use a slider for more precision.
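To make the 0-to-5 scale concrete, a sketch of how decimal scores from independent annotators could be validated and combined into one rating (a simple mean is an assumption here; the actual aggregation used by Textgain may differ):

```python
def average_toxicity(scores: list[float]) -> float:
    """Combine independent annotators' scores on the 0-to-5
    toxicity scale into a single rating (simple mean,
    rounded to one decimal place)."""
    for s in scores:
        if not 0.0 <= s <= 5.0:
            raise ValueError(f"score {s} is outside the 0-to-5 scale")
    return round(sum(scores) / len(scores), 1)

print(average_toxicity([3.0, 4.0]))  # → 3.5
```

Decimal scores (e.g. 3.7 rather than just 4) give the model finer-grained training signal than whole-number labels would.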

Identifying the polarity of online debates helps uncover false positives, where language is strategically framed in a positive light to avoid being classified as hate speech. This tactic, aimed at bypassing content moderation, allows subversive actors to promote harmful ideas under the guise of affirming language. We observe a similar methodology offline, where such actors infiltrate safe spaces meant for human rights defenders. By adopting seemingly positive and supportive language, they manage to evade initial scrutiny, embedding themselves within these environments despite their more sinister intentions. This highlights the importance of polarity detection as a tool to reveal underlying motives masked by manipulated language.
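A toy sketch of why polarity detection matters: a message can carry positive sentiment words while still hitting the hate lexicon, and it is exactly that combination that deserves review. The mini-lexicons below are hypothetical placeholders, not real WhoDis entries:

```python
# Hypothetical mini-lexicons, for illustration only.
HATE_TERMS = {"invaders", "vermin"}
POSITIVE_TERMS = {"love", "protect", "support"}

def flag_for_review(tokens: list[str]) -> bool:
    """Flag messages that pair hateful lexicon hits with positive
    framing -- the evasion pattern described above, where harmful
    ideas hide behind affirming language."""
    has_hate = any(t in HATE_TERMS for t in tokens)
    has_positive = any(t in POSITIVE_TERMS for t in tokens)
    return has_hate and has_positive

print(flag_for_review(["we", "love", "our", "country", "stop", "the", "invaders"]))
# → True
```

A classifier keyed only to negative sentiment would miss this message; combining polarity with lexicon hits surfaces it.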

Within the functionality of the WhoDis tool, Textgain has developed a methodology for identifying toxicity in messages: “Reviewing AI-generated output is essential, as end users hold responsibility for its interpretation. In developing our toxicity detection algorithm, we consider language subjectivity by using at least two independent annotators per language to measure inter-annotator agreement. This ensures consistent toxicity ratings, which are then averaged by the algorithm. Although still in internal use, this approach helps us account for divided opinions and reduces bias through ‘annotation clinics’ where annotators discuss language trends and personal biases. While sentiment analysis relies on Google’s tools, which offer 60-80% accuracy, subjectivity remains inherent—similar to the adage, ‘beauty is in the eye of the beholder.’ Thus, while AI tools can detect trends, individual interpretations still impact evaluation” (Textgain 2024).
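One simple way to measure inter-annotator agreement on a decimal scale (offered here as an illustrative sketch; Textgain's actual metric is not specified in this post) is the mean absolute difference between two annotators' scores, with a threshold that could trigger an annotation clinic:

```python
def disagreement(a: list[float], b: list[float]) -> float:
    """Mean absolute difference between two annotators'
    0-to-5 scores over the same set of phrases."""
    assert len(a) == len(b), "annotators must score the same phrases"
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def needs_clinic(a: list[float], b: list[float], threshold: float = 1.0) -> bool:
    """Flag a phrase set for discussion when annotators diverge
    by more than the threshold on average (threshold is an
    assumed value, not Textgain's)."""
    return disagreement(a, b) > threshold

scores_a = [0.5, 3.0, 4.5]
scores_b = [1.0, 2.5, 4.0]
print(disagreement(scores_a, scores_b))  # → 0.5
```

Low disagreement justifies averaging the scores; high disagreement signals the kind of divided opinion the clinics are designed to surface.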

Source: Justice for Prosperity (2024).

The platforms used within the WhoDis Visualisation tool are Facebook, X (formerly Twitter), TikTok and Instagram. Facebook, Instagram and TikTok were chosen because they rank among the top five most popular social networks worldwide. Given the widespread reach these platforms have, we aimed to observe how hateful rhetoric is disseminated on a large scale.

We also chose X (formerly Twitter) because, under Elon Musk’s ownership, there has been a rise in hate speech and in its legitimisation on the platform. This drive to legitimise hate speech is further exhibited by Musk’s attempts to legally penalise non-profit researchers who were tracking the rise of hate speech on X (AP 2024).

WhoDis currently operates in 48 languages from around the world, all of which use the Latin alphabet. We are now developing support for languages that do not use the Latin script, such as Russian, Arabic or Mandarin, where romanisation is either not possible or not accurate. Hate speech is by no means limited to English and can have unique manifestations in other languages: a lexicon for hate speech in Spanish or Polish, for instance, needs to account for specific terms and cultural nuances. This matters because a lexicon helps us understand the emergence of hate speech as multifaceted, encompassing historical, social, economic and technological perspectives. By carefully defining the scope and objectives, and then methodically collecting and curating relevant data, we create a solid foundation for a comprehensive lexicon, which can then be used to train and refine the NLP tool for more accurate detection and analysis of hate speech in the chosen contexts.
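To illustrate the idea of per-language lexicons feeding detection (the entries and weights below are invented placeholders; the real lexicon comes from Textgain's annotated data), a minimal lookup might be structured like this:

```python
# Hypothetical per-language lexicons mapping terms to toxicity
# weights on the 0-to-5 scale. Placeholder entries only.
LEXICONS: dict[str, dict[str, float]] = {
    "en": {"vermin": 4.0, "invaders": 3.5},
    "es": {"alimañas": 4.0},
}

def lexicon_hits(tokens: list[str], lang: str) -> dict[str, float]:
    """Return the lexicon entries matched in a tokenised message,
    using the lexicon for the given language code."""
    lex = LEXICONS.get(lang, {})
    return {t: lex[t] for t in tokens if t in lex}

print(lexicon_hits(["the", "invaders", "arrived"], "en"))
# → {'invaders': 3.5}
```

Keeping one lexicon per language is what lets the same matching code serve all 48 Latin-script languages, with non-Latin scripts requiring changes to tokenisation rather than to this lookup step.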

WhoDis & Hate Speech:

The motivation behind the creation of the WhoDis Visualisation tool is the rapid development in recent years of the spread and capabilities of Artificial Intelligence, and the way the anti-rights movement has used these new technologies to its advantage. During the development of WhoDis as a method to use ‘AI for Good’, it was therefore vital to be aware of new technological developments, such as the emergence of GPT, which transformed Large Language Models (LLMs), and to consider how LLMs can be used for good. Relatively new legislation also significantly influenced the development of the WhoDis tool, namely the EU’s GDPR and the Digital Services Act (DSA). These external changes affected how JfP would both interact with and process hate speech during this period.

Source: United Nations (n.d). “What is Hate speech?”

The GDPR influenced the process through its guidelines for the collection and processing of personal information. It heavily shaped how past data could be stored, creating challenges in sustaining both the data processor and the data processing behind the WhoDis Visualisation tool. The EU’s DSA, implemented in February 2024, applies to all online platforms based in the EU, regardless of their size. This legislation has set a new legal precedent, making platforms liable for tackling illegal content and disinformation. The DSA has been a very positive development, launching a multilateral approach to moderating online discourse and protecting digital consumers.

However, despite the effectiveness of the due diligence obligations requiring online platforms to report disinformation or illegal content before it goes viral, a fundamental issue remains: if hate speech does not contain a ‘biased or intolerant motive’, there is no criminal offence to punish. The DSA leaves it to each national jurisdiction, through a Digital Services Coordinator (DSC) appointed by each member state, to define the parameters of what constitutes illegal content.

Therefore, the legal definition of hate speech as dangerous and illegal, and the grounds for prosecuting it, remain very ambiguous. This makes hate speech very challenging to target: legally, it can be framed as being “only an expression by a person”, since each jurisdiction has its own interpretation of the ‘biased and intolerant motives’ that surround hate speech.

Since hate speech is not limited to X, Instagram, TikTok and Facebook, JfP aims to extend the tool’s capacity to analyse the dissemination of hate speech on other platforms, namely YouTube and Telegram, as well as on websites and in PDFs. This is especially relevant as our preliminary research found an emergence of extremist conspiracy theories and hate speech being spread through websites and particularly through PDF documents. Extracting information from PDFs in an automated way is particularly complicated: text within PDFs is often stored as images, and PDFs often contain complex layouts and intricate formatting that hinder the extraction of structured text.
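A common first step in handling this (sketched here as a generic heuristic, not the WhoDis pipeline) is deciding whether a PDF page has a usable text layer at all: if extraction yields little or no text, the content is probably stored as an image and needs OCR instead:

```python
def needs_ocr(extracted_text: str, min_chars: int = 20) -> bool:
    """Heuristic: if text extraction from a PDF page yields fewer
    than `min_chars` non-whitespace characters, the page content
    is likely stored as an image and an OCR fallback is needed.
    The threshold of 20 is an assumed, tunable value."""
    return len(extracted_text.strip()) < min_chars

# A page whose text layer is empty (image-only scan):
print(needs_ocr(""))  # → True
# A page with a normal machine-readable text layer:
print(needs_ocr("Lorem ipsum dolor sit amet, consectetur adipiscing elit."))  # → False
```

Pages flagged this way would be routed to an OCR step, while pages with a real text layer can be tokenised directly; the complex-layout problem (columns, tables, decorative formatting) still has to be handled after either path.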

Ultimately, WhoDis envisages itself as part of a broader development and change: building solidarity across like-minded partners, and diversifying the experience for the different groups using the tool, with a more journalistic, legal or political focus.