These subjective interpretations lead to endless theorising” (Textgain 2024). Textgain therefore developed its own in-house toxicity measure: a 0-to-5 toxicity scale used in our annotation environment, which allows annotators to assign a decimal score or use a slider for greater precision.
Identifying the polarity of online debates helps uncover false positives, where language is strategically framed in a positive light to avoid being classified as hate speech. This tactic, aimed at bypassing content moderation, allows subversive actors to promote harmful ideas under the guise of affirming language. We observe a similar methodology offline, where such actors infiltrate safe spaces meant for human rights defenders. By adopting seemingly positive and supportive language, they manage to evade initial scrutiny, embedding themselves within these environments despite their more sinister intentions. This highlights the importance of polarity detection as a tool to reveal underlying motives masked by manipulated language.
Within the WhoDis Tool, Textgain has developed a methodology for identifying toxicity in individual messages, as they consider: “Reviewing AI-generated output is essential, as end users hold responsibility for its interpretation. In developing our toxicity detection algorithm, we consider language subjectivity by using at least two independent annotators per language to measure inter-annotator agreement. This ensures consistent toxicity ratings, which are then averaged by the algorithm. Although still in internal use, this approach helps us account for divided opinions and reduces bias through ‘annotation clinics’ where annotators discuss language trends and personal biases. While sentiment analysis relies on Google’s tools, which offer 60-80% accuracy, subjectivity remains inherent, similar to the adage, ‘beauty is in the eye of the beholder.’ Thus, while AI tools can detect trends, individual interpretations still impact evaluation.” (Textgain 2024).
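Textgain has not published its implementation, but the aggregation step described above, averaging independent annotator scores on the 0-to-5 scale and checking how far annotators diverge, can be sketched roughly as follows. The function and variable names are illustrative only and are not part of the actual tool.

```python
from itertools import combinations
from statistics import mean

def aggregate_toxicity(scores_by_annotator):
    """Average independent annotator scores (0-5 scale, decimals allowed)
    and report a simple pairwise agreement measure.

    `scores_by_annotator` maps annotator name -> list of scores, one per
    message. Returns (averaged score per message, mean absolute pairwise
    disagreement, where 0.0 means perfect agreement).
    """
    annotators = list(scores_by_annotator)
    n_messages = len(scores_by_annotator[annotators[0]])

    # Per-message average across annotators: the score the tool would report.
    averaged = [
        mean(scores_by_annotator[a][i] for a in annotators)
        for i in range(n_messages)
    ]

    # Mean absolute difference between every pair of annotators; large values
    # would flag messages for discussion in an 'annotation clinic'.
    diffs = [
        abs(scores_by_annotator[a][i] - scores_by_annotator[b][i])
        for a, b in combinations(annotators, 2)
        for i in range(n_messages)
    ]
    disagreement = mean(diffs) if diffs else 0.0
    return averaged, disagreement
```

For example, with two annotators scoring two messages as `{"ann1": [4.0, 1.0], "ann2": [3.0, 1.0]}`, the averaged scores are `[3.5, 1.0]` and the mean disagreement is `0.5`.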
Source: Justice for Prosperity (2024).
The platforms covered by the WhoDis visualisation tool are Facebook, X (formerly Twitter), TikTok and Instagram. Facebook, Instagram and TikTok were chosen because they rank among the top five most popular social networks worldwide. We aimed to observe how hateful rhetoric is disseminated at scale, given the widespread reach these platforms have over the population.
We also chose X (formerly Twitter) because, under Elon Musk’s ownership, there has been a rise in hate speech and in its legitimisation on the platform. This push towards legitimising hate speech is also illustrated by Musk’s attempt to legally penalise non-profit researchers who were tracking the rise of hate speech on X (AP 2024).
WhoDis currently operates in 48 languages from around the world, all of which use the Latin alphabet. We are in the process of adding support for languages that use other scripts, such as Russian, Arabic or Mandarin, where Romanisation is either not possible or not accurate. Hate speech is by no means limited to English and can have unique manifestations in other languages. A hate speech lexicon for Spanish or Polish, for instance, would need to capture specific terms and cultural nuances. This matters because a lexicon helps us understand the emergence of hate speech as multifaceted, encompassing historical, social, economic and technological perspectives. By carefully defining the scope and objectives and then methodically collecting and curating relevant data, we create a solid foundation for building a comprehensive lexicon. This lexicon can then be used to train and refine the NLP tool for more accurate detection and analysis of hate speech in the chosen contexts.
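A minimal sketch of how a per-language lexicon could feed a detection step is shown below. The lexicon entries here are invented placeholders, not real terms, and the function names are assumptions for illustration; the actual WhoDis lexicons and matching logic are not public. Note the diacritic stripping, which is one small example of the language-specific normalisation a Spanish or Polish lexicon would need.

```python
import re
import unicodedata

# Illustrative placeholder lexicons; real entries would be curated per
# language by annotators, with the cultural and historical nuances
# described above. Keys are ISO 639-1 language codes.
LEXICONS = {
    "es": {"ejemplo1", "ejemplo2"},  # placeholders, not real Spanish terms
    "pl": {"przyklad1"},             # placeholder, not a real Polish term
}

def normalise(text):
    """Lower-case and strip diacritics so 'Ejémplo1' matches 'ejemplo1'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def lexicon_hits(text, language):
    """Return the lexicon terms found in `text` for the given language code."""
    lexicon = LEXICONS.get(language, set())
    tokens = set(re.findall(r"\w+", normalise(text)))
    return sorted(term for term in (normalise(t) for t in lexicon)
                  if term in tokens)
```

In practice such a matcher would only be a first pass: lexicon hits can feed training data for an NLP model rather than serve as the final classification, since bare keyword matching misses context, irony and the positive framing tactics discussed earlier.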
WhoDis & Hate Speech:
The motivation behind the WhoDis Visualisation tool is the rapid development in recent years of the spread and capabilities of Artificial Intelligence, and the way the anti-rights movement has turned these new technologies to its advantage. Therefore, in developing WhoDis as a way to use ‘AI for Good’, it was vital to stay aware of new technological developments, such as the emergence of GPT and transformer-based Large Language Models (LLMs), and to consider how LLMs can be used for good. Relatively new legislation had also emerged that significantly influenced the development of the WhoDis tool, namely the EU’s GDPR and the Digital Services Act (DSA). These external changes were important elements in how JfP would both interact with and process hate speech during this period.
Source: United Nations (n.d.). “What is Hate speech?”
The GDPR influenced the process through its guidelines for the collection and processing of personal information. It heavily shaped how past data is stored, creating challenges in sustaining both the data processor and the data processing behind the WhoDis Visualisation tool. The EU’s DSA, in force since February 2024, applies to all online platforms operating in the EU, regardless of their size. This legislation sets a new legal precedent, making platforms liable for tackling illegal content and disinformation. The DSA has been a very positive development, launching a multilateral approach to moderating online discourse and protecting digital consumers.
However, despite the effectiveness of these due diligence obligations, which require online platforms to report disinformation or illegal content before it goes viral, a fundamental issue remains: if hate speech does not contain a ‘biased or intolerant motive’, there is no criminal offence to punish. The DSA leaves it to each national jurisdiction, through a Digital Services Coordinator (DSC) appointed by each member state, to define what constitutes illegal content.
Therefore, the legal definition of hate speech as dangerous and illegal, and the grounds for prosecuting it, are highly ambiguous. This makes it very challenging to target hate speech specifically: legally, it can be framed as being “only an expression by a person”, since each jurisdiction interprets the ‘biased and intolerant motives’ surrounding hate speech differently.
Since hate speech is not limited to X, Instagram, TikTok and Facebook, JfP aims to extend the tool’s capacity to analyse hate speech dissemination on other platforms, namely YouTube and Telegram, as well as websites and PDFs. Our preliminary research found that extremist conspiracy theories and hate speech are increasingly spread through websites and particularly through PDF documents. Extracting information from PDFs in an automated way is particularly complicated: text within PDFs is often stored as images, and PDFs often contain complex layouts and intricate formatting, which hinders the extraction of structured text.
Ultimately, WhoDis envisages itself as part of broader development and change: building solidarity across like-minded partners, and tailoring the experience for the different groups using the tool, whether their focus is journalistic, legal or political.