“Organising Knowledge Within WhoDis: The Role of Lexicons and Taxonomies”

Developing a Lexicon for the WhoDis Project

The lexicon generated for the WhoDis Project covers topics that have provoked polarising divisions within Dutch society and have resulted in hate speech.

Firstly, to develop the lexicon, it was essential to chart how language is used across different target groups and to highlight how specific phrases and terms vary between demographics. The lexicon places an emphasis on terms used about cultural or political issues, such as anti-immigration or anti-Islam sentiment. Terms like “Defend Europe,” “14 words,” and “FGM” illustrate recurring themes in narratives targeting specific groups or ideologies. A second core emphasis is on terms that reference conspiracy theories or harmful ideologies, such as “elite pedosexuals” and “keep off the children,” which may indicate topics flagged as potentially harmful or inflammatory. A large proportion of the keywords generated for the WhoDis Visualisation tool therefore concern anti-immigration and nationalist narratives, with recurrent phrases that could signal underlying ideological patterns, alongside conspiracy theories and other potentially harmful narratives. This aligns with the mission of Justice for Prosperity (JfP) of monitoring online disinformation and polarisation around LGBTIQ+ and other marginalised communities.

Furthermore, using language-specific keywords allowed us to understand the context in which the lexicon is used and its implications. The lexicon contains keywords in English, Dutch, German, French, Italian, Hungarian, and the Scandinavian languages (Danish, Swedish, and Norwegian). We grouped the lexicon into targeted themes, with each theme listing the relevant terms in the different languages. This reflects the importance of understanding how particular narratives and ideologies manifest across languages, and it allows for targeted analysis of regional or language-specific disinformation and narratives.

Often, the keywords found reflect a thematic focus (in this case, opposition to sexuality education). These keywords are used in misinformation to evoke emotional responses and garner support against sexuality education. By categorising these terms, the WhoDis project can monitor and analyse the spread of specific narratives tied to these emotionally charged topics.

A selection of the lexicon gathered for the WhoDis Project:

“elite van hooggeplaatste pedoseksuelen” (“elite of high-ranking pedophiles”):

Context: This phrase suggests the involvement of elites in a supposed network of child exploitation, a common theme in specific conspiracy theories. This term is likely flagged for monitoring due to its frequent association with misinformation regarding high-profile figures or organisations.

“blijf van de kinderen af” (“leave the children alone”):

Context: This phrase may appear in narratives portraying a threat to children’s safety, commonly used in online discourse that alleges endangerment of children by social or political groups.

“kindermisbruik” (“child abuse”):

Context: This is a general term but may serve as a keyword for tracking specific narratives around alleged abuse scandals. Its presence in this taxonomy likely highlights its relevance to misinformation or disinformation topics where claims of child abuse are weaponized.

“Rutgers”:

Context: Rutgers is an organization that may be targeted in misinformation related to sex education, as seen in certain narratives accusing it of inappropriate influence over children’s education.

“Week van de Lentekriebels” (“Spring Fever Week”):

Context: This is an educational initiative in the Netherlands focused on sexual education. Some misinformation narratives may use this term to criticize sex education programs.

“Hou je kinderen thuis bij deze bagger” (“Keep your children home from this rubbish”):

Context: This phrase may relate to misinformation encouraging parents to keep children away from public education or specific programs. Such expressions are often used in narratives that portray mainstream educational content as harmful.

“onze kinderen worden gehersenspoeld” (“our children are being brainwashed”):

Context: This reflects a broader conspiracy narrative suggesting children are being indoctrinated by certain agendas, often aimed at social or political institutions accused of “brainwashing” through education.

“Laat een kind inderdaad gewoon kind zijn” (“let a child indeed just be a child”):

Context: This phrase is common in anti-sex-education rhetoric, used to argue that children should be “protected” from exposure to certain topics. It aligns with narratives suggesting that children are being forced to confront adult issues prematurely.

The lexicon developed for the data processing of the WhoDis tool supports the identification and analysis of language that could contribute to online polarisation or disinformation against specific communities. It was used to trace the origins and dissemination of such language as part of Justice for Prosperity’s mission of using Intelligence and Security for Good. Ultimately, the collected keywords help track and analyse evolving narratives across languages and demographics, which enables more precise responses to harmful discourse in online and offline contexts.
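To make the grouping and tracking described in this section more concrete, the following is a minimal sketch in Python of how lexicon entries could be organised by theme and language and then matched against a piece of text. It is an illustration only, not the WhoDis implementation: the class name LexiconEntry, the function names, and the theme labels are assumptions introduced here, and the simple case-insensitive substring matching is a deliberately naive stand-in for whatever matching the tool actually performs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    """One keyword in the lexicon (hypothetical structure)."""
    term: str       # keyword as it appears in online discourse
    language: str   # language code, e.g. "nl" for Dutch
    theme: str      # illustrative theme label, not an official category


# Terms are taken from the examples in this section; the theme labels
# assigned to them are assumptions made for illustration.
LEXICON = [
    LexiconEntry("kindermisbruik", "nl", "sexuality_education"),
    LexiconEntry("Rutgers", "nl", "sexuality_education"),
    LexiconEntry("Week van de Lentekriebels", "nl", "sexuality_education"),
    LexiconEntry("Defend Europe", "en", "anti_immigration"),
]


def group_by_theme_and_language(entries):
    """Group terms under (theme, language) keys, preserving input order."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[(entry.theme, entry.language)].append(entry.term)
    return dict(grouped)


def match_terms(text, entries):
    """Return entries whose term occurs in the text, ignoring case."""
    folded = text.casefold()
    return [entry for entry in entries if entry.term.casefold() in folded]


if __name__ == "__main__":
    for key, terms in group_by_theme_and_language(LEXICON).items():
        print(key, terms)
    sample = "Dit jaar start de week van de lentekriebels op school"
    print([entry.term for entry in match_terms(sample, LEXICON)])
```

Substring matching of this kind will produce false positives (for example, “Rutgers” also names unrelated entities), so any match would still need the contextual review described above before it is treated as part of a harmful narrative.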

Developing a Taxonomy for the WhoDis Project

In developing a structured taxonomy for the WhoDis project, we focused on the following components:

  • Keywords: vocabulary relevant to the spread of disinformation, conspiracy theories and hate speech.
  • Accounts: a catalogue of platforms and publishers associated with disinformation.
  • Actors: profiles of influential figures within misinformation networks.
  • Wishlist Sources: a list of potentially valuable databases or information sources.
  • Data Sources: accounts that are actively monitored on platforms like X for disinformation trends.

When developing the Actors (or sources) section of the WhoDis website, we recorded platform information, URLs, and remarks on publishers or influencers who disseminated disinformation. We also recorded the social media and publication sources relevant to the project’s tracking of misinformation.

We focused on individuals or groups identified as influencers or key figures in spreading conspiracy theories. For each, we recorded their social media profiles, their relevance as conspiracy theorists, links, and additional comments.

Firstly, we collected all the keywords based on desk research and Open Source Intelligence (OSINT). Secondly, we identified the accounts and sources from which we collected our data. The listed accounts contain platform information, URLs, and remarks on publishers or influencers disseminating disinformation. In this process, we catalogued the social media and publication sources relevant to the project’s tracking of hate speech and conspiracy theories, including how, and by whom, they are spread over a given period. This led us to identify specific actors, focusing on individuals or groups identified as influencers or key figures in spreading conspiracy theories. We identified their social media profiles and their previous history of spreading hate speech and/or disinformation. The actors’ entries contain platform information, URLs, and remarks.

Finally, we identified other sources that we wanted to cross-check. For these, we recorded the name, URL, platform, and relevance of individuals or sources, with possible additional remarks. We applied this process during our work with external partners, such as on the article Justice for Prosperity published with De Groene Amsterdammer in June 2024, to identify wishlists of data. For each wishlist, we developed a list of source names, URLs, benefits, and costs, providing a structured list of potential external data sources that could offer further data or context on the topics of interest. An example is the financial records of individuals who are part of a coordinated effort to disseminate hate speech, which we can observe through the WhoDis visualisation tool.
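The components of the taxonomy described in this section can be pictured as a small set of record types. The sketch below, in Python, is one possible way to model them and is not the schema used by the WhoDis tool: all class names, field names, and the helper method accounts_on are assumptions chosen to mirror the attributes mentioned in the text (platform, URL, remarks, relevance, benefits, costs), and the URLs are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """A platform or publisher entry (hypothetical fields)."""
    platform: str
    url: str
    remarks: str = ""


@dataclass
class Actor:
    """An individual or group identified as a key figure."""
    name: str
    profiles: list[str] = field(default_factory=list)  # social media links
    relevance: str = ""
    comments: str = ""


@dataclass
class WishlistSource:
    """A potential external data source and its trade-offs."""
    name: str
    url: str
    benefits: str = ""
    costs: str = ""


@dataclass
class Taxonomy:
    """Container mirroring the five components listed in this section."""
    keywords: list[str] = field(default_factory=list)
    accounts: list[Account] = field(default_factory=list)
    actors: list[Actor] = field(default_factory=list)
    wishlist_sources: list[WishlistSource] = field(default_factory=list)
    data_sources: list[Account] = field(default_factory=list)  # monitored accounts

    def accounts_on(self, platform: str) -> list[Account]:
        """Return catalogued accounts on a platform, ignoring case."""
        wanted = platform.casefold()
        return [a for a in self.accounts if a.platform.casefold() == wanted]


if __name__ == "__main__":
    taxonomy = Taxonomy(keywords=["kindermisbruik"])
    # Placeholder URLs, not real monitored accounts.
    taxonomy.accounts.append(Account("X", "https://example.org/account-a"))
    taxonomy.accounts.append(Account("Telegram", "https://example.org/account-b"))
    print(taxonomy.accounts_on("x"))
```

Keeping accounts, actors, and wishlist sources as separate record types reflects the distinction drawn in the text between where content is published, who spreads it, and which external sources could add context.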

You can read more about the lexicon used within the WhoDis Visualisation Tool and the data generated below: