The Codera Trump Index

In today’s post, Jacques Quass de Vos constructs a ‘Trump Index’ and a ‘Trump-Trade Index’. The first measures how often South African and global news stories mention US President Trump; the second measures how often ‘Trump’ and ‘trade’ appear together in the same story.

Both indices are built with a simple regular expression (regex) matching algorithm. For the Trump Index:

  • First, we match patterns containing Donald Trump’s full or last name; e.g., “Trump administration”, “President Donald Trump”, etc.
  • We also apply contextual rules so that mentions which omit Trump’s full name or office are still captured. These rules cover possessive forms (which signal that a person is meant) and verbs associated with leadership positions; word pairs like “Trump’s campaign” and “Trump announced” are thereby included.
  • Negative look-arounds ensure that only whole words are included (“trumpet” will be omitted).
  • Finally, matching is case-sensitive, which excludes phrases like “trump card”.
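The rules above can be sketched in Python as follows. This is a minimal illustration, not the pattern actually used for the index: the names TRUMP_RE, CONTEXT_NOUNS, LEADERSHIP_VERBS and count_trump_mentions, and the word lists themselves, are assumptions made for the example.

```python
import re

# Hypothetical word lists; the post does not publish the ones actually used.
CONTEXT_NOUNS = r"(?:administration|presidency|White\s+House)"
LEADERSHIP_VERBS = r"(?:announced|said|signed|ordered|imposed|threatened)"

# No re.IGNORECASE flag: matching is case-sensitive, so "trump card" is skipped.
TRUMP_RE = re.compile(
    rf"""
    (?<![A-Za-z])                                   # negative look-behind: no letter before
    (?:
        (?:President\s+)?Donald\s+(?:J\.\s+)?Trump  # full name, optional title
      | President\s+Trump                           # office plus last name
      | Trump(?:'s|’s)                              # possessive form
      | Trump\s+{CONTEXT_NOUNS}                     # e.g. "Trump administration"
      | Trump\s+{LEADERSHIP_VERBS}                  # e.g. "Trump announced"
    )
    (?![A-Za-z])                                    # negative look-ahead: no letter after
    """,
    re.VERBOSE,
)


def count_trump_mentions(text: str) -> int:
    """Count non-overlapping matches of the Trump pattern in one article."""
    return len(TRUMP_RE.findall(text))
```

Under these assumptions, count_trump_mentions("Trump announced new tariffs") counts one mention, while "He played the trumpet and a trump card." counts none.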

Additionally, we validate our Trump Index results using spaCy’s Named Entity Recognition (NER):

  • spaCy’s NER system uses a deep neural network which is pre-trained to recognise words and sort them into common categories (PERSON, ORG, DATE, EVENT, and so on).
  • In the validation, we require that each mention of “Trump” is recognised as a PERSON.
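A validation step of this kind might look like the sketch below. It assumes that "accuracy" means the share of regex-flagged articles in which the NER model also tags a “Trump” entity as PERSON; the post does not spell out the calculation. The helper names (spacy_entities, trump_is_person, agreement_rate) are hypothetical, and the NER step is passed in as a function so the logic can be checked without a model installed.

```python
from typing import Callable, Iterable, List, Tuple

Entity = Tuple[str, str]  # (entity text, entity label)


def spacy_entities(nlp) -> Callable[[str], List[Entity]]:
    """Wrap a loaded spaCy pipeline as a function returning (text, label) pairs."""
    def extract(text: str) -> List[Entity]:
        # doc.ents holds the recognised entities; ent.label_ is the category name.
        return [(ent.text, ent.label_) for ent in nlp(text).ents]
    return extract


def trump_is_person(text: str, ner: Callable[[str], List[Entity]]) -> bool:
    """True if the NER step tags at least one entity containing 'Trump' as PERSON."""
    return any(
        label == "PERSON" and "Trump" in ent_text
        for ent_text, label in ner(text)
    )


def agreement_rate(flagged_texts: Iterable[str],
                   ner: Callable[[str], List[Entity]]) -> float:
    """Share of regex-flagged texts that the NER step confirms (assumed metric)."""
    texts = list(flagged_texts)
    if not texts:
        raise ValueError("no flagged texts to validate")
    return sum(trump_is_person(t, ner) for t in texts) / len(texts)
```

With spaCy installed, the NER function could be created with spacy_entities(spacy.load("en_core_web_sm")); the choice of model here is an assumption, as the post does not say which one was used.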

According to our validation approach, the regex model achieves 98% accuracy. Upon inspection, we find that the 2% flagged as false positives come from word pairs we’d like to keep (e.g., “the Trump administration”). Following validation, however, we explicitly removed articles that mention “Trump Tower(s)” but contain no other reference to Donald Trump.
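One way to express the “Trump Tower(s)” exclusion is sketched below. The names TOWER_RE, ANY_TRUMP_RE and mentions_only_trump_tower are hypothetical, and ANY_TRUMP_RE is a deliberately simplified stand-in for the full Trump pattern described earlier.

```python
import re

TOWER_RE = re.compile(r"\bTrump\s+Towers?\b")
# Simplified stand-in for the full Trump pattern (an assumption for this sketch).
ANY_TRUMP_RE = re.compile(r"\bTrump\b")


def mentions_only_trump_tower(text: str) -> bool:
    """True if the text names Trump Tower(s) and has no other 'Trump' reference."""
    if not TOWER_RE.search(text):
        return False
    # Blank out the tower mentions, then look for any remaining reference.
    remainder = TOWER_RE.sub(" ", text)
    return ANY_TRUMP_RE.search(remainder) is None
```

Articles for which this function returns True would be dropped; an article such as “Trump spoke at Trump Tower.” would be kept because a separate reference remains.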

The Trump-Trade Index works similarly: we pair mentions of Trump with common trade uncertainty terms appearing in the same story:

  • We look for complete phrases, like “import/export tariff(s)”, “trade war”, “trade uncertainty”, “protectionist policy”.
  • These terms alone missed too many relevant articles (false negatives), so we also match instances where “tariffs” and “trade” are followed by contextually relevant prepositions, so that “tariffs on…[China]” or “trade with…[South Africa]” are included.
  • Trade uncertainty terms are case-insensitive (both “Tariff” and “tariff” are accepted).
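The trade-term rules can be sketched in the same way. Again, this is an illustration under stated assumptions: the preposition list, the names TRADE_RE, ANY_TRUMP_RE and is_trump_trade_article are hypothetical, and ANY_TRUMP_RE is a simplified stand-in for the full Trump pattern.

```python
import re

# Case-insensitive trade uncertainty terms; the phrase and preposition
# lists are illustrative, not the ones used for the published index.
TRADE_RE = re.compile(
    r"""
    \b(?:
        (?:import|export)\s+tariffs?      # "import tariff(s)", "export tariff(s)"
      | trade\s+(?:war|uncertainty)       # complete phrases
      | protectionist\s+polic(?:y|ies)
      | tariffs?\s+(?:on|against)         # "tariffs on ..."
      | trade\s+(?:with|between)          # "trade with ..."
    )\b
    """,
    re.VERBOSE | re.IGNORECASE,
)

# Simplified, case-sensitive stand-in for the full Trump pattern (an assumption).
ANY_TRUMP_RE = re.compile(r"\bTrump\b")


def is_trump_trade_article(text: str) -> bool:
    """True if the same article matches both the Trump and the trade patterns."""
    return bool(ANY_TRUMP_RE.search(text)) and bool(TRADE_RE.search(text))
```

In this sketch a bare “trade” or “tariffs” without one of the listed phrases or prepositions does not count, so “Trump discussed trade.” is not flagged while “Trump threatened tariffs on China.” is.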
