Cool digital humanities research on neologisms

Neologisms are, simply put, new words. The tech revolution has brought many such words into our everyday vocabulary, such as googling or youtuber. In this post, I will go through some of my digital humanities research on neologisms in modern English-language TV shows and on historical neologisms in old letters.

Neologisms in modern TV shows

The paper by Landert et al. (2023) presents a novel method for identifying words in TV series that appear in corpus data before being attested in dictionaries, focusing on the TV Corpus and the Oxford English Dictionary (OED). This method combines automatic extraction of candidate terms with manual analysis and verification, finding 32 words used in TV series prior to their first dictionary attestation. It highlights the significance of TV series and fictional texts in disseminating emerging vocabulary, attributing this to their mass appeal and potential to influence language development.
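To make the automatic extraction step more concrete, here is a minimal Python sketch of the general idea: find words whose earliest occurrence in a dated corpus precedes their first attestation date in a dictionary. This is not the pipeline used in the paper. The function names, the data layout, and the toy words and years are all invented for illustration, and real candidates would still need the manual verification described above.

```python
def earliest_corpus_years(tokens):
    """Map each lowercased word to the earliest year it occurs in the corpus.

    `tokens` is an iterable of (word, year) pairs (a hypothetical layout).
    """
    first_seen = {}
    for word, year in tokens:
        word = word.lower()
        if word not in first_seen or year < first_seen[word]:
            first_seen[word] = year
    return first_seen


def antedating_candidates(tokens, dictionary_first_years):
    """Return (word, corpus_year, dictionary_year) for words seen in the
    corpus before their first dictionary attestation, sorted by word."""
    candidates = []
    for word, corpus_year in earliest_corpus_years(tokens).items():
        dict_year = dictionary_first_years.get(word)
        # Only words the dictionary lists with a later date are candidates.
        if dict_year is not None and corpus_year < dict_year:
            candidates.append((word, corpus_year, dict_year))
    return sorted(candidates)


# Toy data: invented words and years, not real corpus or OED values.
toy_tokens = [("Zingo", 1957), ("zingo", 1962), ("blorpy", 1985), ("table", 1951)]
toy_dictionary = {"zingo": 1960, "blorpy": 1980, "table": 1300}

for word, corpus_year, dict_year in antedating_candidates(toy_tokens, toy_dictionary):
    print(f"{word}: corpus {corpus_year}, dictionary {dict_year}")
```

Every hit from a filter like this is only a candidate. Transcription errors, homographs, and dating problems in the corpus can all produce false positives, which is why the manual step matters.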

The analysis covers the words’ distribution across decades and genres, their origins, semantic domains, and word-formation processes, offering insights into the role of TV series in lexical innovation. This approach underscores the importance of integrating automatic and manual analyses to uncover early word attestations in large corpora, showcasing the potential of TV series as valuable resources for lexicographical and linguistic research.

TV shows can make neologisms spread like wildfire

The analysis identified 32 words that appeared in TV series before their first recorded attestation in the OED. These words span various parts of speech, with adjectives and nouns being the most common, followed by interjections, a couple of verbs, and one adverb. Notably, the prevalence of interjections among these words suggests a distinctive pattern of informal and expressive language use in TV series.

The majority of the emerging words are English derivatives, with only one word (shuriken) identified as a borrowing from another language. This distribution underscores the creativity in English word formation, including compounding and the use of informal suffixes like -y/-sy, -o, and -ly. The words fall into three major semantic domains: the world, society, and the mind, with the largest category being ‘the world’. This reflects the broad range of topics covered in TV series, from technology and science to social interactions and personal emotions.

The first attestations of these words in the TV Corpus range from 1951 to 1989, with a surprisingly high number of words from the 1950s. This suggests that TV series from the 1950s were particularly rich sources of emerging vocabulary, possibly due to the innovative use of language in this era’s programming. The analysis also looked at the distribution of emerging words across TV series genres. Comedy was found to utilize a high number of emerging words, followed by drama and crime. This pattern indicates that certain genres, especially comedy, may be more conducive to the use and spread of new words due to their creative and informal language use.

Some TV series were identified as particularly significant sources of emerging vocabulary. For example, Perry Mason and The Phil Silvers Show were highlighted for their contributions, suggesting that certain shows have played a pivotal role in introducing and popularizing new words.

These findings demonstrate the rich potential of TV series as sources of lexical innovation and the importance of combining automatic and manual methods to uncover early word attestations in large corpora. The study highlights how TV series, through their wide appeal and creative use of language, can significantly influence the evolution of vocabulary, contributing to the dissemination and establishment of new words in English.

Neologisms in Early English Letters

The paper From Plenipotentiary to Puddingless: Users and Uses of New Words in Early English Letters by Tanja Säily and colleagues (2021) explores the use of neologisms in early English correspondence during two specific periods, 1640–1660 and 1760–1780. The study focuses on the early adopters of new vocabulary, the social groups they represent, and the functions and types of their neologisms.

A key NLP method used is neural machine translation for automatic normalization, aimed at accurately mapping historical spellings to contemporary ones. Despite challenges, this approach enabled the identification of neologisms by comparing the corpus against the Oxford English Dictionary.
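The normalize-then-look-up idea can be sketched in a few lines of Python. Note that the sketch swaps the neural machine translation model for a plain spelling lookup table, purely to keep the example self-contained. The function names, the 20-year window, and all words and dates are assumptions made up for illustration, not details taken from the paper or from the OED.

```python
def normalize(token, spelling_map):
    """Map a historical spelling to a modern one.

    Stand-in for the NMT normalizer: a simple dictionary lookup that falls
    back to the lowercased token itself.
    """
    token = token.lower()
    return spelling_map.get(token, token)


def neologism_candidates(letter_tokens, letter_year, spelling_map,
                         first_attested, window=20):
    """Flag tokens whose normalized form is missing from the dictionary or
    was first attested no more than `window` years before the letter.

    `window` is a hypothetical threshold chosen for this sketch.
    """
    candidates = []
    for token in letter_tokens:
        norm = normalize(token, spelling_map)
        year = first_attested.get(norm)
        if year is None:
            candidates.append((token, norm, "unlisted"))
        elif letter_year - year <= window:
            candidates.append((token, norm, "recent"))
    return candidates


# Toy data: invented spellings, words, and dates for illustration only.
toy_spellings = {"freind": "friend"}
toy_first_attested = {"my": 900, "friend": 900, "is": 900, "and": 900,
                      "gloomish": 1765}
toy_letter = ["My", "freind", "is", "gloomish", "and", "snarfle"]

print(neologism_candidates(toy_letter, 1770, toy_spellings, toy_first_attested))
```

Without the normalization step, a variant spelling such as the toy "freind" would be reported as an unlisted word, which shows why normalization quality directly affects how many false candidates have to be weeded out by hand.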

Back in the day, spelling words in unconventional ways was a way to showcase one’s creativity

It turned out that male writers used neologisms more frequently than female writers, although the 18th century saw increased participation from women and the lower social ranks. Neologisms were most common in letters between close friends, suggesting that this less stable kind of relationship invites more creative language use. The study also points to the influence of the English Civil War on neologism use in the 17th century and to a shift towards maintaining social relationships in the 18th century. Finally, it highlights the challenges in normalizing historical texts and the need for manual verification of automatically generated data due to the complexity and variability of historical language use.

This research provides insights into the sociolinguistic aspects of neologism use in historical contexts, contributing to our understanding of language evolution and social dynamics in early modern English society.


Säily, T., Mäkelä, E., & Hämäläinen, M. (2021). From plenipotentiary to puddingless: Users and uses of new words in early English letters. Multilingual Facilitation.

Landert, D., Säily, T., & Hämäläinen, M. (2023). TV series as disseminators of emerging vocabulary: Non-codified expressions in the TV Corpus. ICAME Journal, 47(1), 63-79.