The Words That Give Away Generative AI Text

Eric Elliot

As generative AI models become more sophisticated, detecting AI-generated text has become a significant focus for researchers and companies alike. A recent study highlighted by WIRED reveals that certain words have surged in usage following the mainstream adoption of large language models (LLMs). This surge provides a clue for identifying AI-generated content.

Key Findings

Researchers found that words like “delves,” “showcasing,” and “underscores” have appeared far more often in post-2023 scientific papers, after being relatively uncommon before LLMs came into widespread use. “Delves,” for example, appeared 25 times more frequently in 2024 papers than pre-LLM trends would predict, while “showcasing” and “underscores” each saw a ninefold increase.
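
To see how such a comparison works in practice, here is a minimal Python sketch that measures how many times more often a word appears in a recent corpus than in a pre-LLM baseline. The toy abstracts and the per-million-words normalization are my own illustration, not the study’s actual data or methodology.

```python
from collections import Counter
import re

def rate_per_million(texts):
    """Return each word's frequency per million words across a corpus."""
    counts, total = Counter(), 0
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(words)
        total += len(words)
    return {w: c / total * 1_000_000 for w, c in counts.items()}

def frequency_ratio(word, baseline_texts, recent_texts):
    """How many times more often a word occurs in recent texts than in the baseline."""
    base = rate_per_million(baseline_texts).get(word, 0.0)
    recent = rate_per_million(recent_texts).get(word, 0.0)
    return float("inf") if base == 0 else recent / base

# Toy corpora standing in for pre-LLM and post-LLM abstracts (invented for illustration).
baseline = [
    "this study examines protein folding in yeast",
    "we report a new assay for measuring enzyme activity",
    "an earlier review delves into the history of the assay",
]
recent = [
    "this paper delves into protein folding, showcasing a new assay",
    "our analysis delves into enzyme activity and underscores its role",
]

print(round(frequency_ratio("delves", baseline, recent), 1))  # prints 2.6 for this toy data
```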

Common Marker Words

The study identified hundreds of such “marker words” that point to potential AI authorship. They include not only distinctive nouns but also common style words: verbs, adjectives, and adverbs such as “additionally,” “comprehensive,” “crucial,” and “notably.” This shift contrasts with earlier spikes in specific nouns tied to major world events, like “ebola” and “coronavirus” during their respective outbreaks.

Implications and Detection

Detecting AI-generated text matters because LLMs are prone to producing plausible yet incorrect or misleading information. By tracking these marker words, researchers can estimate how widely LLMs are being used in a body of text. For instance, at least 10% of post-2022 papers in the PubMed database showed signs of LLM assistance, and the percentage was even higher in countries where non-native English speakers might use LLMs for editing.
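
For a concrete sense of how marker words can be turned into an estimate, here is a small Python sketch that flags texts containing several marker words and reports what share of a collection is flagged. The word list below is a small subset drawn from the examples above, and the two-marker threshold and per-document flagging are my own simplifications rather than the study’s actual corpus-level method.

```python
import re

# A handful of the marker words reported in the study; the real list runs to hundreds.
MARKER_WORDS = {"delves", "showcasing", "underscores",
                "additionally", "comprehensive", "crucial", "notably"}

def looks_llm_assisted(text, min_markers=2):
    """Flag a text if it contains at least `min_markers` distinct marker words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return len(words & MARKER_WORDS) >= min_markers

def estimated_share(texts, min_markers=2):
    """Fraction of texts in a collection that trip the marker-word flag."""
    if not texts:
        return 0.0
    return sum(looks_llm_assisted(t, min_markers) for t in texts) / len(texts)

abstracts = [
    "this paper delves into a comprehensive dataset and underscores several crucial trends",
    "we measured enzyme kinetics at three temperatures and report the rate constants",
]
print(estimated_share(abstracts))  # 0.5 for this invented pair of abstracts
```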

Future Directions

As AI models and their usage evolve, so will the methods for detecting AI-generated content. Researchers suggest that future models might even adjust their outputs to avoid using identifiable marker words, further complicating detection efforts. This continuous interplay between AI development and detection underscores the need for advanced tools and strategies to maintain the integrity of written content.