Evidence is mounting that large language models, such as the Generative Pre-trained Transformer 3 (GPT-3) family of models behind OpenAI’s ChatGPT chatbot, are highly vulnerable to abuse through creative prompt engineering by malicious actors.
Moreover, as the capabilities of such models hit the mainstream, new approaches will be needed to fight cyber crime and digital fraud, and everyday consumers will need to become much more sceptical about what they read and believe.
These are among the findings of research conducted by Finland’s WithSecure with support from CC-Driver, a European Union Horizon 2020 project that draws on disciplines such as anthropology, criminology, neurobiology and psychology in a collective effort to combat cyber crime.
WithSecure’s research team said universal access to models that deliver human-sounding text in seconds represents a “turning point” in human history.
“With the wide release of user-friendly tools that employ autoregressive language models such as GPT-3 and GPT-3.5, anyone with an internet connection can now generate human-like speech in seconds,” wrote the research team.
“The generation of versatile natural language text from a small amount of input will inevitably interest criminals, especially cyber criminals – if it hasn’t already. Likewise, anyone who uses the web to spread scams, fake news or misinformation in general may have an interest in a tool that creates credible, possibly even compelling, text at superhuman speeds.”
Andrew Patel and Jason Sattler of WithSecure conducted a series of experiments using prompt engineering, a technique for discovering inputs that yield desirable or useful results, to produce content that they deemed harmful.
During their experiments, they explored how changing the initial human input into GPT-3 models affected the artificial intelligence (AI) text output to identify how…