The views expressed by contributors are their own and not the view of The Hill

What will it take to secure AI models? Breaking them.

If you’re wondering why legislators and businesses alike have trust issues when it comes to the notion of secure artificial intelligence, consider this: In the past five months alone, we’ve witnessed high-profile Senate hearings on AI oversight, nearly 2,500 hackers attacking large language models at once and a subscription-based malicious AI chatbot released on the dark web. 

Need I say more? 

Rarely is a technology under the magnifying glass of scrutiny to the extent AI is. But while many view this as a reason for gloom and distrust, I would argue it is actually why we're better positioned to deliver secure AI than most assume. Understanding the various ways AI models can break isn't cause for alarm; it's knowledge that drives mitigation and preparedness.

Every innovation needs to be approached from a position of responsibility and conservatism — AI too. Not because artificial intelligence means artificial overlords are bound to take over, but because when we’re introduced to any new technology, we don’t know what we don’t know. And what you don’t know, you won’t inherently trust.  

Just think back to the early days of the internet — organizations didn’t immediately embrace it, let alone make their businesses dependent on it. Trust takes time, yet in this case, amid nations’ global sprint to establish an AI lead, time is of the essence. 

That urgency has spurred a first-of-its-kind crowdsourced security think tank, uniting scientists, hackers, policymakers and developers in a quest to secure AI. Security is not an afterthought but the starting line of a global AI race.

That pressure to secure AI is founded on high stakes. The touchpoints between AI and our lives are intimate, whether that's driving a car, enabling climate change advancements or helping manage our food supply. Naturally, the slightest possibility that any of these activities could be manipulated is going to be met with an unwavering response from regulators.

That's why we must break AI apart. In security, to protect a system, whether software or hardware, we often tear it down. We figure out exactly how it works, but also what else we can make the system do that it wasn't intended to. For example, through security testing we discovered ways to derail a train from its tracks. That know-how allowed us to create preventive measures before it could happen in the real world. The same goes for ATMs ejecting unsolicited cash. And so forth.

The same approach applies to AI models. 

Just last month, IBM (I am head of research for IBM X-Force) illustrated how an attacker could hypnotize large language models like ChatGPT to serve malicious purposes without technical tactics such as exploiting a vulnerability, relying instead on simple English prompts. From leaking confidential financial information and personally identifiable information to writing vulnerable and even malicious code, the test uncovered an entirely new dimension of large language models as an attack surface. We are now aware of this risk and can create appropriate mitigation practices for it before adversaries are able to materially capitalize on it and scale.

Shortly after, with the Biden-Harris administration's support, thousands of offensive security professionals gathered in Las Vegas to attack multiple popular large language models in a bid to discover flaws and exploitable vulnerabilities that could serve malicious objectives or otherwise produce unreliable results, like bad math. Each of those "fire drills" was assessed and is being addressed before it can manifest into an active threat.

Risk varies depending on how AI models are implemented, which is why "breaking" AI models lets us better understand, assess and clearly define the various levels of risk that governments and businesses alike need to manage — and even establish precise rules to govern harmful uses of AI. Looking at AI through the adversarial lens will help us determine not only where and how models are most "attackable," but what newfound objectives attackers could pursue.

However, security for AI is broader than the AI itself. It's not limited solely to ensuring that the usage and the underlying data are secure — though that is undoubtedly essential. The broader infrastructure surrounding AI models needs to be viewed as AI's defense mechanism. And therein lies another industry advantage: placing security and privacy guardrails across the broader IT infrastructure to safeguard AI is an area in which we have deep knowledge and expertise. We have the tools to do this. We have the governance and compliance know-how to enforce them.

It’s clear that with AI, the threat model has now changed — nations and businesses need to adapt to that. But much of the fear around overhyped threats and risks AI poses stems from a lack of understanding of how AI models are constructed and how they’re embedded across the enterprise stack.   

It's incumbent on the industry to collectively help governments, policymakers and business leaders understand AI as a whole: not simply how they can benefit from its outcomes, but how AI actually functions and is embedded into an IT infrastructure. Only then can we establish a pragmatic understanding of AI risks and their manageability.

Secure AI is not a pipe dream — it’s a tangible benchmark with clear guidelines. And that was made possible by breaking the precise thing we’re trying to secure.

John Dwyer is head of research for IBM X-Force.

