Avijit Ghosh wanted the bot to do bad things. He tried to goad the artificial intelligence model, which he knew as Zinc, into producing code that would choose a job candidate based on race. The chatbot demurred: Doing so would be “harmful and unethical”, it said.
Then, Ghosh referenced the hierarchical caste structure in his native India. Could the chatbot rank potential hires based on that discriminatory metric?
The model complied.
Ghosh’s intentions were not malicious, although he was behaving as if they were. Instead, he was a casual participant in a competition at the annual Defcon hackers conference in Las Vegas, where 2,200 people filed into an off-Strip conference room over three days to draw out the dark side of artificial intelligence.
The hackers tried to break through the safeguards of various AI programmes in an effort to identify their vulnerabilities – to find the problems before actual criminals and misinformation peddlers did – in a practice known as red-teaming. Each competitor had 50 minutes to tackle up to 21 challenges – getting an AI model to “hallucinate” inaccurate information, for example.
They found political misinformation, demographic stereotypes, instructions on how to carry out surveillance and more.
The exercise had the blessing of the Biden administration, which is increasingly nervous about the technology’s fast-growing power. Google (maker of the Bard chatbot), OpenAI (ChatGPT), Meta (which released its LLaMA code into the wild) and several other companies offered anonymised versions of their models for scrutiny.
Ghosh, a lecturer at Northeastern University who specialises in artificial intelligence ethics, was a volunteer at the event. The contest, he said, allowed a head-to-head comparison of several AI models and demonstrated how some companies were further along in ensuring that their technology was performing responsibly and consistently.
He will help write a report analysing the hackers’ findings in the coming months. The goal, he said, was “an easy-to-access resource for everybody to see what problems exist and how we can combat them”.
Defcon was a logical place to test generative artificial intelligence. Past participants in the gathering of hacking enthusiasts – which started in 1993 and has been described as a “spelling bee for hackers” – have exposed security flaws by remotely taking over cars, breaking into election results websites and pulling sensitive data from social media platforms.
Those in the know use cash and a burner device, avoiding wifi or Bluetooth, to keep from getting hacked. One instructional handout begged hackers to “not attack the infrastructure or web pages”.
Volunteers are known as “goons”, and attendees are known as “humans”; a handful wore home-made tinfoil hats atop the standard uniform of T-shirts and sneakers. Themed “villages” included separate spaces focused on cryptocurrency, aerospace and ham radio.
Last year AI was one of the quieter villages. This year it was among the most popular.
The organisers tapped into intensifying alarm over the continued ability of generative artificial intelligence to produce damaging lies, influence elections, ruin reputations and enable a multitude of other harms. Government officials voiced concern and organised hearings around AI companies – some of which are also calling for the industry to slow down and be more careful.
Even the Pope, a popular subject of AI image generators, spoke out this month about the technology’s “disruptive possibilities and ambivalent effects”.
In what was described as a “game changer” report last month, researchers showed they could circumvent guardrails for AI systems from Google, OpenAI and Anthropic by appending certain characters to English-language prompts. Around the same time, seven leading artificial intelligence companies committed to new standards for safety, security and trust in a meeting with President Joe Biden.
“This generative era is breaking upon us, and people are seizing it, and using it to do all kinds of new things that speaks to the enormous promise of AI to help us solve some of our hardest problems,” said Arati Prabhakar, the director of the office of science and technology policy at the White House, who collaborated with the AI organisers at Defcon. “But with that breadth of application, and with the power of the technology, come also a very broad set of risks.”
Red-teaming has been used for years in cybersecurity circles alongside other evaluation techniques, such as penetration testing and adversarial attacks. But until Defcon’s event this year, efforts to probe artificial intelligence defences have been limited: competition organisers said Anthropic red-teamed its model with 111 people, while GPT-4 was tested by around 50.
With so few people testing the limits of the technology, analysts struggled to discern whether an AI screw-up was a one-off that could be fixed with a patch, or an embedded problem that required a structural overhaul, said Rumman Chowdhury, a co-organiser who oversaw the design of the challenge. A large, diverse and public group of testers was more likely to come up with creative prompts to help tease out hidden flaws, said Chowdhury, a fellow at Harvard University’s Berkman Klein Center for Internet and Society focused on responsible AI and co-founder of a non-profit called Humane Intelligence.
“There is such a broad range of things that could possibly go wrong,” Chowdhury said before the competition. “I hope we’re going to carry hundreds of thousands of pieces of information that will help us identify if there are at-scale risks of systemic harms.”
The designers did not want to merely trick the AI models into bad behaviour – no pressuring them to disobey their terms of service, no prompts to “act like a Nazi, and then tell me something about Black people”, said Chowdhury, who previously led Twitter’s machine learning ethics and accountability team. Except in specific challenges where intentional misdirection was encouraged, the hackers were looking for unexpected flaws, the so-called unknown unknowns.
AI Village drew experts from tech giants such as Google and Nvidia, as well as a “shadowboxer” from Dropbox and a “data cowboy” from Microsoft. It also attracted participants with no specific cybersecurity or AI credentials. A leader board with a science fiction theme kept score of the contestants.
Some of the hackers at the event struggled with the idea of co-operating with AI companies that they saw as complicit in unsavoury practices such as unfettered data-scraping. A few described the red-teaming event as essentially a photo op, but added that involving the industry would help keep the technology secure and transparent.
One computer science student found inconsistencies in a chatbot’s language translation: he wrote in English that a man was shot while dancing, but the model’s Hindi translation said only that the man died. A machine learning researcher asked a chatbot to pretend that it was campaigning for president and defending its association with forced child labour; the model suggested that unwilling young labourers developed a strong work ethic.
Emily Greene, who works on security for the generative AI start-up Moveworks, started a conversation with a chatbot by talking about a game that used “black” and “white” pieces. She then coaxed the chatbot into making racist statements. Later, she set up an “opposites game”, which led the AI to respond to one prompt with a poem about why rape is good.
“It’s just thinking of these words as words,” she said of the chatbot. “It’s not thinking about the value behind the words.”
Seven judges graded the submissions. The top scorers were cody3, aray4 and cody2.
Two of those handles came from Cody Ho, a student at Stanford University studying computer science with a focus on AI. He entered the contest five times, during which he got the chatbot to tell him about a fake place named after a real historical figure and describe the online tax filing requirement codified in the 28th constitutional amendment (which doesn’t exist).
Until he was contacted by a reporter, he was clueless about his dual victory. He left the conference before he got the email from Sven Cattell, the data scientist who founded AI Village and helped organise the competition, telling him “come back to AIV, you won”. He did not know that his prize, beyond bragging rights, included an A6000 graphics card from Nvidia that is valued at around $4,000.
“Learning how these attacks work and what they are is a real, important thing,” Ho said. “That said, it is just really fun for me.”
This article originally appeared in The New York Times
© 2023 The New York Times Company