Facebook’s ‘Red Team’ Hacks Its Own AI Programs

In 2018, Canton organized a “risk-a-thon” in which people from across Facebook spent three days competing to find the most striking way to trip up those systems. Some teams found weaknesses that Canton says convinced him the company needed to make its AI systems more robust.

One team at the contest showed that using different languages within a post could befuddle Facebook’s automated hate-speech filters. A second discovered the attack used in early 2019 to spread porn on Instagram, but it wasn’t considered an immediate priority to fix at the time. “We forecast the future,” Canton says. “That inspired me that this should be my day job.”

In the past year, Canton’s team has probed Facebook’s moderation systems. It also began working with another research team inside the company that has built a simulated version of Facebook called WW that can be used as a virtual playground to safely study bad behavior. One project is examining the circulation of posts offering goods banned on the social network, such as recreational drugs.

The red team’s weightiest project aims to better understand deepfakes, imagery generated using AI that looks like it was captured with a camera. The results show that preventing AI trickery isn’t easy.

Deepfake technology is becoming easier to access and has been used for targeted harassment. When Canton’s group formed last year, researchers had begun to publish ideas for how to automatically filter out deepfakes. But he found some results suspicious. “There was no way to measure progress,” he says. “Some people were reporting 99 percent accuracy, and we were like ‘That is not true.’”

Facebook’s AI red team launched a project called the Deepfakes Detection Challenge to spur advances in detecting AI-generated videos. It paid 4,000 actors to star in videos featuring a variety of genders, skin tones, and ages. After Facebook engineers turned some of the clips into deepfakes by swapping people’s faces around, developers were challenged to create software that could spot the simulacra.

The results, released last month, show that the best algorithm could spot deepfakes that were not in Facebook’s collection only 65 percent of the time. That suggests Facebook isn’t likely to be able to reliably detect deepfakes soon. “It’s a really hard problem, and it’s not solved,” Canton says.

Canton’s team is now examining the robustness of Facebook’s misinformation detectors and political ad classifiers. “We’re trying to think very broadly about the pressing problems in the upcoming elections,” he says.

Most companies using AI in their business don’t have to worry as Facebook does about being accused of skewing a presidential election. But Ram Shankar Siva Kumar, who works on AI security at Microsoft, says they should still worry about people messing with their AI models. He contributed to a paper published in March that found 22 of 25 companies queried did not secure their AI systems at all. “The bulk of security analysts are still wrapping their head around machine learning,” he says. “Phishing and malware on the box is still their main thing.”

Last fall Microsoft released documentation on AI security developed in partnership with Harvard that the company uses internally to guide its security teams. It discusses threats such as “model stealing,” where an attacker sends repeated queries to an AI service and uses the responses to build a copy that behaves similarly. That “stolen” copy can either be put to work directly or used to discover flaws that allow attackers to manipulate the original, paid service.
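
To make the idea concrete, here is a minimal sketch of model stealing in Python. It is a toy illustration, not the method described in Microsoft's documentation: the "victim" is a hypothetical three-input linear classifier with made-up weights, and the attacker trains a simple perceptron using only the victim's answers to random queries. All function names here are invented for the example.

```python
import random

def make_classifier(weights, bias):
    """Return a black-box predict function; callers never see weights or bias."""
    def predict(x):
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1 if score > 0 else 0
    return predict

def steal(predict, n_queries, dim, rng, epochs=50):
    """Fit a perceptron copy using only the victim's answers to random queries."""
    queries = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_queries)]
    labels = [predict(q) for q in queries]  # the only information the attacker gets
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(queries, labels):
            guess = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - guess  # -1, 0, or +1
            if err:
                # Standard perceptron update: nudge the copy toward the victim's label.
                w = [wi + err * xi for wi, xi in zip(w, x)]
                b += err
    return make_classifier(w, b)

def agreement(model_a, model_b, dim, rng, n=2000):
    """Fraction of fresh random inputs on which the two models give the same answer."""
    points = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]
    return sum(model_a(p) == model_b(p) for p in points) / n

rng = random.Random(0)  # fixed seed so the run is repeatable
victim = make_classifier([0.8, -0.5, 0.3], 0.1)  # hypothetical "paid service"
surrogate = steal(victim, n_queries=500, dim=3, rng=rng)
score = agreement(victim, surrogate, dim=3, rng=rng)
print(f"surrogate agrees with victim on {score:.1%} of fresh inputs")
```

Real services are far more complex than a linear classifier and may limit or monitor queries, so this only shows the shape of the attack: query, record the responses, and fit a copy to them.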

Battista Biggio, a professor at the University of Cagliari who has been publishing studies on how to trick machine-learning systems for more than a decade, says the tech industry needs to start automating AI security checks.

Companies use batteries of preprogrammed tests to check for bugs in conventional software before it is deployed. Biggio says improving the security of AI systems in use will require similar tools, potentially building on attacks he and others have demonstrated in academic research.

That could help address the gap Kumar highlights between the numbers of deployed machine-learning algorithms and the workforce of people knowledgeable about their potential vulnerabilities. However, Biggio says biological intelligence will still be needed, since adversaries will keep inventing new tricks. “The human in the loop is still going to be an important component,” he says.