Can Go AIs be adversarially robust?

Researchers found that superhuman Go AIs remain vulnerable to adversarial attacks even after being hardened with several defensive countermeasures, including adversarial training and a changed network architecture. Some defenses protected against previously discovered attacks, but none withstood freshly trained adversaries, and most of the reliably effective new attacks belonged to the same previously documented class of "cyclic" strategies. The findings suggest that building robust AI systems remains challenging even in a narrow, highly favorable domain where the systems play at a far superhuman level.

Computer Science: Machine Learning (cs.LG)
Submitted on 18 Jun 2024 (v1, https://arxiv.org/abs/2406.12843v1), last revised 14 Jan 2025 (this version, v3)

Abstract: Prior work found that superhuman Go AIs can be defeated by simple adversarial strategies, especially "cyclic" attacks. In this paper, we study whether adding natural countermeasures can achieve robustness in Go, a favorable domain for robustness since it benefits from incredible average-case capability and a narrow, innately adversarial setting. We test three defenses: adversarial training on hand-constructed positions, iterated adversarial training, and changing the network architecture. We find that though some of these defenses protect against previously discovered attacks, none withstand freshly trained adversaries. Furthermore, most of the reliably effective attacks these adversaries discover are different realizations of the same overall class of cyclic attacks. Our results suggest that building robust AI systems is challenging even with extremely superhuman systems in some of the most tractable settings, and highlight two key gaps: efficient generalization of defenses, and diversity in training. For interactive examples of attacks and a link to our codebase, see this https URL.

Submission history
From: Tom Tseng
v1: Tue, 18 Jun 2024 17:57:49 UTC (3,528 KB), /abs/2406.12843v1
v2: Tue, 24 Sep 2024 08:38:38 UTC (3,596 KB), /abs/2406.12843v2
v3: Tue, 14 Jan 2025 03:08:02 UTC (2,089 KB)