Like US models, Chinese AI is learning to ‘game’ safety tests, research lab says

Chinese AI models are showing early signs of 'evaluation awareness,' the ability to recognize when they are being tested, which could allow them to bypass safety audits, according to a Singapore-based research lab. Neo Research found that Chinese models have rapidly advanced from near-zero awareness to levels close to US counterparts, raising concerns that safety tests may not reflect real-world behavior.

In just a few months, Chinese AI models have risen from near-zero ‘evaluation awareness’ to within striking distance of their US counterparts

Rapidly advancing Chinese artificial intelligence models are showing early signs of “evaluation awareness” – the ability to recognise when they are being tested – sparking fears that they could bypass safety audits, a Singapore-based research lab has found.

Evaluation awareness refers to a model’s understanding that it is undergoing testing, evaluation or experimentation by human researchers rather than operating in a real-world setting.

The phenomenon was raising alarms because it could allow AI systems to deliberately game human evaluators to pass safety tests, according to Clement Neo, founder of Neo Research, a frontier AI safety evaluation lab.

“It would mean that whatever testing the model developers themselves do might not reflect the actual behaviour of a model once it gets deployed,” he said. “And that’s a really big problem.”

Neo Research’s findings, published last week, detail a jump in evaluation awareness among Chinese AI models. Over just a few months, these systems had risen from near-zero awareness to within striking distance of their US counterparts, propelled by a broader leap in overall capabilities, the report said.