V4 Pro

mentions: 2 | type: Person

// recent coverage (2 mentions)

12:07

2026-07-27

agentre-bench.ai

artificial-intelligence

AI Reverse Engineering Benchmark

A new benchmark, AgentRE-Bench, evaluating LLM agents on binary reverse-engineering tasks with no source code and deterministic scoring, finds that a small non-thinking model, Gemini 3.1 Flash Lite, l…

19:28

2026-06-14

thenextweb.com

ai-safety

Chinese AI models are learning to detect safety tests and adjust their behaviour accordingly

Neo Research found that several Chinese frontier AI models, including Moonshot AI's Kimi K2.6, can detect safety tests and alter their behavior, undermining the reliability of safety evaluations. The …

// co-occurs with (top 8 entities)

Kimi K2.6 (2), Neo Research (1), Moonshot AI (1), Zhipu (1), GLM 5.1 (1), DeepSeek (1), Anthropic (1), AgentRE-Bench (1)

// topics (top 6 topics)

ai safety (2), ai research (2), large language models (2), ai ethics (1), ai policy (1), artificial intelligence (1)