arXiv:2605.22971v1 Announce Type: new
Abstract: Employees often struggle to identify "who knows what," leading to organizational productivity losses. We investigate whether Large Language Models (LLMs) can infer individual domain knowledge directly from long-term Slack logs. Analyzing 27,188 messages from 43 users, we evaluated seven models (including Gemini, Claude, and GPT families) by comparing their zero-shot estimates against self-reported skill ratings from 27 participants. Gemini 2.5 Flash achieved the lowest error (MAE 21.13%), while GPT models showed significantly larger discrepancies. Notably, estimation accuracy depended only weakly on message volume, indicating that more text alone does not guarantee better inference. These findings demonstrate the feasibility and current limits of automated expertise mapping, highlighting the need for privacy-preserving deployments and richer, structure-aware representations of human knowledge.
Can AI Guess What You Know? Performance Comparison of Large Language Models for Human Domain Knowledge Estimation From Communication Logs
Researchers tested seven large language models on their ability to infer employee expertise from 27,188 Slack messages, finding that Gemini 2.5 Flash achieved the lowest estimation error at 21.13% mean absolute error. The study, which compared model estimates against self-reported skill ratings from 27 participants, revealed that GPT models produced significantly larger discrepancies and that accuracy depended only weakly on message volume. The findings demonstrate the feasibility of automated expertise mapping from communication logs while highlighting current limitations and the need for privacy-preserving implementations.
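For readers unfamiliar with the metric, the mean absolute error reported above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes that both the model estimates and the self-reported ratings sit on a common 0-100 scale, and the function name and sample values are hypothetical.

```python
from statistics import mean


def mean_absolute_error(estimates, self_reports):
    """Average absolute gap between model estimates and self-reported ratings."""
    if len(estimates) != len(self_reports) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(e - s) for e, s in zip(estimates, self_reports))


# Hypothetical ratings on an assumed 0-100 scale for three participants.
# These numbers are illustrative only and are not taken from the study.
model_estimates = [70.0, 40.0, 90.0]
self_reported = [50.0, 60.0, 80.0]

mae = mean_absolute_error(model_estimates, self_reported)
print(f"MAE: {mae:.2f} percentage points")
```

Under this reading, a lower value means the model's estimates sit closer on average to what participants said about their own skills; it does not by itself say whether the model or the self-report is nearer to a person's actual expertise.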