{"slug": "can-post-training-turn-llms-into-good-medical-coders-an-empirical-study-of-icd", "title": "Can Post-Training Turn LLMs into Good Medical Coders? An Empirical Study of Generative ICD Coding", "summary": "A new study finds that post-training techniques, including supervised fine-tuning and reinforcement learning, can turn large language models into effective ICD coders, challenging prior reports that LLMs are weak at this task. The researchers introduce PHI, a diagnostic curriculum that extends GRPO to target missed-code cases, and release their code, data splits, and checkpoints.", "body_md": "arXiv:2606.13940v1\nAbstract: Automated International Classification of Diseases (ICD) coding is a core medical-coding task for billing, epidemiology, and clinical decision support. Generative large language models (LLMs) are often reported as weak medical coders, but this finding mainly comes from inference-time settings such as prompting, retrieval, reranking, or tool use, leaving the role of task-specific post-training underexplored. We present a controlled empirical study of post-training for generative ICD coding, comparing discriminative baselines with LLM coders across prompting, supervised fine-tuning (SFT), and reinforcement learning (RL) under a common protocol and metric set. To our knowledge, this is the first study to evaluate RL-based post-training for generative LLM coders in ICD coding. We further introduce PHI, a diagnostic curriculum that extends Group Relative Policy Optimization (GRPO) to refine missed-code cases. Our results show that prompting-only evaluation substantially underestimates the potential of LLMs for ICD coding. SFT provides the main capability jump, GRPO further improves code-set prediction beyond SFT, and PHI provides targeted gains on macro-level performance. These findings suggest that the main bottleneck is not the generative formulation alone, but how the model is adapted and optimized for full-taxonomy recall. 
We release our code, data splits, and checkpoints at https://github.com/AlexandreWANG915/LLM4ICD.", "url": "https://wpnews.pro/news/can-post-training-turn-llms-into-good-medical-coders-an-empirical-study-of-icd", "canonical_source": "https://arxiv.org/abs/2606.13940", "published_at": "2026-06-15 04:00:00+00:00", "updated_at": "2026-06-15 04:16:55.468474+00:00", "lang": "en", "topics": ["large-language-models", "natural-language-processing", "ai-research", "ai-products"], "entities": ["LLM4ICD", "GRPO", "PHI", "International Classification of Diseases", "arXiv"], "alternates": {"html": "https://wpnews.pro/news/can-post-training-turn-llms-into-good-medical-coders-an-empirical-study-of-icd", "markdown": "https://wpnews.pro/news/can-post-training-turn-llms-into-good-medical-coders-an-empirical-study-of-icd.md", "text": "https://wpnews.pro/news/can-post-training-turn-llms-into-good-medical-coders-an-empirical-study-of-icd.txt", "jsonld": "https://wpnews.pro/news/can-post-training-turn-llms-into-good-medical-coders-an-empirical-study-of-icd.jsonld"}}