Open-source SDS tooling for Japanese MHLW compliance: the gap nobody filled

In March 2025, Japan's Ministry of Health, Labour and Welfare (MHLW) published a structured JSON schema for Safety Data Sheet (SDS) data exchange, containing roughly 200 deeply nested fields to standardize machine-readable chemical information. The schema, driven by a 2022 amendment to the Industrial Safety and Health Act, includes fields unique to Japanese law with no equivalent in EU REACH or US OSHA formats, making it incompatible with most international SDS tooling. While major global SDS authoring platforms support Japanese language output, they do not provide export to the MHLW JSON schema, leaving a gap that only Japanese-market tools partially address.

In March 2025, Japan's Ministry of Health, Labour and Welfare (MHLW) published a structured JSON schema for Safety Data Sheet data exchange. The schema covers roughly 200 deeply nested fields and is intended to standardize how SDS information moves between chemical management systems. Most SDS tooling was not built for this.

Japan's SDS requirements come from two laws: the Industrial Safety and Health Act (ISHA, 労働安全衛生法) and the Chemical Substances Control Law (化審法). Both mandate SDS for regulated chemicals, with format requirements governed by JIS Z 7253, Japan's implementation of the UN Globally Harmonized System (GHS).

JIS Z 7253 follows the standard 16-section GHS structure. In principle, any GHS-compliant SDS satisfies the content requirements. What makes Japanese compliance distinct is a digital layer: the MHLW schema specifies how SDS content should be structured as machine-readable data, with field-level granularity that PDF documents cannot capture.

GHS uses a "building block" approach in which each country adopts the elements it chooses. The result is that the same GHS-aligned document varies by jurisdiction.

The MHLW schema includes fields with no equivalent in EU REACH or US OSHA HazCom formats.
These are the main reason international SDS tooling does not cover the schema out of the box. Section 15 (Regulatory Information) is the most complex section in the schema: it contains separate subsections for each of these laws, each with its own field structure.

The MHLW published the schema in 2025, but the driver was a 2022 amendment to the Industrial Safety and Health Act. The amendment shifted Japan's chemical substance regulation from a prescriptive model (government designates specific hazardous substances) to an autonomous management model (companies assess and manage risk themselves).

The practical impact: with risk assessment coverage expanding significantly, companies need to process SDS data faster and more accurately. Manual PDF entry does not scale. The JSON schema is the infrastructure layer for automating this.

The major SDS authoring platforms (Sphera, EcoOnline, Chemwatch, Verisk 3E) have broad international coverage. Japanese is typically a supported output language. What they do not provide, as far as I have found, is export to the MHLW JSON schema. They produce Word or PDF output in the correct section structure, which satisfies the document requirement but not the structured data exchange requirement.

Japanese-market products like SDS Meister and SmartSDS support MHLW JSON output, but their PDF-to-JSON conversion coverage is limited. They are primarily SDS authoring tools, not bulk conversion tools for incoming supplier documents.

sds parser and tungsten solve a different problem: extracting SDS data in English, for specific known manufacturer formats. Neither targets the MHLW schema.

Even within JIS Z 7253-compliant documents, format varies by manufacturer. A rule-based parser must enumerate every variant. In practice, manufacturer-specific headings add another layer of variation on top of the standard differences.

Two properties of the MHLW schema are worth knowing before implementing against it.
Section 3 stores component information as a repeating array. Each component object has nested fields for chemical identity, concentration range, and hazard classification. The same data appears differently depending on whether the source document covers a pure substance, a mixture, or a trade secret formulation.

    {
      "Composition": {
        "CompositionAndConcentration": {
          "ChemicalIdentity": {
            "CASNumber": "64-17-5",
            "ISHActNotificationNumber": "2-396"
          },
          "ConcentrationRange": {
            "ConcentrationRangeFrom": 95.0,
            "ConcentrationRangeTo": 100.0,
            "ConcentrationRangeUnit": "%"
          },
          "TradeSecretFlag": false
        }
      }
    }

The schema contains field name errors that are now part of the specification:

    HumanExposureAndEmergencyMeasuress   ← trailing double-s
    TestGuidline                         ← missing 'e' (not Guideline)
    Desclaimer                           ← 'e' in place of 'i' (not Disclaimer)
    gazetteNo                            ← lowercase first character

Correcting these would break all existing implementations, so they cannot be fixed in v1.0. An implementation that normalizes these to standard English spellings will fail schema validation.

I built sds-converter to address the MHLW schema gap. It handles both directions: PDF/DOCX/XLSX to MHLW JSON, and MHLW JSON to a JIS Z 7253-compliant Word document.

The core approach: rather than enumerating format variants with rules, the tool passes raw section text and the corresponding MHLW schema fields to an LLM and asks it to map values. The LLM handles heading label variation naturally. The output is validated against the schema before writing.

    cargo install sds-converter

    # PDF → MHLW JSON
    sds-converter to-json --input input.pdf --output output.json

    # MHLW JSON → JIS Z 7253 Word document
    sds-converter to-docx --input output.json --output result.docx --lang ja

The LLM backend is pluggable: Claude, GPT, Gemini, Mistral, Groq, or local models via Ollama. A --quality flag adjusts cost versus accuracy for batch workloads.

The known limitations are open problems, not design decisions.
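Returning to the field-name quirks described earlier: one cheap safeguard is to scan generated JSON for keys that something in the pipeline has "corrected" before the document reaches schema validation. The following is a minimal sketch of that idea, not part of sds-converter. The function name `find_normalized_keys` and the table of "corrected" spellings are assumptions made for illustration; only the right-hand spellings come from the list above, and the sample nesting is hypothetical rather than taken from the schema.

```python
import json

# Right-hand side: spellings as listed above for the published schema.
# Left-hand side: the "corrected" form a well-meaning normalizer might
# emit (an assumption made for this sketch).
SCHEMA_SPELLINGS = {
    "HumanExposureAndEmergencyMeasures": "HumanExposureAndEmergencyMeasuress",
    "TestGuideline": "TestGuidline",
    "Disclaimer": "Desclaimer",
    "GazetteNo": "gazetteNo",
}


def find_normalized_keys(node, path="$"):
    """Return (path, found_key, schema_key) for every 'corrected' key."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key in SCHEMA_SPELLINGS:
                hits.append((child, key, SCHEMA_SPELLINGS[key]))
            # Recurse into the value regardless, to catch nested keys.
            hits.extend(find_normalized_keys(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_normalized_keys(item, f"{path}[{index}]"))
    return hits


if __name__ == "__main__":
    # Hypothetical fragment; the nesting is illustrative only.
    doc = json.loads('{"Other": {"Disclaimer": "text"}, "Tests": [{"TestGuidline": "text"}]}')
    for where, found, expected in find_normalized_keys(doc):
        print(f"{where}: found '{found}', schema expects '{expected}'")
```

A check like this only reports the offending paths; whether to rename the key back or reject the document is left to the caller.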
The MHLW schema represents a real need for anyone handling chemical compliance in Japan at volume. Commercial tools cover the authoring side; the bulk conversion of incoming supplier PDFs to structured data has no open-source solution targeting this schema other than sds-converter, which I developed and which is the only implementation I am aware of.

The repository is open. Contributions on the extraction side, particularly Section 3 table handling, are welcome. If you work in cheminformatics or chemical compliance and have approached the MHLW compliance problem differently, I would be interested to hear about it.