# Huawei-led team completes post-training of DeepSeek 1.6T model

> Source: <https://letsdatascience.com/news/huawei-led-team-completes-post-training-of-deepseek-16t-mode-26dee267>
> Published: 2026-06-06 13:21:53.159482+00:00

# Huawei-led team completes post-training of DeepSeek 1.6T model

According to the Shenzhen municipal government, reported by Tom's Hardware, a Huawei-led research group completed full-parameter post-training of DeepSeek's V4-Pro, a **1.6-trillion-parameter** model, on a cluster of at least **1,000** Huawei **Ascend 910C** chips. Tom's Hardware reports the effort involved Huawei, the Shenzhen Loop Area Institute, the Shenzhen campus of Harbin Institute of Technology, and the Shenzhen Research Institute of Big Data. The article cites DeepSeek documentation that places V4-Pro's pre-training corpus at more than **32 trillion** tokens and notes that the team updated every model weight rather than using adapter layers. Tom's Hardware also notes that completing post-training on Ascend silicon does not demonstrate the chips can pre-train a frontier model from scratch. Editorial analysis: Industry observers will view a full-parameter post-training run on domestic accelerators as a meaningful step toward reducing reliance on foreign GPUs for alignment and tuning workloads.

### What happened

According to the Shenzhen municipal government, reported by Tom's Hardware, a research group that includes **Huawei** completed full-parameter post-training of DeepSeek's V4-Pro, a **1.6-trillion-parameter** model, using a cluster of at least **1,000** **Ascend 910C** chips. Tom's Hardware reports the collaborators included the Shenzhen Loop Area Institute, the Shenzhen campus of Harbin Institute of Technology, and the Shenzhen Research Institute of Big Data. The article cites DeepSeek documentation that places V4-Pro's pre-training corpus above **32 trillion** tokens. Tom's Hardware reports the team performed full-parameter post-training, meaning every weight was updated rather than using a thin adapter layer, and explicitly notes this demonstration does not show Ascend chips can pre-train a frontier model from scratch.

### Technical details

Tom's Hardware describes the **Ascend 910C** as Huawei's current flagship AI accelerator, a dual-die part that earlier DeepSeek testing returned roughly **60%** of an Nvidia **H100**'s inference performance. The report frames post-training as the tuning stage after pre-training, used for instruction-following, safety alignment, and task-specific behaviour shaping. The Shenzhen municipal government is the primary on-record source for the cluster size claim in the coverage cited.

### Editorial analysis - technical context

Demonstrations of full-parameter post-training on non-Nvidia silicon address a narrower but important slice of the ML lifecycle: alignment and fine-tuning at scale. Industry-pattern observations: groups that move tuning workloads off incumbent GPUs often still rely on foreign accelerators for the heavier, costlier pre-training phase. Successful post-training runs reduce a practical barrier for domestic deployment of model-alignment workflows, but they do not, by themselves, prove parity for pre-training at frontier scale.

### Context and significance

Industry context: This report sits at the intersection of chip capability, national supply-chain strategy, and large-model engineering. Observers tracking compute sovereignty and export-control impacts will treat a verified full-parameter post-training run as a notable milestone for training-class workloads on domestic accelerators, even if it stops short of pre-training demonstrations claimed elsewhere.

### What to watch

Indicators to follow include independent performance and reproducibility benchmarks for V4-Pro post-training on Ascend hardware, detailed power and throughput metrics for the **Ascend 910C** cluster, and any lab or open documentation from DeepSeek or participating institutes about dataset splits, optimizer settings, and wall-clock training time. Reporting that clarifies whether multiple parallel cabinets or networking fabrics were involved will also matter for assessing generalisability.

## Scoring Rationale

A reported full-parameter post-training of a **1.6-trillion** model on a 1,000-chip Ascend cluster is a notable infrastructure milestone for training-class workloads on domestic accelerators. It matters to practitioners assessing compute options, but it stops short of demonstrating frontier-scale pre-training parity.

Practice interview problems based on real data

1,500+ SQL & Python problems across 15 industry datasets — the exact type of data you work with.

[Try 250 free problems](/problems)
