{"slug": "semantic-aware-generative-image-transmission-for-resource-constrained-visual-iot", "title": "Semantic-Aware Generative Image Transmission for Resource-Constrained Visual IoT Systems", "summary": "Researchers propose a semantic-aware generative image transmission framework for resource-constrained visual IoT systems. The method chooses which VQ image tokens to transmit by fusing token recoverability with semantic importance, and the receiver regenerates the masked tokens with MaskGIT. At 0.074 bpp it achieves 29.9 dB PSNR while using 44.6% of the bits of the 0.167-bpp DeepJSCC/WITT reference. A downstream detection study on Kodak shows it preserves task-relevant objects better than random masking at 30% and 50% mask ratios.", "body_md": "arXiv:2606.28398v1\nAbstract: Resource-constrained visual Internet of Things (IoT) systems, such as edge cameras, unmanned sensing platforms, industrial inspection nodes, and remote monitoring sensors, often need to transmit task-relevant visual evidence over low-rate wireless links to an edge/cloud service. Existing image communication methods usually compress or transmit complete global representations, leaving limited room to exploit receiver-side generative restoration. This paper proposes a semantic-aware generative image transmission framework for edge-assisted visual IoT. The image captured by an IoT visual sensor is encoded into a discrete token grid by a VQ encoder. At the IoT transmitter or nearby gateway, token recoverability, estimated from prediction entropy and local structure complexity, is fused with semantic importance obtained from instance segmentation and category-aware scoring. A spatial dispersal sampler then selects the tokens to be transmitted under a bitrate budget. The transmitter sends only the quantization indices of kept tokens and a binary mask map, while the edge/cloud receiver recovers masked tokens through MaskGIT with Halton sequence scheduling. 
Experiments on Kodak and VisDrone scenes under AWGN and Rayleigh channels show that the proposed method provides a flexible bitrate-quality tradeoff for narrowband visual IoT links. At 0.074 bpp, it uses 44.6% of the transmitted bits of the 0.167-bpp DeepJSCC/WITT reference while achieving 29.9 dB PSNR. A pseudo-GT downstream detection study on Kodak further shows that semantic-aware masking preserves task-relevant objects better than random masking at both 30% and 50% mask ratios.", "url": "https://wpnews.pro/news/semantic-aware-generative-image-transmission-for-resource-constrained-visual-iot", "canonical_source": "https://arxiv.org/abs/2606.28398", "published_at": "2026-06-30 04:00:00+00:00", "updated_at": "2026-06-30 04:25:08.166881+00:00", "lang": "en", "topics": ["computer-vision", "machine-learning", "generative-ai", "ai-research"], "entities": ["MaskGIT", "DeepJSCC", "WITT", "Kodak", "VisDrone", "AWGN", "Rayleigh"], "alternates": {"html": "https://wpnews.pro/news/semantic-aware-generative-image-transmission-for-resource-constrained-visual-iot", "markdown": "https://wpnews.pro/news/semantic-aware-generative-image-transmission-for-resource-constrained-visual-iot.md", "text": "https://wpnews.pro/news/semantic-aware-generative-image-transmission-for-resource-constrained-visual-iot.txt", "jsonld": "https://wpnews.pro/news/semantic-aware-generative-image-transmission-for-resource-constrained-visual-iot.jsonld"}}