Revolutionizing Dermatology: Building an Offline Skin Lesion Segmenter with Med-SAM and CoreML

A developer built an offline skin lesion segmenter for mobile devices by optimizing the Med-SAM model with CoreML and ONNX Runtime. The approach enables real-time, privacy-preserving segmentation on iOS and in cross-platform React Native apps. The pipeline combines model distillation, quantization, and hardware acceleration, aiming for medical-grade performance.

In the world of digital health, the gap between "research-grade AI" and "production-ready mobile apps" is often a chasm. When it comes to dermatology AI, precision is non-negotiable. Identifying a suspicious mole or a flare-up of dermatitis requires more than just a bounding box; it requires pixel-level edge detection.

Today, we are diving deep into skin lesion screening engineering. We will explore how to bridge the gap by taking the powerful Med-SAM (Medical Segment Anything Model) and optimizing it for real-time, offline performance on mobile devices using CoreML and ONNX Runtime. By running mobile image segmentation as edge-side inference, we keep images on the device, which protects user privacy while maintaining high-fidelity diagnostic assistance.

Scaling a transformer-based model like Med-SAM for a mobile environment requires a strategic pipeline. We can't just throw a 1GB model at a smartphone and expect 60FPS. We need a hybrid approach: distillation, quantization, and efficient hardware acceleration.

```mermaid
graph TD
    A["User Camera / Gallery"] -->|Raw Image| B["Preprocessing & Resizing"]
    B --> C{Inference Engine}
    C -->|iOS/Neural Engine| D["CoreML Model"]
    C -->|Cross-Platform| E["ONNX Runtime Mobile"]
    D --> F["Med-SAM Encoder/Decoder"]
    E --> F
    F -->|Logits| G["Post-processing & Thresholding"]
    G -->|Segmentation Mask| H["UI Overlay: React Native"]
    H --> I["Final Assessment & Heatmap"]
```

To follow this advanced guide, you should be comfortable with the tools it uses: PyTorch, coremltools, ONNX Runtime, and React Native.

Med-SAM is a specialized variant of Meta's Segment Anything Model, fine-tuned on massive medical datasets. However, the standard ViT-H (Vision Transformer) backbone is too heavy for mobile. We use a Mobile-SAM architecture and transfer Med-SAM's medical knowledge into it through knowledge distillation.

First, we need to convert the PyTorch weights. Using coremltools, we can target the Apple Neural Engine (ANE).

```python
import torch
import coremltools as ct
from med_sam import medsam_model_registry

# 1. Load the fine-tuned Med-SAM model
model = medsam_model_registry["vit_b"](checkpoint="medsam_vit_b.pth")
model.eval()

# 2. Trace the model with a dummy input
#    (tracing assumes the forward pass accepts a single image tensor)
example_input = torch.rand(1, 3, 1024, 1024)
traced_model = torch.jit.trace(model, example_input)

# 3. Convert to CoreML (optimize for ANE)
mlmodel = ct.convert(
    traced_model,
    inputs=[ct.TensorType(shape=example_input.shape)],
    compute_units=ct.ComputeUnit.ALL,  # Allow CPU, GPU, and Neural Engine
    minimum_deployment_target=ct.target.iOS16,
)
mlmodel.save("SkinMedSAM.mlpackage")
```

While the "brain" of our app is the model, the "body" is React Native. For high-performance vision tasks, we use react-native-vision-camera combined with a custom frame processor or the ONNX Runtime React Native library.

```typescript
import { InferenceSession, Tensor } from 'onnxruntime-react-native';

const runSegmentation = async (imagePath: string) => {
  try {
    // Load the model
    const session = await InferenceSession.create('skin_medsam_quantized.onnx');

    // Convert image to Tensor (logic for resizing to 1024x1024).
    // preprocessImageToTensor is an app-side helper that is not shown here.
    const inputTensor: Tensor = await preprocessImageToTensor(imagePath);

    // Run inference; "input" and "output" must match the names in the exported model
    const outputs = await session.run({ input: inputTensor });
    const maskData = outputs['output'].data;

    // Process mask to display on UI
    return maskData;
  } catch (e) {
    console.error('Inference failed', e);
  }
};
```

Building a prototype is easy, but making it "medical-grade" requires rigorous attention to lighting conditions, skin tone diversity, and latency. For advanced architectural patterns, such as implementing streaming inference or federated learning to improve your skin lesion models without compromising user data, I highly recommend checking out the technical deep-dives at the [WellAlly Blog](https://www.wellally.tech/blog). They provide extensive resources on productionizing vision models and navigating the complexities of healthcare AI engineering.

Once we get the segmentation mask, we need to overlay it on the camera feed.
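
Getting from the model's raw logits to a mask, and from a mask to measurements, can be sketched as follows. This is a minimal sketch, assuming the output is a flat `Float32Array` of `width * height` logits in row-major order; the names `logitsToMask` and `measureLesion` and the 0.5 probability cutoff are illustrative choices, not part of Med-SAM or ONNX Runtime. Regularity is approximated here by compactness, 4πA/P², which is one common shape descriptor and not necessarily the one a clinical tool would use.

```typescript
// Sketch only: function names and the 0.5 cutoff are assumptions for illustration.

/** Turn raw logits into a 0/1 mask: apply a sigmoid, then threshold the probability. */
function logitsToMask(logits: Float32Array, threshold = 0.5): Uint8Array {
  return Uint8Array.from(logits, (logit) =>
    1 / (1 + Math.exp(-logit)) >= threshold ? 1 : 0
  );
}

interface LesionMetrics {
  areaPx: number;      // number of lesion pixels
  perimeterPx: number; // pixel edges that face background or the image border
  compactness: number; // 4 * pi * area / perimeter^2
}

/** Measure area, perimeter and compactness of a row-major binary mask. */
function measureLesion(mask: Uint8Array, width: number, height: number): LesionMetrics {
  if (mask.length !== width * height) {
    throw new Error("mask length must equal width * height");
  }

  // Anything outside the image counts as background.
  const isLesion = (x: number, y: number): boolean =>
    x >= 0 && x < width && y >= 0 && y < height && mask[y * width + x] === 1;

  let areaPx = 0;
  let perimeterPx = 0;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (!isLesion(x, y)) continue;
      areaPx++;
      // Each side that touches background adds one unit of perimeter.
      if (!isLesion(x - 1, y)) perimeterPx++;
      if (!isLesion(x + 1, y)) perimeterPx++;
      if (!isLesion(x, y - 1)) perimeterPx++;
      if (!isLesion(x, y + 1)) perimeterPx++;
    }
  }

  const compactness =
    perimeterPx === 0 ? 0 : (4 * Math.PI * areaPx) / (perimeterPx * perimeterPx);
  return { areaPx, perimeterPx, compactness };
}
```

Because the perimeter is counted in pixel edges, the score is only a relative measure: a filled square scores π/4 (about 0.785), and jagged outlines score lower. The area is in pixels; converting it to physical units would need a calibration step that this sketch does not cover.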
The overlay step is also where we calculate the area of the lesion and its regularity, key features for early screening.

```js
import { Canvas, Path, Skia } from "@shopify/react-native-skia";

// Drawing the mask over the lesion
const LesionMask = ({ points }) => {
  // Nothing to draw until a contour is available
  if (!points || points.length === 0) {
    return null;
  }

  const path = Skia.Path.Make();
  // Simplified logic: assumes `points` is the lesion contour derived from the model output
  path.moveTo(points[0].x, points[0].y);
  points.slice(1).forEach((p) => path.lineTo(p.x, p.y));
  path.close();

  return (
    <Canvas style={{ flex: 1 }}>
      <Path path={path} color="rgba(255, 0, 0, 0.4)" style="fill" />
    </Canvas>
  );
};
```

Engineering a skin lesion screening tool is a perfect example of "Learning in Public." By combining Med-SAM's specialized medical knowledge with CoreML's hardware acceleration, we can create potentially life-saving tools that live right in our pockets.

Key takeaways: distillation, quantization, and hardware acceleration are what make a transformer model of this size practical on a phone, and on-device inference keeps user images private.

Have you tried deploying Vision Transformers to mobile? What was your biggest hurdle? Let's discuss in the comments 👇

If you're looking for more production-ready examples and deep architectural insights, don't forget to visit wellally.tech/blog for the latest in AI and Healthcare engineering. 💻✨
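
One loose end from the React Native example is `preprocessImageToTensor`, which was left undefined. The core of it is a layout change: decoded images usually arrive as interleaved RGBA bytes, while the input shape used during conversion, (1, 3, 1024, 1024), is planar channel-first floats. Here is a minimal sketch, assuming the app has already resized the image to the model's input size and decoded it to RGBA. `rgbaToChw` is a hypothetical helper name, and the plain division by 255 is an assumption that has to match whatever normalization the exported model expects.

```typescript
// Sketch only: `rgbaToChw` is a hypothetical helper, not part of onnxruntime-react-native.

/** Convert interleaved RGBA bytes (HWC) into planar RGB floats (CHW) scaled to [0, 1]. */
function rgbaToChw(rgba: Uint8Array, width: number, height: number): Float32Array {
  const pixelCount = width * height;
  if (rgba.length !== pixelCount * 4) {
    throw new Error("expected width * height * 4 RGBA bytes");
  }

  const chw = new Float32Array(3 * pixelCount);
  rgba.forEach((value, i) => {
    const channel = i % 4;
    if (channel === 3) return; // drop the alpha channel
    const pixel = (i - channel) / 4;
    // All red values first, then all green, then all blue.
    chw[channel * pixelCount + pixel] = value / 255;
  });
  return chw;
}
```

The returned array can then be wrapped with `new Tensor('float32', chw, [1, 3, height, width])` before it is passed to `session.run`.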