Implementation of AI in mobile applications: Comparative analysis of On-Device and On-Server approaches on Native Android and Flutter

This article presents a comparative analysis of implementing AI in mobile applications, examining both On-Device (using Google ML Kit) and On-Server (using Hugging Face Inference API) approaches on Native Android (Kotlin) and Flutter (Dart) platforms. The author shares practical experience from developing two MVP applications, including code examples for handling network requests, image compression, and local model configuration, while also discussing the trade-offs between device resources, computational accuracy, and security requirements. The source code for both projects is publicly available under the MIT license.

Hi everyone! Today I want to share practical experience of integrating machine learning models into mobile ecosystems. I recently completed the research and development of two MVP applications, one on Native Android and one on Flutter, and defended this project at an international conference. Now I want to share that integration experience with you.

In this article, we will look in detail at the difference between local and server-side AI computing, compare the implementation specifics of the native layer in Kotlin and the cross-platform layer in Dart, and go through non-obvious bugs that you may encounter when working with the camera and file system.

If you are interested in a specific platform, you can skip directly to the relevant section using the navigation below. The source code of both projects is open under the MIT license; links to the GitHub/GitLab repositories can be found below.

Navigation:
- Analysis of approaches: On-Device / On-Server
- Implementation on Native Android (Kotlin)
- Implementation in Flutter (Dart)
- Conclusions and links to projects

Part 1.
On-Device vs On-Server: Architectural Choice

Choosing where to deploy a mobile model is always a trade-off between device resources, computational accuracy, and security requirements.

Local approach (On-Device)

The model is deployed directly in the application sandbox and runs inference locally, using the computing power of the smartphone's CPU, GPU, or specialized neural processors (NPU/TPU). As the On-Device solution, I used the Google ML Kit Image Labeling SDK.

Pros:
- Minimal latency: There are no network delays from transmitting heavy media data.
- Autonomy: Complete independence from the availability and quality of the Internet connection.
- Data privacy: User data does not leave the device and is not transferred to third parties.

Cons:
- Resource limitation: To keep the application from draining the battery and taking up gigabytes of memory, models are heavily quantized and cut down.
- Reduced accuracy: Because of this compression, lightweight models lose accuracy on average; the upper bound in basic classification tasks is about 61-65%.

Server approach (On-Server)

The model lives on a remote server and is accessible via an API. In my research, I used the Hugging Face Inference API with the google/vit-base-patch16-224 model.

Pros:
- High accuracy: On the server you can deploy heavy state-of-the-art (SOTA) models, LLMs, or large ensembles of neural networks with a huge set of classes.
- Client offloading: The smartphone only performs the network request; it does not heat up or waste power on complex mathematical calculations.

Cons:
- Network dependence: No Internet, no AI.
- Infrastructure and security costs: You need to encrypt communication channels (TLS/SSL), protect API keys from reverse engineering, and pay for server capacity.

Part 2.
Native implementation: Android

In a native application, the key task is to keep heavy operations off the main UI thread (Main Thread) to avoid UI freezes and Application Not Responding (ANR) errors.

Working with the API (On-Server)

To send POST requests that transmit an image as a byte array, I used a combination of OkHttp and Retrofit. The server's JSON response is converted into strictly typed Kotlin data classes automatically thanks to converters. The network call is encapsulated in a suspend function, which allows declarative control of asynchrony.

    import okhttp3.RequestBody
    import retrofit2.http.Body
    import retrofit2.http.Header
    import retrofit2.http.POST
    import retrofit2.http.Url

    private const val TOKEN = "..."

    // One entry of the JSON array returned by the model: a class label and its confidence.
    data class PredictionResponse(val label: String, val score: Float)

    interface HuggingFaceApi {
        // The endpoint is passed in full through @Url, so @POST carries no path of its own.
        @POST
        suspend fun postImage(
            @Url url: String,
            @Header("Authorization") token: String,
            @Body body: RequestBody
        ): List