Implementation of AI in mobile applications: Comparative analysis of On-Device and On-Server approaches on Native Android and Flutter

This article presents a comparative analysis of implementing AI in mobile applications, examining both On-Device (using Google ML Kit) and On-Server (using Hugging Face Inference API) approaches on Native Android (Kotlin) and Flutter (Dart) platforms. The author shares practical experience from developing two MVP applications, including code examples for handling network requests, image compression, and local model configuration, while also discussing the trade-offs between device resources, computational accuracy, and security requirements. The source code for both projects is publicly available under the MIT license.

Hi everyone! Today I want to share practical experience of integrating machine learning models into mobile ecosystems. I recently completed the research and development of two MVP applications, one on Native Android and one on Flutter, and defended the project at an international conference. Now I want to share my integration experience with you.

In this article we will look in detail at the difference between local and server-side AI computing, compare the implementation of the native layer in Kotlin with the cross-platform layer in Dart, and analyze non-obvious bugs you may encounter when working with the camera and the file system. If you are interested in a specific platform, you can skip directly to the relevant section using the navigation below. The source code of both projects is open under the MIT license; links to the GitHub/GitLab repositories can be found below.

Navigation:
- [Analysis of approaches: On-Device / On-Server](#on-device-vs-on-server)
- [Implementation on Native Android (Kotlin)](#android-implementation)
- [Implementation in Flutter (Dart)](#flutter-implementation)
- [Conclusions and links to projects](#conclusions-links)

Part 1. On-Device vs On-Server: Architectural Choice

Choosing where to deploy a mobile model is always a trade-off between device resources, computational accuracy, and security requirements.

Local approach (On-Device)

The model is deployed directly in the application sandbox and runs inference locally, using the smartphone's CPU, GPU, or specialized neural processors (NPU/TPU). As the On-Device solution I used the Google ML Kit Image Labeling SDK.

Pros:
- Minimal latency: there is no network delay for transmitting heavy media data.
- Autonomy: complete independence from the availability and quality of the Internet connection.
- Data privacy: user data does not leave the device and is not transferred to third parties.
Cons:
- Resource limits: to keep the application from draining the battery and taking up gigabytes of storage, models are heavily quantized and cut down.
- Reduced accuracy: compression lowers the accuracy of lightweight models on average; in basic classification tasks the upper bound is about 61-65%.

Server approach (On-Server)

The model lives on a remote server and is accessible via an API. In my research I used the Hugging Face Inference API with the google/vit-base-patch16-224 model.

Pros:
- High accuracy: on the server you can deploy heavy state-of-the-art (SOTA) models, LLMs, or huge ensembles of neural networks with a very large set of classes.
- Client offloading: the smartphone only performs the network request; it does not heat up or spend power on heavy computation.

Cons:
- Network dependence: no Internet, no AI.
- Infrastructure and security costs: you have to encrypt communication channels (TLS/SSL), protect API keys from reverse engineering, and pay for server capacity.

Part 2. Native implementation: Android

In a native application the key task is to keep heavy operations off the main UI thread (Main Thread) to avoid UI freezes and Application Not Responding (ANR) errors.

Working with the API (On-Server)

To send POST requests carrying the image as a byte array, a combination of OkHttp and Retrofit was used. The server's JSON response is converted into strictly typed Kotlin data classes automatically by the converter factory. The network call is encapsulated in a suspend function, which lets the asynchronous request be written as ordinary sequential code.

```kotlin
import android.graphics.Bitmap
import android.util.Log
import okhttp3.MediaType.Companion.toMediaTypeOrNull
import okhttp3.RequestBody
import okhttp3.RequestBody.Companion.toRequestBody
import retrofit2.Retrofit
import retrofit2.converter.gson.GsonConverterFactory
import retrofit2.http.Body
import retrofit2.http.Header
import retrofit2.http.POST
import retrofit2.http.Url

private const val TOKEN = "..."

// One entry of the JSON array returned by the Inference API
data class PredictionResponse(
    val label: String,
    val score: Float
)

interface HuggingFaceApi {
    @POST
    suspend fun postImage(
        @Url url: String,
        @Header("Authorization") token: String,
        @Body body: RequestBody
    ): List<PredictionResponse>
}

class ApiModel {
    private val retrofit = Retrofit.Builder()
        .baseUrl("https://router.huggingface.co/hf-inference/")
        .addConverterFactory(GsonConverterFactory.create())
        .build()

    val service: HuggingFaceApi = retrofit.create(HuggingFaceApi::class.java)

    suspend fun classifyImage(imageBitmap: Bitmap): Pair<String, Float> {
        return try {
            val imageBytes = compressBitmap(imageBitmap)
            val requestBody = imageBytes.toRequestBody("image/jpeg".toMediaTypeOrNull())
            val result = service.postImage(
                "models/google/vit-base-patch16-224",
                "Bearer $TOKEN",
                requestBody
            )
            // The first element of the response is taken as the top prediction
            val firstLabel = result.firstOrNull() ?: return "Not recognized" to 0f
            firstLabel.label to firstLabel.score
        } catch (e: Exception) {
            Log.e("network", "Request failed", e)
            "Server error" to 0f
        }
    }
}
```

Note that before sending the image to the server we compress it, so that we do not send too many bytes over the network:

```kotlin
suspend fun compressBitmap(bitmap: Bitmap): ByteArray = withContext(Dispatchers.IO) {
    val stream = ByteArrayOutputStream()
    bitmap.compress(Bitmap.CompressFormat.JPEG, 80, stream) // JPEG, quality 80
    stream.toByteArray()
}
```

Local inference (On-Device) via ML Kit

To work with the local ML Kit model, the library is configured via `ImageLabelerOptions`. We explicitly set `setConfidenceThreshold(0.4f)`, the model's confidence threshold. Raising this threshold cuts off false positives, but it also means fewer labels (or none at all) are returned for ambiguous images. To ensure stability and save RAM, the labeler object is initialized through Kotlin's `by lazy` delegate:

```kotlin
private val labeler by lazy {
    val options = ImageLabelerOptions.Builder()
        .setConfidenceThreshold(0.4f)
        .build()
    ImageLabeling.getClient(options)
}
```

Why `by lazy` here?
- Saving resources: the heavy ML Kit client instance is created not when the Activity is launched but at the moment of the first request (when the user takes a photo).
- Context safety: initialization happens only after the application context has been fully set up by the operating system, which prevents a NullPointerException.

The call to `labeler.process(image)` is asynchronous by nature (it runs on Google's Task API). To make it linear and MVVM-friendly, we wrap it in a coroutine and suspend until the result is ready.

Architectural layer and flow control

In `MainViewModel`, all calls are wrapped in `viewModelScope.launch`. Depending on the position of the state switch (On-Device / On-Server), the required method is launched:

```kotlin
private fun classifyImage(bitmap: Bitmap?) {
    if (bitmap == null) return
    uiState.update { it.copy(isLoading = true) }
    viewModelScope.launch {
        val startTime = System.currentTimeMillis()
        try {
            // Both back ends return a (label, confidence) pair
            val (label, confidence) = if (uiState.value.isOnDevice) {
                mlKit.analyze(bitmap)
            } else {
                apiModel.classifyImage(bitmap)
            }
            val duration = System.currentTimeMillis() - startTime
            uiState.update {
                it.copy(
                    classificationText = label,
                    confidenceValue = confidence,
                    timeTakenDuration = duration,
                    isLoading = false
                )
            }
        } catch (e: Exception) {
            uiState.update {
                it.copy(
                    classificationText = "Error",
                    confidenceValue = 0f,
                    timeTakenDuration = 0L,
                    isLoading = false // reset, otherwise the loading state stays on after a failure
                )
            }
        }
    }
}
```

Working with the camera on Native Android

On the Native Android side, working with the camera is concise thanks to the modern CameraX SDK. It is a lifecycle-aware library: it knows when an Activity is paused (`onPause`) or destroyed (`onDestroy`), and it automatically releases camera resources and closes the `ImageAnalysis` / `ImageCapture` streams.
We do not need to write the `onDispose` logic by hand, and the result of a successful capture can be a ready-made `Bitmap` object held in RAM, which eliminates unnecessary disk reads and writes.

Part 3. Cross-platform implementation: Flutter (Dart, Dio, Method Channels)

The Flutter application conceptually solves the same problems, but it has to deal with the specifics of Dart's single-threaded architecture (Event Loop).

Network inference (Dio + Futures)

To communicate with Hugging Face from Flutter we used the Dio package. To keep a heavy request and the processing of network packets from blocking the rendering of UI frames (after all, Dart runs on a single thread), we wrap the call in the asynchronous Future/await model. While the bytes travel over the network, the Event Loop calmly continues to render the interface.

```dart
final dio = Dio();

Future<List<dynamic>?> apiModel(String path) async {
  final Uint8List? imageBytes = await compressImage(path);
  if (imageBytes == null) {
    return null;
  }
  try {
    final response = await dio.post(
      "https://router.huggingface.co/hf-inference/models/google/vit-base-patch16-224",
      data: imageBytes,
      options: Options(
        headers: {
          "Authorization": "Bearer 'your token'", // put your token from hugging face here
          "Content-Type": "image/jpeg",
        },
      ),
    );
    return response.data;
  } on DioException catch (e) {
    debugPrint("Error: $e");
  }
  return null;
}
```

Note that before sending the image to the server we compress it, so that we do not send too many bytes over the network:

```dart
Future<Uint8List?> compressImage(String path) async {
  final Uint8List? result = await FlutterImageCompress.compressWithFile(
    path,
    quality: 80,
    format: CompressFormat.jpeg,
  );
  return result;
}
```

Native bridge: MethodChannel for ML Kit

Since there is no full-fledged direct SDK for ML Kit Image Labeling in Dart that provides the required level of customization, a production-style approach is used: a native bridge built on MethodChannel.
The Dart code acts as a client: it invokes the `imageLabeling` method and passes the path to the saved photo through the channel.

```dart
class NativeMlService {
  static const MethodChannel channel = MethodChannel("mlkit photo analyze");

  static Future<Map> onDeviceMethod(String imagePath) async {
    final result = await channel.invokeMethod(
      'imageLabeling',
      {'imagePath': imagePath},
    );
    return Map.from(result);
  }
}
```

On the Android side (`MainActivity.kt`) we catch this call through `setMethodCallHandler`. The same rules apply here: we launch a coroutine on a background thread and process the image via ML Kit, but we pass the response to `result.success` only after switching back to the Main Thread, because the Flutter engine expects channel replies to arrive on the platform's main thread, not on an Android worker thread.

```kotlin
override fun configureFlutterEngine(flutterEngine: FlutterEngine) {
    super.configureFlutterEngine(flutterEngine)
    MethodChannel(flutterEngine.dartExecutor.binaryMessenger, CHANNEL)
        .setMethodCallHandler { call, result ->
            if (call.method == "imageLabeling") {
                val imagePath = call.argument<String>("imagePath")
                if (imagePath == null) {
                    result.error("ArgError", "Image path is null", null)
                    return@setMethodCallHandler
                }
                // Run ML inference on background thread to avoid blocking UI
                CoroutineScope(Dispatchers.IO).launch {
                    try {
                        // image processing and model calling....
                        // Return result on main thread
                        withContext(Dispatchers.Main) {
                            result.success(response)
                        }
                    }.... // rest of the code on GitHub/GitLab....
```

Camera in Flutter and the race condition

The most difficult and interesting stage of developing the Flutter version was integrating the camera plugin and debugging how it interacts with the file system.
Two important differences from the native implementation emerged here:

- Manual lifecycle management: in Flutter the developer must manually initialize the `CameraController`, pick a lens from the available ones by selecting `CameraLensDirection.back` and, most importantly, call `controller?.dispose()` in the widget's `dispose` method. If you forget, the camera stays locked by the operating system and other applications cannot open it.
- Ghost file problem (race condition): the `controller?.takePicture()` method in Flutter returns an `XFile` object whose snapshot is physically stored in the device cache directory (`image.path`). This is where a classic race comes into play. When Flutter happily reports that the photo has been taken and passes the path to the native code via MethodChannel, the native (Kotlin) part immediately tries to execute `BitmapFactory.decodeFile(imagePath)`. But at the level of the Android operating system the file in the cache may not be ready yet: the stream writing data from the camera buffer to disk has not had time to physically close.

This showed up in the logs as a hard failure:

`E/ple.flutter mvp: FrameInsert open fail: No such file or directory`

The native code failed, `BitmapFactory.decodeFile` returned null, and Flutter received null instead of a data structure. We get a similar error when sending the picture to the server, because we are effectively sending an empty image.

Solution: to eliminate this race, two-way protection was applied. On the Dart (Provider) side, before calling the native method or sending to the server, we artificially let the system "breathe out" by adding a micro-delay: `await Future.delayed(const Duration(milliseconds: 200));`. This is usually enough time for the OS to complete the disk operations, although a fixed delay is a heuristic rather than a guarantee.

Conclusions

The MVP study shows that the On-Device and On-Server approaches do not compete but complement each other.
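One way to picture how the two approaches can complement each other is a confidence-gated fallback: answer locally when the on-device model is confident, and ask the server only otherwise. The sketch below is plain Kotlin under stated assumptions and is not code from the repositories: `Prediction`, `Classifier`, `HybridClassifier`, and the `minLocalConfidence` value of 0.6 are hypothetical names and numbers chosen for illustration, and the two classifiers are stand-ins for the ML Kit and Hugging Face calls.

```kotlin
import java.io.IOException

// A labelled prediction: class name plus confidence in [0, 1].
data class Prediction(val label: String, val confidence: Float)

// Hypothetical abstraction over both back ends; returns null when nothing was recognized.
fun interface Classifier {
    fun classify(imageBytes: ByteArray): Prediction?
}

// Tries the local model first and calls the server only when the
// local answer is missing or below the confidence gate.
class HybridClassifier(
    private val onDevice: Classifier,
    private val onServer: Classifier,
    private val minLocalConfidence: Float = 0.6f // illustrative value, tune per task
) : Classifier {
    override fun classify(imageBytes: ByteArray): Prediction? {
        val local = onDevice.classify(imageBytes)
        if (local != null && local.confidence >= minLocalConfidence) return local

        val remote = try {
            onServer.classify(imageBytes)
        } catch (e: Exception) {
            null // offline or server error: fall back to whatever the device produced
        }
        return remote ?: local
    }
}

fun main() {
    // Stand-ins for the real models (assumptions for the demo only).
    val strongLocal = Classifier { Prediction("cat", 0.9f) }
    val weakLocal = Classifier { Prediction("cat", 0.45f) }
    val server = Classifier { Prediction("tabby cat", 0.97f) }
    val offline = Classifier { throw IOException("no network") }

    val image = ByteArray(0)
    println(HybridClassifier(strongLocal, server).classify(image)) // confident local answer is used
    println(HybridClassifier(weakLocal, server).classify(image))   // weak local answer, server is asked
    println(HybridClassifier(weakLocal, offline).classify(image))  // server unreachable, weak local answer kept
}
```

The design choice here is that a network failure never produces a worse result than the device alone would have given; whether a weak local label should be shown to the user in that case is a product decision this sketch does not settle.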
- On-Server is indispensable for heavy computing (LLMs, GPT, high-definition video processing).
- On-Device is ideal for utilitarian tasks (scanning documents, recognizing simple objects, working in strictly offline conditions).

In modern production applications the best practice is a hybrid approach: a fast primary result is produced locally, and deeper validation of the data is delegated to the backend server.

Regarding the choice of platform:
- Native Android gives fine-grained control over resources, hardware and threads out of the box.
- Flutter, despite the limitations of single-threaded Dart, lets you build responsive and performant AI applications, provided you use MethodChannel properly, follow the rules for dispatching coroutines in the native layer, and account for file system timings.

GitHub: RatRatatyu / mobile-ai-mvp (https://github.com/RatRatatyu/mobile-ai-mvp)

Two MVP applications demonstrating on-device and on-server AI model integration in Jetpack Compose Android and Flutter.

Mobile AI Integration: On-Device vs On-Server MVP Comparison

This repository contains two MVP applications developed for the International Scientific and Practical Conference "Student Research: Challenges and Development Trends".
🏫 Conference Information
- Event: International Scientific and Practical Conference "Student Research: Challenges and Development Trends"
- Organizers: Ministry of Education of the Republic of Kazakhstan, Department of Education of Aktobe Region, Aktobe Higher Humanitarian College, National Centre for Professional Development "Orleu"
- Section: Science, Technology, and Digital Innovations
- Date: May 22, 2026

📱 Project Overview

The project explores the architectural choice between running AI models directly on a smartphone (On-Device) versus processing them on a remote server (On-Server). For this research, image classification was chosen as the primary use case to demonstrate the differences in performance and accuracy.

🤖 Applied Models
- On-Device: powered by the ML Kit Image Labeling API from Google for local, real-time inference.
- On-Server: powered by the Hugging Face google/vit-base-patch16-224…

GitLab

I would like to note that I am still learning and growing in this area, so some of my conclusions may be inaccurate and some descriptions may not be entirely correct. I will be grateful if you point out my mistakes in the comments, and I will also be glad if you star the projects on GitHub and GitLab if you find them useful.