Building a Real-Time Camera Classifier

This article walks through the architecture and implementation of a real-time, camera-based object classifier and its uses in applications such as retail analytics, security, and self-driving cars. It is a technical guide to building the system, covering hardware requirements, software dependencies, and the code structure for camera input and model management. The project uses Python libraries such as OpenCV and TensorFlow to capture video frames and classify objects from a structured image dataset.

Ever wondered how the interactive displays in modern malls identify objects, such as glasses or accessories, in real time? These systems rely on computer vision models that classify live video input into predefined categories. This article outlines the architecture and implementation of a custom camera-based object classifier.

Usage of Camera Classifier

Camera classifiers are useful wherever visual identification must happen automatically, without human intervention. Common use cases include:

- Retail Analytics: Identifying products or accessories a customer is trying on.
- Security & Surveillance: Detecting specific items or prohibited objects.
- Human-Computer Interaction: Enabling gesture- or item-based control interfaces.
- Quality Control: Automatically sorting objects on an assembly line based on visual appearance.

Famous Examples of Camera Classifiers

- Google Lens: A sophisticated classifier that identifies objects, plants, and text in real time.
- Self-Driving Car Vision Systems: Used to classify road signs, pedestrians, and other vehicles to support safe navigation.
- Smart Home Appliances: Cameras in refrigerators or ovens that identify food items to suggest recipes.

Prerequisites

To make sure the code runs correctly and avoids common runtime errors, verify the following before running the script:

- Hardware: A working webcam must be connected to your system and recognized by your operating system.
- System Permissions:
  - macOS: If you run the code from a terminal or an IDE such as VS Code or PyCharm, make sure that application has been granted Camera access in System Settings (Privacy & Security).
  - Linux: Your user account needs access to the video device (e.g., /dev/video0); on many distributions this means membership in the video group.
- Common Troubleshooting: If you encounter a PermissionError or an OSError: [Errno 16] Device or resource busy, the webcam is typically already in use by another application (e.g., Zoom, Microsoft Teams, or a browser tab).
  Close any other applications that may be using the camera and try again. With OpenCV, a busy or inaccessible camera usually shows up as `cv.VideoCapture` failing to open, which the `Camera` class in Step 2 reports as `ValueError: Unable to open camera.`

Note: If you are working in a virtual machine or a container such as Docker, make sure the device path (e.g., /dev/video0) is mapped into and accessible from that environment (for Docker, via the `--device` flag).

Implementation

Step 1: Environment Setup

This project needs libraries for image processing, GUI creation, and deep learning. Install them from a terminal:

```bash
pip install opencv-python tensorflow pillow numpy
```

tkinter, used for the GUI, ships with the standard python.org installers and is not installed through pip; some Linux distributions package it separately (for example, python3-tk on Debian and Ubuntu).

Project Directory Structure

Organize the dataset as follows so that `tf.keras.utils.image_dataset_from_directory` can infer the labels from the folder names (1 and 2):

```
/your_project_folder
├── 1/
│   ├── frame1.jpg
│   └── frame2.jpg
├── 2/
│   ├── frame1.jpg
│   └── frame2.jpg
├── camera.py
├── model.py
└── app.py
```

Step 2: Creating the Camera Module (camera.py)

The camera.py file is the interface between your physical hardware and the software. The implementation is broken down by method below to show how video data is handled.

Sub-Step 2.1: Initialization (`__init__`)
This method opens a connection to the default webcam (index 0) and reads the video feed dimensions, which are used later to size the GUI canvas.

Sub-Step 2.2: Clean Shutdown (`__del__`)
This destructor releases the camera hardware when the `Camera` object is garbage-collected (for example, when the application closes), so the camera is not left "busy" or locked.

Sub-Step 2.3: Frame Acquisition (`get_frame`)
This is the core method. It captures a single frame from the video stream and converts its color space from BGR (the OpenCV default) to RGB (the order expected for display and further processing).
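The color conversion in Sub-Step 2.3 can be illustrated without a webcam. The sketch below is a minimal illustration that assumes only NumPy; the helper `bgr_to_rgb` is hypothetical and not part of camera.py, which keeps using `cv.cvtColor`. For a three-channel image, converting BGR to RGB amounts to reversing the order of the last (channel) axis.

```python
import numpy as np


def bgr_to_rgb(frame):
    """Hypothetical helper: reverse the channel axis of an H x W x 3 BGR image.

    For 3-channel images this gives the same result as
    cv.cvtColor(frame, cv.COLOR_BGR2RGB).
    """
    if frame.ndim != 3 or frame.shape[2] != 3:
        raise ValueError("expected an H x W x 3 image")
    # frame[..., ::-1] is a view with a negative stride; copy it into a
    # regular C-ordered array so downstream code gets a contiguous buffer
    return np.ascontiguousarray(frame[..., ::-1])


# A 1 x 2 synthetic "frame" in OpenCV's BGR order:
# the first pixel is pure blue, the second pure red.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
rgb = bgr_to_rgb(bgr)
print(rgb.tolist())  # [[[0, 0, 255], [255, 0, 0]]]
```

The pixel values themselves are untouched; only their channel order changes, which is why skipping this step makes blue objects appear red on screen and vice versa.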
Implementation Code

```python
import cv2 as cv


class Camera:
    # Sub-Step 2.1: Initialize the hardware connection
    def __init__(self):
        self.camera = cv.VideoCapture(0)
        if not self.camera.isOpened():
            raise ValueError('Unable to open camera.')
        # Fetch the frame dimensions for GUI scaling
        self.width = self.camera.get(cv.CAP_PROP_FRAME_WIDTH)
        self.height = self.camera.get(cv.CAP_PROP_FRAME_HEIGHT)

    # Sub-Step 2.2: Ensure proper resource release
    def __del__(self):
        if self.camera.isOpened():
            self.camera.release()

    # Sub-Step 2.3: Process and return the current frame
    def get_frame(self):
        """Return (success, frame), where frame is an RGB array or None."""
        if self.camera.isOpened():
            ret, frame = self.camera.read()
            if ret:
                # Convert BGR to RGB for standard image processing
                return ret, cv.cvtColor(frame, cv.COLOR_BGR2RGB)
            return ret, None
        # Always return a pair so callers can safely unpack "ret, frame"
        return False, None
```

Step 3: Creating the Model Module (model.py)

The model.py file is the intelligence core of the application. It manages data ingestion, the neural network architecture, and the lifecycle of the classifier (training, saving, and inference).

Sub-Step 3.1: Data Preparation (`load_data`)
This function reads your images from disk as 64×64 grayscale images into a `tf.data.Dataset`, splits them into training and validation sets, and applies a normalization layer that scales pixel values to the [0, 1] range.

Sub-Step 3.2: Architecture Design (`create_model`)
Here we define a Convolutional Neural Network (CNN). Conv2D layers extract visual features and MaxPooling2D layers reduce spatial dimensionality; a final Dense layer with a softmax activation outputs a probability for each of the two classes.

Sub-Step 3.3: Training Procedure (`train`)
This function calls the data loader and the model builder, trains the network for a fixed number of epochs, and saves the trained model to a file so you don't have to retrain every time you open the app.

Sub-Step 3.4: Loading and Inference (`load_trained_model` & `predict`)
`load_trained_model` loads the saved model file if it exists, so you can resume work without retraining.
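The dimensionality reduction described in Sub-Step 3.2 can be checked by hand. The following plain-Python sketch traces the feature-map shape through the CNN defined in model.py. The helper names are hypothetical, not Keras APIs, and the sketch assumes the Keras defaults: stride 1 and "valid" (no) padding for Conv2D, and strides equal to the pool size with "valid" padding for MaxPooling2D.

```python
def conv2d_valid(size, kernel=3):
    # Conv2D with stride 1 and no padding shrinks each side by kernel - 1
    return size - kernel + 1


def maxpool2d(size, pool=2):
    # MaxPooling2D with strides == pool size and no padding (floor division)
    return (size - pool) // pool + 1


def trace_shapes(size=64):
    """Return (layer, output_shape) pairs for the Sub-Step 3.2 CNN, batch dim omitted."""
    trace = []
    size = conv2d_valid(size)
    trace.append(("Conv2D(32)", (size, size, 32)))
    size = maxpool2d(size)
    trace.append(("MaxPooling2D", (size, size, 32)))
    size = conv2d_valid(size)
    trace.append(("Conv2D(64)", (size, size, 64)))
    size = maxpool2d(size)
    trace.append(("MaxPooling2D", (size, size, 64)))
    trace.append(("Flatten", (size * size * 64,)))
    trace.append(("Dense(64)", (64,)))
    trace.append(("Dense(2)", (2,)))
    return trace


for layer, shape in trace_shapes():
    print(f"{layer:<14} {shape}")
```

The spatial size goes 64 → 62 → 31 → 29 → 14, so Flatten produces 14 × 14 × 64 = 12,544 values. That vector feeds Dense(64), which therefore holds 12,544 × 64 + 64 = 802,880 parameters, roughly 98% of the network's 821,826 trainable parameters; `model.summary()` should report the same shapes.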
`predict` converts a raw RGB camera frame to grayscale, resizes and reshapes it to match the network's expected input format, then returns the predicted class index.

Implementation Code

```python
import os

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Global configuration
IMAGE_SIZE = (64, 64)
BATCH_SIZE = 16
MODEL_PATH = 'Camera_classifier.keras'
DATA_DIR = r"YOUR PATH HERE"  # Update to the folder that contains 1/ and 2/


# Sub-Step 3.1: Load, split, and normalize images
def load_data():
    # validation_split + subset="both" (TensorFlow 2.10+) gives a fixed,
    # non-overlapping train/validation split; class_names restricts the
    # labels to the 1/ and 2/ folders (ignoring e.g. __pycache__/)
    train_ds, val_ds = tf.keras.utils.image_dataset_from_directory(
        DATA_DIR,
        class_names=['1', '2'],
        image_size=IMAGE_SIZE,
        batch_size=BATCH_SIZE,
        color_mode="grayscale",
        validation_split=0.2,
        subset="both",
        seed=123,
    )
    # Scale pixel values from [0, 255] to [0, 1]
    normalization_layer = layers.Rescaling(1. / 255)
    train_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))
    val_ds = val_ds.map(lambda x, y: (normalization_layer(x), y))
    return train_ds.prefetch(tf.data.AUTOTUNE), val_ds.prefetch(tf.data.AUTOTUNE)


# Sub-Step 3.2: Define the CNN structure
def create_model():
    model = models.Sequential([
        layers.Input(shape=(*IMAGE_SIZE, 1)),  # 64x64 grayscale input
        layers.Conv2D(32, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(2, activation='softmax'),  # one probability per class
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model


# Sub-Step 3.3: Train and save
def train():
    train_ds, val_ds = load_data()
    model = create_model()
    model.fit(train_ds, epochs=10, validation_data=val_ds)
    model.save(MODEL_PATH)
    return model


# Sub-Step 3.4: Helper functions for loading and prediction
def load_trained_model():
    return tf.keras.models.load_model(MODEL_PATH) if os.path.exists(MODEL_PATH) else None


def predict(frame, model):
    # The model was trained on grayscale images, so convert the RGB frame first
    img = tf.image.rgb_to_grayscale(frame)          # (H, W, 1)
    img = tf.image.resize(img, IMAGE_SIZE) / 255.0  # (64, 64, 1), floats in [0, 1]
    img = np.expand_dims(img, axis=0)               # add batch dim -> (1, 64, 64, 1)
    return int(np.argmax(model.predict(img, verbose=0), axis=1)[0])
```

Step 4: Creating the Application Interface (app.py)

The app.py file serves as the command center.
It integrates the camera module for data acquisition and the model module for intelligence, and presents them through a graphical user interface (GUI) built with tkinter.

Sub-Step 4.1: Setup and Initialization (`__init__`)
This method creates the window, sets up the camera and model instances, and prompts the user for the two class names. It then starts the update loop, which keeps the video feed refreshing, and hands control to the tkinter main loop.

Sub-Step 4.2: Building the GUI (`init_gui`)
This method defines the layout. It creates the canvas for video display and populates the window with buttons to capture training data for each class, train the model, toggle automatic prediction, and reset the environment.

Sub-Step 4.3: Data Collection (`save_for_class`)
When a class button is clicked, this method grabs a frame from the camera and saves it into the corresponding folder (1/ or 2/). This is how you build your training dataset.

Sub-Step 4.4: Model Management & Reset (`train_model` & `reset`)
`train_model` calls the training routine from model.py. `reset` deletes the saved image files and resets the frame counters, so you can start a new classification task from scratch.

Sub-Step 4.5: The Runtime Loop (`update`)
This is the heartbeat of the app. It runs roughly every 15 ms, refreshing the canvas with the latest camera frame and, when auto-prediction is enabled, running the model to display the current class.
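One side effect of the counter scheme in Sub-Step 4.3 is that the counters start at 1 every time the app is launched, so new captures overwrite the existing frame1.jpg, frame2.jpg, and so on. A minimal sketch of one workaround, assuming the frameN.jpg naming used by `save_for_class`, is to derive the next index from the files already on disk. The helper `next_frame_path` is hypothetical and not part of the original app.py.

```python
import os
import re

# Matches the file names written by save_for_class, e.g. "frame12.jpg"
FRAME_PATTERN = re.compile(r"^frame(\d+)\.jpg$")


def next_frame_path(class_dir):
    """Hypothetical helper: return a path for the next frame without overwriting.

    Scans class_dir for existing frameN.jpg files and picks N = max + 1,
    creating the folder if it does not exist yet.
    """
    os.makedirs(class_dir, exist_ok=True)
    indices = [
        int(match.group(1))
        for name in os.listdir(class_dir)
        if (match := FRAME_PATTERN.match(name))
    ]
    next_index = max(indices, default=0) + 1
    return os.path.join(class_dir, f"frame{next_index}.jpg")
```

Inside `save_for_class`, the file name passed to `cv.imwrite` could then be `next_frame_path(str(class_num))`, which also replaces the separate check that creates the folder. Files that do not follow the frameN.jpg pattern are ignored when choosing the index.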
Implementation Code

```python
import os
import tkinter as tk
from tkinter import simpledialog

import cv2 as cv
import PIL.Image
import PIL.ImageTk

import camera  # must match the file name camera.py on case-sensitive file systems
import model


class App:
    # Sub-Step 4.1: Initialize app state
    def __init__(self, window=None, window_title="Camera Classifier"):
        # Create the Tk root here; a tk.Tk() default argument would be
        # evaluated once, at class-definition (import) time
        self.window = window if window is not None else tk.Tk()
        self.window.title(window_title)
        self.counters = [1, 1]  # a list, so each counter can be incremented in place
        self.auto_predict = False
        self.camera = camera.Camera()
        self.model = model.load_trained_model()
        # askstring returns None if the dialog is cancelled
        self.classname_one = simpledialog.askstring("Class 1", "Enter name:") or "Class 1"
        self.classname_two = simpledialog.askstring("Class 2", "Enter name:") or "Class 2"
        self.init_gui()
        self.update()
        self.window.mainloop()

    # Sub-Step 4.2: Construct the UI layout
    def init_gui(self):
        self.canvas = tk.Canvas(self.window, width=self.camera.width, height=self.camera.height)
        self.canvas.pack()
        tk.Button(self.window, text="Toggle Auto", command=self.auto_predict_toggle).pack()
        tk.Button(self.window, text=self.classname_one, command=lambda: self.save_for_class(1)).pack()
        tk.Button(self.window, text=self.classname_two, command=lambda: self.save_for_class(2)).pack()
        tk.Button(self.window, text="Train Model", command=self.train_model).pack()
        tk.Button(self.window, text="Reset", command=self.reset).pack()
        self.class_label = tk.Label(self.window, text="CLASS", font=("Arial", 20))
        self.class_label.pack()

    # Sub-Step 4.3: Save frames for training
    def save_for_class(self, class_num):
        ret, frame = self.camera.get_frame()
        if not ret:
            return  # no frame available, nothing to save
        os.makedirs(str(class_num), exist_ok=True)
        # Frames are RGB; cv.imwrite expects BGR, so convert back before saving
        cv.imwrite(f'{class_num}/frame{self.counters[class_num - 1]}.jpg',
                   cv.cvtColor(frame, cv.COLOR_RGB2BGR))
        self.counters[class_num - 1] += 1

    # Sub-Step 4.4: Train and reset functionality
    def train_model(self):
        self.model = model.train()

    def reset(self):
        for d in ('1', '2'):
            if os.path.isdir(d):
                for f in os.listdir(d):
                    os.unlink(os.path.join(d, f))
        self.counters = [1, 1]

    # Sub-Step 4.5: Main UI refresh loop
    def update(self):
        ret, frame = self.camera.get_frame()
        if ret:
            self.photo = PIL.ImageTk.PhotoImage(image=PIL.Image.fromarray(frame))
            self.canvas.create_imag