{"slug": "integrating-llm-with-other-machine-learning-models", "title": "Integrating LLM with Other Machine Learning Models", "summary": "A developer built a support ticket intelligence pipeline that combines Oxlo.ai embeddings, a local random forest classifier, and an LLM to automate triage and draft contextual replies. The system uses Oxlo.ai's OpenAI-compatible API for both embeddings and text generation, so both model calls go through one SDK and one provider. By embedding historical tickets and training a classifier on those vectors, the pipeline predicts priority and retrieves similar cases to inform LLM-generated responses.", "body_md": "We are building a support ticket intelligence pipeline that classifies incoming requests and drafts contextual replies by combining Oxlo.ai embeddings, a local random forest classifier, and an LLM. It helps support teams cut response time by automating triage and surfacing relevant historical resolutions. Both model calls, embeddings and chat completions, go through Oxlo.ai's OpenAI-compatible API, so we can standardize on one provider for embedding and text-generation workloads, while the classifier runs locally with scikit-learn.\n\n`pip install openai scikit-learn numpy`\n\nWe start by instantiating the client. Oxlo.ai exposes a fully OpenAI-compatible endpoint, so the standard `openai` SDK works without wrappers: we only point `base_url` at Oxlo.ai and read the key from an environment variable.\n\n``` python\nimport os\n\nfrom openai import OpenAI\n\n# Reuse the official OpenAI client against Oxlo.ai's OpenAI-compatible base URL\nclient = OpenAI(\n    base_url=\"https://api.oxlo.ai/v1\",\n    api_key=os.environ[\"OXLO_API_KEY\"],\n)\n```\n\nWe need a knowledge base of resolved tickets. 
We embed them with Oxlo.ai's BGE-Large model so that the same vectors can serve both similarity retrieval and classifier training.\n\n``` python\nimport numpy as np\n\n# A toy knowledge base: each resolved ticket has its text, a priority label, and the fix that worked\nhistorical_tickets = [\n    {\"id\": 1, \"text\": \"Cannot connect to database after latest deploy.\", \"priority\": \"high\", \"resolution\": \"Rollback the migration and re-run with SSL enabled.\"},\n    {\"id\": 2, \"text\": \"Logo is misaligned on the billing page in Safari.\", \"priority\": \"low\", \"resolution\": \"Add flex-center to the container CSS.\"},\n    {\"id\": 3, \"text\": \"Intermittent 500 errors on checkout API.\", \"priority\": \"high\", \"resolution\": \"Increase the connection pool size in Redis.\"},\n    {\"id\": 4, \"text\": \"Dark mode toggle missing in settings.\", \"priority\": \"low\", \"resolution\": \"Shipped in v2.4.1 under Appearance.\"},\n    {\"id\": 5, \"text\": \"Webhook signatures fail verification.\", \"priority\": \"high\", \"resolution\": \"Document the HMAC-SHA256 format and provide a code sample.\"},\n]\n\ndef embed_text(text):\n    \"\"\"Return the BGE-Large embedding (a list of floats) for a single string.\"\"\"\n    response = client.embeddings.create(\n        input=text,\n        model=\"bge-large\",\n    )\n    return response.data[0].embedding\n\n# One API call per ticket; the vector is stored alongside the ticket for later reuse\nfor t in historical_tickets:\n    t[\"embedding\"] = embed_text(t[\"text\"])\n\nprint(f\"Embedded {len(historical_tickets)} tickets.\")\n```\n\nNow we treat the embeddings as feature vectors and train a lightweight random forest to predict priority. 
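This is where traditional ML meets the LLM stack: the embedding model does the feature extraction, and a classical classifier does the prediction.\n\n``` python\nfrom sklearn.ensemble import RandomForestClassifier\n\n# Feature matrix: one row per ticket, one column per embedding dimension\nX = np.array([t[\"embedding\"] for t in historical_tickets])\n# Labels: the priority assigned when each ticket was resolved\ny = [t[\"priority\"] for t in historical_tickets]\n\n# Five examples are enough to demo the wiring, not to produce a reliable model;\n# in practice, train on many labeled tickets and evaluate on a held-out set\nclf = RandomForestClassifier(n_estimators=100, random_state=42)\nclf.fit(X, y)\n\nprint(\"Classifier trained on embedding features.\")\n```\n\nWhen a new ticket arrives, we find the closest historical matches by cosine similarity over the same embedding space.\n\n``` python\nimport numpy as np\nfrom sklearn.metrics.pairwise import cosine_similarity\n\ndef retrieve_similar(embedding, tickets, top_k=2):\n    \"\"\"Return the top_k tickets whose embeddings are most similar to `embedding`.\"\"\"\n    embeddings = np.array([t[\"embedding\"] for t in tickets])\n    # Similarity of the query against every stored ticket, shape (len(tickets),)\n    sims = cosine_similarity([embedding], embeddings)[0]\n    # argsort is ascending, so take the last top_k indices and reverse them (best first)\n    top_indices = np.argsort(sims)[-top_k:][::-1]\n    return [tickets[i] for i in top_indices]\n```\n\nThe system prompt defines the assistant's behavior and refers to the predicted priority and retrieved tickets that we inject into each user message.\n\n``` python\nSYSTEM_PROMPT = \"\"\"You are a senior support engineer assistant.\n\nYour job is to:\n1. Acknowledge the user's issue in one sentence.\n2. State the predicted priority and explain why based on similar tickets.\n3. Draft a concise, actionable reply that references the resolution of the most relevant historical ticket.\n\nKeep the tone technical and direct. 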
Do not ask the user to verify information already provided.\"\"\"\n```\n\nThis function ties the pieces together: embed the new ticket, predict its priority, retrieve its nearest neighbors, and call Oxlo.ai's Llama 3.3 70B to generate the reply.\n\n``` python\ndef process_ticket(ticket_text):\n    \"\"\"Return (predicted_priority, drafted_reply) for a new ticket.\"\"\"\n    # Embed the incoming ticket in the same space as the knowledge base\n    emb = embed_text(ticket_text)\n\n    # Predict priority using the local classifier\n    priority = clf.predict([emb])[0]\n\n    # Retrieve similar historical tickets\n    neighbors = retrieve_similar(emb, historical_tickets, top_k=2)\n\n    # Build the context block from the retrieved tickets\n    context = \"\\n\\n\".join(\n        f\"Ticket: {n['text']}\\nPriority: {n['priority']}\\nResolution: {n['resolution']}\"\n        for n in neighbors\n    )\n\n    user_message = (\n        f\"New ticket: {ticket_text}\\n\"\n        f\"Predicted priority: {priority}\\n\\n\"\n        f\"Historical context:\\n{context}\"\n    )\n\n    # The LLM sees the classifier output and the retrieved context in one user turn\n    response = client.chat.completions.create(\n        model=\"llama-3.3-70b\",\n        messages=[\n            {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n            {\"role\": \"user\", \"content\": user_message},\n        ],\n    )\n\n    return priority, response.choices[0].message.content\n```\n\nHere is a realistic incoming ticket and the output from the pipeline.\n\n``` python\nticket = \"API keys rotated this morning and now all requests return 401 Unauthorized.\"\n\npriority, reply = process_ticket(ticket)\n\nprint(f\"Predicted priority: {priority}\")\nprint(\"---\")\nprint(reply)\n```\n\nExample output:\n\n```\nPredicted priority: high\n---\n\nWe see you are hitting authentication failures after a key rotation.\n\nPriority: high. This matches previous high-priority incidents involving verification and access issues.\n\nAction: Please verify that your integration is using the newly issued key format. 
Refer to the resolution for webhook signatures fail verification to adjust your header calculation and ensure the HMAC-SHA256 code sample is applied correctly.\n```\n\nThe draft is only as good as the retrieved context: with five tickets in the knowledge base, the context for an API-key problem includes the webhook-signature ticket, so the suggested action borrows an HMAC fix that does not fit a rotated API key. A larger knowledge base of resolved tickets would likely make the retrieved resolutions more relevant.\n\nThis pipeline shows how to treat an LLM as one component in a broader ML system rather than a monolithic black box. Oxlo.ai simplifies the architecture because embeddings and chat completions share one base URL, one API key, and one request-based pricing model; each new ticket costs exactly one embedding call and one chat completion, which keeps costs predictable when you scale to thousands of tickets. A concrete next step is to replace the random forest with a fine-tuned classifier hosted on Oxlo.ai, or to stream the generated replies into a Slack webhook for real-time triage.", "url": "https://wpnews.pro/news/integrating-llm-with-other-machine-learning-models", "canonical_source": "https://dev.to/shashank_ms_6a35baa4be138/integrating-llm-with-other-machine-learning-models-c0o", "published_at": "2026-06-18 05:36:25+00:00", "updated_at": "2026-06-18 05:51:25.200094+00:00", "lang": "en", "topics": ["machine-learning", "large-language-models", "natural-language-processing", "ai-tools", "developer-tools"], "entities": ["Oxlo.ai", "OpenAI", "scikit-learn", "Llama 3.3 70B", "BGE-Large", "RandomForestClassifier", "Redis", "HMAC-SHA256"], "alternates": {"html": "https://wpnews.pro/news/integrating-llm-with-other-machine-learning-models", "markdown": "https://wpnews.pro/news/integrating-llm-with-other-machine-learning-models.md", "text": "https://wpnews.pro/news/integrating-llm-with-other-machine-learning-models.txt", "jsonld": "https://wpnews.pro/news/integrating-llm-with-other-machine-learning-models.jsonld"}}