Integrating an LLM with Other Machine Learning Models

A developer built a support ticket intelligence pipeline that combines Oxlo.ai embeddings, a local random forest classifier, and an LLM to automate triage and draft contextual replies. The system uses Oxlo.ai's OpenAI-compatible API for both vector and text generation, enabling standardized workflows. By embedding historical tickets and training a classifier on those vectors, the pipeline predicts priority and retrieves similar cases to inform LLM-generated responses.

We are building a support ticket intelligence pipeline that classifies incoming requests and drafts contextual replies by combining Oxlo.ai embeddings, a local random forest classifier, and an LLM. It helps support teams cut response time by automating triage and surfacing relevant historical resolutions. The entire stack runs through Oxlo.ai's OpenAI-compatible API, so we can standardize on one provider for both vector and text generation workloads.

```
pip install openai scikit-learn numpy
```

I start by instantiating the client. Oxlo.ai exposes a fully OpenAI-compatible endpoint, so the standard SDK works without wrappers.

```python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ["OXLO_API_KEY"],
)
```

We need a knowledge base of resolved tickets. I embed them using Oxlo.ai's BGE-Large model so we can retrieve later and train a classifier on the same vectors.

```python
import numpy as np

historical_tickets = [
    {"id": 1, "text": "Cannot connect to database after latest deploy.", "priority": "high", "resolution": "Rollback the migration and re-run with SSL enabled."},
    {"id": 2, "text": "Logo is misaligned on the billing page in Safari.", "priority": "low", "resolution": "Add flex-center to the container CSS."},
    {"id": 3, "text": "Intermittent 500 errors on checkout API.", "priority": "high", "resolution": "Increase the connection pool size in Redis."},
    {"id": 4, "text": "Dark mode toggle missing in settings.", "priority": "low", "resolution": "Shipped in v2.4.1 under Appearance."},
    {"id": 5, "text": "Webhook signatures fail verification.", "priority": "high", "resolution": "Document the HMAC-SHA256 format and provide a code sample."},
]

def embed_text(text):
    response = client.embeddings.create(input=text, model="bge-large")
    return response.data[0].embedding

for t in historical_tickets:
    t["embedding"] = embed_text(t["text"])

print(f"Embedded {len(historical_tickets)} tickets.")
```
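Embedding tickets one request at a time is fine for five examples, but it adds latency as the knowledge base grows. The OpenAI embeddings endpoint accepts a list of inputs and returns one item per input, each tagged with an `index`. The sketch below assumes Oxlo.ai's compatible endpoint behaves the same way, which is worth confirming against its documentation. `embed_batch` and the stub client are hypothetical names introduced here so the example runs offline; in the real pipeline you would pass the `client` created earlier.

```python
from types import SimpleNamespace


def embed_batch(client, texts, model="bge-large"):
    """Embed several texts in one request and return vectors in input order."""
    response = client.embeddings.create(input=list(texts), model=model)
    # Each returned item carries the index of the input it belongs to,
    # so sorting by index restores the original order.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]


class _StubEmbeddings:
    """Offline stand-in (an assumption) that mimics the response shape."""

    def create(self, input, model):
        data = [
            SimpleNamespace(index=i, embedding=[float(len(text)), float(i)])
            for i, text in enumerate(input)
        ]
        # Return items reversed to show that embed_batch restores input order.
        return SimpleNamespace(data=list(reversed(data)))


stub_client = SimpleNamespace(embeddings=_StubEmbeddings())
vectors = embed_batch(stub_client, ["db down", "logo misaligned"])
print(vectors)
```

With a real client, `embed_batch(client, [t["text"] for t in historical_tickets])` would replace the per-ticket loop, provided the endpoint accepts list input.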
Now we treat the embeddings as feature vectors and train a lightweight random forest to predict priority. This is where traditional ML meets the LLM stack.

```python
from sklearn.ensemble import RandomForestClassifier

X = np.array([t["embedding"] for t in historical_tickets])
y = [t["priority"] for t in historical_tickets]

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X, y)

print("Classifier trained on embedding features.")
```

When a new ticket arrives, we find the closest historical matches by cosine similarity over the same embedding space.

```python
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_similar(embedding, tickets, top_k=2):
    embeddings = np.array([t["embedding"] for t in tickets])
    sims = cosine_similarity([embedding], embeddings)[0]
    top_indices = np.argsort(sims)[-top_k:][::-1]
    return [tickets[i] for i in top_indices]
```

The system prompt tells the LLM how to behave, referencing the classification and retrieved context we will inject.

```python
SYSTEM_PROMPT = """You are a senior support engineer assistant. Your job is to:
1. Acknowledge the user's issue in one sentence.
2. State the predicted priority and explain why based on similar tickets.
3. Draft a concise, actionable reply that references the resolution of the most relevant historical ticket.
Keep the tone technical and direct. Do not ask the user to verify information already provided."""
```

This function ties the pieces together: embed the new ticket, predict priority, retrieve neighbors, and call Oxlo.ai's Llama 3.3 70B to generate the response.
```python
def process_ticket(ticket_text):
    # Embed the incoming ticket
    emb = embed_text(ticket_text)

    # Predict priority using the local classifier
    priority = clf.predict([emb])[0]

    # Retrieve similar historical tickets
    neighbors = retrieve_similar(emb, historical_tickets, top_k=2)

    # Build the context block
    context = "\n\n".join(
        f"Ticket: {n['text']}\nPriority: {n['priority']}\nResolution: {n['resolution']}"
        for n in neighbors
    )

    user_message = (
        f"New ticket: {ticket_text}\n"
        f"Predicted priority: {priority}\n\n"
        f"Historical context:\n{context}"
    )

    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )

    return priority, response.choices[0].message.content
```

Here is a realistic incoming ticket and the output from the pipeline.

```python
ticket = "API keys rotated this morning and now all requests return 401 Unauthorized."

priority, reply = process_ticket(ticket)

print(f"Predicted priority: {priority}")
print("---")
print(reply)
```

Example output:

```
Predicted priority: high
---
We see you are hitting authentication failures after a key rotation.

Priority: high. This matches previous high-priority incidents involving verification and access issues.

Action: Please verify that your integration is using the newly issued key format. Refer to the resolution for webhook signatures fail verification to adjust your header calculation and ensure the HMAC-SHA256 code sample is applied correctly.
```

This pipeline shows how to treat an LLM as one component in a broader ML system rather than a monolithic black box. Oxlo.ai simplifies the architecture because embeddings and chat completions share one endpoint and one request-based pricing model, which keeps costs predictable when you scale to thousands of tickets. A concrete next step is to replace the random forest with a fine-tuned classifier hosted on Oxlo.ai, or stream the generated replies into a Slack webhook for real-time triage.
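As a rough sketch of the Slack idea, the snippet below formats a triage result as an incoming-webhook message. Slack incoming webhooks accept a JSON body with a `text` field. The helper names `build_slack_request` and `post_to_slack` and the placeholder URL are assumptions introduced for illustration, not part of the pipeline above. The example uses only the standard library, and the demo at the bottom builds the request without sending it.

```python
import json
import urllib.request


def build_slack_request(webhook_url, ticket_text, priority, reply):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    payload = {
        "text": (
            f"*New ticket* ({priority} priority)\n"
            f"> {ticket_text}\n\n"
            f"*Draft reply:*\n{reply}"
        )
    }
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_to_slack(request, timeout=10):
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status


# Placeholder URL (hypothetical); a real one comes from your Slack app settings.
request = build_slack_request(
    "https://hooks.slack.com/services/PLACEHOLDER",
    ticket_text="API keys rotated this morning and now all requests return 401 Unauthorized.",
    priority="high",
    reply="Draft reply goes here.",
)
print(request.get_method())
print(json.loads(request.data)["text"])
```

In the pipeline, the `priority` and `reply` values returned by `process_ticket` would be passed to `build_slack_request`, and `post_to_slack` would be called only once a real webhook URL is configured.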