We are building a support ticket intelligence pipeline that classifies incoming requests and drafts contextual replies by combining Oxlo.ai embeddings, a local random forest classifier, and an LLM. It helps support teams cut response time by automating triage and surfacing relevant historical resolutions. The entire stack runs through Oxlo.ai's OpenAI-compatible API, so we can standardize on one provider for both vector and text generation workloads.
pip install openai scikit-learn numpy
I start by instantiating the client. Oxlo.ai exposes a fully OpenAI-compatible endpoint, so the standard SDK works without wrappers.
from openai import OpenAI
import os
client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ["OXLO_API_KEY"]
)
We need a knowledge base of resolved tickets. I embed them using Oxlo.ai's BGE-Large model so we can retrieve them later and train a classifier on the same vectors.
import numpy as np
historical_tickets = [
    {"id": 1, "text": "Cannot connect to database after latest deploy.", "priority": "high", "resolution": "Rollback the migration and re-run with SSL enabled."},
    {"id": 2, "text": "Logo is misaligned on the billing page in Safari.", "priority": "low", "resolution": "Add flex-center to the container CSS."},
    {"id": 3, "text": "Intermittent 500 errors on checkout API.", "priority": "high", "resolution": "Increase the connection pool size in Redis."},
    {"id": 4, "text": "Dark mode toggle missing in settings.", "priority": "low", "resolution": "Shipped in v2.4.1 under Appearance."},
    {"id": 5, "text": "Webhook signatures fail verification.", "priority": "high", "resolution": "Document the HMAC-SHA256 format and provide a code sample."},
]
def embed_text(text):
    response = client.embeddings.create(
        input=text,
        model="bge-large"
    )
    return response.data[0].embedding

for t in historical_tickets:
    t["embedding"] = embed_text(t["text"])
print(f"Embedded {len(historical_tickets)} tickets.")
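The loop above sends one request per ticket. The OpenAI SDK's `embeddings.create` also accepts a list of strings and returns one item per input, each with an `index` field. The sketch below batches the call under the assumption that Oxlo.ai's endpoint accepts list input the same way; that is not confirmed here. `embed_batch` and the fake client are hypothetical names used only so the example runs offline.

```python
from types import SimpleNamespace  # used only for the offline stand-in client


def embed_batch(client, texts, model="bge-large"):
    """Embed several texts in one request and return vectors in input order."""
    response = client.embeddings.create(input=list(texts), model=model)
    # Each returned item carries an `index` into the input list; sorting on it
    # keeps the vectors aligned with `texts` regardless of response order.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]


class _FakeEmbeddings:
    """Hypothetical stand-in that mimics the response shape without a network call."""

    def create(self, input, model):
        items = [
            SimpleNamespace(index=i, embedding=[float(len(text))])
            for i, text in enumerate(input)
        ]
        # Return the items reversed to show why sorting by index matters.
        return SimpleNamespace(data=list(reversed(items)))


fake_client = SimpleNamespace(embeddings=_FakeEmbeddings())
vectors = embed_batch(fake_client, ["ab", "abcd"])
print(vectors)
```

With the real client, the call would look like `embed_batch(client, [t["text"] for t in historical_tickets])`, followed by assigning each vector back to its ticket.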
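Now we treat the embeddings as feature vectors and train a lightweight random forest to predict priority. This is where traditional ML meets the LLM stack. Five tickets are enough to demonstrate the wiring but not to produce a reliable model; in practice you would train on a much larger labeled history.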
from sklearn.ensemble import RandomForestClassifier
X = np.array([t["embedding"] for t in historical_tickets])
y = [t["priority"] for t in historical_tickets]
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X, y)
print("Classifier trained on embedding features.")
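A classifier trained on so few examples can be confidently wrong, so it may help to look at class probabilities rather than only the predicted label. In scikit-learn, `clf.predict_proba([emb])[0]` returns one probability per class, ordered like `clf.classes_`. The sketch below shows one way to route low-confidence predictions to a human; `route_by_confidence`, the `"needs_review"` label, and the 0.7 threshold are illustrative assumptions, not part of the original pipeline.

```python
def route_by_confidence(classes, probabilities, threshold=0.7):
    """Return (label, confidence); fall back to "needs_review" below the threshold."""
    best_index = max(range(len(classes)), key=lambda i: probabilities[i])
    confidence = float(probabilities[best_index])
    label = classes[best_index] if confidence >= threshold else "needs_review"
    return label, confidence


# Made-up probabilities for illustration. With the trained model they would be
# clf.classes_ and clf.predict_proba([emb])[0].
confident = route_by_confidence(["high", "low"], [0.9, 0.1])
uncertain = route_by_confidence(["high", "low"], [0.55, 0.45])
print(confident)
print(uncertain)
```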
When a new ticket arrives, we find the closest historical matches by cosine similarity over the same embedding space.
from sklearn.metrics.pairwise import cosine_similarity
def retrieve_similar(embedding, tickets, top_k=2):
    embeddings = np.array([t["embedding"] for t in tickets])
    sims = cosine_similarity([embedding], embeddings)[0]
    top_indices = np.argsort(sims)[-top_k:][::-1]
    return [tickets[i] for i in top_indices]
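To make the retrieval step concrete, here is a small standard-library sketch of the same idea on toy two-dimensional vectors: compute cosine similarity against every stored vector, then keep the indices of the highest scores. The helper names and toy data are made up for illustration; real embeddings have far more dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_indices(query, vectors, k=2):
    """Indices of the k vectors most similar to the query, best match first."""
    scores = [cosine(query, v) for v in vectors]
    return sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)[:k]


toy_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]  # same direction as the first vector, different length
print(top_k_indices(query, toy_vectors))
```

The query is twice as long as the first toy vector yet still scores a similarity of 1 against it, because cosine similarity compares direction only.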
The system prompt tells the LLM how to behave, referencing the classification and retrieved context we will inject.
SYSTEM_PROMPT = """You are a senior support engineer assistant.
Your job is to:
1. Acknowledge the user's issue in one sentence.
2. State the predicted priority and explain why based on similar tickets.
3. Draft a concise, actionable reply that references the resolution of the most relevant historical ticket.
Keep the tone technical and direct. Do not ask the user to verify information already provided."""
This function ties the pieces together: embed the new ticket, predict priority, retrieve neighbors, and call Oxlo.ai's Llama 3.3 70B to generate the response.
def process_ticket(ticket_text):
    emb = embed_text(ticket_text)
    priority = clf.predict([emb])[0]
    neighbors = retrieve_similar(emb, historical_tickets, top_k=2)
    context = "\n\n".join(
        f"Ticket: {n['text']}\nPriority: {n['priority']}\nResolution: {n['resolution']}"
        for n in neighbors
    )
    user_message = (
        f"New ticket: {ticket_text}\n"
        f"Predicted priority: {priority}\n\n"
        f"Historical context:\n{context}"
    )
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return priority, response.choices[0].message.content
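When debugging prompts, it can be useful to inspect exactly what the LLM receives without spending an API call. The sketch below pulls the message assembly from `process_ticket` into a standalone helper; `build_user_message` is a hypothetical name and the sample neighbor is copied from the historical tickets above.

```python
def build_user_message(ticket_text, priority, neighbors):
    """Assemble the user message in the same layout process_ticket uses."""
    context = "\n\n".join(
        f"Ticket: {n['text']}\nPriority: {n['priority']}\nResolution: {n['resolution']}"
        for n in neighbors
    )
    return (
        f"New ticket: {ticket_text}\n"
        f"Predicted priority: {priority}\n\n"
        f"Historical context:\n{context}"
    )


sample_neighbors = [
    {
        "text": "Webhook signatures fail verification.",
        "priority": "high",
        "resolution": "Document the HMAC-SHA256 format and provide a code sample.",
    },
]
message = build_user_message(
    "API keys rotated this morning and now all requests return 401 Unauthorized.",
    "high",
    sample_neighbors,
)
print(message)
```

Printing the message this way makes it easy to check that the retrieved context is what you expect before attributing a weak reply to the model.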
Here is a realistic incoming ticket and the output from the pipeline.
ticket = "API keys rotated this morning and now all requests return 401 Unauthorized."
priority, reply = process_ticket(ticket)
print(f"Predicted priority: {priority}")
print("---")
print(reply)
Example output:
Predicted priority: high
---
We see you are hitting authentication failures after a key rotation.
Priority: high. This matches previous high-priority incidents involving verification and access issues.
Action: Please verify that your integration is using the newly issued key format. Refer to the resolution for webhook signatures fail verification to adjust your header calculation and ensure the HMAC-SHA256 code sample is applied correctly.
This pipeline shows how to treat an LLM as one component in a broader ML system rather than a monolithic black box. Oxlo.ai simplifies the architecture because embeddings and chat completions share one endpoint and one request-based pricing model, which keeps costs predictable when you scale to thousands of tickets. A concrete next step is to replace the random forest with a fine-tuned classifier hosted on Oxlo.ai, or stream the generated replies into a Slack webhook for real-time triage.
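As a rough sketch of the Slack idea, the snippet below builds a POST request for a Slack incoming webhook, which accepts a JSON body with a `text` field. It posts a finished reply rather than streaming tokens, the URL is a placeholder, and `build_slack_request` is a hypothetical helper. The actual send is left commented out so the example runs without network access.

```python
import json
import urllib.request


def build_slack_request(webhook_url, priority, reply):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    payload = {"text": f"*Predicted priority:* {priority}\n{reply}"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; a real one comes from your Slack app's incoming webhook settings.
request = build_slack_request(
    "https://hooks.slack.com/services/EXAMPLE",
    "high",
    "Draft reply text",
)
print(request.get_method(), request.full_url)
# To actually send: urllib.request.urlopen(request, timeout=10)
```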