Source: dev.to

Integrating LLM with Other Machine Learning Models

A developer built a support ticket intelligence pipeline that combines Oxlo.ai embeddings, a local random forest classifier, and an LLM to automate triage and draft contextual replies. The system uses Oxlo.ai's OpenAI-compatible API for both embeddings and text generation, so one client covers both workloads. By embedding historical tickets and training a classifier on those vectors, the pipeline predicts priority and retrieves similar cases to inform LLM-generated responses.

4 min read · Published Jun 18, 2026

We are building a support ticket intelligence pipeline that classifies incoming requests and drafts contextual replies by combining Oxlo.ai embeddings, a local random forest classifier, and an LLM. It helps support teams cut response time by automating triage and surfacing relevant historical resolutions. The entire stack runs through Oxlo.ai's OpenAI-compatible API, so we can standardize on one provider for both embedding and text-generation workloads.

pip install openai scikit-learn numpy

I start by instantiating the client. Oxlo.ai exposes a fully OpenAI-compatible endpoint, so the standard SDK works without wrappers.

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ["OXLO_API_KEY"]
)

We need a knowledge base of resolved tickets. I embed them using Oxlo.ai's BGE-Large model so we can retrieve similar tickets later and train a classifier on the same vectors.

import numpy as np

historical_tickets = [
    {"id": 1, "text": "Cannot connect to database after latest deploy.", "priority": "high", "resolution": "Rollback the migration and re-run with SSL enabled."},
    {"id": 2, "text": "Logo is misaligned on the billing page in Safari.", "priority": "low", "resolution": "Add flex-center to the container CSS."},
    {"id": 3, "text": "Intermittent 500 errors on checkout API.", "priority": "high", "resolution": "Increase the connection pool size in Redis."},
    {"id": 4, "text": "Dark mode toggle missing in settings.", "priority": "low", "resolution": "Shipped in v2.4.1 under Appearance."},
    {"id": 5, "text": "Webhook signatures fail verification.", "priority": "high", "resolution": "Document the HMAC-SHA256 format and provide a code sample."},
]

def embed_text(text):
    response = client.embeddings.create(
        input=text,
        model="bge-large"
    )
    return response.data[0].embedding

for t in historical_tickets:
    t["embedding"] = embed_text(t["text"])

print(f"Embedded {len(historical_tickets)} tickets.")
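The loop above sends one request per ticket, which is fine for five rows but slow for a real backlog. A minimal sketch of batching follows, assuming Oxlo.ai accepts a list for `input` and returns an `index` on each item the way the OpenAI embeddings endpoint does; `embed_batch` and `batch_size` are hypothetical names, not part of the original code.

```python
def embed_batch(client, texts, model="bge-large", batch_size=64):
    """Embed many texts with one request per batch instead of one per text."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Assumption: the endpoint accepts a list of strings as input.
        response = client.embeddings.create(input=batch, model=model)
        # Sort by index so the vectors line up with the input order.
        ordered = sorted(response.data, key=lambda item: item.index)
        vectors.extend(item.embedding for item in ordered)
    return vectors
```

With the client defined earlier, `embed_batch(client, [t["text"] for t in historical_tickets])` would return the vectors in ticket order.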

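Now we treat the embeddings as feature vectors and train a lightweight random forest to predict priority. This is where traditional ML meets the LLM stack. Five tickets are enough to show the wiring, not to train a useful model; a real deployment needs a much larger labeled history.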

from sklearn.ensemble import RandomForestClassifier

X = np.array([t["embedding"] for t in historical_tickets])
y = [t["priority"] for t in historical_tickets]

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X, y)

print("Classifier trained on embedding features.")
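A classifier trained on embeddings can also report how sure it is, which is useful for routing uncertain tickets to a human. The sketch below uses scikit-learn's `predict_proba` on synthetic vectors that stand in for real embeddings; `predict_with_confidence`, the 0.7 threshold, and the demo data are illustrative assumptions, not part of the original pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-ins for ticket embeddings: two tight clusters in 8 dimensions.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(loc=1.0, scale=0.1, size=(10, 8)),   # "high" cluster
    rng.normal(loc=-1.0, scale=0.1, size=(10, 8)),  # "low" cluster
])
y_demo = ["high"] * 10 + ["low"] * 10

demo_clf = RandomForestClassifier(n_estimators=100, random_state=42)
demo_clf.fit(X_demo, y_demo)

def predict_with_confidence(model, embedding, threshold=0.7):
    """Return (label, probability); label is None when the model is unsure."""
    probs = model.predict_proba([embedding])[0]
    best = int(np.argmax(probs))
    confidence = float(probs[best])
    label = model.classes_[best] if confidence >= threshold else None
    return label, confidence

label, confidence = predict_with_confidence(demo_clf, np.full(8, 1.0))
print(label, confidence)
```

A `None` label is the signal to skip automatic triage and queue the ticket for manual review.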

When a new ticket arrives, we find the closest historical matches by cosine similarity over the same embedding space.

from sklearn.metrics.pairwise import cosine_similarity

def retrieve_similar(embedding, tickets, top_k=2):
    embeddings = np.array([t["embedding"] for t in tickets])
    sims = cosine_similarity([embedding], embeddings)[0]
    top_indices = np.argsort(sims)[-top_k:][::-1]
    return [tickets[i] for i in top_indices]
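To make the retrieval step less of a black box, here is roughly what the cosine similarity ranking computes, written in plain NumPy. `cosine_top_k` is a hypothetical helper for illustration, and it assumes no vector has zero length.

```python
import numpy as np

def cosine_top_k(query, matrix, top_k=2):
    """Return (indices, scores) of the top_k rows most similar to query."""
    query = np.asarray(query, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    # Cosine similarity is the dot product divided by the product of the norms.
    sims = matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))
    # argsort is ascending, so reverse it and keep the first top_k entries.
    order = np.argsort(sims)[::-1][:top_k]
    return order.tolist(), sims[order].tolist()

indices, scores = cosine_top_k([1, 0], [[1, 0], [0, 1], [1, 1]])
print(indices, scores)
```

The row pointing in the same direction as the query ranks first, the diagonal row second, and the orthogonal row is dropped.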

The system prompt tells the LLM how to behave, referencing the classification and retrieved context we will inject.

SYSTEM_PROMPT = """You are a senior support engineer assistant.

Your job is to:
1. Acknowledge the user's issue in one sentence.
2. State the predicted priority and explain why based on similar tickets.
3. Draft a concise, actionable reply that references the resolution of the most relevant historical ticket.

Keep the tone technical and direct. Do not ask the user to verify information already provided."""

This function ties the pieces together: embed the new ticket, predict priority, retrieve neighbors, and call Oxlo.ai's Llama 3.3 70B to generate the response.

def process_ticket(ticket_text):
    # Embed the new ticket in the same vector space as the historical ones.
    emb = embed_text(ticket_text)

    # Predict priority with the local random forest.
    priority = clf.predict([emb])[0]

    # Fetch the two most similar resolved tickets.
    neighbors = retrieve_similar(emb, historical_tickets, top_k=2)

    # Flatten the neighbors into a text block the LLM can read.
    context = "\n\n".join(
        f"Ticket: {n['text']}\nPriority: {n['priority']}\nResolution: {n['resolution']}"
        for n in neighbors
    )

    user_message = (
        f"New ticket: {ticket_text}\n"
        f"Predicted priority: {priority}\n\n"
        f"Historical context:\n{context}"
    )

    # Generate the draft reply.
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )

    return priority, response.choices[0].message.content
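`process_ticket` makes two network calls, so a transient API failure takes down the whole ticket. One hedged way to soften that is a small retry wrapper with exponential backoff; `with_retries` and its parameters are hypothetical, and catching every `Exception` is only for brevity. In practice you would narrow it to the errors you consider retryable.

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**n seconds and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            # Re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Usage would look like `priority, reply = with_retries(lambda: process_ticket(ticket))`. The `sleep` parameter exists so the delay can be replaced in tests.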

Here is a realistic incoming ticket and the output from the pipeline.

ticket = "API keys rotated this morning and now all requests return 401 Unauthorized."

priority, reply = process_ticket(ticket)

print(f"Predicted priority: {priority}")
print("---")
print(reply)

Example output:

Predicted priority: high
---

We see you are hitting authentication failures after a key rotation.

Priority: high. This matches previous high-priority incidents involving verification and access issues.

Action: Please verify that your integration is using the newly issued key format. Refer to the resolution for webhook signatures fail verification to adjust your header calculation and ensure the HMAC-SHA256 code sample is applied correctly.

This pipeline shows how to treat an LLM as one component in a broader ML system rather than a monolithic black box. Oxlo.ai simplifies the architecture because embeddings and chat completions share one endpoint and one request-based pricing model, which keeps costs predictable when you scale to thousands of tickets. A concrete next step is to replace the random forest with a fine-tuned classifier hosted on Oxlo.ai, or stream the generated replies into a Slack webhook for real-time triage.
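As a rough sketch of the Slack idea, the snippet below posts a finished reply to a Slack incoming webhook, which accepts a JSON body with a `text` field. It sends the complete reply rather than streaming tokens, and `build_slack_payload`, `post_to_slack`, and the message layout are assumptions for illustration.

```python
import json
import urllib.request

def build_slack_payload(priority, reply):
    """Format a triage result as a Slack incoming-webhook message body."""
    return {"text": f"*Priority: {priority}*\n{reply}"}

def post_to_slack(webhook_url, payload, timeout=10):
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

The webhook URL is a secret issued by your Slack workspace, so it belongs in an environment variable alongside `OXLO_API_KEY`.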
