# Integrating LLM with Other Machine Learning Models

> Source: <https://dev.to/shashank_ms_6a35baa4be138/integrating-llm-with-other-machine-learning-models-c0o>
> Published: 2026-06-18 05:36:25+00:00

We are building a support ticket intelligence pipeline that classifies incoming requests and drafts contextual replies by combining Oxlo.ai embeddings, a local random forest classifier, and an LLM. It helps support teams cut response time by automating triage and surfacing relevant historical resolutions. Both hosted steps, embeddings and text generation, go through Oxlo.ai's OpenAI-compatible API, so we can standardize on one provider for those workloads while the classifier runs locally.

`pip install openai scikit-learn numpy`

I start by instantiating the client. Oxlo.ai exposes a fully OpenAI-compatible endpoint, so the standard SDK works without wrappers.

``` python
from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ["OXLO_API_KEY"]
)
```

We need a knowledge base of resolved tickets. I embed them using Oxlo.ai's BGE-Large model so we can retrieve them later and train a classifier on the same vectors.

``` python
import numpy as np

historical_tickets = [
    {"id": 1, "text": "Cannot connect to database after latest deploy.", "priority": "high", "resolution": "Rollback the migration and re-run with SSL enabled."},
    {"id": 2, "text": "Logo is misaligned on the billing page in Safari.", "priority": "low", "resolution": "Add flex-center to the container CSS."},
    {"id": 3, "text": "Intermittent 500 errors on checkout API.", "priority": "high", "resolution": "Increase the connection pool size in Redis."},
    {"id": 4, "text": "Dark mode toggle missing in settings.", "priority": "low", "resolution": "Shipped in v2.4.1 under Appearance."},
    {"id": 5, "text": "Webhook signatures fail verification.", "priority": "high", "resolution": "Document the HMAC-SHA256 format and provide a code sample."},
]

def embed_text(text):
    response = client.embeddings.create(
        input=text,
        model="bge-large"
    )
    return response.data[0].embedding

for t in historical_tickets:
    t["embedding"] = embed_text(t["text"])

print(f"Embedded {len(historical_tickets)} tickets.")
```
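The loop above sends one request per ticket. The OpenAI embeddings API also accepts a list of strings as `input`; assuming Oxlo.ai's compatible endpoint does the same (this article does not confirm it), a batched variant might look like the sketch below. `embed_batch` and its `batch_size` parameter are hypothetical names introduced here for illustration.

``` python
def embed_batch(client, texts, model="bge-large", batch_size=64):
    """Embed many texts with fewer requests.

    Assumption: the endpoint accepts a list for `input`, as the OpenAI
    embeddings API does. `batch_size` is an arbitrary illustrative value.
    """
    vectors = []
    for start in range(0, len(texts), batch_size):
        chunk = texts[start:start + batch_size]
        response = client.embeddings.create(input=chunk, model=model)
        # Each returned item carries an `index` into the submitted list;
        # sorting on it keeps vectors aligned with the input order.
        ordered = sorted(response.data, key=lambda item: item.index)
        vectors.extend(item.embedding for item in ordered)
    return vectors
```

With the data above, this could be called as `embed_batch(client, [t["text"] for t in historical_tickets])`, with the returned vectors zipped back onto the tickets.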

Now we treat the embeddings as feature vectors and train a lightweight random forest to predict priority. This is where traditional ML meets the LLM stack.

``` python
from sklearn.ensemble import RandomForestClassifier

X = np.array([t["embedding"] for t in historical_tickets])
y = [t["priority"] for t in historical_tickets]

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X, y)

print("Classifier trained on embedding features.")
```
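Five tickets are enough to show the wiring but far too few to judge the classifier. Once more labeled tickets are available, cross-validation gives a rough accuracy estimate, and `predict_proba` exposes a confidence that could gate automation. The sketch below uses synthetic vectors as stand-ins for real embeddings (an assumption made so the example runs without API calls), so its scores say nothing about real ticket data.

``` python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-ins for embeddings: two clusters of 16-dimensional vectors.
# Real embeddings are much wider; these values are illustrative only.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(loc=0.5, scale=0.1, size=(20, 16)),   # pretend "high" tickets
    rng.normal(loc=-0.5, scale=0.1, size=(20, 16)),  # pretend "low" tickets
])
y_demo = np.array(["high"] * 20 + ["low"] * 20)

demo_clf = RandomForestClassifier(n_estimators=100, random_state=42)

# Stratified folds keep the high/low ratio the same in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(demo_clf, X_demo, y_demo, cv=cv)
print(f"Fold accuracies: {scores}")

# predict_proba returns one probability per class, ordered like classes_.
demo_clf.fit(X_demo, y_demo)
probs = demo_clf.predict_proba(X_demo[:1])[0]
print(dict(zip(demo_clf.classes_, probs)))
```

One possible design choice is to send tickets whose top class probability falls below a chosen threshold to a human instead of auto-drafting a reply.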

When a new ticket arrives, we find the closest historical matches by cosine similarity over the same embedding space.

``` python
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_similar(embedding, tickets, top_k=2):
    embeddings = np.array([t["embedding"] for t in tickets])
    sims = cosine_similarity([embedding], embeddings)[0]
    top_indices = np.argsort(sims)[-top_k:][::-1]
    return [tickets[i] for i in top_indices]
```
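To make the ranking concrete: cosine similarity is the dot product of two vectors divided by the product of their lengths, and retrieval keeps the highest-scoring indices. The dependency-free sketch below reproduces that logic on toy 3-dimensional vectors. The vectors and the helper names `cosine` and `top_k_indices` are invented for illustration and are not part of the pipeline.

``` python
import math

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_indices(query, vectors, top_k=2):
    """Indices of the top_k most similar vectors, best match first."""
    sims = [cosine(query, v) for v in vectors]
    ranked = sorted(range(len(vectors)), key=lambda i: sims[i], reverse=True)
    return ranked[:top_k]

# Toy vectors, invented for illustration.
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
toy_query = [1.0, 0.05, 0.0]

print(top_k_indices(toy_query, toy_vectors))  # prints [0, 2]
```

The query points almost exactly along the first vector, so index 0 ranks first, the nearly parallel third vector ranks second, and the orthogonal second vector is dropped.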

The system prompt tells the LLM how to behave, referencing the classification and retrieved context we will inject.

``` python
SYSTEM_PROMPT = """You are a senior support engineer assistant.

Your job is to:
1. Acknowledge the user's issue in one sentence.
2. State the predicted priority and explain why based on similar tickets.
3. Draft a concise, actionable reply that references the resolution of the most relevant historical ticket.

Keep the tone technical and direct. Do not ask the user to verify information already provided."""
```

This function ties the pieces together: embed the new ticket, predict priority, retrieve neighbors, and call Oxlo.ai's Llama 3.3 70B to generate the response.

``` python
def process_ticket(ticket_text):
    # Embed the incoming ticket
    emb = embed_text(ticket_text)
    
    # Predict priority using the local classifier
    priority = clf.predict([emb])[0]
    
    # Retrieve similar historical tickets
    neighbors = retrieve_similar(emb, historical_tickets, top_k=2)
    
    # Build the context block
    context = "\n\n".join(
        f"Ticket: {n['text']}\nPriority: {n['priority']}\nResolution: {n['resolution']}"
        for n in neighbors
    )
    
    user_message = (
        f"New ticket: {ticket_text}\n"
        f"Predicted priority: {priority}\n\n"
        f"Historical context:\n{context}"
    )
    
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    
    return priority, response.choices[0].message.content
```

Here is a realistic incoming ticket and the output from the pipeline.

``` python
ticket = "API keys rotated this morning and now all requests return 401 Unauthorized."

priority, reply = process_ticket(ticket)

print(f"Predicted priority: {priority}")
print("---")
print(reply)
```

Example output:

```
Predicted priority: high
---

We see you are hitting authentication failures after a key rotation.

Priority: high. This matches previous high-priority incidents involving verification and access issues.

Action: Please verify that your integration is using the newly issued key format. Refer to the resolution for webhook signatures fail verification to adjust your header calculation and ensure the HMAC-SHA256 code sample is applied correctly.
```

This pipeline shows how to treat an LLM as one component in a broader ML system rather than a monolithic black box. Oxlo.ai simplifies the architecture because embeddings and chat completions share one endpoint and one request-based pricing model, which keeps costs predictable when you scale to thousands of tickets. A concrete next step is to replace the random forest with a fine-tuned classifier hosted on Oxlo.ai, or stream the generated replies into a Slack webhook for real-time triage.
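As a rough starting point for the Slack idea, the sketch below builds the HTTP request for a Slack incoming webhook, which accepts a JSON body with a `text` field. It posts a complete reply rather than streaming tokens, and it only constructs the request without sending it. The function name `build_slack_request` and the URL are placeholders, not values from this article.

``` python
import json
import urllib.request

def build_slack_request(webhook_url, priority, reply):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    payload = {"text": f"*Priority: {priority}*\n{reply}"}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder URL; a real one comes from the Slack app configuration.
request = build_slack_request(
    "https://example.com/slack-webhook-placeholder",
    "high",
    "Draft reply text goes here.",
)
# To actually send it: urllib.request.urlopen(request, timeout=10)
```

Keeping request construction separate from sending makes the formatting easy to test without touching the network.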
