We're going to build a command-line Topic Explainer that takes any subject and breaks it down for a chosen audience, from absolute beginner to expert. This is a solid first project if you are just getting started with LLMs because it teaches system prompts, message history, and streaming in one small script. I have shipped dozens of these internal tools, and this is the exact pattern I reach for first.
pip install openai
Before we add any abstractions, we will wire up the Oxlo.ai client and make a single chat completion to verify the endpoint and credentials. I am using llama-3.3-70b here because it is a reliable general-purpose flagship model.
from openai import OpenAI
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
user_message = "Explain how a large language model works."
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": user_message},
    ],
)
print(response.choices[0].message.content)
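Since this first call exists to verify credentials, it helps to fail early with a clear message when the key is missing. Here is a minimal sketch that reads the key from the OXLO_API_KEY environment variable instead of hard-coding it. The helper name load_api_key is my own and is not part of the OpenAI SDK.

```python
import os


def load_api_key(env_var: str = "OXLO_API_KEY") -> str:
    """Return the API key from the environment, or raise a readable error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before running the script.")
    return key


# Hypothetical usage with the client from this tutorial:
# client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key=load_api_key())
```

Reading the key at startup keeps it out of source control and turns a confusing authentication failure into a one-line hint.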
Raw completions can wander. We will lock the behavior down with a system prompt so the assistant always explains topics at the requested level and keeps answers concise. Here is the system prompt I use for this agent. You can tune the rules later.
SYSTEM_PROMPT = """You are a patient technical tutor.
Your job is to explain any topic at the exact level the user asks for.
If the user asks for a "beginner" explanation, use simple analogies and avoid jargon.
If they ask for "expert" detail, be precise and technical.
Always keep your answer under three paragraphs unless the user asks for more."""
Now we pass it into the messages array.
from openai import OpenAI
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
SYSTEM_PROMPT = """You are a patient technical tutor.
Your job is to explain any topic at the exact level the user asks for.
If the user asks for a "beginner" explanation, use simple analogies and avoid jargon.
If they ask for "expert" detail, be precise and technical.
Always keep your answer under three paragraphs unless the user asks for more."""
user_message = "Explain how a large language model works at a beginner level."
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ],
)
print(response.choices[0].message.content)
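If you want to inspect the payload without calling the API, the messages array can be built by a tiny helper. This is only a sketch, and build_messages is a hypothetical name I am introducing for illustration.

```python
def build_messages(system_prompt: str, user_message: str) -> list[dict]:
    """Return a chat payload with the system prompt first, then the user turn."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]


# Print the payload to confirm the order the model will see.
for message in build_messages("You are a patient technical tutor.", "Explain DNS."):
    print(message["role"], "->", message["content"])
```

The system message goes first so that every later turn is read with those rules already in place.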
Hard-coded messages are fine for one-offs, but we want a reusable function that accepts a topic and a level. This keeps the setup code clean and makes the agent easier to test.
from openai import OpenAI
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
SYSTEM_PROMPT = """You are a patient technical tutor.
Your job is to explain any topic at the exact level the user asks for.
If the user asks for a "beginner" explanation, use simple analogies and avoid jargon.
If they ask for "expert" detail, be precise and technical.
Always keep your answer under three paragraphs unless the user asks for more."""
def explain_topic(topic: str, level: str = "beginner") -> str:
    user_message = f"Explain '{topic}' at a {level} level."
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(explain_topic("how neural networks learn", "beginner"))
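Because the function now accepts free-form input, it is worth checking the arguments before spending a request. The sketch below validates the level and builds the user message. The build_user_message helper and the "intermediate" level are my assumptions, since the system prompt only names "beginner" and "expert". It also picks "a" or "an" so the prompt reads naturally for "expert".

```python
# "intermediate" is an assumed extra level; the system prompt only names the other two.
ALLOWED_LEVELS = ("beginner", "intermediate", "expert")


def build_user_message(topic: str, level: str = "beginner") -> str:
    """Validate the inputs and return the prompt sent as the user turn."""
    topic = topic.strip()
    level = level.strip().lower()
    if not topic:
        raise ValueError("topic must not be empty")
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"level must be one of {ALLOWED_LEVELS}, got {level!r}")
    # Choose the article so "expert" becomes "an expert level".
    article = "an" if level[0] in "aeiou" else "a"
    return f"Explain '{topic}' at {article} {level} level."


print(build_user_message("how neural networks learn", "Beginner"))
# prints: Explain 'how neural networks learn' at a beginner level.
```

Keeping this logic in a pure function means you can unit test the prompt wording without touching the network.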
Waiting for the full response to return feels slow. We will enable streaming and print chunks as they arrive. On Oxlo.ai, popular models like llama-3.3-70b have no cold starts, so the first token hits the terminal quickly.
from openai import OpenAI
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
SYSTEM_PROMPT = """You are a patient technical tutor.
Your job is to explain any topic at the exact level the user asks for.
If the user asks for a "beginner" explanation, use simple analogies and avoid jargon.
If they ask for "expert" detail, be precise and technical.
Always keep your answer under three paragraphs unless the user asks for more."""
def explain_topic_stream(topic: str, level: str = "beginner"):
    user_message = f"Explain '{topic}' at a {level} level."
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        stream=True,
    )
    for chunk in response:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

if __name__ == "__main__":
    explain_topic_stream("how transformers handle attention", "beginner")
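The streaming loop is easier to reuse and test if it returns the full text as well as printing it. Here is a sketch of that idea, using stand-in objects shaped like streamed chunks so it runs offline. Both collect_stream and fake_chunk are hypothetical helpers. The check for an empty choices list is a defensive assumption, since I have not confirmed whether Oxlo.ai ever sends such chunks.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable, echo: bool = True) -> str:
    """Print streamed text as it arrives (if echo is on) and return the full reply."""
    parts = []
    for chunk in chunks:
        # Defensive: skip chunks that carry no choices at all.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:  # the delta content can be None or empty
            parts.append(text)
            if echo:
                print(text, end="", flush=True)
    if echo:
        print()
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object shaped like a streamed chunk (offline testing only)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


demo = [fake_chunk("Hello"), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk(", world")]
reply = collect_stream(demo, echo=False)
print(reply)  # prints: Hello, world
```

In the real script you would pass the object returned by client.chat.completions.create(..., stream=True) in place of demo.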
A real tutor answers follow-ups. We will keep a messages list in memory and append each user question and assistant reply so the context persists across turns. This is the simplest possible conversation loop.
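import os
from openai import OpenAI
# Read the key exported in the shell (see the sample session) instead of hard-coding it.
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key=os.environ["OXLO_API_KEY"])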
SYSTEM_PROMPT = """You are a patient technical tutor.
Your job is to explain any topic at the exact level the user asks for.
If the user asks for a "beginner" explanation, use simple analogies and avoid jargon.
If they ask for "expert" detail, be precise and technical.
Always keep your answer under three paragraphs unless the user asks for more."""
def run_tutor():
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
    ]
    print("Topic Explainer is ready. Type 'quit' to exit.")
    while True:
        user_input = input("\nTopic or question: ").strip()
        if user_input.lower() == "quit":
            break
        messages.append({"role": "user", "content": user_input})
        response = client.chat.completions.create(
            model="llama-3.3-70b",
            messages=messages,
            stream=True,
        )
        assistant_reply = ""
        for chunk in response:
            if chunk.choices[0].delta.content:
                text = chunk.choices[0].delta.content
                assistant_reply += text
                print(text, end="", flush=True)
        print()
        messages.append({"role": "assistant", "content": assistant_reply})

if __name__ == "__main__":
    run_tutor()
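One caveat with this loop is that the messages list grows on every turn, and any model has a finite context window. A simple mitigation is to keep the system message plus only the most recent turns. The sketch below shows one way to do that. The trim_history helper and its max_turns default are my own choices, not something the API requires.

```python
def trim_history(messages: list[dict], max_turns: int = 10) -> list[dict]:
    """Keep every system message plus the last `max_turns` user/assistant pairs.

    Call this after appending the assistant reply, so the pairs are complete.
    """
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    # Each turn is two messages: the user question and the assistant reply.
    return system + dialogue[-2 * max_turns:]


history = [{"role": "system", "content": "You are a patient technical tutor."}]
for i in range(1, 4):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history, max_turns=2)
print([m["content"] for m in trimmed[1:]])
# prints: ['question 2', 'answer 2', 'question 3', 'answer 3']
```

Dropping old turns means the tutor forgets early context, so pick max_turns based on how long your sessions usually run.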
Save the final script as tutor.py, export your key, and run it. Here is a sample session I recorded earlier today.
$ export OXLO_API_KEY="oxlo_..."
$ python tutor.py
Topic Explainer is ready. Type 'quit' to exit.
Topic or question: Explain how LLMs predict the next word at a beginner level
Imagine you are playing a game where you read a sentence and guess the next word. You have read every book, article, and web page on the internet, so you have a good sense of what word usually comes next. An LLM does exactly that, but with math. It turns words into numbers, looks at the pattern of the sentence so far, and outputs the most likely next word. Then it adds that word back to the sentence and repeats the process until it finishes.
Topic or question: What are those numbers called?
They are called embeddings, or vectors. Each word gets mapped to a long list of numbers that capture its meaning, so similar words end up close together in that number space.
You now have a working conversational agent that runs against Oxlo.ai with request-based pricing. That means you can send long system prompts or multi-turn conversations without watching token costs scale, which makes this pattern cheap to experiment with. Two concrete next steps: swap in deepseek-v3.2 for math or coding explanations if you want to see stronger reasoning on a free-tier model, or add a Gradio UI so non-technical teammates can use it. If you want to see how the flat per-request pricing compares for heavier workloads, check the details at https://oxlo.ai/pricing.
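To make swapping models painless, you could expose the model ID as a command-line flag rather than editing the script each time. This is a small sketch using argparse from the standard library. The --model flag name is my own choice, and the two model IDs are the ones mentioned in this post.

```python
import argparse

DEFAULT_MODEL = "llama-3.3-70b"


def parse_args(argv=None):
    """Parse CLI flags; pass a list for testing, or None to read sys.argv."""
    parser = argparse.ArgumentParser(description="Topic Explainer")
    parser.add_argument(
        "--model",
        default=DEFAULT_MODEL,
        help="Model ID to send with each request",
    )
    return parser.parse_args(argv)


args = parse_args(["--model", "deepseek-v3.2"])
print(args.model)  # prints: deepseek-v3.2
```

In tutor.py you would call parse_args() with no arguments and pass args.model to client.chat.completions.create, then run something like python tutor.py --model deepseek-v3.2.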