Source: dev.to · topic: large-language-models

Introduction to LLMs for Beginners

A developer built a command-line Topic Explainer using Oxlo.ai's API and the llama-3.3-70b model, demonstrating system prompts, message history, and streaming for beginners. The project includes a reusable function that accepts a topic and audience level, with streaming enabled for faster output. The developer shipped dozens of similar internal tools and recommends this pattern for those starting with large language models.

6 min read · 3 views · published Jun 17, 2026

We're going to build a command-line Topic Explainer that takes any subject and breaks it down for a chosen audience, from absolute beginner to expert. This is a solid first project if you are just getting started with LLMs because it teaches system prompts, message history, and streaming in one small script. I have shipped dozens of these internal tools, and this is the exact pattern I reach for first.

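First, install the OpenAI Python client, which the examples below point at Oxlo.ai's endpoint through base_url.

pip install openai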

Before we add any abstractions, we will wire up the Oxlo.ai client and make a single chat completion to verify the endpoint and credentials. I am using llama-3.3-70b here because it is a reliable general-purpose flagship model.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

user_message = "Explain how a large language model works."

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "user", "content": user_message},
    ],
)

print(response.choices[0].message.content)
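
The snippet above uses a placeholder string for the key. As a minimal sketch, the key could instead be read from the environment, assuming the variable is named OXLO_API_KEY as in the shell session later in this article. Note that load_api_key is a hypothetical helper written for illustration, not part of the openai package.

```python
import os


def load_api_key(var_name: str = "OXLO_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    # The variable name is an assumption taken from the sample shell session.
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} before running this script.")
    return key
```

The client could then be created with OpenAI(base_url="https://api.oxlo.ai/v1", api_key=load_api_key()), which keeps the secret out of the source file.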

Raw completions can wander. We will lock the behavior down with a system prompt so the assistant always explains topics at the requested level and keeps answers concise. Here is the system prompt I use for this agent. You can tune the rules later.

SYSTEM_PROMPT = """You are a patient technical tutor.
Your job is to explain any topic at the exact level the user asks for.
If the user asks for a "beginner" explanation, use simple analogies and avoid jargon.
If they ask for "expert" detail, be precise and technical.
Always keep your answer under three paragraphs unless the user asks for more."""

Now we pass it into the messages array.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

SYSTEM_PROMPT = """You are a patient technical tutor.
Your job is to explain any topic at the exact level the user asks for.
If the user asks for a "beginner" explanation, use simple analogies and avoid jargon.
If they ask for "expert" detail, be precise and technical.
Always keep your answer under three paragraphs unless the user asks for more."""

user_message = "Explain how a large language model works at a beginner level."

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ],
)

print(response.choices[0].message.content)

Hard-coded messages are fine for one-offs, but we want a reusable function that accepts a topic and a level. This keeps the setup code clean and makes the agent easier to test.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

SYSTEM_PROMPT = """You are a patient technical tutor.
Your job is to explain any topic at the exact level the user asks for.
If the user asks for a "beginner" explanation, use simple analogies and avoid jargon.
If they ask for "expert" detail, be precise and technical.
Always keep your answer under three paragraphs unless the user asks for more."""

def explain_topic(topic: str, level: str = "beginner") -> str:
    user_message = f"Explain '{topic}' at a {level} level."
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(explain_topic("how neural networks learn", "beginner"))
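
Because the level is interpolated straight into the prompt, a typo such as "begginer" would be sent to the model unchecked, and "expert" produces "at a expert level". One way to tighten this is a small prompt builder, sketched below. The name build_user_message and the set of allowed levels are assumptions for illustration; the article itself only names "beginner" and "expert".

```python
# Assumed set of levels; the system prompt only mentions beginner and expert.
ALLOWED_LEVELS = ("beginner", "intermediate", "expert")


def build_user_message(topic: str, level: str = "beginner") -> str:
    """Build the user prompt, rejecting levels outside ALLOWED_LEVELS."""
    level = level.strip().lower()
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"level must be one of {ALLOWED_LEVELS}, got {level!r}")
    # Pick "a" or "an" so the sentence reads naturally for every level.
    article = "an" if level[0] in "aeiou" else "a"
    return f"Explain '{topic}' at {article} {level} level."
```

Inside explain_topic, the f-string could be replaced with a call to this helper. It is also easy to unit test, since it never touches the network.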

Waiting for the full response to return feels slow. We will enable streaming and print chunks as they arrive. On Oxlo.ai, popular models like llama-3.3-70b have no cold starts, so the first token hits the terminal quickly.

from openai import OpenAI

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

SYSTEM_PROMPT = """You are a patient technical tutor.
Your job is to explain any topic at the exact level the user asks for.
If the user asks for a "beginner" explanation, use simple analogies and avoid jargon.
If they ask for "expert" detail, be precise and technical.
Always keep your answer under three paragraphs unless the user asks for more."""

def explain_topic_stream(topic: str, level: str = "beginner"):
    user_message = f"Explain '{topic}' at a {level} level."
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
        stream=True,
    )
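    for chunk in response:
        # Skip chunks that carry no choices or no text, such as a final bookkeeping chunk.
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)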
    print()

if __name__ == "__main__":
    explain_topic_stream("how transformers handle attention", "beginner")
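
To make the streaming loop easier to follow and to test offline, here is a sketch that separates chunk handling from the API call. It assumes each chunk exposes choices[0].delta.content, as in the loop above. Both collect_stream and fake_chunk are hypothetical names, and the fake chunks only mimic the shape of a streamed response.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Print each text delta as it arrives and return the full reply."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # tolerate chunks that carry no choices
            continue
        text = chunk.choices[0].delta.content
        if text:  # content can be None on chunks that carry no text
            parts.append(text)
            print(text, end="", flush=True)
    print()
    return "".join(parts)


def fake_chunk(text):
    """Hypothetical stand-in that mimics the shape of one streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk("Hello"), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk(", world")]
reply = collect_stream(demo)
```

With a real request, the object returned by client.chat.completions.create(..., stream=True) would be passed in place of demo. Returning the joined text is what lets a later step store the reply in the message history.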

A real tutor answers follow-ups. We will keep a messages list in memory and append each user question and assistant reply so the context persists across turns. This is the simplest possible conversation loop.

import os

from openai import OpenAI

# Read the key exported in the shell (OXLO_API_KEY) instead of hard-coding it.
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key=os.environ["OXLO_API_KEY"])

SYSTEM_PROMPT = """You are a patient technical tutor.
Your job is to explain any topic at the exact level the user asks for.
If the user asks for a "beginner" explanation, use simple analogies and avoid jargon.
If they ask for "expert" detail, be precise and technical.
Always keep your answer under three paragraphs unless the user asks for more."""

def run_tutor():
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
    ]
    print("Topic Explainer is ready. Type 'quit' to exit.")
    while True:
        user_input = input("\nTopic or question: ").strip()
        if not user_input:
            continue  # ignore empty lines instead of sending them to the model
        if user_input.lower() == "quit":
            break
        messages.append({"role": "user", "content": user_input})
        response = client.chat.completions.create(
            model="llama-3.3-70b",
            messages=messages,
            stream=True,
        )
        assistant_reply = ""
        for chunk in response:
            # Skip chunks that carry no choices or no text before reading the delta.
            if chunk.choices and chunk.choices[0].delta.content:
                text = chunk.choices[0].delta.content
                assistant_reply += text
                print(text, end="", flush=True)
        print()
        messages.append({"role": "assistant", "content": assistant_reply})

if __name__ == "__main__":
    run_tutor()
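
The loop above resends the whole messages list on every turn, so the list grows for as long as the session runs. One possible way to bound it is to keep the system message plus only the most recent exchanges. The sketch below assumes that dropping older turns is acceptable for a tutor session; trim_history and the default of six turns are illustrative choices, not something from the original script.

```python
def trim_history(messages: list[dict], max_turns: int = 6) -> list[dict]:
    """Keep system messages plus the last max_turns user/assistant pairs."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    # Each turn is one user message followed by one assistant message.
    return system + dialogue[-2 * max_turns:]
```

Calling messages = trim_history(messages) right after the assistant reply is appended keeps the user and assistant messages paired, since the dialogue then always has an even number of entries.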

Save the final script as tutor.py, export your key, and run it. Here is a sample session I recorded earlier today.

$ export OXLO_API_KEY="oxlo_..."
$ python tutor.py

Topic Explainer is ready. Type 'quit' to exit.

Topic or question: Explain how LLMs predict the next word at a beginner level

Imagine you are playing a game where you read a sentence and guess the next word. You have read every book, article, and web page on the internet, so you have a good sense of what word usually comes next. An LLM does exactly that, but with math. It turns words into numbers, looks at the pattern of the sentence so far, and outputs the most likely next word. Then it adds that word back to the sentence and repeats the process until it finishes.

Topic or question: What are those numbers called?

They are called embeddings, or vectors. Each word gets mapped to a long list of numbers that capture its meaning, so similar words end up close together in that number space.

You now have a working conversational agent that runs against Oxlo.ai with request-based pricing. That means you can send long system prompts or multi-turn conversations without watching token costs scale, which makes this pattern cheap to experiment with. Two concrete next steps: swap in deepseek-v3.2 for math or coding explanations if you want to see stronger reasoning on a free-tier model, or add a Gradio UI so non-technical teammates can use it. If you want to see how the flat per-request pricing compares for heavier workloads, check the details at https://oxlo.ai/pricing.
