{"slug": "introduction-to-llms-for-beginners", "title": "Introduction to LLMs for Beginners", "summary": "A developer built a command-line Topic Explainer using Oxlo.ai's API and the llama-3.3-70b model, demonstrating system prompts, message history, and streaming for beginners. The project includes a reusable function that accepts a topic and audience level, with streaming enabled for faster output. The developer shipped dozens of similar internal tools and recommends this pattern for those starting with large language models.", "body_md": "We're going to build a command-line Topic Explainer that takes any subject and breaks it down for a chosen audience, from absolute beginner to expert. This is a solid first project if you are just getting started with LLMs because it teaches system prompts, message history, and streaming in one small script. I have shipped dozens of these internal tools, and this is the exact pattern I reach for first.\n\nInstall the OpenAI Python SDK first. We will point it at Oxlo.ai's endpoint through the `base_url` argument.\n\n`pip install openai`\n\nBefore we add any abstractions, we will wire up the Oxlo.ai client and make a single chat completion to verify the endpoint and credentials. I am using `llama-3.3-70b` here because it is a reliable general-purpose flagship model.\n\n``` python\nfrom openai import OpenAI\n\n# Point the OpenAI SDK at the Oxlo.ai endpoint.\nclient = OpenAI(base_url=\"https://api.oxlo.ai/v1\", api_key=\"YOUR_OXLO_API_KEY\")\n\nuser_message = \"Explain how a large language model works.\"\n\n# A single, non-streaming chat completion with one user message.\nresponse = client.chat.completions.create(\n    model=\"llama-3.3-70b\",\n    messages=[\n        {\"role\": \"user\", \"content\": user_message},\n    ],\n)\n\nprint(response.choices[0].message.content)\n```\n\nRaw completions can wander. We will lock the behavior down with a system prompt so the assistant always explains topics at the requested level and keeps answers concise. Here is the system prompt I use for this agent. 
You can tune the rules later.\n\n```\nSYSTEM_PROMPT = \"\"\"You are a patient technical tutor.\nYour job is to explain any topic at the exact level the user asks for.\nIf the user asks for a \"beginner\" explanation, use simple analogies and avoid jargon.\nIf they ask for \"expert\" detail, be precise and technical.\nAlways keep your answer under three paragraphs unless the user asks for more.\"\"\"\n```\n\nNow we pass it into the messages array.\n\n``` python\nfrom openai import OpenAI\n\nclient = OpenAI(base_url=\"https://api.oxlo.ai/v1\", api_key=\"YOUR_OXLO_API_KEY\")\n\nSYSTEM_PROMPT = \"\"\"You are a patient technical tutor.\nYour job is to explain any topic at the exact level the user asks for.\nIf the user asks for a \"beginner\" explanation, use simple analogies and avoid jargon.\nIf they ask for \"expert\" detail, be precise and technical.\nAlways keep your answer under three paragraphs unless the user asks for more.\"\"\"\n\nuser_message = \"Explain how a large language model works at a beginner level.\"\n\nresponse = client.chat.completions.create(\n    model=\"llama-3.3-70b\",\n    messages=[\n        {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n        {\"role\": \"user\", \"content\": user_message},\n    ],\n)\n\nprint(response.choices[0].message.content)\n```\n\nHard-coded messages are fine for one-offs, but we want a reusable function that accepts a topic and a level. 
This keeps the setup code clean and makes the agent easier to test.\n\n``` python\nfrom openai import OpenAI\n\nclient = OpenAI(base_url=\"https://api.oxlo.ai/v1\", api_key=\"YOUR_OXLO_API_KEY\")\n\nSYSTEM_PROMPT = \"\"\"You are a patient technical tutor.\nYour job is to explain any topic at the exact level the user asks for.\nIf the user asks for a \"beginner\" explanation, use simple analogies and avoid jargon.\nIf they ask for \"expert\" detail, be precise and technical.\nAlways keep your answer under three paragraphs unless the user asks for more.\"\"\"\n\ndef explain_topic(topic: str, level: str = \"beginner\") -> str:\n    user_message = f\"Explain '{topic}' at a {level} level.\"\n    response = client.chat.completions.create(\n        model=\"llama-3.3-70b\",\n        messages=[\n            {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n            {\"role\": \"user\", \"content\": user_message},\n        ],\n    )\n    return response.choices[0].message.content\n\nif __name__ == \"__main__\":\n    print(explain_topic(\"how neural networks learn\", \"beginner\"))\n```\n\nWaiting for the full response to return feels slow. We will enable streaming and print chunks as they arrive. 
On Oxlo.ai, popular models like `llama-3.3-70b` have no cold starts, so the first token hits the terminal quickly.\n\n``` python\nfrom openai import OpenAI\n\nclient = OpenAI(base_url=\"https://api.oxlo.ai/v1\", api_key=\"YOUR_OXLO_API_KEY\")\n\nSYSTEM_PROMPT = \"\"\"You are a patient technical tutor.\nYour job is to explain any topic at the exact level the user asks for.\nIf the user asks for a \"beginner\" explanation, use simple analogies and avoid jargon.\nIf they ask for \"expert\" detail, be precise and technical.\nAlways keep your answer under three paragraphs unless the user asks for more.\"\"\"\n\ndef explain_topic_stream(topic: str, level: str = \"beginner\"):\n    user_message = f\"Explain '{topic}' at a {level} level.\"\n    response = client.chat.completions.create(\n        model=\"llama-3.3-70b\",\n        messages=[\n            {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n            {\"role\": \"user\", \"content\": user_message},\n        ],\n        stream=True,\n    )\n    for chunk in response:\n        # Skip chunks that carry no choices or no text (for example, the final chunk).\n        if chunk.choices and chunk.choices[0].delta.content:\n            print(chunk.choices[0].delta.content, end=\"\", flush=True)\n    print()\n\nif __name__ == \"__main__\":\n    explain_topic_stream(\"how transformers handle attention\", \"beginner\")\n```\n\nA real tutor answers follow-ups. We will keep a `messages` list in memory and append each user question and assistant reply so the context persists across turns. 
This is the simplest possible conversation loop. This version also reads the API key from the `OXLO_API_KEY` environment variable instead of hard-coding it.\n\n``` python\nimport os\n\nfrom openai import OpenAI\n\n# Read the key from the environment so it never lands in source control.\nclient = OpenAI(base_url=\"https://api.oxlo.ai/v1\", api_key=os.environ[\"OXLO_API_KEY\"])\n\nSYSTEM_PROMPT = \"\"\"You are a patient technical tutor.\nYour job is to explain any topic at the exact level the user asks for.\nIf the user asks for a \"beginner\" explanation, use simple analogies and avoid jargon.\nIf they ask for \"expert\" detail, be precise and technical.\nAlways keep your answer under three paragraphs unless the user asks for more.\"\"\"\n\ndef run_tutor():\n    # The system prompt stays at the front of the history for every turn.\n    messages = [\n        {\"role\": \"system\", \"content\": SYSTEM_PROMPT},\n    ]\n    print(\"Topic Explainer is ready. Type 'quit' to exit.\")\n    while True:\n        user_input = input(\"\\nTopic or question: \").strip()\n        if user_input.lower() == \"quit\":\n            break\n        if not user_input:\n            continue  # Ignore empty lines instead of sending a blank message.\n        messages.append({\"role\": \"user\", \"content\": user_input})\n        response = client.chat.completions.create(\n            model=\"llama-3.3-70b\",\n            messages=messages,\n            stream=True,\n        )\n        # Print chunks as they arrive and collect them into the full reply.\n        assistant_reply = \"\"\n        for chunk in response:\n            if chunk.choices and chunk.choices[0].delta.content:\n                text = chunk.choices[0].delta.content\n                assistant_reply += text\n                print(text, end=\"\", flush=True)\n        print()\n        # Store the reply so the next turn sees the whole conversation.\n        messages.append({\"role\": \"assistant\", \"content\": assistant_reply})\n\nif __name__ == \"__main__\":\n    run_tutor()\n```\n\nSave the final script as `tutor.py`, export your key, and run it. Here is a sample session I recorded earlier today.\n\n``` bash\n$ export OXLO_API_KEY=\"oxlo_...\"\n$ python tutor.py\n\nTopic Explainer is ready. Type 'quit' to exit.\n\nTopic or question: Explain how LLMs predict the next word at a beginner level\n\nImagine you are playing a game where you read a sentence and guess the next word. 
You have read every book, article, and web page on the internet, so you have a good sense of what word usually comes next. An LLM does exactly that, but with math. It turns words into numbers, looks at the pattern of the sentence so far, and outputs the most likely next word. Then it adds that word back to the sentence and repeats the process until it finishes.\n\nTopic or question: What are those numbers called?\n\nThey are called embeddings, or vectors. Each word gets mapped to a long list of numbers that capture its meaning, so similar words end up close together in that number space.\n```\n\nYou now have a working conversational agent that runs against Oxlo.ai with request-based pricing. That means you can send long system prompts or multi-turn conversations without watching token costs scale, which makes this pattern cheap to experiment with. Two concrete next steps: swap in `deepseek-v3.2` for math or coding explanations if you want to see stronger reasoning on a free-tier model, or add a Gradio UI so non-technical teammates can use it. If you want to see how the flat per-request pricing compares for heavier workloads, check the details at [https://oxlo.ai/pricing](https://oxlo.ai/pricing).", "url": "https://wpnews.pro/news/introduction-to-llms-for-beginners", "canonical_source": "https://dev.to/shashank_ms_6a35baa4be138/introduction-to-llms-for-beginners-197a", "published_at": "2026-06-17 05:36:06+00:00", "updated_at": "2026-06-17 05:51:26.514809+00:00", "lang": "en", "topics": ["large-language-models", "developer-tools", "artificial-intelligence"], "entities": ["Oxlo.ai", "llama-3.3-70b", "OpenAI"], "alternates": {"html": "https://wpnews.pro/news/introduction-to-llms-for-beginners", "markdown": "https://wpnews.pro/news/introduction-to-llms-for-beginners.md", "text": "https://wpnews.pro/news/introduction-to-llms-for-beginners.txt", "jsonld": "https://wpnews.pro/news/introduction-to-llms-for-beginners.jsonld"}}