{"slug": "88-the-openai-api-everything-you-can-build", "title": "88. The OpenAI API: Everything You Can Build", "summary": "The OpenAI API is the most widely used interface for integrating AI models into products, offering features like chat completions, streaming, function calling, embeddings, image generation, and speech. The article provides practical guidance on using the API, including code examples for basic chat completions, cost estimation for different models (such as GPT-3.5-turbo, GPT-4o-mini, and GPT-4o), and best practices for streaming responses to improve perceived speed in user interfaces.", "body_md": "Every AI product you use is probably calling an API somewhere.\n\nThe chat assistant in your IDE. The customer service bot on a website. The document summarizer in your company's internal tools. The code reviewer. The email writer. Nearly all of them send text to a remote model, get text back, and display it to you.\n\nOpenAI built the most widely used API for this. Not the only one. Not always the cheapest. 
But the one with the most ecosystem support, the most tutorials, the most integrations, and the API design that others have copied.\n\nThis post covers everything: chat completions, streaming, function calling, embeddings, image generation, speech, and the patterns that make production applications reliable.\n\n## Setup and First Call\n\n``` python\nfrom openai import OpenAI\nimport json\nimport time\nimport os\n\nclient = OpenAI(api_key=os.environ.get(\"OPENAI_API_KEY\", \"your-key-here\"))\n\nresponse = client.chat.completions.create(\n    model    = \"gpt-3.5-turbo\",\n    messages = [\n        {\"role\": \"system\",  \"content\": \"You are a helpful assistant.\"},\n        {\"role\": \"user\",    \"content\": \"What is machine learning in one sentence?\"}\n    ],\n    temperature = 0.7,\n    max_tokens  = 150,\n)\n\nprint(\"Basic chat completion:\")\nprint(f\"  Response: {response.choices[0].message.content}\")\nprint()\nprint(\"Response object details:\")\nprint(f\"  model:              {response.model}\")\nprint(f\"  finish_reason:      {response.choices[0].finish_reason}\")\nprint(f\"  prompt_tokens:      {response.usage.prompt_tokens}\")\nprint(f\"  completion_tokens:  {response.usage.completion_tokens}\")\nprint(f\"  total_tokens:       {response.usage.total_tokens}\")\nprint()\n\ncost_per_1k = {\"gpt-3.5-turbo\": (0.0005, 0.0015), \"gpt-4-turbo\": (0.01, 0.03)}\nmodel = \"gpt-3.5-turbo\"\nin_cost  = response.usage.prompt_tokens     / 1000 * cost_per_1k[model][0]\nout_cost = response.usage.completion_tokens / 1000 * cost_per_1k[model][1]\nprint(f\"  Estimated cost: ${in_cost + out_cost:.6f}\")\n```\n\n## Models Available and When to Use Each\n\n```\nmodels = {\n    \"gpt-3.5-turbo\": {\n        \"context\":  \"16K tokens\",\n        \"in_cost\":  \"$0.50 / 1M tokens\",\n        \"out_cost\": \"$1.50 / 1M tokens\",\n        \"speed\":    \"very fast\",\n        \"best_for\": \"Simple Q&A, classification, extraction, high-volume tasks\"\n    },\n    
\"gpt-4o-mini\": {\n        \"context\":  \"128K tokens\",\n        \"in_cost\":  \"$0.15 / 1M tokens\",\n        \"out_cost\": \"$0.60 / 1M tokens\",\n        \"speed\":    \"fast\",\n        \"best_for\": \"Most tasks — best price/performance in 2024\"\n    },\n    \"gpt-4o\": {\n        \"context\":  \"128K tokens\",\n        \"in_cost\":  \"$5.00 / 1M tokens\",\n        \"out_cost\": \"$15.00 / 1M tokens\",\n        \"speed\":    \"moderate\",\n        \"best_for\": \"Complex reasoning, long documents, multimodal, code\"\n    },\n    \"gpt-4-turbo\": {\n        \"context\":  \"128K tokens\",\n        \"in_cost\":  \"$10.00 / 1M tokens\",\n        \"out_cost\": \"$30.00 / 1M tokens\",\n        \"speed\":    \"moderate\",\n        \"best_for\": \"Highest capability tasks, legacy integration\"\n    },\n}\n\nprint(f\"{'Model':<15} {'Context':>10} {'Input cost':>14} {'Output cost':>14} {'Speed':>10}\")\nprint(\"=\" * 70)\nfor name, info in models.items():\n    print(f\"{name:<15} {info['context']:>10} {info['in_cost']:>14} \"\n          f\"{info['out_cost']:>14} {info['speed']:>10}\")\n\nprint()\nprint(\"Practical rule:\")\nprint(\"  Default: gpt-4o-mini (excellent quality, lowest cost)\")\nprint(\"  Complex reasoning: gpt-4o (worth the cost)\")\nprint(\"  High volume, simple tasks: gpt-3.5-turbo (cheapest)\")\nprint(\"  Check openai.com/pricing for updated costs (change frequently)\")\n```\n\n## Streaming Responses\n\n```\nprint(\"Streaming: Show tokens as they are generated (faster perceived response):\")\nprint()\n\nstream = client.chat.completions.create(\n    model  = \"gpt-3.5-turbo\",\n    messages = [\n        {\"role\": \"user\", \"content\": \"List 5 key concepts in machine learning, briefly.\"}\n    ],\n    stream = True,\n)\n\nprint(\"Streaming output:\")\nfull_response = \"\"\nfor chunk in stream:\n    delta = chunk.choices[0].delta\n    if delta.content:\n        print(delta.content, end=\"\", flush=True)\n        full_response += 
delta.content\n\nprint()\nprint()\nprint(\"Streaming patterns:\")\nprint(\"  - Use for chat interfaces (user sees tokens appear, feels faster)\")\nprint(\"  - Collect full response by accumulating chunks\")\nprint(\"  - Handle finish_reason to detect end of stream\")\nprint(\"  - Use try/finally to handle disconnects gracefully\")\n```\n\n## System Prompts: The Most Important Tool\n\n``` python\ndef create_assistant(role, constraints, output_format=None):\n    \"\"\"Build a well-structured system prompt.\"\"\"\n    parts = [f\"You are {role}.\"]\n    if constraints:\n        parts.append(\"Rules:\")\n        for constraint in constraints:\n            parts.append(f\"- {constraint}\")\n    if output_format:\n        parts.append(f\"Always respond in: {output_format}\")\n    return \"\\n\".join(parts)\n\npersonas = {\n    \"Concise Technical Writer\": create_assistant(\n        role=\"a technical writer who values precision and brevity\",\n        constraints=[\n            \"Never use more than 3 sentences per answer\",\n            \"Always use specific technical terms\",\n            \"Provide one code example when relevant\"\n        ],\n    ),\n    \"Socratic Tutor\": create_assistant(\n        role=\"a Socratic tutor who teaches through questions\",\n        constraints=[\n            \"Never give direct answers — only ask guiding questions\",\n            \"Build on the student's own reasoning\",\n            \"Acknowledge correct insights before probing further\"\n        ],\n    ),\n    \"JSON Extractor\": create_assistant(\n        role=\"a data extraction assistant\",\n        constraints=[\n            \"Extract only what is explicitly stated in the input\",\n            \"Use null for missing values\",\n            \"Never infer or guess information\"\n        ],\n        output_format=\"valid JSON only, no explanation, no markdown\"\n    ),\n}\n\nfor name, prompt in personas.items():\n    print(f\"System prompt: {name}\")\n    print(f\"  
{prompt[:120]}...\")\n    print()\n\nprint(\"System prompt best practices:\")\nbest_practices = [\n    \"Be explicit about role, constraints, and output format\",\n    \"Use bullet points for rules (models follow them more reliably)\",\n    \"Specify what NOT to do, not just what to do\",\n    \"Include examples when the output format is complex\",\n    \"Keep it concise — long system prompts dilute attention\",\n]\nfor p in best_practices:\n    print(f\"  • {p}\")\n```\n\n## Function Calling: Connecting LLMs to External Tools\n\n```\nprint(\"\\nFunction Calling: The Most Powerful OpenAI Feature\")\nprint()\nprint(\"Without function calling: LLM can only talk.\")\nprint(\"With function calling:    LLM can DO things.\")\nprint()\n\ntools = [\n    {\n        \"type\": \"function\",\n        \"function\": {\n            \"name\":        \"get_weather\",\n            \"description\": \"Get current weather for a city\",\n            \"parameters\": {\n                \"type\": \"object\",\n                \"properties\": {\n                    \"city\": {\n                        \"type\":        \"string\",\n                        \"description\": \"City name\"\n                    },\n                    \"unit\": {\n                        \"type\":        \"string\",\n                        \"enum\":        [\"celsius\", \"fahrenheit\"],\n                        \"description\": \"Temperature unit\"\n                    }\n                },\n                \"required\": [\"city\"]\n            }\n        }\n    },\n    {\n        \"type\": \"function\",\n        \"function\": {\n            \"name\":        \"search_database\",\n            \"description\": \"Search company knowledge base\",\n            \"parameters\": {\n                \"type\": \"object\",\n                \"properties\": {\n                    \"query\": {\n                        \"type\":        \"string\",\n                        \"description\": \"Search query\"\n                    
},\n                    \"max_results\": {\n                        \"type\":        \"integer\",\n                        \"description\": \"Max results to return\",\n                        \"default\":     3\n                    }\n                },\n                \"required\": [\"query\"]\n            }\n        }\n    }\n]\n\ndef execute_tool(tool_name, tool_args):\n    \"\"\"Simulate tool execution.\"\"\"\n    if tool_name == \"get_weather\":\n        city = tool_args.get(\"city\", \"unknown\")\n        return json.dumps({\"city\": city, \"temperature\": 28, \"condition\": \"sunny\", \"unit\": \"celsius\"})\n    elif tool_name == \"search_database\":\n        return json.dumps({\"results\": [\n            {\"text\": \"Q3 revenue was $4.2M\", \"source\": \"Q3 Report\"},\n            {\"text\": \"Premium plan costs $49/month\", \"source\": \"Pricing\"}\n        ]})\n    return json.dumps({\"error\": \"unknown tool\"})\n\ndef run_with_tools(user_message, tools, verbose=True):\n    \"\"\"Complete tool-use loop.\"\"\"\n    messages = [{\"role\": \"user\", \"content\": user_message}]\n\n    response = client.chat.completions.create(\n        model   = \"gpt-3.5-turbo\",\n        messages = messages,\n        tools   = tools,\n        tool_choice = \"auto\"\n    )\n\n    msg = response.choices[0].message\n\n    if response.choices[0].finish_reason == \"tool_calls\":\n        messages.append({\"role\": \"assistant\", \"content\": None,\n                          \"tool_calls\": [tc.model_dump() for tc in msg.tool_calls]})\n\n        for tool_call in msg.tool_calls:\n            fn_name  = tool_call.function.name\n            fn_args  = json.loads(tool_call.function.arguments)\n            if verbose:\n                print(f\"  → Calling tool: {fn_name}({fn_args})\")\n            result = execute_tool(fn_name, fn_args)\n            if verbose:\n                print(f\"  ← Tool result:  {result[:80]}\")\n\n            messages.append({\n                \"role\":   
      \"tool\",\n                \"tool_call_id\": tool_call.id,\n                \"content\":      result\n            })\n\n        final = client.chat.completions.create(\n            model=response.model, messages=messages)\n        return final.choices[0].message.content\n\n    return msg.content\n\ntest_queries = [\n    \"What's the weather like in Mumbai right now?\",\n    \"What is our Q3 revenue?\",\n    \"What is the capital of France?\",\n]\n\nprint(\"Function calling test:\")\nfor query in test_queries:\n    print(f\"\\nUser: {query}\")\n    answer = run_with_tools(query, tools, verbose=True)\n    print(f\"Bot:  {answer[:120]}\")\n```\n\n## Structured JSON Output\n\n```\nprint(\"\\nStructured Output: Reliable JSON from LLMs\")\nprint()\n\nresponse = client.chat.completions.create(\n    model    = \"gpt-3.5-turbo\",\n    messages = [\n        {\n            \"role\": \"system\",\n            \"content\": \"Extract information from the text. Respond with valid JSON only, \"\n                       \"no markdown, no explanation. \"\n                       \"Schema: {name: string, role: string, company: string, \"\n                       \"skills: [string], years_experience: int|null}\"\n        },\n        {\n            \"role\": \"user\",\n            \"content\": \"John Smith is a Senior ML Engineer at Anthropic. 
\"\n                       \"He has 8 years of experience and specializes in \"\n                       \"transformer architectures, PyTorch, and distributed training.\"\n        }\n    ],\n    temperature = 0,\n)\n\nraw_json = response.choices[0].message.content\nparsed   = json.loads(raw_json)\n\nprint(\"Input: 'John Smith is a Senior ML Engineer at Anthropic...'\")\nprint(f\"Extracted JSON:\")\nprint(json.dumps(parsed, indent=2))\nprint()\nprint(\"Using response_format for guaranteed JSON (gpt-4-turbo and above):\")\nprint(\"  response_format={'type': 'json_object'}\")\nprint(\"  Guarantees valid JSON output — no parsing errors\")\nprint(\"  Still need schema in the system prompt\")\n```\n\n## Embeddings API\n\n```\nprint(\"\\nOpenAI Embeddings API:\")\nprint()\n\ntexts = [\n    \"Machine learning learns patterns from data.\",\n    \"Deep learning uses layered neural networks.\",\n    \"The Eiffel Tower is in Paris, France.\",\n    \"Artificial intelligence mimics human thinking.\",\n]\n\nemb_response = client.embeddings.create(\n    model = \"text-embedding-3-small\",\n    input = texts,\n)\n\nembeddings = [item.embedding for item in emb_response.data]\nprint(f\"Model:           text-embedding-3-small\")\nprint(f\"Dimensions:      {len(embeddings[0])}\")\nprint(f\"Total tokens:    {emb_response.usage.total_tokens}\")\nprint(f\"Texts embedded:  {len(embeddings)}\")\nprint()\n\nimport numpy as np\nfrom sklearn.metrics.pairwise import cosine_similarity\n\nsim_matrix = cosine_similarity(embeddings)\nprint(\"Pairwise similarities:\")\nfor i in range(len(texts)):\n    for j in range(i+1, len(texts)):\n        sim = sim_matrix[i][j]\n        print(f\"  {sim:.3f}  '{texts[i][:30]}...' 
↔ '{texts[j][:30]}...'\")\n\nprint()\nprint(\"Embedding models comparison:\")\nemb_models = {\n    \"text-embedding-3-small\": (\"1536 dims\", \"$0.02 / 1M tokens\",  \"Best for most use cases\"),\n    \"text-embedding-3-large\": (\"3072 dims\", \"$0.13 / 1M tokens\",  \"Higher accuracy, bigger index\"),\n    \"text-embedding-ada-002\": (\"1536 dims\", \"$0.10 / 1M tokens\",  \"Legacy, use 3-small instead\"),\n}\nfor name, (dims, cost, note) in emb_models.items():\n    print(f\"  {name:<30} {dims:<12} {cost:<22} {note}\")\n```\n\n## Image Generation: DALL-E 3\n\n```\nprint(\"\\nImage Generation with DALL-E 3:\")\nprint()\n\nimage_response = client.images.generate(\n    model   = \"dall-e-3\",\n    prompt  = \"A neural network visualized as a glowing network of nodes and connections, \"\n              \"dark background, scientific illustration style, high quality\",\n    size    = \"1024x1024\",\n    quality = \"standard\",\n    n       = 1,\n)\n\nimage_url = image_response.data[0].url\nrevised   = image_response.data[0].revised_prompt\n\nprint(f\"Generated image URL: {image_url[:60]}...\")\nprint(f\"Revised prompt: {revised[:100]}...\")\nprint()\nprint(\"DALL-E 3 vs DALL-E 2:\")\ndalle_models = {\n    \"dall-e-3\": (\"1024x1024, 1792x1024, 1024x1792\", \"Better quality, prompt following\", \"$0.040/image standard\"),\n    \"dall-e-2\": (\"256 to 1024px\",           \"Faster, cheaper, less capable\",   \"$0.020/image 1024px\"),\n}\nfor name, (sizes, capability, cost) in dalle_models.items():\n    print(f\"  {name}: {sizes} | {capability} | {cost}\")\n```\n\n## Speech-to-Text (Whisper)\n\n```\nprint(\"\\nWhisper API: Speech to Text\")\nprint()\nprint(\"Whisper is OpenAI's speech recognition model.\")\nprint(\"Supports dozens of languages; accuracy varies by language.\")\nprint()\n\nwhisper_example = \"\"\"\nimport openai\n\nclient = openai.OpenAI()\n\n# Transcribe audio file\nwith open(\"audio.mp3\", \"rb\") as audio_file:\n    transcript = client.audio.transcriptions.create(\n        
model  = \"whisper-1\",\n        file   = audio_file,\n        language = \"en\",          # optional, auto-detect if omitted\n        response_format = \"text\"  # \"json\", \"srt\", \"vtt\" also available\n    )\n\nprint(transcript)  # Returns transcribed text\n\n# Translate to English from any language\nwith open(\"hindi_audio.mp3\", \"rb\") as f:\n    translation = client.audio.translations.create(\n        model = \"whisper-1\",\n        file  = f\n    )\nprint(translation.text)  # Always returns English\n\"\"\"\nprint(whisper_example)\nprint(\"Cost: $0.006 per minute of audio\")\nprint(\"Max file size: 25MB\")\nprint(\"Supported formats: mp3, mp4, m4a, wav, webm, ogg\")\n```\n\n## Error Handling and Retry Logic\n\n``` python\nimport time\nfrom openai import RateLimitError, APIError, APIConnectionError\n\ndef robust_completion(messages, model=\"gpt-3.5-turbo\",\n                       max_retries=3, base_delay=1.0, **kwargs):\n    \"\"\"Production-grade completion with retry and error handling.\"\"\"\n    for attempt in range(max_retries):\n        try:\n            response = client.chat.completions.create(\n                model=model, messages=messages, **kwargs)\n            return response.choices[0].message.content\n\n        except RateLimitError as e:\n            if attempt == max_retries - 1:\n                raise\n            wait = base_delay * (2 ** attempt)\n            print(f\"Rate limit hit. Waiting {wait:.1f}s... (attempt {attempt+1})\")\n            time.sleep(wait)\n\n        except APIConnectionError as e:\n            if attempt == max_retries - 1:\n                raise\n            print(f\"Connection error. Retrying... 
(attempt {attempt+1})\")\n            time.sleep(base_delay)\n\n        except APIError as e:\n            # Not every APIError carries a status_code, so read it defensively\n            status = getattr(e, \"status_code\", None)\n            if status == 500 and attempt < max_retries - 1:\n                time.sleep(base_delay)\n                continue\n            raise\n\n    return None\n\nprint(\"Error handling patterns:\")\nerror_guide = {\n    \"RateLimitError\":      \"Too many requests. Implement exponential backoff.\",\n    \"APIConnectionError\":  \"Network issue. Retry with delay.\",\n    \"AuthenticationError\": \"Invalid API key. Check OPENAI_API_KEY env var.\",\n    \"BadRequestError\":     \"Invalid request (too long, bad format). Fix the request.\",\n    \"APIError (500)\":      \"OpenAI server error. Retry a few times.\",\n}\nfor error, solution in error_guide.items():\n    print(f\"  {error:<25}: {solution}\")\n```\n\n## Cost Estimation and Monitoring\n\n```\nclass CostTracker:\n    \"\"\"Track API costs across multiple calls.\"\"\"\n\n    # USD per 1K tokens as (input, output); check the pricing page before relying on these\n    PRICING = {\n        \"gpt-3.5-turbo\":       (0.0005, 0.0015),\n        \"gpt-4o-mini\":         (0.00015, 0.0006),\n        \"gpt-4o\":              (0.005, 0.015),\n        \"text-embedding-3-small\": (0.00002, 0),\n    }\n\n    def __init__(self):\n        self.calls   = []\n        self.total   = 0.0\n\n    def record(self, model, prompt_tokens, completion_tokens):\n        if model in self.PRICING:\n            in_rate, out_rate = self.PRICING[model]\n            cost = (prompt_tokens / 1000 * in_rate +\n                    completion_tokens / 1000 * out_rate)\n        else:\n            # Unknown models are recorded at zero cost rather than raising\n            cost = 0.0\n        self.calls.append({\n            \"model\":      model,\n            \"in_tokens\":  prompt_tokens,\n            \"out_tokens\": completion_tokens,\n            \"cost\":       cost\n        })\n        self.total += cost\n        return cost\n\n    def summary(self):\n        print(f\"\\nAPI Cost Summary:\")\n        print(f\"  Total calls:  {len(self.calls)}\")\n        print(f\"  Total tokens: {sum(c['in_tokens']+c['out_tokens'] 
for c in self.calls):,}\")\n        print(f\"  Total cost:   ${self.total:.6f}\")\n        print(f\"  Avg per call: ${self.total/len(self.calls):.6f}\" if self.calls else \"\")\n\ntracker = CostTracker()\ntracker.record(\"gpt-3.5-turbo\", 150, 80)\ntracker.record(\"gpt-4o-mini\",   200, 120)\ntracker.record(\"gpt-4o-mini\",   180, 90)\ntracker.summary()\n```\n\n## Reference Links\n\n```\nprint(\"\\nEssential OpenAI Reference Links:\")\nprint()\n\nrefs = {\n    \"Official Documentation\": [\n        (\"API Reference\",             \"platform.openai.com/docs/api-reference\"),\n        (\"Cookbook (recipes)\",        \"cookbook.openai.com\"),\n        (\"Prompt Engineering Guide\",  \"platform.openai.com/docs/guides/prompt-engineering\"),\n        (\"Function Calling Guide\",    \"platform.openai.com/docs/guides/function-calling\"),\n        (\"Rate Limits Guide\",         \"platform.openai.com/docs/guides/rate-limits\"),\n    ],\n    \"Models and Pricing\": [\n        (\"Model Overview\",            \"platform.openai.com/docs/models\"),\n        (\"Pricing Page\",              \"openai.com/pricing\"),\n        (\"Tokenizer Tool\",            \"platform.openai.com/tokenizer\"),\n        (\"Usage Dashboard\",           \"platform.openai.com/usage\"),\n    ],\n    \"Cheat Sheets and Tutorials\": [\n        (\"OpenAI Python GitHub\",      \"github.com/openai/openai-python\"),\n        (\"DeepLearning.AI ChatGPT API course\", \"learn.deeplearning.ai/chatgpt-prompt-eng\"),\n        (\"Brex Prompt Engineering\",   \"github.com/brexhq/prompt-engineering\"),\n        (\"Best practices for safety\", \"platform.openai.com/docs/guides/safety-best-practices\"),\n    ],\n}\n\nfor category, links in refs.items():\n    print(f\"  {category}:\")\n    for name, url in links:\n        print(f\"    • {name:<40} {url}\")\n    print()\n```\n\n## Try This\n\nCreate `openai_practice.py`.\n\nPart 1: basic completions. Call GPT-3.5-turbo and GPT-4o-mini with the same prompt. 
Compare response quality, token usage, and estimated cost. Which model gives you better value for your use case?\n\nPart 2: function calling. Define at least 3 tools (weather lookup, database search, calendar check). Implement mock versions that return fake data. Test with 5 queries: some should trigger tool calls, some should not. Verify the model picks the right tool.\n\nPart 3: streaming interface. Build a simple command-line chat that streams responses token by token as chunks arrive. Track total tokens used across the entire conversation. Print a cost estimate at the end.\n\nPart 4: embedding + search. Use `text-embedding-3-small` to embed 30 sentences from a domain of your choice. Given a query, find the top 3 most similar sentences. Compare results to a keyword search on the same corpus. Where does semantic search win? Where does keyword search win?\n\n## What's Next\n\nThe OpenAI API covers GPT and DALL-E. The next post covers the Anthropic Claude API: different design philosophy, different strengths, and specific capabilities like the system prompt hierarchy, extended thinking, and very long context windows. After that, the Phase 8 capstone.", "url": "https://wpnews.pro/news/88-the-openai-api-everything-you-can-build", "canonical_source": "https://dev.to/yakhilesh/88-the-openai-api-everything-you-can-build-n5", "published_at": "2026-05-18 20:17:16+00:00", "updated_at": "2026-05-18 20:31:09.469422+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "developer-tools", "products", "enterprise-software"], "entities": ["OpenAI", "GPT-3.5", "GPT-4", "GPT-4o", "GPT-4o-mini"], "alternates": {"html": "https://wpnews.pro/news/88-the-openai-api-everything-you-can-build", "markdown": "https://wpnews.pro/news/88-the-openai-api-everything-you-can-build.md", "text": "https://wpnews.pro/news/88-the-openai-api-everything-you-can-build.txt", "jsonld": "https://wpnews.pro/news/88-the-openai-api-everything-you-can-build.jsonld"}}