88. The OpenAI API: Everything You Can Build

The OpenAI API is the most widely used interface for integrating AI models into products, offering features like chat completions, streaming, function calling, embeddings, image generation, and speech. This article provides practical guidance on using the API, including code examples for basic chat completions, cost estimation for different models (such as GPT-3.5-turbo, GPT-4o-mini, and GPT-4o), and best practices for streaming responses to improve perceived speed in user interfaces.

Every AI product you use is probably calling an API somewhere. The chat assistant in your IDE. The customer service bot on a website. The document summarizer in your company's internal tools. The code reviewer. The email writer. Nearly all of them send text to a remote model, get text back, and display it to you.

OpenAI built the most widely used API for this. Not the only one. Not always the cheapest. But the one with the most ecosystem support, the most tutorials, the most integrations, and the API design that others have copied.

This post covers everything: chat completions, streaming, function calling, embeddings, image generation, speech, and the patterns that make production applications reliable.
Setup and First Call

```python
from openai import OpenAI
import json
import time
import os

# Read the key from the environment; the fallback string is only a placeholder.
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "your-key-here"))

response = client.chat.completions.create(
    model = "gpt-3.5-turbo",
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is machine learning in one sentence?"}
    ],
    temperature = 0.7,
    max_tokens = 150,
)

print("Basic chat completion:")
print(f"  Response: {response.choices[0].message.content}")
print()
print("Response object details:")
print(f"  model:             {response.model}")
print(f"  finish reason:     {response.choices[0].finish_reason}")
print(f"  prompt tokens:     {response.usage.prompt_tokens}")
print(f"  completion tokens: {response.usage.completion_tokens}")
print(f"  total tokens:      {response.usage.total_tokens}")
print()

# (input, output) price in USD per 1K tokens
cost_per_1k = {"gpt-3.5-turbo": (0.0005, 0.0015), "gpt-4-turbo": (0.01, 0.03)}
model = "gpt-3.5-turbo"
in_cost  = response.usage.prompt_tokens / 1000 * cost_per_1k[model][0]
out_cost = response.usage.completion_tokens / 1000 * cost_per_1k[model][1]
print(f"  Estimated cost: ${in_cost + out_cost:.6f}")
```

Models Available and When to Use Each

```python
models = {
    "gpt-3.5-turbo": {
        "context": "16K tokens",
        "in_cost": "$0.50 / 1M tokens",
        "out_cost": "$1.50 / 1M tokens",
        "speed": "very fast",
        "best_for": "Simple Q&A, classification, extraction, high-volume tasks"
    },
    "gpt-4o-mini": {
        "context": "128K tokens",
        "in_cost": "$0.15 / 1M tokens",
        "out_cost": "$0.60 / 1M tokens",
        "speed": "fast",
        "best_for": "Most tasks: best price/performance in 2024"
    },
    "gpt-4o": {
        "context": "128K tokens",
        "in_cost": "$5.00 / 1M tokens",
        "out_cost": "$15.00 / 1M tokens",
        "speed": "moderate",
        "best_for": "Complex reasoning, long documents, multimodal, code"
    },
    "gpt-4-turbo": {
        "context": "128K tokens",
        "in_cost": "$10.00 / 1M tokens",
        "out_cost": "$30.00 / 1M tokens",
        "speed": "moderate",
        "best_for": "Highest capability tasks, legacy integration"
    },
}

print(f"{'Model':<15} {'Context':>10} {'Input cost':>14} "
      f"{'Output cost':>14} {'Speed':>10}")
print("=" * 70)
for name, info in models.items():
    print(f"{name:<15} {info['context']:>10} {info['in_cost']:>14} "
          f"{info['out_cost']:>14} {info['speed']:>10}")

print()
print("Practical rule:")
print("  Default: gpt-4o-mini (excellent quality, lowest cost)")
print("  Complex reasoning: gpt-4o (worth the cost)")
print("  High volume, simple tasks: gpt-3.5-turbo (very fast; gpt-4o-mini is cheaper per token)")
print("  Check openai.com/pricing for updates (costs change frequently)")
```

Streaming Responses

```python
print("Streaming: Show tokens as they are generated (faster perceived response):")
print()

stream = client.chat.completions.create(
    model = "gpt-3.5-turbo",
    messages = [{"role": "user", "content": "List 5 key concepts in machine learning, briefly."}],
    stream = True,
)

print("Streaming output:")
full_response = ""
for chunk in stream:
    if not chunk.choices:      # some chunks carry no choices; skip them
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
        full_response += delta.content
print()
print()

print("Streaming patterns:")
print("  - Use for chat interfaces (user sees tokens appear, feels faster)")
print("  - Collect full response by accumulating chunks")
print("  - Handle finish_reason to detect end of stream")
print("  - Use try/finally to handle disconnects gracefully")
```

System Prompts: The Most Important Tool

```python
def create_assistant(role, constraints, output_format=None):
    """Build a well-structured system prompt."""
    parts = [f"You are {role}."]
    if constraints:
        parts.append("Rules:")
        for constraint in constraints:
            parts.append(f"- {constraint}")
    if output_format:
        parts.append(f"Always respond in: {output_format}")
    return "\n".join(parts)

personas = {
    "Concise Technical Writer": create_assistant(
        role="a technical writer who values precision and brevity",
        constraints=[
            "Never use more than 3 sentences per answer",
            "Always use specific technical terms",
            "Provide one code example when relevant",
        ],
    ),
    "Socratic Tutor": create_assistant(
        role="a Socratic tutor who teaches through questions",
        constraints=[
            "Never give direct answers; only ask guiding questions",
            "Build on the student's own reasoning",
            "Acknowledge correct insights before probing further",
        ],
    ),
    "JSON Extractor": create_assistant(
        role="a data extraction assistant",
        constraints=[
            "Extract only what is explicitly stated in the input",
            "Use null for missing values",
            "Never infer or guess information",
        ],
        output_format="valid JSON only, no explanation, no markdown",
    ),
}

for name, prompt in personas.items():
    print(f"System prompt: {name}")
    print(f"  {prompt[:120]}...")
    print()

print("System prompt best practices:")
best_practices = [
    "Be explicit about role, constraints, and output format",
    "Use bullet points for rules (models follow them more reliably)",
    "Specify what NOT to do, not just what to do",
    "Include examples when the output format is complex",
    "Keep it concise: long system prompts dilute attention",
]
for p in best_practices:
    print(f"  • {p}")
```

Function Calling: Connecting LLMs to External Tools

```python
print("\nFunction Calling: The Most Powerful OpenAI Feature")
print()
print("Without function calling: LLM can only talk.")
print("With function calling: LLM can DO things.")
```
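The mechanics behind that claim can be sketched without any network call. The model never runs code itself: it returns a tool name plus a JSON string of arguments, and your program decides what to execute. The block below is a minimal illustration under that assumption; `TOOL_REGISTRY`, `dispatch_tool_call`, and the two toy tools are hypothetical names introduced here, not part of the OpenAI SDK.

```python
import json

# Hypothetical local tools; a real app would call an API or a database here.
def get_weather(city, unit="celsius"):
    return {"city": city, "temperature": 28, "unit": unit}

def add_numbers(a, b):
    return {"sum": a + b}

# Maps the tool name the model emits to the Python function that runs it.
TOOL_REGISTRY = {"get_weather": get_weather, "add_numbers": add_numbers}

def dispatch_tool_call(name, arguments_json):
    """Parse model-supplied arguments and run the matching tool.

    Always returns a JSON string, so failures can be sent back to the
    model as the tool result instead of crashing the loop.
    """
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json)
        return json.dumps(func(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the function signature.
        return json.dumps({"error": f"bad arguments: {exc}"})

print(dispatch_tool_call("get_weather", '{"city": "Mumbai"}'))
print(dispatch_tool_call("book_flight", "{}"))
```

Returning errors as JSON rather than raising is one possible design choice: it lets the model see what went wrong and retry with corrected arguments.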
```python
print()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_database",
            "description": "Search company knowledge base",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "max_results": {
                        "type": "integer",
                        "description": "Max results to return",
                        "default": 3
                    }
                },
                "required": ["query"]
            }
        }
    }
]

def execute_tool(tool_name, tool_args):
    """Simulate tool execution."""
    if tool_name == "get_weather":
        city = tool_args.get("city", "unknown")
        return json.dumps({"city": city, "temperature": 28,
                           "condition": "sunny", "unit": "celsius"})
    elif tool_name == "search_database":
        return json.dumps({"results": [
            {"text": "Q3 revenue was $4.2M", "source": "Q3 Report"},
            {"text": "Premium plan costs $49/month", "source": "Pricing"}
        ]})
    return json.dumps({"error": "unknown tool"})

def run_with_tools(user_message, tools, verbose=True):
    """Complete tool-use loop."""
    messages = [{"role": "user", "content": user_message}]

    response = client.chat.completions.create(
        model = "gpt-3.5-turbo",
        messages = messages,
        tools = tools,
        tool_choice = "auto"
    )
    msg = response.choices[0].message

    if response.choices[0].finish_reason == "tool_calls":
        # Echo the assistant's tool-call message so each tool result has a parent.
        messages.append({"role": "assistant", "content": None,
                         "tool_calls": [tc.model_dump() for tc in msg.tool_calls]})

        for tool_call in msg.tool_calls:
            fn_name = tool_call.function.name
            fn_args = json.loads(tool_call.function.arguments)
            if verbose:
                print(f"  → Calling tool: {fn_name}({fn_args})")
            result = execute_tool(fn_name, fn_args)
            if verbose:
                print(f"  ← Tool result: {result[:80]}")
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result
            })

        # Second call: the model turns the tool results into a final answer.
        final = client.chat.completions.create(model=response.model, messages=messages)
        return final.choices[0].message.content

    return msg.content

test_queries = [
    "What's the weather like in Mumbai right now?",
    "What is our Q3 revenue?",
    "What is the capital of France?",
]

print("Function calling test:")
for query in test_queries:
    print(f"\nUser: {query}")
    answer = run_with_tools(query, tools, verbose=True)
    print(f"Bot: {answer[:120]}")
```

Structured JSON Output

```python
print("\nStructured Output: Reliable JSON from LLMs")
print()

response = client.chat.completions.create(
    model = "gpt-3.5-turbo",
    messages = [
        {
            "role": "system",
            "content": "Extract information from the text. Respond with valid JSON only, "
                       "no markdown, no explanation. "
                       "Schema: {name: string, role: string, company: string, "
                       "skills: string[], years_experience: int|null}"
        },
        {
            "role": "user",
            "content": "John Smith is a Senior ML Engineer at Anthropic. "
                       "He has 8 years of experience and specializes in "
                       "transformer architectures, PyTorch, and distributed training."
        }
    ],
    temperature = 0,
)

raw_json = response.choices[0].message.content
parsed = json.loads(raw_json)   # raises if the model wrapped the JSON in extra text

print("Input: 'John Smith is a Senior ML Engineer at Anthropic...'")
print("Extracted JSON:")
print(json.dumps(parsed, indent=2))
print()
print("Using response_format for JSON mode (gpt-4-turbo, gpt-3.5-turbo-1106, and newer):")
print("  response_format={'type': 'json_object'}")
print("  Constrains output to valid JSON (unless the reply is cut off by max_tokens)")
print("  Still need schema in the system prompt")
```

Embeddings API

```python
print("\nOpenAI Embeddings API:")
print()

texts = [
    "Machine learning learns patterns from data.",
    "Deep learning uses layered neural networks.",
    "The Eiffel Tower is in Paris, France.",
    "Artificial intelligence mimics human thinking.",
]

emb_response = client.embeddings.create(
    model = "text-embedding-3-small",
    input = texts,
)
embeddings = [item.embedding for item in emb_response.data]

print("Model: text-embedding-3-small")
print(f"Dimensions: {len(embeddings[0])}")
print(f"Total tokens: {emb_response.usage.total_tokens}")
print(f"Texts embedded: {len(embeddings)}")
print()

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

sim_matrix = cosine_similarity(np.array(embeddings))
print("Pairwise similarities:")
for i in range(len(texts)):
    for j in range(i+1, len(texts)):
        sim = sim_matrix[i][j]
        print(f"  {sim:.3f}  '{texts[i][:30]}...' ↔ '{texts[j][:30]}...'")

print()
print("Embedding models comparison:")
emb_models = {
    "text-embedding-3-small": ("1536 dims", "$0.02 / 1M tokens", "Best for most use cases"),
    "text-embedding-3-large": ("3072 dims", "$0.13 / 1M tokens", "Higher accuracy, bigger index"),
    "text-embedding-ada-002": ("1536 dims", "$0.10 / 1M tokens", "Legacy, use 3-small instead"),
}
# Each value is a (dims, cost, note) tuple, so unpack it alongside the key.
for name, (dims, cost, note) in emb_models.items():
    print(f"  {name:<30} {dims:<12} {cost:<22} {note}")
```

Image Generation: DALL-E 3

```python
print("\nImage Generation with DALL-E 3:")
print()

image_response = client.images.generate(
    model = "dall-e-3",
    prompt = "A neural network visualized as a glowing network of nodes and connections, "
             "dark background, scientific illustration style, high quality",
    size = "1024x1024",
    quality = "standard",
    n = 1,
)

image_url = image_response.data[0].url
revised = image_response.data[0].revised_prompt
print(f"Generated image URL: {image_url[:60]}...")
print(f"Revised prompt: {revised[:100]}...")
print()

print("DALL-E 3 vs DALL-E 2:")
dalle_models = {
    "dall-e-3": ("1024x1024 to 1792x1024", "Better quality, prompt following", "$0.040/image (standard)"),
    "dall-e-2": ("256 to 1024px", "Faster, cheaper, less capable", "$0.020/image (1024px)"),
}
for name, (sizes, capability, cost) in dalle_models.items():
    print(f"  {name}: {sizes} | {capability} | {cost}")
```

Speech-to-Text: Whisper

```python
print("\nWhisper API: Speech to Text")
print()
print("Whisper is OpenAI's speech recognition model.")
print("Trained on audio in nearly 100 languages; accuracy varies by language.")
```
```python
print()

whisper_example = """
import openai
client = openai.OpenAI()

# Transcribe audio file
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model = "whisper-1",
        file = audio_file,
        language = "en",            # optional, auto-detect if omitted
        response_format = "text"    # "json", "srt", "vtt" also available
    )
print(transcript)   # Returns transcribed text

# Translate to English from any language
with open("hindi_audio.mp3", "rb") as f:
    translation = client.audio.translations.create(
        model = "whisper-1",
        file = f
    )
print(translation.text)   # Always returns English
"""
print(whisper_example)
print("Cost: $0.006 per minute of audio")
print("Max file size: 25MB")
print("Supported formats: mp3, mp4, m4a, wav, webm, ogg")
```

Error Handling and Retry Logic

```python
import time
from openai import RateLimitError, APIError, APIConnectionError

def robust_completion(messages, model="gpt-3.5-turbo",
                      max_retries=3, base_delay=1.0, **kwargs):
    """Production-grade completion with retry and error handling."""
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=model, messages=messages, **kwargs
            )
            return response.choices[0].message.content

        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            wait = base_delay * (2 ** attempt)   # exponential backoff
            print(f"Rate limit hit. Waiting {wait:.1f}s... (attempt {attempt+1})")
            time.sleep(wait)

        except APIConnectionError:
            if attempt == max_retries - 1:
                raise
            print(f"Connection error. Retrying... (attempt {attempt+1})")
            time.sleep(base_delay)

        except APIError as e:
            # Not every APIError carries a status code, so read it defensively.
            if getattr(e, "status_code", None) == 500 and attempt < max_retries - 1:
                time.sleep(base_delay)
                continue
            raise

    return None

print("Error handling patterns:")
error_guide = {
    "RateLimitError":      "Too many requests. Implement exponential backoff.",
    "APIConnectionError":  "Network issue. Retry with delay.",
    "AuthenticationError": "Invalid API key. Check OPENAI_API_KEY env var.",
    "BadRequestError":     "Invalid request (too long, bad format). Fix the request.",
    "APIError (500)":      "OpenAI server error. Retry a few times.",
}
for error, solution in error_guide.items():
    print(f"  {error:<25}: {solution}")
```

Cost Estimation and Monitoring

```python
class CostTracker:
    """Track API costs across multiple calls."""

    # (input, output) price in USD per 1K tokens
    PRICING = {
        "gpt-3.5-turbo":          (0.0005,  0.0015),
        "gpt-4o-mini":            (0.00015, 0.0006),
        "gpt-4o":                 (0.005,   0.015),
        "text-embedding-3-small": (0.00002, 0),
    }

    def __init__(self):
        self.calls = []
        self.total = 0.0

    def record(self, model, prompt_tokens, completion_tokens):
        if model in self.PRICING:
            in_rate, out_rate = self.PRICING[model]
            cost = prompt_tokens / 1000 * in_rate + completion_tokens / 1000 * out_rate
        else:
            cost = 0.0   # unknown model: recorded, but not priced
        self.calls.append({
            "model": model,
            "in_tokens": prompt_tokens,
            "out_tokens": completion_tokens,
            "cost": cost
        })
        self.total += cost
        return cost

    def summary(self):
        print("\nAPI Cost Summary:")
        print(f"  Total calls:  {len(self.calls)}")
        print(f"  Total tokens: {sum(c['in_tokens']+c['out_tokens'] for c in self.calls):,}")
        print(f"  Total cost:   ${self.total:.6f}")
        print(f"  Avg per call: ${self.total/len(self.calls):.6f}" if self.calls else "")

tracker = CostTracker()
tracker.record("gpt-3.5-turbo", 150, 80)
tracker.record("gpt-4o-mini", 200, 120)
tracker.record("gpt-4o-mini", 180, 90)
tracker.summary()
```

Reference Links

```python
print("\nEssential OpenAI Reference Links:")
print()

refs = {
    "Official Documentation": [
        ("API Reference",            "platform.openai.com/docs/api-reference"),
        ("Cookbook (recipes)",       "cookbook.openai.com"),
        ("Prompt Engineering Guide", "platform.openai.com/docs/guides/prompt-engineering"),
        ("Function Calling Guide",   "platform.openai.com/docs/guides/function-calling"),
        ("Rate Limits Guide",        "platform.openai.com/docs/guides/rate-limits"),
    ],
    "Models and Pricing": [
        ("Model Overview",  "platform.openai.com/docs/models"),
        ("Pricing Page",    "openai.com/pricing"),
        ("Tokenizer Tool",  "platform.openai.com/tokenizer"),
        ("Usage Dashboard", "platform.openai.com/usage"),
    ],
    "Cheat Sheets and Tutorials": [
        ("OpenAI Python GitHub",               "github.com/openai/openai-python"),
        ("DeepLearning.AI ChatGPT API course", "learn.deeplearning.ai/chatgpt-prompt-eng"),
        ("Brex Prompt Engineering",            "github.com/brexhq/prompt-engineering"),
        ("Best practices for safety",          "platform.openai.com/docs/guides/safety-best-practices"),
    ],
}

for category, links in refs.items():
    print(f"  {category}:")
    for name, url in links:
        print(f"    • {name:<40} {url}")
    print()
```

Try This

Create openai_practice.py.

Part 1: basic completions. Call GPT-3.5-turbo and GPT-4o-mini with the same prompt. Compare response quality, token usage, and estimated cost. Which model gives you better value for your use case?

Part 2: function calling. Define at least 3 tools (weather lookup, database search, calendar check). Implement mock versions that return fake data. Test with 5 queries: some should trigger tool calls, some should not. Verify the model picks the right tool.

Part 3: streaming interface. Build a simple command-line chat that streams responses as the tokens arrive. Track total tokens used across the entire conversation. Print a cost estimate at the end.

Part 4: embedding + search. Use text-embedding-3-small to embed 30 sentences from a domain of your choice. Given a query, find the top 3 most similar sentences. Compare results to a keyword search on the same corpus. Where does semantic search win? Where does keyword search win?

What's Next

The OpenAI API covers GPT and DALL-E. The next post covers the Anthropic Claude API: different design philosophy, different strengths, and specific capabilities like the system prompt hierarchy, extended thinking, and very long context windows. After that, the Phase 8 capstone.