LAI #130: That Cheap AI API Is Probably Stealing From You

wpnews.pro

Good morning, AI enthusiasts!

There are services offering GPT and Claude API access at 90% off. Researchers tested 400 of them; one drained crypto from a wallet, others injected malicious code or grabbed cloud credentials. This week, I cover how these proxies actually make their money and why the risk is completely different when you’re routing a coding agent through them instead of just chatting. I also cover how to stop treating ChatGPT like a one-off assistant and start building repeatable workflows around your actual work.

Inside the issue:

Let’s get into it!

This week, in What’s AI, I dive into a too good to be true offer I found. There are several ultra-cheap API proxy stations where you can get the GPT or Claude API 90% cheaper. But when researchers tested 400 of these dirt-cheap AI API services, they found that one of them quietly drained crypto from a wallet. Others were injecting malicious code, or reaching for cloud credentials they were never given. In this article, I explain how these proxies make their money, why the model you pay for might not be the model you get, and why dropping one in front of a coding agent is a completely different level of risk than asking a chatbot a random question. Read the full article here or watch the full video on YouTube.

If you ask ChatGPT to rewrite emails, summarize documents, brainstorm ideas, or make something sound more professional, you are only scratching the surface. That is useful, but it is still only 1% of what ChatGPT can do. Instead of starting from scratch every single time, use Projects to keep your context, files, examples, and instructions in one place. That way, you do not need to explain your work again every time you open a new chat.

Here’s how you can start getting better at AI today: pick one task you do every week, like creating a report, preparing for a meeting, summarizing customer feedback, or planning your priorities. Build a repeatable workflow around it. You can even use ChatGPT Tasks to run recurring prompts, like preparing a weekly briefing or reminding you to review key updates.

That is how you can start using AI in your actual work.

If you want more practical tips on how to use AI at work, and not just better prompts, check out our Master AI for Work Course. — Louis-François Bouchard, Towards AI Co-founder & Head of Community

Exquisite_peacock_20933 just released Liodon AI SLM-10M, a 9.97M parameter causal language model trained from scratch. While it is not suitable for open-ended generation, it supports multiple-choice QA, log-likelihood ranking, SLM research, and perplexity evaluation. It was trained on 25B tokens and supports a context length of 1,024 tokens. Check it out on HuggingFace and support a fellow community member. If you have questions or feedback about the model, share them in the thread!

We’re exploring a paid Towards AI membership for people learning AI, becoming AI engineers, or already building AI systems. And we want to know what would actually help you enough to use it every month.

Most of you are leaning towards career outcomes: internships on real projects, jobs, gigs, referrals, career help, and portfolio coaching.

For those who picked internships or jobs: are you actively looking right now, or do you want that option to exist for when you’re ready? That changes what we build first. Let us know in the thread! The Learn AI Together Discord community is flooding with collaboration opportunities. If you are excited to dive into applied AI, want a study partner, or even want to find a partner for your passion project, join the collaboration channel! Keep an eye on this section, too — we share cool opportunities every week!

Lucazsh is building a social media app for movies and is looking for someone who can help with frontend and app design. If this sounds like your domain, reach out to them in the thread!
Mrlucasrib is deeply studying a book on deep learning and needs a partner to discuss ideas in the book and study together. If you want to get into deep learning, connect with them in the thread!
Vishacoplayz_27974 is recruiting founding board members for AIXelerate, a student-led AI nonprofit. If you’re a high school student interested in leadership, AI, marketing, operations, outreach, or event planning, contact them in the thread!

Meme shared by drdub_

Build Your Own Claude Code Using LangChain: A Deep Dive Into LangChain’s Deep Agents By Sreejith Sreejayan

The article traces Claude Code’s architecture and rebuilds each piece using LangChain’s deepagents library. The framework centers on a bare agent loop in which the model either calls tools or returns text, naturally scaling from one-turn answers to multi-step refactors. Around that loop, the harness adds planning via to-do lists, filesystem-backed context management, subagent delegation, OS-level sandboxing for safety, and LangGraph checkpointing for persistence. The full working agent assembles in under a hundred lines.

Version-Controlling Your Agents: Deployment, Rollback, and Safe Promotion Patterns By MongoDB

Code reviews do not catch how production agents break, and this piece makes a direct case for treating agent configuration with the same discipline applied to software releases. It lays out three failure modes that arise when versioning is absent: live changes without isolation, manual rollback from memory, and silent degradation without an audit trail. It also proposes fixes, such as immutable config snapshots, staged promotion through canary environments, automated release gates, and pinning LLM model versions to prevent silent behavioral drift between provider updates.

Hosting LLM-Generated Dashboards: A Governed Snowflake Architecture By Mkrishnamallik

Governing LLM-generated dashboards inside Snowflake demands more than a smart chat connector. The article proposes a three-file contract separating authoring from hosting: an LLM builds the HTML, a thin Streamlit-in-Snowflake shell wraps it with RBAC, a semantic view enforces verified metric definitions, and every chat turn lands in an audit log. CI deploys per-PR previews with a manual prod gate. The architecture treats the semantic layer as the durable unit of trust, not the dashboard itself, which the author argues is now effectively a throwaway artifact.

I Can Compress 1000 Dimensions Into 2 — Here’s What PCA Taught Me By Anas Razy

PCA cuts through the curse of dimensionality by rotating the coordinate axes to maximize the data’s spread, then projecting everything onto those best-fit directions. The author builds the full intuition from scratch, covering covariance matrices, eigenvectors, eigenvalues, and projection math before implementing a 3D-to-2D reduction in Python using NumPy and Scikit-learn. The piece also explains why libraries prefer SVD over direct eigendecomposition and points to MNIST as a classic test case for visualizing high-dimensional data.

Optimizing Local LLM Inference on Constrained Hardware By Abhinandan Malhotra

Running on a 6GB RTX 3050, the author bypassed Ollama’s Go-based wrapper and ran llama.cpp directly, doubling token-generation throughput on an 8B-parameter model. Benchmarks across three models and three prompt scenarios quantified the abstraction tax: wrappers conservatively spill KV cache to system RAM as context grows, tanking performance across the PCIe bus. Key tuning levers included matching CPU threads to physical core count, using symmetric KV cache quantization, maximizing GPU layer off, and increasing micro-batch size to accelerate prefill-heavy RAG pipelines.

WebSockets at Scale: What Nobody Tells You About Managing Millions of Connections By Rizwanhoda

WebSocket connections drain file descriptors, memory, and routing logic in ways most tutorials never address. This piece walks through seven production failure points: OS file descriptor caps that limit connections to 1,024 by default, per-connection memory overhead that scales brutally, cross-server message routing solved via Redis Pub/Sub, the eventually-consistent presence problem, thundering herd reconnection bugs fixed with jittered exponential backoff, sticky session requirements for load balancers, and the monitoring gap WebSockets create. The final architecture stitches all fixes into a predictable, debuggable production system.

If you are interested in publishing with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards. LAI #130: That Cheap AI API Is Probably Stealing From You was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

source & further reading

pub.towardsai.net — original article Why Kubernetes Exists: From a Python Script to Production Orchestration Rewriting Business Rules: Artificial Intelligence in Legal Tech and Compliance DeepSeek-V4-Flash: the $0.28 Model that Just Embarrassed the AI Industry’s Pricing

LAI #130: That Cheap AI API Is Probably Stealing From You

Run your AI side-project on zahid.host