

Llama

mentions 101 · type Organization · page 5/6

sameAs · en.wikipedia.org · www.wikidata.org

// recent coverage 101 mentions

21:20

2026-05-30

dev.to

large-language-models

Open source vs closed AI: real-world tradeoffs

An engineer spent three days swapping GPT-4o for Llama 3.3 70B in a production workflow after API latency reached 4.2 seconds per call, only to encounter flaky structured JSON output and hallucinated …

11:55

2026-05-30

dev.to

ai-agents

Hermes Agent vs. The Cloud: A Developer's Guide to Local AI Agents

Hermes Agent, an open-source agentic framework, enables developers to run AI agents entirely on local infrastructure without cloud dependencies. The system supports a Reasoning + Acting cycle for mult…

04:06

2026-05-30

dev.to

artificial-intelligence

The Agent Is Easy. The Loop Is the Job. — A Developer's No-BS Guide to AI Engineering in 2026

A developer has defined AI engineering as a distinct discipline focused on building production applications using pre-trained models, contrasting it with ML engineering, which involves training and opt…

18:42

2026-05-29

gadgetreview.com

large-language-models

AI Chatbots Fail Medical Questions One in Five Times, Study Reveals

A new Penn State study found that AI chatbots provide inaccurate medical information roughly one in five times, with failure rates reaching 50% for some systems. Nine board-certified physicians evalua…

20:37

2026-05-28

letsdatascience.com

generative-ai

Amnesty International Exposes Unlawful Data Pipelines Powering Generative AI

Amnesty International published a briefing on 28 May 2026 documenting that large-scale web scraping and data pipelines collect online material without explicit consent to train standalone generative A…

09:45

2026-05-28

the-decoder.com

artificial-intelligence

Meta One: Zuckerberg finally puts a price tag on all that AI spending

Meta is launching paid add-ons for Instagram, Facebook, and WhatsApp, with prices ranging from $2.99 to $19.99 per month, as part of a strategy to reduce reliance on ad revenue and justify its massive…

06:03

2026-05-28

dev.to

artificial-intelligence

Beginner’s AI Glossary

A developer has published a glossary defining over 25 key AI terms, from Large Language Models (LLMs) and Agentic AI to parameters and synthetic data. The guide breaks down common acronyms and concept…

15:57

2026-05-27

github.com

large-language-models

Show HN: Biopetals – Run biology tuned Llama, BitTorrent-style

A developer created Biopetals, a modified version of the Petals library that enables distributed, BitTorrent-style inference for biology-tuned Llama models across a network of computers. The project w…

23:00

2026-05-26

dev.to

artificial-intelligence

Is Claude API Worth $3/1M Tokens Over Self-Hosted Llama?

A developer compared the costs of Claude Sonnet 4.6 API at $3.00 per million input tokens against a self-hosted Llama 3.2 90B instance on a $20/month DigitalOcean GPU Droplet. The analysis found that …

18:17

2026-05-25

quickfix.tools

ai-tools

Show HN: API cost calculator – compare 28 models across 7 providers, no signup

A developer launched an AI API pricing calculator that compares 28 models across 7 providers without requiring signup. The tool estimates monthly costs based on usage patterns, including batch process…

06:28

2026-05-25

signal-memo.com

artificial-intelligence

Memo: Brown Just Quietly Proved That Your Prompt Is Lying to You. Project Deal Already Proved It in a Different Domain. Nobody Has Connected Them.

A Brown University study of 110 therapy sessions found that improved prompting did not resolve core ethical violations in LLMs acting as mental health counselors, a result replicated two months later …

04:00

2026-05-25

arxiv.org

large-language-models

The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models

A new study of small language models reveals that chain-of-thought prompting for arithmetic relies on a positional shortcut: the model copies whichever number appears last before the answer delimiter,…

05:13

2026-05-24

dev.to

large-language-models

Diffusion Language Models Are Here: Deep Dive into NVIDIA's Nemotron-Labs DLM Architecture

NVIDIA has open-sourced the Nemotron-Labs Diffusion family of language models (3B, 8B, and 14B parameters), which replace traditional left-to-right autoregressive generation with a parallel denoising …

13:35

2026-05-23

dev.to

open-source

How I Built a Free, Self-Hosted Pipeline That Auto-Generates Faceless YouTube Shorts

The article describes FreeFaceless, an open-source, self-hosted pipeline that automatically generates faceless YouTube Shorts using free tools and local models, avoiding the typical $75-100/month subs…

04:36

2026-05-23

dev.to

artificial-intelligence

From the Renaissance to the Quantum Dawn: AI, Computation, and the Next Paradigm Shift

The article argues that the current AI revolution mirrors the Renaissance by democratizing creativity and cognition, but it faces a critical bottleneck: an exponential demand for computational power t…

13:47

2026-05-20

dev.to

artificial-intelligence

Gemini 3.5 Flash vs Claude Haiku vs GPT-4o mini: Picking a Small Model

The article compares three small, fast LLMs—Gemini 3.5 Flash, Claude Haiku, and GPT-4o mini—for routine tasks like classification and code routing, emphasizing that cheap, consistent models are prefer…

04:45

2026-05-20

dev.to

artificial-intelligence

I built persistent AI memory for Claude on Cloudflare's free tier

A developer built "second-brain-cloudflare," a self-hosted MCP server that provides persistent memory for AI assistants like Claude and ChatGPT across sessions, running entirely on Cloudflare's free tier. It uses vecto…

23:15

2026-05-19

dev.to

artificial-intelligence

I Want to Vibe Code! But Local AI Made Me Rethink the Infrastructure

Running local AI for software development is not a cost-free solution, as it simply shifts the expense from cloud subscriptions to expensive hardware upgrades, requiring at least 32GB to 64GB of RAM f…

18:52

2026-05-18

dev.to

artificial-intelligence

What a Fractional CTO Actually Does for AI Startups: Architecture and Timing

A fractional CTO for AI startups should focus on business constraints and preserving flexibility rather than designing perfect technical architectures. The key decisions involve managing inference cos…

15:30

2026-05-18

pytorch.org

machine-learning

Running PyTorch Models on Apple Silicon GPUs with the ExecuTorch MLX Delegate

The ExecuTorch MLX delegate now enables GPU-accelerated inference for PyTorch models on Apple Silicon Macs through Apple's MLX framework. The new backend achieves 3-6x higher throughput on generative …


// co-occurs with top 8 entities

Meta 25 · Claude 23 · Qwen 22 · OpenAI 15 · DeepSeek 14 · Gemini 14 · Anthropic 12 · Google 11