Serving any LLM using a single command line with Flama

Flama 2.0 introduces first-class support for generative AI, enabling users to download, package, and serve large language models (LLMs) via a single command line. The framework allows fetching models from HuggingFace, interacting with them locally, and serving them over HTTP with a production-ready API and built-in chat interface. Flama's CLI commands like `flama get`, `flama model run`, and `flama serve` streamline the entire workflow without requiring boilerplate code or custom infrastructure.

Flama 2.0 (https://dev.to/vortico/--2pll) brings first-class support for generative AI: downloading, packaging, and serving large language models (LLMs) is now as simple as running a few commands in your terminal. No boilerplate code, no custom serving infrastructure, no configuration files. Just the CLI and a model.

In this post, we walk through the entire workflow: fetching a model from HuggingFace, interacting with it locally in your terminal, and serving it over HTTP with a production-ready API and a built-in chat interface. We will also show how a locally served model can power agentic workflows, using Claude CLI as a practical example.

Before we dive into the details, we recommend having the following resources at hand:

- `flama get`
- `flama model run`
- `flama model stream`
- `flama serve`

The `flama get` command

The first step in serving an LLM with Flama is downloading and packaging a model into a `.flm` artifact (a Flama Lightweight Model file). The `flama get` command handles this in a single step: it downloads the model weights and configuration from a supported source and serialises them into the portable `.flm` format.

All examples in this post assume Flama has been installed with the LLM extras via uv (https://docs.astral.sh/uv/):

    uv pip install "flama[llm,pydantic]"

Alternatively, you can run any command without a prior install by using `uvx --from "flama[llm,pydantic]" flama ...`
, but for brevity we assume Flama is already installed throughout.

Let us fetch a quantised version of Google's Gemma 4 model, optimised for Apple Silicon via the MLX Community:

    flama get --family llm --source huggingface mlx-community/gemma-4-E2B-it-qat-4bit
    Downloading ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 2.3 GB 28.7 MB/s 0:00:00
    Packaging...
    Model saved to mlx-community gemma-4-E2B-it-qat-4bit.flm

Two options are required: `--source` tells Flama where to download from (currently HuggingFace), and `--family` declares whether the artifact is a traditional machine-learning model (`ml`) or a generative model (`llm`). For large language models, you always pass `--family llm`. The output path defaults to