# Extract Plain Text from Medium Posts for RAG and Search Indexes

> Source: <https://dev.to/zenndraapi/extract-plain-text-from-medium-posts-for-rag-and-search-indexes-14mm>
> Published: 2026-05-30 09:15:17+00:00

Chunk clean article content for embeddings, summarization, and full-text search—skip nav, clap bars, and scripts.

**HTML embeds** are for humans; **plain text** is for chunking, embeddings, and summarization. One call should return body text without nav, clap bars, or script tags.

Tool outcome: `ingest-medium-article.ts` → chunked documents in your vector DB.

`GET /article/{id}/content` → plain text.

`GET /article/{id}` for title, tags, author metadata.

``` js
const API = 'https://api.zenndra.com';
const headers = { Authorization: `Bearer ${process.env.ZENNDRA_API_KEY}` };

export async function fetchArticleText(articleId) {
  const [contentRes, metaRes] = await Promise.all([
    fetch(`${API}/article/${articleId}/content`, { headers }),
    fetch(`${API}/article/${articleId}`, { headers }),
  ]);

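  // Fail loudly on auth or not-found errors instead of indexing an error body.
  for (const res of [contentRes, metaRes]) {
    if (!res.ok) throw new Error(`Request failed: ${res.status} ${res.url}`);
  }

  const { content } = await contentRes.json();
  const meta = await metaRes.json();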

  return {
    id: articleId,
    title: meta.title,
    tags: meta.tags,
    text: content,
  };
}

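export function chunkText(text, { size = 800, overlap = 100 } = {}) {
  // size and overlap are counted in whitespace-separated words, not tokens.
  if (size <= 0 || overlap < 0 || overlap >= size) {
    // A step of zero or less would loop forever.
    throw new RangeError('chunkText needs 0 <= overlap < size');
  }
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += size - overlap) {
    chunks.push(words.slice(i, i + size).join(' '));
    // Stop once a chunk reaches the last word, so the tail is not
    // emitted again as a short chunk made only of overlap.
    if (i + size >= words.length) break;
  }
  return chunks;
}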
```
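To see how `size` and `overlap` interact, it can help to look at the word ranges a sliding window covers. The sketch below uses a hypothetical helper, `chunkSpans`, which is not part of the original tool. It assumes the window advances by `size - overlap` words and stops once it reaches the last word.

```js
// Hypothetical helper: list the [start, end) word ranges of each chunk.
function chunkSpans(wordCount, { size = 800, overlap = 100 } = {}) {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError('need 0 <= overlap < size');
  }
  const step = size - overlap; // how far the window advances each time
  const spans = [];
  for (let start = 0; start < wordCount; start += step) {
    const end = Math.min(start + size, wordCount);
    spans.push([start, end]);
    if (end === wordCount) break; // the window reached the last word
  }
  return spans;
}

// A 2000-word article with the defaults yields three windows:
// [0, 800), [700, 1500), [1400, 2000)
console.log(chunkSpans(2000));
```

Each window shares its first 100 words with the end of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.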

Wire `chunkText` to [OpenAI embeddings](https://platform.openai.com/docs/guides/embeddings), [Ollama](https://ollama.com/), or your host’s model. Swap the vector client, keep the ingest shape.
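One way to keep the ingest shape stable while swapping clients is to inject the embedding and storage calls. This is a minimal sketch under assumptions: `ingestChunks`, `embed`, `upsert`, and the record fields are hypothetical names, not part of any specific SDK. `embed` is assumed to take an array of strings and resolve to one vector per string, in order.

```js
// Hypothetical ingest step. `embed` and `upsert` are injected so the
// embedding model and vector DB can change without touching the record shape.
async function ingestChunks(article, chunks, { embed, upsert, batchSize = 16 }) {
  let written = 0;
  for (let start = 0; start < chunks.length; start += batchSize) {
    const batch = chunks.slice(start, start + batchSize);
    // Assumption: embed(string[]) resolves to number[][] in the same order.
    const vectors = await embed(batch);
    if (vectors.length !== batch.length) {
      throw new Error(`expected ${batch.length} vectors, got ${vectors.length}`);
    }
    const records = batch.map((text, offset) => ({
      id: `${article.id}:${start + offset}`,
      vector: vectors[offset],
      text,
      metadata: {
        article_id: article.id,
        chunk_index: start + offset,
        title: article.title,
        tags: article.tags,
      },
    }));
    await upsert(records);
    written += records.length;
  }
  return written;
}

// Dry run with stand-ins for a real embedding model and vector DB.
const store = [];
ingestChunks(
  { id: 'abc123', title: 'Example', tags: ['rag'] },
  ['first chunk', 'second chunk', 'third chunk'],
  {
    embed: async (texts) => texts.map((t) => [t.length]), // fake 1-d vectors
    upsert: async (records) => { store.push(...records); },
    batchSize: 2,
  },
).then((count) => console.log(count, store.map((r) => r.id)));
```

To move from the dry run to a real pipeline, replace the two stand-in functions with thin wrappers around your provider's client. The loop and the record shape stay the same.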

Store `article_id` and `chunk_index` in metadata for citations.

For human-readable syndication, see [embed articles](./embed-medium-articles-on-website.md), which has a different threat model than LLM training.
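With `article_id` and `chunk_index` stored per chunk, search hits can be turned into citations at answer time. The sketch below is illustrative: `formatCitations` is a hypothetical helper and the hit shape is an assumption. Real vector clients return results in their own formats.

```js
// Assumed hit shape: { text, metadata: { article_id, chunk_index, title } }.
// Groups hits by article so each source is cited once, with its chunk indexes.
function formatCitations(hits) {
  const byArticle = new Map(); // Map keeps first-seen order of articles
  for (const { metadata } of hits) {
    const entry = byArticle.get(metadata.article_id)
      ?? { title: metadata.title, chunks: new Set() };
    entry.chunks.add(metadata.chunk_index);
    byArticle.set(metadata.article_id, entry);
  }
  return [...byArticle].map(([articleId, { title, chunks }]) => {
    const indexes = [...chunks].sort((a, b) => a - b).join(', ');
    return `${title} (article ${articleId}, chunks ${indexes})`;
  });
}

const hits = [
  { text: '...', metadata: { article_id: 'a1', chunk_index: 3, title: 'First post' } },
  { text: '...', metadata: { article_id: 'a1', chunk_index: 1, title: 'First post' } },
  { text: '...', metadata: { article_id: 'b2', chunk_index: 0, title: 'Second post' } },
];
console.log(formatCitations(hits));
// → [ 'First post (article a1, chunks 1, 3)', 'Second post (article b2, chunks 0)' ]
```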

Keywords: `medium plain text api`, `medium rag pipeline`, `medium embeddings`, `medium article content extraction`, `llm medium`.