Chunk-Split
Split large documents into RAG-ready chunks.
Build better retrieval systems with properly sized chunks.
WHAT IT DOES:
chunk-split document.md --max-tokens 500 --out chunks/
Takes a large markdown file and splits it into smaller pieces:
→ Respects heading boundaries
→ Doesn't cut mid-sentence
→ Maintains context across chunks (optional token overlap)
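The behaviour listed above can be approximated in a few lines of Python. This is a minimal sketch, not the tool's actual implementation: the function names (`split_sections`, `split_section`, `chunk_markdown`) are hypothetical, token counting is stood in for by a whitespace word count, and sentence ends are detected with a simple punctuation regex. It also treats any line starting with `#` as a heading, so `#` comments inside code blocks would be misread.

```python
import re
from typing import Callable, List


def count_words(text: str) -> int:
    """Stand-in token counter (assumption): whitespace-separated words."""
    return len(text.split())


def split_sections(markdown: str) -> List[str]:
    """Split markdown into sections, each beginning at an ATX heading line."""
    sections: List[str] = []
    current: List[str] = []
    for line in markdown.splitlines():
        # A new heading closes the section collected so far.
        if re.match(r"#{1,6}\s", line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [s for s in sections if s]


def split_section(section: str, max_tokens: int,
                  count: Callable[[str], int] = count_words) -> List[str]:
    """Pack whole sentences into chunks; never cut inside a sentence."""
    # The capturing group keeps the original whitespace between sentences.
    parts = re.split(r"(?<=[.!?])(\s+)", section)
    sentences, separators = parts[0::2], parts[1::2]
    chunks: List[str] = []
    current = ""
    for i, sentence in enumerate(sentences):
        sep = separators[i - 1] if i > 0 else ""
        candidate = current + sep + sentence if current else sentence
        if current and count(candidate) > max_tokens:
            chunks.append(current)
            current = sentence
        else:
            # A single sentence longer than max_tokens is kept whole.
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def chunk_markdown(markdown: str, max_tokens: int,
                   count: Callable[[str], int] = count_words) -> List[str]:
    """Split at headings first, then by sentence within each section."""
    chunks: List[str] = []
    for section in split_sections(markdown):
        chunks.extend(split_section(section, max_tokens, count))
    return chunks
```

Splitting at headings first means a chunk never straddles two sections, at the cost of some chunks being well under the limit.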
OUTPUT:
chunks/
chunk_001.md (487 tokens)
chunk_002.md (499 tokens)
chunk_003.md (498 tokens)
...
PERFECT FOR:
→ RAG pipelines
→ Vector database ingestion
→ Embedding preparation
→ Context window management
OPTIONS:
--max-tokens 500 (maximum tokens per chunk)
--overlap 50 (tokens shared between consecutive chunks)
--by-heading (split at headings only)
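To illustrate how a size limit and an overlap interact, here is a minimal sliding-window sketch. It is an assumption about how such options typically work, not a description of the tool's internals: `windows_with_overlap` is a hypothetical helper, and it works on an already tokenized sequence without regard to sentence or heading boundaries.

```python
from typing import List, Sequence


def windows_with_overlap(tokens: Sequence, max_tokens: int,
                         overlap: int) -> List[list]:
    """Cut tokens into windows of at most max_tokens, where each window
    repeats the last `overlap` tokens of the previous one."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in the range [0, max_tokens)")
    step = max_tokens - overlap  # how far each new window advances
    windows: List[list] = []
    for start in range(0, len(tokens), step):
        windows.append(list(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the input
    return windows
```

Under these assumptions, a limit of 500 with an overlap of 50 starts each new window 450 tokens after the previous one.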
✅ Token-aware splitting (uses tiktoken)
✅ Preserves markdown formatting
✅ No cloud: everything is processed locally
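For readers who want to check chunk sizes themselves, token counting with tiktoken can be sketched as below. The helper names `make_token_counter` and `fits` are hypothetical, and the `cl100k_base` encoding is an assumption, since the encoding the tool uses is not stated here. Passing `None` selects a dependency-free word count as a rough approximation.

```python
from typing import Callable, Optional


def make_token_counter(
    encoding_name: Optional[str] = "cl100k_base",  # assumed encoding
) -> Callable[[str], int]:
    """Return a function that counts tokens in a string.

    With an encoding name, tiktoken is used (pip install tiktoken).
    With None, whitespace-separated words are counted instead.
    """
    if encoding_name is None:
        return lambda text: len(text.split())

    import tiktoken  # imported lazily so the fallback needs no dependency

    encoding = tiktoken.get_encoding(encoding_name)
    # disallowed_special=() treats text such as "<|endoftext|>" as plain
    # text instead of raising an error.
    return lambda text: len(encoding.encode(text, disallowed_special=()))


def fits(text: str, max_tokens: int, count: Callable[[str], int]) -> bool:
    """True if text is within the token budget under the given counter."""
    return count(text) <= max_tokens
```

Word counts and tokenizer counts differ, so the fallback is only suitable for quick estimates.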
Chunk smarter, retrieve better.
Includes a CLI tool, a README, and example chunks for splitting large markdown files into clean, token-aware, RAG-ready pieces.