A
84.0Overall score
A high-throughput batch classifier that runs an open model under vLLM to label huge datasets without per-call API fees. For data teams that need sentiment, topic or moderation labels across millions of rows on their own GPUs.
84.0Score
720Votes
5Components
Install this build
terminal
pip install vllm && python classify.pyComponents
Model
- Qwen3 8B Instruct
- Gemma 3 12B for harder labels
Stack
- vLLM offline batching
- Polars
- Pydantic output parsing
Hardware
- 1x RTX 4090 24GB
- Scales linearly with more GPUs
How it works
- Load the dataset and build prompt templates per row
- vLLM processes thousands of prompts per batch
- Constrain output to a fixed JSON label schema
- Write labeled results back to parquet
Rules
- Use guided decoding so labels stay in the allowed set
- Checkpoint progress so a crash never reruns everything
Summary
A high-throughput batch classifier that runs an open model under vLLM to label huge datasets without per-call API fees. For data teams that need sentiment, topic or moderation labels across millions of rows on their own GPUs.
84.0 score 720 votes
0 Reviews
Your rating
Sign in to post
Loading discussion...