Aider
Classify millions of rows cheaply offline

vLLM Bulk Text Classifier

setuproll (@setuproll)
Overall score: 84.0

A high-throughput batch classifier that runs an open model under vLLM to label huge datasets without per-call API fees. For data teams that need sentiment, topic or moderation labels across millions of rows on their own GPUs.

Score: 84.0
Votes: 720
Components: 5

Install this build

terminal
pip install vllm polars && python classify.py

Components

Model

  • Qwen3 8B Instruct
  • Gemma 3 12B for harder labels

Stack

  • vLLM offline batching
  • Polars
  • Pydantic output parsing

Hardware

  • 1x RTX 4090 24GB
  • Throughput scales close to linearly with more GPUs, since rows can be sharded across independent workers
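
One way the multi-GPU scaling could be arranged is to give each GPU its own contiguous slice of rows and its own worker process. The sketch below is an assumption about the setup, not the build's actual code: `shard_ranges` and `worker_env` are hypothetical helpers, and each worker is assumed to be pinned to a single GPU through the `CUDA_VISIBLE_DEVICES` environment variable.

```python
import os


def shard_ranges(n_rows: int, n_gpus: int) -> list[tuple[int, int]]:
    """Split row indices [0, n_rows) into n_gpus contiguous, near-equal ranges."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    base, extra = divmod(n_rows, n_gpus)
    ranges = []
    start = 0
    for gpu in range(n_gpus):
        # The first `extra` shards take one additional row each.
        size = base + (1 if gpu < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def worker_env(gpu_index: int) -> dict[str, str]:
    """Copy of the current environment with the worker pinned to one GPU."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


print(shard_ranges(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

Each (start, end) pair would be handed to a separate classifier process launched with the matching environment, so the workers never share a GPU or a row.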

How it works

  • Load the dataset and build prompt templates per row
  • vLLM processes thousands of prompts per batch
  • Constrain output to a fixed JSON label schema
  • Write labeled results back to parquet
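
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the build's `classify.py`: the label set, the prompt wording and every function name are hypothetical. The `generate` argument stands in for the model call. With vLLM it would presumably wrap `LLM.generate`, with the schema passed through the structured-output option of whichever vLLM version is installed. Validation uses the standard library here, where the build lists Pydantic.

```python
import json
from typing import Callable, Iterable, Optional

LABELS = ["positive", "negative", "neutral"]  # hypothetical label set

# JSON Schema that a guided-decoding backend could be asked to enforce.
LABEL_SCHEMA = {
    "type": "object",
    "properties": {"label": {"type": "string", "enum": LABELS}},
    "required": ["label"],
    "additionalProperties": False,
}

PROMPT_TEMPLATE = (
    "Classify the sentiment of the text as one of: {labels}.\n"
    'Answer with JSON like {{"label": "..."}}.\n'
    "Text: {text}"
)


def build_prompts(texts: Iterable[str]) -> list[str]:
    """Render one prompt per input row."""
    return [PROMPT_TEMPLATE.format(labels=", ".join(LABELS), text=t) for t in texts]


def parse_label(raw: str) -> Optional[str]:
    """Return the label if the raw output matches the schema, else None."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or set(obj) != {"label"}:
        return None
    label = obj["label"]
    return label if label in LABELS else None


def classify(
    texts: list[str], generate: Callable[[list[str]], list[str]]
) -> list[Optional[str]]:
    """Build prompts, run the whole batch through `generate`, validate each output."""
    outputs = generate(build_prompts(texts))
    return [parse_label(o) for o in outputs]


def fake_generate(prompts: list[str]) -> list[str]:
    """Stand-in for the model so the sketch runs without a GPU."""
    return [
        '{"label": "positive"}' if "love" in p else '{"label": "neutral"}'
        for p in prompts
    ]


print(classify(["I love this", "It arrived on Tuesday"], fake_generate))
```

Rows whose output fails validation come back as `None`, so they can be counted or retried instead of being written with a label outside the allowed set.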

Rules

  • Use guided decoding so labels stay in the allowed set
  • Checkpoint progress so a crash never reruns everything
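
A minimal sketch of the checkpointing rule, assuming results are written one chunk per file and a chunk counts as done once its file exists. The function names and the JSON chunk files are hypothetical stand-ins. The build itself writes parquet, which would follow the same pattern with one parquet file per chunk.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_with_checkpoints(
    rows: list[str],
    label_chunk: Callable[[list[str]], list[str]],
    out_dir: str,
    chunk_size: int = 2,
) -> int:
    """Label rows chunk by chunk, skipping chunks already on disk.

    Returns the number of chunks processed in this run.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    processed = 0
    for start in range(0, len(rows), chunk_size):
        final = out / f"chunk_{start:09d}.json"
        if final.exists():
            continue  # finished in an earlier run
        labels = label_chunk(rows[start:start + chunk_size])
        # Write under a temporary name, then rename, so a crash
        # mid-write never leaves a half-written chunk file behind.
        tmp = final.with_suffix(".tmp")
        tmp.write_text(json.dumps(labels))
        tmp.replace(final)
        processed += 1
    return processed


def load_results(out_dir: str) -> list[str]:
    """Concatenate all finished chunks in row order."""
    results: list[str] = []
    for path in sorted(Path(out_dir).glob("chunk_*.json")):
        results.extend(json.loads(path.read_text()))
    return results


with tempfile.TemporaryDirectory() as demo_dir:
    run_with_checkpoints(["a", "b", "c"], lambda chunk: [t.upper() for t in chunk], demo_dir)
    print(load_results(demo_dir))  # ['A', 'B', 'C']
```

After a crash, calling `run_with_checkpoints` again with the same output directory redoes only the chunks that never finished.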

