Skip to content

classify

Classify thousands of CSV rows with Claude's Batch API

Pay 50% less, get results in ~1 hour, no rate limits

Get Started GitHub

Overview

Stop writing loops to classify data. classify turns CSV classification into a single command, handles batching automatically, and gives you prompt caching for free.

graph LR
    A["Your CSV<br/>(10,000 rows)"] --> B[classify]
    B --> C["Claude's Batch API<br/>- 50% discount<br/>- Prompt caching<br/>- No rate limits"]
    C --> D["Classified CSV"]

Features

  • Automatic Batching


    Point at your CSV, get classified data back. No manual batch management needed.

  • Structured Outputs


    Define your schema, get valid JSON every time with Pydantic validation.

  • Prompt Caching


    System prompt cached across all rows for 90% cost reduction on cache hits.

  • 50% Batch Discount


    Automatically applied to all tokens. Pay half the price of regular API calls.

  • Cost Estimation


    See exact costs before submitting your batch job.

  • Reasoning Support


    Get explanations for each classification to improve accuracy and debugging.

Installation

Requires Python 3.12+

# Install as an isolated tool (recommended)
uv tool install git+https://github.com/alfranz/classify.git

# Or run without installing
uvx --from git+https://github.com/alfranz/classify.git classify --help

Set your API key:

export ANTHROPIC_API_KEY=your_api_key_here

Quick Start

Try the included example:

# Check the example and see cost estimate
classify check examples/example_config.yaml

# Submit the batch (costs ~$0.02)
classify run examples/example_config.yaml

# Check status (processing takes ~30-60 minutes)
classify status <batch_id>

# Download and merge results when done
classify pull <batch_id>

Why Batch API?

You have a CSV with 10,000 rows. Each needs classification. You could:

Approach Cost Time Rate Limits
Loop through rows Full price ~3 hours Hit limits
Batch API 50% less ~1 hour None

How It Works

Your CSV (10,000 rows)
    [classify]
    Claude's Batch API
    - 50% discount on all tokens
    - Prompt caching (90% cheaper cache hits)
    - No rate limits
    - ~1 hour processing
    Classified CSV

Cost example (10,000 rows): - First request: Write cache (~\(0.20) - Other 9,999 requests: Read cache (~\)0.02) + input tokens (~\(5) + output tokens (~\)3) - Total: ~\(8.22 instead of ~\)80+ without batching/caching

Next Steps


Documentation automatically deployed via GitHub Actions