Configuration¶

Learn how to write config files for classify.

Config Structure¶

settings:
  model: claude-sonnet-4-5-20250929
  reasoning: true
  batch_size: 10000

input:
  file: data.csv
  columns: [title, description]
  id_column: optional_id_column

prompt:
  system: "You are an expert at categorizing content."
  template: |
    Categorize this:

    Title: {title}
    Description: {description}

output:
  fields:
    - name: category
      type: string
      description: "Content category"
      enum: ["tech", "sports", "other"]

Settings¶

`model`¶

Required

The Claude model to use. Current options:

claude-sonnet-4-5-20250929 - Best balance of speed and quality
claude-haiku-4-5-20251001 - Fastest, cheapest

`reasoning`¶

Optional - Default: false

When enabled, adds a {field}_reasoning column to the output for each output field, explaining that classification.

settings:
  reasoning: true  # Adds category_reasoning, confidence_reasoning, etc.
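As a rough illustration of the naming rule, the sketch below derives the output column list from the configured field names. The helper `output_columns` is hypothetical and not part of classify, and the column order it produces is an assumption.

```python
def output_columns(field_names, reasoning=False):
    """Return output column names (hypothetical helper, not classify's API)."""
    columns = []
    for name in field_names:
        columns.append(name)
        if reasoning:
            # Each field gets a companion {field}_reasoning column.
            columns.append(f"{name}_reasoning")
    return columns


print(output_columns(["category", "confidence"], reasoning=True))
# ['category', 'category_reasoning', 'confidence', 'confidence_reasoning']
```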

`batch_size`¶

Optional - Default: 10000

Maximum number of requests per batch. The Claude Message Batches API accepts at most 100,000 requests per batch, so keep this value at or below 100,000.

settings:
  batch_size: 10000  # Good for most use cases
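To make the effect of `batch_size` concrete, here is a minimal sketch of splitting rows into chunks of at most `batch_size` requests. The function `split_into_batches` is a hypothetical illustration; how classify actually groups requests is not documented here.

```python
def split_into_batches(rows, batch_size=10000):
    """Yield consecutive chunks of at most batch_size rows (illustrative only)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


rows = list(range(25000))  # stand-in for 25,000 CSV rows
batches = list(split_into_batches(rows, batch_size=10000))
print([len(b) for b in batches])  # [10000, 10000, 5000]
```

Under this assumption, a 25,000-row file with the default setting would be submitted as three batches.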

Input¶

`file`¶

Required

Path to your CSV file.

`columns`¶

Required

List of column names to include in the prompt. Only these columns are sent to the API.

input:
  file: data.csv
  columns: [title, description, author]  # Only these 3 columns
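The column filter can be pictured with the following sketch, which keeps only the listed columns from each CSV row. The helper `select_columns` and the sample data (including the `internal_notes` column) are made up for illustration and are not part of classify.

```python
import csv
import io


def select_columns(csv_text, columns):
    """Return one dict per row containing only the requested columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{col: row[col] for col in columns} for row in reader]


# Hypothetical CSV with one column that is not listed in the config.
sample = "title,description,author,internal_notes\nHello,World,Ann,secret\n"
print(select_columns(sample, ["title", "description", "author"]))
# [{'title': 'Hello', 'description': 'World', 'author': 'Ann'}]
```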

`id_column`¶

Optional

Use an existing column as the unique identifier instead of generating one.

input:
  file: data.csv
  columns: [title, description]
  id_column: user_id  # Must be unique
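Because the ID column must be unique, it can help to check for duplicates before running. The sketch below is one way to do that in plain Python; `duplicate_ids` is a hypothetical helper, not a classify function.

```python
from collections import Counter


def duplicate_ids(rows, id_column):
    """Return the sorted ID values that appear more than once."""
    counts = Counter(row[id_column] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


rows = [{"user_id": "u1"}, {"user_id": "u2"}, {"user_id": "u1"}]
print(duplicate_ids(rows, "user_id"))  # ['u1']
```

An empty result means every row has a distinct identifier.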

Prompt¶

`system`¶

Required

System prompt describing the task. Be specific about what you want classified.

prompt:
  system: "You are an expert at categorizing content. Classify each item into one of the allowed categories."

`template`¶

Required

Template for formatting each row. Use {column_name} placeholders.

prompt:
  template: |
    Categorize this content:

    Title: {title}
    Description: {description}
    Author: {author}
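The placeholder syntax resembles Python's `str.format`, so the substitution can be sketched as below. This assumes Python-style formatting; whether classify uses `str.format` internally is not stated here, and `render_prompt` is a hypothetical helper.

```python
template = (
    "Categorize this content:\n"
    "\n"
    "Title: {title}\n"
    "Description: {description}\n"
    "Author: {author}\n"
)


def render_prompt(template, row):
    """Fill {column_name} placeholders from a row dict.

    Raises KeyError if the template names a column the row lacks.
    """
    return template.format(**row)


row = {"title": "Hello", "description": "World", "author": "Ann"}
print(render_prompt(template, row))
```

With `str.format`, a placeholder without a matching column raises `KeyError`, which is why every placeholder should also appear in `columns`.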

Output¶

`fields`¶

Required - List of output fields to generate.

Each field supports these properties:

Property	Required	Description
`name`	Yes	Field name (lowercase, no spaces)
`type`	Yes	`string`, `integer`, `number`, `boolean`
`description`	Yes	What the field represents
`enum`	No	Allowed values (for strings only)

output:
  fields:
    - name: sentiment
      type: string
      description: "Overall sentiment of the content"
      enum: ["positive", "negative", "neutral"]

    - name: score
      type: integer
      description: "Confidence score from 1-10"

    - name: urgency
      type: number
      description: "Urgency level from 0.0 to 1.0"

    - name: is_flagged
      type: boolean
      description: "Whether content should be flagged for review"
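The four field types match JSON Schema type names, so one plausible reading of a `fields` list is as a JSON Schema object. The sketch below shows that mapping under that assumption; `fields_to_json_schema` is hypothetical, and classify may represent the schema differently.

```python
def fields_to_json_schema(fields):
    """Build a JSON Schema object from a list of field dicts (illustrative)."""
    allowed_types = {"string", "integer", "number", "boolean"}
    properties = {}
    for field in fields:
        if field["type"] not in allowed_types:
            raise ValueError(f"unsupported type: {field['type']}")
        prop = {"type": field["type"], "description": field["description"]}
        if "enum" in field:
            # The table above restricts enum to string fields.
            if field["type"] != "string":
                raise ValueError(f"enum requires a string field: {field['name']}")
            prop["enum"] = list(field["enum"])
        properties[field["name"]] = prop
    return {
        "type": "object",
        "properties": properties,
        "required": [field["name"] for field in fields],
    }


schema = fields_to_json_schema([
    {"name": "sentiment", "type": "string",
     "description": "Overall sentiment of the content",
     "enum": ["positive", "negative", "neutral"]},
    {"name": "score", "type": "integer",
     "description": "Confidence score from 1-10"},
])
print(schema["required"])  # ['sentiment', 'score']
```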

Complete Example¶

settings:
  model: claude-sonnet-4-5-20250929
  reasoning: true
  batch_size: 10000

input:
  file: reviews.csv
  columns: [product_name, review_text, rating]

prompt:
  system: "You are a sentiment analysis expert. Analyze product reviews and classify sentiment, extract key themes, and assess review quality."

  template: |
    Analyze this product review:

    Product: {product_name}
    Rating: {rating}/5
    Review: {review_text}

    Provide sentiment analysis and key themes.

output:
  fields:
    - name: sentiment
      type: string
      description: "Overall sentiment: positive, negative, or neutral"
      enum: ["positive", "negative", "neutral"]

    - name: themes
      type: string
      description: "Comma-separated list of key themes mentioned"

    - name: quality_score
      type: integer
      description: "Review quality score from 1-5 based on detail and usefulness"

Validation¶

Always validate your config before running:

classify check my_config.yaml

This will:

- Check that the CSV file exists and is readable
- Verify that all referenced columns exist
- Validate the output schema
- Calculate estimated costs
- Show sample rows that will be processed
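For intuition about the column check, the sketch below finds names referenced by `columns` or by template placeholders that are missing from the CSV header. It is an independent illustration, not the implementation behind `classify check`; `missing_columns` is a hypothetical helper.

```python
import csv
import io
from string import Formatter


def missing_columns(csv_text, columns, template):
    """Return referenced column names that are absent from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    # Formatter().parse yields (literal, field_name, format_spec, conversion).
    placeholders = [name for _, name, _, _ in Formatter().parse(template) if name]
    # Deduplicate while keeping first-seen order.
    referenced = list(dict.fromkeys(list(columns) + placeholders))
    return [name for name in referenced if name not in header]


csv_text = "title,description\nHello,World\n"
print(missing_columns(csv_text, ["title", "author"], "Title: {title} Rating: {rating}"))
# ['author', 'rating']
```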
