Raw OpenAPI specification (YAML - inlined for the scrapers that need it)
openapi: 3.1.0
info:
  title: Riveter API
  description: |
    ## Overview

    The Riveter API lets you **build datasets** and **run enrichments** programmatically.

    - **A dataset** is a collection of rows — companies, people, URLs, or anything else you want to work with. You can build one from a natural-language prompt or a structured spec, and Riveter will generate the rows for you.
    - **An enrichment** takes rows of input data and fills in new columns using AI, web scraping, and other tools. For example, given a list of companies, an enrichment can look up each company's revenue, employee count, and CEO.

    ## Four ways to use the API

    ### 1. Build a new dataset and enrich it (easiest)

    Describe a dataset and Riveter will generate and enrich it in one step. Provide a prompt (e.g. "top 50 SaaS companies"), the attributes you want to find (e.g. "CEO", "revenue"), and set `auto_run_enrichment: true`.

    This is the fastest way to go from idea to enriched data — no setup required.

    **Key endpoints:**
    - [/build_dataset](#tag/dataset-builder/post/build_dataset) — generate rows and auto-run enrichment with `auto_run_enrichment: true`
    - [/run_status](#tag/runs/get/run_status) — check progress (or use `webhook_url`)
    - [/run_data](#tag/runs/get/run_data) — get the enriched results
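
    A minimal Python sketch of this flow, offered as an illustration rather than a reference client. It only prepares the `/build_dataset` request and does not send it. `auto_run_enrichment` comes from the text above; the body field names `prompt` and `attributes` and the helper name `build_dataset_request` are assumptions, so check them against the `/build_dataset` schema.

    ```python
    import json
    import urllib.request

    API_BASE = "https://api.riveterhq.com/v1"


    def build_dataset_request(api_key, prompt, attributes):
        """Prepare (but do not send) a POST /build_dataset request."""
        body = {
            "prompt": prompt,             # assumed field name
            "attributes": attributes,     # assumed field name
            "auto_run_enrichment": True,  # documented in the section above
        }
        return urllib.request.Request(
            f"{API_BASE}/build_dataset",
            data=json.dumps(body).encode("utf-8"),
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
            method="POST",
        )


    req = build_dataset_request("YOUR_API_KEY", "top 50 SaaS companies", ["CEO", "revenue"])
    print(req.get_method(), req.full_url)
    ```

    Passing the returned object to `urllib.request.urlopen` would send the request.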

    ### 2. Run a new enrichment with your own data

    Already have input data? Define everything in a single API request — no UI setup required.

    **Option A: Prompt + attributes (recommended)** — provide your input data, a natural-language prompt, and output column names. The AI generates the full configuration automatically.

    **Option B: Full specification** — define exact prompts, tools, and formatting for each output column when you need precise control.

    **Key endpoints:**
    - [/run_new_enrichment](#tag/enrichments/post/run_new_enrichment) — execute with prompt + attributes or full configuration
    - [/run_status](#tag/runs/get/run_status) — check progress (or use `webhook_url`)
    - [/run_data](#tag/runs/get/run_data) — get the enriched results

    ### 3. Run an existing enrichment

    Build and test your enrichment in the [Riveter UI](https://app.riveterhq.com/enrichments), then run it via the API by passing new input data. The enrichment already stores your prompts, tool settings, and output format — you just supply new rows.

    This is ideal when you've fine-tuned an enrichment in the UI and want to deploy it to production.

    **Key endpoints:**
    - [/run_enrichment](#tag/enrichments/post/run_enrichment) — execute your enrichment with input data
    - [/run_status](#tag/runs/get/run_status) — check progress (or use `webhook_url`)
    - [/run_data](#tag/runs/get/run_data) — get the enriched results

    ### 4. Build a dataset for an existing enrichment

    Have an enrichment but need new input data? Use `/build_dataset_from_enrichment` to generate rows that match your enrichment's expected input columns. The dataset builder derives identifiers from your enrichment's source-data columns automatically.

    Optionally set `auto_run_enrichment: true` to run the enrichment automatically when the dataset completes.

    **Key endpoints:**
    - [/build_dataset_from_enrichment](#tag/dataset-builder/post/build_dataset_from_enrichment) — generate rows matching your enrichment's structure
    - [/dataset_build_status](#tag/dataset-builder/get/dataset_build_status) — check progress (or use `dataset_webhook_url`)
    - [/run_enrichment_from_dataset](#tag/dataset-builder/post/run_enrichment_from_dataset) — run the enrichment on the generated data

    ## Webhooks

    Instead of polling, pass a `webhook_url` in the JSON body when starting a run and we'll POST the results to your URL when it finishes.

    1. Include `webhook_url` in the JSON body of your `/run_new_enrichment` or `/run_enrichment` request
    2. Your run executes normally
    3. When complete, we POST the full results (same format as `/run_data`) to your webhook URL

    ```bash
    curl -X POST "https://api.riveterhq.com/v1/run_enrichment?enrichment_uuid=xxx" \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "webhook_url": "https://your-server.com/webhook",
        "input": {"Company": ["Apple", "Google"]}
      }'
    ```

    **Webhook payload:**
    ```json
    {
      "event": "run.completed",
      "run_key": "abc-123",
      "status": "success",
      "enrichment_uuid": "...",
      "enrichment_name": "My Enrichment",
      "credits_used": 2.0,
      "completed_at": "2024-01-15T12:00:00Z",
      "formatted_data": {
        "Company": [{"value": "Apple"}, {"value": "Google"}],
        "Revenue": [{"value": "383000000000"}, {"value": "307000000000"}]
      }
    }
    ```

    **Events:** `run.completed` (success), `run.stopped` (manually stopped)

    **Retries:** Failed deliveries are retried up to 2 times. Your endpoint should return a 2xx status code.
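
    The payload's `formatted_data` is column-oriented, so a receiver will often want to turn it into rows. The Python sketch below shows one way to do that. `webhook_rows` and `handle_webhook` are hypothetical helper names, and returning no rows for anything other than `run.completed` is a choice made for this sketch, not documented behavior.

    ```python
    def webhook_rows(payload):
        """Turn column-oriented formatted_data into a list of row dicts."""
        columns = payload.get("formatted_data", {})
        n_rows = max((len(cells) for cells in columns.values()), default=0)
        rows = []
        for i in range(n_rows):
            rows.append({
                header: (cells[i].get("value") if i < len(cells) else None)
                for header, cells in columns.items()
            })
        return rows


    def handle_webhook(payload):
        """Return rows for completed runs; return an empty list otherwise."""
        if payload.get("event") != "run.completed":
            return []  # sketch choice: ignore run.stopped and unknown events
        return webhook_rows(payload)


    sample = {
        "event": "run.completed",
        "run_key": "abc-123",
        "status": "success",
        "formatted_data": {
            "Company": [{"value": "Apple"}, {"value": "Google"}],
            "Revenue": [{"value": "383000000000"}, {"value": "307000000000"}],
        },
    }
    rows = handle_webhook(sample)
    ```

    In a real receiver, call `handle_webhook` on the parsed JSON body and respond with a 2xx status code so the delivery is not retried.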

    Dataset builds also support webhooks — pass `dataset_webhook_url` to `/build_dataset`.

    ## Authentication
    All endpoints require an API key via the Authorization header:
    ```
    Authorization: Bearer YOUR_API_KEY
    ```
    [Get an API key here](https://app.riveterhq.com/settings/api)

    ## Rate limiting
    Default: 30 requests per minute. You can send up to 1,000 rows per request.
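
    To stay within these limits, a client can split large inputs into chunks and space out its requests. The following Python sketch assumes the column-oriented `input` shape used in the examples in this document. `split_input` is a hypothetical helper, and the pacing constant is simply 60 seconds divided by the default limit.

    ```python
    def split_input(input_data, max_rows=1000):
        """Split column-oriented input into chunks of at most max_rows rows."""
        lengths = {len(values) for values in input_data.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same number of rows")
        total = lengths.pop() if lengths else 0
        return [
            {header: values[start:start + max_rows] for header, values in input_data.items()}
            for start in range(0, total, max_rows)
        ]


    # 30 requests per minute works out to one request every 2 seconds.
    MIN_SECONDS_BETWEEN_REQUESTS = 60 / 30

    chunks = split_input({"Company": [f"Company {i}" for i in range(2500)]})
    ```

    Each chunk can then be sent as the `input` of its own request, waiting at least `MIN_SECONDS_BETWEEN_REQUESTS` between calls.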

    ## Response format
    All responses include a `request_status` field (`success` or `error`).

    ## MCP server

    Use Riveter from Claude, Cursor, or any MCP-compatible AI assistant. [Get an API key](https://app.riveterhq.com/settings/api), then:

    **Claude Code:**
    ```bash
    claude mcp add riveter --env RIVETER_API_KEY=YOUR_API_KEY -- npx -y riveter-mcp-server
    ```

    **Codex:**
    ```bash
    codex mcp add riveter --env RIVETER_API_KEY=YOUR_API_KEY -- npx -y riveter-mcp-server
    ```

    **Cursor / Windsurf / Claude Desktop** — paste into your MCP config:
    ```json
    {
      "mcpServers": {
        "riveter": {
          "command": "npx",
          "args": ["-y", "riveter-mcp-server"],
          "env": {
            "RIVETER_API_KEY": "YOUR_API_KEY"
          }
        }
      }
    }
    ```

    The MCP server dynamically exposes all API endpoints as tools, with full descriptions and typed parameters. No setup beyond the API key.
  version: 1.0.4
  contact:
    name: Riveter Support
    url: https://riveterhq.com
    email: [email protected]
servers:
  - url: https://api.riveterhq.com/v1
    description: Production server

security:
  - ApiKeyAuth: []

paths:
  /run_enrichment:
    post:
      summary: run_enrichment
      description: |
        Run an existing enrichment with input data. The enrichment must be API-enabled (you can turn this on from your enrichment view).

        **Recommended:** Pass a `webhook_url` in the JSON body to receive results when the run completes — this is more efficient than polling. If you must poll, use `/run_status` with the returned `run_key` (suggested interval: 5–10 seconds). Grab data with `/run_data`.

        ## Quick Example

        ```bash
        curl -X POST "https://api.riveterhq.com/v1/run_enrichment?enrichment_uuid=YOUR_ENRICHMENT_UUID" \
          -H "Authorization: Bearer YOUR_API_KEY" \
          -H "Content-Type: application/json" \
          -d '{"input": {"Company Name": ["Acme Corp", "Tech Solutions Inc"]}}'
        ```

        The enrichment UUID comes from the enrichment's URL (e.g. app.riveterhq.com/enrichments/YOUR_ENRICHMENT_UUID).

        ## After running
        Use the [/run_data](#tag/runs/get/run_data) endpoint to get results as they become available, or pass `webhook_url` in the body to receive them automatically.

        ## Input Format
        Input data should be a JSON object where:
        - Keys are column headers from your enrichment's sheet
        - Values are arrays of strings (all arrays must be the same length)
        - Only source data columns are required (columns marked as "source data" in your enrichment)
        - At most 1,000 rows are sent per request

        Optionally pass a `run_key` query parameter; if you omit it, one is generated and returned. Use the run key later to poll for the [status](#tag/runs/get/run_status) of your enrichment run or to fetch the final [data](#tag/runs/get/run_data).
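
    The input rules above can be checked on the client before a request is sent. This Python sketch is one possible pre-flight check, not part of the API. `validate_input` and `run_enrichment_url` are hypothetical helpers, and rejecting an empty input object is an assumption of the sketch.

    ```python
    from urllib.parse import urlencode

    MAX_ROWS = 1000


    def validate_input(input_data):
        """Return the row count, or raise ValueError if the shape is wrong."""
        if not input_data:
            raise ValueError("input must contain at least one column")  # assumption
        lengths = set()
        for header, values in input_data.items():
            if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
                raise ValueError(f"column {header!r} must be a list of strings")
            lengths.add(len(values))
        if len(lengths) != 1:
            raise ValueError("all columns must have the same length")
        n_rows = lengths.pop()
        if n_rows > MAX_ROWS:
            raise ValueError(f"{n_rows} rows exceeds the {MAX_ROWS}-row limit")
        return n_rows


    def run_enrichment_url(enrichment_uuid, run_key=None):
        """Build the endpoint URL; both identifiers are query parameters."""
        query = {"enrichment_uuid": enrichment_uuid}
        if run_key is not None:
            query["run_key"] = run_key
        return "https://api.riveterhq.com/v1/run_enrichment?" + urlencode(query)


    n = validate_input({
        "Company Name": ["Acme Corp", "Tech Solutions Inc"],
        "Website": ["acme.com", "techsolutions.com"],
    })
    ```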
      operationId: runExistingEnrichment
      tags:
        - Enrichments
      parameters:
        - name: enrichment_uuid
          in: query
          required: true
          description: UUID of the enrichment to run (project_uuid is also accepted)
          schema:
            type: string
            format: uuid
        - name: run_key
          in: query
          required: false
          description: Custom identifier for this run (optional, will be generated if not provided)
          schema:
            type: string
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/RunExistingEnrichmentRequest"
            examples:
              basic_enrichment:
                summary: Basic company enrichment
                value:
                  input:
                    "Company Name": ["Acme Corp", "Tech Solutions Inc"]
                    "Website": ["acme.com", "techsolutions.com"]
              with_webhook:
                summary: With webhook delivery
                value:
                  input:
                    "Company Name": ["Acme Corp", "Tech Solutions Inc"]
                    "Website": ["acme.com", "techsolutions.com"]
                  webhook_url: "https://your-server.com/webhook"
      responses:
        "200":
          description: Enrichment run initiated successfully
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/EnrichmentRunResponse"
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "404":
          $ref: "#/components/responses/NotFound"
        "409":
          $ref: "#/components/responses/Conflict"
        "422":
          $ref: "#/components/responses/UnprocessableEntity"

  /enrichment:
    get:
      summary: get_enrichment
      description: |
        Retrieve the structure of an existing enrichment: input column names and the full output specification
        (prompts, contexts, tools, formats, etc.) keyed by column header. Use this to inspect configuration before
        updating it with PATCH, or to round-trip config in your automation.

        Does not return row data — only the enrichment configuration.
      operationId: getEnrichment
      tags:
        - Enrichments
      parameters:
        - name: enrichment_uuid
          in: query
          required: true
          description: UUID of the enrichment (project_uuid is also accepted)
          schema:
            type: string
            format: uuid
      responses:
        "200":
          description: Enrichment structure retrieved successfully
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/EnrichmentStructureResponse"
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "404":
          $ref: "#/components/responses/NotFound"
    patch:
      summary: update_enrichment
      description: |
        Partially update an existing enrichment's output columns in place. The same `enrichment_uuid` is preserved.

        The request body is keyed by **column header** (output column display name).

        - **Update** an existing output column: include only the keys you want to change. When a key is present, its value **fully replaces** the previous value for that key (e.g. sending `contexts` replaces all contexts).
        - **Add** a new output column: use a header name that does not exist yet and include the full column configuration (same required fields as [/run_new_enrichment](#tag/enrichments/post/run_new_enrichment) output columns — `prompt` and `contexts` for agent mode, or `tool` plus its parameters for tool-only mode).
        - **Delete** an output column: send `{ "delete": true }` for that column header. You cannot delete input columns or the last remaining output column.

        Supported keys per column match the full output specification from [/run_new_enrichment](#tag/enrichments/post/run_new_enrichment):
        `prompt`, `contexts`, `tools`, `format`, `format_details`, `run_when`, `run_when_config`, and
        tool-only fields (`tool` plus its parameters). Expand **CEO** or **Industry** in the request schema for the full field list.

        You may send the column map at the top level of the body, or wrap it in an `output` property (see below).

        This endpoint updates configuration only — it does not run the enrichment. Use [/run_enrichment](#tag/enrichments/post/run_enrichment) afterward.

        **Update one column's prompt and contexts:**
        ```json
        {
          "CEO": {
            "prompt": "Find the current CEO of this company using recent news and filings",
            "contexts": ["Company Name", "Website"]
          }
        }
        ```

        **Update another column's format and tools:**
        ```json
        {
          "Industry": {
            "format": "tag",
            "format_details": {
              "options": ["SaaS", "Fintech", "Healthcare", "Other"],
              "allow_multiple": false
            },
            "tools": ["web_search", "scrape"]
          }
        }
        ```

        **Update multiple columns in one request:**
        ```json
        {
          "CEO": {
            "prompt": "Find the current CEO of this company using recent news and filings",
            "contexts": ["Company Name", "Website"]
          },
          "Industry": {
            "prompt": "What industry is this company in?",
            "contexts": ["Company Name"],
            "format": "tag",
            "format_details": {
              "options": ["SaaS", "Fintech", "Healthcare", "Other"]
            },
            "tools": ["web_search", "scrape"]
          }
        }
        ```

        **Same payloads with an `output` wrapper** (optional):
        ```json
        {
          "output": {
            "CEO": {
              "prompt": "Find the current CEO of this company using recent news and filings",
              "contexts": ["Company Name", "Website"]
            },
            "Industry": {
              "format": "tag",
              "format_details": {
                "options": ["SaaS", "Fintech", "Healthcare", "Other"]
              },
              "tools": ["web_search", "scrape"]
            }
          }
        }
        ```

        **Add a new output column:**
        ```json
        {
          "Annual Revenue": {
            "prompt": "Find the latest annual revenue for this company",
            "contexts": ["Company Name", "Website"]
          }
        }
        ```

        **Delete an output column:**
        ```json
        {
          "CEO": {
            "delete": true
          }
        }
        ```
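
        Because a key that is present fully replaces the stored value, adding one entry to a list such as `contexts` means sending the complete new list. The Python sketch below builds such request bodies. The helper names are hypothetical, and the current contexts are passed in by the caller (for example after reading them from `GET /enrichment`), since the response layout is not assumed here.

        ```python
        def add_context_patch(column, current_contexts, extra_context):
            """Build a PATCH body that adds one context to an output column.

            The body carries the full contexts list because the API replaces
            the stored value of any key that is present.
            """
            contexts = list(current_contexts)
            if extra_context not in contexts:
                contexts.append(extra_context)
            return {column: {"contexts": contexts}}


        def delete_column_patch(column):
            """Build a PATCH body that deletes one output column."""
            return {column: {"delete": True}}


        def wrap_in_output(column_map):
            """Optional wrapper form: the same column map under an output key."""
            return {"output": column_map}


        patch = add_context_patch("CEO", ["Company Name"], "Website")
        ```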
      operationId: updateEnrichment
      tags:
        - Enrichments
      parameters:
        - name: enrichment_uuid
          in: query
          required: true
          description: UUID of the enrichment to update (project_uuid is also accepted)
          schema:
            type: string
            format: uuid
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/UpdateEnrichmentRequestBody"
            example:
              "CEO":
                prompt: "Find the current CEO of this company using recent news and filings"
                contexts: ["Company Name", "Website"]
              "Industry":
                prompt: "What industry is this company in?"
                contexts: ["Company Name"]
                format: "tag"
                format_details:
                  options: ["SaaS", "Fintech", "Healthcare", "Other"]
                tools: ["web_search", "scrape"]
      responses:
        "200":
          description: Enrichment updated successfully
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/EnrichmentStructureResponse"
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "404":
          $ref: "#/components/responses/NotFound"
        "422":
          $ref: "#/components/responses/UnprocessableEntity"

  /run_new_enrichment:
    post:
      summary: run_new_enrichment
      description: |
        Create and run a new enrichment in a single request. There are two ways to define your output columns:

        ### Option 1: Prompt + attributes (recommended)
        Provide a natural-language `prompt` describing what you want and an `attributes` array listing the output column names. The AI will automatically generate the full output configuration (prompts, contexts, tools, formats) for each attribute. This is the easiest option and is often the best choice — just describe what you need and let the AI handle the rest.

        ### Option 2: Full output specification (not recommended)
        Define the exact structure of each output column (prompts, contexts, tools, formats). Use this when you need precise control over how each column is enriched.

        Provide exactly one of the two: `prompt` together with `attributes`, **or** `output`. Requests that include both, or neither, are rejected.

        **Recommended:** Pass a `webhook_url` in the JSON body to receive results when the run completes — this is more efficient than polling. If you must poll, use `/run_status` with the returned `run_key` (suggested interval: 5–10 seconds). Grab data with `/run_data`.
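
        The either/or rule for output columns can be checked before a request is sent. A minimal Python sketch follows. `check_output_mode` is a hypothetical client-side helper and does not reproduce the server's own validation or error messages.

        ```python
        def check_output_mode(body):
            """Return "prompt_attributes" or "output"; raise ValueError otherwise."""
            has_prompt = "prompt" in body
            has_attributes = "attributes" in body
            has_output = "output" in body
            if has_output and (has_prompt or has_attributes):
                raise ValueError("send either prompt + attributes or output, not both")
            if has_output:
                return "output"
            if has_prompt and has_attributes:
                return "prompt_attributes"
            raise ValueError("send both prompt and attributes, or send output")


        mode = check_output_mode({
            "input": {"Company Name": ["Acme Corp"]},
            "prompt": "Find key business information about these companies",
            "attributes": ["Employee Count", "CEO"],
        })
        ```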

      operationId: runNewEnrichment
      tags:
        - Enrichments
      parameters:
        - name: run_key
          in: query
          required: false
          description: Custom identifier for this run (optional, will be generated if not provided)
          schema:
            type: string
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/RunNewEnrichmentRequest"
            examples:
              prompt_and_attributes:
                summary: "Option 1: Prompt + attributes (recommended)"
                value:
                  input:
                    "Company Name": ["Acme Corp", "Tech Solutions Inc"]
                    "Website": ["acme.com", "techsolutions.com"]
                  prompt: "Find key business information about these companies"
                  attributes:
                    ["Employee Count", "Industry", "Annual Revenue", "CEO"]
              prompt_and_attributes_with_webhook:
                summary: "Option 1 + webhook delivery"
                value:
                  input:
                    "Company Name": ["Acme Corp", "Tech Solutions Inc"]
                    "Website": ["acme.com", "techsolutions.com"]
                  prompt: "Find key business information about these companies"
                  attributes:
                    ["Employee Count", "Industry", "Annual Revenue", "CEO"]
                  webhook_url: "https://your-server.com/webhook"
              full_output_specification:
                summary: "Option 2: Full output specification"
                value:
                  input:
                    "Company Name": ["Acme Corp", "Tech Solutions Inc"]
                    "Website": ["acme.com", "techsolutions.com"]
                  output:
                    "Employee Count":
                      prompt: "Find the number of employees at this company"
                      contexts: ["Company Name", "Website"]
                      format: "number"
                    "Industry":
                      prompt: "What industry is this company in?"
                      contexts: ["Company Name"]
                      format: "text"
              tool_only_code:
                summary: "Tool-only: JavaScript code execution"
                value:
                  input:
                    "First Name": ["Jane", "John"]
                    "Last Name": ["Doe", "Smith"]
                    "Revenue": ["1500000", "3200000"]
                    "Employees": ["50", "120"]
                  output:
                    "Full Name":
                      tool: "code"
                      code: "return `${args.first} ${args.last}`"
                      args:
                        first: "First Name"
                        last: "Last Name"
                    "Revenue Per Employee":
                      tool: "code"
                      code: "const r = parseFloat(args.revenue) || 0; const e = parseInt(args.employees) || 1; return (r / e).toFixed(2)"
                      args:
                        revenue: "Revenue"
                        employees: "Employees"
                      format: "number"
      responses:
        "200":
          description: Run initiated successfully
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/EnrichmentRunResponse"
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "409":
          $ref: "#/components/responses/Conflict"
        "422":
          $ref: "#/components/responses/UnprocessableEntity"

  /monitor_enrichment:
    post:
      summary: monitor_enrichment
      description: |
        Create a monitor for an enrichment. Monitors run your enrichment on a schedule and can send webhook notifications with results.
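
        As a rough illustration, the Python sketch below builds a request body that respects the ranges and required fields in this endpoint's request schema. `monitor_body` is a hypothetical helper. It applies the schema's rule that `day_of_week` is required for weekly monitors and `day_of_month` for monthly ones.

        ```python
        def monitor_body(cadence, hour, minute, timezone,
                         day_of_week=None, day_of_month=None, **optional):
            """Build a /monitor_enrichment request body, checking the schema's ranges."""
            if cadence not in ("daily", "weekly", "monthly"):
                raise ValueError("cadence must be daily, weekly, or monthly")
            if not 0 <= hour <= 23 or not 0 <= minute <= 59:
                raise ValueError("hour must be 0-23 and minute 0-59")
            body = {"cadence": cadence, "hour": hour, "minute": minute, "timezone": timezone}
            if cadence == "weekly":
                if day_of_week is None or not 0 <= day_of_week <= 6:
                    raise ValueError("weekly monitors need day_of_week in 0-6 (0 = Sunday)")
                body["day_of_week"] = day_of_week
            if cadence == "monthly":
                if day_of_month is None or not 1 <= day_of_month <= 28:
                    raise ValueError("monthly monitors need day_of_month in 1-28")
                body["day_of_month"] = day_of_month
            body.update(optional)  # e.g. webhook_url, alert_rule, run_immediately
            return body


        weekly = monitor_body(
            "weekly", 9, 30, "America/New_York",
            day_of_week=1,
            webhook_url="https://your-server.com/webhook",
        )
        ```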

      operationId: monitorEnrichment
      tags:
        - Monitors
      parameters:
        - name: enrichment_uuid
          in: query
          required: true
          description: UUID of the enrichment to monitor (project_uuid is also accepted)
          schema:
            type: string
            format: uuid
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                cadence:
                  type: string
                  enum: [daily, weekly, monthly]
                  description: How often the monitor runs
                minute:
                  type: integer
                  minimum: 0
                  maximum: 59
                  description: Minute of the hour to run
                hour:
                  type: integer
                  minimum: 0
                  maximum: 23
                  description: Hour of the day to run
                day_of_week:
                  type: integer
                  minimum: 0
                  maximum: 6
                  description: Day of the week (0=Sunday, required for weekly)
                day_of_month:
                  type: integer
                  minimum: 1
                  maximum: 28
                  description: Day of the month (required for monthly)
                timezone:
                  type: string
                  description: "Timezone (e.g. 'UTC', 'America/New_York')"
                webhook_url:
                  type: string
                  format: uri
                  description: URL to receive webhook notifications
                alert_rule:
                  type: string
                  enum: [each_run, only_on_change]
                  description: When to send alerts (default each_run)
                output_format:
                  type: string
                  enum: [current_only, current_and_previous]
                  description: Output format (default current_only)
                run_immediately:
                  type: boolean
                  description: Whether to run the monitor immediately after creation
                input:
                  $ref: "#/components/schemas/EnrichmentInputData"
                  description: Optional input data for the monitor
              required:
                - cadence
                - minute
                - hour
                - timezone
      responses:
        "201":
          description: Monitor created successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  request_status:
                    type: string
                    enum: [success]
                  message:
                    type: string
                  monitor:
                    type: object
                    properties:
                      uuid:
                        type: string
                      name:
                        type: string
                      cadence:
                        type: string
                      enabled:
                        type: boolean
                      project_uuid:
                        type: string
                        description: UUID of the enrichment (also available as enrichment_uuid)
                      project_name:
                        type: string
                        description: Name of the enrichment (also available as enrichment_name)
                      enrichment_uuid:
                        type: string
                        description: UUID of the enrichment
                      enrichment_name:
                        type: string
                        description: Name of the enrichment
                      next_run_at:
                        type: string
                      schedule_summary:
                        type: string
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "404":
          $ref: "#/components/responses/NotFound"
        "422":
          $ref: "#/components/responses/UnprocessableEntity"

  /run_status:
    get:
      summary: run_status
      description: |
        Check the current status of an enrichment run.

        **Tip:** Webhooks are more efficient than polling. Pass `webhook_url` when starting a run to receive results automatically.

        **Polling interval:** If you must poll, we recommend 5–10 seconds between requests. Early time estimates may be unreliable — actual completion is often faster than initial projections.

        ## Quick Example
        ```bash
        curl -X GET "https://api.riveterhq.com/v1/run_status?run_key=YOUR_RUN_KEY" \
          -H "Authorization: Bearer YOUR_API_KEY"
        ```

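        The run key is returned by [/run_enrichment](#tag/enrichments/post/run_enrichment) or [/run_new_enrichment](#tag/enrichments/post/run_new_enrichment), or is the custom `run_key` you passed when starting the run.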
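
        If polling is unavoidable, a loop along the following lines keeps to the suggested interval. This Python sketch takes the HTTP call as a `fetch_status` callable so it stays independent of any client library. The `status` field name and the set of terminal values (`success`, `stopped`) are assumptions inferred from the webhook payload and `/stop_run`, so confirm them against the status response schema.

        ```python
        import time

        TERMINAL_STATUSES = {"success", "stopped"}  # assumed terminal values


        def wait_for_run(fetch_status, run_key, interval=5.0, timeout=600.0, sleep=time.sleep):
            """Poll fetch_status(run_key) until the run reaches a terminal status.

            fetch_status is any callable that returns the parsed JSON response
            of GET /run_status as a dict.
            """
            waited = 0.0
            while True:
                response = fetch_status(run_key)
                if response.get("status") in TERMINAL_STATUSES:  # assumed field name
                    return response
                if waited >= timeout:
                    raise TimeoutError(f"run {run_key} did not finish within {timeout} seconds")
                sleep(interval)
                waited += interval
        ```

        The `sleep` argument exists so tests can replace the real delay with a no-op.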
      operationId: getRunStatus
      tags:
        - Runs
      parameters:
        - name: run_key
          in: query
          required: true
          description: The run key (UUID) of the run to check
          schema:
            type: string
      responses:
        "200":
          description: Status retrieved successfully
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/EnrichmentStatusResponse"
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "404":
          $ref: "#/components/responses/NotFound"

  /run_data:
    get:
      summary: run_data
      description: |
        Retrieve the processed data from a completed run.

        ## Quick Example
        ```bash
        curl -X GET "https://api.riveterhq.com/v1/run_data?run_key=YOUR_RUN_KEY" \
          -H "Authorization: Bearer YOUR_API_KEY"
        ```

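        The run key is returned by [/run_enrichment](#tag/enrichments/post/run_enrichment) or [/run_new_enrichment](#tag/enrichments/post/run_new_enrichment), or is the custom `run_key` you passed when starting the run.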
      operationId: getRunData
      tags:
        - Runs
      parameters:
        - name: run_key
          in: query
          required: true
          description: The run key (UUID) of the run to retrieve data for
          schema:
            type: string
      responses:
        "200":
          description: Run data retrieved successfully
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/EnrichmentDataResponse"
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "404":
          $ref: "#/components/responses/NotFound"
        "422":
          $ref: "#/components/responses/UnprocessableEntity"

  /stop_run:
    post:
      summary: stop_run
      description: |
        Stop a currently running enrichment. This will halt all processing and mark the run as stopped.

        ## Quick Example
        ```bash
        curl -X POST "https://api.riveterhq.com/v1/stop_run?run_key=YOUR_RUN_KEY" \
          -H "Authorization: Bearer YOUR_API_KEY"
        ```

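        The run key is returned by [/run_enrichment](#tag/enrichments/post/run_enrichment) or [/run_new_enrichment](#tag/enrichments/post/run_new_enrichment), or is the custom `run_key` you passed when starting the run.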

        ## Behavior
        - If the run has already stopped or succeeded, the request returns success with the current status
        - If the run is in progress, stops all pending cells and marks the run as stopped
        - Stopped runs cannot be resumed
      operationId: stopRun
      tags:
        - Runs
      parameters:
        - name: run_key
          in: query
          required: true
          description: The run key (UUID) of the run to stop
          schema:
            type: string
      responses:
        "200":
          description: Run stopped successfully
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/EnrichmentStopResponse"
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "404":
          $ref: "#/components/responses/NotFound"
        "422":
          $ref: "#/components/responses/UnprocessableEntity"

  /scrape:
    post:
      summary: scrape
      description: |
        Scrape a public webpage and return its extracted text content.

        ## Quick Example
        ```bash
        curl -X POST https://api.riveterhq.com/v1/scrape \
          -H "Authorization: Bearer YOUR_API_KEY" \
          -H "Content-Type: application/json" \
          -d '{"url": "https://example.com"}'
        ```

        ## Credit Costs
        - **With proxy**: 1/5 credit (0.20 credits)
        - **Without proxy**: 1/20 credit (0.05 credits)
        - **From cache**: Free (0 credits)
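
        For budgeting, the per-scrape costs listed above can be turned into a simple estimate. The Python sketch below is illustrative only. `scrape_credit_cost` and `worst_case_credits` are hypothetical helpers, and the estimate assumes no cache hits, so actual usage may be lower.

        ```python
        CREDITS_WITH_PROXY = 0.20     # 1/5 credit
        CREDITS_WITHOUT_PROXY = 0.05  # 1/20 credit


        def scrape_credit_cost(use_proxy, from_cache=False):
            """Credits charged for one scrape under the listed pricing."""
            if from_cache:
                return 0.0
            return CREDITS_WITH_PROXY if use_proxy else CREDITS_WITHOUT_PROXY


        def worst_case_credits(n_urls, use_proxy):
            """Upper bound for n_urls scrapes, assuming none are served from cache."""
            return round(n_urls * scrape_credit_cost(use_proxy), 2)


        with_proxy = worst_case_credits(100, use_proxy=True)
        without_proxy = worst_case_credits(100, use_proxy=False)
        ```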

        ## Proxy Usage
        Scraping is not guaranteed to succeed without a proxy. Some websites block requests or only serve content to specific geographic locations, so a proxy may be needed to get results.

        To use a proxy, include the `proxy_country_code` parameter with a two-character country code (e.g., 'us', 'gb', 'de').

        ## Caching
        By default, recently scraped pages are cached to save credits. If you hit a recently cached webpage, the scrape is free. To always fetch fresh content, set `skip_cache` to true.

        ## Response
        The response includes:
        - `text`: The extracted text content from the webpage
        - `url`: The URL that was scraped
        - `base_url_for_links`: The base URL for resolving relative links
        - `status_code`: The HTTP status code returned by the server (e.g., 200, 404, 500)
        - `possibly_blocked`: (optional) Boolean flag set when the page may have been blocked by anti-scraping measures
        - `credit_used`: The number of credits consumed
        - `riveter_app_link`: Direct link to view this scrape in the Riveter application
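
        The request fields and credit costs above can be sketched in Python. This is a minimal illustration, not an official client: the helper names `build_scrape_body` and `estimate_scrape_credits` are hypothetical, and the cost figures are simply the ones listed under Credit Costs.

        ```python
        import re
        from decimal import Decimal


        def build_scrape_body(url, proxy_country_code=None, skip_cache=False):
            """Build the JSON body for POST /scrape (hypothetical helper)."""
            body = {"url": url}
            if proxy_country_code is not None:
                # Same rule as the schema pattern: exactly two lowercase letters.
                if not re.fullmatch(r"[a-z]{2}", proxy_country_code):
                    raise ValueError("proxy_country_code must be two lowercase letters")
                body["proxy_country_code"] = proxy_country_code
            if skip_cache:
                body["skip_cache"] = True
            return body


        def estimate_scrape_credits(body, from_cache=False):
            """Estimate the credits for one scrape from the documented prices."""
            if from_cache:
                return Decimal("0")
            if "proxy_country_code" in body:
                return Decimal("0.20")
            return Decimal("0.05")
        ```

        The estimate is only a planning aid; the `credit_used` field in the response reports what was actually charged.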
      operationId: scrape
      tags:
        - Tools
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                url:
                  type: string
                  format: uri
                  description: The URL to scrape
                proxy_country_code:
                  type: string
                  description: Optional two-character country code for proxy (e.g., 'us', 'gb', 'de')
                  pattern: "^[a-z]{2}$"
                skip_cache:
                  type: boolean
                  description: Set to true to bypass cache and always fetch fresh content
                  default: false
              required:
                - url
            examples:
              basic_scrape:
                summary: Basic webpage scrape
                value:
                  url: "https://example.com"
              scrape_with_proxy:
                summary: Scrape with proxy
                value:
                  url: "https://example.com"
                  proxy_country_code: "us"
              scrape_skip_cache:
                summary: Scrape bypassing cache
                value:
                  url: "https://example.com"
                  skip_cache: true
      responses:
        "200":
          description: Webpage scraped successfully
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/ScrapeResponse"
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "422":
          $ref: "#/components/responses/UnprocessableEntity"

  /web_search:
    post:
      summary: web_search
      description: |
        Run one or many web searches in a single request. Each search is a `{ query, date_start?, date_end? }`
        object. This is a convenience wrapper around a tool-only `web_search` enrichment — you don't need to
        create an enrichment first.

        Handles a single search or up to **100,000** searches per request (larger batches fan out across the
        big-run pipeline).

        Optionally filter each search to a date range with `date_start` / `date_end` (format `YYYY-MM-DD`).
        Dates are optional and may be set per-search — searches without dates are unfiltered. If only
        `date_start` is given, `date_end` defaults to today.

        This is **async**: poll [/run_status](#tag/runs/get/run_status) with the returned `run_key`, then fetch
        results with [/run_data](#tag/runs/get/run_data). Or pass a `webhook_url` to receive results when the
        run completes. Results come back under the `search_results` column alongside the `query` (and dates).

        ## Quick Example (inline)
        ```bash
        curl -X POST https://api.riveterhq.com/v1/web_search \
          -H "Authorization: Bearer YOUR_API_KEY" \
          -H "Content-Type: application/json" \
          -d '{
            "searches": [
              { "query": "OpenAI GPT-4o mini", "date_start": "2024-07-01", "date_end": "2024-07-31" },
              { "query": "Anthropic Claude 3.5 Sonnet" }
            ]
          }'
        ```

        ## Quick Example (from a file)
        For larger batches, put the body in a file and pass it with curl's `@` syntax:

        `queries.json`:
        ```json
        {
          "searches": [
            { "query": "OpenAI GPT-4o mini", "date_start": "2024-07-01", "date_end": "2024-07-31" },
            { "query": "Anthropic Claude 3.5 Sonnet", "date_start": "2024-06-01", "date_end": "2024-06-30" },
            { "query": "Google Gemini 2.0" }
          ],
          "webhook_url": "https://your-server.com/webhook"
        }
        ```

        ```bash
        curl -X POST https://api.riveterhq.com/v1/web_search \
          -H "Authorization: Bearer YOUR_API_KEY" \
          -H "Content-Type: application/json" \
          -d @queries.json
        ```

        ## Credit Costs
        - **0.03 credits** per search.
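
        As a rough sketch, a batch body can be assembled and priced before it is sent. The helper names below are hypothetical; the limits and the per-search price are the ones stated in this description, and the date check only confirms that a value parses as `YYYY-MM-DD`.

        ```python
        from datetime import datetime
        from decimal import Decimal

        MAX_SEARCHES = 100_000
        CREDITS_PER_SEARCH = Decimal("0.03")


        def build_search(query, date_start=None, date_end=None):
            """Build one search object; dates are optional YYYY-MM-DD strings."""
            if not query:
                raise ValueError("query is required")
            search = {"query": query}
            for key, value in (("date_start", date_start), ("date_end", date_end)):
                if value is not None:
                    datetime.strptime(value, "%Y-%m-%d")  # raises ValueError if unparseable
                    search[key] = value
            return search


        def build_web_search_body(searches, webhook_url=None):
            """Wrap search objects in the request body for POST /web_search."""
            if not 1 <= len(searches) <= MAX_SEARCHES:
                raise ValueError("searches must contain between 1 and 100,000 items")
            body = {"searches": list(searches)}
            if webhook_url is not None:
                body["webhook_url"] = webhook_url
            return body


        def estimate_credits(body):
            """Estimated cost of the batch at the documented per-search price."""
            return CREDITS_PER_SEARCH * len(body["searches"])
        ```

        Serialising the result with `json.dumps` gives the same shape as `queries.json` above.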
      operationId: webSearch
      tags:
        - Tools
      parameters:
        - name: run_key
          in: query
          required: false
          description: Custom identifier for this run (optional, will be generated if not provided)
          schema:
            type: string
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                searches:
                  type: array
                  description: One or more searches to run (up to 100,000).
                  items:
                    type: object
                    properties:
                      query:
                        type: string
                        description: The search query (required).
                      date_start:
                        type: string
                        description: "Optional start date filter, format YYYY-MM-DD."
                      date_end:
                        type: string
                        description: "Optional end date filter, format YYYY-MM-DD. Defaults to today if date_start is set."
                    required:
                      - query
                webhook_url:
                  type: string
                  format: uri
                  description: Optional URL to POST results to when the run completes.
              required:
                - searches
            examples:
              single_search:
                summary: A single search
                value:
                  searches:
                    - query: "latest OpenAI news"
              single_search_with_dates:
                summary: A single date-bounded search
                value:
                  searches:
                    - query: "OpenAI GPT-4o mini release"
                      date_start: "2024-07-01"
                      date_end: "2024-07-31"
              many_searches:
                summary: Many searches with per-search date ranges (e.g. from @queries.json)
                value:
                  searches:
                    - query: "OpenAI GPT-4o mini"
                      date_start: "2024-07-01"
                      date_end: "2024-07-31"
                    - query: "Anthropic Claude 3.5 Sonnet"
                      date_start: "2024-06-01"
                      date_end: "2024-06-30"
                    - query: "Google Gemini 2.0"
                  webhook_url: "https://your-server.com/webhook"
      responses:
        "200":
          description: Search run initiated successfully
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/EnrichmentRunResponse"
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "409":
          $ref: "#/components/responses/Conflict"
        "422":
          $ref: "#/components/responses/UnprocessableEntity"

  /pause_monitor:
    post:
      summary: pause_monitor
      description: |
        Pause an active monitor. If the monitor is already paused, this is a no-op and returns success.
      operationId: pauseMonitor
      tags:
        - Monitors
      parameters:
        - name: monitor_uuid
          in: query
          required: true
          description: UUID of the monitor to pause
          schema:
            type: string
            format: uuid
      responses:
        "200":
          description: Monitor paused successfully
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/MonitorResponse"
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "404":
          $ref: "#/components/responses/NotFound"

  /monitor_status:
    get:
      summary: monitor_status
      description: |
        Retrieve the current status and configuration of a monitor.
      operationId: getMonitorStatus
      tags:
        - Monitors
      parameters:
        - name: monitor_uuid
          in: query
          required: true
          description: UUID of the monitor to check
          schema:
            type: string
            format: uuid
      responses:
        "200":
          description: Monitor status retrieved successfully
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/MonitorResponse"
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "404":
          $ref: "#/components/responses/NotFound"

  /monitor_recent_run_data:
    get:
      summary: monitor_recent_run_data
      description: |
        Retrieve the data from the most recent run of a monitor. Returns the formatted output data along with run status details.
      operationId: getMonitorRecentRunData
      tags:
        - Monitors
      parameters:
        - name: monitor_uuid
          in: query
          required: true
          description: UUID of the monitor
          schema:
            type: string
            format: uuid
      responses:
        "200":
          description: Recent run data retrieved successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  request_status:
                    type: string
                    enum: [success]
                  message:
                    type: string
                  run_key:
                    type: string
                  formatted_data:
                    $ref: "#/components/schemas/EnrichmentFormattedData"
                    description: The processed data from the most recent run in columnar format
                  status:
                    $ref: "#/components/schemas/EnrichmentRunStatusDetails"
                  monitor_uuid:
                    type: string
                    format: uuid
                required:
                  - request_status
                  - message
                  - run_key
                  - formatted_data
                  - status
                  - monitor_uuid
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "404":
          $ref: "#/components/responses/NotFound"
        "422":
          $ref: "#/components/responses/UnprocessableEntity"

  /api_stats:
    get:
      summary: api_stats
      description: |
        Retrieve API usage statistics for the current account. Returns counts of API runs grouped by status.

        Pass `detailed=true` to include run-level detail (`run_key` and app URL) for active runs (pending, enqueued, processing).
      operationId: getApiStats
      tags:
        - Account
      parameters:
        - name: detailed
          in: query
          required: false
          description: Set to true to include run-level detail for active statuses
          schema:
            type: boolean
            default: false
      responses:
        "200":
          description: API stats retrieved successfully
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/ApiStatsResponse"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"

  /build_dataset:
    post:
      summary: build_dataset
      description: |
        Build a dataset by providing either a natural-language prompt, a structured spec (identifiers, qualifiers, attributes), or both.

        **Recommended:** Pass a `dataset_webhook_url` to receive results when the build completes — this is more efficient than polling. If you must poll, use `/dataset_build_status` (suggested interval: 5–10 seconds).

        ## Input options

        **Prompt only** — describe what you want in plain English:
        ```json
        { "prompt": "Top 50 SaaS companies by revenue", "max_items": 50 }
        ```

        **Structured spec** — define the shape explicitly:
        ```json
        {
          "identifiers": ["Company Name"],
          "qualifiers": ["B2B SaaS", "revenue > $10M"],
          "attributes": ["CEO", "Headquarters"],
          "max_items": 50
        }
        ```

        **Both** — the prompt provides intent while the spec constrains the output.
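
        The constraints in the request schema can be checked client-side before sending. The sketch below is illustrative only: `validate_build_dataset_body` is a hypothetical helper, and the server remains the authority on what it accepts.

        ```python
        # Item limits taken from the maxItems values in the request schema.
        ARRAY_LIMITS = {"identifiers": 3, "qualifiers": 10, "attributes": 10}


        def validate_build_dataset_body(body):
            """Return a list of problems; an empty list means no issue was found."""
            problems = []
            if not body.get("prompt") and not body.get("identifiers"):
                problems.append("at least one of prompt or identifiers is required")
            for field, limit in ARRAY_LIMITS.items():
                if len(body.get(field, [])) > limit:
                    problems.append(f"{field} allows at most {limit} items")
            # max_items defaults to 100 when omitted and must be at least 1.
            if body.get("max_items", 100) < 1:
                problems.append("max_items must be at least 1")
            return problems
        ```

        For the "Prompt only" body shown earlier, the function returns an empty list; an empty body yields the "at least one of prompt or identifiers" message.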
      operationId: buildDataset
      tags:
        - Dataset Builder
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                prompt:
                  type: string
                  description: Natural-language description of the dataset to build. Can be used alone or combined with identifiers/qualifiers/attributes. At least one of prompt or identifiers is required.
                identifiers:
                  type: array
                  items:
                    type: string
                  maxItems: 3
                  description: Column names that uniquely identify each row (max 3). Can be used alone or combined with prompt. At least one of prompt or identifiers is required.
                qualifiers:
                  type: array
                  items:
                    type: string
                  maxItems: 10
                  description: Filters or constraints on the dataset (max 10). Used with identifiers and/or prompt.
                attributes:
                  type: array
                  items:
                    type: string
                  maxItems: 10
                  description: Additional columns to include in the output (max 10). Used with identifiers and/or prompt.
                max_items:
                  type: integer
                  minimum: 1
                  default: 100
                  description: Maximum number of rows to generate
                dataset_webhook_url:
                  type: string
                  format: uri
                  description: |
                    URL to receive a POST when the dataset build completes.

                    Always pass URLs containing query strings (e.g. Power Automate / Azure Logic Apps SAS URLs) here in the body — never as a query parameter — so the `&` characters in the SAS token are preserved.
                auto_run_enrichment:
                  type: boolean
                  description: Automatically create and run an enrichment from the dataset when complete. When true, an `auto_run_enrichment_run_key` is returned immediately in the response and can be used to poll `/run_status` and `/run_data`.
                  default: false
                auto_run_enrichment_webhook_url:
                  type: string
                  format: uri
                  description: Webhook URL for the auto-run enrichment (requires auto_run_enrichment). Same body-only guidance applies.
            examples:
              prompt_only:
                summary: Prompt only
                value:
                  prompt: "Top 50 SaaS companies by revenue"
                  max_items: 50
              structured_spec:
                summary: Structured spec only
                value:
                  identifiers: ["Company Name"]
                  qualifiers: ["B2B SaaS", "revenue > $10M"]
                  attributes: ["CEO", "Headquarters"]
                  max_items: 50
              prompt_and_spec:
                summary: Prompt + structured spec (combined)
                value:
                  prompt: "Top 50 B2B SaaS companies by revenue"
                  identifiers: ["Company Name"]
                  qualifiers: ["B2B SaaS", "revenue > $10M"]
                  attributes: ["CEO", "Headquarters", "Founded Year"]
                  max_items: 50
              prompt_with_webhook:
                summary: Prompt + dataset webhook delivery
                value:
                  prompt: "Top 50 SaaS companies by revenue"
                  max_items: 50
                  dataset_webhook_url: "https://your-server.com/dataset-webhook"
      responses:
        "200":
          description: Dataset build started
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/DatasetBuildResponse"
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "422":
          $ref: "#/components/responses/UnprocessableEntity"

  /dataset_build_status:
    get:
      summary: dataset_build_status
      description: |
        Check the current status of a dataset build.

        **Tip:** Webhooks are more efficient than polling. Pass `dataset_webhook_url` when starting a build to receive results automatically.

        **Polling interval:** If you must poll, we recommend 30 seconds between requests. Early time estimates may be unreliable — actual completion is often faster than initial projections.
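
        If polling is unavoidable, a bounded loop along these lines keeps the request rate predictable. This is a sketch under assumptions: `poll_until` is a hypothetical helper, the HTTP call is supplied by the caller as `fetch_status`, and the completion check is a caller-supplied predicate because the status field names are defined in `DatasetStatusResponse` rather than assumed here. The 30-second default follows the recommendation above.

        ```python
        import time


        def poll_until(fetch_status, is_finished, interval_seconds=30,
                       max_attempts=40, sleep=time.sleep):
            """Call fetch_status() until is_finished(response) returns true.

            fetch_status: callable returning the parsed status response.
            is_finished: predicate deciding whether the build has ended.
            sleep: injectable so the loop can be exercised without waiting.
            """
            for attempt in range(max_attempts):
                response = fetch_status()
                if is_finished(response):
                    return response
                # Do not sleep after the final attempt; fail straight away.
                if attempt < max_attempts - 1:
                    sleep(interval_seconds)
            raise TimeoutError("build did not finish within the polling budget")
        ```

        Capping the number of attempts means a stalled build surfaces as a `TimeoutError` instead of an endless loop.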

      operationId: getDatasetBuildStatus
      tags:
        - Dataset Builder
      parameters:
        - name: run_key
          in: query
          required: true
          description: The run_key returned by /build_dataset
          schema:
            type: string
      responses:
        "200":
          description: Status retrieved successfully
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/DatasetStatusResponse"
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "404":
          $ref: "#/components/responses/NotFound"

  /dataset_build_data:
    get:
      summary: dataset_build_data
      description: |
        Retrieve the generated data from a completed dataset build. Returns the data in columnar format along with status details.

      operationId: getDatasetBuildData
      tags:
        - Dataset Builder
      parameters:
        - name: run_key
          in: query
          required: true
          description: The run_key returned by /build_dataset
          schema:
            type: string
      responses:
        "200":
          description: Dataset data retrieved successfully
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/DatasetDataResponse"
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "404":
          $ref: "#/components/responses/NotFound"

  /stop_dataset_build:
    post:
      summary: stop_dataset_build
      description: |
        Stop a dataset build that is currently in progress. Only builds in an active state can be stopped.

      operationId: stopDatasetBuild
      tags:
        - Dataset Builder
      parameters:
        - name: run_key
          in: query
          required: true
          description: The run_key returned by /build_dataset
          schema:
            type: string
      requestBody:
        required: false
        content:
          application/json:
            schema:
              type: object
              properties:
                dataset_webhook_url:
                  type: string
                  format: uri
                  description: |
                    Optionally update the webhook URL before stopping. Always pass URLs containing query strings (e.g. Power Automate / Azure Logic Apps SAS URLs) here in the body — never as a query parameter.
      responses:
        "200":
          description: Stop signal sent
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/DatasetBuildResponse"
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "404":
          $ref: "#/components/responses/NotFound"

  /create_enrichment_from_dataset:
    post:
      summary: create_enrichment_from_dataset
      description: |
        Create an enrichment from a completed dataset build. The dataset must have finished building and have result data available.

        After creating the enrichment, use `/run_enrichment_from_dataset` or `/run_enrichment` to run it.
      operationId: createEnrichmentFromDataset
      tags:
        - Dataset Builder
      parameters:
        - name: run_key
          in: query
          required: true
          description: The run_key returned by /build_dataset
          schema:
            type: string
      responses:
        "200":
          description: Enrichment created from dataset
          content:
            application/json:
              schema:
                type: object
                properties:
                  request_status:
                    type: string
                    enum: [success]
                  message:
                    type: string
                  run_key:
                    type: string
                  enrichment_uuid:
                    type: string
                    format: uuid
                    description: UUID of the newly created enrichment
                required:
                  - request_status
                  - message
                  - run_key
                  - enrichment_uuid
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "404":
          $ref: "#/components/responses/NotFound"
        "500":
          description: Internal server error
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/Error"

  /run_enrichment_from_dataset:
    post:
      summary: run_enrichment_from_dataset
      description: |
        Run an enrichment using a dataset build's result data as input. If no enrichment has been created for the dataset yet, one is created automatically, using the dataset's columns as its attributes.

        **Recommended:** Pass a `webhook_url` in the JSON body to receive results when the run completes, which is more efficient than polling. If you must poll, use `/run_status` with the returned `run_key` (suggested interval: 5–10 seconds), then fetch the results with `/run_data`.
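
        The split between the query string and the JSON body can be sketched as follows. This is an illustration, not an official client: `build_run_request` is a hypothetical helper, the base URL is assumed from the curl examples elsewhere in this spec, and sending an empty JSON object when no webhook is given is also an assumption. The request is only built here, not sent.

        ```python
        import json
        from urllib.parse import urlencode
        from urllib.request import Request

        BASE_URL = "https://api.riveterhq.com/v1"  # assumed from the curl examples


        def build_run_request(run_key, webhook_url=None, api_key="YOUR_API_KEY"):
            """Build a POST /run_enrichment_from_dataset request without sending it."""
            # run_key is a query parameter; urlencode escapes it safely.
            query = urlencode({"run_key": run_key})
            # webhook_url goes in the JSON body, so any `&` inside it is preserved.
            payload = {"webhook_url": webhook_url} if webhook_url else {}
            return Request(
                f"{BASE_URL}/run_enrichment_from_dataset?{query}",
                data=json.dumps(payload).encode("utf-8"),
                headers={
                    "Authorization": f"Bearer {api_key}",
                    "Content-Type": "application/json",
                },
                method="POST",
            )
        ```

        Passing the returned object to `urllib.request.urlopen` would send it; the response then carries the `run_key` to use with `/run_status` and `/run_data`.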
      operationId: runEnrichmentFromDataset
      tags:
        - Dataset Builder
      parameters:
        - name: run_key
          in: query
          required: true
          description: The run_key returned by /build_dataset
          schema:
            type: string
      requestBody:
        required: false
        content:
          application/json:
            schema:
              type: object
              properties:
                webhook_url:
                  type: string
                  format: uri
                  description: |
                    URL to receive a POST when the enrichment run completes. The webhook payload includes the full results (same as /run_data). See the Webhooks section above for payload format and details.

                    Always pass URLs containing query strings (e.g. Power Automate / Azure Logic Apps SAS URLs) here in the body — passing them as query parameters will truncate them at the first `&`.
            examples:
              with_webhook:
                summary: Run with webhook delivery
                value:
                  webhook_url: "https://your-server.com/webhook"
      responses:
        "200":
          description: Enrichment run initiated
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/EnrichmentStatusResponse"
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "404":
          $ref: "#/components/responses/NotFound"
        "409":
          $ref: "#/components/responses/Conflict"
        "422":
          $ref: "#/components/responses/UnprocessableEntity"

  /build_dataset_from_enrichment:
    post:
      summary: build_dataset_from_enrichment
      description: |
        Build a dataset using an existing enrichment's column structure. Instead of specifying identifiers and attributes manually, the endpoint derives them from the enrichment:

        - **Source-data columns** become the dataset's **identifiers** (max 3)
        - **Agent / tool-only columns** become the dataset's **attributes**

        You provide a prompt describing what entities to find, qualifiers to filter them, and an optional max_items limit.

        **Recommended:** Pass `dataset_webhook_url` and/or `auto_run_enrichment_webhook_url` to receive results when complete — this is more efficient than polling. If you must poll, use `/dataset_build_status` and `/run_status` (suggested interval: 5–10 seconds).

        After the dataset is built, you can either:
        - Use `/run_enrichment_from_dataset` to manually run the enrichment on the generated data, or
        - Pass `auto_run_enrichment: true` to automatically run the enrichment when the dataset completes. An `auto_run_enrichment_run_key` is returned immediately that can be used with `/run_status` and `/run_data`.
      operationId: buildDatasetFromEnrichment
      tags:
        - Dataset Builder
      parameters:
        - name: enrichment_uuid
          in: query
          required: true
          description: UUID of the enrichment whose column structure to use
          schema:
            type: string
            format: uuid
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                prompt:
                  type: string
                  description: Natural-language description of the dataset to build. Describes what entities to find for the enrichment's source-data columns.
                qualifiers:
                  type: array
                  items:
                    type: string
                  maxItems: 10
                  description: Filters or constraints on the dataset (max 10)
                max_items:
                  type: integer
                  minimum: 1
                  default: 100
                  description: Maximum number of rows to generate
                dataset_webhook_url:
                  type: string
                  format: uri
                  description: |
                    URL to receive a POST when the dataset build completes.

                    Always pass URLs containing query strings (e.g. Power Automate / Azure Logic Apps SAS URLs) here in the body — never as a query parameter — so the `&` characters in the SAS token are preserved.
                auto_run_enrichment:
                  type: boolean
                  description: Automatically run the enrichment on the generated dataset when complete. When true, an `auto_run_enrichment_run_key` is returned immediately in the response and can be used to poll `/run_status` and `/run_data`.
                  default: false
                auto_run_enrichment_webhook_url:
                  type: string
                  format: uri
                  description: Webhook URL for the auto-run enrichment (requires auto_run_enrichment). Same body-only guidance applies.
              required:
                - prompt
                - qualifiers
            examples:
              basic:
                summary: Build a dataset of companies for an existing enrichment
                value:
                  prompt: "SaaS companies in the United States"
                  qualifiers: ["B2B", "revenue > $10M", "founded after 2010"]
                  max_items: 50
              auto_run:
                summary: Build dataset and auto-run the enrichment
                value:
                  prompt: "SaaS companies in the United States"
                  qualifiers: ["B2B", "revenue > $10M"]
                  max_items: 25
                  auto_run_enrichment: true
                  auto_run_enrichment_webhook_url: "https://example.com/webhook"
      responses:
        "200":
          description: Dataset build started
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/DatasetBuildResponse"
        "400":
          $ref: "#/components/responses/BadRequest"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "404":
          $ref: "#/components/responses/NotFound"
        "422":
          $ref: "#/components/responses/UnprocessableEntity"

  /account:
    get:
      summary: account
      description: Retrieve information about the current account associated with the API key
      operationId: getAccount
      tags:
        - Account
      responses:
        "200":
          description: Account information retrieved successfully
          content:
            application/json:
              schema:
                type: object
                properties:
                  account:
                    $ref: "#/components/schemas/Account"
                  api_key_info:
                    $ref: "#/components/schemas/ApiKeyInfo"
        "401":
          $ref: "#/components/responses/Unauthorized"
        "403":
          $ref: "#/components/responses/Forbidden"
        "503":
          $ref: "#/components/responses/ServiceUnavailable"

components:
  securitySchemes:
    ApiKeyAuth:
      type: http
      scheme: bearer
      bearerFormat: API_KEY
      description: API key authentication. Use 'Bearer YOUR_API_KEY' in the Authorization header.
      x-scalar-secret-token: YOUR_API_KEY

  schemas:
    Account:
      type: object
      properties:
        uuid:
          type: string
          format: uuid
          description: Unique identifier for the account
        name:
          type: string
          description: Account name
        plan:
          type: string
          enum: [free, starter, advanced, pro, enterprise]
          description: Current billing plan
        credit:
          $ref: "#/components/schemas/Credit"
      required:
        - uuid
        - name
        - plan
        - credit

    Credit:
      type: object
      properties:
        count:
          type: integer
          description: Current credit count
        max:
          type: integer
          description: Maximum credits available
        balance:
          type: integer
          description: Remaining credit balance
      required:
        - count
        - max
        - balance

    ApiKeyInfo:
      type: object
      properties:
        name:
          type: string
          description: Name of the API key
        last_used_at:
          type: [string, "null"]
          format: date-time
          description: When the API key was last used
        created_by:
          $ref: "#/components/schemas/User"
      required:
        - name
        - last_used_at
        - created_by

    User:
      type: object
      properties:
        uuid:
          type: string
          format: uuid
          description: User's unique identifier
        name:
          type: string
          description: User's full name
        email:
          type: string
          format: email
          description: User's email address
      required:
        - uuid
        - name
        - email

    EnrichmentInputData:
      type: object
      description: |
        Keys are your source-data column headers. Values are arrays of strings (one per row).
        You may include any column header name; the examples below are illustrative.
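
        As an illustrative sketch (the company names and URLs are made up), a two-row input with two columns could look like this. It is shown in YAML like the other examples in this spec; the request itself carries the JSON equivalent:

        ```yaml
        input:
          "Company Name": ["Acme Corp", "Globex"]
          "Website": ["https://acme.example", "https://globex.example"]
        # Each array has 2 entries, one per row. Sending 2 names with only
        # 1 website would be rejected, because array lengths must match.
        ```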
      properties:
        "Company Name":
          type: array
          description: One string per row (all input columns must have the same array length, max 1000).
          items:
            type: string
          maxItems: 1000
        "Website":
          type: array
          description: One string per row (all input columns must have the same array length, max 1000).
          items:
            type: string
          maxItems: 1000
        "Domain":
          type: array
          description: One string per row (all input columns must have the same array length, max 1000).
          items:
            type: string
          maxItems: 1000

    FormatDetails:
      type: object
      description: |
        Format-specific options. Valid keys depend on the column `format` (see run_new_enrichment examples).
        Only include keys that apply to your chosen format.
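
        A hedged sketch of how these keys pair with a column `format`. The column names are hypothetical, and `prompt`/`contexts` are omitted for brevity; only the `format_details` keys come from this schema:

        ```yaml
        "Annual Revenue":
          format: number
          format_details:
            decimal_places: 0
            currency_code: "USD"   # mutually exclusive with percentage
            commas: true
        "Founded":
          format: date
          format_details:
            month: MM
            day: DD
            year: YYYY
            delimiter: "-"         # cannot be combined with iso_8601
        "Segment":
          format: tag
          format_details:
            options: ["Enterprise", "SMB"]
            allow_multiple: false
            descriptions:
              "Enterprise": "Sells mainly to large organizations"
        ```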
      properties:
        options:
          type: array
          items:
            type: string
          description: tag — allowed tag values (required for tag format)
        allow_multiple:
          type: boolean
          description: tag — allow selecting more than one tag
        descriptions:
          type: object
          description: tag — optional map from tag value to description (keys must be in `options`)
          properties:
            "Enterprise":
              type: string
              description: Example tag option description
        decimal_places:
          type: integer
          minimum: 0
          maximum: 9
          description: number — decimal places
        currency_code:
          type: string
          description: number — 3-letter currency code (mutually exclusive with percentage)
        commas:
          type: boolean
          description: number — display thousands separators
        percentage:
          type: boolean
          description: number — format as percentage (mutually exclusive with currency_code)
        description:
          type: string
          description: json — natural-language schema description (use with or instead of `schema`)
        schema:
          type: object
          description: json — JSON Schema object for structured output
          properties:
            type:
              type: string
              example: object
            properties:
              type: object
              description: JSON Schema properties map
              properties:
                example_field:
                  type: object
                  description: Schema for one field (example)
        iso_8601:
          type: boolean
          description: date — output ISO 8601 (cannot combine with month/day/year/delimiter)
        month:
          type: string
          enum: [M, MM, MMM, MMMM]
          description: date — month format token
        day:
          type: string
          enum: [D, DD, Do]
          description: date — day format token
        year:
          type: string
          enum: [YYYY, YY]
          description: date — year format token
        delimiter:
          type: string
          description: date — single-character delimiter between date parts
        true_value:
          type: string
          description: boolean — display string for true
        false_value:
          type: string
          description: boolean — display string for false

    EnrichmentOutputColumnConfig:
      type: object
      description: |
        Per-column enrichment config. **Agent mode** (default): include `prompt` and `contexts`.
        **Tool-only mode**: set `tool` and its parameters (do not use `prompt`/`contexts`).
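
        The two modes can be sketched side by side as follows. The column names and prompt text are illustrative assumptions; the field names are the ones defined in this schema:

        ```yaml
        # Agent mode: the agent follows the prompt, using the listed columns as context.
        "CEO":
          prompt: "Find the current CEO of this company"
          contexts: ["Company Name", "Website"]
          tools: ["web_search"]
        # Tool-only mode: one tool runs directly, with no prompt or contexts.
        "Homepage Text":
          tool: scrape
          url: "Website"   # a column header here; a static URL is also accepted
        ```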
      properties:
        prompt:
          type: string
          description: Agent instructions for this column (agent mode)
        contexts:
          type: array
          description: Column headers used as input context (agent mode)
          items:
            type: string
        tools:
          type: array
          description: "Agent tools: web_search, scrape, pdf, image, etc."
          items:
            type: string
        format:
          type: string
          enum: [text, number, url, email, tag, date, json, boolean]
        format_details:
          $ref: "#/components/schemas/FormatDetails"
        run_when:
          type: string
          enum: [always, any_filled, all_filled, dynamic]
        run_when_config:
          type: object
          properties:
            match_mode:
              type: string
              enum: [all, any]
            rules:
              type: array
              items:
                type: object
                properties:
                  column:
                    type: string
                  condition:
                    type: string
                  value:
                    type: string
        tool:
          type: string
          description: "Tool-only mode: scrape, web_search, pdf, image, code, LinkedIn tools, etc."
        url:
          type: string
          description: Column header or static URL (tool-only)
        query:
          type: string
          description: Column header or static query (tool-only, web_search)
        date_start:
          type: string
          description: "Optional start date for filtering search results. Format: YYYY-MM-DD (tool-only, web_search)"
        date_end:
          type: string
          description: "Optional end date for filtering search results. Format: YYYY-MM-DD. Defaults to today if date_start is provided. (tool-only, web_search)"
        code:
          type: string
          description: JavaScript source (tool-only, code tool)
        args:
          type: object
          description: |
            Named arguments for the code tool (tool-only). Keys are names referenced in your JavaScript
            (e.g. `args.first`). Values are column headers (dynamic per row) or static strings.
            You may include any argument name; the examples below are illustrative.
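
            A minimal sketch of a code-tool column, assuming input columns named "First Name" and "Last Name". How the script is expected to hand back its result is not specified here, so the `return` statement is an assumption:

            ```yaml
            tool: code
            code: "return args.first + ' ' + args.last;"
            args:
              first: "First Name"   # column header, so the value changes per row
              last: "Last Name"
            ```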
          properties:
            first:
              type: string
              description: Column header name or static value
            last:
              type: string
              description: Column header name or static value
            revenue:
              type: string
              description: Column header name or static value (example)
        proxy_country_code:
          type: string
        wait_longer:
          type: boolean
        skip_cache:
          type: boolean

    EnrichmentOutputSpec:
      type: object
      description: |
        Keys are output column headers. Values are per-column configuration objects.
        Any output column name is allowed; expand the example columns below to see all supported fields.
      properties:
        "Employee Count":
          $ref: "#/components/schemas/EnrichmentOutputColumnConfig"
        "Industry":
          $ref: "#/components/schemas/EnrichmentOutputColumnConfig"
        "CEO":
          $ref: "#/components/schemas/EnrichmentOutputColumnConfig"
        "Website":
          $ref: "#/components/schemas/EnrichmentOutputColumnConfig"

    UpdateEnrichmentRequestBody:
      type: object
      description: |
        Output changes keyed by column header. Existing columns can be partially updated; new column names must include
        a full output configuration; set `"delete": true` on a column to remove it.
        Each column uses the same fields as [/run_new_enrichment](#tag/enrichments/post/run_new_enrichment) output columns
        (expand **Column Header Name** below; the example shows **CEO** and **Industry**). Any other output column name is allowed.
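
        The three kinds of change described above might be combined in one body like this. This is a sketch only: the column names are hypothetical, and the exact set of fields that counts as a full configuration for a new column is an assumption:

        ```yaml
        # Partial update of an existing column: only the changed field is sent.
        "CEO":
          prompt: "Find the current CEO using filings from the last 12 months"
        # New column: a full configuration is required.
        "Headquarters":
          prompt: "Where is this company headquartered?"
          contexts: ["Company Name"]
          format: text
        # Removal of an existing column.
        "Legacy Notes":
          delete: true
        ```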
      properties:
        "Column Header Name":
          $ref: "#/components/schemas/EnrichmentOutputColumnConfig"
      example:
        "CEO":
          prompt: "Find the current CEO of this company using recent news and filings"
          contexts: ["Company Name", "Website"]
        "Industry":
          prompt: "What industry is this company in?"
          contexts: ["Company Name"]
          format: "tag"
          format_details:
            options: ["SaaS", "Fintech", "Healthcare", "Other"]
          tools: ["web_search", "scrape"]

    RunNewEnrichmentRequest:
      type: object
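      description: |
        Request body for running a new enrichment. A minimal sketch of the prompt + attributes form follows, written in YAML like the other examples in this spec (send the JSON equivalent). Every value, including the webhook URL, is a placeholder:

        ```yaml
        input:
          "Company Name": ["Acme Corp", "Globex"]
          "Website": ["https://acme.example", "https://globex.example"]
        prompt: "Find the CEO and annual revenue for each company"
        attributes: ["CEO", "Revenue"]   # at most 10 output column names
        webhook_url: "https://example.com/hooks/riveter"
        ```

        For precise per-column control, replace `prompt` and `attributes` with an `output` object; the two forms cannot be combined.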
      required: [input]
      properties:
        input:
          $ref: "#/components/schemas/EnrichmentInputData"
        prompt:
          type: string
          description: "Option A (recommended): Natural-language description of the enrichment. Must be provided together with `attributes`. Cannot be combined with `output`."
        attributes:
          type: array
          items:
            type: string
          maxItems: 10
          description: "Option A (recommended): Output column names (max 10). Must be provided together with `prompt`. Cannot be combined with `output`."
        output:
          description: "Option B: Full per-column specification. Cannot be combined with `prompt` or `attributes`."
          $ref: "#/components/schemas/EnrichmentOutputSpec"
        webhook_url:
          type: string
          format: uri
          description: URL to POST results when the run completes (same payload as /run_data)

    RunExistingEnrichmentRequest:
      type: object
      required: [input]
      properties:
        input:
          $ref: "#/components/schemas/EnrichmentInputData"
        webhook_url:
          type: string
          format: uri
          description: URL to POST results when the run completes (same payload as /run_data)

    EnrichmentCellValue:
      type: object
      properties:
        value:
          type: string
          description: Cell result text
      required: [value]

    EnrichmentFormattedData:
      type: object
      description: |
        Columnar results. Keys are column headers; values are arrays of `{ "value": "..." }` objects per row.
        Expand the example columns below; additional column headers use the same array shape.
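
        For illustration, a two-row result with made-up values would be shaped as follows. Reading one record means taking the element at the same index from every column, which is an inference from the one-cell-per-row layout rather than something stated elsewhere in this spec:

        ```yaml
        formatted_data:
          "Company Name":
            - value: "Acme Corp"
            - value: "Globex"
          "CEO":
            - value: "Jane Doe"    # row 1, belongs with "Acme Corp"
            - value: "John Roe"    # row 2, belongs with "Globex"
        ```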
      properties:
        "Company Name":
          type: array
          description: One cell per row
          items:
            type: object
            required: [value]
            properties:
              value:
                type: string
                description: Cell result text
        "CEO":
          type: array
          description: One cell per row
          items:
            type: object
            required: [value]
            properties:
              value:
                type: string
                description: Cell result text
        "Website":
          type: array
          description: One cell per row
          items:
            type: object
            required: [value]
            properties:
              value:
                type: string
                description: Cell result text

    ApiStatsCounts:
      type: object
      description: Run counts grouped by status. Additional status keys use the same shape as the examples below.
      properties:
        pending:
          type: object
          properties:
            count:
              type: integer
            detail:
              type: array
              items:
                type: object
                properties:
                  run_key:
                    type: string
                  url:
                    type: string
                    format: uri
        enqueued:
          type: object
          properties:
            count:
              type: integer
        processing:
          type: object
          properties:
            count:
              type: integer
            detail:
              type: array
              items:
                type: object
                properties:
                  run_key:
                    type: string
                  url:
                    type: string
                    format: uri
        success:
          type: object
          properties:
            count:
              type: integer
        stopped:
          type: object
          properties:
            count:
              type: integer
        failed:
          type: object
          description: Example of another status key
          properties:
            count:
              type: integer

    EnrichmentRunStatusDetails:
      type: object
      description: Status information for the run
      properties:
        status:
          type: string
          enum: [pending, enqueued, processing, success, stopped]
          description: Current status of the run
        credits_used:
          type: number
          description: Number of credits consumed by this run
        total_cells_expected:
          type: integer
          description: Total number of cells that need to be processed (rows × columns)
        completed_cells:
          type: integer
          description: Number of cells successfully completed
        not_found_cells:
          type: integer
          description: Number of cells with 'not found' results
        project_name:
          type: string
          description: Name of the enrichment (also available as enrichment_name)
        project_uuid:
          type: string
          format: uuid
          description: UUID of the enrichment (also available as enrichment_uuid)
        enrichment_name:
          type: string
          description: Name of the enrichment
        enrichment_uuid:
          type: string
          format: uuid
          description: UUID of the enrichment
        error_message:
          type: [string, "null"]
          description: Error message if an error occurred
        riveter_app_link:
          type: string
          description: Direct link to view this run in the Riveter application
      required:
        - status
        - credits_used
        - total_cells_expected
        - completed_cells
        - project_name
        - project_uuid
        - riveter_app_link

    EnrichmentStructureResponse:
      type: object
      properties:
        request_status:
          type: string
          enum: [success, error]
        message:
          type: string
        enrichment_uuid:
          type: string
          format: uuid
        enrichment_name:
          type: string
        name:
          type: string
        uuid:
          type: string
          format: uuid
        app_url:
          type: string
        status:
          type: string
        input:
          type: array
          items:
            type: string
          description: Names of input (source data) columns
        output:
          $ref: "#/components/schemas/EnrichmentOutputSpec"
      required:
        - request_status
        - message

    EnrichmentRunResponse:
      type: object
      properties:
        request_status:
          type: string
          enum: [success, error]
          description: Status of the request
        message:
          type: string
          description: Human-readable message
        run_key:
          type: string
          description: Unique identifier for this run
        status:
          type: string
          enum: [pending, enqueued, processing, success, stopped]
          description: Current status of the run
        credits_used:
          type: number
          description: Number of credits consumed by this run
        total_cells_expected:
          type: integer
          description: Total number of cells that need to be processed (rows × columns)
        completed_cells:
          type: integer
          description: Number of cells successfully completed
        not_found_cells:
          type: integer
          description: Number of cells with 'not found' results
        project_name:
          type: string
          description: Name of the enrichment (also available as enrichment_name)
        project_uuid:
          type: string
          format: uuid
          description: UUID of the enrichment (also available as enrichment_uuid)
        enrichment_name:
          type: string
          description: Name of the enrichment
        enrichment_uuid:
          type: string
          format: uuid
          description: UUID of the enrichment
        error_message:
          type: [string, "null"]
          description: Error message if an error occurred
        riveter_app_link:
          type: string
          description: Direct link to view this run in the Riveter application
      required:
        - request_status
        - message
        - run_key
        - status
        - credits_used
        - total_cells_expected
        - completed_cells
        - not_found_cells
        - project_name
        - project_uuid
        - enrichment_name
        - enrichment_uuid
        - riveter_app_link

    EnrichmentStatusResponse:
      type: object
      properties:
        request_status:
          type: string
          enum: [success, error]
          description: Status of the request
        message:
          type: string
          description: Human-readable message
        run_key:
          type: string
          description: Unique identifier for this run
        status:
          type: string
          enum: [pending, enqueued, processing, success, stopped]
          description: Current status of the run
        credits_used:
          type: number
          description: Number of credits consumed by this run
        total_cells_expected:
          type: integer
          description: Total number of cells that need to be processed (rows × columns)
        completed_cells:
          type: integer
          description: Number of cells successfully completed
        not_found_cells:
          type: integer
          description: Number of cells with 'not found' results
        project_name:
          type: string
          description: Name of the enrichment (also available as enrichment_name)
        project_uuid:
          type: string
          format: uuid
          description: UUID of the enrichment (also available as enrichment_uuid)
        enrichment_name:
          type: string
          description: Name of the enrichment
        enrichment_uuid:
          type: string
          format: uuid
          description: UUID of the enrichment
        error_message:
          type: [string, "null"]
          description: Error message if an error occurred
        riveter_app_link:
          type: string
          description: Direct link to view this run in the Riveter application
      required:
        - request_status
        - message
        - run_key
        - status
        - credits_used
        - total_cells_expected
        - completed_cells
        - not_found_cells
        - project_name
        - project_uuid
        - enrichment_name
        - enrichment_uuid
        - riveter_app_link

    EnrichmentDataResponse:
      type: object
      properties:
        request_status:
          type: string
          enum: [success, error]
          description: Status of the request
        message:
          type: string
          description: Human-readable message
        run_key:
          type: string
          description: Unique identifier for this run
        formatted_data:
          $ref: "#/components/schemas/EnrichmentFormattedData"
        status:
          $ref: "#/components/schemas/EnrichmentRunStatusDetails"
      required:
        - request_status
        - message
        - run_key
        - formatted_data
        - status

    EnrichmentStopResponse:
      type: object
      properties:
        request_status:
          type: string
          enum: [success, error]
          description: Status of the request
        message:
          type: string
          description: Human-readable message about the stop operation
        run_key:
          type: string
          description: Unique identifier for this run
        status:
          type: string
          enum: [stopped, success]
          description: Current status of the run after stop attempt
        stopped_at:
          type: [string, "null"]
          format: date-time
          description: When the run was stopped (if stopped)
        finished_at:
          type: [string, "null"]
          format: date-time
          description: When the run finished (if already completed)
        stopped_cells_count:
          type: integer
          description: Number of cells that were stopped (only present if run was actively stopped)
        project_name:
          type: string
          description: Name of the enrichment (also available as enrichment_name)
        project_uuid:
          type: string
          format: uuid
          description: UUID of the enrichment (also available as enrichment_uuid)
        enrichment_name:
          type: string
          description: Name of the enrichment
        enrichment_uuid:
          type: string
          format: uuid
          description: UUID of the enrichment
      required:
        - request_status
        - message
        - run_key
        - status
        - project_name
        - project_uuid
        - enrichment_name
        - enrichment_uuid

    ScrapeResponse:
      type: object
      properties:
        request_status:
          type: string
          enum: [success, error]
          description: Status of the request
        message:
          type: string
          description: Human-readable message
        run_key:
          type: string
          description: Unique identifier for this scrape run
        data:
          type: object
          properties:
            url:
              type: string
              format: uri
              description: The URL that was scraped
            text:
              type: string
              description: The extracted text content from the webpage
            base_url_for_links:
              type: string
              format: uri
              description: The base URL for resolving relative links
            status_code:
              type: integer
              description: The HTTP status code returned by the server (e.g., 200, 404, 500)
              example: 200
            possibly_blocked:
              type: boolean
              description: Optional flag indicating whether the page may have been blocked by anti-scraping measures (captcha, access denied, etc.)
            credit_used:
              type: number
              description: Number of credits consumed (0 for cache hit, 0.05 without proxy, 0.20 with proxy)
            riveter_app_link:
              type: string
              format: uri
              description: Direct link to view this scrape in the Riveter application
          required:
            - url
            - text
            - base_url_for_links
            - credit_used
            - riveter_app_link
      required:
        - request_status
        - message
        - run_key
        - data

    MonitorResponse:
      type: object
      properties:
        request_status:
          type: string
          enum: [success]
        message:
          type: string
        monitor:
          type: object
          properties:
            uuid:
              type: string
              format: uuid
            name:
              type: string
            cadence:
              type: string
              enum: [daily, weekly, monthly]
            minute:
              type: integer
            hour:
              type: integer
            day_of_week:
              type: [integer, "null"]
            day_of_month:
              type: [integer, "null"]
            timezone:
              type: string
            enabled:
              type: boolean
            webhook_url:
              type: [string, "null"]
              format: uri
            alert_rule:
              type: string
              enum: [each_run, only_on_change]
            output_format:
              type: string
              enum: [current_only, current_and_previous]
            next_run_at:
              type: [string, "null"]
              format: date-time
            schedule_summary:
              type: string
            project_uuid:
              type: string
              format: uuid
              description: UUID of the enrichment (also available as enrichment_uuid)
            project_name:
              type: string
              description: Name of the enrichment (also available as enrichment_name)
            enrichment_uuid:
              type: string
              format: uuid
            enrichment_name:
              type: string
            created_at:
              type: string
              format: date-time
            has_input:
              type: boolean
          required:
            - uuid
            - cadence
            - enabled
            - project_uuid
            - project_name
            - enrichment_uuid
            - enrichment_name
      required:
        - request_status
        - message
        - monitor

    ApiStatsResponse:
      type: object
      properties:
        request_status:
          type: string
          enum: [success]
        stats:
          $ref: "#/components/schemas/ApiStatsCounts"
      required:
        - request_status
        - stats

    DatasetBuildResponse:
      type: object
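      description: |
        Response to a dataset build request. It can carry two different run keys, which are polled on different endpoints. A sketch with placeholder values (none of these strings are real identifiers):

        ```yaml
        request_status: success
        message: "<human-readable message>"
        # Dataset build: use with /dataset_build_status and /dataset_build_data.
        run_key: "<dataset-build-run-key>"
        # Present only when auto_run_enrichment is true: use with /run_status and /run_data.
        auto_run_enrichment_run_key: "<enrichment-run-key>"
        ```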
      properties:
        request_status:
          type: string
          enum: [success]
        message:
          type: string
        run_key:
          type: string
          description: Unique identifier for this dataset build (use with /dataset_build_status and /dataset_build_data)
        max_items:
          type: integer
        app_url:
          type: string
          format: uri
        auto_run_enrichment_run_key:
          type: string
          description: Returned when auto_run_enrichment is true. Use with /run_status and /run_data to poll the enrichment run.
      required:
        - request_status
        - message
        - run_key

    DatasetStatusResponse:
      type: object
      properties:
        request_status:
          type: string
          enum: [success]
        message:
          type: string
        run_key:
          type: string
        state:
          type: string
          description: Current build state
        prompt:
          type: string
        identifiers:
          type: [array, "null"]
          items:
            type: string
        qualifiers:
          type: [array, "null"]
          items:
            type: string
        attributes:
          type: [array, "null"]
          items:
            type: string
        max_items:
          type: integer
        has_result:
          type: boolean
        error:
          type: [string, "null"]
        started_at:
          type: [string, "null"]
          format: date-time
        completed_at:
          type: [string, "null"]
          format: date-time
        created_at:
          type: string
          format: date-time
        credits_charged:
          type: [number, "null"]
        credits_refunded:
          type: [number, "null"]
        app_url:
          type: string
          format: uri
        enrichment_uuid:
          type: string
          format: uuid
          description: Present only if an enrichment has been created from this dataset
      required:
        - request_status
        - message
        - run_key
        - state
        - has_result
        - created_at

    DatasetDataResponse:
      type: object
      properties:
        request_status:
          type: string
          enum: [success]
        message:
          type: string
        run_key:
          type: string
        state:
          type: string
        has_result:
          type: boolean
        formatted_data:
          description: The generated data in columnar format (keys are column names, values are arrays of {value} objects). Omitted or null if no result yet.
          $ref: "#/components/schemas/EnrichmentFormattedData"
        row_count:
          type: integer
          description: Number of rows in the result
        credits_charged:
          type: [number, "null"]
        credits_refunded:
          type: [number, "null"]
        app_url:
          type: string
          format: uri
        enrichment_uuid:
          type: string
          format: uuid
          description: Present only if an enrichment has been created from this dataset
      required:
        - request_status
        - message
        - run_key
        - state
        - has_result
        - row_count

    Error:
      type: object
      properties:
        request_status:
          type: string
          enum: [error]
          description: Status of the request
        message:
          type: string
          description: Human-readable error message
        errors:
          type: array
          items:
            type: string
          description: List of specific error details
        error_type:
          type: string
          enum:
            [
              bad_request,
              validation,
              not_found,
              unauthorized,
              forbidden,
              conflict,
              server_error,
              service_unavailable,
            ]
          description: Type of error that occurred
      required:
        - request_status
        - message
        - error_type

  responses:
    BadRequest:
      description: Bad request - invalid input or missing required parameters
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/Error"
          example:
            request_status: error
            message: "Input validation failed - data format does not meet requirements"
            errors:
              [
                "All arrays must be the same length. Found different lengths: Company Name: 2, Website: 1",
              ]
            error_type: validation

    Unauthorized:
      description: Unauthorized - invalid or missing API key
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/Error"
          example:
            request_status: error
            message: "Invalid API key"
            error_type: unauthorized

    Forbidden:
      description: Forbidden - API access not enabled for account
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/Error"
          example:
            request_status: error
            message: "API access not enabled for this account"
            error_type: forbidden

    NotFound:
      description: Resource not found
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/Error"
          examples:
            enrichment_not_found:
              value:
                request_status: error
                message: "Enrichment not found"
                error_type: not_found
            run_not_found:
              value:
                request_status: error
                message: "API run not found"
                run_key: "some-run-key"
                error_type: not_found

    Conflict:
      description: Conflict - resource already exists or conflicting state
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/Error"
          example:
            request_status: error
            message: "Run key already exists for this enrichment"
            error_type: conflict

    UnprocessableEntity:
      description: Unprocessable entity - validation failed
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/Error"
          example:
            request_status: error
            message: "Enrichment validation failed"
            errors: ["Enrichment must have at least one source data column"]
            error_type: validation

    ServiceUnavailable:
      description: Service unavailable - API access is disabled
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/Error"
          example:
            request_status: error
            message: "API access is currently disabled"
            error_type: service_unavailable

tags:
  - name: Enrichments
    description: Create, configure, update, and run enrichments via the API
  - name: Runs
    description: Check status, retrieve data, and stop in-progress API runs
  - name: Monitors
    description: Create and manage monitors that run your enrichments on a schedule
  - name: Dataset Builder
    description: Build datasets from natural-language prompts or structured specs, then optionally create and run enrichments from the results
  - name: Tools
    description: Standalone tools for web scraping and data extraction
  - name: Account
    description: Account information and management