Azure Captioner

matmmextract.inference.captioner_azure

Generate per-panel sub-captions using Azure-hosted models via the OpenAI-compatible API (Mistral, Llama, GPT, etc.).

Differences from captioner_gemini (Gemini):

- Uses openai.OpenAI with a custom base_url pointing at Azure.
- The response format is a JSON schema dict (OpenAI structured outputs).
- Column names differ: image_name and reference instead of downloaded_image_name and reference_sentences.
- Retries rows whose already-written JSON file contains an "error" key.
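As a rough illustration of the "JSON schema dict" item above, the sketch below builds a structured-outputs response_format from the fields shown in the Output example. The schema name, the strict flag and the choice to make every field a required string are assumptions, not the module's actual schema. With the openai package (v1 or later), a dict like this would be passed as response_format= to client.chat.completions.create on a client created with openai.OpenAI(api_key=..., base_url=azure_endpoint).

```python
# Hypothetical reconstruction: property names are copied from the Output
# example; the schema name, "strict" and the required lists are assumptions.
PANEL_FIELDS = (
    "panel",
    "visualization_category",
    "visualization_subtype",
    "subcaption",
    "summary",
)


def build_response_format(name: str = "subcaptions") -> dict:
    """Return a Chat Completions response_format dict using a JSON schema."""
    panel_schema = {
        "type": "object",
        "properties": {field: {"type": "string"} for field in PANEL_FIELDS},
        # Strict structured outputs require every property to be listed here.
        "required": list(PANEL_FIELDS),
        "additionalProperties": False,
    }
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "panels": {"type": "array", "items": panel_schema},
                },
                "required": ["panels"],
                "additionalProperties": False,
            },
        },
    }
```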

Input CSV columns required

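image_name, caption, reference

These names can be changed with the image_name_col, caption_col and reference_col parameters of captioner(). Note that the signature defaults are downloaded_image_name and reference_sentences, so pass image_name_col="image_name" and reference_col="reference" explicitly for a CSV with the columns listed above.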
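A minimal sketch of the column check implied by these requirements, assuming rows are read with the standard csv module. load_caption_rows is a hypothetical helper, not part of matmmextract; it reads from a string so the example is self-contained (for a file, pass Path(csv_path).read_text(encoding="utf-8")).

```python
import csv
import io


def load_caption_rows(
    csv_text: str,
    image_name_col: str = "image_name",
    caption_col: str = "caption",
    reference_col: str = "reference",
) -> list[dict]:
    """Return rows with normalized keys, failing early if a column is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    required = [image_name_col, caption_col, reference_col]
    # fieldnames is read from the header row on first access.
    missing = [col for col in required if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")
    # Map whatever the CSV calls its columns onto the three canonical names.
    return [
        {
            "image_name": row[image_name_col],
            "caption": row[caption_col],
            "reference": row[reference_col],
        }
        for row in reader
    ]
```

The override arguments play the same role as the *_col parameters: a CSV with downloaded_image_name and reference_sentences columns is read by passing those names explicitly.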

Output

One JSON file per row in output_dir, named <image_name>.json:

{
    "panels": [
        {
            "panel":                   "a",
            "visualization_category":  "Microscopy",
            "visualization_subtype":   "SEM",
            "subcaption":              "...",
            "summary":                 "..."
        },
        ...
    ]
}
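To consume the output, each figure's JSON file can be loaded and its panels iterated. output_path and load_panels are hypothetical helpers; the file name is taken literally from the description above (whether an image extension is stripped first is not stated), and the assumption that a failed row is stored under a top-level "error" key comes from the retry behaviour described in this module, not from a documented error layout.

```python
import json
import tempfile
from pathlib import Path


def output_path(output_dir, image_name: str) -> Path:
    """Path of the JSON written for one figure: <output_dir>/<image_name>.json."""
    return Path(output_dir) / f"{image_name}.json"


def load_panels(output_dir, image_name: str) -> list:
    """Return the panel list for one figure, raising if the file records an error."""
    data = json.loads(output_path(output_dir, image_name).read_text(encoding="utf-8"))
    if "error" in data:
        # Assumption: failed rows carry a top-level "error" key.
        raise RuntimeError(f"{image_name}: {data['error']}")
    return data["panels"]


if __name__ == "__main__":
    # Round-trip a file shaped like the Output example.
    with tempfile.TemporaryDirectory() as tmp:
        sample = {"panels": [{"panel": "a", "visualization_category": "Microscopy",
                              "visualization_subtype": "SEM", "subcaption": "...",
                              "summary": "..."}]}
        output_path(tmp, "fig1").write_text(json.dumps(sample), encoding="utf-8")
        for panel in load_panels(tmp, "fig1"):
            print(panel["panel"], panel["visualization_subtype"])
```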

class CaptionResult(n_total: int = 0, n_success: int = 0, n_error: int = 0, n_skipped: int = 0, output_dir: str = '')

Summary returned by captioner(): row counts (total, succeeded, errored, skipped) and the output directory.

n_error: int = 0

n_skipped: int = 0

n_success: int = 0

n_total: int = 0

output_dir: str = ''
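The keyword fields with defaults suggest CaptionResult is a dataclass. The mirror below assumes so, and also assumes each row ends in exactly one of three outcomes, so that n_total = n_success + n_error + n_skipped; the module does not state this invariant. tally is a hypothetical helper, not part of matmmextract.

```python
from collections import Counter
from dataclasses import asdict, dataclass


@dataclass
class CaptionResult:
    # Mirrors the documented fields; being a dataclass is an assumption.
    n_total: int = 0
    n_success: int = 0
    n_error: int = 0
    n_skipped: int = 0
    output_dir: str = ""


def tally(statuses: list[str], output_dir: str) -> CaptionResult:
    """Build a result from per-row statuses: "success", "error" or "skipped"."""
    counts = Counter(statuses)  # missing statuses count as 0
    return CaptionResult(
        n_total=len(statuses),
        n_success=counts["success"],
        n_error=counts["error"],
        n_skipped=counts["skipped"],
        output_dir=output_dir,
    )
```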

captioner(csv_path: str | Path, output_dir: str | Path, api_key: str | None = None, azure_endpoint: str = '', model_name: str = 'Mistral-Large-3', max_tokens: int = 4096, max_retries: int = 4, overwrite: bool = False, image_name_col: str = 'downloaded_image_name', caption_col: str = 'caption', reference_col: str = 'reference_sentences', requests_per_minute: int | None = None, verbose: bool = True) → CaptionResult

Generate sub-captions for every row in csv_path using an Azure model.
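A hedged usage sketch. azure_endpoint must be your own deployment's OpenAI-compatible URL; api_key is omitted so that captioner falls back to the AZURE_API_KEY environment variable, and the requests_per_minute value is illustrative only. run_captioner is a hypothetical wrapper, and the import sits inside it so the snippet can be defined without matmmextract installed.

```python
def run_captioner(csv_path: str, output_dir: str, azure_endpoint: str):
    """Caption every row of csv_path and print a short summary."""
    # Imported lazily so this sketch can be defined without the package installed.
    from matmmextract.inference.captioner_azure import captioner

    result = captioner(
        csv_path,
        output_dir,
        azure_endpoint=azure_endpoint,  # api_key omitted: falls back to AZURE_API_KEY
        model_name="Mistral-Large-3",
        image_name_col="image_name",  # match the documented CSV columns
        reference_col="reference",
        requests_per_minute=30,  # illustrative rate, not a documented limit
    )
    print(f"{result.n_success}/{result.n_total} succeeded, "
          f"{result.n_error} errors, {result.n_skipped} skipped")
    return result
```

For example, run_captioner("figures.csv", "subcaptions", azure_endpoint=endpoint), where both paths are placeholders and endpoint holds your deployment URL.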

Parameters:

csv_path – CSV with columns image_name, caption, reference (column names overridable via the *_col parameters).
output_dir – Directory where one JSON per figure is written.
api_key – Azure API key. If not given, falls back to the AZURE_API_KEY environment variable.
azure_endpoint – Azure OpenAI-compatible endpoint URL.
model_name – Model deployment name (e.g. "Mistral-Large-3", "gpt-4o").
max_tokens – Maximum number of output tokens. Sent as max_completion_tokens for OpenAI models and as max_tokens for Mistral/Llama models.
max_retries – Maximum number of retry attempts when the API call fails or an existing output JSON contains an "error" key.
overwrite – Regenerate the output even if its JSON file already exists and contains no error.
image_name_col / caption_col / reference_col – Column name overrides for non-standard CSVs.
requests_per_minute – If set, throttle API calls to this rate by sleeping between requests. If None (default), no extra throttling beyond retry back-off.
verbose – Print progress.
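Two of the per-request parameters above can be sketched as small helpers: the api_key fallback and the choice between max_completion_tokens and max_tokens. Both functions are hypothetical. In particular, the prefix rule for recognizing an OpenAI model and raising ValueError when no key is found are assumptions; the module does not document how it decides either.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Use the explicit key if given, otherwise the AZURE_API_KEY environment variable."""
    key = api_key or os.environ.get("AZURE_API_KEY")
    if not key:
        # Assumption: the real module's behaviour without a key is not documented.
        raise ValueError("Pass api_key or set the AZURE_API_KEY environment variable")
    return key


# Hypothetical heuristic for spotting OpenAI models by deployment name.
OPENAI_PREFIXES = ("gpt", "o1", "o3", "o4")


def token_limit_kwargs(model_name: str, max_tokens: int) -> dict:
    """Pick the token-limit parameter name expected by the model family."""
    if model_name.lower().startswith(OPENAI_PREFIXES):
        return {"max_completion_tokens": max_tokens}
    return {"max_tokens": max_tokens}  # Mistral, Llama and other models
```

The returned dict can be unpacked into a chat completion call (**token_limit_kwargs(model_name, max_tokens)), which keeps the model-family branching in one place.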

Return type:

CaptionResult
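How overwrite, max_retries and requests_per_minute might interact for each row is sketched below. It assumes exponential back-off (the docs mention retry back-off but not its schedule) and treats max_retries as the total number of attempts. needs_generation, Throttle and call_with_retries are hypothetical names, and the clock and sleep parameters exist only so the timing can be tested without real waits.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional


def needs_generation(path: Path, overwrite: bool) -> bool:
    """True if the output is missing, records an "error" key, or overwrite is set."""
    if overwrite or not path.exists():
        return True
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return True  # assumption: an unreadable file is treated like a failed row
    return "error" in data


class Throttle:
    """Space calls at least 60 / requests_per_minute seconds apart (no-op if None)."""

    def __init__(self, requests_per_minute: Optional[int],
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.interval = 60.0 / requests_per_minute if requests_per_minute else 0.0
        self.clock = clock
        self.sleep = sleep
        self.last: Optional[float] = None

    def wait(self) -> None:
        if self.interval and self.last is not None:
            remaining = self.interval - (self.clock() - self.last)
            if remaining > 0:
                self.sleep(remaining)
        self.last = self.clock()


def call_with_retries(call: Callable[[], dict], max_retries: int,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Try `call` up to max_retries times with exponential back-off (1 s, 2 s, 4 s, ...)."""
    last_error = "no attempts made"
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:  # the real module may catch narrower exception types
            last_error = str(exc)
            if attempt < max_retries - 1:
                sleep(2 ** attempt)
    # Failure is returned as an {"error": ...} dict, like the error JSONs described above.
    return {"error": last_error}
```

A driver would count a row as skipped when needs_generation returns False; otherwise it would wrap the API request in a function that first calls throttle.wait(), pass that to call_with_retries, and write the returned dict (panels or an error) to <image_name>.json.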