Azure Captioner

matmmextract.inference.captioner_azure

Generate per-panel sub-captions using Azure-hosted models via the OpenAI-compatible API (Mistral, Llama, GPT, etc.).

Differences from captioner_gemini (Gemini):

  • Uses openai.OpenAI with a custom base_url pointing at Azure

  • Response format is a JSON schema dict (OpenAI structured outputs)

  • Column names differ: image_name and reference instead of downloaded_image_name and reference_sentences

  • Retries on "error" keys in already-written JSON files
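The structured-output request format can be pictured with a small sketch. It assumes the OpenAI-style response_format of type "json_schema"; the schema name "panel_captions" and the constants PANEL_FIELDS, PANEL_SCHEMA and RESPONSE_FORMAT are hypothetical, and the schema actually used by the module may differ. The field names are taken from the documented output JSON.

```python
# Hypothetical sketch of a structured-output request format.
# Field names mirror the documented output JSON; everything else is assumed.
PANEL_FIELDS = [
    "panel",
    "visualization_category",
    "visualization_subtype",
    "subcaption",
    "summary",
]

# One entry of the "panels" array: every field is a required string.
PANEL_SCHEMA = {
    "type": "object",
    "properties": {name: {"type": "string"} for name in PANEL_FIELDS},
    "required": PANEL_FIELDS,
    "additionalProperties": False,
}

# OpenAI-style response_format wrapper around the JSON schema.
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "name": "panel_captions",  # hypothetical schema name
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "panels": {"type": "array", "items": PANEL_SCHEMA},
            },
            "required": ["panels"],
            "additionalProperties": False,
        },
    },
}
```

A dict of this shape would typically be passed as the response_format argument of a chat completion request.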

Input CSV columns required

image_name, caption, reference
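As an illustration of the expected input, the sketch below writes a one-row CSV with the three required columns and checks that a file contains them. The helper names (write_example_csv, missing_columns) and the row values are made up for the example and are not part of the module.

```python
import csv
import tempfile
from pathlib import Path

# Column names as listed in this section.
REQUIRED_COLUMNS = ["image_name", "caption", "reference"]


def write_example_csv(path):
    """Write a one-row CSV with the required columns (values are invented)."""
    rows = [
        {
            "image_name": "fig1.png",
            "caption": "(a) SEM image of the sample. (b) XRD pattern.",
            "reference": "Figure 1 shows the morphology and crystal structure.",
        }
    ]
    with Path(path).open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=REQUIRED_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


def missing_columns(path, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    with Path(path).open(newline="", encoding="utf-8") as fh:
        header = next(csv.reader(fh), [])
    return [name for name in required if name not in header]


csv_path = Path(tempfile.mkdtemp()) / "figures.csv"
write_example_csv(csv_path)
```

Running missing_columns on a CSV before captioning is a cheap way to catch a header mismatch early.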

Output

One JSON file per row in output_dir, named <image_name>.json:

{
    "panels": [
        {
            "panel":                   "a",
            "visualization_category":  "Microscopy",
            "visualization_subtype":   "SEM",
            "subcaption":              "...",
            "summary":                 "..."
        },
        ...
    ]
}
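A minimal sketch of reading one of these output files back follows. It assumes that a failed row is recorded as a top-level "error" key in the JSON; the helper name load_panels and the demo file contents are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_panels(json_path):
    """Return the list of panel dicts, or None if the file records an error.

    Assumption: failures are stored as a top-level "error" key.
    """
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    if "error" in data:
        return None
    return data.get("panels", [])


# Demo files with invented content, written to a temporary directory.
tmp_dir = Path(tempfile.mkdtemp())

ok_file = tmp_dir / "fig1.json"
ok_file.write_text(
    json.dumps(
        {
            "panels": [
                {
                    "panel": "a",
                    "visualization_category": "Microscopy",
                    "visualization_subtype": "SEM",
                    "subcaption": "example subcaption",
                    "summary": "example summary",
                }
            ]
        }
    ),
    encoding="utf-8",
)

err_file = tmp_dir / "fig2.json"
err_file.write_text(json.dumps({"error": "example failure"}), encoding="utf-8")

for panel in load_panels(ok_file):
    print(panel["panel"], panel["visualization_subtype"])
```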
class CaptionResult(n_total: int = 0, n_success: int = 0, n_error: int = 0, n_skipped: int = 0, output_dir: str = '')
n_error: int = 0
n_skipped: int = 0
n_success: int = 0
n_total: int = 0
output_dir: str = ''
captioner(csv_path: str | Path, output_dir: str | Path, api_key: str | None = None, azure_endpoint: str = '', model_name: str = 'Mistral-Large-3', max_tokens: int = 4096, max_retries: int = 4, overwrite: bool = False, image_name_col: str = 'downloaded_image_name', caption_col: str = 'caption', reference_col: str = 'reference_sentences', requests_per_minute: int | None = None, verbose: bool = True) -> CaptionResult

Generate sub-captions for every row in csv_path using an Azure model.

Parameters:
  • csv_path – CSV with columns image_name, caption, reference (column names overridable via the *_col parameters).

  • output_dir – Directory where one JSON per figure is written.

  • api_key – Azure API key. Falls back to the AZURE_API_KEY environment variable.

  • azure_endpoint – Azure OpenAI-compatible endpoint URL.

  • model_name – Model deployment name (e.g. "Mistral-Large-3", "gpt-4o").

  • max_tokens – Maximum number of output tokens. Sent as max_completion_tokens for OpenAI models and as max_tokens for Mistral/Llama.

  • max_retries – Number of retry attempts when the API call fails or when an existing output JSON contains an "error" key.

  • overwrite – Re-generate even if output JSON already exists and has no error.

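  • image_name_col / caption_col / reference_col – Column name overrides for non-standard CSVs. Note that the signature lists the defaults as 'downloaded_image_name', 'caption' and 'reference_sentences', so a CSV that uses image_name and reference needs image_name_col and reference_col set accordingly.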

  • requests_per_minute – If set, throttle API calls to this rate by sleeping between requests. If None (default), no extra throttling beyond retry back-off.

  • verbose – Print progress.
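The behaviour described by max_tokens, overwrite, max_retries and requests_per_minute can be sketched with three small helpers. This is not the module's implementation: the helper names are hypothetical, recognising OpenAI models by a "gpt" name prefix is an assumption, and so is treating a top-level "error" key (or an unreadable file) as a reason to regenerate.

```python
import json
import tempfile
from pathlib import Path


def token_kwargs(model_name, max_tokens):
    """Pick the token-limit keyword for a request.

    Assumption: OpenAI models are recognised by a "gpt" name prefix.
    """
    if model_name.lower().startswith("gpt"):
        return {"max_completion_tokens": max_tokens}
    return {"max_tokens": max_tokens}


def min_interval_seconds(requests_per_minute):
    """Seconds to wait between requests; 0.0 means no extra throttling."""
    if requests_per_minute is None:
        return 0.0
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return 60.0 / requests_per_minute


def should_generate(json_path, overwrite=False):
    """Decide whether a row needs a (new) API call."""
    path = Path(json_path)
    if overwrite or not path.exists():
        return True
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return True  # assumption: unreadable output is treated like an error
    # Regenerate only when the existing file records an error.
    return isinstance(data, dict) and "error" in data


# Demo with invented files in a temporary directory.
demo_dir = Path(tempfile.mkdtemp())
done_file = demo_dir / "done.json"
done_file.write_text(json.dumps({"panels": []}), encoding="utf-8")
failed_file = demo_dir / "failed.json"
failed_file.write_text(json.dumps({"error": "example failure"}), encoding="utf-8")

print(token_kwargs("Mistral-Large-3", 4096))
print(min_interval_seconds(30))
print(should_generate(done_file), should_generate(failed_file))
```

A caller would sleep for min_interval_seconds(...) between requests and skip any row for which should_generate(...) is false.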

Return type:

CaptionResult