Captioner

matmmextract.inference.captioner_gemini

Generate per-panel sub-captions using Gemini, from figure captions and reference sentences extracted during the XML extraction step.

Input

A CSV with columns: downloaded_image_name, caption, reference_sentences, download_status.
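Before running the captioner it can help to confirm that the CSV header carries all four columns. The following is a minimal sketch using only the standard library; the helper name missing_columns is hypothetical and not part of matmmextract.

```python
import csv
from pathlib import Path

# Column names taken from the Input description above.
REQUIRED_COLUMNS = (
    "downloaded_image_name",
    "caption",
    "reference_sentences",
    "download_status",
)


def missing_columns(csv_path):
    """Return the required columns absent from the CSV header, in order."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        # An empty file yields an empty header, so every column is reported.
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]
```

An empty return value means the header is complete; the check says nothing about the values stored in each column.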

Output

One JSON file per row in output_dir, named <downloaded_image_name>.json, with schema:

{
    "panels": [
        {
            "panel":                    "a",
            "visualization_category":  "Microscopy",
            "visualization_subtype":   "SEM",
            "subcaption":              "...",
            "summary":                 "..."
        },
        ...
    ]
}
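A file with this schema can be read back with the standard json module. The sketch below maps each panel label to its sub-caption; the helper name load_subcaptions is hypothetical, and only the keys shown in the schema above are assumed to exist.

```python
import json
from pathlib import Path


def load_subcaptions(json_path):
    """Map each panel label (e.g. "a") to its sub-caption text."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    # "panels" is the top-level list shown in the schema above.
    return {panel["panel"]: panel["subcaption"] for panel in data.get("panels", [])}
```

The same pattern extends to the other keys, for example grouping panels by visualization_category.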

class CaptionResult(n_total: int = 0, n_success: int = 0, n_error: int = 0, n_skipped: int = 0, output_dir: str = '')

n_error: int = 0

n_skipped: int = 0

n_success: int = 0

n_total: int = 0

output_dir: str = ''

captioner(csv_path: str | Path, output_dir: str | Path, api_key: str | None = None, model_name: str = 'gemini-3.1-flash-lite', max_tokens: int = 4096, max_retries: int = 4, overwrite: bool = False, requests_per_minute: int | None = None, verbose: bool = True) → CaptionResult

Generate sub-captions for every successfully downloaded figure.

Parameters:

csv_path – Figure CSV with columns downloaded_image_name, caption, reference_sentences, download_status.
output_dir – Directory where one JSON per figure is written.
api_key – Google Gemini API key. If None, falls back to the GOOGLE_API_KEY environment variable.
model_name – Gemini model string.
max_tokens – Max output tokens per request.
max_retries – Retry attempts on API error.
overwrite – Re-generate even if the output JSON already exists.
requests_per_minute – If set, throttle API calls to this rate by sleeping between requests. If None (default), no extra throttling beyond retry back-off.
verbose – Print progress.
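The requests_per_minute behaviour described above (sleeping between requests to hold a target rate) can be illustrated as follows. This is a sketch of the general technique, not the library's actual implementation; the class RequestThrottle and its injectable clock and sleep arguments are hypothetical.

```python
import time


class RequestThrottle:
    """Space calls at least 60 / requests_per_minute seconds apart."""

    def __init__(self, requests_per_minute=None, clock=time.monotonic, sleep=time.sleep):
        # None disables throttling, mirroring the documented default.
        self.min_interval = 60.0 / requests_per_minute if requests_per_minute else 0.0
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Sleep just long enough to respect the minimum interval."""
        now = self._clock()
        if self._last_call is not None and self.min_interval > 0:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
```

Calling wait() before each API request keeps the rate at or below the configured value; with requests_per_minute=30, consecutive calls are spaced at least two seconds apart.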

Return type:

CaptionResult
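A typical call might look like the sketch below. The import path is inferred from the module name at the top of this page, the wrapper names caption_figures and summarize are hypothetical, and the rate of 30 requests per minute is only illustrative. Because api_key is omitted, the key is expected in the GOOGLE_API_KEY environment variable.

```python
def caption_figures(csv_path, output_dir):
    """Run the captioner on a figure CSV and return its CaptionResult."""
    # Imported lazily so this sketch loads even without the package installed.
    from matmmextract.inference.captioner_gemini import captioner

    return captioner(
        csv_path=csv_path,
        output_dir=output_dir,
        requests_per_minute=30,  # illustrative value, not a recommendation
        overwrite=False,         # keep any JSON files that already exist
    )


def summarize(result):
    """Format the CaptionResult counters as a one-line report."""
    return (
        f"{result.n_success}/{result.n_total} captioned, "
        f"{result.n_error} errors, {result.n_skipped} skipped -> {result.output_dir}"
    )
```

For example, print(summarize(caption_figures("figures.csv", "subcaptions"))) would report the counters for one run, where both paths are placeholders.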