Captioner

matmmextract.inference.captioner_gemini

Generate per-panel sub-captions using Gemini, from figure captions and reference sentences extracted during the XML extraction step.

Input

A CSV with columns: downloaded_image_name, caption, reference_sentences, download_status.
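
As an illustration, a minimal input file with these four columns could be built with Python's csv module. This is only a sketch: the image names, captions, and the status strings "success" and "failed" are placeholder assumptions, since the values the download step actually writes to download_status are not specified here.

```python
import csv
import io

# Column names come from the description above; every value below is made up.
FIELDNAMES = [
    "downloaded_image_name",
    "caption",
    "reference_sentences",
    "download_status",
]


def build_example_csv() -> str:
    """Return the text of a two-row example CSV with the expected columns."""
    rows = [
        {
            "downloaded_image_name": "paper1_fig2",
            "caption": "Figure 2. (a) SEM image of the film. (b) TEM image.",
            "reference_sentences": "The film morphology is shown in Figure 2a.",
            "download_status": "success",  # assumed status string
        },
        {
            "downloaded_image_name": "paper1_fig3",
            "caption": "Figure 3. Cross-section of the device.",
            "reference_sentences": "",
            "download_status": "failed",  # assumed status string
        },
    ]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = build_example_csv()
print(csv_text.splitlines()[0])  # header row
```

Writing through csv.DictWriter rather than joining strings by hand keeps captions that contain commas or quotes correctly escaped.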

Output

One JSON file per processed row, written to output_dir and named <downloaded_image_name>.json, with the schema:

{
    "panels": [
        {
            "panel":                    "a",
            "visualization_category":  "Microscopy",
            "visualization_subtype":   "SEM",
            "subcaption":              "...",
            "summary":                 "..."
        },
        ...
    ]
}
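
A file with this schema can be read back with the standard json module. The sketch below parses an in-memory example and collects the panel labels and a count per visualization category. The field values in the example are invented for illustration, and summarize_panels is a hypothetical helper, not part of the package.

```python
import json
from collections import Counter

# Example document following the schema above; the field values are invented.
EXAMPLE_JSON = """
{
    "panels": [
        {
            "panel": "a",
            "visualization_category": "Microscopy",
            "visualization_subtype": "SEM",
            "subcaption": "SEM image of the film surface.",
            "summary": "Shows the surface morphology."
        },
        {
            "panel": "b",
            "visualization_category": "Microscopy",
            "visualization_subtype": "TEM",
            "subcaption": "TEM image of a single particle.",
            "summary": "Shows the particle shape."
        }
    ]
}
"""


def summarize_panels(text: str) -> dict:
    """Return the panel labels and a per-category count for one output file."""
    data = json.loads(text)
    panels = data["panels"]
    return {
        "labels": [p["panel"] for p in panels],
        "categories": Counter(p["visualization_category"] for p in panels),
    }


summary = summarize_panels(EXAMPLE_JSON)
print(summary["labels"], dict(summary["categories"]))
```

For a file on disk, the same helper could be fed the result of Path(...).read_text() for a given <downloaded_image_name>.json.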

class CaptionResult(n_total: int = 0, n_success: int = 0, n_error: int = 0, n_skipped: int = 0, output_dir: str = '')
n_error: int = 0
n_skipped: int = 0
n_success: int = 0
n_total: int = 0
output_dir: str = ''
captioner(csv_path: str | Path, output_dir: str | Path, api_key: str | None = None, model_name: str = 'gemini-3.1-flash-lite', max_tokens: int = 4096, max_retries: int = 4, overwrite: bool = False, requests_per_minute: int | None = None, verbose: bool = True) → CaptionResult

Generate sub-captions for every successfully downloaded figure.

Parameters:
  • csv_path – Figure CSV with columns downloaded_image_name, caption, reference_sentences, download_status.

  • output_dir – Directory where one JSON per figure is written.

  • api_key – Google Gemini API key. Falls back to the GOOGLE_API_KEY environment variable.

  • model_name – Gemini model string.

  • max_tokens – Max output tokens per request.

  • max_retries – Retry attempts on API error.

  • overwrite – Re-generate even if the output JSON already exists.

  • requests_per_minute – If set, throttle API calls to this rate by sleeping between requests. If None (default), no extra throttling beyond retry back-off.

  • verbose – Print progress.
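
The throttling that requests_per_minute describes can be pictured as enforcing a minimum gap of 60 / requests_per_minute seconds between consecutive calls. The following is a minimal sketch of that idea, not the package's actual implementation. RateLimiter and its clock and sleep arguments are hypothetical names introduced here so the behavior can be checked without real waiting.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Space calls at least 60 / requests_per_minute seconds apart."""

    def __init__(
        self,
        requests_per_minute: Optional[int],
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        # None disables throttling, mirroring the documented default.
        self.min_interval = (
            60.0 / requests_per_minute if requests_per_minute else 0.0
        )
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        slept = 0.0
        now = self._clock()
        if self._last_call is not None and self.min_interval > 0:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        # Record the time after any sleep so the next gap is measured from here.
        self._last_call = self._clock()
        return slept
```

With requests_per_minute=30, calling wait() before each request would leave at least two seconds between requests, while requests_per_minute=None never sleeps.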

Return type:

CaptionResult
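
A call might look like the sketch below, which uses only the parameters and CaptionResult fields documented above. The wrapper caption_figures, the chosen rate of 30 requests per minute, and the paths in the usage note are illustrative assumptions. The import sits inside the function so the snippet can be loaded without the package installed.

```python
from pathlib import Path


def caption_figures(csv_path: str, output_dir: str) -> str:
    """Run the captioner on one CSV and return a one-line summary."""
    # Imported lazily so this sketch loads even when the package is absent.
    from matmmextract.inference.captioner_gemini import captioner

    result = captioner(
        csv_path=Path(csv_path),
        output_dir=Path(output_dir),
        # api_key is omitted, so GOOGLE_API_KEY must be set in the environment.
        max_retries=4,
        overwrite=False,          # keep JSON files that already exist
        requests_per_minute=30,   # arbitrary example rate
        verbose=False,
    )
    return (
        f"{result.n_success}/{result.n_total} captioned, "
        f"{result.n_error} errors, {result.n_skipped} skipped "
        f"-> {result.output_dir}"
    )
```

For example, caption_figures("figures.csv", "subcaptions") would process a hypothetical figures.csv and report the counts from the returned CaptionResult.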