Captioner
matmmextract.inference.captioner_gemini
Generate per-panel sub-captions with Gemini from the figure captions and reference sentences extracted during the XML extraction step.
Input¶
A CSV with the columns downloaded_image_name, caption, reference_sentences, and download_status.
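It can help to confirm that an input CSV has these four columns before you run the captioner. The following minimal sketch uses only the standard library csv module. The helper name missing_columns is hypothetical and is not part of matmmextract.

```python
import csv
import io
from typing import List, TextIO

# Columns the captioner documentation lists as required input.
REQUIRED_COLUMNS = ("downloaded_image_name", "caption", "reference_sentences", "download_status")


def missing_columns(handle: TextIO) -> List[str]:
    """Return the required columns absent from the CSV header (hypothetical helper)."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []  # None for an empty file
    return [col for col in REQUIRED_COLUMNS if col not in header]


if __name__ == "__main__":
    sample = io.StringIO(
        "downloaded_image_name,caption,reference_sentences,download_status\n"
        "fig1.png,(a) SEM image of the film,Figure 1a shows the film,ok\n"
    )
    print(missing_columns(sample))  # []
```

The check only inspects the header. It says nothing about which download_status values the captioner treats as successful.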
Output¶
One JSON file per successfully downloaded figure (CSV row), written to output_dir and named <downloaded_image_name>.json, with the following schema:
{
  "panels": [
    {
      "panel": "a",
      "visualization_category": "Microscopy",
      "visualization_subtype": "SEM",
      "subcaption": "...",
      "summary": "..."
    },
    ...
  ]
}
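The following minimal sketch shows how a downstream script might locate, skip, and read these files. It assumes the file name is formed literally as <downloaded_image_name> followed by .json, so fig1.png becomes fig1.png.json. The helpers output_path_for, needs_generation, and read_panels are hypothetical. needs_generation only mirrors the documented overwrite behaviour and is not the library's own code.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Union


def output_path_for(output_dir: Union[str, Path], downloaded_image_name: str) -> Path:
    # Assumption: the documented name <downloaded_image_name>.json is taken literally.
    return Path(output_dir) / f"{downloaded_image_name}.json"


def needs_generation(path: Path, overwrite: bool = False) -> bool:
    # Mirrors the documented `overwrite` flag: keep existing outputs unless overwrite=True.
    return overwrite or not path.exists()


def read_panels(path: Path) -> List[dict]:
    # Load one output file and return its "panels" list (empty if the key is absent).
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    return data.get("panels", [])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = output_path_for(tmp, "fig1.png")
        print(needs_generation(path))  # True: nothing has been written yet
```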
- class CaptionResult(n_total: int = 0, n_success: int = 0, n_error: int = 0, n_skipped: int = 0, output_dir: str = '')
  - n_error: int = 0
  - n_skipped: int = 0
  - n_success: int = 0
  - n_total: int = 0
  - output_dir: str = ''
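CaptionResult is a plain container of counters. The sketch below shows one way to summarise those counters. It uses a stand-in dataclass with the same fields and defaults. CaptionResultSketch and success_rate are hypothetical names. Treating skipped figures as never attempted is an assumption about how n_skipped is counted. Because success_rate only reads attributes, it should also accept the real CaptionResult returned by captioner.

```python
from dataclasses import dataclass


@dataclass
class CaptionResultSketch:
    # Stand-in with the same fields and defaults as the documented CaptionResult.
    n_total: int = 0
    n_success: int = 0
    n_error: int = 0
    n_skipped: int = 0
    output_dir: str = ""


def success_rate(result) -> float:
    """Fraction of attempted figures that succeeded (hypothetical helper).

    Skipped figures are excluded on the assumption that no API call was made for them.
    Returns 0.0 when nothing was attempted.
    """
    attempted = result.n_success + result.n_error
    return result.n_success / attempted if attempted else 0.0
```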
- captioner(csv_path: str | Path, output_dir: str | Path, api_key: str | None = None, model_name: str = 'gemini-3.1-flash-lite', max_tokens: int = 4096, max_retries: int = 4, overwrite: bool = False, requests_per_minute: int | None = None, verbose: bool = True) → CaptionResult
Generate sub-captions for every successfully downloaded figure.
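The requests_per_minute and max_retries parameters of captioner describe two kinds of client-side flow control. The first spaces API calls to a fixed rate. The second retries failed calls with back-off. The following minimal sketch shows both under stated assumptions. The source only says "retry back-off", so exponential back-off with jitter is a swapped-in choice. max_retries is read as retries after the first attempt. MinIntervalThrottle and call_with_retries are hypothetical names, not matmmextract functions.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class MinIntervalThrottle:
    """Sleep so consecutive calls are at least 60 / requests_per_minute seconds apart."""

    def __init__(self, requests_per_minute: Optional[int],
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        # None (or 0) disables throttling, as documented for requests_per_minute=None.
        self.interval = 60.0 / requests_per_minute if requests_per_minute else 0.0
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        # Sleep only for the part of the interval not already elapsed since the last call.
        if self.interval and self._last is not None:
            remaining = self.interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


def call_with_retries(fn: Callable[[], T], max_retries: int = 4, base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying up to max_retries times with exponential back-off plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise RuntimeError("max_retries must be >= 0")  # only reached for negative values
```

The clock and sleep functions are injectable, so the timing logic can be exercised without real delays.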
- Parameters:
  csv_path – Figure CSV with columns downloaded_image_name, caption, reference_sentences, download_status.
  output_dir – Directory where one JSON per figure is written.
  api_key – Google Gemini API key. If not given, falls back to the GOOGLE_API_KEY environment variable.
  model_name – Gemini model identifier.
  max_tokens – Maximum output tokens per request.
  max_retries – Number of retry attempts on API error.
  overwrite – Regenerate even if the output JSON already exists.
  requests_per_minute – If set, throttle API calls to this rate by sleeping between requests. If None (default), no throttling is applied beyond the retry back-off.
  verbose – Print progress.
- Return type: