Elsevier Download

build_candidate_urls(image_url: str, xml_file: str) → list[str]

Return candidate download URLs for an Elsevier image.

Elsevier image URLs in the figure CSV are absolute CDN URLs, so this is a direct passthrough. The xml_file argument is unused but kept for interface compatibility with the shared engine.
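The passthrough behaviour described above can be sketched as follows. This is a minimal illustration, not the library's actual source: it assumes the function returns a single-element list containing the input URL, which is the simplest reading of "direct passthrough" given the list[str] return type.

```python
def build_candidate_urls(image_url: str, xml_file: str) -> list[str]:
    """Sketch only: return the absolute CDN URL as the sole candidate."""
    # xml_file is deliberately ignored; it exists so this function can be
    # called the same way as other publishers' URL builders.
    # Assumption: the result is a one-element list with the URL unchanged.
    return [image_url]


# Hypothetical inputs, for illustration only.
candidates = build_candidate_urls("https://example.com/fig1.jpg", "article.xml")
```

Returning a list even for a single URL presumably lets the shared engine iterate over candidates uniformly, whether a publisher yields one URL or several.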

download_all(csv_path: str, output_dir: str = 'elsevier_contents', output_csv: str = 'elsevier_figures_with_images.csv', log_file: str = 'download_log_elsevier.csv', api_key: str | None = None, inst_token: str | None = None, name_prefix: str = 'img', max_workers: int = 4, verbose: bool = True) → DataFrame

Download all Elsevier figure images referenced in csv_path.

Parameters:
  • csv_path – Figure CSV (output of extract_all()).

  • output_dir – Directory to save downloaded images.

  • output_csv – Updated CSV written after each batch.

  • log_file – Per-URL download log for resume support.

  • api_key – Elsevier API key (falls back to ELSEVIER_API_KEY env var).

  • inst_token – Elsevier institutional token (falls back to ELSEVIER_INST_TOKEN).

  • name_prefix – Image filename prefix.

  • max_workers – Thread pool size.

  • verbose – Print progress.

Returns:

Updated figure DataFrame with download status columns added.

Return type:

pd.DataFrame