Springer Download
- build_candidate_urls(image_url: str, xml_file: str) → list[str]
Build ordered candidate URLs for a Springer image.
If image_url is already absolute, return it as the only candidate. Otherwise, build CDN candidates using the DOI reconstructed from xml_file.
- Parameters:
image_url – Raw URL string from the figure CSV.
xml_file – Source XML filename (used to reconstruct the DOI for relative URLs).
- Returns:
Ordered candidate URLs to try.
- Return type:
list[str]
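The absolute-versus-relative branching described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the CDN hosts in `CDN_BASES`, the helper `doi_from_xml_filename`, and the filename convention it assumes (`<prefix>_<suffix>.xml` mapping to the DOI `<prefix>/<suffix>`) are all hypothetical.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical CDN roots, listed in the order they should be tried.
CDN_BASES = [
    "https://cdn.example.org/full",
    "https://cdn.example.org/lw685",
]


def doi_from_xml_filename(xml_file: str) -> str:
    """Assumed convention: '10.1007_s123.xml' -> '10.1007/s123'."""
    stem = Path(xml_file).stem
    prefix, _, suffix = stem.partition("_")
    return f"{prefix}/{suffix}"


def sketch_candidate_urls(image_url: str, xml_file: str) -> list[str]:
    """Return the URL itself if absolute, else one candidate per CDN root."""
    parsed = urlparse(image_url)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return [image_url]
    doi = doi_from_xml_filename(xml_file)
    relative = image_url.lstrip("/")
    return [f"{base}/{doi}/{relative}" for base in CDN_BASES]
```

Returning a list in both branches lets the caller iterate over candidates without special-casing absolute URLs.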
- download_all(csv_path: str, output_dir: str = 'springer_images_flat', output_csv: str = 'springer_figure_details_with_images.csv', log_file: str = 'download_log_springer.csv', name_prefix: str = 'img', max_workers: int = 6, verbose: bool = True) → DataFrame
Download all Springer figure images referenced in csv_path.
- Parameters:
csv_path – Figure CSV (output of extract_all()).
output_dir – Directory to save downloaded images.
output_csv – Updated CSV written after each batch.
log_file – Per-URL download log for resume support.
name_prefix – Image filename prefix.
max_workers – Thread pool size.
verbose – Print progress.
- Returns:
Updated figure DataFrame with download status columns added.
- Return type:
pd.DataFrame
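The combination of a thread pool, ordered candidate URLs, and a log file for resuming could look roughly like the sketch below. It is an assumption-laden illustration rather than the behaviour of download_all itself: `sketch_download`, `try_candidates`, `load_done`, the log columns, and the injected `fetch` callable are all hypothetical, the log is keyed by figure id rather than by URL, and plain dicts stand in for the DataFrame.

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Optional
from urllib.parse import urlparse

LOG_FIELDS = ["figure_id", "url", "status"]  # hypothetical log layout


def load_done(log_file: str) -> set[str]:
    """Figure ids that an earlier run logged as 'ok'."""
    path = Path(log_file)
    if not path.exists():
        return set()
    with path.open(newline="") as fh:
        return {r["figure_id"] for r in csv.DictReader(fh) if r["status"] == "ok"}


def try_candidates(
    urls: list[str], fetch: Callable[[str], bytes]
) -> Optional[tuple[str, bytes]]:
    """Return (url, payload) for the first candidate that fetches, else None."""
    for url in urls:
        try:
            return url, fetch(url)
        except OSError:  # urllib.error.URLError is a subclass of OSError
            continue
    return None


def sketch_download(
    jobs: dict[str, list[str]],
    fetch: Callable[[str], bytes],
    output_dir: str,
    log_file: str,
    name_prefix: str = "img",
    max_workers: int = 6,
) -> dict[str, str]:
    """Download each figure once; figures already logged as 'ok' are skipped."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    done = load_done(log_file)
    statuses = {fid: "skipped" for fid in jobs if fid in done}
    pending = {fid: urls for fid, urls in jobs.items() if fid not in done}

    # Fetching runs in the pool; map() yields results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(
            pool.map(lambda urls: try_candidates(urls, fetch), pending.values())
        )

    # Files and log rows are written from the main thread only.
    write_header = not Path(log_file).exists()
    with open(log_file, "a", newline="") as fh:
        writer = csv.writer(fh)
        if write_header:
            writer.writerow(LOG_FIELDS)
        for fid, result in zip(pending, results):
            if result is None:
                statuses[fid] = "failed"
                writer.writerow([fid, "", "failed"])
                continue
            url, payload = result
            suffix = Path(urlparse(url).path).suffix or ".bin"
            (out / f"{name_prefix}_{fid}{suffix}").write_bytes(payload)
            statuses[fid] = "ok"
            writer.writerow([fid, url, "ok"])
    return statuses
```

Passing `fetch` in as a parameter keeps the sketch testable without network access; a real run might supply something like `lambda url: urllib.request.urlopen(url, timeout=30).read()`. Rerunning with the same `log_file` skips figures already marked `ok` and retries the failed ones.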