Sample code for scraping images with Python

Python code for downloading comics from a comic page
Explanation of the script
Summary

Python code for downloading comics from a comic page

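The following code uses the BeautifulSoup module to find the comic image element on the site "http://xkcd.com" and saves the image to a folder named "xkcd". After saving the image, it follows the page's "prev" link to the previous comic and repeats, stopping when that link is "#".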

import requests, os, bs4

url = 'http://xkcd.com'
os.makedirs('xkcd', exist_ok=True)

while not url.endswith('#'):
    # Download the page
    print('Downloading page {}...'.format(url))
    res = requests.get(url)
    res.raise_for_status()

    soup = bs4.BeautifulSoup(res.text, 'html.parser')

    # Find the URL of the comic image
    # (the img tag inside the element with id="comic")
    comic_elem = soup.select('#comic img')
    if not comic_elem:
        print('Could not find the comic image')
    else:
        comic_url = 'http:' + comic_elem[0].get('src')
        # Download the image
        print('Downloading image {}...'.format(comic_url))
        res = requests.get(comic_url)
        res.raise_for_status()

        # Save the image to ./xkcd; the with block closes the file even on error
        image_path = os.path.join('xkcd', os.path.basename(comic_url))
        with open(image_path, 'wb') as image_file:
            for chunk in res.iter_content(100000):
                image_file.write(chunk)

    # Get the URL of the prev button
    prev_link = soup.select('a[rel="prev"]')[0]
    url = 'http://xkcd.com' + prev_link.get('href')

print('Done')
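
The script above builds the image URL and the previous-page URL by string concatenation ('http:' + src and 'http://xkcd.com' + href). As one possible alternative, the standard library's urllib.parse.urljoin can resolve protocol-relative and root-relative links against the page URL. This is only a minimal sketch: the helper name next_page_url and the sample src and href values are assumptions for illustration, not values taken from the site.

```python
from urllib.parse import urljoin


def next_page_url(page_url, href):
    """Return the absolute URL for href, or None for the '#' placeholder."""
    # The script above stops when the prev link is '#', so treat that
    # value as "no previous page" instead of joining it.
    if href == '#':
        return None
    return urljoin(page_url, href)


# Hypothetical values shaped like the src and href attributes the script reads
page_url = 'http://xkcd.com'
image_url = urljoin(page_url, '//imgs.example.com/comics/sample.png')
prev_url = next_page_url(page_url, '/123/')

print(image_url)
print(prev_url)
```

With a helper like this, the loop could run while the returned URL is not None rather than checking whether the URL string ends with '#'.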