Hi everyone. I'm releasing the second edition of my Danbooru mirror/dataset, Danbooru2018:
https://www.gwern.net/Danbooru2018
This updates my previous release, Danbooru2017 (https://danbooru.donmai.us/forum_topics/8276?page=4), through 31 December 2018.
The dataset now contains ~2.5TB of data: 3.33m images with 92.7m tags. This includes all the original images plus the 'safe' subset downscaled to 512px (for easier use in machine learning applications) and the BigQuery mirror metadata (topic #12774) as JSON files. Compared to Danbooru2017, this adds 0.4TB/392k images/15.2m tags.
You can download it via two sets of torrents (the preferred method), or via an rsync mirror.
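For anyone who wants to poke at the JSON metadata before touching the images, here is a rough sketch of counting tags on 'safe' records. It assumes one JSON object per line, and the field names used here (`rating` with the value `s` for safe, and `tags` as a list of objects with a `name` key) are assumptions for illustration; check them against the actual files before relying on this.

```python
import io
import json
from collections import Counter


def count_safe_tags(lines):
    """Count tag names across records rated 's' (assumed to mean 'safe')."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # ASSUMPTION: the rating is stored under "rating" as a one-letter code.
        if record.get("rating") != "s":
            continue
        # ASSUMPTION: tags are a list of objects, each with a "name" key.
        for tag in record.get("tags", []):
            counts[tag["name"]] += 1
    return counts


# Made-up sample records, standing in for one of the metadata files.
sample = io.StringIO(
    '{"id": "1", "rating": "s", "tags": [{"name": "1girl"}, {"name": "solo"}]}\n'
    '{"id": "2", "rating": "e", "tags": [{"name": "1girl"}]}\n'
    '{"id": "3", "rating": "s", "tags": [{"name": "solo"}]}\n'
)
counts = count_safe_tags(sample)
print(counts.most_common(2))
```

To run it on a real file, pass an open file handle instead of the in-memory sample; the function reads line by line, so it should not need to hold the whole file in memory.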
Please let me know if you run into any problems like bad BitTorrent client versions.
An example use of the dataset is training https://github.com/lllyasviel/style2paints , which is an impressive neural net tool for colorizing anime images.