Ask your ISP, not danbooru.
Also ask the utorrent forums, not danbooru.
May I ask for a seed please? Torrents are dead.
http://dedicated.yosome.org/torrents/0-3_07June2009_update.torrent
http://dedicated.yosome.org/torrents/4-7_07June2009_update.torrent
http://dedicated.yosome.org/torrents/8-b_07June2009_update.torrent
http://dedicated.yosome.org/torrents/c-f_07June2009_update.torrent
To bump this thread: I've made a new torrent of Danbooru with images up to 30 November 2017 and the BigQuery JSON metadata. (2.9m images, 1.9tb, 50m tags.) This is meant primarily for machine learning, but hopefully it'll be useful for whatever else people might want.
I consider it alpha because there may be some problems with the SQL version we made from the JSON metadata, but if anyone would like to test it out before a formal release and get a jump on the bulk of the downloading, you can get the 10 .torrent files from https://mega.nz/#!QHIxCA5D!Izw5vxZo11upkHNITvuKZ3mmjf8BhabJOwdPMYF5kD4
Let me know about any problems with using the data or client incompatibilities.
@gwern-bot said:
To bump this thread: I've made a new torrent of Danbooru with images up to 30 November 2017 and the BigQuery JSON metadata. (2.9m images, 1.9tb, 50m tags.) This is meant primarily for machine learning, but hopefully it'll be useful for whatever else people might want.
I consider it alpha because there may be some problems with the SQL version we made from the JSON metadata, but if anyone would like to test it out before a formal release and get a jump on the bulk of the downloading, you can get the 10 .torrent files from https://mega.nz/#!QHIxCA5D!Izw5vxZo11upkHNITvuKZ3mmjf8BhabJOwdPMYF5kD4
Let me know about any problems with using the data or client incompatibilities.
Interesting, I hope it's useful for your purposes. You might consider excluding posts with some of the tags listed in help:third-party edit, or posts marked as status:deleted.
Posts are often deleted because of a third-party problem, but approvers just mark them as "poor quality" or "breaks rules", or the real reason only shows up in the flag reason. Those "poor quality"/"breaks rules" messages only stay up for a month or two before the log is removed.
Also, did you download the tags for the images? Any project would benefit from having them, along with a way to add and remove tags in step with the site.
Additional filtering might not be a bad idea. ('waifu2x' also for the reasons I outline in my comments in that thread.) I haven't done it by default, though - that would've complicated the download and if it turns out to be an issue, people can do it easily in SQL queries.
And yes, the tags are included: they're present as the JSON export of the BigQuery mirror of the Danbooru database, and then converted to a (hopefully more convenient) Sqlite3 database. So a developer could use a little commandline tool to simultaneously edit the local SQL and push the same change to the live Danbooru website with appropriate POSTs in the API.
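For anyone wanting to try the SQL-side filtering mentioned above, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (posts, post_tags, is_deleted, tag) are assumptions made up for illustration and almost certainly do not match the real Danbooru2017 database, so adjust them to the actual schema.

```python
import sqlite3

# HYPOTHETICAL schema for illustration only: the real Danbooru2017
# SQLite database may use different table and column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, is_deleted INTEGER NOT NULL);
CREATE TABLE post_tags (post_id INTEGER NOT NULL, tag TEXT NOT NULL);
""")
conn.executemany("INSERT INTO posts VALUES (?, ?)",
                 [(1, 0), (2, 1), (3, 0), (4, 0)])
conn.executemany("INSERT INTO post_tags VALUES (?, ?)",
                 [(1, "1girl"), (2, "1girl"), (3, "waifu2x"),
                  (3, "solo"), (4, "solo")])

# Tags to rule out; extend with whatever else you want to exclude.
EXCLUDED_TAGS = ("waifu2x", "third-party_edit")


def clean_post_ids(conn, excluded_tags):
    """Return IDs of non-deleted posts carrying none of the excluded tags."""
    placeholders = ",".join("?" for _ in excluded_tags)
    query = f"""
        SELECT p.id
        FROM posts AS p
        WHERE p.is_deleted = 0
          AND NOT EXISTS (
              SELECT 1 FROM post_tags AS t
              WHERE t.post_id = p.id AND t.tag IN ({placeholders})
          )
        ORDER BY p.id
    """
    return [row[0] for row in conn.execute(query, tuple(excluded_tags))]


print(clean_post_ids(conn, EXCLUDED_TAGS))
```

With the toy rows above, post 2 is dropped for being deleted and post 3 for carrying 'waifu2x'.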
To update this: the JSON we used turned out to be the wrong BQ dump (oops) and was badly out of date. We pulled the correct up-to-date one but the schema changed and is substantially more complex so the SQL conversion script is broken ATM. We did, however, dump the December images while we were messing with it, so now it's 2.94m images with 77.5m tags. This should be an official/final version of it. I hope it's useful.
Writeup: https://www.gwern.net/Danbooru2017
The new torrents: https://www.gwern.net/docs/anime/danbooru2017-torrent.tar.xz
Also added a SFW subset of images downscaled to 512x512px (241GB): apparently for a lot of ML folks, the full 1.9TB size and the NSFW images were deal-breakers. (I was hoping to not have to offer multiple versions and get a single swarm as big as possible, but oh well.) The torrent for that is available at https://www.gwern.net/docs/anime/danbooru2017-sfw512px-torrent.tar.xz
Yes, no cropping and aspect ratio is preserved; the empty space caused by shrinking the biggest dimension to 512px is filled in with a black background (since JPG doesn't do transparency). The exact ImageMagick call goes (from 'rescale_image.sh'):
convert -resize 512x512\> -extent 512x512\> -limit thread 1 -gravity center -background black "$@" ./512px/$BUCKET/$ID.jpg
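To make the geometry concrete, here is a rough Python sketch of the arithmetic described above: shrink only if the image is larger than the box, keep the aspect ratio, then center the result on a 512x512 canvas. The function name fit_and_pad is made up for illustration, and the rounding is an approximation that may differ from ImageMagick's by a pixel.

```python
def fit_and_pad(width, height, box=512):
    """Approximate the shrink-to-fit-and-center geometry of the call above.

    Returns (new_width, new_height, x_offset, y_offset), where the offsets
    say where the resized image sits on the box x box canvas. The rest of
    the canvas is the fill color.
    """
    # Never enlarge: the scale factor is capped at 1.0 (the '>' flag).
    scale = min(box / width, box / height, 1.0)
    # Round to the nearest pixel; an assumption, not ImageMagick's exact rule.
    new_w = max(1, int(width * scale + 0.5))
    new_h = max(1, int(height * scale + 0.5))
    # Center the image on the canvas.
    off_x = (box - new_w) // 2
    off_y = (box - new_h) // 2
    return new_w, new_h, off_x, off_y


print(fit_and_pad(1024, 512))  # wide image: bars above and below
print(fit_and_pad(300, 200))   # small image: not enlarged, padded all around
```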
Sometimes artists create lineart on a transparent background, which renders as solid black if you use black as the fill color. See transparent_background lineart and solid black thumbnail for some examples of this (see also: issue #1239). White backgrounds, while not immune to this issue, have fewer problems.
Huh. Checking a few 512px examples locally, that is indeed what happened. That's unfortunate. I can't recall the 512px torrent because a lot of people are already on it. But it can't be *that* common an issue since I did browse several hundred images (checking to make sure I hadn't screwed up the SFW filtering) and didn't notice it... (A few hundred if 'transparent_background lineart' is reasonably comprehensive.) I guess for the next version I'll switch to white backgrounds even though it'll invalidate most of the files and I think it looks a little worse. Alternately, users of it could simply do an image quality check on average brightness or number of distinct colors?
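A minimal sketch of such a check, in plain Python: it takes an iterable of (R, G, B) tuples and flags images that are very dark on average or have very few distinct colors. The function names and both thresholds are arbitrary assumptions that would need tuning on real samples. Loading the pixels is left out; something like Pillow's Image.open(path).convert("RGB").getdata() should yield tuples in this shape.

```python
def image_stats(pixels):
    """Return (mean brightness on a 0-255 scale, number of distinct colors)."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("image has no pixels")
    mean = sum(r + g + b for r, g, b in pixels) / (3 * len(pixels))
    return mean, len(set(pixels))


def looks_blacked_out(pixels, max_brightness=20.0, min_colors=4):
    """Heuristic: flag images that are nearly all black or nearly one color.

    Both thresholds are guesses for illustration, not tuned values.
    """
    mean, distinct = image_stats(pixels)
    return mean < max_brightness or distinct < min_colors


# Toy data standing in for real images.
blacked_out = [(0, 0, 0)] * 100
ordinary = [(i * 2 % 256, 128, 255 - i) for i in range(100)]

print(looks_blacked_out(blacked_out))
print(looks_blacked_out(ordinary))
```

A check this crude will also flag legitimately dark or flat images, so it is probably best used to build a list for manual review rather than to delete files outright.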
For users having problems with the mega-torrents, I've added a public rsync server which should be more reliable: https://www.gwern.net/Danbooru2017#rsync
An update on uses of Danbooru2017:
- the drawing-colorizer style2paints (https://github.com/lllyasviel/style2paints) recommends its use for training the colorizing NN
- yu45020's "Text Segmentation and Image Inpainting" (https://github.com/yu45020/Text_Segmentation_Image_Inpainting) project, with the goal of automatically erasing text in manga/anime images for scanlation, uses it
- the thesis "Application of Generative Adversarial Network on Image Style Transformation and Image Processing" (https://cloudfront.escholarship.org/dist/prd/content/qt66w654x7/qt66w654x7.pdf), Wang 2018, uses it for anime<->face CycleGAN conversion; samples of photographic human faces turned into anime faces are... somewhat recognizable but not very good
- the paper "Improving Shape Deformation in Unsupervised Image-to-Image Translation" (https://arxiv.org/abs/1808.04325), Gokaslan et al 2018, does something similar but on more sets of images (eg face<->cat); the anime<->face conversions (pg11) are, IMO, much better than in Wang 2018
A particularly awesome use of Danbooru2017 is as part of the training corpus for style2paints V4: https://github.com/lllyasviel/style2paints The colorizing has gotten amazing. V3 paper: https://github.com/lllyasviel/style2paints/blob/master/papers/sa.pdf
Also possibly of some interest is deeppomf's "DeepCreamPy: Decensoring Hentai with Deep Neural Networks": https://github.com/deeppomf/DeepCreamPy
As this thread is getting long, I've announced the release of Danbooru2018 in a new thread: topic #15864