Danbooru

Tumblr sources (also bookmarklet-related)

Posted under General

RaisingK said:

Oops. The scan I put together didn't account for tumblr sample posts being replaced with something other than the 'raw' source, so some of the posts @Mikaeri did such replacements for ended up getting deleted a second time.

RIP. That's fine, I was sort of anticipating that happening anyway. @BrokenEagle98 might need to rerun his scripts on those in that case, along with any others it may have potentially happened to.

post #2753690
post #2746728
post #2763253

Will be appealing these later, or I might just bug another approver to get them easy approvals. Regarding my thoughts on this, see forum #133018.

evazion said:

In other words, there are cases where 1) changing to _raw doesn't work and 2) the _500 version is more complete than the _1280 version. Let's just be careful what we do until we sort all this out.

I don't know if it's known already, but the raw version can't be obtained for images from before a certain date in 2013.

Dredging this topic up since it feels like a good place to discuss a recent issue - the previous method for obtaining raw Tumblr images has been struck down by the heavens.

https://i.imgur.com/BNXtZ4i.png

^ This is the error message I (and presumably anyone else who doesn't work at Tumblr) get when attempting to access any raw Tumblr image via the data.tumblr.com URL. For reference, I was testing using this image:

https://78.media.tumblr.com/d2ffac558323321e789fa6e350a77be5/tumblr_pd7564v5gY1uarmx0o1_1280.png (still works)
https://data.tumblr.com/d2ffac558323321e789fa6e350a97be5/tumblr_pd7564v5gY1uarmx0o1_raw.png (fucking dead)

This change also appears to have broken the bookmarklet when uploading Tumblr posts, so that'll probably need fixing.

The _raw URLs are kill. Support, when contacted, expressly states "we no longer offer this feature". No further explanation was given; other people are coming up with reasons (bandwidth, storage, Patreon artists uploading full-size images unaware that they're not downsampled and throwing a fit afterwards).
There existed an a.tumblr.com endpoint for a short while, but it's gone as well.
A moment of silence for _raw images, I suppose.

Tumblr recently stopped doing lossless uploads, so I'd guess they wanted to stop huge raw images as well. (JPG images are now usually re-encoded, though better than Twitter does it; opaque PNGs become JPGs; transparent PNGs are left untouched.)

I used the bookmarklet to upload post #3286469 and ended up getting an md5 mismatch, so I'm wondering if there's a known issue here I'm not aware of, since I haven't been keeping track of Tumblr stuff (other than knowing they ditched "raw").

Apparently these two sources are different:
https://media.tumblr.com/f2148fe29dd93ecf67a8b7dfb64a1084/tumblr_pbtpb14aiT1re8kupo1_1280.jpg <- one I uploaded
https://66.media.tumblr.com/f2148fe29dd93ecf67a8b7dfb64a1084/tumblr_pbtpb14aiT1re8kupo1_1280.jpg <- one that is supposedly the "correct" source

The links look almost exactly the same except for the "66." at the start. Is there any difference in compression? I can't tell with my eyes if there is.

I'm seeing that the picture at the first link is 309 KB and that the one at the second link, with "66" at the start, is 321 KB. Putting both into some comparison software shows no visual difference between the two.

Someone else more familiar with compression might be able to help out here more than I can.
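For anyone wanting to check this kind of mismatch themselves, here is a minimal sketch of the comparison, assuming both files have already been downloaded. The helper names `md5_hex` and `same_file` are made up for illustration and are not part of Danbooru or the bookmarklet. Two files that look identical but differ in size (like the 309 KB and 321 KB ones above) will hash differently, which is all an md5 mismatch means.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of the raw file bytes."""
    return hashlib.md5(data).hexdigest()


def same_file(a: bytes, b: bytes) -> bool:
    """True only if both byte strings are bit-for-bit identical (same MD5)."""
    return md5_hex(a) == md5_hex(b)
```

The bytes could come from something like `pathlib.Path("a.jpg").read_bytes()`. A mismatch says nothing about which copy is better quality, only that the two files are not the same bytes.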

Just a PSA, but Tumblr is a terrible site to upload from. They usually have different images on all of their CDN nodes, and it's really impossible to get the one true image, especially now that they've blocked access to the RAW images.

BrokenEagle98 said:

Just a PSA, but Tumblr is a terrible site to upload from. They usually have different images on all of their CDN nodes, and it's really impossible to get the one true image, especially now that they've blocked access to the RAW images.

Yeah, I already realize it's a "when there's no better option" case at this point.

Just curious to have it clarified which links should be preferred. I was wondering if it should be replaced but if they're essentially identical images, then maybe not a big priority.

EB said:

Yeah, I already realize it's a "when there's no better option" case at this point.

Just curious to have it clarified which links should be preferred. I was wondering if it should be replaced but if they're essentially identical images, then maybe not a big priority.

The media.tumblr.com links are the preferable ones, and the ones that Danbooru uses.
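As a rough illustration of going from a numbered link to the plain form, here is a small sketch. The function name `to_plain_media_url` is hypothetical and this is not Danbooru's own code; it only rewrites the hostname and makes no promise that both hosts serve the same bytes (the md5 mismatch above suggests they may not).

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Matches numbered CDN hosts such as "66.media.tumblr.com" or "78.media.tumblr.com".
_NUMBERED_HOST = re.compile(r"^\d+\.media\.tumblr\.com$")


def to_plain_media_url(url: str) -> str:
    """Rewrite a numbered media host to media.tumblr.com; leave other URLs alone."""
    parts = urlsplit(url)
    if parts.hostname and _NUMBERED_HOST.match(parts.hostname):
        parts = parts._replace(netloc="media.tumblr.com")
    return urlunsplit(parts)
```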

FWIW, my script checks for matches from both the plain media links (as above) and the numbered media links (e.g. 66.media.tumblr.com). If it finds IQDB matches from both (which it will), it prefers the one with the highest SauceNAO score (which won't exist yet), and as a last resort it picks the first one with the highest image size (which will usually be a numbered link, since those come first).

What I can do though is to have it test just the plain media links first, so that when it does run into the above situation again, it'll use the plain media links as the suggested source.
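To make the selection order concrete, here is a minimal sketch of one reading of it. This is not the actual script: the `Candidate` fields, the `pick_source` name, and the meaning of `size` (bytes versus pixels is not stated above) are all assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Numbered CDN hosts such as "66.media.tumblr.com".
_NUMBERED = re.compile(r"^https?://\d+\.media\.tumblr\.com/")


@dataclass
class Candidate:
    url: str
    size: int  # assumed "image size"; unit is unspecified in the thread
    saucenao_score: Optional[float] = None  # None when no SauceNAO result exists yet


def pick_source(candidates: List[Candidate]) -> Optional[Candidate]:
    """Prefer plain media links, then SauceNAO score, then the first largest image."""
    if not candidates:
        return None
    # Proposed change: test the plain media links first and use them if any matched.
    plain = [c for c in candidates if not _NUMBERED.match(c.url)]
    pool = plain or candidates
    scored = [c for c in pool if c.saucenao_score is not None]
    if scored:
        return max(scored, key=lambda c: c.saucenao_score)
    # Last resort: max() returns the first candidate among equal sizes.
    return max(pool, key=lambda c: c.size)
```

With this ordering, a freshly uploaded post that has no SauceNAO score yet would get the plain media link suggested even when the numbered copy is slightly larger.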
