Danbooru

Why do we frequently have multiples of the same picture?

Posted under General

This has been bugging me for some time. I'm talking about stuff like post #2044775 and its child. These two images are exactly the same apart from file size and source site. Why have both of them? If one of them were a revision of the other (and I've seen this before) then I could understand it, but these two appear to be the exact same picture.

I've even seen stuff where a post has a child that's the same image but only like, half the res. Why even keep the lower-res version if the objectively superior high-res version has been uploaded?

This came up in a different topic, but in this case you have a Twitter upload and a Pixiv upload.
To quote fossilnix from that topic:
"The trouble with the second proposal is that Twitter recompresses images. In the case of PNGs, it just adds junk data and throws off the MD5 matching; but for JPGs, there's usually noticeable artifacting.[...] when the image is uploaded somewhere that doesn't recompress, such as Pixiv."

The main points are: Twitter recompresses images, so Twitter uploads are going to have artifacts. Pixiv doesn't do that, and neither do DeviantArt, Tumblr, etc.
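To illustrate the PNG half of that quote, here is a minimal sketch of how extra data in a file changes its MD5 without touching the picture. The quote doesn't say what Twitter actually adds, so the extra `tEXt` comment chunk below is only a stand-in, and the helper names (`png_chunk`, `tiny_png`, `image_data`) are made up for this example.

```python
import hashlib
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(kind: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def tiny_png(extra_chunks=()) -> bytes:
    """Build a 1x1 red RGB PNG, optionally with extra ancillary chunks."""
    # IHDR: width, height, bit depth 8, color type 2 (RGB), then
    # compression, filter and interlace methods (all 0).
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)
    scanline = b"\x00" + bytes([255, 0, 0])  # filter type 0 + one RGB pixel
    parts = [PNG_SIGNATURE, png_chunk(b"IHDR", ihdr)]
    parts += [png_chunk(kind, data) for kind, data in extra_chunks]
    parts += [png_chunk(b"IDAT", zlib.compress(scanline)), png_chunk(b"IEND", b"")]
    return b"".join(parts)


def image_data(png: bytes) -> bytes:
    """Return the decompressed IDAT payload (the filtered scanlines)."""
    pos = len(PNG_SIGNATURE)
    compressed = b""
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8]
        if kind == b"IDAT":
            compressed += png[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return zlib.decompress(compressed)


original = tiny_png()
# Hypothetical "junk": a text comment chunk that does not affect the pixels.
padded = tiny_png(extra_chunks=[(b"tEXt", b"Comment\x00added by a re-host")])

print(hashlib.md5(original).hexdigest() == hashlib.md5(padded).hexdigest())
print(image_data(original) == image_data(padded))
```

The first comparison comes out false and the second true: the files hash differently even though the stored image data is the same, which is why MD5 matching alone can't catch this kind of duplicate.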

Tumblr actually resizes images, something Twitter mostly doesn't do. I usually upload images from Twitter when it's the only source that has the highest resolution while still having acceptable quality. And Danbooru has been keeping duplicates of images since the beginning. It just deletes duplicates that are clearly image samples from Pixiv and other sites. There was a mod who used to completely purge duplicates from the database, but he got called out on it years after he stopped visiting this website.

Chagen46 said:

I've even seen stuff where a post has a child that's the same image but only like, half the res. Why even keep the lower-res version if the objectively superior high-res version has been uploaded?

"Deleted" images aren't completely deleted, they're basically just more hidden from view. So, unless Albert (or whoever he pays to host Donmai) starts running out of hard drives, deleting duplicates wouldn't do much for the site itself.

That said, my not-actually-OCD does flare up and make me want to start up a purge once in a while, but I contain myself.

A note for OP: the parent and child of post #2044775 were uploaded at different times on the same day. If you look closely, the child post was actually uploaded earlier than the parent post.

This is a particular tendency that Japanese artists have. Some of them prefer to upload their new works to Twitter first, and upload them to Pixiv some time later. Basically, their priorities are Twitter > Pixiv. Not only that, sometimes the artist won't upload the new work to Pixiv at all, partly for personal reasons, even though they've already uploaded it to Twitter. This uncertainty is what leads to many duplicate parent-child posts, because we don't know when, or whether, the artist will upload the work to Pixiv. Basically, for a Danbooru user it's better to upload first and ask questions later.

Yes, I'm trying to justify why there are so many duplicate parent-child posts. In my opinion, if they were uploaded at different times and the Twitter version is earlier than the Pixiv version, then it's still within the acceptable range. However, it's a different case if the Twitter version was uploaded after the Pixiv version (unless the Twitter version has something distinctive compared with the Pixiv version, e.g. an added background, a rendered version, fixed anatomy, etc.).

Sacriven said:

A note for OP: the parent and child of post #2044775 were uploaded at different times on the same day. If you look closely, the child post was actually uploaded earlier than the parent post.

This is a particular tendency that Japanese artists have. Some of them prefer to upload their new works to Twitter first, and upload them to Pixiv some time later. Basically, their priorities are Twitter > Pixiv. Not only that, sometimes the artist won't upload the new work to Pixiv at all, partly for personal reasons, even though they've already uploaded it to Twitter. This uncertainty is what leads to many duplicate parent-child posts, because we don't know when, or whether, the artist will upload the work to Pixiv. Basically, for a Danbooru user it's better to upload first and ask questions later.

Yes, I'm trying to justify why there are so many duplicate parent-child posts. In my opinion, if they were uploaded at different times and the Twitter version is earlier than the Pixiv version, then it's still within the acceptable range.

Right. That's also the reason why Twitter uploads should get approved (when the Pixiv upload would also be up for approval and get approved by someone) and shouldn't get deleted when the Pixiv upload comes in later. It's verifiable which post was uploaded earlier.
But that's a different topic :3.

Well, sometimes there are changes that not everyone would notice.

To give an example, I remember uploading post #2031348 when there was a very similar version already uploaded. At first glance, the two images look the same. But if you look more closely, you can see that the previously uploaded one has more compression artifacts, and there are some slight color differences (the bottom of the staff is one of the most obvious places you can see a change in color). Comparing the metadata of the two files reveals that the parent was encoded with higher jpeg quality settings. I know that not everyone cares about a difference that small, but some of us do (myself included).

When I'm uploading a revision or alternate source with different md5, and I don't immediately see a difference between the two images, I check if the pixel values are the same (by converting both files to raw rgb data and checking if they are byte-wise identical), and if they are I don't upload it. But not everyone knows or cares how to do that, and I'd rather lean towards making sure we have the best available version of everything, even if it results in redundant images sometimes.
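For anyone who wants to try that kind of check, here is a rough sketch in Python. It assumes the third-party Pillow library for decoding, which is my choice for the example and not necessarily the tool used above; the function names are made up too. Converting to RGB drops any alpha channel, so this only compares the visible color data.

```python
import hashlib


def md5_of_file(path):
    """Return the MD5 hex digest of the file's bytes (what duplicate matching sees)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def raw_rgb(path):
    """Decode an image and return (size, raw RGB bytes).

    Assumption: Pillow is installed. It is imported here so the rest of the
    sketch still loads without it.
    """
    from PIL import Image

    with Image.open(path) as img:
        rgb = img.convert("RGB")  # normalize mode; alpha is discarded
        return rgb.size, rgb.tobytes()


def same_pixels(path_a, path_b):
    """True if both files decode to the same dimensions and byte-identical RGB data."""
    return raw_rgb(path_a) == raw_rgb(path_b)
```

Usage would be something like `same_pixels("twitter.png", "pixiv.png")` (hypothetical file names). If the MD5s differ but this returns `True`, the second file adds nothing in terms of pixels; if it returns `False`, the files really do differ, though it doesn't tell you which one is the better version.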
