Duplicate script is being ignored

Posted under Tags

@nonamethanks created a script that scans images and automatically applies the duplicate tag if it finds a pixel-perfect duplicate.
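(For anyone curious how that kind of check can work, here is a minimal sketch, not nonamethanks's actual script: decode the image and hash the raw pixel data, so two files that differ only in how they were encoded still hash identically. Assumes Pillow is available.)

```python
# Minimal sketch of pixel-perfect duplicate detection (illustration only, not the real script).
# Hashing decoded pixels instead of file bytes means two files that encode the same image
# differently (e.g. different PNG compression levels) still produce the same hash.
import hashlib
from PIL import Image

def pixel_hash(path: str) -> str:
    """Hash the decoded pixel data, ignoring how the file happened to be encoded."""
    with Image.open(path) as im:
        im = im.convert("RGBA")  # normalize mode so identical pixels always serialize the same way
        return hashlib.sha256(str(im.size).encode() + im.tobytes()).hexdigest()

def is_pixel_perfect_duplicate(path_a: str, path_b: str) -> bool:
    return pixel_hash(path_a) == pixel_hash(path_b)
```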

I was made aware of this when I uploaded post #4335603, which I only uploaded because I thought "it's a larger file size, so it must be a better version of post #4335173", yet it was automatically marked as a duplicate and not approved.
Once the script was explained to me on Discord, I noticed that there were several more images that thus should not have been uploaded (cutesexyrobutts duplicate).
Yet 5 of 6 were re-approved by @NWF_Renim.

This is very discouraging to see as a normal uploader; it feels like moderation is a total crapshoot.

AFAIK current policy is to not flag duplicates if they're already active.

It would be pretty bad if we started flagging duplicates that are already active out of nowhere. Punishing people with deletions for inadvertently uploading a duplicate when we've been accepting them for 15 years with no real repercussion is not a good idea. If anything, a check should be implemented server-side so that duplicate uploads cannot be submitted.

I have to agree with nonamethanks: the fact that you have to rely on external tools/sites that not every user knows about to find out whether an image is a duplicate (barring the very obvious cases) doesn't help our case. If Danbooru itself warned about it, I would completely agree with flagging duplicates uploaded after that was implemented, though stopping them from being uploaded at all would be even better.

I reapproved them because the stated reason is currently not a valid reason for flagging; the notice when creating a flag explicitly states that being a duplicate is not a valid flag reason. (As for why it was 5 out of 6, I flagged the parent of the 6th instead because of anatomy issues.) Furthermore, there is no quality difference between the posts, as they're pixel-perfect duplicates, so there is no reason to punish users out of the blue over this. The necessary steps would need to be taken first to make it justifiable to flag such posts, which would require changing the current rules and then making a formal announcement so the public can be aware of it.

nnt's suggestion of an implementation that prevents pixel-perfect duplicates from being uploaded should be the way we go on this.
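(Purely as an illustration of what such a server-side check could look like, not an actual Danbooru implementation: compute the pixel hash at upload time and look it up in an index of existing posts. The names below are made up for the sketch.)

```python
# Hypothetical upload-time check: catch pixel-perfect duplicates before they become posts.
# "known_pixel_hashes" stands in for whatever index the site would actually keep;
# it is not a real Danbooru structure.
import hashlib
import io
from PIL import Image

known_pixel_hashes: dict[str, int] = {}  # pixel hash -> existing post id

def pixel_hash(file_bytes: bytes) -> str:
    with Image.open(io.BytesIO(file_bytes)) as im:
        im = im.convert("RGBA")
        return hashlib.sha256(str(im.size).encode() + im.tobytes()).hexdigest()

def find_pixel_perfect_duplicate(file_bytes: bytes) -> int | None:
    """Return the id of an existing pixel-perfect duplicate, or None if the upload is new."""
    return known_pixel_hashes.get(pixel_hash(file_bytes))

def register_post(post_id: int, file_bytes: bytes) -> None:
    """Record a newly accepted upload so later submissions can be checked against it."""
    known_pixel_hashes[pixel_hash(file_bytes)] = post_id
```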

nonamethanks said:

Punishing people with deletions for inadvertently uploading a duplicate when we've been accepting them for 15 years with no real repercussion is not a good idea.

It would be better all around if there were some means of deleting posts for valid ex post facto reasons like this that didn't result in punishment for the uploader. Having a bunch of rules that apply based on post age is just confusing, and this one seems particularly arbitrary.

That said, I recognize it would take development effort and that makes it a harder sell.

Edit: A possibly simple fix to reduce the confusion factor from there being a bunch of duplicates on the site already - how hard would it be to hide posts tagged "duplicate" from member-level users, like we hide various other tags from them?

7HS said:

Edit: A possibly simple fix to reduce the confusion factor from there being a bunch of duplicates on the site already - how hard would it be to hide posts tagged "duplicate" from member-level users, like we hide various other tags from them?

I'm pretty sure the worst offenders of uploading duplicates are actually builders, especially long-time ones. Hiding duplicates from users who most of the time probably do it inadvertently isn't going to have any effect on the higher-ranked users who do it habitually.

blindVigil said:

I'm pretty sure the worst offenders of uploading duplicates are actually builders, especially long-time ones.

This is correct. Of 7366 duplicates uploaded in the last year, about 70% (5166) were uploaded by builders.

In any case, blacklisting the duplicate tag by default, as I mentioned in forum #180469, would have unintended consequences, so we really don't want to do it until (if ever) there's a proper system in place to verify which posts are really duplicates.

I feel that it is very much a User vs. Unrestricted issue.

I accidentally uploaded two dupes recently. I wasn't being careful about double-checking the artist's existing entries, and a 7 KB and 40 KB difference meant the "similar to" function didn't work. Both posts were rightfully deleted.

The issue arises when you see someone with unrestricted uploads make the same mistake and be functionally "immune" to deletion. Or when someone uploads something and it gets approved before the bot tags and parents it. I agree that it's a bit discouraging.

At least in my opinion, duplicate should only be for pixel-perfect duplicates, or a new tag should be created to distinguish between them. At least part of the problem probably comes from people assuming that a smaller file size means the image was compressed and will have compression artifacts, rather than the size difference being down to hidden quirks of image software. I lean towards a "pixel-perfect duplicate" tag simply because it's self-explanatory and would then be easily justifiable as a default blacklist item. In contrast, the current duplicate entry seems to imply that it includes, but is not limited to, pixel-perfect duplicates. Hence the confusion.
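(To illustrate that misconception for lossless formats: re-saving the same PNG at different compression levels changes the file size without touching a single pixel. A quick self-contained check, assuming Pillow and some original.png on disk:)

```python
# Save the same image at two PNG compression levels: the file sizes differ,
# but the decoded pixels are identical, i.e. a pixel-perfect duplicate.
import os
from PIL import Image, ImageChops

with Image.open("original.png") as im:          # hypothetical input file
    im = im.convert("RGBA")
    im.save("fast.png", compress_level=1)       # larger file
    im.save("small.png", compress_level=9)      # smaller file

print(os.path.getsize("fast.png"), os.path.getsize("small.png"))  # different byte counts

with Image.open("fast.png") as a, Image.open("small.png") as b:
    diff = ImageChops.difference(a.convert("RGBA"), b.convert("RGBA"))
    print("pixel-perfect duplicate:", diff.getbbox() is None)      # prints True
```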

SoulEvansEater said:

At least in my opinion, duplicate should only be for pixel-perfect duplicates, or a new tag should be created to distinguish between them. At least part of the problem probably comes from people assuming that a smaller file size means the image was compressed and will have compression artifacts, rather than the size difference being down to hidden quirks of image software. I lean towards a "pixel-perfect duplicate" tag simply because it's self-explanatory and would then be easily justifiable as a default blacklist item. In contrast, the current duplicate entry seems to imply that it includes, but is not limited to, pixel-perfect duplicates. Hence the confusion.

I would like to hear @evazion's opinion before creating a pixel-perfect duplicate or pixel duplicate tag. It would be trivial to maintain it on my side, but it'd still be a completely new tag with thousands of posts.

nonamethanks said:

I would like to hear @evazion's opinion before creating a pixel-perfect duplicate or pixel duplicate tag. It would be trivial to maintain it on my side, but it'd still be a completely new tag with thousands of posts.

Oh yeah, for sure. The upshot is that, with enough time and compute resources, it wouldn't be intractable to programmatically sift through duplicate and sort the posts into pixel-perfect and pixel-imperfect bins. I know that, intuitively, I always tend to think that I know better than the programmers because of compression artifacts. Having a tag make the distinction would have answered my own curiosity, at least.
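(If anyone ever attempts that sweep, a naive version could look like the sketch below; how the images are fetched is hand-waved, and the helper names are made up for illustration.)

```python
# Naive sketch of binning already-tagged duplicates by comparing each post's decoded
# pixels against its parent's. The image loader is a placeholder, not a real API call.
from PIL import Image, ImageChops

def load_image(post_id: int) -> Image.Image:
    """Placeholder: fetch and decode the full-size image for a post."""
    return Image.open(f"/tmp/posts/{post_id}.png").convert("RGBA")

def bin_duplicates(pairs: list[tuple[int, int]]) -> tuple[list[int], list[int]]:
    """pairs = (duplicate_post_id, parent_post_id); returns (pixel_perfect, pixel_imperfect)."""
    perfect, imperfect = [], []
    for dupe_id, parent_id in pairs:
        a, b = load_image(dupe_id), load_image(parent_id)
        same = a.size == b.size and ImageChops.difference(a, b).getbbox() is None
        (perfect if same else imperfect).append(dupe_id)
    return perfect, imperfect
```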
