
Pls explain

[–] [email protected] 1 points 1 year ago (2 children)

That's true, but I would have thought the models would be able to "understand" hands by now, since I'm assuming they've seen millions of photographs with hands in them.

[–] [email protected] 1 points 1 year ago

Sure, and if they were illustrative of hands, you'd get good hands for output. But they're random photos from random angles, possibly only showing a few fingers. Or maybe with hands clasped. Or worse, two people holding hands. If you throw all of those into the mix and call them all hands, a mix is what you're going to get out.
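Here's a toy sketch of that averaging effect (nothing to do with Stable Diffusion's actual training; a plain mean-squared-error fit stands in for the real loss, and the "images" are made-up 4-number vectors):

```python
# Toy illustration: if several very different examples all carry the same
# label, a model trained with a mean-squared-error objective is pulled
# toward their average, which matches none of them. All numbers and
# shapes here are invented for the demo.
import numpy as np

# Pretend each "image" is a 4-value vector describing finger positions.
handshake     = np.array([1.0, 0.0, 0.0, 1.0])
clasped_hands = np.array([0.0, 1.0, 1.0, 0.0])
holding_hands = np.array([1.0, 1.0, 0.0, 0.0])

# All three are labeled simply "hands" in the training data.
labeled_hands = np.stack([handshake, clasped_hands, holding_hands])

# The MSE-optimal single prediction for the label "hands" is the mean...
blended = labeled_hands.mean(axis=0)
print(blended)  # [0.667 0.667 0.333 0.333] -- a blend matching no real pose

# ...so "hands" comes out as a mix of every pose thrown into the bucket.
```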

Look at this picture: https://petapixel.com/assets/uploads/2023/03/SD1131497946_two_hands_clasped_together-copy.jpg

You can sort of see where it's coming from. Some parts look like a handshake, some parts look like two people standing side by side holding hands (both with and without fingers interlaced), and some parts look like one person's hands on their knee. It all depends on how you're constructing the image, and what your input data and labeling are.

Stable Diffusion works by iteratively denoising the image until it looks reasonable enough, refining local detail rather than reasoning about the macro-scale structure of the whole image. Other approaches, like whatever DALL·E 2 uses, seem to handle this better.
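To make that concrete, here's a heavily simplified sketch of an iterative denoising loop. The real Stable Diffusion uses a trained U-Net operating on a learned latent space; the 3x3 local averaging below is a made-up stand-in, just to show that each update only looks at nearby pixels:

```python
# Toy sketch of iterative denoising (loosely in the spirit of diffusion
# sampling; NOT the real Stable Diffusion algorithm).
import numpy as np

rng = np.random.default_rng(0)
img = rng.normal(size=(32, 32))  # start from pure noise

def local_denoise(x):
    """Stand-in 'denoiser': pull each pixel toward the mean of its 3x3
    neighborhood. It only ever sees local context, never the whole image."""
    padded = np.pad(x, 1, mode="edge")
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = padded[i:i + 3, j:j + 3].mean()
    return out

steps = 50
for t in range(steps, 0, -1):
    img = local_denoise(img)                    # remove a bit of "noise"
    noise_level = t / steps * 0.1               # shrinking noise schedule
    img += rng.normal(scale=noise_level, size=img.shape)

# The result ends up locally smooth and plausible, but nothing in the loop
# ever enforced a global constraint like "exactly five fingers".
print(img.std())
```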

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago)

I think it's helpful to remember that the model doesn't have a skeleton; it's literally skin deep. It doesn't understand hands, it understands pixels. Without an understanding of the actual structure, all the AI can do is guess where the pixels go based on the neighboring pixels.
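A quick made-up illustration of that "skin deep" point: fill a missing patch purely from neighboring pixel values, with no model of structure at all. The result blends in locally, but nothing forces it to be hand-shaped:

```python
# Invented demo: inpaint a hole using only neighboring pixels, with no
# notion of the structure that should be inside it.
import numpy as np

rng = np.random.default_rng(1)
img = rng.random((16, 16))
mask = np.zeros_like(img, dtype=bool)
mask[6:10, 6:10] = True        # pretend this patch (a "hand") is unknown
img[mask] = np.nan

# Repeatedly replace each unknown pixel with the mean of its known neighbors.
for _ in range(20):
    filled = img.copy()
    for i, j in zip(*np.where(np.isnan(img))):
        neighbors = []
        for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            ni, nj = i + di, j + dj
            if 0 <= ni < 16 and 0 <= nj < 16 and not np.isnan(img[ni, nj]):
                neighbors.append(img[ni, nj])
        if neighbors:
            filled[i, j] = np.mean(neighbors)
    img = filled

# The patch is now smooth and locally consistent with its surroundings,
# but nothing guaranteed it contains anything hand-shaped.
print(np.isnan(img).any())  # False once everything is filled in
```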