Technology

62783 readers

3124 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related content.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below, to ask if your bot can be added please contact us.
Check for duplicates before posting, duplicates may be removed
Accounts 7 days and younger will have their posts automatically removed.

Approved Bots

founded 2 years ago

MODERATORS

[email protected]

210

MIT scientists have just figured out how to make the most popular AI image generators 30 times faster (www.livescience.com)

submitted 10 months ago by [email protected] to c/[email protected]

44 comments fedilink hide all child comments

you are viewing a single comment's thread
view the rest of the comments

[–] [email protected] 2 points 10 months ago (5 children)

There is no single “whole” image when talking about a video game. It’s a combination of dynamic layers carefully interacting with each-other.

Of course there is. When everything is done a whole image is sent to the display to show. That's how FSR 1 can work without explicit game support.

[–] [email protected] 1 points 10 months ago* (last edited 10 months ago) (4 children)

What i ment is that the final image is dynamic so players may have a unique configuration which makes it harder for ai to understand whats going on.

Using the final render of each frame would cause a lot of texture bleeding for example when a red character stands in front of a red background. Or is jumping on top of an animal, you may have wild frames where the body shape drastically changes or is suddenly realistically riding the animal then petting it the next frame to then have it die on frame 3, all because every frame is processed as its own work.

Upscaling final renders is indeed possible but mostly because it doesnt change things all that much of the general shapes, Small artifacts are also very common here but often not noticeable by the human eye and dont effect a modern game.

In older games, especially mario where hitboxes are pixel dependent youd either have a very confusing games with tons of clipping because the game doesn’t consider the new textures or it abides to new textures affecting the gameplay.

Source: i have studied game development and have recreated mario era games as part of assignments, currently i am self-studying the technological specifics of how machine learning and generative algorithms operate.

[–] [email protected] 3 points 10 months ago (3 children)

Those are valid points, but nothing there is insurmountable with even little bit of advancement.

For example this is a relatively old example from 2021, before any of the dall-e2 and stable diffusion and video consistency models were out. https://www.youtube.com/watch?v=P1IcaBn3ej0

Is this perfect? No, there are artifacts and the quality just matches the training driving dataset, but this was an old specialized example from a completely different (now archaic) architecture. But the newer image generation models are much better, and frame-to-frame consistency models are getting better by week and some of them are nearly there (obviously not in real-time).

About the red-on-red bleed/background separation etc. issues: for 3d rendered games it's relatively straightforward to get not just the color but depth and normal maps from the buffer (especially assuming this will be done with the blessing of the game developers/game engines/directX or other APIs). I don't know if you follow the developments but for example using ControlNet with StableDiffusion it is trivial to add color, depth, normal map, pose, line outline, or a lot more other constraints on the created image, so if the character is wearing red over a red background, that is separated by the depth map and the generated image will also have the same depth, or their surface normals would be different. You can use whatever aspects of the input game as constraint in the generation.

I am not saying we can do this right now, the generation speeds for high quality images, plus any other required tools in the workflow (from understanding the context and caption generation/latent space matching, to getting these color/depth/normal constraints, to do temporal consistency using the previous N frames to generating the final image, and doing it all fast enough) obviously have a ton of challenges involved. But, it is quite possible, and I fully expect to see working non-realtime demos within a year or couple years at the most.

In 2D games, it may be harder due to pixelation. As you said there are upscaling algorithms that work more or less well to increase the resolution slightly, nothing photorealistic obviously. There are also methods such as these using segmenting algorithms to split the images and generating new ones using AI generators: https://github.com/darvin/X.RetroGameAIRemaster

To be honest to make 2D games in a different style you can do much better, even now. Most of the retro games have their sprites and backgrounds already extracted from their original games, you can just upscale once (by the producer, or fan-edits), and then you don't even need to worry about the real time generation. I wanted to upscale Laura Bow 2 this way for example. One random example I just found: https://www.youtube.com/watch?v=kBFMKroTuXE

Replacing the sprites/backgrounds won't make them really photorealistic with dynamic drop shadows and lighting changes, but once the sprites are in enough resolution then you can feed them into the full-frame re-generation frame by frame. But then I probably don't want Guybrush to look like Gabriel Knight 2 or other FMV games so not sure about that angle.

load more comments (2 replies)