this post was submitted on 12 Oct 2023

Abstract:

Significant advancements have been achieved in the realm of large-scale pre-trained text-to-video Diffusion Models (VDMs). However, previous methods either rely solely on pixel-based VDMs, which come with high computational costs, or on latent-based VDMs, which often struggle with precise text-video alignment. In this paper, we are the first to propose a hybrid model, dubbed Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation. Our model first uses pixel-based VDMs to produce a low-resolution video with strong text-video correlation. After that, we propose a novel expert translation method that employs the latent-based VDMs to further upsample the low-resolution video to high resolution. Compared to latent VDMs, Show-1 can produce high-quality videos with precise text-video alignment; compared to pixel VDMs, Show-1 is much more efficient (GPU memory usage during inference is 15G vs 72G). We also validate our model on standard video generation benchmarks. Our code and model weights are publicly available at https://github.com/showlab/Show-1.
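
For anyone who wants the two-stage idea from the abstract in concrete terms, here is a toy sketch of the data flow only: a cheap low-resolution stage followed by a separate upsampling stage. None of this is the Show-1 code or its API. The function names (`generate_low_res_video`, `upsample_video`, `text_to_video`) are hypothetical, the "pixel stage" just produces seeded random frames, and the "upsampler" is plain nearest-neighbour repetition standing in for the latent-based expert translation step.

```python
import random


def generate_low_res_video(prompt, frames=4, height=8, width=8, seed=0):
    """Hypothetical stand-in for the pixel-based stage.

    Returns a tiny grayscale "video" as nested lists indexed
    [frame][row][column]. A real model would condition on the prompt;
    here the prompt only seeds the random number generator.
    """
    rng = random.Random(f"{prompt}|{seed}")
    return [
        [[rng.random() for _ in range(width)] for _ in range(height)]
        for _ in range(frames)
    ]


def upsample_video(video, scale=4):
    """Hypothetical stand-in for the latent-based upsampling stage.

    Uses nearest-neighbour repetition: every pixel becomes a
    scale x scale block. This is NOT the paper's expert translation.
    """
    upsampled = []
    for frame in video:
        new_frame = []
        for row in frame:
            # Repeat each pixel horizontally, then repeat the row vertically.
            wide_row = [pixel for pixel in row for _ in range(scale)]
            new_frame.extend(list(wide_row) for _ in range(scale))
        upsampled.append(new_frame)
    return upsampled


def text_to_video(prompt, scale=4):
    """Chain the two stages: low-resolution generation, then upsampling."""
    low_res = generate_low_res_video(prompt)
    return upsample_video(low_res, scale=scale)


video = text_to_video("a panda eating bamboo")
print(len(video), len(video[0]), len(video[0][0]))
```

The point of the split is that the expensive, prompt-sensitive work happens at low resolution, and the second stage only has to add detail. For the real model and weights, see the repository linked in the abstract.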

[–] [email protected] 4 points 1 year ago (1 children)

Show-1 is an AI platform that is able to write, produce, direct, animate, and even voice entirely new episodes of TV shows. Show-1 uses different diffusion models, such as Stable Diffusion and Tortoise TTS, to generate high-quality images and speech features based on reference clips of existing shows. It also uses a multi-agent simulation to provide contextualization, story progression, and behavioral control for the characters and events in the episodes.

Maybe I'm blind, but where does it say literally any of what you've written here? As far as I can tell, this is just a text-to-video model, not any of this other stuff.

[–] [email protected] 4 points 1 year ago* (last edited 1 year ago) (1 children)

I thought I linked this as well. Whoops, I think I confused it with another thing.

[–] [email protected] 2 points 1 year ago

Haha no worries. With a name like "Show-1" it was bound to happen, honestly.