WalnutLum

joined 11 months ago
[–] [email protected] 3 points 3 months ago (2 children)

There's nothing stopping an analog clock face from representing 24h time:

[Image: an analog clock face laid out for 24-hour time]
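The only thing that changes is the hour hand's gearing: one revolution per day instead of two. A minimal sketch of the arithmetic (function name is just illustrative):

```python
from datetime import datetime

def hand_angles_24h(now: datetime) -> tuple[float, float]:
    """Angles in degrees clockwise from the top of a 24-hour dial."""
    hour_angle = (now.hour + now.minute / 60) / 24 * 360   # full circle = 24 h
    minute_angle = (now.minute + now.second / 60) / 60 * 360
    return hour_angle, minute_angle

print(hand_angles_24h(datetime(2024, 1, 1, 18, 0)))  # 18:00 -> (270.0, 0.0)
```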

[–] [email protected] 1 points 3 months ago (5 children)

I wonder how many people feel this way about writing when everyone just types/texts everything.

[–] [email protected] 20 points 3 months ago

One of the few practical things AI might be good at:

https://github.com/CorentinJ/Real-Time-Voice-Cloning

[–] [email protected] 9 points 3 months ago

Facebook is trying to burn the forest around OpenAI and the other closed models: by releasing its own models freely to the community, it removes the market for selling models by themselves. A lot of money is already pivoting away toward companies trying to build products that use the AI rather than selling the AI itself. Unless OpenAI pivots to something more substantial than just providing multimodal prompt completion, they're gonna find themselves without a lot of runway left.

[–] [email protected] 8 points 3 months ago

NewPipe can do PeerTube as well

[–] [email protected] 2 points 3 months ago (1 children)

I was never able to get appreciably better results from ElevenLabs than from some lightly trained RVC model :/ The long-scripts problem is something pretty much any text-to-something model suffers from: the longer the context, the lower the cohesion ends up being.

I do rotoscoping with SDXL i2i and ControlNet posing together; without the pose conditioning I found it tends to smear. Do you just do image2image?
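Per frame it looks roughly like this (a sketch using diffusers; the checkpoint names and file paths are just illustrative, and the DWPose skeleton render is produced in a separate pass):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetImg2ImgPipeline
from diffusers.utils import load_image

# Example pose ControlNet for SDXL - swap in whichever one you actually use.
controlnet = ControlNetModel.from_pretrained(
    "thibaud/controlnet-openpose-sdxl-1.0",
    torch_dtype=torch.float16,
)
pipe = StableDiffusionXLControlNetImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

frame = load_image("frames/0001.png")   # original footage frame (i2i source)
pose = load_image("poses/0001.png")     # DWPose/OpenPose skeleton render

out = pipe(
    prompt="anime style character, clean lineart",
    image=frame,                        # i2i keeps the composition
    control_image=pose,                 # pose conditioning keeps limbs from smearing
    strength=0.5,                       # how far the result may drift from the frame
    controlnet_conditioning_scale=0.8,
    num_inference_steps=30,
).images[0]
out.save("out/0001.png")
```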

[–] [email protected] 2 points 3 months ago (3 children)

Coqui for TTS, RVC UI for matching the TTS output to the actor's intonation, and DWPose -> ControlNet applied to SDXL for rotoscoping.
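For the TTS stage, Coqui's Python API is enough to get a raw line out before the RVC pass (a minimal sketch; the model id is just one of the stock Coqui voices):

```python
from TTS.api import TTS

# Any Coqui model id works here; this is a stock English voice.
tts = TTS("tts_models/en/ljspeech/tacotron2-DDC")
tts.tts_to_file(
    text="Line of dialogue to be matched to the actor later.",
    file_path="dialogue_raw.wav",
)
# dialogue_raw.wav then goes through the RVC UI to match the actor's intonation.
```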

[–] [email protected] 5 points 3 months ago (5 children)

All the models I've used that do TTS/RVC and rotoscoping have definitely not produced professional results.

[–] [email protected] 1 points 3 months ago* (last edited 3 months ago)

This isn't really accurate either. At the moment of generation, an LLM only has context for the input string and the network of text-token associations it's been trained with. It pulls the next token from a "pool" of candidates based on what it's already output and the input context, nothing more.

Most LLMs have sampling parameters called "Top P", "Top K", etc.; these set how many candidate tokens the model ends up selecting from, based on the previous tokens alongside the input tokens. It then randomly chooses one of those candidates, weighted according to the temperature setting.

It's why, if you turn these models' temperature settings really high, they output pure nonsense both conceptually and grammatically: the tenuous thread linking the previous tokens' context to the next token has been widened enough that the output completely loses any semblance of cohesiveness.
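To make the Top K / Top P / temperature interplay concrete, here's a toy sketch of the sampling step (not any particular model's implementation, just the general mechanism):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=50, top_p=0.9, rng=None):
    """Temperature scales the logits, then top-k/top-p trim the candidate pool
    before a weighted random draw picks the next token."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)

    # softmax over the scaled logits
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # top-k: keep only the k most likely candidates
    order = np.argsort(probs)[::-1][:top_k]

    # top-p: within those, keep the smallest prefix whose mass reaches p
    cumulative = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cumulative, top_p) + 1]

    kept_probs = probs[keep] / probs[keep].sum()
    return rng.choice(keep, p=kept_probs)

# High temperature flattens the distribution, so unlikely tokens get picked
# far more often - which is where the conceptual and grammatical nonsense comes from.
```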

[–] [email protected] 3 points 3 months ago

Why would I save something for posterity when I could save it for posterior?

[–] [email protected] 1 points 3 months ago (3 children)

Saving this comment for posterior

[–] [email protected] 4 points 4 months ago

I'd say mostly energy savings and CPU usage efficiency
