this post was submitted on 05 Oct 2024
430 points (96.9% liked)
Programmer Humor
Yeah, this is the problem with frankensteining two systems together. Give an LLM a prompt plus a separate module that interprets images for it, and this is what you get.
The image parser goes "a crossword, with the following hints", when what the LLM actually needs to do the job is an understanding of the grid itself. If a single system understood both images and text, it could hypothetically understand the task well enough to fetch the information it needs from the image. But LLMs aren't really an approach to any true "intelligence", so they'll forever be unable to do that as one piece.