this post was submitted on 15 Dec 2023
402 points (95.3% liked)

Privacy

31876 readers
365 users here now

A place to discuss privacy and freedom in the digital world.

Privacy has become a very important issue in modern society, with companies and governments constantly abusing their power, more and more people are waking up to the importance of digital privacy.

In this community everyone is welcome to post links and discuss topics related to privacy.

Some Rules

Related communities

Chat rooms

much thanks to @gary_host_laptop for the logo design :)

founded 5 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 60 points 11 months ago (6 children)

Do people seriously still think this is a thing?

Literally anyone can run the basic numbers on the bandwidth that would be involved. There are two options:

  1. They stream the audio out to their own servers, which process it there. The bandwidth involved would be INSTANTLY obvious, as streaming audio out is non-trivial and anyone can pop open their phone to monitor their network usage. You'd hit your data limit within a couple of days.

  2. They have the app always on and listening for "wakewords", which trigger the recording; only then does it stream audio out. "Wakewords", plural, is doing a LOT of heavy lifting here. Even one single wakeword takes a tremendous amount of training and money, and the countless number of them that would be required for what people are claiming? We're talking a LOT of money. But that's not all: running that sort of program is extremely resource-intensive, and, once again, you can monitor your phone's resource usage. You'd see the app at the top, burning through your battery like no tomorrow. Android and iPhone both notify you if a specific app is using a lot of battery power. You'd instantly notice such an app running.
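The back-of-envelope math for option 1 is easy to run yourself. A quick sketch, assuming a modest 64 kbit/s continuous audio stream and a hypothetical 5 GB monthly data cap (both numbers are my assumptions, for illustration only):

```python
# Back-of-envelope: what continuously streaming audio out would cost in data.
# Assumed numbers: 64 kbit/s stream, 5 GB monthly cap (both hypothetical).
BITRATE_BPS = 64_000                 # bits per second for the audio stream
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_day = BITRATE_BPS / 8 * SECONDS_PER_DAY
mb_per_day = bytes_per_day / 1_000_000
print(f"{mb_per_day:.0f} MB per day")            # ~691 MB/day

cap_gb = 5
days_to_cap = cap_gb * 1_000 / mb_per_day
print(f"cap exhausted in ~{days_to_cap:.1f} days")   # ~7 days
```

Even at a modest bitrate, that much daily upload would stand out immediately in any per-app data usage screen.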

I think a big part of this misunderstanding comes from the fact that Alexa/Google devices make wakeword detection look small and trivial.

What people don't know, though, is that Alexa / Google Home devices have an entire dedicated board with its own dedicated processor JUST for detecting their ONE wakeword, and on top of that they explicitly chose a phrase that is easy to listen for.

"Okay Google" and "Hey Alexa" have a non-trivial amount of engineering baked into making sure they are distinct and less likely to get mistaken for other words, and even despite that they have false positives constantly.

If that's the amount of resources involved in just one wake word/phrase, you have to understand that targeted marketing would require hundreds of times that. It's not viable for your phone to do it 24/7 without also doubling as a hand warmer in your pocket all day long.

[–] [email protected] 26 points 11 months ago* (last edited 11 months ago) (1 children)

The point of OK Google is to start listening for commands, so it needs to be really good and accurate. Whereas the point of "fluffy blanket" is to show you an ad for fluffy blankets, so it can be poorly trained and wildly inaccurate. It wouldn't take that much money to train a model to listen for some ad keywords and be just accurate enough to get a return on investment.

(I’m not saying they are monitoring you, just that it would probably be a lot less expensive than you think.)

[–] [email protected] 13 points 11 months ago* (last edited 11 months ago) (1 children)

If it's randomly sampled, no one would notice. "Oh, my battery ran low today." Tomorrow it's fine.

Google used to (probably still does) A/B test Play services that caused battery drain. You never knew if something was wrong or you were the unlucky chosen one out of 1000 that day.

Bandwidth for voice is tiny. The AMR-WB standard is 6.6 kbit/s with voice detection, so it's only sending ~6.6 kbit/s, and only while it detects voice.

Given that a single webpage today averages 2 megabytes, an additional 825 bytes of data each second could easily go unnoticed.
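The arithmetic here checks out. A sketch, using the 6.6 kbit/s AMR-WB rate from the comment plus a hypothetical two hours of detected speech per day (the two-hour figure is my assumption):

```python
# AMR-WB's lowest mode is 6.6 kbit/s; with voice activity detection,
# audio is only sent while speech is present.
AMR_WB_BPS = 6_600                  # bits per second (lowest AMR-WB mode)
bytes_per_second = AMR_WB_BPS / 8   # 825 bytes/s, as stated above

# Hypothetical: 2 hours of detected speech per day.
speech_seconds = 2 * 60 * 60
mb_per_day = bytes_per_second * speech_seconds / 1_000_000
print(f"{mb_per_day:.2f} MB/day")   # ~5.94 MB/day, roughly three average webpages
```

A few megabytes a day is well inside the noise floor of normal app traffic.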

[–] [email protected] 5 points 10 months ago

It’s insane people still believe voice takes up heaps of bandwidth.

Even more so: on-device you could just run speech-to-text and send the text back home. That's like... no data. Undetectable.

Even WITH voice, like you said, fuckin tiny amounts of data for today’s tech.
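To put rough numbers on the speech-to-text point, assuming a typical speaking rate of ~150 words per minute and ~6 characters per word (both generic assumptions, not figures from the thread):

```python
# Rough comparison: transcript size vs. compressed audio for one minute of speech.
WORDS_PER_MIN = 150      # typical speaking rate (assumption)
CHARS_PER_WORD = 6       # average English word incl. space (assumption)
text_bytes_per_min = WORDS_PER_MIN * CHARS_PER_WORD           # 900 B/min

AUDIO_BPS = 6_600        # 6.6 kbit/s AMR-WB, as discussed above
audio_bytes_per_min = AUDIO_BPS / 8 * 60                      # 49,500 B/min

ratio = audio_bytes_per_min / text_bytes_per_min
print(f"transcript is ~{ratio:.0f}x smaller than the audio")  # ~55x
```

Under these assumptions a transcript is well under 1 KB per minute, which is effectively invisible next to ordinary background traffic.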

This is why I’ll never have “smart” anything in my house.

[–] [email protected] 13 points 11 months ago* (last edited 11 months ago)

This is simply not true. Low-bitrate compressed audio is a small amount of bandwidth you would never notice on home internet. And recognizing wakewords? Tiny, tiny amounts of processing. Google's design is for accuracy and control; a marketing team cares nothing about that. They'll use an algorithm that just grabs everything.

Yes, this would be battery intensive on phones when not plugged in. But triggering on power, via CarPlay, or on smart speakers is trivial.

I'm still skeptical, but not because of this.

Edit: For creds: Developer specializing in algorithm creation and have previously rolled my own hardware and branch for MyCroft.

[–] [email protected] 7 points 11 months ago

FYI, the Snapdragon 855 from 2019 could detect 2 wake words at the same time. With the exponential power increase in NPUs since then, it wouldn't be shocking if newer ones can detect hundreds.

[–] [email protected] 6 points 11 months ago

But what about a car? Cars are as smart as smartphones now, and you certainly wouldn't notice the small amount of power needed to collect and transfer data compared to driving the car. Some car manufacturer TOS agreements seemingly admit that they collect and use your in-car conversations (including any passengers, which they claim is your duty to inform them they are being recorded). Almost all the manufacturers are equally bad for privacy and data collection.

Mozilla details what data each car collects here.

[–] [email protected] 4 points 10 months ago (1 children)

What you're saying makes sense, but I can't believe nobody has brought up the fact that a lot of our phones are constantly listening for music and displaying the song details on our lock screen. That all happens without the little green microphone-active light, and with minimal battery and bandwidth consumption.

I know next to nothing about the technology involved, but it doesn't seem like it's very far from listening for advertising keywords.

[–] [email protected] 2 points 10 months ago

That uses a similar approach to the wake word technology, but slightly differently applied.

I am not a computer or ML scientist but this is the gist of how it was explained to me:

Your smartphone has a low-powered chip connect to your microphone when it is not in use/the phone is idle, running a local AI model (this is how it works offline) that asks one thing: is this music or is it not? Once that model decides it's music, it wakes up the main CPU, which looks up a snippet of that audio against a database of audio snippets corresponding to popular/likely songs, and then displays a song match.
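The two-stage design described above can be sketched as a toy. Everything here is illustrative: the gate, the fingerprint function, and the database are stand-ins, not any vendor's real implementation:

```python
# Toy sketch of the two-stage song-ID pipeline described above.
# Stage 1 (low-power chip): a cheap yes/no "is this music?" gate.
# Stage 2 (main CPU, woken only when stage 1 fires): fingerprint lookup.

def music_gate(snippet: str) -> bool:
    # Stand-in for a tiny always-on classifier; real systems use an ML model.
    return snippet.startswith("music")

# Stand-in for the rotated local cache of audio patterns.
LOCAL_DB = {"fp:music:hum1": "Song A", "fp:music:hum2": "Song B"}

def fingerprint(snippet: str) -> str:
    return "fp:" + snippet          # stand-in for real audio fingerprinting

def identify(snippet: str):
    if not music_gate(snippet):     # stage 1: cheap, runs constantly
        return None                 # main CPU never wakes up
    return LOCAL_DB.get(fingerprint(snippet))  # stage 2: local lookup

print(identify("speech"))           # None — gate rejects, CPU stays asleep
print(identify("music:hum1"))       # Song A
```

The key property is that the expensive stage almost never runs: most audio is rejected by the cheap gate before the main CPU is ever involved.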

To answer your questions about how it's different:

  • the song ID happens with system-level access, so it doesn't go through the normal audio permission system, and thus doesn't trigger the microphone-access notification.

  • because it is using a low-powered detection system rather than always having the microphone on, it can run with much less battery usage.

  • As I understand it, it's a lot easier to tell if audio seems like it's music than whether it's a specific intelligible word that you may or may not be looking for, which you then have to process into language that's linked to metadata, etc etc.

  • The initial size of the database is fairly minor, as what is downloaded is a selection of audio patterns that the audio snippet is compared against. This database gets rotated over time, and the song ID apps often also let you send your audio snippet to the online megadatabases (Apple's/Google's music libraries) for a better match, but overall the data transfer isn't very noticeable. Searching for arbitrary hot words cannot be nearly as optimized as assistant activations or music detection, especially if it's not built into the system.

And that's about it....for now.

All of this is built on current knowledge of researchers analysing data traffic, OS functions, ML audio detection, mobile computation capabilities, and traditional mobile assistants. It's possible that this may change radically in the near future, where arbitrary audio detection/collection somehow becomes much cheaper computationally, or generative AI makes it easy to extrapolate conversations from low quality audio snippets, or something else I don't know yet.