this post was submitted on 10 Aug 2023

Programmer Humor

Welcome to Programmer Humor!

This is a place where you can post jokes, memes, humor, etc. related to programming!

For sharing awful code, there's also Programming Horror.

top 9 comments
[–] [email protected] 2 points 1 year ago

Chaotic evil is encrypting, compressing, then encrypting again.

[–] [email protected] 1 points 1 year ago

Don’t know about gz but zip files can be encrypted using passwords

[–] [email protected] 1 points 1 year ago

The encryption: base64 encoding

[–] [email protected] 0 points 1 year ago* (last edited 1 year ago) (1 children)

Anyone willing to drop some learning on a lay person? Is encrypted data less compressible because it lacks the patterns compression relies on?

Or, is it less secure to encrypt first because smart people things? I know enough about cryptography to know I know fuck all about cryptography.

[–] [email protected] 0 points 1 year ago (1 children)

It isn’t compressible at all, really. As far as a compression algorithm is concerned, it just looks like random data.

Imagine trying to compress a text file. Each letter normally takes 8 bits to represent. The computer reads 8 bits at a time and knows which character to display. It has to read all 8 bits, even when some of them are “empty”, because there is no way to mark where one letter stops and the next begins. It’s all just 1s and 0s, so you can’t insert “next letter” flags. But we can cut that down.

One of the easiest ways to do this is to count all the letters, then sort them from most to least common. Then we build a tree, with each character sitting at a fork. You start at the top of the tree and follow it down: on a 0 you go down one fork, and on a 1 you read the letter at your current fork. So, for instance, if the letters are sorted “ABCDEF…”, then “0001” would be D. Now D is represented with only 4 bits instead of 8. After reading the 1, you return to the top of the tree and start over. So “01000101101” would be “BDBAB”. Normally that sequence would take 40 bits (8 bits per character), but we just did it in 11.
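To make the scheme above concrete, here is a minimal Python sketch of it. It assumes the letters are already sorted as “ABCDEF…” like in the example; the function names (`build_code`, `encode`, `decode`) are made up for illustration. This is a simplified toy code, not what gzip or zip actually use.

```python
def build_code(symbols_by_frequency):
    """Give the i-th most common symbol i zeros followed by a single 1."""
    return {sym: "0" * i + "1" for i, sym in enumerate(symbols_by_frequency)}


def encode(text, code):
    """Concatenate the bit string of every character."""
    return "".join(code[ch] for ch in text)


def decode(bits, symbols_by_frequency):
    """Walk down one fork per 0; on a 1, emit the letter and restart at the top."""
    out = []
    zeros = 0
    for bit in bits:
        if bit == "0":
            zeros += 1
        else:
            out.append(symbols_by_frequency[zeros])
            zeros = 0
    return "".join(out)


# Assumed ordering, taken from the example in the comment above.
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
code = build_code(alphabet)

bits = encode("BDBAB", code)
print(bits, len(bits))         # 01000101101 11
print(decode(bits, alphabet))  # BDBAB
```

The 1 at the end of each code is what plays the role of the “next letter” flag, so no separate marker is needed.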

But notice that this can also produce letters that are MORE than 8 bits long. Following the same pattern, “I” would be 9 bits, “J” would be 10, and so on. We still achieve compression because the more common (shorter) letters are used a lot and the less common (longer) letters are used rarely.
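A quick way to see that trade-off is to add up the bits. The sketch below assumes the same fixed “ABCDEF…” ordering; `toy_bits` and the sample messages are invented for illustration.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed frequency order, most common first


def toy_bits(text):
    """Total bits under the toy code: the letter at position i costs i + 1 bits."""
    return sum(ALPHABET.index(ch) + 1 for ch in text)


def plain_bits(text):
    """Total bits at a flat 8 bits per character."""
    return 8 * len(text)


# A message dominated by the common letters compresses well...
skewed = "A" * 50 + "B" * 25 + "C" * 12 + "J"
print(toy_bits(skewed), plain_bits(skewed))  # 146 704

# ...while a message made only of a rare letter gets bigger.
rare = "JJJJ"
print(toy_bits(rare), plain_bits(rare))      # 40 32
```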

Encryption undoes this completely because, as far as compression is concerned, the encrypted data is random. With random data there is no discernible pattern, so counting the characters and sorting by frequency is an exercise in futility. All the characters are used about equally often, so even the “most frequent” ones are only slightly more frequent, by random chance. Even if the frequency order still matches my earlier pattern, the number of Z’s is so close to the number of A’s that the file ends up longer than before. Remember, the compression only works when the short codes go to characters that really are used far more often. Since many characters get codes longer than 8 bits, and those characters appear just as often as the ones with short codes, our compression method fails and actually produces a file that is larger than the original.
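You can try this yourself with a real compressor. The sketch below uses Python's `zlib` (DEFLATE, the same family gzip uses) on repetitive text and on pseudo-random bytes. The random bytes are only a stand-in for ciphertext, which is an assumption; they come from a seeded generator so the run is repeatable. The last function works out the average code length of the toy scheme when every letter is equally likely.

```python
import random
import zlib

rng = random.Random(0)  # seeded so the example is repeatable

text = b"the quick brown fox jumps over the lazy dog. " * 200
# Stand-in for encrypted data: bytes with no pattern a compressor can use.
noise = bytes(rng.getrandbits(8) for _ in range(len(text)))

text_packed = zlib.compress(text)
noise_packed = zlib.compress(noise)

print("text: ", len(text), "->", len(text_packed))    # shrinks a lot
print("noise:", len(noise), "->", len(noise_packed))  # grows slightly


def average_toy_bits(symbol_count):
    """Average bits per letter in the toy code if all letters are equally likely.

    The codes are 1, 2, ..., n bits long, so the mean is (n + 1) / 2.
    """
    return (symbol_count + 1) / 2


print(average_toy_bits(26))  # 13.5, worse than the flat 8 bits per letter
```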

[–] [email protected] 1 points 1 year ago

I understood this despite being drunk, thank you for the excellent explanation!

[–] [email protected] 0 points 1 year ago (1 children)

Don't compress data before encrypting it if an attacker can mix their own input in with your secrets, since it opens you up to attacks like CRIME. It's fine for static data at rest.
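For anyone wondering what the CRIME-style leak looks like: it comes from compressing attacker-influenced data together with a secret and then encrypting, because encryption does not hide the length. A rough sketch of just the length signal follows, with no real TLS involved; the request layout, the `SECRET` value, and `observed_length` are all hypothetical.

```python
import zlib

SECRET = "sessionid=7a3f9c"  # hypothetical secret the attacker wants to learn


def observed_length(attacker_input):
    """Length an eavesdropper would see, assuming encryption preserves size.

    The attacker controls part of the request; the secret is added by the victim.
    """
    request = f"GET /?q={attacker_input} HTTP/1.1\r\nCookie: {SECRET}\r\n"
    return len(zlib.compress(request.encode()))


right = observed_length("sessionid=7a3f9c")  # guess matches the secret
wrong = observed_length("sessionid=XQZWKJ")  # guess does not match

# A matching guess is replaced by one long back-reference, so it compresses better.
print(right < wrong)  # True
```

The real attack refines the guess one character at a time; this only shows why the compressed size gives the guess away.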

[–] [email protected] 1 points 1 year ago

If that's true, what's to stop someone else from just compressing it themself and opening the same attack vector?

[–] [email protected] -1 points 1 year ago

Doesn't actually matter