AI language models can exceed PNG and FLAC in lossless compression, says study

FlickOfTheBean@beehaw.org · 9 months ago

skip0110@lemm.ee · edit-2 9 months ago

I think this model has billions of weights, so the model itself is quite large. Since the receiver needs to already have this model, I’d suggest that rather than compressing the data, we have instead pre-encoded it and embedded it in the model weights. The “compression” is then basically just passing a primary key that points to the data inside the model.

It’s like, if you already have a copy of a book, I can “compress” any text in that book into 2 numbers: a page offset, and a word offset on that page. But that’s cheating because, at some point, we had to transfer the book too!
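To make the book analogy concrete, here is a rough sketch of that kind of “compression”. Everything in it is made up for illustration: `BOOK`, `encode`, and `decode` are hypothetical names, the book is two tiny pages, and the pointer carries a third number (a word count) so that a multi-word phrase can be recovered.

```python
# Hypothetical shared "book": sender and receiver must both already hold it.
BOOK = [
    "it was the best of times it was the worst of times".split(),
    "it was the age of wisdom it was the age of foolishness".split(),
]

def encode(phrase):
    """Return (page, word_offset, word_count) locating phrase in BOOK, or None."""
    words = phrase.split()
    for page_no, page in enumerate(BOOK):
        for start in range(len(page) - len(words) + 1):
            if page[start:start + len(words)] == words:
                return (page_no, start, len(words))
    return None  # text that is not in the book cannot be "compressed" this way

def decode(pointer):
    """Rebuild the phrase from a pointer; only works if you own the same BOOK."""
    page_no, start, count = pointer
    return " ".join(BOOK[page_no][start:start + count])

pointer = encode("the age of wisdom")
print(pointer, "->", decode(pointer))
```

The pointer is tiny, but only because the whole book was shipped to the receiver beforehand, which is the cost the pointer size leaves out.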

puttputt@beehaw.org · 9 months ago

Yeah, it’s like saying I can “compress” a PNG of the Mona Lisa to just the string “Mona Lisa” because I have a database of art.

Coffee Junky ❤️@beehaw.org · 9 months ago

I feel it’s somewhere in the middle. Your book example only works if you already have the book. But if this is a model of a few gigabytes that works for every movie or audio file, it can still be usable. In that case it’s not that you have to send the book first, but both sides do need to have the same dictionary.
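The “same dictionary” idea already exists in ordinary compressors. As a small-scale sketch (not how the study’s model works), Python’s `zlib` accepts a preset dictionary via the `zdict` argument. The dictionary contents, the message, and the helper names below are assumptions made up for the example.

```python
import zlib

# Hypothetical shared dictionary: both sides must hold these exact bytes.
SHARED_DICT = b"lossless compression language model weights dictionary "

def compress(data, zdict=None):
    """Deflate data, optionally primed with a preset dictionary."""
    if zdict:
        c = zlib.compressobj(level=9, zdict=zdict)
    else:
        c = zlib.compressobj(level=9)
    return c.compress(data) + c.flush()

def decompress(blob, zdict=None):
    """Inflate blob; needs the same dictionary that was used to compress it."""
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob) + d.flush()

message = b"language model weights lossless compression dictionary"
plain = compress(message)
primed = compress(message, SHARED_DICT)

print("raw bytes:            ", len(message))
print("zlib, no dictionary:  ", len(plain))
print("zlib, shared dict:    ", len(primed))
# Honest accounting if the dictionary had to be sent along with the message:
print("shared dict + payload:", len(SHARED_DICT) + len(primed))
```

The dictionary only pays off if its one-time cost is spread over many messages, which is the same trade-off as a multi-gigabyte model that works for every file.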