#1
Posted 24 May 2016 - 10:37 PM
#2
Posted 24 May 2016 - 11:00 PM
#4
Posted 24 May 2016 - 11:26 PM
File compression works in a number of ways, each using some type of algorithm. From what I have seen there are two major types: algorithms that reduce redundancy and (attempt to) reduce file size, and algorithms that raise redundancy while keeping the file size about the same. For the rest of this post I will refer to entropy rather than redundancy. The entropy of a file determines how "compressible" it is; entropy is essentially how random the data in a file is. The more you compress a file, the greater its entropy becomes, and thus the harder it is to compress any further.
The most common type of compression I see is called RLE, or run-length encoding. This is, in my opinion, one of the simplest to implement while still being quite effective. It essentially stores the "count" of a run of identical bytes, so the run is saved as count, byte or something similar. The other compression algorithms I most often see in major file types are Huffman and LZW. Huffman is a variable-length bit encoding: it makes various characters take up more or fewer than 8 bits depending on how common they are. LZW I haven't actually programmed into anything yet, so I think I will let Bomb Bloke explain that one, since he knows it very well and I know he will show up. More minor algorithms include the BWT and MTF. These aim to lower entropy rather than lower file size, so that other algorithms such as RLE can then compress the data further and more effectively. There are tons more algorithms designed for specific situations. If you would like to see MOST of the ones I listed implemented in Lua, to help you understand exactly how everything works, feel free to see the paste below.
compression algorithms
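To give a rough idea of what the RLE approach above boils down to, here is a minimal sketch in Lua (my own illustration, not code from the paste; the function names are made up, and run lengths are capped at 255 so each count fits in a single byte):

local function rleEncode(data)
  local out = {}
  local i, len = 1, #data
  while i <= len do
    local byte = data:sub(i, i)
    local count = 1
    -- extend the run, but stop at 255 so the count still fits in one byte
    while count < 255 and i + count <= len and data:sub(i + count, i + count) == byte do
      count = count + 1
    end
    out[#out + 1] = string.char(count) .. byte
    i = i + count
  end
  return table.concat(out)
end

local function rleDecode(data)
  local out = {}
  -- the encoded stream is just (count, byte) pairs
  for i = 1, #data, 2 do
    out[#out + 1] = data:sub(i + 1, i + 1):rep(data:byte(i))
  end
  return table.concat(out)
end

A thousand copies of the same character compress down to eight bytes this way, while completely random data roughly doubles in size - which is exactly the trade-off discussed further down the thread.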
#5
Posted 24 May 2016 - 11:58 PM
#6
Posted 25 May 2016 - 01:05 AM
#7
Posted 25 May 2016 - 01:10 AM
I have no idea what you want to compress (if you want to compress anything at all), but you might be able to find something similar done previously by someone else that you can learn a lot from.
You might also want to look into something like inter-frame compression, which is used for video data. There are a lot of algorithms for compressing different types of data, and you can learn something from each of them, too.
#8
Posted 25 May 2016 - 01:28 AM
There are loads of different ways of doing this, and there are pros and cons to each.
For example, RLE is a very simple scheme, whereby you simply tally up "runs" of identical bytes, and replace them with a few codes saying "repeat this byte this many times". On a pure-white image, for example, or a text file full of spaces, whatever, that sort of compression will shrink things down nicely - and hardly any RAM or CPU power is required to perform it.
But what if the original data starts out a bit denser than that? For example, what if you created a text document out of a thousand completely random characters? There would hardly be any runs at all - and so attempting to apply RLE might even increase the file size!
This is where you might perform a Burrows-Wheeler Transform - this rather clever technique allows you to reshuffle the content of a set of data, such that the size remains the same, the result can be reversed, and runs become more common. This is not a form of compression in itself, but it does lower entropy within the file (creating low-density pockets of information), meaning that a real compression technique (such as RLE) can be used more effectively. You still pay for it - a BWT chews a bit of RAM and processor time.
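As a quick (and deliberately naive) illustration of the "reversible" part, the transform can be undone like this in Lua, assuming a unique marker character was appended to the text before transforming (as the encode function later in this thread does). This sketch of my own rebuilds the whole rotation table, so it's far too slow and memory-hungry for real use, but it shows that no extra information needs to be stored:

local function inverseBWTNaive(bwt, markChar)
  local n = #bwt
  local rows = {}
  for i = 1, n do rows[i] = "" end
  -- each pass prepends the transformed string as a new column, then sorts;
  -- after n passes the rows are the sorted rotations of the original text
  for _ = 1, n do
    for i = 1, n do
      rows[i] = bwt:sub(i, i) .. rows[i]
    end
    table.sort(rows)
  end
  -- the original text is the rotation that ends with the marker
  for i = 1, n do
    if rows[i]:sub(n, n) == markChar then
      return rows[i]:sub(1, n - 1)
    end
  end
end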
Then you've got other techniques. LZW, for example, is a dictionary based system that works by building a table filled with streams of bytes, and replacing matches for those streams within the data to compress with just the table index values. The clever bit is that the dictionary table doesn't need to be saved alongside the output - the decompressor can figure out that table's content based on the order of the indexes found in the compressed file! Again, more work for the processor and RAM chips, but dramatically better results than basic RLE can provide.
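A bare-bones sketch of the dictionary-building half of that might look like this in Lua (again, just my own illustration; it returns the index stream as a plain table of numbers, and packing those numbers into bytes on disk is left out):

local function lzwCompress(data)
  -- seed the dictionary with every possible single byte
  local dict, nextCode = {}, 257
  for i = 0, 255 do dict[string.char(i)] = i + 1 end
  local output, current = {}, ""
  for i = 1, #data do
    local ch = data:sub(i, i)
    local combined = current .. ch
    if dict[combined] then
      -- keep extending the match while the dictionary still contains it
      current = combined
    else
      -- emit the index of the longest known match, then learn the new string
      output[#output + 1] = dict[current]
      dict[combined] = nextCode
      nextCode = nextCode + 1
      current = ch
    end
  end
  if current ~= "" then output[#output + 1] = dict[current] end
  return output
end

The decompressor seeds its dictionary with the same 256 single-byte entries and learns each new entry one step behind the compressor, which is why the table never has to be written out.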
LZ77 is similar in spirit, except that instead of building an explicit table it emits references pointing back into the data it has already processed (a sliding window), so the "dictionary" is effectively the data itself. Some schemes instead ship a pre-built dictionary file on disk (making things a lot simpler for the decompressor) - and if multiple files are compressed against that one dictionary file, you can still achieve a smaller total file size overall.
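For the curious, a toy version of that sliding-window idea in Lua could look like the following (my own sketch; it emits {distance, length, next character} triples as a plain table and makes no attempt at an efficient match search):

local function lz77Compress(data, windowSize)
  windowSize = windowSize or 255
  local out, pos, len = {}, 1, #data
  while pos <= len do
    -- look back through the window for the longest match with what comes next
    local bestLen, bestDist = 0, 0
    for start = math.max(1, pos - windowSize), pos - 1 do
      local matchLen = 0
      while pos + matchLen <= len - 1
            and data:byte(start + matchLen) == data:byte(pos + matchLen) do
        matchLen = matchLen + 1
      end
      if matchLen > bestLen then
        bestLen, bestDist = matchLen, pos - start
      end
    end
    -- emit how far back to copy from, how much to copy, and the literal that follows
    out[#out + 1] = { bestDist, bestLen, data:sub(pos + bestLen, pos + bestLen) }
    pos = pos + bestLen + 1
  end
  return out
end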
Then you've got Huffman coding, which essentially removes slack space within bytes by swapping them for bit patterns taken from a binary tree - common bytes get short patterns, rare ones get longer patterns. That technique also uses a dictionary, and needs to include it alongside the compressed output.
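The tree-building part of that is small enough to sketch here too (my own rough illustration; it only produces the bit patterns for each byte, and neither packs them nor stores the dictionary alongside the output):

local function huffmanCodes(data)
  -- count how often each byte appears
  local freq = {}
  for i = 1, #data do
    local ch = data:sub(i, i)
    freq[ch] = (freq[ch] or 0) + 1
  end
  -- one leaf node per distinct byte
  local nodes = {}
  for ch, count in pairs(freq) do
    nodes[#nodes + 1] = { weight = count, symbol = ch }
  end
  if #nodes == 0 then return {} end
  -- repeatedly merge the two lightest nodes until a single tree remains
  while #nodes > 1 do
    table.sort(nodes, function(a, b) return a.weight > b.weight end)
    local right = table.remove(nodes)
    local left = table.remove(nodes)
    nodes[#nodes + 1] = { weight = left.weight + right.weight, left = left, right = right }
  end
  -- walk the tree, recording the path taken to each leaf as its bit pattern
  local codes = {}
  local function walk(node, path)
    if node.symbol then
      codes[node.symbol] = (path == "") and "0" or path
    else
      walk(node.left, path .. "0")
      walk(node.right, path .. "1")
    end
  end
  walk(nodes[1], "")
  return codes
end

The bytes that turned up most often end up near the top of the tree and so get the shortest patterns, which is where the space saving comes from.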
When push comes to shove, though, there's a maximum amount of information density that you can sink into a given number of bytes. One byte can only represent 256 different values, after all - your choice of compression technique basically defines which values those are. Hence some compression techniques are better for some distributions of data, and some compression techniques are better for others. There's no one single, perfect method that suits all use cases ideally.
#9
Posted 25 May 2016 - 03:38 PM
#11
Posted 27 May 2016 - 01:35 AM
This:
local returnValue = "" for i=1,#rotations do returnValue = returnValue..string.sub(rotations[i],#rotations[i]) coroutine.yield() end return returnValue
... would technically go faster like this (assuming any appreciable amount of data):
local returnValue = {}
for i = 1, #rotations do
  local thisRotation = rotations[i]
  returnValue[i] = thisRotation:sub(#thisRotation)
end
return table.concat(returnValue)
... but a far better speed gain is available for your decoder, where you only need to perform one sort (and don't need to populate a table with full-sized strings, either). This article covers the relevant technique.
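I can't tell exactly which method that article describes, but for reference, one way to decode with a single sort and no rotated copies looks like this (my own sketch, using the same markChar convention as the encode function below, and assuming the marker doesn't otherwise appear in the data; each character of the transformed string is paired with its position, and the positions recorded by the sort double as the links needed to walk the text back out):

local function decode(bwt, markChar)
  local n = #bwt
  -- pair every character of the transformed string with its position
  local entries = {}
  for i = 1, n do entries[i] = { ch = bwt:sub(i, i), pos = i } end
  -- one sort, with ties broken by position so equal characters keep their order
  table.sort(entries, function(a, b)
    if a.ch == b.ch then return a.pos < b.pos end
    return a.ch < b.ch
  end)
  -- start from the row whose last character is the marker: that row is the original text
  local row
  for i = 1, n do
    if bwt:sub(i, i) == markChar then
      row = i
      break
    end
  end
  -- entries[row].ch is the next character of the text, and entries[row].pos
  -- is the row holding the rotation that starts one character later
  local out = {}
  for k = 1, n do
    out[k] = entries[row].ch
    row = entries[row].pos
  end
  return table.concat(out, "", 1, n - 1)  -- drop the marker from the end
end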
#13
Posted 27 May 2016 - 01:28 PM
#14
Posted 27 May 2016 - 02:25 PM
Did you actually test it in-game? I've come to suspect LuaJ is a lot more tolerant than some other VMs out there. (Related: MC 1.8.9 has a much better garbage collector system than 1.7.10.)
If the worst comes to worst, you could always just... not store rotated copies of the string. Build a table with numbers 1 through to #stringLength in it instead, then sort it like this:
function encode(theString)
  local theString = theString .. markChar  -- markChar: the unique end-of-string marker defined elsewhere
  local numbersTable, stringLen = {}, #theString
  for i = 1, stringLen do
    numbersTable[i] = i
  end
  -- sort rotation start indexes rather than full rotated copies of the string
  table.sort(numbersTable, function(a, b)
    a = theString:sub(stringLen - a + 1) .. theString:sub(1, stringLen - a)
    b = theString:sub(stringLen - b + 1) .. theString:sub(1, stringLen - b)
    return a < b
  end)
  local results = {}
  for i = 1, stringLen do
    -- grab the character that precedes each sorted rotation (the last column)
    local thisNum = stringLen - numbersTable[i]
    if thisNum == 0 then thisNum = stringLen end
    results[i] = theString:sub(thisNum, thisNum)
  end
  return table.concat(results)
end
I wouldn't expect it to operate at the same speed (there should theoretically be quite a few more function calls in there), but it ought to reduce the stress on your RAM.
You can often swap processing time for memory space in this sort of manner.
Edited by Bomb Bloke, 27 May 2016 - 02:36 PM.
#15
Posted 27 May 2016 - 02:31 PM
#16
Posted 27 May 2016 - 02:36 PM
Yeah, um, maybe go with the code above, then.
Edited by Bomb Bloke, 27 May 2016 - 02:44 PM.
#17
Posted 27 May 2016 - 02:38 PM
#18
Posted 27 May 2016 - 09:32 PM
Now I just need to adjust my compression algorithm so that it can handle runs of more than 255 identical characters in a row.
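If it helps, the usual way to handle that is simply to split any long run into chunks of at most 255 before writing it out; something along these lines (a hypothetical helper of my own, since the actual output format used by the algorithm here isn't shown):

-- emits one or more (count, byte) pairs so no single count exceeds 255
local function writeRun(out, count, byte)
  while count > 255 do
    out[#out + 1] = string.char(255) .. byte
    count = count - 255
  end
  out[#out + 1] = string.char(count) .. byte
end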
Edited by Selim, 28 May 2016 - 03:09 AM.
#19
Posted 09 June 2016 - 06:01 PM
#20