Compression for Vic-20
Posted: Sat Feb 17, 2018 2:40 pm
I have been thinking about on-the-fly decompression for our Vic-20, and this is my scheme. It is somewhat different from byte pairing, but contains tables for that as well. I am probably reinventing the wheel, but it's fun to think about and try out different schemes for speed/compression ratio (and code size).
The first half of the Codebyte (bits 7-4) is a 4-bit value that specifies a length in bytes. Let's call this value Codelength.
The second half of the Codebyte (bits 3-0) is a 4-bit branch value; what happens next depends on it. Let's call this value Codebranch.
If
Codelength=0 --> No compression. Codebranch gives the number of following bytes that are stored uncompressed.
Codelength=1 --> Codebranch is an index into a table of 16-bit values (up to 16 entries).
Codelength>1 --> Codebranch is an index into a table of 8-bit values; the selected byte is repeated Codelength times.
So each segment of coded data is defined by a length byte, and each segment carries its own tables: 32 bytes of 16-bit values (16 entries of 2 bytes) and up to 16 bytes of 8-bit values. Since the two tables take up to 48 bytes combined, a segment has to save more than that to be worth compressing, so there is a limit on how small a segment can sensibly be; we should probably add a constant like +48 when budgeting the segment length.
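The nibble split and the three branch rules above can be sketched in Python (just an illustration for checking the scheme on the PC side; the function name and the branch labels are my own):

```python
def parse_codebyte(codebyte):
    """Split a Codebyte into (Codelength, Codebranch) and name the branch.

    Codelength = high nibble (bits 7-4), Codebranch = low nibble (bits 3-0).
    """
    codelength = (codebyte >> 4) & 0x0F
    codebranch = codebyte & 0x0F
    if codelength == 0:
        action = "literal run"    # Codebranch uncompressed bytes follow
    elif codelength == 1:
        action = "16-bit lookup"  # Codebranch indexes the 16-bit table
    else:
        action = "8-bit repeat"   # table byte repeated Codelength times
    return codelength, codebranch, action
```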
Example
- First Codebyte is $05, meaning Codelength=0 and Codebranch=5, i.e. five uncompressed bytes follow.
- After those first 6 bytes, the next Codebyte is $42. Codelength=4 means that entry 2 in the 8-bit table is repeated 4 times.
- The eighth byte (a Codebyte) is $12. Codelength=1 means we look up entry 2 in the 16-bit table and insert that word.
And so on.
Compressed, the above is 8 bytes in the code stream plus 3 bytes of table data (11 bytes), while uncompressed it is 5+4+2 = 11 bytes. Breaking even is not very promising, but remember that the 16-bit table is built from recurring 16-bit combinations: each entry is stored only once but can be referenced from several places, so we get better compression over larger code segments.
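A quick tally of the example's byte counts (assuming the literal run is five bytes and each table entry is stored once):

```python
# Byte tally for the example above (all counts from the worked example):
code_stream  = 1 + 5 + 1 + 1   # $05 Codebyte, 5 literals, $42, $12
table_bytes  = 1 + 2           # one 8-bit entry, one 16-bit entry
uncompressed = 5 + 4 + 2       # literals, repeated run, 16-bit word
print(code_stream + table_bytes, uncompressed)  # 11 11
```

The second and third Codebytes only pay off once their table entries are reused by other parts of the segment.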
Each segment is laid out like this:
Code:
Byteno  Data
1       Segment length
2       First Codebyte
3->     First data byte(s) or second Codebyte, and so on.
The decompression code could be something like this:
Code:
        LDY segmentbyte      ; segment length
        STY $01
        LDX #$00
        STX $02              ; write position in decompressed data
        STX $03              ; read position in the code segment
mainloop:
        LDX $03
        LDA segmentbyte+1,X  ; fetch next Codebyte
        INX
        STX $03
        TAY                  ; keep the full Codebyte in Y
        AND #$f0             ; isolate Codelength (high nibble)
        BEQ nocompress       ; Codelength = 0: literal run
        CMP #$10
        BNE compress         ; Codelength > 1: repeated byte
        TYA                  ; Codelength = 1: 16-bit table lookup
        AND #$0f
        ASL                  ; two bytes per table entry
        TAY
        LDX $02
        LDA table16,Y
        STA data,X
        INX
        LDA table16+1,Y
        STA data,X
        INX
        STX $02
        JMP next
nocompress:
        TYA
        AND #$0f             ; number of literal bytes
        CLC
        ADC $02
        STA comp+1           ; end position (self-modifying compare below)
        LDY $02
        LDX $03
loop:
        LDA segmentbyte+1,X
        STA data,Y
        INX
        INY
comp:
        CPY #$00             ; operand patched above
        BNE loop
        STY $02
        STX $03
        JMP next
compress:
        LSR                  ; shift Codelength down into the low nibble
        LSR
        LSR
        LSR
        TAX                  ; X = repeat count (2-15)
        TYA
        AND #$0f
        TAY
        LDA table8,Y         ; the byte value to repeat
        LDY $02
loop2:
        STA data,Y
        INY
        DEX
        BNE loop2
        STY $02
next:
        LDA $03
        CMP $01              ; whole segment processed?
        BEQ exit
        JMP mainloop
exit:
        RTS
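For testing the scheme on the PC side before writing an encoder, the routine above can be mirrored in Python (a sketch under my reading of the format: segment[0] is the length, the code stream follows, and table16 holds (lo, hi) byte pairs; the names are assumptions):

```python
def decompress_segment(segment, table8, table16):
    """Reference model of the decompression loop above."""
    length = segment[0]
    out = []
    pos = 0                               # read position in the code stream
    while pos < length:
        codebyte = segment[1 + pos]
        pos += 1
        codelength = codebyte >> 4
        codebranch = codebyte & 0x0F
        if codelength == 0:               # literal run of codebranch bytes
            out.extend(segment[1 + pos : 1 + pos + codebranch])
            pos += codebranch
        elif codelength == 1:             # insert one 16-bit table entry
            lo, hi = table16[codebranch]
            out.extend([lo, hi])
        else:                             # 8-bit entry repeated codelength times
            out.extend([table8[codebranch]] * codelength)
    return out
```

Feeding it the worked example ($05, five literals, $42, $12) should give back the 11 decompressed bytes.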
Not so large. So is this very different from other compression algorithms on the Vic-20? What are the alternatives, and how large is the decompression code for them?