Compression for Vic-20
Posted: Sat Feb 17, 2018 2:40 pm
I have been thinking about on-the-fly decompression for our Vic-20, and this is my scheme. It is somewhat different from byte pairing, but contains tables for that as well. I am probably reinventing the wheel, but it's fun to think about and try out different schemes for speed/compression ratio (and code size).
The first half of the Codebyte (bits 7-4) is a 4-bit value that specifies a length in bytes. Let's call this value Codelength.
The second half of the Codebyte (bits 3-0) is a 4-bit branch value; what happens next depends on it. Let's call this value Codebranch.
If
Codelength=0 --> No compression. Codebranch gives the number of following bytes that are stored uncompressed.
Codelength=1 --> Codebranch is an index into a table of 16-bit values (up to 16 entries).
Codelength>1 --> Codebranch is an index into a table of 8-bit values; the selected byte is repeated Codelength times.
So each segment of coded data is defined by a length byte, and each segment carries its own tables: 32 bytes of 16-bit values (16 entries of 2 bytes) and up to 16 bytes of 8-bit values. Since the two tables take up to 48 bytes combined, a segment has to save more than that to be worth compressing, so there is a limit on how small a segment can sensibly be; we should probably add a constant like +48 when budgeting the segment length.
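The nibble split and the three branch rules above can be sketched in Python (just an illustration for checking the scheme on the PC side; the function name and the branch labels are my own):

```python
def parse_codebyte(codebyte):
    """Split a Codebyte into (Codelength, Codebranch) and name the branch.

    Codelength = high nibble (bits 7-4), Codebranch = low nibble (bits 3-0).
    """
    codelength = (codebyte >> 4) & 0x0F
    codebranch = codebyte & 0x0F
    if codelength == 0:
        action = "literal run"    # Codebranch uncompressed bytes follow
    elif codelength == 1:
        action = "16-bit lookup"  # Codebranch indexes the 16-bit table
    else:
        action = "8-bit repeat"   # table byte repeated Codelength times
    return codelength, codebranch, action
```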
Example
- First Codebyte is $05, meaning Codelength=0 and Codebranch=5, i.e. five uncompressed bytes follow.
- After those first 6 bytes, the next Codebyte is $42. Codelength=4 means that entry 2 in the 8-bit table is repeated 4 times.
- The eighth byte (a Codebyte) is $12. Codelength=1 means we look up entry 2 in the 16-bit table and insert that word.
And so on.
Compressed, the above is 8 bytes in the code stream plus 3 bytes of table data (11 bytes), while uncompressed it is 5+4+2 = 11 bytes. Breaking even is not very promising, but remember that the 16-bit table is built from recurring 16-bit combinations: each entry is stored only once but can be referenced from several places, so we get better compression over larger code segments.
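A quick tally of the example's byte counts (assuming the literal run is five bytes and each table entry is stored once):

```python
# Byte tally for the example above (all counts from the worked example):
code_stream  = 1 + 5 + 1 + 1   # $05 Codebyte, 5 literals, $42, $12
table_bytes  = 1 + 2           # one 8-bit entry, one 16-bit entry
uncompressed = 5 + 4 + 2       # literals, repeated run, 16-bit word
print(code_stream + table_bytes, uncompressed)  # 11 11
```

The second and third Codebytes only pay off once their table entries are reused by other parts of the segment.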
Each segment is laid out like this:
Code:
Byteno  Data
1       Segment length
2       First Codebyte
3->     First data byte(s) or second Codebyte, and so on.
The decompression code could be something like this:
Code:
        LDY segmentbyte      ; segment length
        STY $01
        LDX #$00
        STX $02              ; write position in decompressed data
        STX $03              ; read position in the code segment
mainloop:
        LDX $03
        LDA segmentbyte+1,X  ; fetch next Codebyte
        INX
        STX $03
        TAY                  ; keep the full Codebyte in Y
        AND #$f0             ; isolate Codelength (high nibble)
        BEQ nocompress       ; Codelength = 0: literal run
        CMP #$10
        BNE compress         ; Codelength > 1: repeated byte
        TYA                  ; Codelength = 1: 16-bit table lookup
        AND #$0f
        ASL                  ; two bytes per table entry
        TAY
        LDX $02
        LDA table16,Y
        STA data,X
        INX
        LDA table16+1,Y
        STA data,X
        INX
        STX $02
        JMP next
nocompress:
        TYA
        AND #$0f             ; number of literal bytes
        CLC
        ADC $02
        STA comp+1           ; end position (self-modifying compare below)
        LDY $02
        LDX $03
loop:
        LDA segmentbyte+1,X
        STA data,Y
        INX
        INY
comp:
        CPY #$00             ; operand patched above
        BNE loop
        STY $02
        STX $03
        JMP next
compress:
        LSR                  ; shift Codelength down into the low nibble
        LSR
        LSR
        LSR
        TAX                  ; X = repeat count (2-15)
        TYA
        AND #$0f
        TAY
        LDA table8,Y         ; the byte value to repeat
        LDY $02
loop2:
        STA data,Y
        INY
        DEX
        BNE loop2
        STY $02
next:
        LDA $03
        CMP $01              ; whole segment processed?
        BEQ exit
        JMP mainloop
exit:
        RTS
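For testing the scheme on the PC side before writing an encoder, the routine above can be mirrored in Python (a sketch under my reading of the format: segment[0] is the length, the code stream follows, and table16 holds (lo, hi) byte pairs; the names are assumptions):

```python
def decompress_segment(segment, table8, table16):
    """Reference model of the decompression loop above."""
    length = segment[0]
    out = []
    pos = 0                               # read position in the code stream
    while pos < length:
        codebyte = segment[1 + pos]
        pos += 1
        codelength = codebyte >> 4
        codebranch = codebyte & 0x0F
        if codelength == 0:               # literal run of codebranch bytes
            out.extend(segment[1 + pos : 1 + pos + codebranch])
            pos += codebranch
        elif codelength == 1:             # insert one 16-bit table entry
            lo, hi = table16[codebranch]
            out.extend([lo, hi])
        else:                             # 8-bit entry repeated codelength times
            out.extend([table8[codebranch]] * codelength)
    return out
```

Feeding it the worked example ($05, five literals, $42, $12) should give back the 11 decompressed bytes.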
Not so large. So is this very different from other compression algorithms on the Vic-20? What are the alternatives, and how large is the decompression code for them?