Code compression with bytecode

Basic and Machine Language

FD22
Vic 20 Hobbyist
Posts: 148
Joined: Mon Feb 15, 2010 12:31 pm

Re: Code compression with bytecode

Post by FD22 »

Fastest way to initialise .A, .X and .Y to zero, if you don't mind undocumented opcodes:

  .byte $AB, $00   ; ZAX (LAX #$00): 2 bytes, 2 cycles, .A = .X = 0
  TAY              ; 1 byte, 2 cycles, .Y = 0
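ZAX is 'Zero .A and .X': the zero-operand form of LAX, the undocumented 'Load .A and .X' instruction (opcode $AB in immediate mode). In immediate mode LAX is unstable, because the loaded value depends on an unpredictable, chip-dependent constant ORed into .A. An operand of zero masks that constant away, so the result is always zero.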
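To see why only the zero operand is safe, here is a small Python model. It assumes the commonly cited behaviour of the unstable LAX #imm (also called LXA): A = X = (A | magic) & imm, where `magic` varies from chip to chip. The model checks every starting .A and every possible `magic` value. It also totals bytes and cycles from the standard NMOS 6502 tables. The function names are made up for the illustration.

```python
# Assumption: unstable LAX #imm ($AB, a.k.a. LXA) behaves as
#   A = X = (A | magic) & imm
# where 'magic' is a chip-dependent value (commonly reported as $EE, $FF or $00).

def lax_immediate(a: int, imm: int, magic: int) -> int:
    """Value loaded into both .A and .X by LAX #imm under the assumed model."""
    return (a | magic) & imm & 0xFF


def is_stable(imm: int) -> bool:
    """True if LAX #imm gives one result for every starting .A and magic value."""
    results = {lax_immediate(a, imm, m) for a in range(256) for m in range(256)}
    return len(results) == 1


# (bytes, cycles) per instruction on the NMOS 6502.
LDA_IMM = (2, 2)
LAX_IMM = (2, 2)
TAX = (1, 2)
TAY = (1, 2)


def cost(*instructions: tuple[int, int]) -> tuple[int, int]:
    """Total (bytes, cycles) of a straight-line instruction sequence."""
    return (sum(b for b, _ in instructions), sum(c for _, c in instructions))


if __name__ == "__main__":
    print(is_stable(0x00))          # True: anything AND 0 is 0
    print(is_stable(0x01))          # False: the result can be 0 or 1
    print(cost(LDA_IMM, TAX, TAY))  # (4, 6)
    print(cost(LAX_IMM, TAY))       # (3, 4)
```

With these numbers, ZAX plus TAY saves one byte and two cycles compared with the documented LDA #0 / TAX / TAY.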
pixel
Vic 20 Scientist
Posts: 1329
Joined: Fri Feb 28, 2014 3:56 am
Website: http://hugbox.org/
Location: Berlin, Germany
Occupation: Pan–galactic shaman

Re: Code compression with bytecode

Post by pixel »

johncl wrote: … Then I checked out exomizer and it was able to squeeze those sprites down even further than my own highly specialized routine. :)
I am very impressed by exomizer's performance as well; it always pays off to read a good book on the topic, I guess. I didn't get around to hacking that threaded-code idea to see whether it's worth it, compared with grabbing all the memory exomizer would need instead. It rather depends on the nature of your application.
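The threaded-code idea isn't spelled out here, so the following is only an illustration of one common form, token threading, written as a small Python model. Every program byte is an index into a dispatch table. A repeated sequence can be stored once as a "word" and invoked with a single byte. The token values, the three primitives and the END marker are all invented for the sketch. A real 6502 version would dispatch through a jump table in assembly.

```python
from typing import Callable

State = dict[str, int]


def inc_a(s: State) -> None:
    s["a"] = (s["a"] + 1) & 0xFF


def asl_a(s: State) -> None:
    s["a"] = (s["a"] << 1) & 0xFF


def tay(s: State) -> None:
    s["y"] = s["a"]


PRIMITIVES: list[Callable[[State], None]] = [inc_a, asl_a, tay]  # tokens 0, 1, 2
WORD_BASE = 0x80  # tokens 0x80..0xFE index WORDS (hypothetical layout)
END = 0xFF        # ends a word or the whole program

WORDS: list[bytes] = [
    bytes([0, 1, 2, END]),  # token 0x80: inc a, asl a, tay
]


def run(program: bytes, state: State) -> State:
    """Interpret token-threaded code using an explicit return stack."""
    stack: list[tuple[bytes, int]] = []
    code, ip = program, 0
    while True:
        token = code[ip]
        ip += 1
        if token == END:
            if not stack:
                return state
            code, ip = stack.pop()            # return from a word
        elif token >= WORD_BASE:
            stack.append((code, ip))          # call a word
            code, ip = WORDS[token - WORD_BASE], 0
        else:
            PRIMITIVES[token](state)          # execute a primitive
```

Here `run(bytes([0x80, 0x80, END]), {"a": 0, "y": 0})` executes the shared word twice in 3 program bytes instead of 7. The word's own four bytes are paid only once, so the saving grows with each reuse. The price is an interpreter and a dispatch on every token.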
johncl wrote: I know a lot of what could be optimized through some bytecode scheme can also be optimized away at a later stage, like figuring out when you can leave out the clc before an adc, although in many cases my adcs are really adding to the y register used as the index in an indirect lda/sta inside a loop. So there are quite a few of these (tya, clc, adc #NN, tay) to be found; if not for the variable NN, I guess I could use a jsr to save some bytes at the expense of speed. A deflated bytecode version would of course be faster, except for the time spent on inflation.
In the end, compression works because of repetitive patterns. Kicking out CLCs and such could be quite counterproductive, since it turns otherwise identical sequences into slightly different ones.
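For the tya, clc, adc #NN, tay pattern, the inline-versus-jsr trade-off can be counted directly. The Python sketch below assumes NN is fixed, so the sequence can live in one shared subroutine. Passing a variable NN would add set-up code at every call site and shift the numbers. Byte and cycle counts are the standard NMOS 6502 values; the helper names are illustrative.

```python
# (bytes, cycles) for the NMOS 6502 instructions involved.
TYA = (1, 2)
CLC = (1, 2)
ADC_IMM = (2, 2)
TAY = (1, 2)
JSR = (3, 6)
RTS = (1, 6)

BODY = [TYA, CLC, ADC_IMM, TAY]  # y += NN, assuming a fixed NN


def size(seq: list[tuple[int, int]]) -> int:
    return sum(b for b, _ in seq)


def cycles(seq: list[tuple[int, int]]) -> int:
    return sum(c for _, c in seq)


def inline_cost(call_sites: int) -> tuple[int, int]:
    """(total bytes, cycles per use) when the sequence is repeated inline."""
    return call_sites * size(BODY), cycles(BODY)


def subroutine_cost(call_sites: int) -> tuple[int, int]:
    """(total bytes, cycles per use) with one shared subroutine plus a JSR per site."""
    total_bytes = size(BODY) + size([RTS]) + call_sites * size([JSR])
    per_use = cycles([JSR]) + cycles(BODY) + cycles([RTS])
    return total_bytes, per_use


def break_even() -> int:
    """Smallest number of call sites at which the subroutine is strictly smaller."""
    n = 1
    while subroutine_cost(n)[0] >= inline_cost(n)[0]:
        n += 1
    return n
```

The 5-byte body costs 6 bytes as a subroutine plus 3 per call, so it only starts saving space at the fourth call site. Each use then takes 20 cycles instead of 8.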
johncl wrote: Anyway an interesting thought experiment. :)
I'm glad to read that. :)
johncl wrote: At the moment I'm happy just finding out (as you have seen in my posts) what memory I can use, and I try to do small optimizations, like a print routine that doesn't need zero termination but instead uses a bit in the string bytes to mark the end (or a special code for, e.g., a jump or a color change). The trade-offs are sometimes unclear, though, where a ROM routine might be easier to use; I tend to avoid setting full word pointers and instead bunch text into blocks that can be indexed by a byte.
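The high-bit termination and byte-indexed text blocks described in the quote above can be modelled in a few lines of Python. The sketch assumes the text only uses 7-bit codes. On the VIC-20, bit 7 also means reverse video in screen codes and selects shifted characters in PETSCII, so this scheme gives that up. The function names are hypothetical.

```python
def encode_block(strings: list[str]) -> bytes:
    """Pack strings back to back; the last byte of each has bit 7 set."""
    out = bytearray()
    for s in strings:
        data = s.encode("ascii")
        if not data or any(b & 0x80 for b in data):
            raise ValueError("need non-empty 7-bit text")
        out += data[:-1]
        out.append(data[-1] | 0x80)  # terminator bit on the final character
    return bytes(out)


def fetch(block: bytes, index: int) -> str:
    """Return string number 'index' (0-255) by skipping earlier strings.

    This mirrors a print routine that takes a one-byte index instead of a
    full word pointer and scans the block for terminator bits."""
    pos = 0
    for _ in range(index):
        while not block[pos] & 0x80:
            pos += 1
        pos += 1  # step past the terminating character
    chars = []
    while True:
        b = block[pos]
        chars.append(chr(b & 0x7F))
        pos += 1
        if b & 0x80:
            return "".join(chars)
```

`encode_block(["HI", "THERE"])` takes 7 bytes where zero-terminated strings would need 9. Scanning from the start of the block trades a little speed for not storing a pointer table.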

Edit: Checking the 6502 opcode table, I see that opcodes with low nybble $3, $7, $B or $F (both low bits set) are not used by the documented instruction set, so they would make nice key bytes for a deflater. No doubt exomizer does any compression better, so I'm sure this is somewhat silly to contemplate. :) But I do see a number of "patterns" in my code when I look at it, like an ldx just before it is used in a sta/lda NN,x (same with y). Of course there are the many lda NN, sta MM pairs (and similar for x and y). And let's not forget initializing a, x and y (a varying number of them) before calling some subroutine.

LDXSTA NN,MM = ldx #NN , sta MM,x

Yay! Saved one byte! :)
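The key-byte idea and the LDXSTA macro from the quote above can be combined into a toy expander, sketched here in Python. Key $03 as the LDXSTA encoding is an assumption, and so is the tiny opcode-length table. The expander has to walk the stream one instruction at a time, because an operand byte may happen to equal a key value. Note also that the undocumented opcodes live in exactly this column (FD22's ZAX is $AB), so code that uses them could not also use these keys.

```python
LDX_IMM, STA_ZPX = 0xA2, 0x95

# Lengths of the few documented opcodes this sketch understands:
# LDA #, STA zp, LDX #, STA zp,X, INX, RTS.
LENGTHS = {0xA9: 2, 0x85: 2, LDX_IMM: 2, STA_ZPX: 2, 0xE8: 1, 0x60: 1}


def is_key(opcode: int) -> bool:
    """Opcodes with both low bits set ($x3, $x7, $xB, $xF) serve as macro keys."""
    return opcode & 0x03 == 0x03


# Hypothetical macro table: key -> (operand count, expansion).
# $03 NN MM  ->  LDX #NN / STA MM,X  (zero-page MM)
MACROS = {
    0x03: (2, lambda nn, mm: bytes([LDX_IMM, nn, STA_ZPX, mm])),
}


def expand(packed: bytes) -> bytes:
    """Expand macro keys, copying ordinary instructions (and their operands) verbatim."""
    out = bytearray()
    pc = 0
    while pc < len(packed):
        op = packed[pc]
        if is_key(op):
            nargs, build = MACROS[op]
            out += build(*packed[pc + 1 : pc + 1 + nargs])
            pc += 1 + nargs
        else:
            length = LENGTHS[op]  # KeyError for opcodes outside this sketch
            out += packed[pc : pc + length]
            pc += length
    return bytes(out)
```

`expand(bytes([0x03, 0x05, 0x10, 0x60]))` yields LDX #$05 / STA $10,X / RTS. That is 4 packed bytes for 5 real ones, which is the one byte saved. An LDA #$03 passes through unchanged because its operand is never treated as an opcode.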
I tried messing around with code like that. It's hopeless; exomizer already gets the math right.

I've found that writing clean, simple code as if you had all the memory in the world, and then tweaking it with an overly motivated mind, is the only way to get something out of it when resources are scarce. My one and only working 6502 program *cough* is proof of that, at least for me. I did it the other way around once too often.
A man without talent or ambition is most easily pleased. Others set his path and he is content.
https://github.com/SvenMichaelKlose
johncl
Vic 20 Amateur
Posts: 58
Joined: Sat Dec 22, 2007 3:17 am

Re: Code compression with bytecode

Post by johncl »

Well, in the other C64 project I'm working on, I use exomizer to unpack each level's or region's sprites as needed. The depacker code takes around 256 bytes, including some custom code and the pointer tables I set up to select which block should be unpacked. In addition, it needs 156 bytes of temporary memory that can be discarded after use (so you can, e.g., use the screen area for it). Depending on your project, it might work even on a 5 KB unexpanded VIC-20 (the depacker code then takes about 5% of memory). It certainly works for packing the whole executable, e.g. if you also want to put code into zero page, which would normally need a two-part loader.
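The 5% figure is 256 bytes out of the nominal 5 KB (5120 bytes). The quick Python check below also relates it to the 3583 bytes BASIC reports free on an unexpanded VIC-20. It also confirms that the 156-byte scratch table fits in the default 22 × 23 screen. The constants come from the post and from the stock machine; a machine-code program can claim more RAM than BASIC reports, so treat the second percentage as an upper bound.

```python
UNEXPANDED_RAM = 5 * 1024  # nominal RAM of an unexpanded VIC-20
BASIC_FREE = 3583          # "BYTES FREE" shown at power-on
SCREEN_BYTES = 22 * 23     # default text screen: 22 columns x 23 rows
DEPACKER = 256             # depacker code plus tables, as quoted above
TEMP_TABLE = 156           # scratch table, reusable after depacking


def share(part: int, whole: int) -> float:
    """Percentage of 'whole' taken up by 'part'."""
    return 100 * part / whole
```

`share(DEPACKER, UNEXPANDED_RAM)` gives exactly 5.0, while against the BASIC free area the depacker takes a little over 7%.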

I believe real-time bytecode decompression would then need a depacker that is both faster and smaller than exomizer's, which could prove tricky to devise. Some sort of Huffman-like byte-pattern scheme might work, and no doubt it would be fun to implement something like this just to see how slow it would run with, e.g., a 1 KB heap for the unpacked "modules". The speed would of course depend heavily on how much of the inner-loop code resides in this heap at any given time. There won't be room for any profiling, and if you start marking priorities, I guess you might as well never compress those routines. :)
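How much the heap size matters can be explored with a small Python model. Compressed modules are unpacked on demand into a 1 KB heap and evicted least-recently-used first. zlib is only a stand-in for whatever small depacker would run on the 6502. The class name and the LRU policy are assumptions for the illustration, not a proposed design.

```python
import zlib
from collections import OrderedDict


class ModuleHeap:
    """Hypothetical on-demand loader: unpack modules into a fixed-size heap,
    evicting the least recently used ones when space runs out."""

    def __init__(self, packed: dict[str, bytes], capacity: int = 1024):
        self.packed = packed
        self.capacity = capacity
        self.resident: OrderedDict[str, bytes] = OrderedDict()
        self.unpacks = 0  # how often the (slow) depacker had to run

    def used(self) -> int:
        return sum(len(code) for code in self.resident.values())

    def call(self, name: str) -> bytes:
        """Return the unpacked module, decompressing it if it is not resident."""
        if name in self.resident:
            self.resident.move_to_end(name)  # mark as recently used
            return self.resident[name]
        code = zlib.decompress(self.packed[name])
        if len(code) > self.capacity:
            raise MemoryError(f"{name} does not fit in the heap")
        while self.used() + len(code) > self.capacity:
            self.resident.popitem(last=False)  # evict least recently used
        self.resident[name] = code
        self.unpacks += 1
        return code
```

Alternating between two 400-byte modules unpacks each only once. With two 600-byte modules, every call evicts the other, and the depacker runs on each call. The model also ignores fragmentation and relocation: 6502 code with absolute JMP/JSR targets is not position-independent, so a real heap would more likely use fixed slots.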