Code compression with bytecode

Basic and Machine Language

Moderator: Moderators

User avatar
pixel
Vic 20 Scientist
Posts: 1338
Joined: Fri Feb 28, 2014 3:56 am
Website: http://hugbox.org/
Location: Berlin, Germany
Occupation: Pan–galactic shaman

Code compression with bytecode

Post by pixel »

Don't know if somebody came up with this idea previously, but I'm pretty sure it didn't happen in the VIC community.

The idea is to move repeated code patterns into subroutines which share a single page and to then have them called by a bytecode interpreter. Each bytecode would be the address on that page. There'd be two kinds of subroutines: one for which the register values are preserved and one for which they aren't, so the interpreter can be extended. You probably wouldn't want to write the subroutines yourself, merely have a tool to convert most of your code. If calculated right, if I had dealt with the game logic that way, pulse would have had half a kilobyte more space left and a bunch of people would hate me now. ;)

Here's what the interpreter would look like:

Code: Select all

interpret:                                                                                                                         
.(
    ldy #0
    lda (bytecode_ptr),y
    beq done
    inc bytecode_ptr
    asl  ; drawback of subroutines only on even addresses
    bcs interpreter_extension
    sta jump+1
    lda bytecode_flags
    ldy bytecode_y
    pha
    lda bytecode_a
    plp
jump:
    jsr bytecode_funs
    sta bytecode_a
    sty bytecode_y
    php
    pla
    sta bytecode_flags
    jmp interpret
call_interpreter_extension:
    sta jump2+1
jump2:
    jsr interpreter_extensions
    jmp interpret
done:
    rts
.)

interpreter_extensions:
bc_lda:
    lda (bytecode_ptr),y
    sta bytecode_a
    inc bytecode_ptr
    rts

bc_bcs:
.(
    lda bytecode_flags
    pha
    plp
    bcc r
    lda (bytecode_ptr),y
    sta bytecode_ptr
r:  inc bytecode_ptr
    rts
.)
A man without talent or ambition is most easily pleased. Others set his path and he is content.
https://github.com/SvenMichaelKlose
FD22
Vic 20 Hobbyist
Posts: 148
Joined: Mon Feb 15, 2010 12:31 pm

Re: Code compression with bytecode

Post by FD22 »

I've experimented with bytecode interpreters in 6502 in the past - the problem is always the tradeoff between size and speed, as inevitably the interpreter and its' child functions are slower to execute than the equivalent 'raw' opcodes.

Where space is at a premium (and on the VIC this is pretty-much all the time) I embed Exomizer to manage runtime decompression, and always slap the final executable through the commandline invocation of it just to crunch the whole lot.
User avatar
Kweepa
Vic 20 Scientist
Posts: 1314
Joined: Fri Jan 04, 2008 5:11 pm
Location: Austin, Texas
Occupation: Game maker

Re: Code compression with bytecode

Post by Kweepa »

It's called BASIC :)
groepaz
Vic 20 Scientist
Posts: 1180
Joined: Wed Aug 25, 2010 5:30 pm

Re: Code compression with bytecode

Post by groepaz »

you could try the "sweet16" interpreter... its originally from the apple2 ROM if i recall correctly, available at wozniacs web site.

and yes, BASIC is the way to go if you are really short on memory :)
I'm just a Software Guy who has no Idea how the Hardware works. Don't listen to me.
RJBowman
Vic 20 Enthusiast
Posts: 198
Joined: Tue Oct 25, 2011 7:50 pm

Re: Code compression with bytecode

Post by RJBowman »

Search for "huffman threaded code"; probably the most compact coding system ever devised, but at the expense of speed.
User avatar
pixel
Vic 20 Scientist
Posts: 1338
Joined: Fri Feb 28, 2014 3:56 am
Website: http://hugbox.org/
Location: Berlin, Germany
Occupation: Pan–galactic shaman

Re: Code compression with bytecode

Post by pixel »

Ah, cool! Thanks! So the name of the game is "threaded code"... https://en.wikipedia.org/wiki/Threaded_code

So, ermh, I wanted to propose direct threading with the size benefits of token threading. This is faster than SWEET16 or BASIC and can provide much more compression. According to the Wikipedia article you can end up with a quarter to eigth of the original code size (see "Token threading"). Generic data compression IMHO won't go much further and is less practical at run-time. I'm still very excited about this.

I wonder how much space a Huffman tree would take.
A man without talent or ambition is most easily pleased. Others set his path and he is content.
https://github.com/SvenMichaelKlose
User avatar
Kweepa
Vic 20 Scientist
Posts: 1314
Joined: Fri Jan 04, 2008 5:11 pm
Location: Austin, Texas
Occupation: Game maker

Re: Code compression with bytecode

Post by Kweepa »

Size of Huffman tree depends on the number of tokens N, the size of a token T, and the size of a jump J in the tree (which maxes out at half the size of the tree).
N * sizeof( T ) + ( N - 1 ) * sizeof( J ).
This is for N = 2^X. Might be a little different if 2^(X-1) < N < 2^X, I don't remember.
RJBowman
Vic 20 Enthusiast
Posts: 198
Joined: Tue Oct 25, 2011 7:50 pm

Re: Code compression with bytecode

Post by RJBowman »

There is an overhead in the huffman tree and code required to make it work. It is only worthwhile if the program is large enough that the compression will save more memory than the huffman overhead consumes.
User avatar
Mike
Herr VC
Posts: 4830
Joined: Wed Dec 01, 2004 1:57 pm
Location: Munich, Germany
Occupation: electrical engineer

Re: Code compression with bytecode

Post by Mike »

IMO, some people should better concentrate on putting out actual, *working* code before theorizing about alternative encoding schemes for their algorithms.

And, for most non-trivial programs, the size of the code is usually dwarfed against the size of the data being processed.
User avatar
pixel
Vic 20 Scientist
Posts: 1338
Joined: Fri Feb 28, 2014 3:56 am
Website: http://hugbox.org/
Location: Berlin, Germany
Occupation: Pan–galactic shaman

Re: Code compression with bytecode

Post by pixel »

Mike wrote:IMO, some people should better concentrate on putting out actual, *working* code before theorizing about alternative encoding schemes for their algorithms.
Sorry. :oops: Couldn't let the tempation pass by with the only VIC-20 programming forum right under my nose. And I don't wanna code all this really.
A man without talent or ambition is most easily pleased. Others set his path and he is content.
https://github.com/SvenMichaelKlose
User avatar
Mike
Herr VC
Posts: 4830
Joined: Wed Dec 01, 2004 1:57 pm
Location: Munich, Germany
Occupation: electrical engineer

Re: Code compression with bytecode

Post by Mike »

PM sent
Vic20-Ian
Vic 20 Scientist
Posts: 1214
Joined: Sun Aug 24, 2008 1:58 pm

Re: Code compression with bytecode

Post by Vic20-Ian »

Mike wrote:IMO, some people should better concentrate on putting out actual, *working* code before theorizing about alternative encoding schemes for their algorithms.
IMHO people should be more friendly on this forum
Vic20-Ian

The best things in life are Vic-20

Upgrade all new gadgets and mobiles to 3583 Bytes Free today! Ready
rhurst
Omega Star Commander
Posts: 1371
Joined: Thu Jan 31, 2008 2:12 pm
Website: https://robert.hurst-ri.us
Location: Providence, RI
Occupation: Tech & Innovation

Re: Code compression with bytecode

Post by rhurst »

I think it's a German thing. It does explain why so much engineering efficiency comes from that demographic; but terse can be misunderstood by us regular folk. :lol:
Any technology distinguishable from magic is insufficiently advanced.
https://robert.hurst-ri.us/rob/retrocomputing
User avatar
pixel
Vic 20 Scientist
Posts: 1338
Joined: Fri Feb 28, 2014 3:56 am
Website: http://hugbox.org/
Location: Berlin, Germany
Occupation: Pan–galactic shaman

Re: Code compression with bytecode

Post by pixel »

rhurst wrote:I think it's a German thing. It does explain why so much engineering efficiency comes from that demographic; but terse can be misunderstood by us regular folk. :lol:
No worries. We deal with each other in your preferred efficient manner. ;)
A man without talent or ambition is most easily pleased. Others set his path and he is content.
https://github.com/SvenMichaelKlose
johncl
Vic 20 Amateur
Posts: 58
Joined: Sat Dec 22, 2007 3:17 am

Re: Code compression with bytecode

Post by johncl »

Smart ways of crunching data is always welcome in my world, and even if some are just theoretical tickling of the brain - I welcome the discussion. Coming from a Java world, some sort of bytecode thing is interesting, and as several have said BASIC is generally just that, but an idea that is more optimized to whatever program you have at hand is interesting. For example my program so far already have lda #0, ldy #0 and ldx #0 about 30 times each taking 2 bytes - actually often 3 bytes including the sta, sty or stx that often comes after (but naturally into different memory). So some places bytecode is an interesting idea, although I can see it requiring dynamic deflating which would take quite a bit of time and embedding e.g exomizer to unpack bits of code needed might be a better and shorter way to the goal (which I am using in a C64 project now). I actually spent several weeks with sprite compression on the C64, even including frame difference compression (may sprites had many parts of the frame identical to the previous in an animation) - and I was really satisfied with the results. Then I checked out exomizer and it was able to squeeze those sprites down even further than my own highly specialized routine. :) - And personally I don't find the exomizer depacker very large (it requires a 156 bytes decrunch table area that can be used freely inbetween depacks). But I am uncertain that I would use it for anything in an unexpanded Vic20 except to allow depacking into areas which ROM does not load into (like the first pages), but even that is solved easily with a boot loader.

I know a lot of what could be optimized through some bytecode thing can be optimized away at a later stage, like figuring out when you can leave out the clc before an adc, although in many cases my adc's are often used to really add the y register being used in an indirect lda/sta in the loop. So there are bit of these (tya, clc, adc #NN, tay) to be found - if not for the variable NN I guess I could do a jsr to save some bytes at the expense of speed. A deflated bytecode version would ofc be faster except for the time deflation happens ofc.

Anyway an interesting thought experiment. :)

Atm I am happy just finding (as you have seen in my posts) what memory I can use - and I try to do short optimizations like a print routine that does not need zero termination but rather use a bit in the string bytes to indicate termination (or special code for e.g. a jump or color change). The tradeoffs are somewhat vague though at times where perhaps a ROM routine would be easier to use, although I tend to try to avoid having to set full word pointers but bunch text into blocks that can be indexed by a byte.

Edit: Checking the 6502 opcode table I see that low nybble values $3, $7, $b and $f are unused (lower two bits set of opcode) so nice byte keys to use as deflater keys. No doubt exomizer does any compression better so I am sure this is somewhat silly to contemplate. :) - But I see a number of "patterns" in my code when I look at it - like ldx before its being used in a sta/lda NN,x (same with y). Of course the many lda NN, sta MM (and similar for x,y). And lets not forget the initialization of a,x,y (varying number of these) before calling some subroutine.

LDXSTA NN,MM = ldx #NN , sta MM,x

Yay! Saved one byte! :)
Post Reply