Dynalife - 160x192 graphics on unexpanded VIC

IsaacKuo · Post by **IsaacKuo** » Sat Aug 29, 2009 11:42 pm

I've written "Dynavic" graphics routines in DASM which allow 160x192 graphics on an unexpanded VIC.

Dynavic utilizes whatever RAM is left over after the end of your program for dynamically allocated custom characters. If you try to draw a pixel in a blank block when there are no more available characters, it will quietly fail to draw the pixel.

To demonstrate the Dynavic routines, I wrote a Game of Life simulator:

http://members.cox.net/mechdan/vic/dynalife.prg

Here's the source code:

http://members.cox.net/mechdan/vic/dynalife.asm

Instructions:

IJKL = move cursor
space = flip pixel
return = start life simulation (press any key to return to edit mode)

The game of life simulation is slow, but it's not the main point. The main point is to demonstrate the Dynavic graphics routines.

Here's a fun challenge - draw a small starting graphic which will push Dynavic to its limits. When the screen is too cluttered, it will run out of custom chars and leave 8x16 patches of blank space. See if you can use this to produce visible failure.

tlr · Post by **tlr** » Sun Aug 30, 2009 3:15 am

Clever idea!

Very slow though. How fast is just plotting?

Mike · Post by **Mike** » Sun Aug 30, 2009 4:06 am

Nice! With 639 bytes size your implementation is also quite small, I've seen much larger ones, that only work in text-mode resolution.

Of course, 160x192 doesn't really show the possibilities of dynamic character allocation. Even with a 3K expander, you could move the program, and its non-graphics data out of $1000 .. $1FFF. And then you'd never run out of characters. But then you could spare the overhead, anyway.

Obviously you're not restricted to 160x192. 208x256 pixels are easily possible on a PAL VIC-20; 200x208 pixels should be in order on a NTSC machine.

IsaacKuo wrote:Here's a fun challenge - draw a small starting graphic which will push Dynavic to its limits.

Maybe not quite small, but easy to draw: A complete horizontal line of 160 pixels.

This one will propagate over the whole screen. In an intermediate stage (due to the wrap-around feature) you'll get three lines 64 pixels apart. These lines then will attempt to fill the entire screen, and then the regular pattern gets chaotic.

Greetings,

Michael

tlr · Post by **tlr** » Sun Aug 30, 2009 5:58 am

I have an idea...
Couldn't some things like lines be optimized by having a small garbage collect routine running in the background?
If a duplicate is found it is replaced by the first occurance.
This requires a small reference count to be kept for each allocated char so you know if you have to duplicate it when plotting.

Mike · Post by **Mike** » Sun Aug 30, 2009 6:28 am

tlr wrote:If a duplicate is found it is replaced by the first occurance ...

You'll only have a small number of cases where the search for duplicates will succeed, and free another 8x16 bitmap for another definition. This search routine however, will quite probably use more than 16 bytes. In the end there will even be less characters available for display.

Finally, the outcome of the next generation does not only depend on the contents of the characters alone, but also their neighbors - i.e. characters, that once matched will most probably differ again in the next 1 or 2 generations. So there's little or no gain for a lot more complicated implementation.

tlr · Post by **tlr** » Sun Aug 30, 2009 6:34 am

Mike wrote:
tlr wrote:If a duplicate is found it is replaced by the first occurance ...
You'll only have a small number of cases where the search for duplicates will succeed, and free another 8x16 bitmap for another definition. This search routine however, will quite probably use more than 16 bytes. In the end there will even be less characters available for display.

Finally, the outcome of the next generation does not only depend on the contents of the characters alone, but also their neighbors - i.e. characters, that once matched will most probably differ again in the next 1 or 2 generations. So there's little or no gain for a lot more complicated implementation.

I was thinking about the general drawing case, not the game of life.
You might be right in that the cost is too much though.

Mike · Post by **Mike** » Sun Aug 30, 2009 6:41 am

For static images, it can be a feasible method.

At least, Ghislain uses this method to produce hires screens in the 22x23 standard format for his RoQ3 project.

IsaacKuo · Post by **IsaacKuo** » Sun Aug 30, 2009 7:43 am

Mike wrote:Obviously you're not restricted to 160x192. 208x256 pixels are easily possible on a PAL VIC-20; 200x208 pixels should be in order on a NTSC machine.

I wrote the dynavic routines around a 160x192 resolution because that way the screen matrix would fit within 256 bytes. That greatly decreased the size and complexity of the code compared to the Fleas code (which also has to deal with mixing ROM characters with custom bitmaps).

I expect that for most applications, the main program is going to consume a lot more than just 639 bytes...that means a lot less custom characters available, which means an overall resolution much greater than 160x192 is probably not practical anyway.

Mainly, I just whipped up the dynavic routines because I knew they'd be really easy after the experience with Fleas's routines. It took me a few hours yesterday morning to write the dynavic routines. I thought the Game of Life code wouldn't take very long, but it took me more time to debug the Game of Life code than it did to write/debug the dynavic code.

In the end, it was because I hadn't read the documentation on "SBC"...I had assumed you should clear the carry when you should actually set the carry before using.

That meant that my lineload loop was incrementing x by 9 rather than 8...

tlr wrote:How fast is just plotting?

How fast is fast? If you have DASM you could simply use the dynavic code to try plotting things yourself.

Obviously it's not as efficient as plotting to a traditional bitmap. But it depends on the application...if performance really matters then you're likely using sprite graphics rather than pixel graphics.

Hmm...maybe I could extend dynavic to include some sprite/text rendering abilities. Since this is meant for an unexpanded VIC, I'd be making rather different decisions around memory saving--like options for rotation/reflection which would be slow but conserve memory.

Mike · Post by **Mike** » Sun Aug 30, 2009 7:59 am

IsaacKuo wrote:In the end, it was because I hadn't read the documentation on "SBC"...I had assumed you should clear the carry when you should actually set the carry before using.

Yep, 65xx, and 68K CPUs define the C flag in opposite ways for subtraction, or compare instructions. I got caught on that, too, when writing my first programs for the 68K.

IsaacKuo · Post by **IsaacKuo** » Sun Aug 30, 2009 9:46 am

I wrote in some speed optimizations, although this has increased the size to 799 bytes

http://members.cox.net/mechdan/vic/dynalife2.prg

Here's the source code:

http://members.cox.net/mechdan/vic/dynalife2.asm

IsaacKuo · Post by **IsaacKuo** » Mon Aug 31, 2009 12:00 am

I wrote in more optimizations, and more bloat...it's now 911 bytes big, but it's nice and fast (for small patterns).

http://members.cox.net/mechdan/vic/dynalife3.prg
http://members.cox.net/mechdan/vic/dynalife3.asm

...Even with all of this bloat, I'm surprised at how long it takes for it to fail the "horizontal line" test. Unless your hires application REALLY needs to fill up the screen, dynamic allocation with the "dynavic" routines may be good enough!

BTW, I designed the dynavic routines for ease of use. Just include the part below the **** lines at the bottom of your program, and avoid using location 254 (the FREECHAR pointer to the first free character). You can move FREECHAR to another location if you want. The routines are:

dyn_cls clear/initialize screen
dyn_testxy test pixel (ZERO if blank pixel)
dyn_resetxy erase pixel
dyn_setxy draw pixel
dyn_flipxy flip pixel

The routines take x,y coordinates from the x,y registers.

IsaacKuo · Post by **IsaacKuo** » Fri Sep 04, 2009 6:33 am

I've completely rewritten the logic code, and it's MUCH faster now.

http://members.cox.net/mechdan/vic/fastlife1.prg size 976 bytes

The innermost loop processes neighborhood sums for two pixels at a time. Since the sum is at most 9, two sums fit in a byte.

The reworked code has a very TINY temporary buffer also. The buffer is only 16 bytes big (for an 8x16 block). A vertical stripe of data is also temporarily stored, but this fits in nybble RAM. There's plenty of nybble RAM left over, as well as the entire tape buffer.

Since there's so much RAM left over, there's a lot of potential for more optimization. In particular, storing a list of blocks with potential changes could make for a big speed boost by skipping static areas.

Or I could use extra RAM for an instruction screen, or more features.

http://members.cox.net/mechdan/vic/fastlife1.asm

I still haven't bothered to optimize the rendering routine at all. It just uses my generic dynavic flipxy subroutine on a pixel by pixel basis.

Mike · Post by **Mike** » Fri Sep 04, 2009 11:40 am

So, how fast is fast?

With the r-pentomino pattern your program gets an obvious head start against VIC Life, but then both settle at about the same speed.

To quantify this a bit, I took VIC Life, and instrumented the code to do exactly 100 generations.

The "empty" bench just does these on an empty life plane. The "full" bench fills the plane with blocks beforehand:

Code: Select all

**.**.**. ...
**.**.**. ...
......... ...
**.**.**. ...
**.**.**. ...
.........

.
.
.

This pattern is stable, and incurs the most work for the algorithm. It then needs to throw the whole logic at every cell.

Put this way: The "empty" bench measures the overhead for every cell, the "full" bench measures the overhead + additional logic for every cell.

I got these figures:

100 generations, PAL VIC, 1.1 MHz:

empty: 13.68 sec => 15.07E6 Cycles
full: 334.36 sec => 367.73E6 Cycles

100 generations == 3072000 cell calculations, i.e. in cycles/cell:

empty: ~5 cycles/cell
full: ~110 cycles/cell

I'd be interested to know the corresponding figures of your implementation. This would however require to move the code into expanded RAM, so the hires screen can use $1000 .. $1FFF for the "full" bench. The "empty" bench will give a slightly skewed value for cycles/cell as you supposedly put in an optimisation that skips entirely empty 8x16 cell areas, but you can give it nonetheless.

IsaacKuo · Post by **IsaacKuo** » Fri Sep 04, 2009 12:18 pm

It's still a work in progress...actually, I'm almost scrapping the current algorithm again. Currently, the algorithm skips blank spaces. I thought I'd hack in something to skip static spaces later on...but that makes the logic for skipping blank spaces redundant!

My original idea was that I KNEW how to implement skipping blank spaces, but I wasn't sure if I'd have enough RAM to skip static spaces. When I started coding, the buffer was a huge 192 byte vertical stripe--it ate up the tape buffer. During coding, I realized I could slim that down to just a 16 byte buffer.

So now I have the space to implement static space skipping--which ultimately will be a lot more efficient than blank space skipping anyway. And I need static space skipping, so the algorithm is suitable for me to implement the games "Lifecommand" and "Lifeteroids".

Therefore, I'm currently reworking the code to reverse the role of the buffer. Instead of being a temporary copy of the current block's graphic data, it will be a temporary buffer of flipped pixels. I can just pump bits into a buffer byte and unload them onto the temporary buffer. After filling the 8x16 buffer, it gets eor'd onto the screen. This loop will very efficiently detect whether any update occurs (just loop through the buffer for a nonzero entry--if found, then mark the block as changed and eor the remaining bytes onto the screen).

Needless to say, now that I'm rendering the graphics into an 8x16 buffer rather than pixel by pixel, I'll use an optimized blit routine instead of calling dyn_flipxy. This will also give a nice boost to performance.

IsaacKuo · Post by **IsaacKuo** » Fri Sep 04, 2009 12:23 pm

Mike wrote:The "empty" bench will give a slightly skewed value for cycles/cell as you supposedly put in an optimisation that skips entirely empty 8x16 cell areas, but you can give it nonetheless.

Oh, the "empty" bench would be completely useless for my Fastlife algorithm. It does almost no work, since all it does is cycle through the screen buffer skipping blanks. But gee...it's really doing no work! The screen is just a blank space of nothingness, with no movement. It would be a useless statistic.

Denial

Dynalife - 160x192 graphics on unexpanded VIC

Dynalife - 160x192 graphics on unexpanded VIC

Re: Dynalife - 160x192 graphics on unexpanded VIC

Re: Dynalife - 160x192 graphics on unexpanded VIC

Fastlife