Dynalife - 160x192 graphics on unexpanded VIC

Post by **Mike** » Fri Sep 04, 2009 12:33 pm

IsaacKuo wrote: Oh, the "empty" bench would be completely useless for my Fastlife algorithm. It does almost no work, since all it does is cycle through the screen buffer skipping blanks. But gee... it's really doing no work! The screen is just a blank space of nothingness, with no movement. It would be a useless statistic.

Well, then we can set this value to 0 cycles/pixel for your program, no problem.

That still leaves the value for the "full" bench. And I'm interested to see how well the static-space optimisation holds up against it.

Then I'll give you the pattern of a screen-filling period-2 oscillator.

Edit: Here is the test pattern, a twin-dots agar:

Interestingly, it runs even slightly slower than my earlier "full" test bench. VIC Life needs 345.3 seconds for 100 generations.

Post by **IsaacKuo** » Fri Sep 04, 2009 5:47 pm

I did a crude test hack. Rather than modify the program location, I just filled in HALF the screen and did 200 cycles. With VICE in PAL mode, 200 cycles of the block pattern takes 182 seconds.

This test isn't fair, though, because the least efficient part of my code is the code which renders flipped pixels. I just use the generic dyn_flipxy subroutine to render pixel-by-pixel, causing a disgusting amount of recalculation. The block pattern doesn't flip any pixels, so it skips the least efficient part of my code.
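To illustrate why rendering flipped pixels one at a time through a generic routine is expensive, here is a minimal Python sketch. It assumes a hypothetical bitmap layout (the thread does not describe how Dynalife arranges its character blocks): 8x8 blocks, 20 per block row, each stored as 8 consecutive bytes. The hypothetical `flip_xy` stands in for a generic per-pixel routine like `dyn_flipxy` and recomputes the byte address and bit mask on every call, while `flip_row_byte` flips up to eight pixels of one byte after a single address calculation.

```python
# Hypothetical layout: 160x192 screen split into 8x8 blocks, 20 blocks per
# block row, each block stored as 8 consecutive bytes (one byte per pixel row).
WIDTH, HEIGHT = 160, 192
BLOCKS_PER_ROW = WIDTH // 8


def byte_address(x: int, y: int) -> int:
    """Offset of the byte holding pixel (x, y) in the hypothetical layout."""
    block = (y // 8) * BLOCKS_PER_ROW + (x // 8)
    return block * 8 + (y % 8)


def flip_xy(bitmap: bytearray, x: int, y: int) -> None:
    """Generic per-pixel flip: address and mask are recomputed on every call."""
    bitmap[byte_address(x, y)] ^= 0x80 >> (x % 8)


def flip_row_byte(bitmap: bytearray, x: int, y: int, mask: int) -> None:
    """Flip up to 8 pixels of one byte at once; x must be a multiple of 8."""
    bitmap[byte_address(x, y)] ^= mask


bitmap = bytearray(WIDTH * HEIGHT // 8)   # 3840 bytes
# Flipping pixels 0..3 of one byte costs four address calculations...
for x in range(4):
    flip_xy(bitmap, x, 10)
# ...while one calculation flips the same four pixels back.
flip_row_byte(bitmap, 0, 10, 0b11110000)
```

A batched renderer along these lines would compute the mask of changed pixels per byte first and then touch each byte once.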

Of course, I'm addressing this problem by writing an optimized render routine. My brain's starting to hurt from the coding though, so I'll take a little break from it. In the meantime, the code is "good enough" to have some fun with.

I also did a nearly blank screen with just a 3 pixel blinker. It completed 200 cycles in about 7 seconds (around 25+ fps).

Post by **Mike** » Sun Sep 06, 2009 10:57 am

IsaacKuo wrote: ... I just filled in HALF the screen and did 200 cycles. With VICE in PAL mode, 200 cycles of the block pattern takes 182 seconds. ... This test isn't fair, though, because the least efficient part of my code is the code which renders flipped pixels.

Still, that indicates to me that the cell decision logic of VIC Life could use a good rework. Your program is seemingly twice as fast at that.

The pixel operations take nearly no time in VIC Life, and I had already moved the address calculations out of the critical inner loop. The screen update is subsumed in the 5 cycles/pixel "empty" overhead.
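A rough Python check of the "twice as fast" estimate. The "cycles" in IsaacKuo's post are generations. The sketch assumes run time scales linearly with the filled area, so the half-screen time is simply doubled. Mike's 345.3 s figure is for the twin-dots agar rather than the block pattern, so this is only a ballpark comparison.

```python
def seconds_per_generation(seconds: float, generations: int) -> float:
    return seconds / generations


# IsaacKuo: half the screen filled with blocks, 200 generations in 182 s (VICE, PAL).
half_screen = seconds_per_generation(182, 200)
# Assumption: filling the whole screen would roughly double the time.
fastlife_full = 2 * half_screen
# Mike: VIC Life on the twin-dots agar, 100 generations in 345.3 s.
vic_life_full = seconds_per_generation(345.3, 100)

ratio = vic_life_full / fastlife_full
print(f"{fastlife_full:.2f} s vs {vic_life_full:.3f} s per generation, ratio {ratio:.2f}")
```

The ratio comes out at about 1.9, consistent with "seemingly twice as fast".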

IsaacKuo wrote: I also did a nearly blank screen with just a 3 pixel blinker. It completed 200 cycles in about 7 seconds (around 25+ fps).

That would translate to ~1.25 cycles/pixel "empty" overhead.
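The cycles/pixel figures appear to be CPU clock cycles per generation divided by the 160x192 = 30720 pixels. Here is a minimal Python sketch of that conversion, assuming the PAL VIC-20 clock of about 1.108 MHz (VICE was run in PAL mode). With it, 7 s for 200 generations comes out at roughly 1.26 cycles/pixel, matching the ~1.25 quoted. VIC Life's 345.3 s for 100 generations on the agar works out to about 125 cycles/pixel.

```python
PAL_CLOCK_HZ = 1_108_404      # assumption: PAL VIC-20 CPU clock, about 1.108 MHz
PIXELS = 160 * 192            # 30720 pixels in the 160x192 bitmap


def cycles_per_pixel(seconds: float, generations: int,
                     clock_hz: float = PAL_CLOCK_HZ) -> float:
    """CPU cycles spent per pixel per generation."""
    cycles_per_generation = seconds / generations * clock_hz
    return cycles_per_generation / PIXELS


blinker = cycles_per_pixel(7, 200)     # nearly blank screen with one blinker
agar = cycles_per_pixel(345.3, 100)    # VIC Life on the twin-dots agar
print(f"{blinker:.2f} {agar:.1f}")
```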

Post by **IsaacKuo** » Sun Sep 06, 2009 11:53 am

Mike wrote: Still, that indicates to me that the cell decision logic of VIC Life could use a good rework. Your program is seemingly twice as fast at that.

I use a parallel processing technique to calculate two sums simultaneously. That might account for most of the speed difference.
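The thread doesn't spell out the "parallel processing technique", but the code further down, with one `BCsum` byte per pair of cells, suggests a SIMD-within-a-register trick: two cells' counts share one byte, one per nibble. A 3x3 total is at most 9, which fits in 4 bits, so adding the packed bytes never carries from the low nibble into the high one. A minimal Python sketch under that assumption (`pack` and `unpack` are hypothetical helpers):

```python
def pack(high: int, low: int) -> int:
    """Pack two small counts (0..15) into the two nibbles of one byte."""
    assert 0 <= high <= 15 and 0 <= low <= 15
    return (high << 4) | low


def unpack(byte: int) -> tuple[int, int]:
    return byte >> 4, byte & 0x0F


# Three rows of 3-pixel horizontal sums (0..3 each) for two cells, B and C.
rows_b = [1, 3, 2]
rows_c = [0, 2, 3]

# One 8-bit add per row updates both totals at once.
total = 0
for b, c in zip(rows_b, rows_c):
    total = (total + pack(b, c)) & 0xFF   # & 0xFF mimics an 8-bit register

print(unpack(total))   # -> (6, 5)
```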

Also, I use two 64-byte tables to very quickly calculate the 3-pixel horizontal neighborhood sums of 4 cells: I do two lookups without updating the y register.
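Here is one plausible reading of the two 64-byte tables, sketched in Python. A 6-bit window holds four cells plus one neighbor on each side, so a single index (the value in Y) can drive two lookups. The first table returns the packed 3-pixel sums of the two left cells, and the second returns those of the two right cells, one sum per nibble. The bit order and the names `TABLE_LEFT`/`TABLE_RIGHT` are assumptions, not taken from the thread.

```python
def build_tables() -> tuple[list[int], list[int]]:
    """Two 64-entry tables indexed by a 6-bit pixel window w5..w0.

    Assumed layout: w5 = left neighbor, w4..w1 = four cells, w0 = right
    neighbor. TABLE_LEFT packs the horizontal sums of cells w4 and w3,
    TABLE_RIGHT those of cells w2 and w1 (first cell in the high nibble).
    """
    left, right = [], []
    for window in range(64):
        bits = [(window >> (5 - i)) & 1 for i in range(6)]   # bits[0] = w5
        sums = [bits[i - 1] + bits[i] + bits[i + 1] for i in range(1, 5)]
        left.append((sums[0] << 4) | sums[1])
        right.append((sums[2] << 4) | sums[3])
    return left, right


TABLE_LEFT, TABLE_RIGHT = build_tables()

window = 0b011010          # neighbor 0, cells 1 1 0 1, neighbor 0
print(hex(TABLE_LEFT[window]), hex(TABLE_RIGHT[window]))
```

Both lookups use the same index, which matches the description of two lookups without touching Y in between.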

The basic strategy is to devote 12 bytes of zero page to a rolling store of the 3-pixel horizontal sums of 8 cells. The x register revolves 2,1,0,2,1,0... to point to storage. The y register increments 0...15 to indicate where to read the data from. Within the loop, the y register gets trashed for purposes of table lookups.
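A Python model of the rolling store as described: three row slots per cell pair, with a slot index standing in for X that revolves 2,1,0,... so each new row overwrites the oldest one. Four pairs (B/C, D/E, F/G, H/I) times three slots gives the 12 bytes. Each pair's slots are contiguous, matching the `BCsum`, `BCsum+1`, `BCsum+2` addressing in the code below. The packed-nibble format and the `RollingStore` class are assumptions for illustration.

```python
class RollingStore:
    """Hypothetical model of the 12-byte zero-page rolling store."""

    PAIRS = 4          # cell pairs B/C, D/E, F/G, H/I -> 8 cells
    SLOTS = 3          # one slot per row of the 3x3 neighborhood

    def __init__(self) -> None:
        self.mem = [0] * (self.PAIRS * self.SLOTS)   # 12 "bytes"
        self.x = 2                                   # revolves 2,1,0,2,1,0,...

    def push_row(self, packed_sums: list[int]) -> None:
        """Store one row of packed horizontal sums, overwriting the oldest row."""
        for pair, value in enumerate(packed_sums):
            self.mem[pair * self.SLOTS + self.x] = value
        self.x = (self.x - 1) % self.SLOTS           # 2 -> 1 -> 0 -> 2

    def totals(self) -> list[int]:
        """Sum each pair's three slots: packed 3x3 totals for all 8 cells."""
        return [sum(self.mem[p * self.SLOTS:(p + 1) * self.SLOTS]) & 0xFF
                for p in range(self.PAIRS)]


store = RollingStore()
for row in ([0x11, 0, 0, 0], [0x23, 0, 0, 0], [0x01, 0, 0, 0], [0x30, 0, 0, 0]):
    store.push_row(row)
print([hex(t) for t in store.totals()])
```

After the fourth row, the first row (0x11) has been overwritten, so the B/C total covers only the last three rows.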

When it comes time to do the sums, I just use something like this:


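```
  clc                     ; assumes each slot packs two 3-pixel sums (max $33),
                          ; so three ADCs stay <= $99 and never set carry
  lda BCsum               ; add the three row slots for cells B and C
  adc BCsum+1
  adc BCsum+2
  bne CALCULATE_CELLS_B_AND_C  ; nonzero: something is alive near B or C
DONE_CELLS_B_AND_C:       ; zero falls through: B and C stay dead
  lda DEsum               ; same for cells D and E (the CALCULATE routines
  adc DEsum+1             ; must return here with carry clear)
  adc DEsum+2
  bne CALCULATE_CELLS_D_AND_E
  ; ...same thing for cells FG and HI...
```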

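Because the storage revolves 2,1,0,2,1,0,..., the three slots always hold the three rows of the neighborhood, and since addition doesn't care about order, it doesn't matter which slot holds which row. That way, I can hardcode the neighborhood sums without any index.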
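Here is a Python sketch of what the summing code does for each pair, under the same packed-nibble assumption. The three slots are added in a fixed order, which is valid because the sum doesn't depend on which slot holds the newest row. A zero result means both 3x3 neighborhoods are empty, so the pair is skipped, like the `bne` falling through. For a nonzero total, the sketch applies the standard Conway B3/S23 rule, expressed for a 3x3 total that includes the cell itself. The thread doesn't show what `CALCULATE_CELLS_B_AND_C` actually does.

```python
def next_state(alive: bool, total: int) -> bool:
    """Conway's rule using the 3x3 total, which includes the cell itself."""
    return total == 3 or (alive and total == 4)


def step_pair(slots: list[int], alive_high: bool, alive_low: bool) -> tuple[bool, bool]:
    """Next states of the two cells sharing one packed slot triple."""
    packed = (slots[0] + slots[1] + slots[2]) & 0xFF   # fixed order, no index
    if packed == 0:            # like the BNE falling through: nothing nearby,
        return False, False    # so both cells are dead and stay dead
    return (next_state(alive_high, packed >> 4),
            next_state(alive_low, packed & 0x0F))


# The result is the same whichever slot holds the newest row.
rows = [0x12, 0x11, 0x10]
assert step_pair(rows, True, False) == step_pair(rows[::-1], True, False)
print(step_pair(rows, True, False))
```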

IsaacKuo wrote: I also did a nearly blank screen with just a 3 pixel blinker. It completed 200 cycles in about 7 seconds (around 25+ fps).

It turns out I placed the blinker in the center of the screen, which was the worst place to put it. It's constantly allocating and freeing character blocks because it sits right on a block corner.

It takes 5 seconds with the blinker placed in the middle of a block. A simple still life also takes 5 seconds, so the overhead of just flipping 4 pixels (without freeing/allocation) is not too bad.
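Here is why the screen center is the worst spot. With 8x8 blocks, pixel (80, 96) lies right where four blocks meet, so the blinker's two phases occupy different sets of blocks, and blocks empty and refill every generation. A small Python sketch, assuming 8x8 blocks with block coordinates (x // 8, y // 8); the coordinates used for the "middle of a block" case are illustrative.

```python
def blinker_phases(cx: int, cy: int) -> list[set[tuple[int, int]]]:
    """Cells of a period-2 blinker centered at (cx, cy), both phases."""
    horizontal = {(cx - 1, cy), (cx, cy), (cx + 1, cy)}
    vertical = {(cx, cy - 1), (cx, cy), (cx, cy + 1)}
    return [horizontal, vertical]


def blocks(cells: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """8x8 blocks that contain at least one live cell."""
    return {(x // 8, y // 8) for x, y in cells}


for center in [(80, 96), (84, 100)]:
    occupied = [blocks(phase) for phase in blinker_phases(*center)]
    changes = occupied[0] ^ occupied[1]    # blocks freed or allocated each generation
    print(center, sorted(occupied[0]), sorted(occupied[1]), "changed:", sorted(changes))
```

At (80, 96), two blocks are freed and allocated on every generation. At (84, 100), both phases stay inside block (10, 12), so only the 4 pixel flips remain.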

The current algorithm checks a blank space for quite a few possible reasons why computation may be required. This will be made a lot tighter when everything is reduced to checking against a single unified bitmap of "changed" 8x8 blocks.
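Here is a sketch of the single-check idea in Python. It uses one bit per 8x8 block, 20 x 24 blocks for the 160x192 screen, so each block row fits in 3 bytes and deciding whether a block needs work is a single bit test. The row-of-bytes layout and the names `mark` and `needs_work` are assumptions. The thread only says the check will be against one unified bitmap.

```python
BLOCK_COLS, BLOCK_ROWS = 160 // 8, 192 // 8      # 20 x 24 blocks
BYTES_PER_ROW = (BLOCK_COLS + 7) // 8             # 3 bytes cover 20 bits

changed = bytearray(BYTES_PER_ROW * BLOCK_ROWS)   # 72 bytes for the whole screen


def mark(bx: int, by: int) -> None:
    changed[by * BYTES_PER_ROW + bx // 8] |= 0x80 >> (bx % 8)


def needs_work(bx: int, by: int) -> bool:
    """The only test a blank area has to pass: one bit."""
    return bool(changed[by * BYTES_PER_ROW + bx // 8] & (0x80 >> (bx % 8)))


mark(10, 12)
work = [(bx, by) for by in range(BLOCK_ROWS) for bx in range(BLOCK_COLS)
        if needs_work(bx, by)]
print(len(changed), work)
```

A zero byte in this layout also lets a loop skip eight blocks at once.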

My new strategy is to calculate a bitmap matrix of potentially changed 8x8 blocks. However, I won't recalculate this bitmap every turn. Instead, it will only be recalculated every 7 turns (maybe 6 turns). Since the "spillover" on an 8x8 block grid naturally expands the recalculation zone by at least 7 pixels, there's no need to recalculate immediately. This represents a major change from the previous algorithm.
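The reasoning behind recalculating only every few turns is the Life "speed of light". A cell can change in the next generation only if something in its 3x3 neighborhood changed in this one, so after k generations all changes lie within k cells of the original changes. If the changed blocks are marked together with their eight neighbors (an assumption about how the "spillover" works), every cell within 8 pixels of a change is inside the marked zone. The Python sketch below checks this on a random pattern for 7 generations. The grid size, seed, and dilation scheme are illustrative, not Dynalife's.

```python
import random


def step(live: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """One Conway generation on an unbounded grid."""
    counts: dict[tuple[int, int], int] = {}
    for x, y in live:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx or dy:
                    key = (x + dx, y + dy)
                    counts[key] = counts.get(key, 0) + 1
    return {c for c, n in counts.items() if n == 3 or (n == 2 and c in live)}


def marked_blocks(changes: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """Blocks containing a change, plus their 8 neighbors (assumed spillover)."""
    return {(x // 8 + dx, y // 8 + dy)
            for x, y in changes for dx in (-1, 0, 1) for dy in (-1, 0, 1)}


random.seed(1)
live = {(x, y) for x in range(40) for y in range(40) if random.random() < 0.35}

nxt = step(live)
zone = marked_blocks(live ^ nxt)       # computed once, from one generation's changes
ok = True
for _ in range(7):                     # reused for the next 7 generations
    after = step(nxt)
    ok &= all((x // 8, y // 8) in zone for x, y in nxt ^ after)
    nxt = after
print("all changes stayed inside the marked zone:", ok)
```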