Still, that indicates to me that the cell decision logic of VIC Life could use a good rework. Your program seems to be about twice as fast at that.
I use a parallel processing technique to calculate two sums simultaneously. That might account for most of the speed difference.
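One way to model the "two sums at once" trick is to pack two small counts into the two nibbles of one byte. The following is a minimal C sketch, assuming (not confirmed by the post) that cell B's count lives in the high nibble and cell C's in the low nibble. Each horizontal row sum is at most 3, so a three-row total is at most 9. That means a nibble never overflows and one plain 8-bit add updates both counts. The helper names `pack2`, `add3`, `high_count` and `low_count` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: cell B's count in the high nibble, cell C's in the low nibble. */
static uint8_t pack2(unsigned b, unsigned c) {
    assert(b <= 9 && c <= 9);  /* a 3x3 neighborhood total never exceeds 9 */
    return (uint8_t)((b << 4) | c);
}

/* A single 3-pixel row contributes at most 3 to each nibble. */
static int nibbles_at_most_3(uint8_t v) {
    return (v >> 4) <= 3 && (v & 0x0F) <= 3;
}

/* One ordinary add sums both cells' counts, like the chain of ADCs on the 6502.
   With each input nibble <= 3, each result nibble is <= 9, so no carry
   ever crosses from the low nibble into the high nibble. */
static uint8_t add3(uint8_t r0, uint8_t r1, uint8_t r2) {
    assert(nibbles_at_most_3(r0) && nibbles_at_most_3(r1) && nibbles_at_most_3(r2));
    return (uint8_t)(r0 + r1 + r2);
}

static unsigned high_count(uint8_t packed) { return packed >> 4; }
static unsigned low_count(uint8_t packed)  { return packed & 0x0F; }
```

For example, rows packed as 0x13, 0x23 and 0x33 add up to 0x69: cell B sees 6 live pixels and cell C sees 9.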
Also, I use two 64-byte tables to quickly calculate the 3-pixel horizontal neighborhood sums of 4 cells: I do two lookups without updating the y register.
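The following is a rough C sketch of how two such 64-entry tables could be generated. It assumes a 6-bit index holding six consecutive pixels A..F (bit 5 = A, leftmost; bit 0 = F), where B..E are the four cells being processed and A and F are their outer neighbors. Each entry packs two 3-pixel horizontal sums into nibbles, so one index (the unchanged y register on the 6502) serves both lookups. The table names, `build_tables`, `lookup` and the bit layout are my assumptions, not necessarily what the program does.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t table_bc[64];  /* hypothetical: (A+B+C) << 4 | (B+C+D) */
static uint8_t table_de[64];  /* hypothetical: (C+D+E) << 4 | (D+E+F) */

/* Sum of the three pixel bits at positions lo, lo+1 and lo+2. */
static unsigned sum3(unsigned idx, unsigned lo) {
    return ((idx >> lo) & 1u) + ((idx >> (lo + 1)) & 1u) + ((idx >> (lo + 2)) & 1u);
}

static void build_tables(void) {
    for (unsigned i = 0; i < 64; i++) {
        /* Pixel bits: A=5, B=4, C=3, D=2, E=1, F=0. */
        unsigned b = sum3(i, 3);  /* A+B+C */
        unsigned c = sum3(i, 2);  /* B+C+D */
        unsigned d = sum3(i, 1);  /* C+D+E */
        unsigned e = sum3(i, 0);  /* D+E+F */
        table_bc[i] = (uint8_t)((b << 4) | c);
        table_de[i] = (uint8_t)((d << 4) | e);
    }
}

/* Both lookups share one index, like two LDA table,Y instructions. */
static void lookup(unsigned idx, uint8_t *bc, uint8_t *de) {
    assert(idx < 64);
    *bc = table_bc[idx];
    *de = table_de[idx];
}
```

With only C and D alive (index 0x0C), the lookups return 0x12 for B/C and 0x21 for D/E.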
The basic strategy is to devote 12 bytes of zero page to a rolling store of the 3-pixel horizontal sums of 8 cells (three rows of four bytes, one byte per pair of cells). The x register revolves 2,1,0,2,1,0,... to select the storage slot. The y register counts 0...15 to select which row of data to read. Within the loop, the y register gets trashed by the table lookups.
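The rolling store for a single cell pair can be modeled roughly like this in C. It assumes the packed per-row horizontal sums (what the table lookups would return) are already in an array. A three-element array stands in for BCsum..BCsum+2, and `x` cycles 2,1,0 as described. Rows outside the 16-row column are treated as empty. That is a simplification: the real program presumably takes those edge rows from the neighboring blocks. `column_totals` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define ROWS 16

/* horiz[r]: packed B/C horizontal sums for row r (each nibble 0..3).
   totals[r]: packed 3x3 totals for row r, computed with a 3-slot rolling store.
   Rows -1 and 16 are assumed empty in this sketch. */
static void column_totals(const uint8_t horiz[ROWS], uint8_t totals[ROWS]) {
    uint8_t slot[3] = {0, 0, 0};  /* stands in for BCsum, BCsum+1, BCsum+2 */
    int x = 2;                     /* revolves 2,1,0,2,1,0,... */

    /* Feed rows 0..16; "row 16" is the empty row below the column. */
    for (int y = 0; y <= ROWS; y++) {
        uint8_t row = (y < ROWS) ? horiz[y] : 0;
        assert((row >> 4) <= 3 && (row & 0x0F) <= 3);
        slot[x] = row;             /* overwrite the oldest row */
        x = (x == 0) ? 2 : x - 1;

        /* The slots now hold rows y-2, y-1 and y, so row y-1 is complete. */
        if (y >= 1)
            totals[y - 1] = (uint8_t)(slot[0] + slot[1] + slot[2]);
    }
}
```

A single nonzero row contributes to the totals of itself and the rows directly above and below it, and to nothing else.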
When it comes time to do the sums, I just use something like this:
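clc                           ; clear carry before the first ADC
lda BCsum                     ; add the three rows of horizontal sums for cells B and C
adc BCsum+1
adc BCsum+2
bne CALCULATE_CELLS_B_AND_C   ; nonzero: something is alive near B or C, so evaluate them
DONE_CELLS_B_AND_C:           ; return point once cells B and C are handled
lda DEsum                     ; same test for cells D and E
adc DEsum+1
adc DEsum+2
bne CALCULATE_CELLS_D_AND_E
; ...same thing for cells F/G and H/I...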
Because the storage revolves 2,1,0,2,1,0,..., the three slots always hold the three rows of the neighborhood, just in rotating order. Since addition doesn't depend on order, I can hardcode the neighborhood sums with fixed addresses instead of using an index.
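The fixed-address summing and the zero test can be sketched in C like this. It again assumes the nibble-packed layout: four cell pairs, each with three slots (12 bytes, the same size as the zero-page store). A pair is flagged for evaluation only when its three-slot total is nonzero, mirroring the BNE branches. `RollingStore`, `pair_total` and `pairs_needing_work` are hypothetical names. The test rotates the slots to check that the slot order doesn't change the total.

```c
#include <assert.h>
#include <stdint.h>

enum { PAIR_BC, PAIR_DE, PAIR_FG, PAIR_HI, NUM_PAIRS };

/* 4 pairs x 3 slots = 12 bytes, like the zero-page rolling store. */
typedef struct {
    uint8_t sum[NUM_PAIRS][3];
} RollingStore;

/* Fixed-address sum of the three slots: no index is needed,
   because which slot holds which row doesn't affect the total. */
static uint8_t pair_total(const RollingStore *s, int pair) {
    assert(pair >= 0 && pair < NUM_PAIRS);
    return (uint8_t)(s->sum[pair][0] + s->sum[pair][1] + s->sum[pair][2]);
}

/* Bit p is set when pair p has any live pixel in its neighborhoods,
   i.e. when the 6502 code would take the BNE to CALCULATE_CELLS_... */
static unsigned pairs_needing_work(const RollingStore *s) {
    unsigned mask = 0;
    for (int p = 0; p < NUM_PAIRS; p++)
        if (pair_total(s, p) != 0)
            mask |= 1u << p;
    return mask;
}
```

In a mostly empty area, most pairs sum to zero and are skipped after just three additions and a branch.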
I also tested a nearly blank screen with just a 3-pixel blinker. It completed 200 generations in about 7 seconds (roughly 28 fps).
It turns out I had placed the blinker in the center of the screen, which was the worst possible spot: it sits right on a corner where character blocks meet, so it is constantly allocating and freeing character blocks.
With the blinker placed in the middle of a block, the same run takes 5 seconds. A simple still life also takes 5 seconds, so the overhead of just flipping 4 pixels per generation (without any freeing or allocation) is barely measurable.
For a blank area, the current algorithm checks quite a few conditions that might require computation. This will get much tighter once everything is reduced to a single check against a unified bitmap of "changed" 8x8 blocks.
My new strategy is to calculate a bitmap of potentially changed 8x8 blocks. However, I won't recalculate this bitmap every turn; instead, it will only be recalculated every 7 turns (maybe 6). A pattern can spread by at most one pixel per turn, and the "spillover" of working on an 8x8 block grid naturally expands the recalculation zone by at least 7 pixels, so there's no need to recalculate immediately. This is a major change from the previous algorithm.
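A changed-block bitmap like this might look roughly as follows in C. This is a guess at the design, not the post's code. The grid size, `REFRESH_INTERVAL`, and the choice to grow every marked block into its 8 neighbors (so activity spilling across a block edge stays inside the zone) are all assumptions. Each row of blocks is one bitmask, so the neighbor growth is a few shifts and ORs per row.

```c
#include <assert.h>
#include <stdint.h>

#define GRID_W 24           /* hypothetical grid width in 8x8 blocks (must be < 32) */
#define GRID_H 24           /* hypothetical grid height in 8x8 blocks */
#define REFRESH_INTERVAL 6  /* the post mentions every 6 or 7 turns */

/* Bit c of row[r] is set when block (r, c) may change. */
typedef struct { uint32_t row[GRID_H]; } BlockMap;

static void mark_changed(BlockMap *m, int r, int c) {
    assert(r >= 0 && r < GRID_H && c >= 0 && c < GRID_W);
    m->row[r] |= 1u << c;
}

/* Grow every marked block into its 8 neighbors, clipped to the grid. */
static BlockMap dilate(const BlockMap *in) {
    const uint32_t width_mask = (1u << GRID_W) - 1;
    uint32_t horiz[GRID_H];
    BlockMap out;

    for (int r = 0; r < GRID_H; r++) {
        uint32_t v = in->row[r];
        horiz[r] = (v | (v << 1) | (v >> 1)) & width_mask;  /* left/right neighbors */
    }
    for (int r = 0; r < GRID_H; r++) {
        uint32_t v = horiz[r];
        if (r > 0) v |= horiz[r - 1];           /* row above */
        if (r < GRID_H - 1) v |= horiz[r + 1];  /* row below */
        out.row[r] = v;
    }
    return out;
}

static int is_active(const BlockMap *m, int r, int c) {
    assert(r >= 0 && r < GRID_H && c >= 0 && c < GRID_W);
    return (int)((m->row[r] >> c) & 1u);
}

/* Rebuild the map only every REFRESH_INTERVAL turns; reuse it in between. */
static int needs_refresh(unsigned turn) {
    return turn % REFRESH_INTERVAL == 0;
}
```

Between refreshes, the per-block test in the main loop reduces to a single `is_active` check.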