C=Hacking Issue 8 - fast plot routine with 20 cycles

MrSterlingBS · Post by **MrSterlingBS** » Thu Sep 28, 2023 3:15 am

Hello,

i read about the very fast plot routine on C-Hacking issue #8. They calculate with an average speed of 20 cycles per pixel.
After a few attempts and experimenting, I wasn't able to get the routine to work. Has anyone here finished the routine yet?

In assembly:

Code: Select all

	LDA	BITP,X	4	;Load the bit pattern from a table
	BPL	CONT	3  2	;Still in the same column?
	EOR	$LO	   3	;If not, add 128 to the low byte
	STA	$LO	   3
	BMI	CONT	   3  2	;If the high bit is set, stay in the same page
	INC	$HI	      5	;Otherwise point to the next page
	LDA	#$128	      2	;We still need the bit pattern for x!
 CONT ORA	($LO),Y 5
	STA	($LO),Y	6	;Plot the point
			--------
	   Cycle count: 18 26 32

Therefore, it takes 18 cycles to plot a point, 26 cycles to jump a column,
and 32 cycles to jump a page. Over 16 points, this averages 19.375 cycles.

http://unusedino.de/ec64/technical/c=hacking/ch08.html

Best regards
Sven

PS: My fastest solution is 22 cycles + JSR + RTS or with JMP + JMP = 22 + 12 or 22 + 6 = 34 or 28

Code: Select all

      LDA #$00
      STA $FB				; set lowbyte for drawing - save LDA #$00
						; every column starts with lowbate $00 - $1000, $1100, $1200, - , $1F00

MainLoop:
    some code ...
    JSR SetPixel                       ; 6
    
    (JMP SetPixel                      ; 3)
(FinishSetPixel:)
    jmp MainLoop

SetPixel:
		lda highb,x              ; 4
		sta $fc                    ; 3
		lda ($fb),y               ; 5
		ora xtableset,x        ; 4
		sta ($fb),y               ; 6
		rts                           ; 6
                
               (JMP FinishSetPixel    ; 3)

Mike · Post by **Mike** » Thu Sep 28, 2023 4:33 am

My own fast Bresenham line drawing routine just employs these 3 instructions to plot a single pixel:

Code: Select all

LDA (zp),Y
ORA #imm
STA (zp),Y

The pointer in zp/zp+1 contains the bitmap column start address (derived from the x co-ordinate), Y contains the y co-ordinate, and #imm has a single bit set (again derived from the x co-ordinate). The main loop is 8 times unrolled, so exists one version for each of the 8 possible bits set. The bitmap column start address is calculated once, for the start pixel, and then updated as necessary as the routine steps along the line.

Depending on whether there is a carry into the high byte of the effective address in the LDA instruction, this takes either 13 or 14 cycles in total (LDA, either 5 or 6 cycles; ORA 2 cycles; STA always 6 cycles) for a pixel plot. Using JSR here is out of the question, this code fragment must be inlined.

MrSterlingBS · Post by **MrSterlingBS** » Thu Sep 28, 2023 8:01 am

Wow

That is very fast!

MrSterlingBS · Post by **MrSterlingBS** » Mon Oct 02, 2023 3:33 am

Mike wrote: ↑Thu Sep 28, 2023 4:33 am My own fast Bresenham line drawing routine just employs these 3 instructions to plot a single pixel:
Code: Select all
LDA (zp),Y
ORA #imm
STA (zp),Y
The pointer in zp/zp+1 contains the bitmap column start address (derived from the x co-ordinate), Y contains the y co-ordinate, and #imm has a single bit set (again derived from the x co-ordinate). The main loop is 8 times unrolled, so exists one version for each of the 8 possible bits set. The bitmap column start address is calculated once, for the start pixel, and then updated as necessary as the routine steps along the line.

Depending on whether there is a carry into the high byte of the effective address in the LDA instruction, this takes either 13 or 14 cycles in total (LDA, either 5 or 6 cycles; ORA 2 cycles; STA always 6 cycles) for a pixel plot. Using JSR here is out of the question, this code fragment must be inlined.

I do not understand how it works.
The high byte has to be loaded or calculated from a table? and how do I determine which of the eight bits I need to set? I can't find the solution.

Mike · Post by **Mike** » Mon Oct 02, 2023 4:03 am

MrSterlingBS wrote:I do not understand how it works.

In short: I factored the calculation of the bitmap column start address and the calculation of the pixel bit mask out of the core loop of the Bresenham algorithm.

To rephrase my previous posting: the bitmap column start address is calculated in full only once, for the start pixel. That address is only updated as necessary, upon a column change. The core loop is 8 times unrolled, for each of the 8 possible bit positions a copy of the loop body exists. The relevant copy is directly jumped to when the 'preparation' part has worked out the bit position of the start pixel.

Here's one copy of the loop body in full:

Code: Select all

5.5     LDA (base),Y
  2     EOR #bit     ; ** or AND/ORA
  6     STA (base),Y
  2     DEX
  2     BEQ end
  3     LDA error
  3     SBC dy
3/2 +-- BCS
0/3 |   ADC dx
0/2 |   INY          ; ** or DEY
  3 +-> STA error

29.5/33.5 cycles/pixel

**) self-modifying code

This is for slopes with |dy/dx|<=1, slopes with |dy/dx|>1 feature a slight re-arrangement of these instructions.

As you see, only the first three instructions are involved with single pixel access. The next two instructions, DEX/BEQ, count down the remaining pixels to be set, and the following instructions handle the actual Bresenham algorithm.

The preparation part modifies the instructions marked with ** as necessary: EOR/AND/ORA for toggling/clearing/setting the pixels, INY is required for dy>0 and DEY for dy<0. Drawing the first pixel of the line then takes roughly 150 cycles, all following pixels then only take 30 .. 34 cycles each, i.e. for long lines, the routine approaches 30000 pixels/second.

groepaz · Post by **groepaz** » Mon Oct 02, 2023 5:21 am

Next step would be unrolling the loop for each needed slope... perhaps not on VIC20 though

Mike · Post by **Mike** » Mon Oct 02, 2023 8:27 am

As is, the whole routine weighs in at less than 1 KB (793 bytes, to be exact).

IMO, the real challenge now is providing the routine with geometry data fast enough.

groepaz wrote:unrolling the loop for each needed slope

Or, for that matter, using pre-compiled sprites for small line segments.

...

That being said, the routine has seen real life use in the following programs: my Vector Demo, Earth Globe 2 (as part of the MG batch suite), MINISKETCH (a lightpen drawing program) and Madness (a small demonstrator of NR_QDRand).

MrSterlingBS · Post by **MrSterlingBS** » Mon Oct 02, 2023 8:32 am

Okay, thanks a lot for the code and explanations. I think (hope) i understand.

Denial

C=Hacking Issue 8 - fast plot routine with 20 cycles

C=Hacking Issue 8 - fast plot routine with 20 cycles

Re: C=Hacking Issue 8 - fast plot routine with 20 cycles

Re: C=Hacking Issue 8 - fast plot routine with 20 cycles

Re: C=Hacking Issue 8 - fast plot routine with 20 cycles

Re: C=Hacking Issue 8 - fast plot routine with 20 cycles

Re: C=Hacking Issue 8 - fast plot routine with 20 cycles

Re: C=Hacking Issue 8 - fast plot routine with 20 cycles

Re: C=Hacking Issue 8 - fast plot routine with 20 cycles