C=Hacking Issue 8 - fast plot routine with 20 cycles

Basic and Machine Language

Moderator: Moderators

Post Reply
User avatar
MrSterlingBS
Vic 20 Enthusiast
Posts: 174
Joined: Tue Jan 31, 2023 2:56 am
Location: Germany,Braunschweig

C=Hacking Issue 8 - fast plot routine with 20 cycles

Post by MrSterlingBS »

Hello,

i read about the very fast plot routine on C-Hacking issue #8. They calculate with an average speed of 20 cycles per pixel.
After a few attempts and experimenting, I wasn't able to get the routine to work. Has anyone here finished the routine yet?

In assembly:

Code: Select all

	LDA	BITP,X	4	;Load the bit pattern from a table
	BPL	CONT	3  2	;Still in the same column?
	EOR	$LO	   3	;If not, add 128 to the low byte
	STA	$LO	   3
	BMI	CONT	   3  2	;If the high bit is set, stay in the same page
	INC	$HI	      5	;Otherwise point to the next page
	LDA	#$128	      2	;We still need the bit pattern for x!
 CONT ORA	($LO),Y 5
	STA	($LO),Y	6	;Plot the point
			--------
	   Cycle count: 18 26 32
Therefore, it takes 18 cycles to plot a point, 26 cycles to jump a column,
and 32 cycles to jump a page. Over 16 points, this averages 19.375 cycles.


http://unusedino.de/ec64/technical/c=hacking/ch08.html


Best regards
Sven

PS: My fastest solution is 22 cycles + JSR + RTS or with JMP + JMP = 22 + 12 or 22 + 6 = 34 or 28

Code: Select all

      LDA #$00
      STA $FB				; set lowbyte for drawing - save LDA #$00
						; every column starts with lowbate $00 - $1000, $1100, $1200, - , $1F00

MainLoop:
    some code ...
    JSR SetPixel                       ; 6
    
    (JMP SetPixel                      ; 3)
(FinishSetPixel:)
    jmp MainLoop

SetPixel:
		lda highb,x              ; 4
		sta $fc                    ; 3
		lda ($fb),y               ; 5
		ora xtableset,x        ; 4
		sta ($fb),y               ; 6
		rts                           ; 6
                
               (JMP FinishSetPixel    ; 3)
User avatar
Mike
Herr VC
Posts: 4841
Joined: Wed Dec 01, 2004 1:57 pm
Location: Munich, Germany
Occupation: electrical engineer

Re: C=Hacking Issue 8 - fast plot routine with 20 cycles

Post by Mike »

My own fast Bresenham line drawing routine just employs these 3 instructions to plot a single pixel:

Code: Select all

LDA (zp),Y
ORA #imm
STA (zp),Y
The pointer in zp/zp+1 contains the bitmap column start address (derived from the x co-ordinate), Y contains the y co-ordinate, and #imm has a single bit set (again derived from the x co-ordinate). The main loop is 8 times unrolled, so exists one version for each of the 8 possible bits set. The bitmap column start address is calculated once, for the start pixel, and then updated as necessary as the routine steps along the line.

Depending on whether there is a carry into the high byte of the effective address in the LDA instruction, this takes either 13 or 14 cycles in total (LDA, either 5 or 6 cycles; ORA 2 cycles; STA always 6 cycles) for a pixel plot. Using JSR here is out of the question, this code fragment must be inlined.
User avatar
MrSterlingBS
Vic 20 Enthusiast
Posts: 174
Joined: Tue Jan 31, 2023 2:56 am
Location: Germany,Braunschweig

Re: C=Hacking Issue 8 - fast plot routine with 20 cycles

Post by MrSterlingBS »

:shock: :shock: :shock:

Wow 🤩 That is very fast!
User avatar
MrSterlingBS
Vic 20 Enthusiast
Posts: 174
Joined: Tue Jan 31, 2023 2:56 am
Location: Germany,Braunschweig

Re: C=Hacking Issue 8 - fast plot routine with 20 cycles

Post by MrSterlingBS »

Mike wrote: Thu Sep 28, 2023 4:33 am My own fast Bresenham line drawing routine just employs these 3 instructions to plot a single pixel:

Code: Select all

LDA (zp),Y
ORA #imm
STA (zp),Y
The pointer in zp/zp+1 contains the bitmap column start address (derived from the x co-ordinate), Y contains the y co-ordinate, and #imm has a single bit set (again derived from the x co-ordinate). The main loop is 8 times unrolled, so exists one version for each of the 8 possible bits set. The bitmap column start address is calculated once, for the start pixel, and then updated as necessary as the routine steps along the line.

Depending on whether there is a carry into the high byte of the effective address in the LDA instruction, this takes either 13 or 14 cycles in total (LDA, either 5 or 6 cycles; ORA 2 cycles; STA always 6 cycles) for a pixel plot. Using JSR here is out of the question, this code fragment must be inlined.
:?: :?: :?:

I do not understand how it works.
The high byte has to be loaded or calculated from a table? and how do I determine which of the eight bits I need to set? I can't find the solution.
User avatar
Mike
Herr VC
Posts: 4841
Joined: Wed Dec 01, 2004 1:57 pm
Location: Munich, Germany
Occupation: electrical engineer

Re: C=Hacking Issue 8 - fast plot routine with 20 cycles

Post by Mike »

MrSterlingBS wrote:I do not understand how it works.
In short: I factored the calculation of the bitmap column start address and the calculation of the pixel bit mask out of the core loop of the Bresenham algorithm.

To rephrase my previous posting: the bitmap column start address is calculated in full only once, for the start pixel. That address is only updated as necessary, upon a column change. The core loop is 8 times unrolled, for each of the 8 possible bit positions a copy of the loop body exists. The relevant copy is directly jumped to when the 'preparation' part has worked out the bit position of the start pixel.

Here's one copy of the loop body in full:

Code: Select all

5.5     LDA (base),Y
  2     EOR #bit     ; ** or AND/ORA
  6     STA (base),Y
  2     DEX
  2     BEQ end
  3     LDA error
  3     SBC dy
3/2 +-- BCS
0/3 |   ADC dx
0/2 |   INY          ; ** or DEY
  3 +-> STA error

29.5/33.5 cycles/pixel

**) self-modifying code
This is for slopes with |dy/dx|<=1, slopes with |dy/dx|>1 feature a slight re-arrangement of these instructions.

As you see, only the first three instructions are involved with single pixel access. The next two instructions, DEX/BEQ, count down the remaining pixels to be set, and the following instructions handle the actual Bresenham algorithm.

The preparation part modifies the instructions marked with ** as necessary: EOR/AND/ORA for toggling/clearing/setting the pixels, INY is required for dy>0 and DEY for dy<0. Drawing the first pixel of the line then takes roughly 150 cycles, all following pixels then only take 30 .. 34 cycles each, i.e. for long lines, the routine approaches 30000 pixels/second.
groepaz
Vic 20 Scientist
Posts: 1187
Joined: Wed Aug 25, 2010 5:30 pm

Re: C=Hacking Issue 8 - fast plot routine with 20 cycles

Post by groepaz »

Next step would be unrolling the loop for each needed slope... perhaps not on VIC20 though :)
I'm just a Software Guy who has no Idea how the Hardware works. Don't listen to me.
User avatar
Mike
Herr VC
Posts: 4841
Joined: Wed Dec 01, 2004 1:57 pm
Location: Munich, Germany
Occupation: electrical engineer

Re: C=Hacking Issue 8 - fast plot routine with 20 cycles

Post by Mike »

As is, the whole routine weighs in at less than 1 KB (793 bytes, to be exact).

IMO, the real challenge now is providing the routine with geometry data fast enough.
groepaz wrote:unrolling the loop for each needed slope
Or, for that matter, using pre-compiled sprites for small line segments.

...

That being said, the routine has seen real life use in the following programs: my Vector Demo, Earth Globe 2 (as part of the MG batch suite), MINISKETCH (a lightpen drawing program) and Madness (a small demonstrator of NR_QDRand).
User avatar
MrSterlingBS
Vic 20 Enthusiast
Posts: 174
Joined: Tue Jan 31, 2023 2:56 am
Location: Germany,Braunschweig

Re: C=Hacking Issue 8 - fast plot routine with 20 cycles

Post by MrSterlingBS »

Okay, thanks a lot for the code and explanations. I think (hope) i understand. :idea:
Post Reply