Line drawing algorithm

Mike · Post by **Mike** » Sat Mar 16, 2024 2:51 pm

MrSterlingBS wrote:[...]

PM sent

Mike · Post by **Mike** » Sun Mar 24, 2024 7:53 am

Warning, longer post.

Before I posted my own version of the "20000 pixels/second" line routine on the previous page, I also had a PM exchange with Merytsetesh about MrSterlingBS's original implementation:

in a PM to Merytsetesh, Mike wrote:I'll take a look into this the next days whether I can derive a version of the routine that does not have those rounding issues, and will do a post about the improved version here in Denial. FYI, that routine roughly manages 20000 pixels/second.

While doing so, I found the 'skeleton' of the main loop to be general enough that only a single copy was necessary (not two, as I originally had imagined). One of the self-modifications changes between CPX and CPY at the end of the loop as necessary, that had been the decisive factor in it.

The routine is remarkable in the sense that it manages to keep the X and Y co-ordinates as work values in the respective registers throughout the whole operation of the main loop - that is something you normally do not see in non-trivial 6502 code. Normally, you would always need to swap between the registers and (zero page) memory. Still the 6502 is missing a second accumulator, and that means the single accumulator has to hold the current bitmap byte value and the decision/error value in turns (and for a short time, also the low and high byte parts of the bit column address).

Yet, this routine still does a full address calculation for each pixel, even if that is extremely streamlined by avoiding shift and mask instructions during retrieval of the bitmap column pointer which would otherwise be helpful to keep the tables small.

To further improve on this, the crucial observation is that the Y indexed address mode already does half the job of calculating the effective address of the bitmap byte, and the bitmap column pointer only changes its value when the respective 8-pixel bitmap column is left to either side. Most of the time, the bitmap column pointer stays put so we only need to update it as necessary. Factoring out that part is one of the two necessary steps.

The second step is saving the time on the bit mask retrieval. Again, the routine here reads the bit mask from a table with an X indexed ORA instruction. However, while stepping along the line, the bit in the mask either stays put or changes its position one to the left or to the right (with possible wraparound). This can be streamlined by unrolling the main loop and providing 8 copies of the loop body, one for each position of the bit to be set (or reset), as immediate constant.

In the end, the main loop of the line routine changes from:

Code: Select all

.Line_03
 INX            ; or DEX for x as main direction (or INY/DEY for y as main direction)
 SEC
 LDA accu
 SBC #&00       ; effectively, SBC #dy for x as main direction (or SBC #dx for y as main direction)
 BCS Line_04
 ADC #&00       ; effectively, ADC #dx for x as main direction (or ADC #dy for y as main direction)
 INY            ; or DEY for x as main direction (or INX/DEX for y as main direction)
.Line_04
 STA accu
.Line_05
 LDA Line_06,X
 STA scrn
 LDA Line_07,X
 STA scrn+1
 LDA (scrn),Y
 ORA Line_08,X
 STA (scrn),Y
 CPX #&00       ; effectively, CPX #x2 for x as main direction (or CPY #y2 for y as main direction)
 BNE Line_03
 
49.5/52.5 cycles/pixel

... (the comments show where the preparation part of the routine modifies the main loop) ...

to:

Code: Select all

5.5     LDA (base),Y
  2     EOR #bit     ; ** or AND/ORA
  6     STA (base),Y
  2     DEX
  2     BEQ end
  3     LDA error
  3     SBC dy
3/2 +-- BCS
0/3 |   ADC dx
0/2 |   INY          ; ** or DEY
  3 +-> STA error

29.5/33.5 cycles/pixel

**) self-modifying code

The latter is one copy of the loop body for x as main direction, the loop body for y as main direction has these instructions slightly re-arranged.

The X register now has its role changed from keeping the X co-ordinate to counting down the number of remaining pixels. If that number becomes 0, one of the "BEQ end" instructions is executed to terminate the loop. That is another small optimization as it saves one cycle per pixel in most cases. Furthermore, DEX does not change the carry flag (as do the CPX or CPY instructions in the former routine) so it is not necessary to do a SEC to prepare the SBC instruction in the Bresenham step. It is only necessary to enter the main loop with the C flag set once (actually, when the bitmap column pointer is adjusted, the state of the C flag needs to be kept in mind as well).

Adjusting the bitmap column pointer weighs in with 23 cycles, at most for every 8th pixel (less often so for y as main direction), so this adds in another 3 cycles per pixel at worst. In the end, the average number of cycles per pixel goes down from 51 cycles to 31.5(+3) cycles for the faster implementation, resulting in ~50 % more speed.

As the full address calculation of the bitmap column pointer is only done once, the tables can be kept small: only 20 entries for the low and high bytes each (assuming 160 pixels horizontal width for the bitmap, as used by MINIGRAFIK). The bit position table is not anymore necessary as it is now 'contained' within the immediate fields of the ORA/AND/EOR instructions. Another small improvement in play: the line is always drawn left to right - this is always possible by exchanging the line ends, should x2 be less than x1.

What does not carry over is using immediate operands for the SBC and ADC instructions - modifying these would require to change their operand fields for all 8 copies of the loop body in the relevant main loop. During init, this would incur an extra cost of at least 64 cycles (for 16 STA ABS instructions) which needs a line of at least 32 pixels length to break even.

All that needs to be done now is preparing the relevant of the two main loops with the correct instructions for stepping along the y axis (either INY or DEY), at 8 places. When the draw action is changed between set, reset or invert, also those corresponding instructions, possibly also their operands need be modified. My faster routine takes care to only do these modifications when they are really necessary. If one disregards the change of draw action, the setup part still only takes about the same time on average as the setup part of the former routine (150 cycles vs 130 cycles).

...

That being said, this faster routine now sits at a sweet spot of being quite fast with 30000 pixels/second while still being less than 1 KB in size. Any further improvement in speed to the direction of 20 cycles/pixel (as pondered about by me earlier in this thread) likely needs an exuberant increase in code size to cover all the definitions by cases. If the flexibility to change the draw action is still supposed to be kept, I would estimate it at about 10 KB or more, which IMO makes such a routine unfit for general use.

In any case, whether with the 20000 pixels/second routine here or with the faster 30000 pixels/second routine in my repository, the challenge now is to supply the line routine with co-ordinate data at a fast pace. When the geometry engine is sluggish, neither of the two line routines will save your day.

Greetings,

Michael

Merytsetesh · Post by **Merytsetesh** » Sun Mar 24, 2024 12:01 pm

First off, I'd like to take my hat off to you, Mike: this is incredible. I would never have looked at things from this perspective -- not for a long time, anyway. Getting 20K-30K px/s on a VIC is incredible. Everything here is a masterclass in elegance in engineering... I'm lost for words.

Mike wrote:Another small improvement in play: the line is always drawn left to right - this is always possible by exchanging the line ends, should x2 be less than x1.

I'm curious: what if you had y2>y1, so lines were always drawn in one direction vertically as well? I admit I may have missed something in your explanation, so as ever I apologise in advance.

Once again, I'm blown away, and humbled, by this kind of vision. Thank you for sharing this, Mike.

Best wishes,

Merytsetesh

Mike · Post by **Mike** » Sun Mar 24, 2024 1:37 pm

Merytsetesh wrote:I'm curious: what if you had y2>y1, so lines were always drawn in one direction vertically as well?

You can't have both.

Given x2>x1 and y2>y1, a line on screen looks like a backslash, "\", assuming a left-handed co-ordinate system. You can't enforce y2>y1 though when the endpoints already have been sorted for x2>x1, as that would exclude lines that look like a slash, "/".

Rewriting the line routine to sort the y co-ordinates for y2>y1 instead would indeed allow one to use INY only for stepping the y-co-ordinate, but in turn also requires two more main loops: the pixel bit sequence is hard-coded within the main loops! For x2>x1, one of the original two main loops would be used (with the bits ordered $80, $40, ..., $01 for ORA), x2<x1 would require the operands of the ORA/AND/EOR instructions to be placed in inverse order (for ORA: $01, $02, ..., $80). Alternatively, with still only two main loops, these operands would need to be modified everytime they are not in the correct order, which surely takes more time than just replacing INY for DEY (or vice versa).

Merytsetesh · Post by **Merytsetesh** » Mon Mar 25, 2024 1:17 pm

Mike wrote: ↑Sun Mar 24, 2024 1:37 pm Given x2>x1 and y2>y1, a line on screen looks like a backslash, "\", assuming a left-handed co-ordinate system. You can't enforce y2>y1 though when the endpoints already have been sorted for x2>x1, as that would exclude lines that look like a slash, "/".

Of course, and I'm an idiot for asking in the first place.

Teach me not to post when I've had not coffee.

Mike wrote: ↑Sun Mar 24, 2024 1:37 pmAlternatively, with still only two main loops, these operands would need to be modified everytime they are not in the correct order, which surely takes more time than just replacing INY for DEY (or vice versa).

No, that doesn't make sense to do. Better to keep things simple. Do the tables get generated each time the routine is called or only once on initialisation?

Mike · Post by **Mike** » Mon Mar 25, 2024 2:00 pm

Merytsetesh wrote:Do the tables get generated each time the routine is called or only once on initialisation?

I suppose you mean the tables that are written in lines 20..27 of the test rig: these of course need only be written once, at program start.

The main loop of the test rig just sets random co-ordinates for the endpoints, calls the line routine with SYS 15360 and then loops.

Denial

Line drawing algorithm

Re: Line drawing algorithm

Re: Line drawing algorithm

Re: Line drawing algorithm

Re: Line drawing algorithm

Re: Line drawing algorithm

Re: Line drawing algorithm