How many cycles per pixel should I expect for drawing lines?

Basic and Machine Language

Moderator: Moderators

Post Reply
matsondawson
The Most Noble Order of Denial
Posts: 343
Joined: Fri May 01, 2009 4:44 pm

How many cycles per pixel should I expect for drawing lines?

Post by matsondawson »

i.e. What's the best algorithm
PhilRanger
Vic 20 Hobbyist
Posts: 143
Joined: Thu Aug 25, 2011 10:04 am

Post by PhilRanger »

Really depends: Basic, Machine language? Changing the colors? Using pre-programmed characters 1 bye at a time? XORing stuff already there? Etc.

Sorry!
Phil Ranger
-------------
"Don't eat the trees 2" for the VIC 20 : http://www.box.net/shared/u398kj0nr0lkauzm1k67
on line: http://www.mdawson.net/vic20chrome/vic2 ... otrees.prg
matsondawson
The Most Noble Order of Denial
Posts: 343
Joined: Fri May 01, 2009 4:44 pm

Post by matsondawson »

XORing in machine language.
No colour changes.
128 x 160 resolution
8k ram
User avatar
Mike
Herr VC
Posts: 4816
Joined: Wed Dec 01, 2004 1:57 pm
Location: Munich, Germany
Occupation: electrical engineer

Post by Mike »

matsondawson wrote:XORing in machine language.
No colour changes.
My faster Bresenham line routine implementation uses the following code snippet 8 times unrolled, so all bit positions are covered:

Code: Select all

5.5     LDA (base),Y
  2     EOR #bit     ; ** or AND/ORA
  6     STA (base),Y
  2     DEX
  2     BEQ end
  3     LDA error
  3     SBC dy
3/2 +-- BCS
0/3 |   ADC dx
0/2 |   INY          ; ** or DEY
  3 +-> STA error

29.5/33.5 cycles/pixel

**) self-modifying code
This is for slopes with |dy/dx|<=1, slopes with |dy/dx|>1 need a slight re-arrangement of these instructions. There's some overhead when the character column is changed; the setup of work variables and preparing INY/DEY as necessary need another ~150 cycles for each call, so there ...
matsondawson
The Most Noble Order of Denial
Posts: 343
Joined: Fri May 01, 2009 4:44 pm

Post by matsondawson »

Thanks, I'm giving Star Wars a tentative go atm and need to know that my calculations are similar to others experiences.

Could you not use Y to detect the end of line by setting the base as the start of the line, and also use the reciprocal of the gradient and carry as the divider, like the following?

Code: Select all

5.5     LDA (base),Y ; base is set as start of line and Y=0
  2     EOR #bit     ; ** or AND/ORA 
  6     STA (base),Y 
  2     TXA            ; x holds the error
  3     ADC #recip ; recip is x/y*256
  2		TAX
3/2 +-- BCS 
0/2 |   DEY          ; ** or DEY Y==0 is end of line
  3     BEQ end
matsondawson
The Most Noble Order of Denial
Posts: 343
Joined: Fri May 01, 2009 4:44 pm

Post by matsondawson »

Assuming the reciprocal is a valid method, for mostly vertical lines if you don't mind an illegal instruction...

Code: Select all

.loop
5.5     LDA (base),Y ; base is set as start of line when Y=0 
  2     EOR #bit      ; ** or AND/ORA 
  6     STA (base),Y 
  2     DEY           ; ** or DEY Y==0 is end of line 
3/2     BEQ end 
  2     TXA
  2     SBX #recip    ; recip is x/y*256 - illegal inst
3/2 +-- BCS loop
.next

24.5 cycles vertical / 23.5 horizontal
User avatar
Mike
Herr VC
Posts: 4816
Joined: Wed Dec 01, 2004 1:57 pm
Location: Munich, Germany
Occupation: electrical engineer

Post by Mike »

matsondawson wrote:Could you not use Y to detect the end of line by setting the base as the start of the line,
Lines with slopes of |dy/dx|<1 have adjacent pixels with equal y co-ordinates, so this method might leave out several pixels at one end of the line. Furthermore, in my implementation the Y register actually contains the y co-ordinate of the current pixel; INY is required for dy>0, and DEY for dy<0.
and also use the reciprocal of the gradient and carry as the divider, [...]?
That'd then be a DDA line algorithm. An 8-bit reciprocal for 8-bit co-ordinates requires careful rounding. Otherwise the line routine will miss the endpoint due to accumulation of rounding errors with lines longer than 128 pixels. The case of |dy/dx|=1 will need to go extra, as 'recip' cannot take the value 256. :(
User avatar
darkatx
Vic 20 Afficionado
Posts: 470
Joined: Wed Feb 04, 2009 2:17 pm
Location: Canada

Post by darkatx »

Less than 190 cycles a pixel?!!
That's really fast! :shock:

I'm working on my second line routine...the first one is probably running slightly slower than Mike's (big shock there..lol) but I almost have my chunk routine done and I feel confident it should drop the cycle count some more.

Star Wars! I know the thread is old but all the best man love to see it happen! :D
Learning all the time... :)
User avatar
Mike
Herr VC
Posts: 4816
Joined: Wed Dec 01, 2004 1:57 pm
Location: Munich, Germany
Occupation: electrical engineer

Post by Mike »

darkatx wrote:Less than 190 cycles a pixel?!!
Of course. It is not necessary to recalculate the address of the pixel again and again as the routine steps along the line. That's only done for the first pixel, with some extra work to speed up the rest of the routine as well. Drawing the first pixel of the line then takes roughly 150 cycles, all following pixels then only take 30 .. 34 cycles each, i.e. for long lines, the routine approaches 30000 pixels/second.
User avatar
darkatx
Vic 20 Afficionado
Posts: 470
Joined: Wed Feb 04, 2009 2:17 pm
Location: Canada

Post by darkatx »

Of course. It is not necessary to recalculate the address of the pixel again and again as the routine steps along the line...
D'OH! :shock:

:::facepalm:::
Learning all the time... :)
matsondawson
The Most Noble Order of Denial
Posts: 343
Joined: Fri May 01, 2009 4:44 pm

Post by matsondawson »

So the plan I was going to use was an attempt at hardware assisted line drawing.

By wiring (T1 ->) T2 -> shift register and other combinations you can use it as a bit shifter for the line (assuming you turn off interrupts). I think I was down to 17ish cycles a pixel ignoring setup. The thing is you end up needing about 5 routines depending on the line angle, and it's only accurate to something like 1:32.

Another possibility is to combine with that (given the time sensitive nature of the routine), is to trigger interrupts to add more accuracy just by slowing the loop down once in a while.

So it's good for vertical lines. But it can also be used for horizontal, I can't remember off the top of my head how atm, as I'm at work, but it was to do with using XOR to difference the shifted bit patterns, thus getting a maximum of 8 pixels drawn per write.
Last edited by matsondawson on Mon Apr 23, 2012 2:12 am, edited 1 time in total.
User avatar
darkatx
Vic 20 Afficionado
Posts: 470
Joined: Wed Feb 04, 2009 2:17 pm
Location: Canada

Post by darkatx »

I used to be a decent programmer as a teen but then it all went to pot - you use it or lose it.
The thing you guys can do make me marvel.
:)
This stuff inspires me to keep programming and learning.
Thanks!
Learning all the time... :)
User avatar
Mike
Herr VC
Posts: 4816
Joined: Wed Dec 01, 2004 1:57 pm
Location: Munich, Germany
Occupation: electrical engineer

Post by Mike »

matsondawson wrote:hardware assisted line drawing [...] by wiring (T1 ->) T2 -> shift register and other combinations you can use it as a bit shifter for the line
There had been similar proposals for line routines which read a sawtooth signal off the SID waveform register of voice 3 - however I couldn't remember any one of these actually was implemented. Interfering interrupts or bad lines on the C64 surely didn't help in that case.
for horizontal, I can't remember off the top of my head how atm, as I'm at work, but it was to do with using XOR to difference the shifted bit patterns, thus getting a maximum of 8 pixels drawn per write.
Yes, the 'shorter' loop of a chunky algorithm runs alongside this:

Code: Select all

.loop
 LSR mask          ; 5
 BEQ fixup_column  ; 2 (assumed not taken)
 SBC dy            ; 3
 BCC fixup_y       ; 2 (assumed not taken)
 DEX               ; 2
 BNE loop          ; 3 (assumed taken)
-----------------------------
             total: 17 cycles
but it is lengthened to around 51 cycles when the y fixup is required. Stephen Judd had implemented such a routine in his 'dim4' demo. It also operates on a character set bitmap, so should port easily to the VIC-20. Which one of these (chunky/non-chunky) versions is actually faster depends a lot on the distribution of slopes in your application. Special casing horizontal lines is also a good idea.

You need fast 'driver' routines for these line plotters though, otherwise much of their speed is gobbled up in surrounding overhead ... :?
User avatar
Kweepa
Vic 20 Scientist
Posts: 1314
Joined: Fri Jan 04, 2008 5:11 pm
Location: Austin, Texas
Occupation: Game maker

Post by Kweepa »

Here's my guess at the missing fixup routines...

Code: Select all

scr word 0
mask byte 0
masks byte 255, 127, 63, 31, 15, 7, 3, 1

 ; multiply x1 by 128
 lda #0
 sta scr
 lda x1
 lsr
 lsr scr
 sta scr+1
 ; and add y1
 lda y1
 adc scr
 sta scr
 lda #0
 tay

 ; write initial part
 lda x1
 and #7
 tax
 lda masks,x
 sta mask
 eor (scr), y
 sta (scr), y

 ; set up x
 lda x2
 sec
 sbc x1
 tax

loop
 LSR mask          ; 5
 BEQ fixup_column  ; 2 (assumed not taken)
loop2
 SBC dy            ; 3
 BCC fixup_y       ; 2 (assumed not taken)
loop3
 DEX               ; 2
 BNE loop          ; 3 (assumed taken)
 lda (scr), y
 eor mask
 sta (scr), y
 rts

fixup_y
 sta dy            ; 3
 lda (scr), y      ; 5
 eor mask          ; 3
 sta (scr), y      ; 6
 lda dy            ; 3
 iny               ; 2
 jmp loop3         ; 3
                   ; 25

fixup_column
 sta dy           ; 3
 lda scr          ; 3
 bne fix2         ; 2/3
 lda #128         ; 2
 sta scr          ; 3
 bcc fix3         ; 3
fix2
 lda #0           ; 2
 sta scr          ; 3
 inc scr+1        ; 5
fix3
 lda #255         ; 2
 sta mask         ; 3
 eor (scr), y     ; 5
 sta (scr), y     ; 6
 lda dy           ; 3
 jmp loop2        ; 3
                  ; 41

; this second one assumes that each column is on a separate page
; data, or a second buffer, would be interleaved
fixup_column
 sta dy           ; 3
 inc scr+1        ; 5
 lda #255         ; 2
 sta mask         ; 3
 eor (scr), y     ; 5
 sta (scr), y     ; 6
 lda dy           ; 3
 jmp loop2        ; 3
                  ; 30
Is there a quicker way to do this?
User avatar
pixel
Vic 20 Scientist
Posts: 1330
Joined: Fri Feb 28, 2014 3:56 am
Website: http://hugbox.org/
Location: Berlin, Germany
Occupation: Pan–galactic shaman

Re:

Post by pixel »

Kweepa wrote:

Code: Select all

 ; multiply x1 by 128
 lda #0
 sta scr
 lda x1
 lsr
 lsr scr
 sta scr+1
Is there a quicker way to do this?

Code: Select all

    lda x1
    lsr
    sta scr+1
    lda #0
    ror
    sta scr
A man without talent or ambition is most easily pleased. Others set his path and he is content.
https://github.com/SvenMichaelKlose
Post Reply