How many cycles per pixel should I expect for drawing lines?
Moderator: Moderators
-
- The Most Noble Order of Denial
- Posts: 343
- Joined: Fri May 01, 2009 4:44 pm
How many cycles per pixel should I expect for drawing lines?
i.e. What's the best algorithm
-
- Vic 20 Hobbyist
- Posts: 143
- Joined: Thu Aug 25, 2011 10:04 am
Really depends: Basic, Machine language? Changing the colors? Using pre-programmed characters 1 bye at a time? XORing stuff already there? Etc.
Sorry!
Sorry!
Phil Ranger
-------------
"Don't eat the trees 2" for the VIC 20 : http://www.box.net/shared/u398kj0nr0lkauzm1k67
on line: http://www.mdawson.net/vic20chrome/vic2 ... otrees.prg
-------------
"Don't eat the trees 2" for the VIC 20 : http://www.box.net/shared/u398kj0nr0lkauzm1k67
on line: http://www.mdawson.net/vic20chrome/vic2 ... otrees.prg
-
- The Most Noble Order of Denial
- Posts: 343
- Joined: Fri May 01, 2009 4:44 pm
- Mike
- Herr VC
- Posts: 4816
- Joined: Wed Dec 01, 2004 1:57 pm
- Location: Munich, Germany
- Occupation: electrical engineer
My faster Bresenham line routine implementation uses the following code snippet 8 times unrolled, so all bit positions are covered:matsondawson wrote:XORing in machine language.
No colour changes.
Code: Select all
5.5 LDA (base),Y
2 EOR #bit ; ** or AND/ORA
6 STA (base),Y
2 DEX
2 BEQ end
3 LDA error
3 SBC dy
3/2 +-- BCS
0/3 | ADC dx
0/2 | INY ; ** or DEY
3 +-> STA error
29.5/33.5 cycles/pixel
**) self-modifying code
-
- The Most Noble Order of Denial
- Posts: 343
- Joined: Fri May 01, 2009 4:44 pm
Thanks, I'm giving Star Wars a tentative go atm and need to know that my calculations are similar to others experiences.
Could you not use Y to detect the end of line by setting the base as the start of the line, and also use the reciprocal of the gradient and carry as the divider, like the following?
Could you not use Y to detect the end of line by setting the base as the start of the line, and also use the reciprocal of the gradient and carry as the divider, like the following?
Code: Select all
5.5 LDA (base),Y ; base is set as start of line and Y=0
2 EOR #bit ; ** or AND/ORA
6 STA (base),Y
2 TXA ; x holds the error
3 ADC #recip ; recip is x/y*256
2 TAX
3/2 +-- BCS
0/2 | DEY ; ** or DEY Y==0 is end of line
3 BEQ end
-
- The Most Noble Order of Denial
- Posts: 343
- Joined: Fri May 01, 2009 4:44 pm
Assuming the reciprocal is a valid method, for mostly vertical lines if you don't mind an illegal instruction...
Code: Select all
.loop
5.5 LDA (base),Y ; base is set as start of line when Y=0
2 EOR #bit ; ** or AND/ORA
6 STA (base),Y
2 DEY ; ** or DEY Y==0 is end of line
3/2 BEQ end
2 TXA
2 SBX #recip ; recip is x/y*256 - illegal inst
3/2 +-- BCS loop
.next
24.5 cycles vertical / 23.5 horizontal
- Mike
- Herr VC
- Posts: 4816
- Joined: Wed Dec 01, 2004 1:57 pm
- Location: Munich, Germany
- Occupation: electrical engineer
Lines with slopes of |dy/dx|<1 have adjacent pixels with equal y co-ordinates, so this method might leave out several pixels at one end of the line. Furthermore, in my implementation the Y register actually contains the y co-ordinate of the current pixel; INY is required for dy>0, and DEY for dy<0.matsondawson wrote:Could you not use Y to detect the end of line by setting the base as the start of the line,
That'd then be a DDA line algorithm. An 8-bit reciprocal for 8-bit co-ordinates requires careful rounding. Otherwise the line routine will miss the endpoint due to accumulation of rounding errors with lines longer than 128 pixels. The case of |dy/dx|=1 will need to go extra, as 'recip' cannot take the value 256.and also use the reciprocal of the gradient and carry as the divider, [...]?
Less than 190 cycles a pixel?!!
That's really fast!
I'm working on my second line routine...the first one is probably running slightly slower than Mike's (big shock there..lol) but I almost have my chunk routine done and I feel confident it should drop the cycle count some more.
Star Wars! I know the thread is old but all the best man love to see it happen!
That's really fast!
I'm working on my second line routine...the first one is probably running slightly slower than Mike's (big shock there..lol) but I almost have my chunk routine done and I feel confident it should drop the cycle count some more.
Star Wars! I know the thread is old but all the best man love to see it happen!
Learning all the time...
- Mike
- Herr VC
- Posts: 4816
- Joined: Wed Dec 01, 2004 1:57 pm
- Location: Munich, Germany
- Occupation: electrical engineer
Of course. It is not necessary to recalculate the address of the pixel again and again as the routine steps along the line. That's only done for the first pixel, with some extra work to speed up the rest of the routine as well. Drawing the first pixel of the line then takes roughly 150 cycles, all following pixels then only take 30 .. 34 cycles each, i.e. for long lines, the routine approaches 30000 pixels/second.darkatx wrote:Less than 190 cycles a pixel?!!
-
- The Most Noble Order of Denial
- Posts: 343
- Joined: Fri May 01, 2009 4:44 pm
So the plan I was going to use was an attempt at hardware assisted line drawing.
By wiring (T1 ->) T2 -> shift register and other combinations you can use it as a bit shifter for the line (assuming you turn off interrupts). I think I was down to 17ish cycles a pixel ignoring setup. The thing is you end up needing about 5 routines depending on the line angle, and it's only accurate to something like 1:32.
Another possibility is to combine with that (given the time sensitive nature of the routine), is to trigger interrupts to add more accuracy just by slowing the loop down once in a while.
So it's good for vertical lines. But it can also be used for horizontal, I can't remember off the top of my head how atm, as I'm at work, but it was to do with using XOR to difference the shifted bit patterns, thus getting a maximum of 8 pixels drawn per write.
By wiring (T1 ->) T2 -> shift register and other combinations you can use it as a bit shifter for the line (assuming you turn off interrupts). I think I was down to 17ish cycles a pixel ignoring setup. The thing is you end up needing about 5 routines depending on the line angle, and it's only accurate to something like 1:32.
Another possibility is to combine with that (given the time sensitive nature of the routine), is to trigger interrupts to add more accuracy just by slowing the loop down once in a while.
So it's good for vertical lines. But it can also be used for horizontal, I can't remember off the top of my head how atm, as I'm at work, but it was to do with using XOR to difference the shifted bit patterns, thus getting a maximum of 8 pixels drawn per write.
Last edited by matsondawson on Mon Apr 23, 2012 2:12 am, edited 1 time in total.
- Mike
- Herr VC
- Posts: 4816
- Joined: Wed Dec 01, 2004 1:57 pm
- Location: Munich, Germany
- Occupation: electrical engineer
There had been similar proposals for line routines which read a sawtooth signal off the SID waveform register of voice 3 - however I couldn't remember any one of these actually was implemented. Interfering interrupts or bad lines on the C64 surely didn't help in that case.matsondawson wrote:hardware assisted line drawing [...] by wiring (T1 ->) T2 -> shift register and other combinations you can use it as a bit shifter for the line
Yes, the 'shorter' loop of a chunky algorithm runs alongside this:for horizontal, I can't remember off the top of my head how atm, as I'm at work, but it was to do with using XOR to difference the shifted bit patterns, thus getting a maximum of 8 pixels drawn per write.
Code: Select all
.loop
LSR mask ; 5
BEQ fixup_column ; 2 (assumed not taken)
SBC dy ; 3
BCC fixup_y ; 2 (assumed not taken)
DEX ; 2
BNE loop ; 3 (assumed taken)
-----------------------------
total: 17 cycles
You need fast 'driver' routines for these line plotters though, otherwise much of their speed is gobbled up in surrounding overhead ...
- Kweepa
- Vic 20 Scientist
- Posts: 1314
- Joined: Fri Jan 04, 2008 5:11 pm
- Location: Austin, Texas
- Occupation: Game maker
Here's my guess at the missing fixup routines...
Is there a quicker way to do this?
Code: Select all
scr word 0
mask byte 0
masks byte 255, 127, 63, 31, 15, 7, 3, 1
; multiply x1 by 128
lda #0
sta scr
lda x1
lsr
lsr scr
sta scr+1
; and add y1
lda y1
adc scr
sta scr
lda #0
tay
; write initial part
lda x1
and #7
tax
lda masks,x
sta mask
eor (scr), y
sta (scr), y
; set up x
lda x2
sec
sbc x1
tax
loop
LSR mask ; 5
BEQ fixup_column ; 2 (assumed not taken)
loop2
SBC dy ; 3
BCC fixup_y ; 2 (assumed not taken)
loop3
DEX ; 2
BNE loop ; 3 (assumed taken)
lda (scr), y
eor mask
sta (scr), y
rts
fixup_y
sta dy ; 3
lda (scr), y ; 5
eor mask ; 3
sta (scr), y ; 6
lda dy ; 3
iny ; 2
jmp loop3 ; 3
; 25
fixup_column
sta dy ; 3
lda scr ; 3
bne fix2 ; 2/3
lda #128 ; 2
sta scr ; 3
bcc fix3 ; 3
fix2
lda #0 ; 2
sta scr ; 3
inc scr+1 ; 5
fix3
lda #255 ; 2
sta mask ; 3
eor (scr), y ; 5
sta (scr), y ; 6
lda dy ; 3
jmp loop2 ; 3
; 41
; this second one assumes that each column is on a separate page
; data, or a second buffer, would be interleaved
fixup_column
sta dy ; 3
inc scr+1 ; 5
lda #255 ; 2
sta mask ; 3
eor (scr), y ; 5
sta (scr), y ; 6
lda dy ; 3
jmp loop2 ; 3
; 30
- pixel
- Vic 20 Scientist
- Posts: 1330
- Joined: Fri Feb 28, 2014 3:56 am
- Website: http://hugbox.org/
- Location: Berlin, Germany
- Occupation: Pan–galactic shaman
Re:
Kweepa wrote:Is there a quicker way to do this?Code: Select all
; multiply x1 by 128 lda #0 sta scr lda x1 lsr lsr scr sta scr+1
Code: Select all
lda x1
lsr
sta scr+1
lda #0
ror
sta scr
A man without talent or ambition is most easily pleased. Others set his path and he is content.
https://github.com/SvenMichaelKlose
https://github.com/SvenMichaelKlose