Basic Speedup Tip
Moderator: Moderators
- saundby
- Vic 20 Enthusiast
- Posts: 166
- Joined: Wed Feb 22, 2006 11:55 pm
- Website: http://saundby.com/
- Location: Gold Country, CA
Integers
Rather counterintuitive, but it's true:
Integer variables (X%) are slower than floating point ones (X). Avoid them unless you're under greater space constraints than time constraints.
-Mark G.
Integer variables (X%) are slower than floating point ones (X). Avoid them unless you're under greater space constraints than time constraints.
-Mark G.
- Mike
- Herr VC
- Posts: 4845
- Joined: Wed Dec 01, 2004 1:57 pm
- Location: Munich, Germany
- Occupation: electrical engineer
Well, they should have better said: Z*Z runs circles around Z^2.GreyGhost wrote:Something I just ran across in a RUN Magazine. It said that:
Z^2 executes slower than Z*Z
I can't think of any program I've writen that used an exponent, but faster is faster.
That said, in CBM BASIC the expression X^Y internally is computed as EXP(Y*LOG(X)). For arbitrary fractional Y, this is okay. But for integer Y, everything else is faster. Same thing with SQR(X). BASIC computes X^(1/2), i.e. EXP(LOG(X)/2). I wrote an own implementation of SQR(X) somewhere on my HD. Using Heron's algorithm, it is 4 times faster.
Michael
- Mike
- Herr VC
- Posts: 4845
- Joined: Wed Dec 01, 2004 1:57 pm
- Location: Munich, Germany
- Occupation: electrical engineer
Here's the loader:Mike wrote:I wrote an own implementation of SQR(X) [...] using Heron's algorithm, it is 4 times faster.
Code: Select all
1 FORT=0TO74:READA:POKE673+T,A:NEXT:POKE1,161:POKE2,2:END
2 DATA32,27,220,32,43,220,240,66,16,3,76,72,210,165,97,133,251,41,1,9,128,133,97,169,4
3 DATA133,252,162,236,160,2,32,212,219,169,188,160,217,208,18,162,241,160,2,32,212,219
4 DATA169,236,160,2,32,15,219,169,241,160,2,32,103,216,198,97,198,252,208,229,165,251
5 DATA74,105,64,133,97,96
Compare
Code: Select all
T1=TI:FORT=0TO256:A=SQR(T):NEXT:T2=TI:PRINT(T2-T1)/60:REM 11.7 sec
Code: Select all
T1=TI:FORT=0TO256:A=USR(T):NEXT:T2=TI:PRINT(T2-T1)/60:REM 2.7 sec
Code: Select all
1 FORT=0TO256
2 S=S+ABS(SQR(T)*SQR(T)-T)
3 U=U+ABS(USR(T)*USR(T)-T)
4 NEXT
5 PRINT S,U
SQR(x): ~3.6E-5
USR(x): ~1.1E-6
Greetings,
Michael
Edit: added link to commented source.
Last edited by Mike on Mon Jun 03, 2013 2:51 pm, edited 2 times in total.
Nice. At first I made a typo, and got a routine that had a lot of calculation error.
This is an interesting test program:
It displays exactly for which square roots the two algorithms have an error, and the total amount (not the total average error). On this test run, it shows that for the square roots from 0 to 256, the SQR routine has an error in 210 cases, and your USR routine in only 34 cases. Never does the USR routine have an error when the SQR routine is correct.
Please note that a PRINT statement does some rounding when converting from MLFPT to a printable string:
A=SQR(15)*SQR(15)
B=USR(15)*USR(15)
PRINT A returns 15
PRINT B returns 15
PRINT A-15 returns 3.7252903E-09
PRINT B-15 returns -3.7252903E-09
But hmm..
PRINT (SQR(14)*SQR(14))-14 returns 3.7252903E-09
PRINT (SQR(14)*USR(14))-14 returns 3.7252903E-09
PRINT (USR(14)*SQR(14))-14 returns 0
PRINT (USR(14)*USR(14))-14 returns 0
PRINT (SQR(15)*SQR(15))-15 returns 3.7252903E-09
PRINT (USR(15)*USR(15))-15 returns -3.7252903E-09
PRINT (SQR(15)*USR(15))-15 returns 0
PRINT (USR(15)*SQR(15))-15 returns 0
I'm not sure I can explain this behavior.
This is an interesting test program:
Code: Select all
10 U=0:S=0:FORT=0TO256:E$=""
15 IFABS(SQR(T)*SQR(T)-T)<>0THENE$=" SQR":S=S+1
20 IFABS(USR(T)*USR(T)-T)<>0THENE$=E$+" USR":U=U+1
22 IFE$<>""THENPRINTT;E$
25 NEXT
30 PRINT"TOTAL SQR:";S
32 PRINT"TOTAL USR:";U
Please note that a PRINT statement does some rounding when converting from MLFPT to a printable string:
A=SQR(15)*SQR(15)
B=USR(15)*USR(15)
PRINT A returns 15
PRINT B returns 15
PRINT A-15 returns 3.7252903E-09
PRINT B-15 returns -3.7252903E-09
But hmm..
PRINT (SQR(14)*SQR(14))-14 returns 3.7252903E-09
PRINT (SQR(14)*USR(14))-14 returns 3.7252903E-09
PRINT (USR(14)*SQR(14))-14 returns 0
PRINT (USR(14)*USR(14))-14 returns 0
PRINT (SQR(15)*SQR(15))-15 returns 3.7252903E-09
PRINT (USR(15)*USR(15))-15 returns -3.7252903E-09
PRINT (SQR(15)*USR(15))-15 returns 0
PRINT (USR(15)*SQR(15))-15 returns 0
I'm not sure I can explain this behavior.
Anders Carlsson
- Mike
- Herr VC
- Posts: 4845
- Joined: Wed Dec 01, 2004 1:57 pm
- Location: Munich, Germany
- Occupation: electrical engineer
More interesting, this anomaly doesn't appear, if you assign the results to a variable first:carlsson wrote:But hmm..
PRINT (SQR(14)*SQR(14))-14 returns 3.7252903E-09
PRINT (SQR(14)*USR(14))-14 returns 3.7252903E-09
PRINT (USR(14)*SQR(14))-14 returns 0
PRINT (USR(14)*USR(14))-14 returns 0
I'm not sure I can explain this behavior.
S=SQR(14):U=USR(14):PRINT U*S-14, S*U-14
Both expressions return 0.
But this isn't restricted to my USR() function. After some tries I came up with:
PRINT 2.345*3.456
8.10432001
PRINT 3.456*2.345
8.10432
the correct result of course being 8.10432 ...
PRINT 2.345*3.456-8.10432
3.7252903E-09
PRINT 3.456*2.345-8.10432
0
A=2.345:B=3.456:PRINT A*B,B*A
8.10432001
8.10432001
(now we have an identical, but still slightly inaccurate answer.)
Seems, as if the VIC's multiplication is not commutative, unless the operands come from memory (as opposed from FAC). The C=64, and C=16, and C=128 show the same behaviour. Which is sad, as even though the associative, and distributive laws usually don't hold with FP arithmetic, at least the commutative law should hold on both addition and multiplication.
Michael
- Mike
- Herr VC
- Posts: 4845
- Joined: Wed Dec 01, 2004 1:57 pm
- Location: Munich, Germany
- Occupation: electrical engineer
Further analysis:
Integer numbers can be represented exactly (as long as they're not greater than 2^32), so we can construct 2.345 and 3.456 as decimal fractions. The division of two exact integers is possibly the best way to get accurate fp numbers:
And there:
PRINT 2.345-2345/1000
9.31322575E-10
PRINT 3.456-3456/1000
0
we see, that 2.345 (opposed to 2345/1000) is presumably represented outside the required accuracy, which is the reason that we get:
A=2.345:B=3.456:PRINT A*B
8.10432001
Still this could be tolerated, but the multiplication in expressions simply is done wrong:
In continued calculations without parenthesis:
- One operand (the "running result") is always fetched from the FAC (which has an extra rounding byte),
- while the other operand is fetched from memory.
And the multiplication of two operands with two different accuracies is not commutative. If we force the running result into, and then out of memory we get a consistent result:
PRINT (2.345*1)*(3.456*1),(3.456*1)*(2.345*1)
or
PRINT (2.345+0)*(3.456+0),(3.456+0)*(2.345+0)
or
A=2.345:B=3.456:PRINT A*B (as above)
will do the trick.
Michael
P.S.: Since we're now clearly off-topic, we should continue further discussion in a new thread, and move the postings since Jun 18 to this new thread (but leaving a copy of the Jun 18 post here).
Integer numbers can be represented exactly (as long as they're not greater than 2^32), so we can construct 2.345 and 3.456 as decimal fractions. The division of two exact integers is possibly the best way to get accurate fp numbers:
And there:
PRINT 2.345-2345/1000
9.31322575E-10
PRINT 3.456-3456/1000
0
we see, that 2.345 (opposed to 2345/1000) is presumably represented outside the required accuracy, which is the reason that we get:
A=2.345:B=3.456:PRINT A*B
8.10432001
Still this could be tolerated, but the multiplication in expressions simply is done wrong:
In continued calculations without parenthesis:
- One operand (the "running result") is always fetched from the FAC (which has an extra rounding byte),
- while the other operand is fetched from memory.
And the multiplication of two operands with two different accuracies is not commutative. If we force the running result into, and then out of memory we get a consistent result:
PRINT (2.345*1)*(3.456*1),(3.456*1)*(2.345*1)
or
PRINT (2.345+0)*(3.456+0),(3.456+0)*(2.345+0)
or
A=2.345:B=3.456:PRINT A*B (as above)
will do the trick.
Michael
P.S.: Since we're now clearly off-topic, we should continue further discussion in a new thread, and move the postings since Jun 18 to this new thread (but leaving a copy of the Jun 18 post here).
Another thing to add hear. It was noted earlier that you could gain speed by defining the variables that are used most at the beginning of the program. This can be done easily with the DIM command.
0 DIM A,B,C,D,E,F,G
This line sets the variables to zero. I use A,B and so* on because they are the quickest to be looked up. You can use any Letter or Letter-Number combo you like. Just scan your program and find the variable that is used most(and in a useful spot) in your program and start the DIM statement with that variable. Then find the next most used/useful variable and so on.
Later,
*Would the variable AA come after A or is it B as the quickest to be looked up? Which is faster, AA or C?
0 DIM A,B,C,D,E,F,G
This line sets the variables to zero. I use A,B and so* on because they are the quickest to be looked up. You can use any Letter or Letter-Number combo you like. Just scan your program and find the variable that is used most(and in a useful spot) in your program and start the DIM statement with that variable. Then find the next most used/useful variable and so on.
Later,
*Would the variable AA come after A or is it B as the quickest to be looked up? Which is faster, AA or C?
Rob
- Mike
- Herr VC
- Posts: 4845
- Joined: Wed Dec 01, 2004 1:57 pm
- Location: Munich, Germany
- Occupation: electrical engineer
Single-letter vs. two-letter variables
Hi Rob!
How quick the variables are found only depends on the order they're defined in the program, not their position in the alphabet.
Still two-letter variables, and letter-digit variables need two characters to match, for that reason they're slightly slower - comparable to as if they'd been declared one place later in the list as single-letter variables.
Greetings,
Michael
How quick the variables are found only depends on the order they're defined in the program, not their position in the alphabet.
Still two-letter variables, and letter-digit variables need two characters to match, for that reason they're slightly slower - comparable to as if they'd been declared one place later in the list as single-letter variables.
Greetings,
Michael
I just tried this:
10 DIM A,B,C,D,E,F,G
before RUN = 3562 bytes free = 19 bytes
after RUN = 3513 bytes free = 68 bytes
and:
10 A=0:B=0:C=0: D=0:E=0:F=0:G=0
before RUN = 3549 bytes free = 32 bytes
after RUN = 3500 bytes free = 81 bytes
They both take up the same amount of extra space after running(49 bytes).
Maybe this is more of a memory saver than a speed up.
10 DIM A,B,C,D,E,F,G
before RUN = 3562 bytes free = 19 bytes
after RUN = 3513 bytes free = 68 bytes
and:
10 A=0:B=0:C=0: D=0:E=0:F=0:G=0
before RUN = 3549 bytes free = 32 bytes
after RUN = 3500 bytes free = 81 bytes
They both take up the same amount of extra space after running(49 bytes).
Maybe this is more of a memory saver than a speed up.
Rob
- Mike
- Herr VC
- Posts: 4845
- Joined: Wed Dec 01, 2004 1:57 pm
- Location: Munich, Germany
- Occupation: electrical engineer
Good algorithms vs. Bad algorithms
This is a good place to point out, that no fiddling with variable names, or their order of declaration can rescue an otherwise bad algorithm.
A good example are sorting algorithms. Bubble-Sort, while easily implemented (I can write that routine down straight from memory) is a bad algorithm - such that it really should be forbidden to teach it to those new to programming. I just use it here, to state the point, to compare it with Quick-Sort.
Quick-Sort is more complicated to program. I confess I always have to look it up in a book, when I need to implement it myself. Luckily C has a built-in Quick-Sort.
First comes the example. Replace GOSUB 50 in line 13 with GOSUB 20 to test Bubble-Sort instead of Quick-Sort. N is specified in line 11.
Greetings,
Michael
I use the variant of Quick-Sort, that sorts the smallest sub-array first. This minimizes stack usage. DIMS(31) is enough to sort up to 65535 elements.
Quick-Sort also works with String-Arrays (use A$(), A$, and X$ instead). If you want to sort other arrays alongside A(), place the additional swap operation(s) between lines 55 and 56.
You'll see, for N up to 10, they're roughly on par, Bubble-Sort may even be faster, but look, what happens for greater N:
These are the run times in clock ticks:
A good example are sorting algorithms. Bubble-Sort, while easily implemented (I can write that routine down straight from memory) is a bad algorithm - such that it really should be forbidden to teach it to those new to programming. I just use it here, to state the point, to compare it with Quick-Sort.
Quick-Sort is more complicated to program. I confess I always have to look it up in a book, when I need to implement it myself. Luckily C has a built-in Quick-Sort.
First comes the example. Replace GOSUB 50 in line 13 with GOSUB 20 to test Bubble-Sort instead of Quick-Sort. N is specified in line 11.
Greetings,
Michael
I use the variant of Quick-Sort, that sorts the smallest sub-array first. This minimizes stack usage. DIMS(31) is enough to sort up to 65535 elements.
Quick-Sort also works with String-Arrays (use A$(), A$, and X$ instead). If you want to sort other arrays alongside A(), place the additional swap operation(s) between lines 55 and 56.
Code: Select all
10 DIMS(31):SP=0:REM ** STACK FOR QUICK-SORT
11 N=10:DIMA(N)
12 FORT=0TON:A(T)=RND(1):NEXT
13 T1=TI:L=0:R=N:GOSUB50:T2=TI
14 FORT=0TON:PRINTA(T):NEXT
15 PRINTT2-T1
16 END
18 :
19 REM ** BUBBLE-SORT
20 IFR<=LTHENRETURN
21 F=0:FORT=LTOR-1:IFA(T)>A(T+1)THENA=A(T):A(T)=A(T+1):A(T+1)=A:F=1
22 NEXT:IFFTHEN21
23 RETURN
29 :
49 REM ** QUICK-SORT
50 IFR<=LTHENRETURN
51 I=L:J=R:X=A((L+R)/2)
52 IFA(I)<XTHENI=I+1:GOTO52
53 IFA(J)>XTHENJ=J-1:GOTO53
54 IFI>JTHEN57
55 A=A(I):A(I)=A(J):A(J)=A
56 I=I+1:J=J-1
57 IFI<=JTHEN52
58 IFJ-L<R-ITHENS(SP)=I:S(SP+1)=R:SP=SP+2:R=J:GOSUB50:GOTO60
59 S(SP)=L:S(SP+1)=J:SP=SP+2:L=I:GOSUB50
60 SP=SP-2:L=S(SP):R=S(SP+1):GOTO50
These are the run times in clock ticks:
Code: Select all
Quick-Sort
N Run 1 Run 2 Run 3 Run 4 Run 5
10 56 65 60 61 59
30 230 221 231 211 207
100 837 881 867 849 858
300 3029 3169 3054 2969 2960
1000 12032 11954 11941 12553 12266
Bubble-Sort
N Run 1 Run 2 Run 3 Run 4 Run 5
10 66 47 93 82 64
30 563 501 684 633 627
100 6268 6918 6837 7474 6622
300 62476 57756 ... you get the idea
1000 not tried, expected time ~600000..700000 clock ticks.
-
- Vic 20 Afficionado
- Posts: 350
- Joined: Tue Apr 14, 2009 8:15 am
- Website: http://wimbasic.webs.com
- Location: Netherlands
- Occupation: farmer
Defined functions
The list of (float, int, string) variables is also interspersed with hooks to the DEF FNxx( ) functions.
More specifically, the bits 7s of the variablename are a flag for the datatype.
For a float, they both are 0. For a string, only the second is 1. For int, they both are 1. For FN, only the first is 1.
The value that is kept for the FN is a pointer to the function in the BASIC program.
So when you're defining a lot of those in the beginning of your program and hardly using them, be sure to DIM variables first.
More specifically, the bits 7s of the variablename are a flag for the datatype.
For a float, they both are 0. For a string, only the second is 1. For int, they both are 1. For FN, only the first is 1.
The value that is kept for the FN is a pointer to the function in the BASIC program.
So when you're defining a lot of those in the beginning of your program and hardly using them, be sure to DIM variables first.