Basic Speedup Tip

saundby · Post by **saundby** » Wed Mar 22, 2006 2:29 am

Rather counterintuitive, but it's true:

Integer variables (X%) are slower than floating point ones (X). Avoid them unless you're under greater space constraints than time constraints.

-Mark G.

GreyGhost · Post by **GreyGhost** » Fri Mar 24, 2006 11:13 am

Something I just ran across in a RUN Magazine. It said that:

Z^2 executes slower than Z*Z

I can't think of any program I've writen that used an exponent, but faster is faster.

Later,

Mike · Post by **Mike** » Mon Mar 27, 2006 5:34 am

GreyGhost wrote:Something I just ran across in a RUN Magazine. It said that:

Z^2 executes slower than Z*Z

I can't think of any program I've writen that used an exponent, but faster is faster.

Well, they should have better said: Z*Z runs circles around Z^2.

That said, in CBM BASIC the expression X^Y internally is computed as EXP(Y*LOG(X)). For arbitrary fractional Y, this is okay. But for integer Y, everything else is faster. Same thing with SQR(X). BASIC computes X^(1/2), i.e. EXP(LOG(X)/2). I wrote an own implementation of SQR(X) somewhere on my HD. Using Heron's algorithm, it is 4 times faster.

Michael

carlsson · Post by **carlsson** » Mon Mar 27, 2006 8:33 am

That explains why 9^3 = 729.000001.

Mike · Post by **Mike** » Sun Jun 18, 2006 1:49 pm

Mike wrote:I wrote an own implementation of SQR(X) [...] using Heron's algorithm, it is 4 times faster.

Here's the loader:

Code: Select all

1 FORT=0TO74:READA:POKE673+T,A:NEXT:POKE1,161:POKE2,2:END
2 DATA32,27,220,32,43,220,240,66,16,3,76,72,210,165,97,133,251,41,1,9,128,133,97,169,4
3 DATA133,252,162,236,160,2,32,212,219,169,188,160,217,208,18,162,241,160,2,32,212,219
4 DATA169,236,160,2,32,15,219,169,241,160,2,32,103,216,198,97,198,252,208,229,165,251
5 DATA74,105,64,133,97,96

As one could expect, the routine inserts into the vector of the USR() function. The 10 bytes immediately following the code are used for temporary float numbers. Here's the commented source.

Compare

Code: Select all

T1=TI:FORT=0TO256:A=SQR(T):NEXT:T2=TI:PRINT(T2-T1)/60:REM 11.7 sec

with

Code: Select all

T1=TI:FORT=0TO256:A=USR(T):NEXT:T2=TI:PRINT(T2-T1)/60:REM  2.7 sec

Finally, a check for accuracy:

Code: Select all

1 FORT=0TO256
2 S=S+ABS(SQR(T)*SQR(T)-T)
3 U=U+ABS(USR(T)*USR(T)-T)
4 NEXT
5 PRINT S,U

Ideally the ABS(...) expression would return 0. So, for the arguments of 0 to 256, this program accumulates the magnitudes of the resulting errors for both SQR(x) and USR(x). We get:

SQR(x): ~3.6E-5
USR(x): ~1.1E-6

Greetings,

Michael

Edit: added link to commented source.

carlsson · Post by **carlsson** » Fri Jun 30, 2006 8:02 am

Nice. At first I made a typo, and got a routine that had a lot of calculation error.

This is an interesting test program:

Code: Select all

10 U=0:S=0:FORT=0TO256:E$=""
15 IFABS(SQR(T)*SQR(T)-T)<>0THENE$=" SQR":S=S+1
20 IFABS(USR(T)*USR(T)-T)<>0THENE$=E$+" USR":U=U+1
22 IFE$<>""THENPRINTT;E$
25 NEXT
30 PRINT"TOTAL SQR:";S
32 PRINT"TOTAL USR:";U

It displays exactly for which square roots the two algorithms have an error, and the total amount (not the total average error). On this test run, it shows that for the square roots from 0 to 256, the SQR routine has an error in 210 cases, and your USR routine in only 34 cases. Never does the USR routine have an error when the SQR routine is correct.

Please note that a PRINT statement does some rounding when converting from MLFPT to a printable string:

A=SQR(15)*SQR(15)
B=USR(15)*USR(15)
PRINT A returns 15
PRINT B returns 15
PRINT A-15 returns 3.7252903E-09
PRINT B-15 returns -3.7252903E-09

But hmm..

PRINT (SQR(14)*SQR(14))-14 returns 3.7252903E-09
PRINT (SQR(14)*USR(14))-14 returns 3.7252903E-09
PRINT (USR(14)*SQR(14))-14 returns 0
PRINT (USR(14)*USR(14))-14 returns 0

PRINT (SQR(15)*SQR(15))-15 returns 3.7252903E-09
PRINT (USR(15)*USR(15))-15 returns -3.7252903E-09
PRINT (SQR(15)*USR(15))-15 returns 0
PRINT (USR(15)*SQR(15))-15 returns 0

I'm not sure I can explain this behavior.

Mike · Post by **Mike** » Mon Jul 03, 2006 3:52 am

carlsson wrote:But hmm..

PRINT (SQR(14)*SQR(14))-14 returns 3.7252903E-09
PRINT (SQR(14)*USR(14))-14 returns 3.7252903E-09
PRINT (USR(14)*SQR(14))-14 returns 0
PRINT (USR(14)*USR(14))-14 returns 0

I'm not sure I can explain this behavior.

More interesting, this anomaly doesn't appear, if you assign the results to a variable first:

S=SQR(14):U=USR(14):PRINT U*S-14, S*U-14

Both expressions return 0.

But this isn't restricted to my USR() function. After some tries I came up with:

PRINT 2.345*3.456
8.10432001

PRINT 3.456*2.345
8.10432

the correct result of course being 8.10432 ...

PRINT 2.345*3.456-8.10432
3.7252903E-09

PRINT 3.456*2.345-8.10432
0

A=2.345:B=3.456:PRINT A*B,B*A
8.10432001
8.10432001
(now we have an identical, but still slightly inaccurate answer.)

Seems, as if the VIC's multiplication is not commutative, unless the operands come from memory (as opposed from FAC).

The C=64, and C=16, and C=128 show the same behaviour. Which is sad, as even though the associative, and distributive laws usually don't hold with FP arithmetic, at least the commutative law should hold on both addition and multiplication.

Michael

carlsson · Post by **carlsson** » Mon Jul 03, 2006 12:07 pm

It got to do with converting decimal strings to FAC in one way or another, but one would assume they convert in the same way always.

Mike · Post by **Mike** » Tue Jul 04, 2006 1:58 am

Further analysis:

Integer numbers can be represented exactly (as long as they're not greater than 2^32), so we can construct 2.345 and 3.456 as decimal fractions. The division of two exact integers is possibly the best way to get accurate fp numbers:

And there:

PRINT 2.345-2345/1000
9.31322575E-10
PRINT 3.456-3456/1000
0

we see, that 2.345 (opposed to 2345/1000) is presumably represented outside the required accuracy, which is the reason that we get:

A=2.345:B=3.456:PRINT A*B
8.10432001

Still this could be tolerated, but the multiplication in expressions simply is done wrong:

In continued calculations without parenthesis:

- One operand (the "running result") is always fetched from the FAC (which has an extra rounding byte),
- while the other operand is fetched from memory.

And the multiplication of two operands with two different accuracies is not commutative. If we force the running result into, and then out of memory we get a consistent result:

PRINT (2.345*1)*(3.456*1),(3.456*1)*(2.345*1)

or

PRINT (2.345+0)*(3.456+0),(3.456+0)*(2.345+0)

or

A=2.345:B=3.456:PRINT A*B (as above)

will do the trick.

Michael

P.S.: Since we're now clearly off-topic, we should continue further discussion in a new thread, and move the postings since Jun 18 to this new thread (but leaving a copy of the Jun 18 post here).

GreyGhost · Post by **GreyGhost** » Sun Dec 31, 2006 1:56 pm

Another thing to add hear. It was noted earlier that you could gain speed by defining the variables that are used most at the beginning of the program. This can be done easily with the DIM command.

0 DIM A,B,C,D,E,F,G

This line sets the variables to zero. I use A,B and so* on because they are the quickest to be looked up. You can use any Letter or Letter-Number combo you like. Just scan your program and find the variable that is used most(and in a useful spot) in your program and start the DIM statement with that variable. Then find the next most used/useful variable and so on.

Later,

*Would the variable AA come after A or is it B as the quickest to be looked up? Which is faster, AA or C?

Mike · Post by **Mike** » Sun Dec 31, 2006 2:52 pm

Hi Rob!

How quick the variables are found only depends on the order they're defined in the program, not their position in the alphabet.

Still two-letter variables, and letter-digit variables need two characters to match, for that reason they're slightly slower - comparable to as if they'd been declared one place later in the list as single-letter variables.

Greetings,

Michael

Post by **Jeff-20** » Sun Dec 31, 2006 3:06 pm

I use the same idea, but I always define the variables:

A=0, B=0, C=0, etc.

Because I thought DIMA, B, C uesd more memory. Was my assumption wrong?

GreyGhost · Post by **GreyGhost** » Sun Dec 31, 2006 6:03 pm

I just tried this:

10 DIM A,B,C,D,E,F,G

before RUN = 3562 bytes free = 19 bytes
after RUN = 3513 bytes free = 68 bytes

and:

10 A=0:B=0:C=0: D=0:E=0:F=0:G=0

before RUN = 3549 bytes free = 32 bytes
after RUN = 3500 bytes free = 81 bytes

They both take up the same amount of extra space after running(49 bytes).

Maybe this is more of a memory saver than a speed up.

Mike · Post by **Mike** » Mon Jan 01, 2007 11:58 am

This is a good place to point out, that no fiddling with variable names, or their order of declaration can rescue an otherwise bad algorithm.

A good example are sorting algorithms. Bubble-Sort, while easily implemented (I can write that routine down straight from memory) is a bad algorithm - such that it really should be forbidden to teach it to those new to programming. I just use it here, to state the point, to compare it with Quick-Sort.

Quick-Sort is more complicated to program. I confess I always have to look it up in a book, when I need to implement it myself. Luckily C has a built-in Quick-Sort.

First comes the example. Replace GOSUB 50 in line 13 with GOSUB 20 to test Bubble-Sort instead of Quick-Sort. N is specified in line 11.

Greetings,

Michael

I use the variant of Quick-Sort, that sorts the smallest sub-array first. This minimizes stack usage. DIMS(31) is enough to sort up to 65535 elements.

Quick-Sort also works with String-Arrays (use A$(), A$, and X$ instead). If you want to sort other arrays alongside A(), place the additional swap operation(s) between lines 55 and 56.

Code: Select all

10 DIMS(31):SP=0:REM ** STACK FOR QUICK-SORT
11 N=10:DIMA(N)
12 FORT=0TON:A(T)=RND(1):NEXT
13 T1=TI:L=0:R=N:GOSUB50:T2=TI
14 FORT=0TON:PRINTA(T):NEXT
15 PRINTT2-T1
16 END
18 :
19 REM ** BUBBLE-SORT
20 IFR<=LTHENRETURN
21 F=0:FORT=LTOR-1:IFA(T)>A(T+1)THENA=A(T):A(T)=A(T+1):A(T+1)=A:F=1
22 NEXT:IFFTHEN21
23 RETURN
29 :
49 REM ** QUICK-SORT
50 IFR<=LTHENRETURN
51 I=L:J=R:X=A((L+R)/2)
52 IFA(I)<XTHENI=I+1:GOTO52
53 IFA(J)>XTHENJ=J-1:GOTO53
54 IFI>JTHEN57
55 A=A(I):A(I)=A(J):A(J)=A
56 I=I+1:J=J-1
57 IFI<=JTHEN52
58 IFJ-L<R-ITHENS(SP)=I:S(SP+1)=R:SP=SP+2:R=J:GOSUB50:GOTO60
59 S(SP)=L:S(SP+1)=J:SP=SP+2:L=I:GOSUB50
60 SP=SP-2:L=S(SP):R=S(SP+1):GOTO50

You'll see, for N up to 10, they're roughly on par, Bubble-Sort may even be faster, but look, what happens for greater N:

These are the run times in clock ticks:

Code: Select all

Quick-Sort

   N  Run 1  Run 2  Run 3  Run 4  Run 5

  10     56     65     60     61     59
  30    230    221    231    211    207
 100    837    881    867    849    858
 300   3029   3169   3054   2969   2960
1000  12032  11954  11941  12553  12266


Bubble-Sort

   N  Run 1  Run 2  Run 3  Run 4  Run 5

  10     66     47     93     82     64
  30    563    501    684    633    627
 100   6268   6918   6837   7474   6622
 300  62476  57756  ... you get the idea
1000  not tried, expected time ~600000..700000 clock ticks.

wimoos · Post by **wimoos** » Mon Jan 11, 2010 2:12 am

The list of (float, int, string) variables is also interspersed with hooks to the DEF FNxx( ) functions.

More specifically, the bits 7s of the variablename are a flag for the datatype.
For a float, they both are 0. For a string, only the second is 1. For int, they both are 1. For FN, only the first is 1.

The value that is kept for the FN is a pointer to the function in the BASIC program.

So when you're defining a lot of those in the beginning of your program and hardly using them, be sure to DIM variables first.

Denial

Basic Speedup Tip

Integers

Single-letter vs. two-letter variables

Good algorithms vs. Bad algorithms

Defined functions