Fun with CBM arithmetics

Mike · Post by **Mike** » Tue Dec 31, 2013 3:30 pm

I've found the bug.

The circumstances that trigger the bug are a bit complicated, but I'll try to explain.

During multiplication, this routine is called 4 times for each byte of the mantissa of one of the factors. The current mantissa byte is contained in A, and the Z flag has been set accordingly:

Code: Select all

.DA59  D0 03     BNE $DA5E
.DA5B  4C 83 D9  JMP $D983
.DA5E  4A        LSR A
.DA5F  09 80     ORA #$80
.DA61  A8        TAY
.DA62  90 19     BCC $DA7D
.DA64  18        CLC
.DA65  A5 29     LDA $29
.DA67  65 6D     ADC $6D
.DA69  85 29     STA $29
.DA6B  A5 28     LDA $28
.DA6D  65 6C     ADC $6C
.DA6F  85 28     STA $28
.DA71  A5 27     LDA $27
.DA73  65 6B     ADC $6B
.DA75  85 27     STA $27
.DA77  A5 26     LDA $26
.DA79  65 6A     ADC $6A
.DA7B  85 26     STA $26
.DA7D  66 26     ROR $26
.DA7F  66 27     ROR $27
.DA81  66 28     ROR $28
.DA83  66 29     ROR $29
.DA85  66 70     ROR $70
.DA87  98        TYA
.DA88  4A        LSR A
.DA89  D0 D6     BNE $DA61
.DA8B  60        RTS

Now, the first two instructions, BNE $DA5E and JMP $D983 are supposed to execute a shortcut in case the mantissa byte to multiply with is 0. It is sensible to optimize for this, especially because small integer constants do contain zeroes in their lower significant parts of the mantissa.

The routine would still work without that optimization, but the branch at $DA62 would always be executed (for 8 times in total), skipping the additions to $26 .. $29 - and all the routine then does is painstakingly move the bytes $26 .. $29 over to $27 .. $29, $70, one bit at a time. Of course this can be done much faster with just four load and store instrúctions. For this another routine at $D983 is 'reused', which normally 'normalizes' the mantissa:

Code: Select all

.D983  A2 25     LDX #$25
.D985  B4 04     LDY $04,X
.D987  84 70     STY $70
.D989  B4 03     LDY $03,X
.D98B  94 04     STY $04,X
.D98D  B4 02     LDY $02,X
.D98F  94 03     STY $03,X
.D991  B4 01     LDY $01,X
.D993  94 02     STY $02,X
.D995  A4 68     LDY $68
.D997  94 01     STY $01,X
.D999  69 08     ADC #$08
.D99B  30 E8     BMI $D985
.D99D  F0 E6     BEQ $D985
.D99F  E9 08     SBC #$08
.D9A1  A8        TAY
.D9A2  A5 70     LDA $70
.D9A4  B0 14     BCS $D9BA ---+ normally,
.D9A6  16 01     ASL $01,X    | this branch
.D9A8  90 02     BCC $D9AC    | is supposed
.D9AA  F6 01     INC $01,X    | to happen.
.D9AC  76 01     ROR $01,X    |
.D9AE  76 01     ROR $01,X    |
.D9B0  76 02     ROR $02,X    |
.D9B2  76 03     ROR $03,X    |
.D9B4  76 04     ROR $04,X    |
.D9B6  6A        ROR A        |
.D9B7  C8        INY          |
.D9B8  D0 EC     BNE $D9A6    |
.D9BA  18        CLC       <--+
.D9BB  60        RTS

The instructions from $D983 to $D993 do as supposed, however already beginning at $D995, LDY $68 with STY $01,X is a little bit dubious: Without the optimization, a 0 would end up at $26 as the result of being rotated to the right 8 times. Whether there is a 0 at $68 at all times the routine is called with this purpose cannot be guaranteed. But anyway, that is not what triggers the bug in the first place.

The routine at $DA59 is first called with a non-0 byte - for the examples in the earlier posts this is ultimately the result of adding something around 1..59 divided by 10^9 to the constant 0.75. The routine exits with C=1, which will become important now!

The next mantissa byte is zero, so now the shortcut at $D983 is called. With A=0 and C=1 on entry, the two instructions ADC #$08 and SBC #$08 result in A=0 and C=1 again. At $D9A4, the instruction BCS $D9BA skips the second half of the routine, which is a good thing, however it also executes a CLC at $D9BA, and from there everything goes downhill.

The third mantissa byte is *also* zero, so now the shortcut gets called a second time. This time, however C is 0 (with A again being 0), which results in C=0 and A=255 after SBC #$08. The instructions from $D9A6 .. $D9B8 are now executed, which shift the whole mantissa at least one bit to the right!

For the remaining mantissa byte, the check for a shortcut is skipped. Whatever is finally added to the resulting mantissa, the earlier parts have been inadvertently divided by 2 before, which is exactly what can be seen as false result.

Kweepa · Post by **Kweepa** » Tue Dec 31, 2013 8:29 pm

Nice find!
The same bug is of course in the c64...

Mike · Post by **Mike** » Wed Jan 01, 2014 6:13 am

Here's my proposal for a patch:

1. It is intended to keep the optimization,
2. It must be ensured, that the routine at $D983 is always called with C=1,
3. It must be ensured, that the contents of $68 are 0 upon calling $D983.

The routine in question is buried (too) deep into the BASIC ROM, and it is not vectored.
A patch will require the replacement of the BASIC ROM.

4. DA5B JMP $D983 needs to be rerouted to a suitably 'empty' place within $C000 .. $DFFF,
5. one possible candidate might be the range $DF53 .. $DF70:

The addresses $DF53 .. $DF70 contain only $AA's and are located after two tables, which contain powers of 10 and TI conversion constants, but the conversion loop beginning at $DE66 (spanning to $DEC2) stops at Y=$24 or Y=$3C and never reaches into $DF53 .. $DF70. There are also no other references in the BASIC ROM to that address range.

[...]

Edit: I replaced the original version of the patch with a revised one (download). For the source, see a few posts further down this topic.

orion70 · Post by **orion70** » Wed Jan 01, 2014 10:36 am

Congrats on being the first to release software in 2014

Mike · Post by **Mike** » Wed Jan 01, 2014 2:13 pm

orion70 wrote:Congrats on being the first to release software in 2014

PM sent.

Kweepa · Post by **Kweepa** » Thu Jan 02, 2014 5:03 pm

Checked out the other emulators in VICE.
BASIC 2, 3.5 and 4 suffer from this bug, but it's absent in BASIC 7.

Mike · Post by **Mike** » Fri Jan 03, 2014 8:56 am

In the last two days I checked the patched BASIC ROM in two installations of VICE on my notebook.

As a picture is more worth than a thousand words, I got the idea to use my Mandelbrot zoomer for illustrating purposes:

Here's a co-ordinate set, where both x- and y-coordinates just straddle across a critical interval:

Code: Select all

x_min = -0.531250128
x_max = -0.531249872
y_min =  0.531249904
y_max =  0.531250096
1024 Iterations

Now for the results, with the original ROM to the left (download) and with the patched ROM to the right (download):

Parts of the left picture are distorted and torn, compared to the right picture.

Here's the same co-ordinate set computed on the PC, same display algorithm, just higher resolution:

Edit: refined example

Post by **Jeff-20** » Fri Jan 03, 2014 7:16 pm

I love it.

buzbard · Post by **buzbard** » Sun Jan 19, 2014 1:24 pm

Nice work Mike! Thanks.

Kweepa wrote:Nice find!
The same bug is of course in the c64...

Here's the same patch modified for the C64.

Code: Select all

10 open2,8,2,"basic,s,r"
11 open3,8,3,"basic-p,s,w"
12 fort=40960to49151
13 get#2,a$:a=asc(a$+chr$(0))
14 ift=47708thena= 83:goto22
15 ift=47709thena=191:goto22
16 ift=48979thena=133:goto22
17 ift=48980thena=104:goto22
18 ift=48981thena= 56:goto22
19 ift=48982thena= 76:goto22
20 ift=48983thena=131:goto22
21 ift=48984thena=185
22 print#3,chr$(a);
23 next
24 close3
25 close2

The VIC and the C64 have almost exactly the same BASIC ROM just located in different memory ranges.
VIC - C000-DFFF
C64 - A000-BFFF

Mike · Post by **Mike** » Sun Jan 19, 2014 4:19 pm

On the C64, it is also possible to copy the BASIC-ROM into the RAM, and patch it there. Here's a program which does exactly this:

Code: Select all

1 POKE1,PEEK(1)AND248OR7
2 FORT=40960TO53247:POKET,PEEK(T):NEXT
3 POKE47708,83:POKE47709,191
4 FORT=0TO5:READA:POKE48979+T,A:NEXT
5 POKE1,PEEK(1)AND248OR6
6 DATA133,104,56,76,131,185

Here's a revised version of the patch, which installs an own routine to handle the mantissa in the free range of $DF53 .. $DF70, and does not anymore 're-use' the normalize routine. Even if it now uses up some more bytes of that free space, it is now implemented as it should have been done in the first place. The jump at $DA5B now is altered to:

Code: Select all

.DA5B  4C 5A DF  JMP $DF5A

... and the routine at $DF5A now is:

Code: Select all

.DF5A  A5 29     LDA $29
.DF5C  85 70     STA $70
.DF5E  A5 28     LDA $28
.DF60  85 29     STA $29
.DF62  A5 27     LDA $27
.DF64  85 28     STA $28
.DF66  A5 26     LDA $26
.DF68  85 27     STA $27
.DF6A  A0 01     LDY #$01
.DF6C  98        TYA
.DF6D  4A        LSR A
.DF6E  85 26     STA $26
.DF70  60        RTS

This is, what the original shortcut at $D983 was supposed to do, only that one fails under those circumstances I already described in the earlier postings. The following program applies the new patch to a *.bin file of the BASIC ROM:

Code: Select all

10 OPEN2,8,2,"BASIC,S,R"
11 OPEN3,8,3,"BASIC-P,S,W"
12 READB,C,D
13 FORT=49152TO57343
14 GET#2,A$:A=ASC(A$+CHR$(0))
15 IFB<>TTHEN18
16 IFA<>CTHENPRINT"BAD SOURCE FILE!":GOTO20
17 A=D:READB,C,D
18 PRINT#3,CHR$(A);
19 NEXT
20 CLOSE3
21 CLOSE2
22 :
23 DATA 55900,131, 90
24 DATA 55901,217,223
25 DATA 57178,170,165
26 DATA 57179,170, 41
27 DATA 57180,170,133
28 DATA 57181,170,112
29 DATA 57182,170,165
30 DATA 57183,170, 40
31 DATA 57184,170,133
32 DATA 57185,170, 41
33 DATA 57186,170,165
34 DATA 57187,170, 39
35 DATA 57188,170,133
36 DATA 57189,170, 40
37 DATA 57190,170,165
38 DATA 57191,170, 38
39 DATA 57192,170,133
40 DATA 57193,170, 39
41 DATA 57194,170,160
42 DATA 57195,170,  1
43 DATA 57196,170,152
44 DATA 57197,170, 74
45 DATA 57198,170,133
46 DATA 57199,170, 38
47 DATA 57200,170, 96
48 DATA    -1, -1, -1

I also found some interesting reference about those early implementations of Microsoft BASIC on the 65xx. The owner of the blog reconstructed one single source code repository for CBM BASIC V1, OSI BASIC, AppleSoft I, KIM-1 BASIC, CBM BASIC V2 (PET), Intellivision Keyboard Component BASIC and MicroTAN BASIC ... and with the exception of OSI BASIC (it only uses a 3-byte mantissa, thus the error can't happen there) all share that bug in the multiplication routine ...!

Kweepa wrote:Checked out the other emulators in VICE.
BASIC 2, 3.5 and 4 suffer from this bug, but it's absent in BASIC 7.

Yes, I saw that, too. The bug is absent in V7, because Commodore effectively disabled the optimization that caused it. Of course, that means that multiplication now is generally slower on the C128.

Even *more* interesting is, that they somehow came to the same conclusions about the carry flag as I did, but they were not quite sure about how to handle it. In the BASIC ROM, the routine is located at $8A55:

Code: Select all

.8A55  D0 04     BNE $8A5B
.8A57  EA        NOP
.8A58  4C 62 89  JMP $8962
.8A5B [...]

... with $8962 again being the short-cut. At $8A57, they had possibly put a SEC before - and then decided to go the save route by disabling the optimization instead (only in one of five cases, the routine is still called over $8A55, all other 4 calls enter at $8A5B). Anyway they had more than enough space in the ROM to do it right, but alas.

Mike · Post by **Mike** » Thu Dec 14, 2017 1:29 pm

Mike wrote:A patch will require the replacement of the BASIC ROM.

In case you've been wondering what became of this, look here.

Kakemoms · Post by **Kakemoms** » Sat Feb 17, 2018 3:35 pm

I am hoping to get this patch (and others) into the SuperVixen. It will be in a copy of Basic in the external cartridge, so not a replacement ROM, but patched during copy to external RAM.

I am wondering if anyone has a list of such patches? I am thinking about a file format to get them loaded and patched during boot, so that anyone finding such problems in the future, will also be able to add their own patch.

What do you think about it?

Mike · Post by **Mike** » Sat Feb 17, 2018 5:02 pm

Regards your question of a feasible file format: you'd most probably be content with a *.prg file that executes the patch by copying the original ROM contents, doing a check whether it works on the original byte values it aims to replace, and putting the replacement bytes there. Much similar to how I proceeded with the file based patch here.

Kakemoms wrote:I am wondering if anyone ha[s] a list of such patches?

From what I know, this is probably the first - and thus far only one - direct patch of the VIC-20 BASIC ROM.

There are other known bugs in the CBM BASIC interpreter, but they are mostly irrelevant or easy to avoid: broken type checks lead to overflow of the string stack in the IF ... THEN clause and POS() function with bogus string arguments, or to crashes as with the statement PRINT 5+"A"+-5 ... then there is also a bug in the line number parse routine, which results in a warm start or crash upon entering a certain range of (invalid!) line numbers. In other cases, it is rather easy to improve the BASIC interpreter by means of a BASIC extension/wedge or USR() function - if you count the slowness of the SQR() function as "bug".

As I wrote earlier in the thread, the multiplication routine is buried too deep in the interpreter, and it is not vectored. A soft replacement would require a duplication of a good deal of ROM code of the expression parser down to the multiplication routine itself, and that fixed shortcut routine. We are talking about way more than 1K code here. And it also only helps BASIC programs: machine code programs that happen to use the arithmetic routines of the interpreter have no other choice than to call the multiply routine in the BASIC ROM directly - exactly because there is no vector pointing there - and thus will be in the dark about a soft loaded fix!

Those are the reasons I made a real job of replacing the BASIC ROM. The other alternative that came to my mind, supplying RAM under ROM, was more like taking a sledgehammer to crack a nut - at least for that single fix.

Kakemoms · Post by **Kakemoms** » Sat Feb 17, 2018 6:02 pm

Mike wrote:Those are the reasons I made a real job of replacing the BASIC ROM. The other alternative that came to my mind, supplying RAM under ROM, was more like taking a sledgehammer to crack a nut - at least for that single fix.

Well, there is no way to get the 65C02 to see the internal ROM, so it will have to be in RAM anyway. Thus a patch is straightforward..

wimoos · Post by **wimoos** » Mon Sep 24, 2018 11:29 am

Mike,

This is just to let you know about this test I did.

I have now used your most recent patch on the Basic image. Starting VICE gives: "VIC20MEM: Error - Warning: Unknown Basic image. Sum: 31364 ($7A84)."

Reverting to the original Basic image:

Regards,

Wim.

Denial

Fun with CBM arithmetics

Re: Fun with CBM arithmetics

Re: Fun with CBM arithmetics

Proposal for a patch

Re: Fun with CBM arithmetics

Re: Fun with CBM arithmetics

Re: Fun with CBM arithmetics

Re: Fun with CBM arithmetics

Re: Fun with CBM arithmetics

Re: Fun with CBM arithmetics

Re: Fun with CBM arithmetics

Re: Fun with CBM arithmetics

Re: Fun with CBM arithmetics

Re: Fun with CBM arithmetics

Re: Fun with CBM arithmetics

Re: Adding 0.1 + 0.2 in CBM float