The circumstances that trigger the bug are a bit complicated, but I'll try to explain.
During multiplication, this routine is called four times, once for each byte of the mantissa of one of the factors. On entry, A holds the current mantissa byte and the Z flag has been set accordingly:
.DA59 D0 03 BNE $DA5E
.DA5B 4C 83 D9 JMP $D983
.DA5E 4A LSR A
.DA5F 09 80 ORA #$80
.DA61 A8 TAY
.DA62 90 19 BCC $DA7D
.DA64 18 CLC
.DA65 A5 29 LDA $29
.DA67 65 6D ADC $6D
.DA69 85 29 STA $29
.DA6B A5 28 LDA $28
.DA6D 65 6C ADC $6C
.DA6F 85 28 STA $28
.DA71 A5 27 LDA $27
.DA73 65 6B ADC $6B
.DA75 85 27 STA $27
.DA77 A5 26 LDA $26
.DA79 65 6A ADC $6A
.DA7B 85 26 STA $26
.DA7D 66 26 ROR $26
.DA7F 66 27 ROR $27
.DA81 66 28 ROR $28
.DA83 66 29 ROR $29
.DA85 66 70 ROR $70
.DA87 98 TYA
.DA88 4A LSR A
.DA89 D0 D6 BNE $DA61
.DA8B 60 RTS
The first two instructions are an optimization for a zero byte. The routine would still work without it, but for a zero byte the branch at $DA62 would then be taken on every pass (8 times in total), skipping the additions to $26 .. $29. All the routine then does is painstakingly move the bytes $26 .. $29 over to $27 .. $29, $70, one bit at a time. Of course this can be done much faster with just four load and store instructions. For this, another routine at $D983 is 'reused', which normally 'normalizes' the mantissa:
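To make the equivalence concrete, here is a small Python model (not part of the ROM, just a sketch) of the slow path for a zero multiplier byte, compared against a plain byte move. The list `cells` stands for $26, $27, $28, $29, $70. The function names and the `overflow` parameter (standing in for location $68, assumed to be 0) are made up for illustration.

```python
def ror_chain(cells, carry):
    """One pass of ROR $26, $27, $28, $29, $70: rotate right through carry."""
    out = []
    for value in cells:
        out.append((carry << 7) | (value >> 1))
        carry = value & 1
    return out, carry


def slow_path_zero_byte(cells):
    """Model $DA5E..$DA8B for a multiplier byte of 0 (the add is always skipped)."""
    a = 0x80                     # LSR A / ORA #$80 on a zero byte: only the sentinel bit
    carry = 0                    # LSR A shifted a 0 bit into C
    passes = 0
    while True:
        # BCC $DA7D is taken because C = 0, so nothing is added to $26..$29
        cells, _ = ror_chain(cells, carry)
        passes += 1
        carry = a & 1            # TYA / LSR A: bit 0 goes to C
        a >>= 1
        if a == 0:               # BNE $DA61 falls through once the sentinel is shifted out
            return cells, carry, passes


def byte_move(cells, overflow=0):
    """Model the byte shuffle at $D983..$D997; `overflow` stands for location $68."""
    return [overflow] + cells[:4]
```

With `cells = [0x12, 0x34, 0x56, 0x78, 0x9A]`, both `slow_path_zero_byte(cells)[0]` and `byte_move(cells)` give `[0x00, 0x12, 0x34, 0x56, 0x78]`. The model also shows the slow path leaving C=1 after its eighth pass, because the sentinel bit is the last thing shifted out of A.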
.D983 A2 25 LDX #$25
.D985 B4 04 LDY $04,X
.D987 84 70 STY $70
.D989 B4 03 LDY $03,X
.D98B 94 04 STY $04,X
.D98D B4 02 LDY $02,X
.D98F 94 03 STY $03,X
.D991 B4 01 LDY $01,X
.D993 94 02 STY $02,X
.D995 A4 68 LDY $68
.D997 94 01 STY $01,X
.D999 69 08 ADC #$08
.D99B 30 E8 BMI $D985
.D99D F0 E6 BEQ $D985
.D99F E9 08 SBC #$08
.D9A1 A8 TAY
.D9A2 A5 70 LDA $70
.D9A4 B0 14 BCS $D9BA ---+ normally,
.D9A6 16 01 ASL $01,X | this branch
.D9A8 90 02 BCC $D9AC | is supposed
.D9AA F6 01 INC $01,X | to happen.
.D9AC 76 01 ROR $01,X |
.D9AE 76 01 ROR $01,X |
.D9B0 76 02 ROR $02,X |
.D9B2 76 03 ROR $03,X |
.D9B4 76 04 ROR $04,X |
.D9B6 6A ROR A |
.D9B7 C8 INY |
.D9B8 D0 EC BNE $D9A6 |
.D9BA 18 CLC <--+
.D9BB 60 RTS
The routine at $DA59 is first called with a non-0 byte - for the examples in the earlier posts this is ultimately the result of adding something around 1..59 divided by 10^9 to the constant 0.75. The routine exits with C=1, which will become important now!
The next mantissa byte is zero, so now the shortcut at $D983 is called. With A=0 and C=1 on entry, ADC #$08 gives A=9 and C=0, and SBC #$08 then gives A=0 and C=1 again. At $D9A4, the instruction BCS $D9BA skips the second half of the routine, which is a good thing. However, the routine then also executes the CLC at $D9BA, and from there everything goes downhill.
The third mantissa byte is *also* zero, so now the shortcut gets called a second time. This time, however, C is 0 (with A again being 0), which results in A=255 and C=0 after SBC #$08. The BCS at $D9A4 is not taken, so the instructions from $D9A6 .. $D9B8 are now executed. With Y=255 the loop runs once, which shifts the whole mantissa one bit to the right!
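The carry arithmetic in the two shortcut calls is easy to get wrong by hand, so here is a minimal Python sketch of ADC and SBC in binary mode, applied to the pair at $D999 .. $D99F. The helper names are invented for this example, and the model only covers the case where the BMI and BEQ in between are not taken.

```python
def adc(a, operand, carry):
    """6502 ADC in binary mode: returns (A, C)."""
    total = a + operand + carry
    return total & 0xFF, int(total > 0xFF)


def sbc(a, operand, carry):
    """6502 SBC in binary mode: subtracts the operand and the inverted carry."""
    diff = a - operand - (1 - carry)
    return diff & 0xFF, int(diff >= 0)


def shortcut_entry(a, carry):
    """Follow ADC #$08 / SBC #$08 for a small non-negative A on entry."""
    a, carry = adc(a, 0x08, carry)
    # BMI and BEQ are not taken here because A is now 8 or 9
    a, carry = sbc(a, 0x08, carry)
    return a, carry
```

`shortcut_entry(0, 1)` returns `(0, 1)`, so BCS $D9BA is taken. `shortcut_entry(0, 0)` returns `(255, 0)`, so the branch is not taken and the shift loop runs with Y=255.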
For the remaining mantissa byte, the check for a shortcut is skipped. Whatever is finally added to the resulting mantissa, the earlier parts have already been inadvertently divided by 2, which is exactly what shows up as the false result.
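The halving can be reproduced with a rough Python model of the loop at $D9A6 .. $D9B8. This is a sketch of my reading of the listing, not ROM code: `m` stands for the four bytes $26 .. $29, `a` for the accumulator (loaded from $70), and `y` for the loop counter in Y. The ASL / INC / ROR / ROR sequence on $26 is modelled as an arithmetic shift right, which is what it works out to.

```python
def shift_loop(m, a, y):
    """Model $D9A6..$D9B8: m is [$26, $27, $28, $29], a is the accumulator."""
    m = list(m)
    passes = 0
    while True:
        # ASL / INC / ROR / ROR on $26: shift right, keeping bit 7 as it was
        carry = m[0] & 1
        m[0] = (m[0] & 0x80) | (m[0] >> 1)
        # ROR $27, $28, $29 and ROR A pass the carry down the chain
        for i in (1, 2, 3):
            m[i], carry = (carry << 7) | (m[i] >> 1), m[i] & 1
        a, carry = (carry << 7) | (a >> 1), a & 1
        passes += 1
        y = (y + 1) & 0xFF       # INY
        if y == 0:               # BNE $D9A6 falls through when Y wraps to 0
            return m, a, passes


def as_int(m):
    """Read the four mantissa bytes as one big-endian number."""
    return int.from_bytes(bytes(m), "big")
```

Starting from `m = [0x00, 0xC0, 0x00, 0x02]` and `y = 0xFF`, the loop makes a single pass and leaves `[0x00, 0x60, 0x00, 0x01]`, which is the original value divided by 2.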