VIC-I anomalies in emulation

Schlowski · Post by **Schlowski** » Wed Jan 18, 2006 8:12 am

Ok, now I tested Mike's program here in emulation with the same RAM expansions and addresses as in my previous test, and the result is always 752, regardless of expansion and address.

So can somebody now explain the behaviour of my Basic test program regarding address 2000 in the 3K expansion RAM? Why is Basic slower when poking to 2000 than to 7000?

Björg

carlsson · Post by **carlsson** » Wed Jan 18, 2006 8:49 am

A wild shot: while the VIC-I chip cannot use the 3K expansion memory, it still has access to the 16K address range that memory lies within. Unlike the C64 with its bad lines and all that stuff, bus collisions are not supposed to happen (?) on the VIC-20, but maybe something is going on?

I just ran Mike's latest program on a real PAL VIC-20, using various memory configurations and address ranges, and I always get 752, just like you get in emulation. I agree with Mike that the differences you got from the Basic program were probably overhead from evaluating numeric expressions (ADR=7000).

Here is my take on a Basic program:

```basic
1 S=TI:FORI=0TO255:FORT=7680TO7690:POKET,I:NEXT:NEXT:T=TI
2 PRINT"TIME:";T-S
```

```text
Memory | Addr  | Ticks
unexp  |  6680 | 649
unexp  |  7680 | 651
+3K    |  2680 | 647
+3K    |  3680 | 648
+3K    |  6680 | 649
+3K    |  7680 | 651
+16K   |  6680 | 649
+16K   |  7680 | 652
+16K   |  8680 | 651
+16K   | 12680 | 684
+16K   | 17680 | 684
+16K   | 23680 | 679
```

Since the ML program ran exactly as many ticks regardless of location and expansion, I attribute the different values to how difficult it is for Basic to translate the integers into MFLPT.
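For readers unfamiliar with MFLPT: it is the 5-byte floating-point format Commodore BASIC uses for numeric values. It consists of one exponent byte biased by 128, followed by a 4-byte mantissa whose always-set top bit is replaced by the sign. The Python sketch below uses a hypothetical helper, `to_mflpt`, to encode an integer that way, which makes it easy to compare how constants such as 2000 and 7000 look internally. It only illustrates the storage format. It does not model how long the ROM routines take.

```python
def to_mflpt(n: int) -> bytes:
    """Encode an integer (|n| < 2**32) as a 5-byte Commodore MFLPT value.

    Layout: byte 0 = exponent + 128, bytes 1-4 = mantissa normalised to
    0.1xxx... (binary). The always-set leading mantissa bit is replaced
    by the sign bit (0 = positive, 1 = negative).
    """
    if n == 0:
        return bytes(5)  # zero is stored with an exponent byte of 0
    magnitude = abs(n)
    exponent = magnitude.bit_length()  # value = 0.1xxx(binary) * 2**exponent
    if exponent > 32:
        raise ValueError("this sketch only handles |n| < 2**32 exactly")
    mantissa = (magnitude << (32 - exponent)).to_bytes(4, "big")
    sign = 0x80 if n < 0 else 0x00
    first = (mantissa[0] & 0x7F) | sign  # hidden leading 1 -> sign bit
    return bytes([0x80 + exponent, first]) + mantissa[1:]


if __name__ == "__main__":
    for value in (1, 2000, 2100, 2200, 7000):
        print(f"{value:5d} -> {to_mflpt(value).hex(' ').upper()}")
```

With this encoding, 2000 becomes 8B 7A 00 00 00 and 7000 becomes 8D 5A C0 00 00.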

Schlowski · Post by **Schlowski** » Wed Jan 18, 2006 9:00 am

That idea with the floating-point conversion struck me a few minutes ago, but then I thought it couldn't be that simple numbers have such an impact.

But when we look at the results of Mike's program and mine, it suddenly seems plausible...

I will run some tests without poking and come back soon.

Björg

carlsson · Post by **carlsson** » Wed Jan 18, 2006 9:01 am

I even entered Schlowski's original program into my PAL VIC-20 (original model). For some reason, it runs faster than on Schlowski's VIC and gives exactly the same values as the VICE emulator. I believe Boray found in his QBench tests that the VIC-20 and VIC-20CR perform slightly differently, even with the same video standard. When it comes to emulation speed, it seems I have little to complain about, although there certainly are timing issues.

```basic
10 ADR=7000
20 T1=TI
30 FORI=0TO255
40 FORT=0TO9
50 POKEADR+T,I
60 NEXT
70 NEXT
80 T1=TI-T1
90 PRINT"TIME:"T1
```

ScV = Schlowski's VIC-20
CaV = Carlsson's VIC-20

```text
Mem | Addr | Emu | ScV | CaV
+3K | 7000 | 688 | 693 | 688
+3K | 2000 | 698 | 703 | 698
+3K | 2100 | ??? | ??? | 688
+3K | 2200 | ??? | ??? | 689
```

I would say that you found an integer (2000) which takes a lot of time to evaluate into floating-point. By moving 100 bytes forward, the program runs exactly as fast in expanded memory as in default memory.

Schlowski · Post by **Schlowski** » Wed Jan 18, 2006 9:07 am

This got really funny...

```basic
10 T1=TI
20 ADR=7000
30 FORI=0TO255
40 FORT=0TO9
50 L=ADR+T
60 NEXT
70 NEXT
80 T1=TI-T1
90 PRINT"TIME:"T1
```

With ADR=7000 571 ticks,
With ADR=2000 567/568 ticks

so here 2000 is faster than 7000...

What have I learned so far? Always know what you are trying to time and have a keen eye on the little differences!

Now I have to find out why both of my VICs are slower than Carlsson's.
I have to check with my third one, because the first two are two-prong models and the third one has the DIN power connector like the C-64. Just another little difference.

Björg

Schlowski · Post by **Schlowski** » Wed Jan 18, 2006 9:09 am

Just to mention: something else must be going on with these numbers computation-wise, since the conversion from the string "2000" to the floating-point value 2000 is done only once, at program start, and not in the loop. And that should not take 10/60 of a second, even on our beloved VIC...
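Schlowski's point can be checked with a little arithmetic. The Python sketch below assumes the PAL VIC-20 clock of 1,108,405 Hz and a TI jiffy clock of about 60 Hz. Under those assumptions it converts a tick difference into extra CPU cycles per loop pass. The 10-tick gap between ADR=2000 and ADR=7000 in the earlier table (698 vs. 688 in emulation) is spread over 256 × 10 = 2560 POKEs, which works out to roughly 72 cycles per pass. So the cost is paid inside the loop rather than once at program start.

```python
PAL_CPU_HZ = 1_108_405  # PAL VIC-20 CPU clock (assumed)
JIFFY_HZ = 60           # TI advances about 60 times per second (assumed)


def extra_cycles_per_pass(tick_difference: float, passes: int,
                          cpu_hz: float = PAL_CPU_HZ,
                          jiffy_hz: float = JIFFY_HZ) -> float:
    """Convert a difference in TI ticks into CPU cycles per loop pass."""
    seconds = tick_difference / jiffy_hz
    return seconds * cpu_hz / passes


if __name__ == "__main__":
    passes = 256 * 10   # FOR I=0 TO 255, FOR T=0 TO 9
    delta = 698 - 688   # ADR=2000 vs. ADR=7000, emulator column
    print(f"{extra_cycles_per_pass(delta, passes):.1f} extra cycles per POKE")
```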

carlsson · Post by **carlsson** » Wed Jan 18, 2006 9:22 am

Mine is two-prong. Maybe my PSU delivers a slightly higher voltage?

Your latest program on my VIC, still using 3K memory expansion:

ADR=2000, CaV=563 ticks, CaE=568 ticks
ADR=2200, CaV=560 ticks, CaE=565 ticks
ADR=7000, CaV=566 ticks, CaE=570 ticks

(CaE = VICE 1.16, the version I'm still using of the emulator)

I notice that you read the timer before assigning ADR, which will slow it down a small notch, but nothing substantial.

Another benchmark program, replace line 50:

```basic
50 IFADR+TTHEN
```

```text
Mem | Addr | CaE | CaV
+3K | 7000 | 510 | 510
+3K | 2000 | 507 | 507
+3K | 2100 | 505 | 505
+3K | 2200 | 505 | 505
```

So POKE and IF appear to run at the same speed on my VIC and in the emulator, while an implicit LET (storing a variable) is slightly slower in the emulator than on real hardware. Somewhat pointless benchmarks, but useful if we want to nail down exactly where emulation falls short... I still think the VIC-I anomalies should have higher priority, though.

Schlowski · Post by **Schlowski** » Wed Jan 18, 2006 9:44 am

I totally agree with you, VIC-I anomalies should have higher priority.

As we can see, even real VICs differ in their timings, so different timings in emulation are just like another VIC model.

Björg

Boray · Post by **Boray** » Wed Jan 18, 2006 11:05 am

carlsson wrote: I believe Boray in his QBench tests found that VIC-20 and VIC-20CR perform slightly different, even in the same video standard.

No, I think they perform the same, but you can get slightly different values from TI measurements from time to time on the same machine. Try your test a couple of times on the same machine and you will see. This must have to do with interrupts and the exact moment when you start the routine.

/Anders
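Boray's explanation can be illustrated with a simple model. TI only advances on whole jiffies, so a run that lasts a fractional number of jiffies reads as either the rounded-down or the rounded-up count. Which one you get depends on how far into the current jiffy the first TI read happens. The Python sketch below uses a hypothetical function, `measured_ticks`. It ignores interrupt overhead and models only this quantisation.

```python
import math


def measured_ticks(duration_jiffies: float, start_phase: float) -> int:
    """Ticks reported by TI for a run lasting `duration_jiffies`.

    start_phase is the fraction (0 <= phase < 1) of the current jiffy
    that has already elapsed when S=TI is executed.
    """
    if not 0.0 <= start_phase < 1.0:
        raise ValueError("start_phase must be in [0, 1)")
    return math.floor(start_phase + duration_jiffies)


if __name__ == "__main__":
    # A run lasting 688.4 jiffies reads as 688 or 689 depending on phase.
    print(sorted({measured_ticks(688.4, p / 10) for p in range(10)}))
```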

Boray · Post by **Boray** » Wed Jan 18, 2006 11:10 am

Also where on the screen you are when you start.

Mike · Post by **Mike** » Wed Jan 18, 2006 11:28 am

O.K. back to the VIC anomalies:

I've seen in your screenshot that the VIC chip can't access the colour RAM as a character set (and possibly not as a screen either). So this seems like unconnected space to the VIC chip.

But on the other hand, it seems that the left half of each character is (more or less?) static, and moreover that it contains the top four bits of the character code (mostly SPACE, i.e. 32 -> '0010').

```basic
1 FORT=0TO505:POKE4096+T,0:POKE37888+T,0:NEXT
2 POKE36869,5*16+12
3 FORY=0TO15:FORX=-4TO3
4 POKE4096+11+X+22*Y,16*Y
5 NEXT:NEXT
```

This program writes 8 columns of the characters 0, 16, 32, ... in the middle of the screen. I'd expect a binary count to run down the screen, like this:

```text
0000junk0000junk0000junk0000junk0000junk0000junk0000junk0000junk
repeat for 8 scan lines...
0001junk0001junk...
repeat for another 8 scan lines
...
and so on up to
1111junk1111junk...
```
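Mike's expectation can be written down as a small generator. This Python sketch uses a hypothetical function, `expected_rows`. Like Mike, it assumes the left four pixels of each character show the top four bits of its code, and it treats the right half as unknown ("junk"). It produces one text row per character row rather than per scan line.

```python
def expected_rows(columns: int = 8, rows: int = 16) -> list:
    """One string per character row: top nibble of the code + 'junk'."""
    result = []
    for y in range(rows):
        code = 16 * y                          # codes 0, 16, 32, ... as POKEd
        high_nibble = format(code >> 4, "04b")
        result.append((high_nibble + "junk") * columns)
    return result


if __name__ == "__main__":
    for line in expected_rows():
        print(line)  # each would be repeated for 8 scan lines on screen
```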

If my assumption holds, then what the VIC shows as character data could be remnants of the character code from the preceding cycle on the bus, provided the 6502 didn't 'discharge' it (hence the 'more or less' static). The sequence would be:

Bus clock low: VIC chip reads character code from screen.
Bus clock high: 6502 doesn't access data bus (for whatever reason) -> data bus 'retains' level because the lines have a certain capacity
Bus clock low: VIC chip tries to retrieve character data from unconnected space, and reads the data it already read 1µs before ...
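The three steps above can be expressed as a toy model. This Python sketch is a hypothetical simplification. It assumes the bus keeps all eight bits of the last value driven onto it, whereas Mike only expects the upper bits to survive reliably. It does not model real VIC/6502 timing.

```python
from typing import List, Optional, Sequence


def char_data_seen(screen_codes: Sequence[int],
                   cpu_bytes: Sequence[Optional[int]]) -> List[int]:
    """Byte the VIC gets when it fetches character data from unconnected space.

    For each character cell:
      1. the VIC reads the screen code, which leaves that value on the bus;
      2. the 6502 either drives a byte (an int) or leaves the bus alone (None);
      3. the VIC's character fetch hits nothing and sees the retained level.
    """
    seen = []
    for code, cpu in zip(screen_codes, cpu_bytes):
        bus = code & 0xFF          # step 1
        if cpu is not None:
            bus = cpu & 0xFF       # step 2: the CPU overwrites the bus level
        seen.append(bus)           # step 3
    return seen


if __name__ == "__main__":
    codes = [32, 48, 64]
    print(char_data_seen(codes, [None, None, None]))  # echoes the codes
    print(char_data_seen(codes, [None, 0xA9, None]))  # a CPU byte leaks in
```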

Anders, could you test that? And make another screenshot?

Greetings,

Michael

carlsson · Post by **carlsson** » Wed Jan 18, 2006 5:08 pm

What I didn't point out about my previous screenshot is that it is not a static display. The garbage scrolls around, because when the screen or character set is placed in unconnected memory, the latest value on the bus is what appears on screen.

Even in this case, the display flashes around. Here is an example:

POKE 36869,5*16+12

POKE 36869,5+16*12

I tried to come up with an example that gives a static display without halting the computer. Viznut used SYS2 which points to a JAM instruction and freezes the computer. Not so practical if you want to get some use from these "video modes". My theory was that by disabling interrupts and constantly loading a value into the accumulator, a static display would be achieved.

```asm
1C00 LDA #$5C      ; value for VIC register $9005
1C02 STA $9005
1C05 SEI           ; disable interrupts
1C06 LDA #$01      ; keep loading a constant into A...
1C08 JMP $1C06     ; ...forever
```

```basic
1 DATA169,92,141,5,144,120,169,1,76,6,28,-1:A=7168
2 READB:IFB>-1THENPOKEA,B:A=A+1:GOTO2
3 SYS7168
4 REM FORA=0TO505:POKE4096+A,0:POKE37888+A,0:NEXT
```

Not quite static, but we're on our way. I believe the JMP causes data to be put out on the bus. If someone can come up with a code snippet of a way to put out a static picture without freezing the computer, I'd be interested. Maybe my hacking skills are too poor to know how to do it.
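The DATA statement in line 1 is simply the machine code above, byte by byte. As a cross-check, here is a tiny Python sketch that assembles only the four instruction forms used in the routine (its opcode table covers nothing else) and rebuilds that DATA line. The helper names are hypothetical.

```python
OPCODES = {
    ("LDA", "imm"): 0xA9,  # LDA #nn
    ("STA", "abs"): 0x8D,  # STA nnnn
    ("SEI", "imp"): 0x78,  # SEI
    ("JMP", "abs"): 0x4C,  # JMP nnnn
}

PROGRAM = [                # carlsson's routine, assembled at $1C00
    ("LDA", "imm", 0x5C),
    ("STA", "abs", 0x9005),
    ("SEI", "imp", None),
    ("LDA", "imm", 0x01),
    ("JMP", "abs", 0x1C06),
]


def assemble(program):
    """Return the machine-code bytes for a list of (mnemonic, mode, operand)."""
    out = []
    for mnemonic, mode, operand in program:
        out.append(OPCODES[(mnemonic, mode)])
        if mode == "imm":
            out.append(operand & 0xFF)
        elif mode == "abs":
            out += [operand & 0xFF, operand >> 8]  # 6502 is little-endian
    return out


def data_line(code, load_address=7168):
    """Format the bytes like line 1 of the BASIC loader (-1 ends the list)."""
    return "1 DATA" + ",".join(str(b) for b in code) + f",-1:A={load_address}"


if __name__ == "__main__":
    print(data_line(assemble(PROGRAM)))
```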

Mike · Post by **Mike** » Thu Jan 19, 2006 1:49 am

There are 71 cycles per line on a PAL VIC 20. You could try with:

```asm
     LDA #$55       ; point $9005 to
     STA $9005      ; completely unconnected space
     SEI            ; disable interrupts
loop LDA #$00       ; 2 cycles
     LDA #$01
     ...
     LDA #$xx       ; altogether 34 LDA instructions
     JMP loop       ; 3 cycles
```

and 34*2+3=71. The loop shouldn't cross a 256 byte page, since this adds in another cycle.
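Mike's cycle budget can be written out as a small Python sketch. It uses the standard 6502 timings of 2 cycles for LDA #imm and 3 for JMP absolute, and 71 cycles per PAL line as Mike states. As an additional assumption not made in the thread, it uses 65 cycles per line for NTSC machines. `page_crossed` simply checks Mike's condition that the loop stays inside one 256-byte page. The function names are hypothetical.

```python
LDA_IMM_CYCLES = 2
JMP_ABS_CYCLES = 3


def lda_count(cycles_per_line: int) -> int:
    """Number of LDA #imm needed so that the LDAs plus one JMP fill one line."""
    filler = cycles_per_line - JMP_ABS_CYCLES
    if filler < 0 or filler % LDA_IMM_CYCLES:
        raise ValueError("cannot fill this line length with 2-cycle LDAs")
    return filler // LDA_IMM_CYCLES


def page_crossed(start: int, n_lda: int) -> bool:
    """True if n_lda LDA #imm (2 bytes each) plus a JMP (3 bytes) span two pages."""
    end = start + 2 * n_lda + 3 - 1  # address of the last byte of the JMP
    return (start >> 8) != (end >> 8)


if __name__ == "__main__":
    n = lda_count(71)  # PAL
    print(n, n * LDA_IMM_CYCLES + JMP_ABS_CYCLES)
    print(page_crossed(0x1C06, n), page_crossed(0x1CF0, n))
```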

Now, when reading character data, the VIC chip will lock onto either the $A9 of the LDA opcode or its immediate data. In most cases the JMP instruction will happen in the middle of a scan line, so you'll see a transition there (odd number of cycles!).

If this works, we need a way to ensure that:
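The reason the JMP produces a visible seam can be seen from the bytes the 6502 fetches, one per cycle. Each LDA #imm puts $A9 and then its operand on the bus, and the JMP puts $4C and two address bytes there. Because the JMP takes an odd number of cycles, the $A9 fetches land on even cycles on one side of it and on odd cycles on the other. This Python sketch, with hypothetical helper names, shows that for one 71-cycle line that starts mid-loop. It says nothing about which of those cycles the VIC actually samples.

```python
def loop_bus_bytes(immediates, loop_address):
    """Bytes on the data bus, one per cycle, for one pass of Mike's loop."""
    seq = []
    for value in immediates:
        seq += [0xA9, value & 0xFF]                        # LDA #value
    seq += [0x4C, loop_address & 0xFF, loop_address >> 8]  # JMP loop
    return seq


def a9_parity_around_jmp(seq, start):
    """Parities of the $A9 cycles before and after the JMP within one line.

    The raster line is assumed to begin `start` cycles into the loop.
    """
    line = seq[start:] + seq[:start]
    jmp_at = line.index(0x4C)
    before = {i % 2 for i in range(jmp_at) if line[i] == 0xA9}
    after = {i % 2 for i in range(jmp_at + 3, len(line)) if line[i] == 0xA9}
    return before, after


if __name__ == "__main__":
    # Operands alternate 0/1 so they never look like $A9 or $4C.
    seq = loop_bus_bytes([i & 1 for i in range(34)], 0x1C10)
    print(len(seq), a9_parity_around_jmp(seq, 30))
```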

- the JMP instruction is executed in the border,
- the VIC chip locks onto $A9 while reading the "screen" and
- onto the immediate data while reading the "character data"

If this even works while the program is in expansion RAM, we could provide the VIC chip with different 'loops' for each scanline (the JMP then simply points to the next instruction), and also sync vertically.
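Here is a rough size estimate for that idea, as a Python sketch. Every cycle of such code fetches exactly one byte, so one scan line of straight-line code costs as many bytes as it has cycles. The frame height of 312 lines for PAL is an assumption here, since it is not stated in the thread. The sketch also ignores any setup or vertical-sync code. The names are hypothetical.

```python
CYCLES_PER_LINE_PAL = 71
LINES_PER_FRAME_PAL = 312  # assumed PAL frame height


def bytes_per_line(n_lda: int) -> int:
    """n_lda x LDA #imm (2 bytes each) + JMP to the next line's code (3 bytes)."""
    return 2 * n_lda + 3


def frame_code_size(lines: int, n_lda: int = 34) -> int:
    """Bytes of straight-line code if every scan line gets its own block."""
    return lines * bytes_per_line(n_lda)


if __name__ == "__main__":
    print(bytes_per_line(34), frame_code_size(LINES_PER_FRAME_PAL))
```

Under these assumptions a full frame needs 22,152 bytes of code, which is more than the 16K expansion used earlier in the thread.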

That would give us hi-res graphics over the whole screen.

I hope I didn't promise too much.

Michael

Schlowski · Post by **Schlowski** » Thu Jan 19, 2006 2:42 am

Wow, this sounds very interesting.
I never really got into timing things like rasters and the like, but I always found them very addictive.
For sure I will follow these discussions and see what I can learn here!

Björg

carlsson · Post by **carlsson** » Thu Jan 19, 2006 9:51 am

Mike wrote: That would give us hi-res graphics over the whole screen.

You mean we could remove borders and fill the whole visible area with hires graphics? A "standard" hires screen of e.g. 200x160 or any other format has already been done with conventional methods as you know.

But it is a good point you made about 71 cycles. I think Marko's timing routines will be easy to use to synchronize the timers to raster, to know that the loop starts on a new raster line. Once there, I'll bang away some code - it doesn't have to be LDA all the way - and report if I make any progress. At last an example of cross-developing where testing immediately requires a real computer.