VIC-I anomalies in emulation
OK, now I tested Mike's program here in emulation with the same RAM expansions and addresses as in my previous test, and the result is always 752, regardless of expansion and address.
So now, can somebody explain the behaviour of my BASIC test program with address 2000 in the 3K expansion RAM? Why is BASIC slower when POKEing to 2000 than to 7000?
Björg
A wild shot: while the VIC-I chip cannot use the 3K expansion memory, it still has access to the 16K address range that memory lies within. Unlike the C64 with its bad lines and all that stuff, bus collisions are not supposed to happen (?) on the VIC-20, but maybe something is going on?
I just ran Mike's latest program on a real PAL VIC-20, using various memory configurations and address ranges, and I always get 752, just like you get in emulation. I agree with Mike that the differences you got from the Basic program were probably overhead from evaluating numeric expressions (ADR=7000).
Since the ML program ran for exactly the same number of ticks regardless of location and expansion, I attribute the different values to how much work it takes Basic to convert the integers into MFLPT, its 5-byte floating-point format.
Here is my take on a Basic program:
Code: Select all
1 S=TI:FORI=0TO255:FORT=7680TO7690:POKET,I:NEXT:NEXT:T=TI
2 PRINT"TIME:";T-S
Code: Select all
Mem   | Addr  | Ticks
unexp |  6680 | 649
unexp |  7680 | 651
+3K   |  2680 | 647
+3K   |  3680 | 648
+3K   |  6680 | 649
+3K   |  7680 | 651
+16K  |  6680 | 649
+16K  |  7680 | 652
+16K  |  8680 | 651
+16K  | 12680 | 684
+16K  | 17680 | 684
+16K  | 23680 | 679
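Commodore BASIC stores numbers in the 5-byte MFLPT format: an exponent byte with a bias of 128, followed by a 4-byte normalised mantissa whose always-set top bit is replaced by the sign bit. A minimal Python sketch of that encoding for non-negative integers (the helper name `to_mflpt` is hypothetical) shows what the addresses used in these tests look like internally. It only illustrates the storage format; it says nothing about how many cycles the ROM routines spend on each value.

```python
def to_mflpt(n: int) -> bytes:
    """Encode a non-negative integer (< 2**32) in Commodore's 5-byte MFLPT format.

    Byte 0: exponent = 128 + number of significant bits (0 encodes the value 0).
    Bytes 1-4: mantissa normalised so its leading 1 sits in bit 31; that bit is
    then replaced by the sign bit (0 = positive).
    """
    if n < 0 or n >= 2**32:
        raise ValueError("this sketch only handles 0 <= n < 2**32")
    if n == 0:
        return bytes(5)                  # zero is stored with exponent byte 0
    bits = n.bit_length()
    mantissa = n << (32 - bits)          # normalise: leading 1 moves to bit 31
    mantissa &= 0x7FFFFFFF               # replace the implied 1 with sign bit 0
    return bytes([128 + bits]) + mantissa.to_bytes(4, "big")


for value in (2000, 2100, 7000, 7680):
    print(value, " ".join(f"${b:02X}" for b in to_mflpt(value)))
```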
Anders Carlsson
That idea about the floating-point conversion struck me a few minutes ago, but then I thought such simple numbers couldn't have that much impact.
But looking at the results of Mike's program and mine, it suddenly seems plausible...
I will run some tests without POKE and report back soon.
Björg
I even typed Schlowski's original program into my original PAL VIC-20. For some reason, it runs faster than Schlowski's VIC, giving exactly the same values as the VICE emulator. I believe Boray found in his QBench tests that the VIC-20 and VIC-20CR perform slightly differently, even with the same video standard. When it comes to emulation speed, it seems I have little to complain about, although there certainly are timing issues.
Code: Select all
10 ADR=7000
20 T1=TI
30 FORI=0TO255
40 FORT=0TO9
50 POKEADR+T,I
60 NEXT
70 NEXT
80 T1=TI-T1
90 PRINT"TIME:"T1
Emu = VICE emulator
ScV = Schlowski's VIC-20
CaV = Carlsson's VIC-20
Code: Select all
Mem | Addr | Emu | ScV | CaV
+3K | 7000 | 688 | 693 | 688
+3K | 2000 | 698 | 703 | 698
+3K | 2100 | ??? | ??? | 688
+3K | 2200 | ??? | ??? | 689
I would say that you found an integer (2000) which takes a lot of time to evaluate into floating point. Moving 100 bytes forward (ADR=2100), the program runs exactly as fast in expanded memory as in default memory.
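To put the 10-tick gap between ADR=2000 and ADR=7000 (698 vs 688 on CaV) into perspective, it can be converted into CPU cycles per pass of the inner loop. A minimal Python sketch, assuming TI advances about 60 times per second and a PAL CPU clock of 1,108,405 Hz; the program executes 256 × 10 = 2560 POKEs.

```python
PAL_CLOCK_HZ = 1_108_405   # assumed PAL VIC-20 CPU clock
JIFFIES_PER_SECOND = 60    # assumed TI rate
ITERATIONS = 256 * 10      # FOR I=0 TO 255 times FOR T=0 TO 9


def cycles_per_iteration(ticks: float, iterations: int = ITERATIONS) -> float:
    """Convert a TI tick count (or a difference of counts) into CPU cycles per inner-loop pass."""
    seconds = ticks / JIFFIES_PER_SECOND
    return seconds * PAL_CLOCK_HZ / iterations


total = cycles_per_iteration(688)         # ADR=7000, CaV column
extra = cycles_per_iteration(698 - 688)   # ADR=2000 minus ADR=7000
print(f"{total:.0f} cycles per pass, about {extra:.0f} of them extra at ADR=2000")
```

Under these assumptions a pass costs roughly 4,965 cycles, and the ADR=2000 case adds only about 72 of them.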
Anders Carlsson
This got really funny...
Code: Select all
10 T1=TI
20 ADR=7000
30 FORI=0TO255
40 FORT=0TO9
50 L=ADR+T
60 NEXT
70 NEXT
80 T1=TI-T1
90 PRINT"TIME:"T1
With ADR=7000: 571 ticks
With ADR=2000: 567/568 ticks
So here 2000 is faster than 7000...
What have I learned so far? Always know what you are trying to time, and keep a keen eye on the little differences!
Now I have to find out why both of my VICs are slower than Carlsson's.
I have to check my third one, because the first two are two-prong models and the third has the DIN power connector like the C-64: just another little difference.
Björg
Mine is two-prong. Maybe my PSU delivers a slightly higher voltage?
Your latest program on my VIC, still using the 3K memory expansion:
ADR=2000, CaV=563 ticks, CaE=568 ticks
ADR=2200, CaV=560 ticks, CaE=565 ticks
ADR=7000, CaV=566 ticks, CaE=570 ticks
(CaE = VICE 1.16, the version of the emulator I'm still using)
I notice that you read the timer before assigning ADR, which adds a little to the measured time, but nothing substantial.
Another benchmark program; replace line 50 with:
Code: Select all
50 IFADR+TTHEN
Code: Select all
Mem | Addr | CaE | CaV
+3K | 7000 | 510 | 510
+3K | 2000 | 507 | 507
+3K | 2100 | 505 | 505
+3K | 2200 | 505 | 505
So POKE and IF appear to run at the same speed on my VIC and in the emulator, while implicit LET (storing a variable) is slightly slower in the emulator than on the real machine. Somewhat pointless benchmarks, but useful if we want to pin down exactly where emulation falls short... I still think the VIC-I anomalies should have higher priority, though.
Anders Carlsson
carlsson wrote: I believe Boray in his QBench tests found that VIC-20 and VIC-20CR perform slightly differently, even in the same video standard.
No, I think they perform the same, but you can get slightly different TI readings from run to run on the same machine. Try your test a couple of times on the same machine and you will see. This probably has to do with interrupts and the exact moment when you start the routine.
/Anders
PRG Starter - a VICE helper / Vic Software (Boray Gammon, SD2IEC music player, Vic Disk Menu, Tribbles, Mega Omega, How Many 8K etc.)
It also depends on where on the screen the raster is when you start.
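Because TI only advances once per jiffy interrupt, a loop of fixed length can be reported as N or N+1 ticks depending on where within a jiffy it starts, which fits the 567/568 readings in this thread. A minimal Python model, assuming the loop always takes exactly the same number of cycles and ignoring the time the interrupt handler itself takes; the 18,473 cycles per jiffy (about 1,108,405 Hz / 60) and the 567.4-jiffy run length are illustrative assumptions.

```python
CYCLES_PER_JIFFY = 18_473  # assumed: about 1,108,405 Hz (PAL) / 60


def measured_ticks(duration_cycles: int, start_phase: int) -> int:
    """TI ticks counted during a run of fixed length.

    start_phase is how many cycles after the last TI tick the run begins
    (0 <= start_phase < CYCLES_PER_JIFFY); every tick boundary crossed adds 1.
    """
    end = start_phase + duration_cycles
    return end // CYCLES_PER_JIFFY - start_phase // CYCLES_PER_JIFFY


# Hypothetical run length of 567.4 jiffies, started at many different phases.
duration = round(567.4 * CYCLES_PER_JIFFY)
readings = sorted({measured_ticks(duration, phase)
                   for phase in range(0, CYCLES_PER_JIFFY, 97)})
print(readings)  # [567, 568]
```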
- Mike
- Herr VC
- Posts: 4839
- Joined: Wed Dec 01, 2004 1:57 pm
- Location: Munich, Germany
- Occupation: electrical engineer
OK, back to the VIC anomalies:
I've seen in your screenshot that the VIC chip can't access the colour RAM as a character set (and possibly not as a screen either). So this seems to be unconnected space as far as the VIC chip is concerned.
On the other hand, it seems that the left half of the characters is (more or less?) static, and moreover that it contains the top four bits of the character code (mostly SPACE, i.e. 32 -> '0010').
This program writes 8 columns of characters 0, 16, 32, ... in the middle of the screen:
Code: Select all
1 FORT=0TO505:POKE4096+T,0:POKE37888+T,0:NEXT
2 POKE36869,5+16*12:REM SCREEN AT 4096, CHARSET IN COLOUR RAM
3 FORY=0TO15:FORX=-4TO3
4 POKE4096+11+X+22*Y,16*Y
5 NEXT:NEXT
I'd expect a binary code to run down the screen, like this:
Code: Select all
0000junk0000junk0000junk0000junk0000junk0000junk0000junk0000junk
repeat for 8 scan lines...
0001junk0001junk...
repeat for another 8 scan lines
...
and so on up to
1111junk1111junk...
If my assumption holds, the character code could be a remnant of the preceding cycle on the bus, provided the 6502 didn't 'discharge' it (hence the 'more or less' static). It would happen like this:
Bus clock low: VIC chip reads the character code from the screen.
Bus clock high: the 6502 doesn't access the data bus (for whatever reason) -> the data bus 'retains' its level because the lines have a certain capacitance.
Bus clock low: VIC chip tries to retrieve character data from unconnected space, and reads the data it already read 1µs before ...
Anders, could you test that? And make another screenshot?
Greetings,
Michael
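Mike's three-step explanation can be written as a tiny model: the VIC first fetches a valid screen code, then fetches "character data" from unconnected space and gets back whatever the data lines still hold. A minimal Python sketch, assuming (as the post suggests) that only the upper nibble survives on the bus and the lower nibble is junk; `fetch_char_row`, `render` and the `low_nibble` value are hypothetical. It reproduces the expected pattern, one text row per character code 0, 16, ..., 240.

```python
def fetch_char_row(screen_code: int, low_nibble: int) -> int:
    """Byte the VIC would see when reading character data from unconnected space.

    Upper nibble: left over on the data bus from the preceding screen-code fetch.
    Lower nibble: unknown 'junk', passed in explicitly here.
    """
    return (screen_code & 0xF0) | (low_nibble & 0x0F)


def render(byte: int) -> str:
    """Show the left half of the character as bits and the right half as 'junk'."""
    return f"{byte >> 4:04b}junk"


# Character codes 0, 16, 32, ..., 240, one per text row, eight columns wide.
for y in range(16):
    print(render(fetch_char_row(16 * y, low_nibble=0b1010)) * 8)
```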
What I didn't point out with my previous screenshot is that it is not a static display. The garbage scrolls around, because when the screen or character set is placed in unconnected memory, the latest value on the bus is what appears on screen.
Even in this case, the display flashes around. Here is an example:
POKE 36869,5*16+12
POKE 36869,5+16*12
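The two POKE values above contain the same nibbles in swapped positions. In register 36869 ($9005) the upper nibble selects the 1K block used for the screen (video matrix) and the lower nibble the block used for the character set. A minimal decoding sketch, assuming the usual VIC-20 mapping (nibble values 0-7 address $8000-$9FFF, 8-15 address $0000-$1FFF, in 1K steps) and ignoring bit 7 of $9002, which can add $200 to the screen address; `vic_block` and `decode_9005` are hypothetical helpers.

```python
def vic_block(nibble: int) -> int:
    """CPU address of the 1K block selected by a 4-bit $9005 field."""
    if not 0 <= nibble <= 15:
        raise ValueError("nibble must be 0..15")
    base = 0x8000 if nibble < 8 else 0x0000   # VIC sees $8000-$9FFF first, then $0000-$1FFF
    return base + (nibble & 7) * 0x400


def decode_9005(value: int) -> dict:
    """Split a $9005 value into screen and character-set base addresses."""
    return {
        "screen": vic_block(value >> 4),      # bit 7 of $9002 may add $200 (ignored here)
        "charset": vic_block(value & 0x0F),
    }


for value in (5 * 16 + 12, 5 + 16 * 12):
    d = decode_9005(value)
    print(f"POKE 36869,{value}: screen ${d['screen']:04X}, charset ${d['charset']:04X}")
```

With 92 ($5C) the screen is fetched from colour RAM at $9400 and the character set from $1000; with 197 ($C5) it is the other way round.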
I tried to come up with an example that gives a static display without halting the computer. Viznut used SYS2, which points to a JAM instruction and freezes the computer; not so practical if you want to get some use out of these "video modes". My theory was that by disabling interrupts and constantly loading a value into the accumulator, a static display would be achieved.
Code: Select all
1C00 LDA #$5C
1C02 STA $9005
1C05 SEI
1C06 LDA #$01
1C08 JMP $1C06
1 DATA169,92,141,5,144,120,169,1,76,6,28,-1:A=7168
2 READB:IFB>-1THENPOKEA,B:A=A+1:GOTO2
3 SYS7168
4 REM FORA=0TO505:POKE4096+A,0:POKE37888+A,0:NEXT
Not quite static, but we're on our way. I believe the JMP causes data to be put out on the bus. If someone can come up with a code snippet that puts out a static picture without freezing the computer, I'd be interested. Maybe my hacking skills are too poor to know how to do it.
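The DATA line of the loader is the hand-assembled form of the listing. A small Python sketch that rebuilds those bytes from the mnemonics makes the encoding explicit; the opcodes ($A9 = LDA immediate, $8D = STA absolute, $78 = SEI, $4C = JMP absolute) are standard 6502, while `assemble` is a hypothetical helper that only knows these four instructions.

```python
OPCODES = {"LDA#": 0xA9, "STA": 0x8D, "SEI": 0x78, "JMP": 0x4C}


def assemble(program):
    """Assemble (mnemonic, operand) pairs for the four instructions used in the listing."""
    out = []
    for mnemonic, operand in program:
        out.append(OPCODES[mnemonic])
        if mnemonic == "LDA#":
            out.append(operand & 0xFF)                # 8-bit immediate value
        elif mnemonic in ("STA", "JMP"):
            out += [operand & 0xFF, operand >> 8]     # 16-bit address, low byte first
    return out


listing = [
    ("LDA#", 0x5C),   # 1C00: value for $9005
    ("STA", 0x9005),  # 1C02
    ("SEI", None),    # 1C05: block interrupts
    ("LDA#", 0x01),   # 1C06: loop start
    ("JMP", 0x1C06),  # 1C08: back to 1C06
]
code = assemble(listing)
print(",".join(str(b) for b in code))
```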
Anders Carlsson
There are 71 cycles per line on a PAL VIC-20. You could try:
Code: Select all
     LDA #$55   ; point $9005
     STA $9005  ; to completely unconnected space
     SEI
loop LDA #$00   ; 2 cycles
     LDA #$01
     ...
     LDA #$xx   ; altogether 34 LDA instructions
     JMP loop   ; 3 cycles
and 34*2+3=71. The loop shouldn't cross a 256-byte page, since that adds another cycle.
Now, when reading character data, the VIC chip will lock onto either the $A9 of the LDA opcode or its immediate data. In most cases the JMP instruction will happen in the middle of a scan line, so you'll see a transition there (odd number of cycles!).
If this works, we need a way to ensure that:
- the JMP instruction is executed in the border,
- the VIC chip locks onto $A9 while reading the "screen" and
- onto the immediate data while reading the "character data"
If this even works while the program is in expansion RAM, we could provide the VIC chip with different 'loops' for each scanline (the JMP then simply points to the next instruction), and also sync vertically.
That would give us hi-res graphics over the whole screen.
I hope I didn't promise too much.
Michael
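Mike's 34 × 2 + 3 = 71 arithmetic, and why a loop exactly one raster line long gives the same bus pattern on every line, can be checked with a small Python model of the bytes the CPU reads in each cycle of one pass: every LDA # reads its opcode ($A9) and then its operand, and JMP absolute reads $4C and two address bytes. The operand values 0..33 (standing in for `$xx`), the loop address $1C00 and the helper names are hypothetical. As an aside, LDA immediate and JMP absolute always take 2 and 3 cycles on the 6502; the page-crossing penalty applies to indexed reads and taken branches, so keeping this particular loop within one page is a precaution rather than a timing requirement.

```python
LINE_CYCLES = 71  # PAL VIC-20 cycles per raster line, as stated in the post


def loop_bus_trace(base: int = 0x1C00, count: int = 34) -> list:
    """Bytes the CPU reads, one per cycle, during one pass of the LDA #/JMP loop."""
    trace = []
    for i in range(count):
        trace += [0xA9, i & 0xFF]                 # LDA #i: opcode, then operand
    trace += [0x4C, base & 0xFF, base >> 8]       # JMP loop: opcode, low byte, high byte
    return trace


def bus_value(trace: list, line: int, cycle: int) -> int:
    """Byte on the bus at a given cycle of a raster line (loop assumed to start at line 0, cycle 0)."""
    return trace[(line * LINE_CYCLES + cycle) % len(trace)]


trace = loop_bus_trace()
print(len(trace), "cycles per pass")              # 34*2 + 3
same = all(bus_value(trace, 0, c) == bus_value(trace, 100, c) for c in range(LINE_CYCLES))
print("identical on every line:", same)
```

Shortening the loop to 33 LDAs (69 cycles) makes the pattern shift by two cycles on each successive line, which is why the exact count matters.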
Mike wrote: That would give us hi-res graphics over the whole screen.
You mean we could remove the borders and fill the whole visible area with hires graphics? A "standard" hires screen of e.g. 200x160, or any other format, has already been done with conventional methods, as you know.
But you made a good point about the 71 cycles. I think Marko's timing routines will make it easy to synchronize the timers to the raster, so we know that the loop starts on a new raster line. Once there, I'll bang away at some code (it doesn't have to be LDA all the way) and report if I make any progress. At last, an example of cross-development where testing immediately requires a real computer.
Anders Carlsson