DrVeryEvil wrote:
I am still curious why the bit instruction is needed after .Y is loaded with 9. There is no need to waste 3 cycles at that point, is there?
The answer lies in the bit of code you left out, namely before the sync loop you posted:
First, the code waits for the start of a certain line in the way tokra already described: wait for (double-)line X-2, then wait for (double-)line X.
The raster register changes at one certain point during horizontal retrace. As long as the waited-for value doesn't appear, the wait loop has a jitter of 7 cycles: 4 because of the CMP instruction, 3 because of the branch when it is taken. The CMP instruction reads the register in its fourth cycle and compares it directly; on success, the branch is not taken and needs only 2 cycles. Now, depending on all of the preceding code ...
... you may be 'lucky' and hit the transition spot on (in 'cycle' 0); then the non-taken BNE executes in cycles 1 and 2, and the instruction after BNE starts execution in cycle 3.
... or you have 'bad luck', and CMP just sees the value of the preceding (double-)line for the last time it's there (i.e., in 'cycle' -1) - then the loop executes BNE once in cycles 0, 1 and 2; another CMP (which succeeds) in cycles 3, 4, 5 and 6; and finally the non-executed BNE in cycles 7 and 8. Thus, the instruction after BNE starts in cycle 9.
That means the simple wait loop can come out at any of 7 different positions (3, 4, 5, 6, 7, 8 or 9), and for this reason the timing still isn't stable.
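To make the counting concrete, here is a small Python model of that wait loop. It is only a sketch of the cycle arithmetic above, not VIC-20 code: the names `exit_cycle` and `first_read` are made up for illustration, and the model assumes the branch never crosses a page boundary.

```python
CMP_CYCLES = 4      # CMP absolute; the register is read in the 4th cycle
BNE_TAKEN = 3       # branch back to the CMP (assumed: no page crossing)
BNE_NOT_TAKEN = 2   # branch falls through once the value matches

def exit_cycle(first_read):
    """Cycle in which the instruction after BNE starts.

    first_read is the cycle of some CMP read, counted relative to the
    raster register transition in cycle 0 (negative = before it).
    """
    read = first_read
    while read < 0:                      # old value seen, loop once more
        read += BNE_TAKEN + CMP_CYCLES   # next read is 7 cycles later
    return read + BNE_NOT_TAKEN + 1      # BNE uses the next 2 cycles

# Try every possible phase of the loop relative to the transition.
exits = sorted({exit_cycle(phase) for phase in range(-7, 0)})
print(exits)
```

A read in cycle -7 is the same phase as a read in cycle 0 (the 'lucky' case), and a read in cycle -1 is the 'bad luck' case.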
Now, the sync loop works by wasting 129 cycles per double-line (for NTSC), when CMP $9004 returns the value of the next double-line, and wasting 130 cycles per double-line when both LDX and CMP instructions return the same value. In the former case, the start of the loop clocks in one cycle earlier, and thus 'drifts' to the left. In the latter case, that iteration and all following ones are kept in lock with the raster beam.
The BIT instruction now makes sure this final sync loop doesn't start too early, with LDX and CMP possibly already returning matching values! Ideally, the LDX fetch and the CMP fetch would be 129 cycles apart. But this isn't the case here, so the sync could report a false positive when the LDX is a little bit too far left. Instead, a little more time is wasted (but not too much) to ensure there can't be a false positive. Then, within 9 iterations, the sync loop locks in.
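The drift and lock-in can be played through with a toy model as well. This is a simplified sketch under stated assumptions, not the actual loop: `late` is a made-up variable for how many cycles the CMP read trails the raster register transition (-1 meaning the last cycle before it), and the LDX is simply assumed to have read the earlier double-line value.

```python
DOUBLE_LINE = 130   # NTSC cycles per double-line, as stated in the post
ITERATIONS = 9      # the value loaded into .Y

def run_sync(late):
    """Return (final offset, list of iteration lengths) for a start offset."""
    lengths = []
    for _ in range(ITERATIONS):
        sees_next = late >= 0            # CMP already reads the next double-line
        length = DOUBLE_LINE - 1 if sees_next else DOUBLE_LINE
        late += length - DOUBLE_LINE     # 129 cycles: drift one cycle left
        lengths.append(length)
    return late, lengths

# Assumed entry window: the 7 positions left over from the simple wait loop.
finals = [run_sync(start)[0] for start in range(7)]
print(finals)

# Entering too early: both reads match at once and nothing gets corrected.
print(run_sync(-3)[0])
```

In this model every start offset from 0 to 6 ends up at -1 within the 9 iterations, while a start at -3 stays at -3. That second case is the false positive that the extra delay before the loop is there to rule out.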
...
The technique described here is mainly used to define an exact position to start the timer.
The interrupt processing itself requires another compensation, as the interrupt is only serviced once the current instruction has finished, which introduces another jitter of up to 7 cycles. The interrupt service routine then has to execute a variable delay to counteract that jitter and have the rest of the ISR execute (once again) in sync with the raster beam. The low byte of the timer is read to derive that variable delay.
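The idea behind that variable delay, again as a rough Python sketch. `BASE_LATENCY` and `RELOAD_LOW` are invented numbers purely for illustration, and the model ignores the finer details of how the VIA timer reloads; it only shows why the timer low byte is enough to cancel the jitter.

```python
MAX_JITTER = 7       # "up to 7 cycles", as stated above
BASE_LATENCY = 20    # hypothetical fixed cycles from timer underflow to the read
RELOAD_LOW = 0x80    # hypothetical timer low byte right after the reload

def timer_low_at_read(jitter):
    """Low byte the ISR would read if it was entered `jitter` cycles late."""
    elapsed = BASE_LATENCY + jitter
    return (RELOAD_LOW - elapsed) & 0xFF   # timer counts down once per cycle

def extra_delay(timer_low):
    """Cycles the ISR must still waste for a given timer reading."""
    latest = (RELOAD_LOW - (BASE_LATENCY + MAX_JITTER)) & 0xFF
    return (timer_low - latest) & 0xFF     # the latest arrival waits 0 cycles

totals = [BASE_LATENCY + j + extra_delay(timer_low_at_read(j))
          for j in range(MAX_JITTER + 1)]
print(totals)
```

Whatever the jitter, latency plus compensating delay adds up to the same total, so everything after the delay runs at a fixed position relative to the raster beam.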
Please take a look at the following threads for examples:
VIC 20 in Black and White mode and
** New Frontiers in VIC-Hires-Graphics, Part 10. Actually, I use another technique for the initial sync, which uses a binary decision tree and syncs in faster, but needs more code.