Well, I worked this one-clock synchronization into the 12MHz version, but it didn't widen the allowed OSCCAL deviation. With both the original code and this code, only 0x46-0x48 worked. Full code linked:
usbdrvasm12.inc
Code:
waitForK1:
inc YL
sbic USBIN, USBMINUS
brne waitForK1 ; just make sure we have ANY timeout
waitForJ:
sbic USBIN, USBMINUS
rjmp waitForK
sbic USBIN, USBMINUS
rjmp waitForK
...
timeout:
...
rjmp sofError
waitForK:
push YH
nop
rewaitForK:
nop
sbic USBIN, USBMINUS ;1 [-3]
rjmp . ;1 [-2]
nop ;1 [-1]
nop ;1 [0]
nop ;1 [1]
;foundK:
;{4} after falling D- edge, average delay: 4 cycles [we want 4 for center sampling]
;we have 1 bit time for setup purposes, then sample again. Numbers in brackets
;are cycles from center of first sync (double K) bit after the instruction
nop ;1 [2]
nop ;1 [3]
lds YL, usbInputBufOffset;2 [5]
clr YH ;1 [6]
subi YL, lo8(-(usbRxBuf));1 [7]
sbci YH, hi8(-(usbRxBuf));1 [8]
sbic USBIN, USBMINUS ;1 [9] we want two bits K
rjmp rewaitForK ;2 [10]
push shift ;2 [12]
push x1 ;2 [14]
push x2 ;2 [16]
in x1, USBIN ;1 [17] <-- sample bit 0
The first change was replacing waitForJ with an unrolled loop. This way it synchronizes to within 2 clocks. Then I replaced the old unrolled waitForK with 8 NOP instructions, since the now-unrolled waitForJ gives the same synchronization, so 8 clocks later we're just after the K transition. There was one problem: waitForJ can be entered when J has already begun, so the unrolled loop won't synchronize to the edge; it'll just go straight on to waitForK. So I added a loop before waitForJ which waits for K, so that waitForJ will then find a transition.
With the synchronization now moved to waitForJ, and waitForK just NOPs, I could put the final 1-clock synchronization into waitForK. At the right point, it simply checks whether the K transition has occurred, and if not, delays an extra clock.
I made one further change, reworking the foundK code so that it samples the middle of the second K pair one clock later, which is where it is desired. This allowed moving the push YH before foundK, and eliminating a branch.
As for verifying that all the delays are correct, I used a simple counting model. First, we want reads of consecutive bits to be 8 clocks apart. So if we're checking the following code, we count the number of cycles of the instruction that reads the bit and all the ones between it and the next instruction to read.
Code:
in r0, USBIN ; 1
nop ; 1
push r1 ; 2
pop r1 ; 2
nop ; 1
nop ; 1
in r1, USBIN
That totals 8 clocks, so the timing is correct here.
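The counting model above can be written down as a small script. This is only a sketch of the bookkeeping, in Python rather than AVR assembly; the table holds the usual AVR cycle counts for the few instructions in the example, and the function name is made up for illustration.

```python
# Cycle counts for the instructions used in the example above.
CYCLES = {"in": 1, "nop": 1, "push": 2, "pop": 2}

CLOCKS_PER_BIT = 8  # 12 MHz clock, 1.5 Mbit/s low-speed USB


def clocks_between_reads(instructions):
    """Sum the cycles of the reading instruction and everything after it,
    up to but not including the next reading instruction."""
    return sum(CYCLES[name] for name in instructions)


# The first IN plus the filler instructions before the second IN.
example = ["in", "nop", "push", "pop", "nop", "nop"]
total = clocks_between_reads(example)
print(total, total == CLOCKS_PER_BIT)
```

The second IN is left out of the list on purpose: it is the next read, so its own cycle belongs to the following interval.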
The first timing in the code is
Code:
waitForJ:
sbic USBIN, USBMINUS
rjmp waitForK
; in rXX, USBIN ; 1
sbic USBIN, USBMINUS ; 1
rjmp waitForK ; 2
...
waitForK:
push YH ; 2
nop ; 1
nop ; 1
sbic USBIN, USBMINUS
rjmp .
There are two timing cases.
The first is when J occurs just before the SBIC that is followed by the RJMP to waitForK. For that one, there are only 7 clocks before the next read, so it comes one clock early. In that case, the SBIC will still find the J state (high), and thus execute the rjmp, taking 3 clocks.
The second timing case is when J occurs during the second clock of the SBIC before it. I've put in a commented-out IN instruction to show this. In that case, it's as if that IN instruction saw J occur, so there are 8 clocks before the next read at the SBIC. In that case, the SBIC finds the K state (low) and skips the rjmp, taking only 2 clocks.
So this SBIC seems correctly situated, able to detect the two timing cases for the waitForJ unrolled loop.
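A quick way to see why the two cases end up aligned is to add up both paths. The sketch below is a simplified model in Python, not part of the driver: it assumes an ideal clock, that K starts exactly 8 clocks after the J edge, and that a read happens on an instruction's first cycle. The function name and the `edge_lead` parameter are invented for the illustration.

```python
CLOCKS_PER_BIT = 8  # 12 MHz clock, 1.5 Mbit/s low-speed USB


def sync_timing(edge_lead):
    """edge_lead: clocks by which the J edge precedes the SBIC in waitForJ
    that detects it (0 or 1 in the two cases discussed above)."""
    # sbic (no skip) 1 + rjmp waitForK 2 + push YH 2 + nop 1 + nop 1 = 7
    clocks_to_sync_read = edge_lead + 1 + 2 + 2 + 1 + 1
    # Assumption: K starts exactly one bit time after the J edge.
    k_seen = clocks_to_sync_read >= CLOCKS_PER_BIT
    # K seen: sbic skips "rjmp ." (2 clocks). J seen: sbic + rjmp (3 clocks).
    sync_cost = 2 if k_seen else 3
    return clocks_to_sync_read, clocks_to_sync_read + sync_cost


for lead in (0, 1):
    print(lead, sync_timing(lead))
```

Under these assumptions both cases leave the SBIC/RJMP pair 10 clocks after the J edge, i.e. 2 clocks into the K bit, which is the one-clock correction the SBIC is meant to provide.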
Next is the timing from the SBIC to the next SBIC that checks in the middle of the second bit:
Code:
sbic USBIN, USBMINUS ;1 [-3]
rjmp . ;1 [-2]
nop ;1 [-1]
nop ;1 [0]
nop ;1 [1]
nop ;1 [2]
nop ;1 [3]
lds YL, usbInputBufOffset;2 [5]
clr YH ;1 [6]
subi YL, lo8(-(usbRxBuf));1 [7]
sbci YH, hi8(-(usbRxBuf));1 [8]
sbic USBIN, USBMINUS ;1 [9] we want two bits K
The sbic effectively reads during the first clock of the first K bit. When it reads during the last clock of the previous bit (J), an extra clock of delay is inserted just after.
There are 12 clocks for SBIC through the SBCI. This puts the next SBIC 1.5 bits after the beginning of the first K bit, which is what is desired.
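The 12-clock figure can be checked by summing the listing. This is a rough Python sketch of that sum for the path where the first SBIC sees K and skips the rjmp (2 cycles in total); the labels are only descriptions of the instructions above, and the 8 clocks per bit assumes the 12MHz build.

```python
CLOCKS_PER_BIT = 8  # 12 MHz clock, 1.5 Mbit/s low-speed USB

# (instruction, cycles) from the first SBIC up to, but not including,
# the SBIC that checks for the second K bit.
PATH = [
    ("sbic (skips rjmp .)", 2),
    ("nop", 1), ("nop", 1), ("nop", 1), ("nop", 1), ("nop", 1),
    ("lds YL, usbInputBufOffset", 2),
    ("clr YH", 1),
    ("subi YL, lo8(...)", 1),
    ("sbci YH, hi8(...)", 1),
]

total_clocks = sum(cycles for _, cycles in PATH)
bits_after_k_start = total_clocks / CLOCKS_PER_BIT
print(total_clocks, bits_after_k_start)
```

On the other path the SBIC and rjmp take 3 clocks instead of 2, but that read happened one clock before the K bit began, so the second SBIC lands at the same point.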
In the case where there wasn't a double K bit,
Code:
sbic USBIN, USBMINUS ;1 [9] we want two bits K
rjmp rewaitForK ;2 [10]
...
rewaitForK:
nop
sbic USBIN, USBMINUS ;1 [-3]
we have 4 clocks for SBIC through NOP, putting the next SBIC right at the beginning of a bit, as desired.
Finally, the case where we do find a double K.
Code:
sbic USBIN, USBMINUS ;1 [9] we want two bits K
rjmp rewaitForK ;2 [10]
push shift ;2 [12]
push x1 ;2 [14]
push x2 ;2 [16]
in x1, USBIN ;1 [17] <-- sample bit 0
There are 8 clocks for SBIC through PUSH x2, putting the IN right in the middle of the next bit.
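Both exits from the double-K check can be placed on a common time axis. The sketch below is Python bookkeeping only; it takes the post's figure that the check sits 12 clocks (1.5 bits) after the start of the first K bit as given, and the variable names are made up for the illustration.

```python
CLOCKS_PER_BIT = 8  # 12 MHz clock, 1.5 Mbit/s low-speed USB

# Position of the double-K check, in clocks after the start of the first K bit
# (taken from the 12-clock count in the text above).
CHECK_AT = 12

# J seen: sbic (1, no skip) + rjmp rewaitForK (2) + nop (1)
retry_clocks = 1 + 2 + 1
# K seen: sbic (2, skipping the rjmp) + push shift, push x1, push x2 (2 each)
found_clocks = 2 + 2 + 2 + 2

next_sync_read = (CHECK_AT + retry_clocks) / CLOCKS_PER_BIT
bit0_sample = (CHECK_AT + found_clocks) / CLOCKS_PER_BIT
print(next_sync_read, bit0_sample)
```

A whole number of bits (2.0) means the retry SBIC reads right at a bit boundary, and a value ending in .5 (2.5) means the IN samples in the middle of a bit, matching the two claims above.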
So all the timing seems to check out. I've tried adding/removing a NOP from just before rewaitForK, in case I had the timing off by one. Again, this is only when using OSCCAL values just outside the three that work.
I wonder whether it's the send timing that's the problem. The original OSCCAL for 16MHz was 0xA2, and 0x47 was the optimal 12MHz value. Assuming roughly linear steps, that's about 0.044MHz/step (the way OSCCAL overlaps around 0x80 means that the step is larger than this). Two steps break it, which is about 0.9% variation.
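For reference, the linear estimate can be reproduced with a few lines of arithmetic. This Python sketch only uses the two OSCCAL values quoted above and assumes a perfectly linear frequency step between them, which the post itself says is not quite true.

```python
osccal_16mhz = 0xA2  # original value for 16 MHz
osccal_12mhz = 0x47  # optimal value for 12 MHz

steps = osccal_16mhz - osccal_12mhz      # number of OSCCAL steps between them
mhz_per_step = (16.0 - 12.0) / steps     # assumes evenly sized steps

# Frequency error of two steps, relative to the 12 MHz target.
deviation_pct = 2 * mhz_per_step / 12.0 * 100

print(steps, round(mhz_per_step, 3), round(deviation_pct, 2))
```

With evenly sized steps this comes out a little under the 0.9% quoted; that would be consistent with the real step being larger than the linear estimate, as noted in the parenthesis.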