Clock accuracy limits for 12MHz implementation

General discussions about V-USB, our firmware-only implementation of a low speed USB device on Atmel's AVR microcontrollers
cpldcpu
Rank 2
Posts: 44
Joined: Sun Nov 10, 2013 11:26 am

Post by cpldcpu » Sat Dec 07, 2013 11:55 pm

The documentation states that the 12MHz version needs a quartz oscillator and that only the 12.8MHz version tolerates a timing deviation of ±1%. What is the accuracy requirement for the 12MHz version? I tested both on the ATtiny841, which has Atmel's newest generation RC oscillator, and it seems that even the 12MHz version works nicely when the RC oscillator is calibrated from the idle pulses. The 12MHz version would be preferable, since it is much smaller.

blargg
Rank 3
Posts: 102
Joined: Thu Nov 14, 2013 10:01 pm

Post by blargg » Sun Dec 08, 2013 1:48 am

I think that the 12.8 MHz and 16.5 MHz versions re-synchronize to the bit transitions in real time, while the others synchronize only to the first transition and then rely on the clock accuracy, which makes them less tolerant of inaccuracy. Documentation for the clock rates is pretty scattered from what I've found, so I'm not certain about the above. The fact that those two versions are significantly larger agrees with this.
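To put rough numbers on that difference, here is a minimal Python sketch of how clock error accumulates when the receiver never re-synchronizes. It assumes a simplified model in which a bit is misread once the sample point has drifted half a bit away from the bit center, and an assumed worst-case low-speed packet of 96 bits (8 sync, 8 PID, 64 data, 16 CRC) with stuffed bits and sampling jitter ignored. The name `max_clock_error` is purely illustrative and not part of V-USB.

```python
def max_clock_error(bits_without_resync, margin_bits=0.5):
    """Largest fractional clock error that keeps the sample point
    within margin_bits of the bit center after the given number of bits."""
    return margin_bits / bits_without_resync

# Assumed worst case: 8 sync + 8 PID + 64 data + 16 CRC bits, no stuffing.
packet_bits = 8 + 8 + 64 + 16

# Synchronize once at the start of the packet, then free-run.
no_resync = max_clock_error(packet_bits)

# Re-synchronize on every edge: USB bit stuffing guarantees a
# transition at least once every 7 bit times.
per_edge = max_clock_error(7)

print(f"sync once per packet: +-{no_resync * 100:.2f}%")
print(f"resync on every edge: +-{per_edge * 100:.2f}%")
```

Real limits are tighter than either figure, because the model leaves out the one-to-two-clock sampling jitter, but it shows why free-running through a whole packet leaves far less margin.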

In my study of the code, the whole-MHz versions only synchronize to within two AVR clocks of the transition (sbis; rjmp), so if you had this synchronize to within one clock, you'd halve the jitter and make it more tolerant of clock error. This is possible by reading the port several times in a row into multiple registers, then checking them and delaying appropriately:

Code: Select all

    ; Wait until USBMINUS is 0
    sbis USBIN, USBMINUS
    rjmp coarse
    sbis USBIN, USBMINUS
    rjmp coarse
    sbis USBIN, USBMINUS
    rjmp coarse
    sbis USBIN, USBMINUS
    rjmp coarse
    ret
   
coarse:
    ; Now we're synchronized with an error less than two clocks,
    ; and USBMINUS is 0. Delay appropriately so that the next in
    ; will never read USBMINUS going more than a cycle after it
    ; goes 1. If we omitted the above coarse synchronization,
    ; we'd need many more consecutive IN instructions below, and
    ; free registers for them.
   
    ;delay...
   
    ; Capture USBMINUS becoming 1 within this four-clock window
    in   r20,USBIN
    in   r21,USBIN
    in   r22,USBIN
    ;fourth clock checked by sbis below
   
    ; Delay 0 extra if USBMINUS became 1 before first IN above.
    ; Delay 1 extra if USBMINUS became 1 before second IN above.
    ; Delay 2 extra if USBMINUS became 1 before third IN above.
    ; Delay 3 extra if USBMINUS became 1 before SBIS below.
    sbis USBIN, USBMINUS
    rjmp .
    sbrs r20, USBMINUS
    rjmp .
    sbrs r21, USBMINUS
    rjmp .
    sbrs r22, USBMINUS
    rjmp .
   
    ; Now we're synchronized to the USB transition with an error of
    ; less than a clock.

The snag is getting enough free registers, and that this synchronization delays many cycles. If you're using the polled approach for USB handling, you could free these registers up easily beforehand. For the interrupt approach, you could have GCC reserve them globally so they don't need to be saved in the ISR.
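The two-clock versus one-clock windows can be illustrated with a small sampling model. This is only a sketch under the assumption that the pin is sampled at fixed intervals (every 2 clocks for a skipped sbis/rjmp pair, every clock for back-to-back IN instructions); `worst_latency` is a hypothetical helper name.

```python
def worst_latency(sample_period_clocks, resolution=100):
    """Worst delay, in clocks, between an edge and the first sample
    taken at or after it, for a pin sampled at a fixed period."""
    period = sample_period_clocks * resolution
    # Try every edge position within one period, in steps of
    # 1/resolution clock, using integers to avoid rounding noise.
    worst = max((-edge) % period for edge in range(1, period + 1))
    return worst / resolution

coarse = worst_latency(2)   # unrolled sbis/rjmp pairs
fine = worst_latency(1)     # consecutive IN instructions

print(f"two-clock loop: up to {coarse} clocks late")
print(f"one-clock reads: up to {fine} clocks late")
```

With 8 clocks per bit at 12MHz, that is the difference between roughly a quarter and an eighth of a bit of uncertainty in where the sample lands.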

Post by cpldcpu » Sun Dec 08, 2013 10:07 am

Nice idea, that should double the period until a first bit error can occur. Actually, it should not be a problem to free additional registers. Since the first bits of the sync code are repeated, it does not matter if you miss one or two bits. The extra time could be used to push more registers.

The size difference between 12MHz and 12.8MHz is more than 200 bytes. My feeling right now is that it might make more sense to invest the space into protocol-level error detection and just use the 12MHz version at risk.

Post by cpldcpu » Sun Dec 08, 2013 1:05 pm

Amazingly, it also works with the 16MHz version when tuning the internal RC oscillator from 8MHz to 16MHz. No CRC errors.

The reason for this could be the new RC oscillator revision in the ATtiny841. It is temperature compensated and has a single-range calibration value, meaning that it can be tuned with better accuracy than the previous split-range ones. The entire 8-bit OSCCAL0 range is roughly 6 to 17MHz. This is equivalent to approximately 43kHz tuning per LSB, so the calibration error at 12MHz stays below roughly 0.36% (one step), which is better than the ±1% required for the 12.8MHz and 16.5MHz versions. The long-term stability may be doubtful, but if OSCCAL0 is recalibrated every couple of seconds, as for example in a bootloader, it may be workable.
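The step-size arithmetic can be reproduced with a few lines of Python. The 6 to 17MHz range is the rough figure quoted in this post, and the assumption that the 256 steps are evenly spaced is a simplification, so treat the result as an estimate only.

```python
# Rough OSCCAL0 tuning range quoted in the post (approximate values).
range_low_mhz = 6.0
range_high_mhz = 17.0
steps = 256            # 8-bit calibration register
target_mhz = 12.0

# Assumes evenly spaced steps across the whole range.
step_khz = (range_high_mhz - range_low_mhz) * 1000 / steps
step_percent = step_khz / (target_mhz * 1000) * 100

# After calibrating to the nearest step, the residual error is at most half a step.
nearest_step_percent = step_percent / 2

print(f"{step_khz:.1f} kHz per LSB")
print(f"{step_percent:.2f}% of 12 MHz per step")
print(f"{nearest_step_percent:.2f}% worst case at the nearest step")
```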

Post by blargg » Sat Dec 14, 2013 2:07 am

16MHz osccal'd on an attiny85 seems to work here right out of the box. I'm attempting to use the one-clock synchronization code idea from above in the 16MHz code, to make it more robust. I just had an insight that you really only need the two-clock-accuracy synchronization code done twice at a one-clock offset. The first narrows it into a two-clock window, and then you only need a single check in the middle of this window that delays an extra clock if the transition was after the check. The 16MHz code is more difficult due to fractional clocks, so I did a rough sketch of this new idea on the 12MHz code, which has 8 clocks per bit:

Code: Select all

waitFor0:
    sbis    USBIN, USBMINUS
    rjmp    waitFor1
    sbis    USBIN, USBMINUS
    rjmp    waitFor1
    ...
waitFor1:
    nop
    nop
    nop
    nop
    sbis    USBIN, USBMINUS ;[0]
    rjmp    .               ;[1]
found1:
    nop                     ;[2]
    in      r0, USBIN       ;[3] ; reads in center of bit

waitFor1's timing:

1->0 might occur just before the first SBIS, in which case the SBIS in waitFor1 reads nearly 7 clocks later, one clock before the line changes to 1. It doesn't skip the RJMP ., so delays an extra clock. The NOP in found1 thus runs 2 clocks after the 0->1 transition, which occurred during the first clock of the RJMP . .

1->0 might occur just after the first SBIS, in which case the SBIS in waitFor1 reads nearly 9 clocks later, one clock after the line changes to 1. It skips the RJMP ., so doesn't delay the extra clock. The NOP in found1 thus runs nearly 3 clocks after the 0->1 transition.

So the NOP in found1 runs from 2 to almost 3 clocks after the 0->1 transition. The ideal bit read time is 4 clocks after the transition; this reads from 3 to almost 4 clocks, 3.5 on average. This is from the first detection of the new state after a transition, which is probably delayed slightly due to logic thresholds, so reading a tad early seems better.
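The two cases can be checked over the whole two-clock window with a small model of the sketch. It assumes each instruction samples the pin at its start, that the 0->1 edge follows the 1->0 edge by exactly one 8-clock bit, and standard classic-AVR timings (SBIS 1 clock without skip, 2 clocks when skipping a one-word instruction, RJMP 2 clocks). Times are kept in hundredths of a clock so everything stays in integers; `in_sample_offset` is a made-up name for this illustration.

```python
BIT = 800   # one bit at 12 MHz: 8 clocks, in hundredths of a clock

def in_sample_offset(latency):
    """Hundredths of a clock from the 0->1 edge to the IN sample.

    latency: hundredths of a clock between the 1->0 edge and the
    unrolled SBIS that first sees it (0 <= latency < 200).
    """
    # SBIS without skip (1) + RJMP waitFor1 (2) + four NOPs (4)
    second_sbis = latency + 700
    # The 0->1 edge arrives one bit time after the 1->0 edge.
    sees_one = second_sbis >= BIT
    # Skip taken: 2 clocks. Otherwise SBIS (1) + RJMP . (2) = 3 clocks.
    found1 = second_sbis + (200 if sees_one else 300)
    in_sample = found1 + 100      # one NOP, then the IN
    return in_sample - BIT

offsets = [in_sample_offset(lat) for lat in range(200)]
print(min(offsets) / 100, max(offsets) / 100)
```

Under these assumptions the IN always lands between 3 and just under 4 clocks after the transition, matching the window worked out by hand.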

This complicates the checking for double-K (0). The old code needed two transitions: coarse 0->1, and find 1->0 unrolled. It then checked for a 0 1.5 bit periods later. If it was a 1, it went back to the unrolled loop. This allowed it to start at any of the three synchronization 1 bits at the beginning.

This new code needs 3 transitions: 0->1 in the coarse loop, 1->0 in the 2-clock unrolled loop, and 0->1 in the 1-clock check. Then it needs to wait 2.5 bit periods to check for a double-zero. If it's a 1, it can't go back and re-synchronize, so it has to wait exactly 8 cycles then check again. It also must catch the first or second 1 synchronization bit; the third is too late, unlike the original code. So this reduces allowable interrupt latency. Maybe if the first 0->1 wait loop were replaced with the unrolled code (unrolled more perhaps), then it'd only need two transitions. We'd need to know the maximum the unrolled code would need to wait for the first 0->1 transition.

Whoever tries this, OSCCAL needs to be adjusted up and down until the original 12MHz code breaks, to see how tolerant it is of variation. Then this new code needs to be tested the same way to confirm that it really is more tolerant of oscillator variance.

Post by blargg » Sat Dec 14, 2013 5:52 am

Well, I worked in this one-clock synchronization on the 12MHz version but it didn't improve the OSCCAL deviation allowed. With the original and this code only 0x46-0x48 worked. Full code linked:

usbdrvasm12.inc

Code: Select all

waitForK1:
    inc     YL
    sbic    USBIN, USBMINUS
    brne    waitForK1        ; just make sure we have ANY timeout
waitForJ:
    sbic    USBIN, USBMINUS
    rjmp    waitForK
    sbic    USBIN, USBMINUS
    rjmp    waitForK
    ...
timeout:
    ...
    rjmp    sofError

waitForK:
    push    YH
    nop
rewaitForK:
    nop
    sbic    USBIN, USBMINUS ;1 [-3]
    rjmp    .               ;1 [-2]
    nop                     ;1 [-1]
    nop                     ;1 [0]
    nop                     ;1 [1]
   
;foundK:
;{4} after falling D- edge, average delay: 4 cycles [we want 4 for center sampling]
;we have 1 bit time for setup purposes, then sample again. Numbers in brackets
;are cycles from center of first sync (double K) bit after the instruction
    nop                     ;1 [2]
    nop                     ;1 [3]
    lds     YL, usbInputBufOffset;2 [5]
    clr     YH                  ;1 [6]
    subi    YL, lo8(-(usbRxBuf));1 [7]
    sbci    YH, hi8(-(usbRxBuf));1 [8]

    sbic    USBIN, USBMINUS ;1 [9] we want two bits K
    rjmp    rewaitForK      ;2 [10]

    push    shift           ;2 [12]
    push    x1              ;2 [14]
    push    x2              ;2 [16]

    in      x1, USBIN       ;1 [17] <-- sample bit 0

The first change was replacing waitForJ with an unrolled loop. This way it synchronizes to within 2 clocks. Then, I replaced the old unrolled waitForK with 8 NOP instructions, since the now-unrolled waitForJ synchronized the same, so 8 clocks later we're just after the K transition. There was one problem; the waitForJ loop can start in the middle of J already having begun, so the unrolled loop here won't synchronize to the edge, it'll just go ahead to waitForK. So I added a loop before waitForJ which waits for K, so that waitForJ will then find a transition.

With the synchronization now moved to waitForJ, and waitForK just NOPs, I could put the final 1-clock synchronization into waitForK. At the right point, it simply checks whether the K transition has occurred, and if not, delays an extra clock.

I made one further change of reworking the foundK code so that it could sample the middle of the second K pair one clock later, where it is desired. This allowed moving the push YH before foundK, and eliminating a branch.

As for verifying that all the delays are correct, I used a simple counting model. First, we want reads of consecutive bits to be 8 clocks apart. So if we're checking the following code, we count the number of cycles of the instruction that reads the bit and all the ones between it and the next instruction to read.

Code: Select all

    in      r0, USBIN   ; 1
    nop                 ; 1
    push    r1          ; 2
    pop     r1          ; 2
    nop                 ; 1
    nop                 ; 1
    in      r1, USBIN

That totals 8 clocks, so the timing is correct here.
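The same counting model is easy to mechanize. The sketch below uses cycle counts for classic AVR cores such as the ATtiny85 (an assumption; some other AVR families time PUSH and POP differently), and the table and function names are hypothetical.

```python
# Cycle counts for classic AVR cores such as the ATtiny85 (assumed here).
CYCLES = {
    "in": 1, "nop": 1, "push": 2, "pop": 2,
    "lds": 2, "clr": 1, "subi": 1, "sbci": 1,
}

def clocks_between_reads(instructions):
    """Sum the cycles of the reading instruction and everything
    up to, but not including, the next reading instruction."""
    return sum(CYCLES[name] for name in instructions)

# The example above: IN, NOP, PUSH, POP, NOP, NOP, then the next IN.
sequence = ["in", "nop", "push", "pop", "nop", "nop"]
total = clocks_between_reads(sequence)

print(total, "clocks between reads")
```

At 12MHz a low-speed bit lasts 8 clocks, so any pair of consecutive bit reads should total 8 under this model.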

The first timing in the code is

Code: Select all

waitForJ:
    sbic    USBIN, USBMINUS
    rjmp    waitForK
;   in     rXX, USBIN       ; 1
    sbic    USBIN, USBMINUS ; 1
    rjmp    waitForK        ; 2
    ...
waitForK:
    push    YH              ; 2
    nop                     ; 1
    nop                     ; 1
    sbic    USBIN, USBMINUS
    rjmp    .

There are two timing cases.

The first is when J occurs just before the SBIC, followed by an RJMP to waitForK. For that one, there are only 7 clocks before the next read, so it comes one clock early. In that case, the SBIC will still find the J state (high), and thus execute the rjmp, taking 3 clocks.

The second timing case is when J occurs during the second clock of the SBIC before it. I've put in a commented-out IN instruction to show this. In that case, it's as if that IN instruction saw J occur, so there are 8 clocks before the next read at the SBIC. In that case, the SBIC finds the K state (low) and skips the rjmp, taking only 2 clocks.

So this SBIC seems correctly situated, able to detect the two timing cases for the waitForJ unrolled loop.

Next is the timing from the SBIC to the next SBIC that checks in the middle of the second bit:

Code: Select all

    sbic    USBIN, USBMINUS ;1 [-3]
    rjmp    .               ;1 [-2]
    nop                     ;1 [-1]
    nop                     ;1 [0]
    nop                     ;1 [1]
    nop                     ;1 [2]
    nop                     ;1 [3]
    lds     YL, usbInputBufOffset;2 [5]
    clr     YH                  ;1 [6]
    subi    YL, lo8(-(usbRxBuf));1 [7]
    sbci    YH, hi8(-(usbRxBuf));1 [8]

    sbic    USBIN, USBMINUS ;1 [9] we want two bits K

The sbic effectively reads during the first clock of the first K bit. When it reads during the last clock of the previous bit (J), an extra clock of delay is inserted just after.

There are 12 clocks for SBIC through the SBCI. This puts the next SBIC 1.5 bits after the beginning of the first K bit, which is what is desired.

In the case where there wasn't a double K bit,

Code: Select all

    sbic    USBIN, USBMINUS ;1 [9] we want two bits K
    rjmp    rewaitForK      ;2 [10]
    ...
   
rewaitForK:
    nop
    sbic    USBIN, USBMINUS ;1 [-3]

we have 4 clocks for SBIC through NOP, putting the next SBIC right at the beginning of a bit, as desired.

Finally, the case where we do find a double K.

Code: Select all

    sbic    USBIN, USBMINUS ;1 [9] we want two bits K
    rjmp    rewaitForK      ;2 [10]

    push    shift           ;2 [12]
    push    x1              ;2 [14]
    push    x2              ;2 [16]

    in      x1, USBIN       ;1 [17] <-- sample bit 0

There are 8 clocks for SBIC through PUSH x2, putting the IN right in the middle of the next bit.

So all the timing seems to check out. I've tried adding/removing a NOP from just before rewaitForK, in case I had the timing off by one. Again, this is only when using OSCCAL values just outside the three that work.

I wonder whether it's the send timing that's the problem. The original OSCCAL for 16MHz was 0xA2, and 0x47 was the optimal 12MHz value. Assuming roughly linear steps, that's about 0.044MHz/step (the way OSCCAL overlaps around 0x80 means that the step is larger than this). Two steps break it, which is about 0.9% variation.
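For reference, the linear estimate works out as follows in Python. The two OSCCAL values are the ones quoted in this post, and the even-spacing assumption is exactly the simplification already flagged above, so the real step (and hence the real tolerance) is likely somewhat larger than what this prints.

```python
# OSCCAL values reported in the post for the two clock rates.
osccal_16mhz = 0xA2
osccal_12mhz = 0x47

# Assumes the frequency changes linearly between the two settings.
step_mhz = (16.0 - 12.0) / (osccal_16mhz - osccal_12mhz)

# Two steps away from the optimal 12 MHz value is where it breaks.
two_step_percent = 2 * step_mhz / 12.0 * 100

print(f"{step_mhz:.3f} MHz per step")
print(f"{two_step_percent:.2f}% deviation at two steps")
```

The purely linear figure comes out a little below the 0.9% quoted, which is consistent with the real step near 12MHz being larger than the average.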

Post by cpldcpu » Sat Dec 14, 2013 1:34 pm

blargg wrote:
I wonder whether it's the send timing that's the problem. The original OSCCAL for 16MHz was 0xA2, and 0x47 was the optimal 12MHz value. Assuming roughly linear steps, that's about 0.044MHz/step (the way OSCCAL overlaps around 0x80 means that the step is larger than this). Two steps break it, which is about 0.9% variation.

I guess that would be a reasonable assumption. There is not too much that can be done to fix sending, other than having a "PLL" in between. Maybe you could probe by changing OSCCAL between receive and send?

Post by blargg » Sat Dec 14, 2013 8:54 pm

My send reasoning could be wrong, since how could the 12.8MHz/16.5MHz versions work with RC sending? I'd also think that the receiver in the host would re-synchronize on every edge.
