Minimal USB implementation

Posted: Sat Jan 04, 2014 5:16 pm
by cpldcpu
To understand how to optimize the memory footprint of V-USB, I created a small ATtiny85-based device that controls a single WS2812 RGB LED via USB. This is very similar to the Blink[1] and other devices.

Current functionality
* Enumeration.
* Only SETUP requests can be received.
* Responses are limited to strings from flash memory or zero-sized replies.
* All SETUP packets that are not system requests are forwarded to a WS2812 RGB LED on PB0.
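The split between "system" requests and everything else can be sketched in portable C. This is only an illustration, not code from the u-wire repository: the names `is_standard_request`, `setup_to_color` and `led_color_t` are hypothetical, and the mapping of SETUP bytes to color bytes is an assumption. The one fixed point is that bits 6..5 of `bmRequestType` (the first byte of the 8-byte SETUP packet) are zero for standard requests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 6..5 of bmRequestType select the request type:
   0 = standard, 1 = class, 2 = vendor. */
#define RQ_TYPE_MASK     0x60u
#define RQ_TYPE_STANDARD 0x00u

/* Hypothetical color container; WS2812 LEDs expect green, red, blue order. */
typedef struct {
    uint8_t g, r, b;
} led_color_t;

/* True for standard ("system") requests such as GET_DESCRIPTOR, which the
   USB driver has to answer itself during enumeration. */
static bool is_standard_request(const uint8_t setup[8])
{
    return (setup[0] & RQ_TYPE_MASK) == RQ_TYPE_STANDARD;
}

/* ASSUMPTION: the color travels in wValue (bytes 2..3) and the low byte of
   wIndex (byte 4). The real firmware may use a different layout. */
static bool setup_to_color(const uint8_t setup[8], led_color_t *out)
{
    if (is_standard_request(setup))
        return false;           /* left to the enumeration code */
    out->g = setup[2];
    out->r = setup[3];
    out->b = setup[4];
    return true;                /* caller forwards *out to the LED */
}
```

With this layout a host would send a vendor request whose wValue and wIndex carry the color, and no data stage is needed, which matches the "zero-sized replies" restriction above.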

I went as far as stripping down all the code from usbdrv.c and integrating it into a single function. I removed all code that was not required for the core functionality and combined some of the remaining functions:

* Removed data section and initialized variables "by hand"
* Turned global SRAM variables into local variables in the main loop.
* Reduced input buffer to single size
* Removed handling of USB reset.
* Used 16 MHz V-USB code instead of 16.5 MHz
* Included assembler implementation of osccal.c

My ultimate goal was to implement the code on an ATtiny10. Currently this is not possible, because not enough SRAM is left for the stack and not enough flash is left for the 12 MHz V-USB implementation.

Current resource usage:
* 1018 bytes Flash
* 28 bytes SRAM
* Uses only regs R16-R31
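To put these numbers next to the ATtiny10 target, here is a rough budget calculation. It assumes the figures above would carry over unchanged to the ATtiny10 (1024 bytes flash, 32 bytes SRAM per the datasheet), which the post does not claim, and that the stack holds nothing but 2-byte return addresses. The function names are made up for the example.

```c
/* Rough resource budget; all names here are illustrative. */
enum {
    ATTINY10_FLASH_BYTES = 1024, /* datasheet value */
    ATTINY10_SRAM_BYTES  = 32,   /* datasheet value */
    USED_FLASH_BYTES     = 1018, /* from the list above */
    USED_SRAM_BYTES      = 28,   /* from the list above */
    RETURN_ADDRESS_BYTES = 2     /* pushed by every call or interrupt */
};

static int flash_headroom(void)
{
    return ATTINY10_FLASH_BYTES - USED_FLASH_BYTES;
}

static int sram_headroom(void)
{
    return ATTINY10_SRAM_BYTES - USED_SRAM_BYTES;
}

/* ASSUMPTION: no registers are pushed, only return addresses. */
static int max_call_depth(void)
{
    return sram_headroom() / RETURN_ADDRESS_BYTES;
}
```

Under these assumptions 6 bytes of flash and 4 bytes of SRAM remain, enough for two nested return addresses (for example one interrupt plus one call) and nothing else.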

You can find the code here:
https://github.com/cpldcpu/u-wire

Maybe somebody has an idea how to reduce the SRAM or flash footprint further?

Re: Minimal USB implementation

Posted: Sun Jan 05, 2014 7:14 am
by blargg
Very interesting. Your effort should also be useful to someone learning how USB works hands-on, as it clears away all but the essentials.

I've noticed that calling assembly routines frustrates the C optimizer, because it must assume the worst about registers preserved. Here was an attempt I made at inlining the CRC routine and communicating what registers it trashed (though it may do it wrong, as I still find the asm specification syntax confusing):

static inline void usbCrc16Append( volatile unsigned char* data, unsigned char len )
{
    asm volatile (
"\n    ldi     r20, 0xFF"
"\n    ldi     r21, 0xFF"
"\n    rjmp    2f"
"\n1:                        ; numeric local labels stay unique if inlined more than once"
"\n    ld      r18, Z+"
"\n    eor     r18, r20      ; r18 is now 'x' in table()"
"\n    mov     r19, r18      ; compute parity of 'x'"
"\n    swap    r18"
"\n    eor     r18, r19"
"\n    mov     r20, r18"
"\n    lsr     r18"
"\n    lsr     r18"
"\n    eor     r18, r20"
"\n    inc     r18"
"\n    andi    r18, 2        ; r18 is now parity(x) << 1"
"\n    cp      r1, r18       ; c = (r18 != 0), then put in high bit"
"\n    ror     r19           ; so that after xoring, shifting, and xoring, it gives"
"\n    ror     r18           ; the desired 0xC0 with r21"
"\n    mov     r20, r18"
"\n    eor     r20, r21"
"\n    mov     r21, r19"
"\n    lsr     r19"
"\n    ror     r18"
"\n    eor     r21, r19"
"\n    eor     r20, r18"
"\n2:"
"\n    subi    %1, 1"
"\n    brsh    1b"
"\n    com     r20"
"\n    com     r21"
"\n    st      Z+, r20"
"\n    st      Z, r21"
"\n"
    : "+z" (data), "+d" (len)   /* "d": subi only accepts r16..r31 */
    :
    : "memory", "r18", "r19", "r20", "r21" );
}


Also, that's for the optimized CRC routine, so you'll want to convert the slower, shorter one.
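For checking any hand-converted assembly version, a plain C reference of the USB CRC-16 can be handy. The following is a minimal bitwise sketch (reflected polynomial 0xA001, initial value 0xFFFF, result inverted, low byte appended first). The names `usb_crc16` and `usb_crc16_append` are chosen for this example and are not the V-USB functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise USB CRC-16: x^16 + x^15 + x^2 + 1, processed LSB first. */
static uint16_t usb_crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    while (len--) {
        crc ^= *data++;
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1)
                crc = (uint16_t)((crc >> 1) ^ 0xA001);
            else
                crc = (uint16_t)(crc >> 1);
        }
    }
    return (uint16_t)~crc;      /* USB transmits the inverted remainder */
}

/* Appends the CRC behind the payload, low byte first.
   The buffer must have room for len + 2 bytes. */
static void usb_crc16_append(uint8_t *data, size_t len)
{
    uint16_t crc = usb_crc16(data, len);
    data[len]     = (uint8_t)(crc & 0xFF);
    data[len + 1] = (uint8_t)(crc >> 8);
}
```

The standard check string "123456789" gives 0xB4C8, so the two appended bytes are 0xC8 followed by 0xB4. Running the same input through the assembly routine on the target and comparing is a quick sanity test.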

Re: Minimal USB implementation

Posted: Tue Jan 14, 2014 7:28 pm
by cpldcpu
Update: I managed to get it to work on a meager ATtiny10!

>Here was an attempt I made at inlining the CRC routine and communicating what registers it trashed
Excellent idea. In fact, I ran into a lot of trouble with registers on the ATtiny10. I will look into this.

Re: Minimal USB implementation

Posted: Sat Jan 18, 2014 9:52 am
by cpldcpu
I tried inlining the CRC routine. Unfortunately it only saved two bytes.

Re: Minimal USB implementation

Posted: Sat Jan 18, 2014 12:44 pm
by stf92
Why don't you include the CRC routine in usbdrvasm.S? Christian Starkjohann did this in his first implementations. That way the compiler will have nothing to do with it and the code will be minimal.

Re: Minimal USB implementation

Posted: Sat Jan 18, 2014 12:59 pm
by cpldcpu
That is where it is right now. The idea was that inlining saves code space.

Re: Minimal USB implementation

Posted: Sun Jan 19, 2014 4:30 am
by stf92
You mean call overhead, I see!

Re: Minimal USB implementation

Posted: Sun Jan 19, 2014 7:55 am
by blargg
Actually not so much call overhead, but richer information to the optimizer about exactly what registers are modified. It could probably also be written to let the compiler assign all the registers it uses.

Re: Minimal USB implementation

Posted: Sun Jan 19, 2014 5:14 pm
by cpldcpu
>blargg wrote: Actually not so much call overhead, but richer information to the optimizer about exactly what registers are modified. It could probably also be written to let the compiler assign all the registers it uses.


I wish :) I have not found a way to define variables in the assembler code without having the compiler initialize them.

Re: Minimal USB implementation

Posted: Mon Jan 20, 2014 2:44 am
by blargg
Can you just set them as out variables? That would force the compiler to give them registers and let it know that you're modifying them.

Re: Minimal USB implementation

Posted: Tue Jan 21, 2014 9:52 am
by cpldcpu
Whenever I tried that, the compiler would also initialize the variables, which took up more space than it saved.
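For reference, the usual GCC way to ask for a scratch register is a write-only, early-clobber output operand (`"=&r"`) bound to an uninitialized local. A read-write operand (`"+r"`) has to be initialized because the asm is declared to read it. The sketch below uses a made-up helper, `parity8`, to show the pattern. Whether it avoids the extra initialization described above on a particular avr-gcc version has not been checked here, and the non-AVR branch exists only so the snippet can be compiled and tested off-target.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if x has an odd number of set bits. */
static inline uint8_t parity8(uint8_t x)
{
#ifdef __AVR__
    uint8_t tmp;   /* deliberately uninitialized: the asm only writes it */
    __asm__ (
        "mov  %[tmp], %[x]"   "\n\t"
        "swap %[tmp]"         "\n\t"   /* exchange the two nibbles      */
        "eor  %[x], %[tmp]"   "\n\t"   /* fold 8 bits down to 4         */
        "mov  %[tmp], %[x]"   "\n\t"
        "lsr  %[tmp]"         "\n\t"
        "lsr  %[tmp]"         "\n\t"
        "eor  %[x], %[tmp]"   "\n\t"   /* fold 4 bits down to 2         */
        "mov  %[tmp], %[x]"   "\n\t"
        "lsr  %[tmp]"         "\n\t"
        "eor  %[x], %[tmp]"   "\n\t"   /* fold 2 bits down to 1         */
        "andi %[x], 1"
        : [x] "+d" (x),        /* read-write; "d" because of andi       */
          [tmp] "=&r" (tmp));  /* scratch register chosen by the compiler */
    return x;
#else
    /* Portable fallback with the same result, for host-side testing. */
    x ^= (uint8_t)(x >> 4);
    x ^= (uint8_t)(x >> 2);
    x ^= (uint8_t)(x >> 1);
    return x & 1;
#endif
}
```

Because `tmp` appears only as an output, no fixed register has to be named in the clobber list, and the compiler is free to pick whichever register is unused at each call site.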

Re: Minimal USB implementation

Posted: Wed Jan 22, 2014 2:53 am
by blargg
At least with avr-gcc 4.5.3, I think I was able to silence the optimizer warnings by initializing a variable with itself, e.g. char c = c;. Too bad there are so many snags to inlining assembly, as otherwise it could be possible to get more optimal code by using just C rather than mixing it with assembly.

Re: Minimal USB implementation

Posted: Wed Jan 22, 2014 5:51 pm
by cpldcpu
That looks like something that would break arbitrarily with new compiler versions :)

Re: Minimal USB implementation

Posted: Wed Mar 19, 2014 10:31 pm
by cpldcpu