Minimal USB implementation

General discussions about V-USB, our firmware-only implementation of a low speed USB device on Atmel's AVR microcontrollers
cpldcpu
Rank 2
Posts: 44
Joined: Sun Nov 10, 2013 11:26 am

Minimal USB implementation

Post by cpldcpu » Sat Jan 04, 2014 5:16 pm

To understand how to optimize the memory footprint of V-USB, I created a small ATtiny85-based device that controls a single WS2812 RGB LED via USB. It is very similar to the Blink[1] and other devices.

Current functionality
* Enumeration.
* Only SETUP requests can be received.
* Responses are limited to strings from flash memory or zero-length replies.
* All SETUP packets that are not system requests are forwarded to a WS2812 RGB LED on PB0.
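For readers new to USB, the split between "system" (standard) requests and everything else can be sketched in portable C. This is only an illustration of the standard 8-byte SETUP packet layout, not code from the u-wire repository; the names usb_setup_t, parse_setup and is_standard_request are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of the standard 8-byte USB SETUP packet. */
typedef struct {
    uint8_t  bmRequestType; /* direction, type and recipient bits */
    uint8_t  bRequest;      /* request code */
    uint16_t wValue;
    uint16_t wIndex;
    uint16_t wLength;       /* length of the data stage */
} usb_setup_t;

/* Bits 6..5 of bmRequestType: 0 = standard, 1 = class, 2 = vendor. */
#define REQ_TYPE_MASK     0x60u
#define REQ_TYPE_STANDARD 0x00u

/* Decode the raw bytes; multi-byte fields are little-endian on the wire. */
usb_setup_t parse_setup(const uint8_t raw[8])
{
    assert(raw != NULL);
    usb_setup_t s;
    s.bmRequestType = raw[0];
    s.bRequest      = raw[1];
    s.wValue        = (uint16_t)(raw[2] | (raw[3] << 8));
    s.wIndex        = (uint16_t)(raw[4] | (raw[5] << 8));
    s.wLength       = (uint16_t)(raw[6] | (raw[7] << 8));
    return s;
}

/* True for standard requests such as GET_DESCRIPTOR or SET_ADDRESS. */
bool is_standard_request(const usb_setup_t *s)
{
    return (s->bmRequestType & REQ_TYPE_MASK) == REQ_TYPE_STANDARD;
}
```

In a device like the one described above, anything for which is_standard_request() returns false would be treated as LED data, while standard requests are answered during enumeration.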

I went as far as stripping down all the code from usbdrv.c and integrating it into a single function. I removed all code that was not required for the core functionality and combined some of the remaining functions:

* Removed the data section and initialized variables "by hand"
* Turned global SRAM variables into local variables in the main loop.
* Reduced input buffer to single size
* Removed handling of USB reset.
* Used 16 MHz V-USB code instead of 16.5 MHz
* Included assembler implementation of osccal.c
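As background on the last item: oscillator calibration of this kind is usually a binary search over the calibration register, comparing a measured USB frame length against a target. The sketch below shows that general idea in host-side C with a simulated measurement. It is not the assembler routine from the repository; measure_fn, calibrate and fake_measure are hypothetical names, and the linear oscillator model is an assumption made purely so the example can run.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical callback: returns a value proportional to the real
 * clock frequency for a given calibration setting. */
typedef int (*measure_fn)(uint8_t cal);

/* Binary search for the calibration value whose measurement is closest
 * to target, assuming the measurement rises monotonically with cal. */
uint8_t calibrate(measure_fn measure, int target)
{
    uint8_t trial = 0;
    for (uint8_t step = 128; step > 0; step >>= 1) {
        /* Keep the step only while the clock is still too slow. */
        if (measure((uint8_t)(trial + step)) < target)
            trial = (uint8_t)(trial + step);
    }

    /* The search is accurate to +/-1, so also check both neighbours. */
    uint8_t best = trial;
    int best_dev = abs(measure(trial) - target);
    for (int c = trial - 1; c <= trial + 1; c++) {
        if (c < 0 || c > 255)
            continue;
        int dev = abs(measure((uint8_t)c) - target);
        if (dev < best_dev) {
            best_dev = dev;
            best = (uint8_t)c;
        }
    }
    return best;
}

/* Simulated oscillator for demonstration only (assumed linear model). */
int fake_measure(uint8_t cal)
{
    return 1000 + 4 * cal;
}
```

With the simulated oscillator, calibrate(fake_measure, 1500) settles on the setting whose measurement hits the target exactly. On real hardware the callback would write the calibration register and time a USB frame instead.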

My ultimate goal was to implement the code on an ATtiny10. Currently this is not possible, because not enough SRAM is left for the stack and not enough flash is left for the 12 MHz V-USB implementation.

Current resource usage:
* 1018 bytes Flash
* 28 bytes SRAM
* Uses only regs R16-R31

You can find the code here:
https://github.com/cpldcpu/u-wire

Maybe somebody has an idea how to reduce the SRAM or flash footprint further?

blargg
Rank 3
Posts: 102
Joined: Thu Nov 14, 2013 10:01 pm

Re: Minimal USB implementation

Post by blargg » Sun Jan 05, 2014 7:14 am

Very interesting. Your effort should also be useful to someone learning how USB works hands-on, as it clears away all but the essentials.

I've noticed that calling assembly routines frustrates the C optimizer, because it must assume the worst about which registers are preserved. Here is an attempt I made at inlining the CRC routine and telling the compiler which registers it trashes (though it may get this wrong, as I still find the asm constraint syntax confusing):

/* Appends the USB CRC16 of data[0..len-1] at data[len], low byte first. */
static inline void usbCrc16Append( volatile unsigned char* data, unsigned char len )
{
    /* %= makes each label unique, so the block can be inlined more than once. */
    asm volatile (
"\n    ldi     r20, 0xFF"
"\n    ldi     r21, 0xFF"
"\n    rjmp    usbCrc16LoopTest%="
"\nusbCrc16r18Loop%=:"
"\n    ld      r18, Z+"
"\n    eor     r18, r20      ; r18 is now 'x' in table()"
"\n    mov     r19, r18      ; compute parity of 'x'"
"\n    swap    r18"
"\n    eor     r18, r19"
"\n    mov     r20, r18"
"\n    lsr     r18"
"\n    lsr     r18"
"\n    eor     r18, r20"
"\n    inc     r18"
"\n    andi    r18, 2        ; r18 is now parity(x) << 1"
"\n    cp      r1, r18       ; c = (r18 != 0), then put in high bit"
"\n    ror     r19           ; so that after xoring, shifting, and xoring, it gives"
"\n    ror     r18           ; the desired 0xC0 with r21"
"\n    mov     r20, r18"
"\n    eor     r20, r21"
"\n    mov     r21, r19"
"\n    lsr     r19"
"\n    ror     r18"
"\n    eor     r21, r19"
"\n    eor     r20, r18"
"\nusbCrc16LoopTest%=:"
"\n    subi    %1, 1"
"\n    brsh    usbCrc16r18Loop%="
"\n    com     r20"
"\n    com     r21"
"\n    st      Z+, r20"
"\n    st      Z, r21"
"\n"
    : "+z" (data), "+d" (len)   /* both read and written; subi needs r16..r31 ("d") */
    :
    : "memory", "r18", "r19", "r20", "r21" );
}


Also, that's for the optimized CRC routine, so you'll want to convert the slower, shorter one.
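For checking any hand-converted assembly version, a plain bitwise C reference is handy. The sketch below implements the standard USB data CRC16 (reflected polynomial 0xA001, initial value 0xFFFF, result complemented) and appends it low byte first, as the assembly above does. The function names usb_crc16 and usb_crc16_append are my own and not part of V-USB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise USB CRC16 over data[0..len-1]. Slow but small and easy to verify. */
uint16_t usb_crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1)
                crc = (uint16_t)((crc >> 1) ^ 0xA001);
            else
                crc = (uint16_t)(crc >> 1);
        }
    }
    return (uint16_t)~crc;
}

/* Writes the CRC after the payload; the buffer must have room for len + 2 bytes. */
void usb_crc16_append(uint8_t *data, size_t len)
{
    assert(data != NULL);
    uint16_t crc = usb_crc16(data, len);
    data[len]     = (uint8_t)(crc & 0xFF); /* low byte first */
    data[len + 1] = (uint8_t)(crc >> 8);
}
```

Running both versions over the same buffers on a host or in a simulator is a quick way to confirm that a size-optimized rewrite still produces the same two bytes.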

cpldcpu
Rank 2
Posts: 44
Joined: Sun Nov 10, 2013 11:26 am

Re: Minimal USB implementation

Post by cpldcpu » Tue Jan 14, 2014 7:28 pm

Update: I managed to get it to work on a meager ATtiny10!

>Here was an attempt I made at inlining the CRC routine and communicating what registers it trashed
Excellent idea. In fact, I ran into a lot of trouble with registers on the ATtiny10. I will look into this.

cpldcpu
Rank 2
Posts: 44
Joined: Sun Nov 10, 2013 11:26 am

Re: Minimal USB implementation

Post by cpldcpu » Sat Jan 18, 2014 9:52 am

I tried inlining the CRC routine. Unfortunately, it only saved two bytes.

stf92
Rank 1
Posts: 21
Joined: Sat Jan 11, 2014 5:09 pm

Re: Minimal USB implementation

Post by stf92 » Sat Jan 18, 2014 12:44 pm

Why don't you include the CRC routine in usbdrvasm.S? Christian Starkjohann did this in his first implementations. That way the compiler has nothing to do with it and the code will be minimal.

cpldcpu
Rank 2
Posts: 44
Joined: Sun Nov 10, 2013 11:26 am

Re: Minimal USB implementation

Post by cpldcpu » Sat Jan 18, 2014 12:59 pm

That is where it is right now. The idea was that inlining saves code space.

stf92
Rank 1
Posts: 21
Joined: Sat Jan 11, 2014 5:09 pm

Re: Minimal USB implementation

Post by stf92 » Sun Jan 19, 2014 4:30 am

You mean call overhead, I see!

blargg
Rank 3
Posts: 102
Joined: Thu Nov 14, 2013 10:01 pm

Re: Minimal USB implementation

Post by blargg » Sun Jan 19, 2014 7:55 am

Actually not so much call overhead, but richer information to the optimizer about exactly what registers are modified. It could probably also be written to let the compiler assign all the registers it uses.

cpldcpu
Rank 2
Posts: 44
Joined: Sun Nov 10, 2013 11:26 am

Re: Minimal USB implementation

Post by cpldcpu » Sun Jan 19, 2014 5:14 pm

blargg wrote:Actually not so much call overhead, but richer information to the optimizer about exactly what registers are modified. It could probably also be written to let the compiler assign all the registers it uses.


I wish :) I have not found a way to define variables in the assembler code without having the compiler initialize them.

blargg
Rank 3
Posts: 102
Joined: Thu Nov 14, 2013 10:01 pm

Re: Minimal USB implementation

Post by blargg » Mon Jan 20, 2014 2:44 am

Can you just set them as out variables? That would force the compiler to give them registers and let it know that you're modifying them.

cpldcpu
Rank 2
Posts: 44
Joined: Sun Nov 10, 2013 11:26 am

Re: Minimal USB implementation

Post by cpldcpu » Tue Jan 21, 2014 9:52 am

Whenever I tried that, the compiler would also initialize the variables, which took up more space than it saved.

blargg
Rank 3
Posts: 102
Joined: Thu Nov 14, 2013 10:01 pm

Re: Minimal USB implementation

Post by blargg » Wed Jan 22, 2014 2:53 am

At least with avr-gcc 4.5.3, I think I was able to silence the optimizer warnings by initializing a variable with itself, e.g. char c = c; Too bad there are so many snags to inlining assembly, as otherwise it could be possible to get more optimal code by using just C rather than mixing it with assembly.

cpldcpu
Rank 2
Posts: 44
Joined: Sun Nov 10, 2013 11:26 am

Re: Minimal USB implementation

Post by cpldcpu » Wed Jan 22, 2014 5:51 pm

That looks like something that would break arbitrarily with new compiler versions :)

cpldcpu
Rank 2
Posts: 44
Joined: Sun Nov 10, 2013 11:26 am

Re: Minimal USB implementation

Post by cpldcpu » Wed Mar 19, 2014 10:31 pm

