Minimal USB implementation
To understand how to optimize the memory footprint of V-USB, I created a small ATtiny85-based device that controls a single WS2812 RGB LED via USB. This is very similar to the Blink[1] and other devices.
Current functionality
* Enumeration.
* Only SETUP requests can be received.
* Responses are limited to strings from flash memory or zero sized replies.
* All SETUP packets that are not system requests are forwarded to a WS2812 RGB LED on PB0.
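The split between system requests and LED data described in the list above could look roughly like this. It is only a sketch, not the code from the repository: the struct follows the 8-byte SETUP layout from the USB specification, while the names `is_standard_request` and `setup_to_color`, and the choice of which payload bytes carry the color, are assumptions made up for this illustration.

```c
#include <stdint.h>

/* 8-byte SETUP packet layout as defined by the USB specification. */
typedef struct {
    uint8_t bmRequestType;
    uint8_t bRequest;
    uint8_t wValueL, wValueH;
    uint8_t wIndexL, wIndexH;
    uint8_t wLengthL, wLengthH;
} setup_packet_t;

/* Bits 6..5 of bmRequestType encode the request type; 0 means "standard". */
static int is_standard_request(const setup_packet_t *s)
{
    return ((s->bmRequestType >> 5) & 0x03) == 0;
}

/* One LED color in the byte order the WS2812 expects (green, red, blue). */
typedef struct { uint8_t g, r, b; } ws2812_color_t;

/* Hypothetical mapping: which SETUP bytes hold the color is an assumption. */
static ws2812_color_t setup_to_color(const setup_packet_t *s)
{
    ws2812_color_t c = { s->wValueH, s->wValueL, s->wIndexL };
    return c;
}
```

A device loop would answer the packet itself when `is_standard_request()` is true and otherwise shift the bytes out to the LED.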
I went as far as stripping down all the code from usbdrv.c and integrating it into a single function. I removed all code that was not required for the core functionality and combined some of the remaining functions:
* Removed data section and initialized variables "by hand"
* Turned global SRAM variables into local variables in the main loop.
* Reduced input buffer to single size
* Removed handling of USB reset.
* Used 16 MHz V-USB code instead of 16.5 MHz
* Included assembler implementation of osccal.c
My ultimate goal was to implement the code on an ATtiny10. Currently this is not possible, because not enough SRAM is left for the stack and not enough flash is left for the 12 MHz V-USB implementation.
Current resource usage:
* 1018 bytes Flash
* 28 bytes SRAM
* Uses only regs R16-R31
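To see how tight this is, here is a quick back-of-the-envelope check against the ATtiny10, which has 1024 bytes of flash and 32 bytes of SRAM. It assumes the figures above would carry over unchanged, which they would not exactly, since the 12 MHz code differs in size. The names are made up for this sketch.

```c
/* ATtiny10 capacities and the usage figures quoted above. */
enum {
    ATTINY10_FLASH = 1024,
    ATTINY10_SRAM  = 32,
    USED_FLASH     = 1018,
    USED_SRAM      = 28
};

/* Flash bytes that would remain for a different V-USB variant. */
static int flash_left(void) { return ATTINY10_FLASH - USED_FLASH; }

/* SRAM bytes that would remain for the stack. */
static int sram_left(void) { return ATTINY10_SRAM - USED_SRAM; }
```

With 2 bytes per return address, the 4 bytes of SRAM left would allow two nested calls and no pushed registers.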
You can find the code here:
https://github.com/cpldcpu/u-wire
Maybe somebody has an idea how to reduce the SRAM or flash footprint further?
Re: Minimal USB implementation
Very interesting. Your effort should also be useful to someone learning how USB works hands-on, as it clears away all but the essentials.
I've noticed that calling assembly routines frustrates the C optimizer, because it must assume the worst about registers preserved. Here was an attempt I made at inlining the CRC routine and communicating what registers it trashed (though it may do it wrong, as I still find the asm specification syntax confusing):
Code:
static inline void usbCrc16Append( volatile unsigned char* data, unsigned char len )
{
    /* Numeric local labels (1:, 2:) stay unique if this is inlined more than once. */
    asm volatile (
        "\n    ldi  r20, 0xFF"
        "\n    ldi  r21, 0xFF"
        "\n    rjmp 2f"
        "\n1:"
        "\n    ld   r18, Z+"
        "\n    eor  r18, r20   ; r18 is now 'x' in table()"
        "\n    mov  r19, r18   ; compute parity of 'x'"
        "\n    swap r18"
        "\n    eor  r18, r19"
        "\n    mov  r20, r18"
        "\n    lsr  r18"
        "\n    lsr  r18"
        "\n    eor  r18, r20"
        "\n    inc  r18"
        "\n    andi r18, 2     ; r18 is now parity(x) << 1"
        "\n    cp   r1, r18    ; c = (r18 != 0), then put in high bit"
        "\n    ror  r19        ; so that after xoring, shifting, and xoring, it gives"
        "\n    ror  r18        ; the desired 0xC0 with r21"
        "\n    mov  r20, r18"
        "\n    eor  r20, r21"
        "\n    mov  r21, r19"
        "\n    lsr  r19"
        "\n    ror  r18"
        "\n    eor  r21, r19"
        "\n    eor  r20, r18"
        "\n2:"
        "\n    subi %1, 1"
        "\n    brsh 1b"
        "\n    com  r20"
        "\n    com  r21"
        "\n    st   Z+, r20"
        "\n    st   Z, r21"
        "\n"
        /* "+z": data lives in Z and is modified; "+d": subi needs r16-r31. */
        : "+z" (data), "+d" (len)
        :
        : "memory", "r18", "r19", "r20", "r21" );
}
Also, that's for the optimized CRC routine, so you'll want to convert the slower, shorter one.
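When converting or inlining the routine, a plain C reference can help check the assembly output byte for byte. The following is a minimal sketch using the standard bitwise form of the USB CRC-16 (polynomial 0x8005, processed LSB first as 0xA001, initial value 0xFFFF, result complemented). It is not taken from V-USB, and the function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 as used for USB data packets. */
static uint16_t usb_crc16_ref(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;
    while (len--) {
        crc ^= *data++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001)
                            : (uint16_t)(crc >> 1);
    }
    return (uint16_t)~crc;
}

/* Append the CRC low byte first, like the assembly routine does.
   buf must have room for len + 2 bytes. */
static void usb_crc16_append_ref(uint8_t *buf, size_t len)
{
    uint16_t crc = usb_crc16_ref(buf, len);
    buf[len]     = (uint8_t)(crc & 0xFF);
    buf[len + 1] = (uint8_t)(crc >> 8);
}
```

Running both versions over the same buffer on a PC or in a simulator should produce identical trailing bytes.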
Re: Minimal USB implementation
Update: I managed to get it to work on a meager ATtiny10!
>Here was an attempt I made at inlining the CRC routine and communicating what registers it trashed
Excellent idea. In fact, I ran into a lot of trouble with registers on the ATtiny10. I will look into this.
Re: Minimal USB implementation
I tried inlining the CRC routine. Unfortunately it only saved two bytes.
Re: Minimal USB implementation
Why don't you include the CRC routine in usbdrvasm.S? Christian Starkjohann did this in his first implementations. That way the compiler will have nothing to do with it and the code will be minimal.
Re: Minimal USB implementation
That is where it is right now. The idea was that inlining saves code space.
Re: Minimal USB implementation
You mean call overhead, I see!
Re: Minimal USB implementation
Actually not so much call overhead, but richer information to the optimizer about exactly what registers are modified. It could probably also be written to let the compiler assign all the registers it uses.
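As a tiny illustration of letting the compiler assign the register, here is a sketch with a single instruction. The function name is made up, and the non-AVR branch exists only so the snippet can be compiled and checked on a PC.

```c
#include <stdint.h>

/* Swap the two nibbles of a byte. */
static inline uint8_t swap_nibbles(uint8_t x)
{
#ifdef __AVR__
    /* "+r": the compiler picks any register for x and knows it is modified.
       No fixed register is named, so nothing else has to be clobbered. */
    asm ("swap %0" : "+r" (x));
    return x;
#else
    /* Portable fallback with the same result, for testing off-target. */
    return (uint8_t)((x << 4) | (x >> 4));
#endif
}
```

Because the operand is a read-write constraint rather than a hard-coded register, the optimizer can keep neighbouring values in whatever registers it likes.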
Re: Minimal USB implementation
blargg wrote: Actually not so much call overhead, but richer information to the optimizer about exactly what registers are modified. It could probably also be written to let the compiler assign all the registers it uses.
I have not found a way to define variables in the assembler code without having the compiler initialize them.
Re: Minimal USB implementation
Can you just set them as out variables? That would force the compiler to give them registers and let it know that you're modifying them.
Re: Minimal USB implementation
Whenever I tried that, the compiler would also initialize the variables, which took up more space than it saved.
Re: Minimal USB implementation
At least with avr-gcc 4.5.3, I think I was able to silence the optimizer warnings by initializing a variable with itself, e.g. char c = c;. Too bad there are so many snags to inlining assembly, as otherwise it could be possible to get more optimal code by using just C rather than mixing it with assembly.
Re: Minimal USB implementation
That looks like something that would break arbitrarily with new compiler versions.