Many thanks to ARod (alejmrm), who ported and posted two "C" versions of the following ASM OSCCAL routine. Thanks a million, ARod!
PURPOSE:
Discuss methods used to reduce the size of a typical OSCCAL routine using AVR machine language.
QUOTE
Your code will not properly calibrate the oscillator!
QUOTE
It is clear from this that your code does not do the same operations that the designers of the Butterfly thought were required for accurate calibration!
QUOTE
What value does TMP have when entering the fragment?
QUOTE
My main question also revolves around TMP, indirectly.
QUOTE
Seems to be an interest in the way that you get the OSCCAL value, and I am interested in the details... do you mind to move it to a new thread to talk about it? As some body said, what kind of asumptions are you taking?, etc...
Thanks.. really clever ideas!
PREAMBLE:
When Giorgos announced his latest version of a Butterfly Bootloader, his first complaint about condensing its size was the length of the OSCCAL routine. Since I had to deal with the same problem myself when writing bootloaders, I thought I might help him out by giving him my much smaller routine. At the time I did not expect the critical reaction and wide interest in the routine.
Because the many comments about the routine and how it works were messing up his original thread, and because of private mail requests, I started this thread to answer all those questions without hijacking his original thread. I hope you find this discussion informative as well as entertaining.
CAVEAT LECTOR:
Some of the techniques contained herein are outside normal coding practices and may be offensive to some programmers.
AVR_Admin- 04-13-2006
TITLE: ADVENTURES IN SHRINKING THE OSCCAL ROUTINE
INTRODUCTION: THE HEART OF THE OSCCAL ROUTINE
The heart of the OSCCAL routine is very simple. The basic idea is to set up a counter on the Oscillator; after a fixed amount of time you examine the counter to see if the Oscillator is running fast or slow. You then adjust the OSCCAL register and go back and check again. We repeat this process until the Oscillator is running within an acceptable range.
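As a rough illustration of the loop just described, here is a host-side Python model (not AVR code). The `measure` function, its linear oscillator response, and the 40-count tolerance are assumptions made for this sketch; only the 6103 target count is taken from the thread.

```python
# Host-side model of the calibration idea; this is NOT AVR code.
TARGET = 6103      # ideal timer count per reference period (from the thread)
TOLERANCE = 40     # accepted deviation in counts (assumed for this sketch)


def measure(osccal, counts_per_step=60, base=3000):
    """Hypothetical oscillator: timer count observed for a given OSCCAL value."""
    return base + counts_per_step * osccal


def calibrate(osccal, measure=measure, max_iterations=256):
    """Nudge OSCCAL up or down until the count falls inside the window."""
    for _ in range(max_iterations):
        count = measure(osccal)
        if count > TARGET + TOLERANCE:
            osccal -= 1          # running fast: slow the oscillator down
        elif count < TARGET - TOLERANCE:
            osccal += 1          # running slow: speed it up
        else:
            return osccal        # within the acceptable range
    raise RuntimeError("calibration did not converge")
```

With the assumed oscillator model, `calibrate(100)` and `calibrate(0)` both settle on the same OSCCAL value, one approaching from above and the other from below.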
Because the Oscillator rate can change with temperature etc. I go back and re-calibrate it each time the Butterfly wakes, something the other Bootloaders seem to have missed.
I am going to dig up an older bootloader with the original OSCCAL routine, compare it to my current condensed version, and try to remember the steps and logic I used to get from one to the other.
CONTENTS AND CODE FRAGMENTS:
For the purpose of this discussion I am only focusing on the actual OSCCAL adjustment loop and not the pre-amble or set-up.
As you examine the code fragments and techniques used, please keep in mind that the goal is to produce the smallest code possible and many traditional and accepted coding practices may go out-the-window.
FINAL WARNING:
Make sure that all the necessary precautions have been taken... you are about to enter the mind of a certified Hack.
AVR_Admin- 04-13-2006
TITLE: ADVENTURES IN SHRINKING THE OSCCAL ROUTINE
Here's a typical OSCCAL routine. This one is clipped from George Kolovos' dis-assembly of the Atmel *.hex file.
The first thing that jumps-out-at-me are the two routines at the bottom that adjust the OSCCAL register. To me they stand out like two warts on an otherwise straight-forward routine. So I would focus on those first.
CODE
; //////////////////////////////////////////////////////////////////////////////
; // Description: An enhanced bootloader for the ATMEL Butterfly unit         //
; // Copyright (C) 2006 George Kolovos <gkolovos AT hotmail DOT com>          //
; //                                                                          //
; //////////////////////////////////////////////////////////////////////////////
The first thing that strikes me is that with the exception of the INC & DEC statement the two routines are identical. Perhaps we can combine them somehow and turn two warts into just one.
We'll start by combining the two writes into one by using a programming technique I call "Sharing-a-Piece-of-Arse!"
The next thing I notice is that there are two reads of the OSCCAL register. If we move that read back into the main program BEFORE these routines are called we can eliminate another line of code:
Now we've really shrunk those two routines down into one rather small extension of the main program.
AVR_Admin- 04-13-2006
TITLE: ADVENTURES IN SHRINKING THE OSCCAL ROUTINE
Next I'd like to move that write (STS OSCCAL) out of this program "extension" and back to the main routine. There's no savings in terms of memory space for doing this so you'll just have to trust me on this for the moment, I have something in mind for later.
Since the write is the last thing we do before jumping back to the start of our calibration loop, the ideal place to move it would be right at the start.
We're not saving anything by doing this, but I'm half-way to doing something that will. So just wait a bit.
Remember the LDS OSCCAL statement that we removed earlier? Well, a good place for it would be just before we enter our main calibration loop, but we'll need to switch to another unused register so we can hold that value undisturbed. Since tmp1 gets used inside the routine I'll call this new register TMP and make it equal to R0.
Remember we haven't saved anything yet, however I am working up to something that requires that I make this move. So after moving the write out and making the other changes, re-adjusting the RJMPs and cleaning up, this is what we have:
CODE
; MAIN OSCCAL CALIBRATION LOOP
        LDS  TMP,OSCCAL         ;<===MOVED HERE
OSC_TEST:
        STS  OSCCAL,TMP         ;<===MOVED HERE
        <MAIN CALIBRATION LOOP>
        RET
At this point I think I've answered some of the questions that were quoted at the start of this thread concerning the initial value of TMP prior to entering the calibration loop:
QUOTE
What value does TMP have when entering the fragment?
QUOTE
My main question also revolves around TMP, indirectly.
QUOTE
As some body said, what kind of asumptions are you taking?
Well obviously from the code, the value of TMP prior to entering the main loop is the current value of OSCCAL that we are going to adjust within the routine. So I hope that answers the above questions to everyone's satisfaction.
CODE
        LDS  TMP,OSCCAL
;MAIN OSCCAL CALIBRATION LOOP
OSC_TEST:
I should have posted this one extra line. I didn't expect to be cross-examined on the routine; I was posting it to help someone who knew exactly what TMP's value would be.
AVR_Admin- 04-13-2006
TITLE: ADVENTURES IN SHRINKING THE OSCCAL ROUTINE
Well, the two "warts" are down to a single line with a jump back to the main routine. We've taken two warts, combined them into a single wart, then turned it into two small blemishes.
I don't think there is much more we can do with them at this point. We're gonna' have to go back and re-examine the main routine. We'll start by looking at where these two routines are called from:
CODE
        sts  TCCR1B,Zero
        sbic TIFR1,TOV1
        rjmp OSC_Too_Fast       ;<== CALL TOO FAST

        cpi  XL,Byte1(OSC_Hi)
        ldi  tmp1,Byte2(OSC_Hi)
        cpc  XH,tmp1
        brsh OSC_Too_Fast       ;<== CALL TOO FAST
The sequence is TOO_FAST / too_slow / TOO_FAST.
For reasons that will become apparent shortly I want the sequence to change from 2FAST/2slow/2FAST to 2FAST/2FAST/2slow. We can do this easily by just switching the last two tests around.
If you'd like to see an actual example of the routine we've worked out so far, even though we're only half-way into my next reduction, Giorgos somehow got this incomplete version into his latest Bootloader.
The following is a snippet from his latest source code. I've highlighted the areas of interest for us: (Assume TMP=R0) and notice that the sequence is now 2FAST/2FAST/2slow and it contains all the exact modifications we've made so far.
CODE
        lds  r0,OSCCA           ;<=== FETCHING OSCAL PRIOR TO ENTERING LOOP
OSC_Test:
        sts  OSCCAL,r0          ;<=== SETTING OSCCAL AT START OF LOOP
        ser  tmp1
        out  TIFR1,tmp1
        out  TIFR2,tmp1
        sbis TIFR2,TOV2
        rjmp PC-1
        lds  XL,TCNT1L
        lds  XH,TCNT1H
        sts  TCNT1H,Zero
        sts  TCNT1L,Zero
        sbic TIFR1,TOV1
        rjmp OSC_Too_Fast       ;<============== TOO_FAST
        cpi  XL,Byte1(Upper_Limmit)
        ldi  tmp1,Byte2(Upper_Limmit)
        cpc  XH,tmp1
        brsh OSC_Too_Fast       ;<============ TOO_FAST
TITLE: DANGEROUS DAN's SHRINKAGE OF THE OSCCAL ROUTINE
One programming "Trick" when you have an A-or-B option like we have above with the TOO_FAST-or-TOO_SLOW option, is to ASSUME that one of them is true, and later if it turns out not to be true, you re-adjust.
CODE
DANGEROUS DAN PROGRAM TIP: USE ASSUMPTIONS TO SIMPLIFY YOUR CODE
Obviously it's best to assume the case that will be true most often. Here we don't know which case that is, but we do have TWO calls to the TOO_FAST routine and only one to the TOO_SLOW, so let's choose to ASSUME that our oscillator is always running too fast.
I notice that when the Oscillator is too fast we decrement TMP=R0 so I add this as the first line of our routine. Now when the Oscillator actually turns out to be too fast our assumption is correct so we just jump back to the start of the main loop and totally by-pass the old TOO_FAST routine.
Making this change and removing the TOO_FAST routine we end-up saving another program line and streamline our routine.
While we're at it another trick with the AVRs with so many registers is to pre-define them to values that you might find handy. Just about everyone has a ZERO, but I also define ONE, TWO, THREE, FOUR, V128 and FF=255 because I find they come-in-handy.
I notice that in the 2nd line of the following code segment Giorgos is setting the TMP1 to 255 using the SER command and writing it to the TIFRn ports. If we use my pre-defined FF Register we can knock off another word from this routine.
CODE
DANGEROUS DAN PROGRAM TIP: USE PRE-DEFINED REGISTERS
CODE
OSC_TEST:
        DEC  R0                 ;<======= NEW: ASSUME TOO FAST
        STS  OSCCAL,R0
;------------------------------
;       ser  TMP1
;       out  TIFR1,TMP1
;       out  TIFR2,TMP1
;------------------------------
        OUT  TIFR1,FF           ;<======== CHANGED
        OUT  TIFR2,FF           ;<======== CHANGED
        sbis TIFR2,TOV2
        rjmp PC-1
        lds  XL,TCNT1L
        lds  XH,TCNT1H
        sts  TCNT1H,Zero
        sts  TCNT1L,Zero
We're assuming that the oscillator is running fast and DECrementing R0 without permission so if it turns out that we're wrong we need to adjust for this.
The simplest solution is to add one to get us back to where we should be; then we add another one to increment the OSCCAL value. My first reaction is to simply add another INC R0 to the TOO_SLOW routine. But we have a register called TWO=2 so instead of adding any more program steps I simply change the INC R0 to ADD R0,TWO.
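To check that the "assume too fast" trick moves OSCCAL by the same net amount as the original branch-to-a-routine shape, here is a small host-side Python model. The function names are made up for this sketch, and the 8-bit register is modelled with `& 0xFF`.

```python
def step_original(osccal, too_fast):
    """Original shape: jump to a separate decrement or increment routine."""
    if too_fast:
        return (osccal - 1) & 0xFF    # old TOO_FAST routine: DEC
    return (osccal + 1) & 0xFF        # old TOO_SLOW routine: INC


def step_assumed(osccal, too_fast):
    """Assume too fast: decrement up front, add two if the guess was wrong."""
    osccal = (osccal - 1) & 0xFF      # DEC R0 on every pass through the loop
    if not too_fast:
        osccal = (osccal + 2) & 0xFF  # ADD R0,TWO: undo the DEC, then increment
    return osccal
```

Both functions return the same value for every 8-bit input, which is all the trick relies on. In the assembly the DEC runs at the top of the next pass rather than right after the test, so this sketch models only the net adjustment, not the exact instruction order.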
Now let's sweep away the discarded code fragment and see where we're at.
CODE
OSC_TEST:
        DEC  R0
        STS  OSCCAL,R0
        OUT  TIFR1,FF
        OUT  TIFR2,FF
        sbis TIFR2,TOV2
        rjmp PC-1
        lds  XL,TCNT1L
        lds  XH,TCNT1H
        sts  TCNT1H,Zero
        sts  TCNT1L,Zero
Earlier I wanted the structure of the main routine to be switched from 2FAST/2slow/2FAST to 2FAST/2FAST/2slow. This next move will explain why.
If you remember we moved the DEC R0 out of our old OSC_TOO_FAST routine and stuck in the main-line to eliminate the entire routine. We can do the same with the ADD R0,TWO in the TOO_SLOW routine and remove it also. This simplifies our code and eliminates another program statement.
The reason I wanted the 2FAST/2FAST/2slow structure was to make this move. If we left it as it was, we'd have to INC R0, then later ADD R0,TWO then SUB R0,TWO.
CODE
OSC_TEST:
        DEC  R0
        STS  OSCCAL,R0
        OUT  TIFR1,FF
        OUT  TIFR2,FF
        sbis TIFR2,TOV2
        rjmp PC-1
        lds  XL,TCNT1L
        lds  XH,TCNT1H
        sts  TCNT1H,Zero
        sts  TCNT1L,Zero
So we've finally removed those two blemishes and have a fairly decent piece of code now.
There are two final things I'd like to do. The first is to remove the two lines that shut off the timers; I deal with that in another part of my code.
The other is to get rid of that ugly rjmp PC-1 on line six. No programmer I know ever uses this nomenclature for relative jumps. It's a sure sign that this was a dis-assembly of someone's HEX file.
AVR_Admin- 04-13-2006
TITLE: ADVENTURES IN SHRINKING THE OSCCAL ROUTINE
What we have so far looks like this:
CODE
OSC_TEST:
        DEC  R0                 ;SET-UP TIMERS
        STS  OSCCAL,R0
        OUT  TIFR1,FF
        OUT  TIFR2,FF
W6103:  SBIS TIFR2,TOV2         ;WAIT
        RJMP W6103
        lds  XL,TCNT1L          ;READ TIMER
        lds  XH,TCNT1H
        sts  TCNT1H,Zero
        sts  TCNT1L,Zero

        SBIC TIFR1,TOV1         ;CHECK TOO FAST
        RJMP OSC_TEST
        cpi  XL,Byte1(Upper_Limmit)
        ldi  tmp1,Byte2(Upper_Limmit)
        CPC  XH,tmp1
        BRSH OSC_TEST

        ADD  R0,TWO             ;CHECK TOO SLOW
        cpi  XL,Byte1(Lower_Limmit)
        ldi  tmp1,Byte2(Lower_Limmit)
        cpc  XH,tmp1
        BRLO OSC_TEST
        RET                     ;RETURN IF WE'RE JUST RIGHT
PRELIMINARY CONCLUSION:
So far we've taken a fairly large piece of code, straightened it out by removing two ugly routines hanging off the end, and reduced it from about 32 lines to 22, about 2/3rds its original size.
Even if you're not hell-bent on reducing a routine's size, reducing its complexity is always a good thing. Bugs are directly proportional to some power of the length and complexity of the code. With Microshaft Windoze being millions of lines of code, is it any wonder it crashes on a regular basis?
With less "going on" in the silicon, there's less chance of anything going awry. Also, trying to debug a "flat, smooth" routine is always far easier and much faster than trying to sort out some lumpy, bent and twisted piece of "spaghetti code" from an unskilled code-smithy.
CODE
BUGS ~= [ SIZE x COMPLEXITY ]**P, where P >1
PROGRAMMING TIP: REDUCE SIZE & COMPLEXITY OF CODE
Now that we've removed all the ugliness from the code and reduced it in size, most people would expect me to stop at this point.
Obviously they don't know me very well, because this is the exact point where I start pulling out my bag of "dirty" tricks and try to squeeze the program down even further. I'm not happy until I've beaten a routine down so far it changes from Coal to Diamond.
AVR_Admin- 04-13-2006
TITLE: ADVENTURES IN SHRINKING THE OSCCAL ROUTINE
THE SAGA CONTINUES: REDUCTUM AD ABSURDUM
From the time we are children, we are taught integers with the visual aid of a ruler. So we normally think of a single computer integer byte as running in a straight line from the value of 0 at one end to "255" at the other, like a ruler.
However, your microprocessor thinks of integers in a different way than us humans. To a digital MCU a byte wraps around on itself. The "distance" between 0 and 255 is not 255 but 1, if you subtract one from zero you get 255 and if you add one to 255 you get zero. So it's best to think of unsigned integers as little circles that wrap around on themselves the way a "digital brain" does.
CODE
DAN's PROGRAM TIP: THINK OF UNSIGNED INTEGERS AS LITTLE CIRCLES
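The "little circle" picture can be tried out on a PC with a few lines of Python. This is only a model: Python integers do not wrap on their own, so the 8-bit register is imitated with `& 0xFF`, and the helper names are invented for this sketch.

```python
def dec8(value):
    """Subtract one the way an 8-bit register does (0 wraps to 255)."""
    return (value - 1) & 0xFF


def inc8(value):
    """Add one the way an 8-bit register does (255 wraps to 0)."""
    return (value + 1) & 0xFF


def circular_distance(a, b):
    """Shortest number of single steps between two byte values on the circle."""
    diff = (a - b) & 0xFF
    return min(diff, 256 - diff)
```

Here `dec8(0)` gives 255, `inc8(255)` gives 0, and `circular_distance(0, 255)` is 1 rather than 255, which is the point of the tip.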
In our routine we are adjusting the OSCCAL value, which is a single byte that will wrap around on itself. So instead of incrementing it in one direction when we are too slow, and decrementing it when we are too fast, perhaps we can just take it in ONE direction knowing it will eventually wrap around to the value we need. Sure, it will take a little longer at the microprocessor level, but will that translate into any real difference on a human scale?
CODE
OSC_TEST:
        DEC  TMP                ;SET-UP TIMERS
        STS  OSCCAL,TMP
        OUT  TIFR1,FF
        OUT  TIFR2,FF
W6103:  SBIS TIFR2,TOV2         ;WAIT
        RJMP W6103
        lds  XL,TCNT1L          ;READ TIMER
        lds  XH,TCNT1H
        sts  TCNT1H,ZERO
        sts  TCNT1L,ZERO

        SBIC TIFR1,TOV1         ;CHECK TOO FAST?
        RJMP OSC_TEST
        cpi  XL,Byte1(Upper_Limmit)
        ldi  tmp1,Byte2(Upper_Limmit)
        CPC  XH,tmp1            ;CHECK TOO FAST?
        BRSH OSC_TEST

        ADD  TMP,TWO            ;<=============== CAN WE REMOVE THIS?
        cpi  XL,Byte1(Lower_Limmit)
        ldi  tmp1,Byte2(Lower_Limmit)
        cpc  XH,tmp1            ;CHECK TOO SLOW?
        BRLO OSC_TEST
        RET                     ;OSCCAL IS FINE
AVR_Admin- 04-13-2006
TITLE: ADVENTURES IN SHRINKING THE OSCCAL ROUTINE
My latest Bootloader, the [CRICKET], will "chirp" when reset and "chirp" again after the oscillator's been calibrated, then it waits a while for you to start your upload and if nothing happens it will give a final chirp before going to sleep. I did this because I got tired of having to press on the joystick each time I wanted to upload a new program.
The time between the two initial "chirps" is the time that it takes to calibrate the oscillator. So I loaded a version of the Bootloader with the ADD TMP,TWO included into a Butterfly and another without it into another and compared results. The one with it missing was slightly slower, but without the audible clues, no one would ever notice. So on a human scale there's not much difference.
By safely removing the ADD TMP, TWO line we can save another program step:
CODE
OSC_TEST:
        DEC  TMP                ;SET-UP TIMERS
        STS  OSCCAL,TMP
        OUT  TIFR1,FF
        OUT  TIFR2,FF
W6103:  SBIS TIFR2,TOV2         ;WAIT
        RJMP W6103
        lds  XL,TCNT1L          ;READ TIMER
        lds  XH,TCNT1H
        sts  TCNT1H,ZERO
        sts  TCNT1L,ZERO

        SBIC TIFR1,TOV1         ;CHECK TOO FAST?
        RJMP OSC_TEST
        cpi  XL,Byte1(Upper_Limmit)
        ldi  tmp1,Byte2(Upper_Limmit)
        CPC  XH,tmp1            ;CHECK TOO FAST?
        BRSH OSC_TEST
                                ;<===== ADD TMP,TWO REMOVED!
        cpi  XL,Byte1(Lower_Limmit)
        ldi  tmp1,Byte2(Lower_Limmit)
        cpc  XH,tmp1            ;CHECK TOO SLOW?
        BRLO OSC_TEST
        RET                     ;OSCCAL IS FINE
So the logic of our routine has changed: instead of incrementing or decrementing based on whether the oscillator is fast or slow, it now only knows that the oscillator is out of range, and it always decrements the OSCCAL register. If we happen to be moving it in the "wrong" direction, no problem, because once it hits ZERO it will wrap to 255 and start working downward towards the correct setting. In fact the high bit is not used so it will "wrap" at 127, so the entire process is very fast.
The fact that it takes a tiny bit longer is actually a bonus, because when first started, it's best to let the oscillator "stabilize", and the more time that passes, the better. So not only have we removed a program step, we've actually improved over the "standard" routine.
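To get a feel for how many passes the decrement-only search needs, here is a host-side Python model. The linear `measure` oscillator and the 40-count tolerance are assumptions for this sketch, and the 7-bit wrap is modelled with `& 0x7F` to match the remark above that the high bit is not used.

```python
TARGET = 6103      # ideal timer count (from the thread)
TOLERANCE = 40     # accepted deviation in counts (assumed for this sketch)


def measure(osccal, counts_per_step=60, base=3000):
    """Hypothetical oscillator: timer count for the low 7 bits of OSCCAL."""
    return base + counts_per_step * (osccal & 0x7F)


def calibrate_one_way(osccal, measure=measure):
    """Decrement-only search; returns (final OSCCAL, passes through the loop)."""
    for passes in range(1, 129):
        osccal = (osccal - 1) & 0x7F          # DEC TMP, wrapping 0 -> 127
        if abs(measure(osccal) - TARGET) <= TOLERANCE:
            return osccal, passes
    raise RuntimeError("no OSCCAL value landed in the window")
```

Starting just above the right value takes a handful of passes; starting below it takes most of a trip around the circle, which in this model is never more than 128 passes.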
NOTE: For the curious among you, I've posted a copy of the [CRICKET] Bootloader for Butterfly below:
AVR_Admin- 04-13-2006
TITLE: ADVENTURES IN SHRINKING THE OSCCAL ROUTINE
Since the entire logic has changed, we should rewrite the test part of the routine. Now that we're not concerned with whether we're running fast or slow, but only with whether we're outside the acceptable range, maybe there's some savings to be had.
Since Upper and Lower Limits are both constants, perhaps we can calculate the difference at assembly time. Then we just read the Oscillator, subtract the ideal speed of 6103 and see if the difference is within that range.
At this point I really did not expect to see much savings since a compare is almost the same as a subtraction, so instead of testing the Oscillator reading against an Upper_Limit and a Lower_Limit, we're subtracting the Ideal_Limit and comparing the difference to the difference between the Upper_Limit and the Lower_Limit.
Both approaches would take about the same number of program steps, two 16-bit compares is going to be the same as a 16-bit subtraction and a 16-bit compare. So I stop here for a while.
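The two ways of testing the reading can be compared side by side in a host-side Python sketch. The +/- 40 window and the function names are assumptions for this illustration, and Python's signed integers hide the question of how a negative difference would be handled in 8-bit or 16-bit registers.

```python
IDEAL = 6103       # ideal timer count (from the thread)
HALF_WIDTH = 40    # accepted deviation either side (assumed for this sketch)


def in_window_two_compares(count):
    """Test the reading against a lower and an upper limit."""
    lower = IDEAL - HALF_WIDTH
    upper = IDEAL + HALF_WIDTH
    return lower <= count <= upper


def in_window_by_subtraction(count):
    """Subtract the ideal count, then compare the size of the error once."""
    error = count - IDEAL
    return abs(error) <= HALF_WIDTH
```

Both functions agree for every possible 16-bit reading, so the choice between them comes down to which one costs fewer instructions on the target.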
AVR_Admin- 04-13-2006
TITLE: ADVENTURES IN SHRINKING THE OSCCAL ROUTINE
All that leaves us with is that first test for the Oscillator running too fast. I wonder if we could do something with these lines of code that check if our timer has overflowed.
CODE
        SBIC TIFR1,TOV1
        RJMP OSC_TEST
I've seen OSCCAL routines with 6103 +/- 100uS but most seem to use +/- 50uS and I use a "tighter" +/- 40uS.
This means that our "correct" range is 80uS long. What are the chances that when the timer overflows that it will co-incidentally fall within this range and give us a "False Positive?"
Well based on pure randomness it would be:
CODE
Probability = Correct_Range/Total_Range x 100%
Probability = 80/65,536 x 100%
Probability = 0.12%
One in a thousand...Hmm, not too bad, however, the real probability is much, much lower than this.
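The arithmetic above is easy to re-run for other window sizes. This Python helper is just a calculator for the "pure randomness" estimate; the function and parameter names are invented for this sketch.

```python
def false_positive_percent(window_counts=80, counter_bits=16):
    """Chance, in percent, that a uniformly random count lands in the window."""
    total_counts = 1 << counter_bits      # 65,536 values for a 16-bit timer
    return window_counts / total_counts * 100
```

With the defaults this gives about 0.12%, or roughly one chance in 819, and a +/- 100 window (200 counts) would give about 0.31%.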
There's a small chance that the Oscillator can be so fast that it's outside our range test and overflows to 10. There's an even smaller chance that it will be over by 100. There's an even tinier chance that it will be out by 1000.
The probability that the oscillator could be far enough out to wrap the 16-bit timer all the way around (a full 65,536 counts) AND still land within my small range to give a false positive is slim-to-none. We can safely eliminate this test from the code and save ourselves two more program lines.
        cpi  XL,Byte1(Lower_Limmit)
        ldi  tmp1,Byte2(Lower_Limmit)
        cpc  XH,tmp1            ;OSCILLATOR OUT?
        BRLO OSC_TEST
        RET                     ;OSCCAL IS FINE
AVR_Admin- 04-13-2006
TITLE: ADVENTURES IN SHRINKING THE OSCCAL ROUTINE
Then it dawned on me...
The reason that re-writing the comparison section wasn't a great idea was because a 16-bit subtraction and a 16-bit compare are essentially the same as two 16-bit compares.
However, from the calculations I just made I realize that my "correct range" is only 80 and that will fit into a byte, so that translates into a 16x8 bit compare not a 16x16. This may save us another line of code:
So the "new" concept was to subtract the correct range from our readings and check if results fell within our range:
CODE
TST_OSC:
        DEC  TMP                ;SET-UP TIMERS
        STS  OSCCAL,TMP
        OUT  TIFR1,FF
        OUT  TIFR2,FF
W6103:  SBIS TIFR2,TOV2         ;WAIT
        RJMP W6103
        lds  XL,TCNT1L          ;READ TIMER
        lds  XH,TCNT1H
        sts  TCNT1H,ZERO
        sts  TCNT1L,ZERO
The above code looks great: nice and small. Then I realize that if the clock reading is less than 6,103 - 40, I've got to deal with a "negative" number. I like to avoid "signed" integers whenever I can because they sometimes have a habit of coming back to bite-you-in-the-butt!