Full Version : ADVENTURES IN DIVISION BY 10
avr >>COMPUTER MATH >>ADVENTURES IN DIVISION BY 10


AVR_Admin- 04-13-2006
TITLE: DIRTY DAN's DIRTY MATH TRICKS

PURPOSE:

On Going Tutorial on Math Tricks in AVR Native Machine Language for Beginners.

PREAMBLE:

While trying to help someone in another FORUM with a Division-by-60 routine to converting seconds to minutes, someone made the remark:

QUOTE

What RetroDan has is a quick and dirty approximation!


I think it was meant as an insult, but it gave me a great name for the type of binary routines I love: Dirty Math!

Quick & Dirty Routines that are small and super fast.

CAVEAT LECTOR:

If you don't like taking chances and experimenting and don't really care about how big your programs are, or how fast they excecute then this is not a place for you. May I recommend the Atmel Appnotes #200, 201 and 204 that are chock full great standard math routines for the AVR microcontrollers.

AVR_Admin- 04-13-2006
TITLE: DIRTY DAN's MATH TIPS

INTRODUCTION:

Here are a few quick tips before we start:

CODE
DIRTY DAN'S TIP #1: ALWAYS USE A POWER OF TWO


If you are writing any program or routine, even if the savings are not obvious at first, pick numbers that are powers-of-two. Somewhere down the line you'll probably be glad you did. Make it a habit and your microcontroller will thank you too.

For example: if you need to take an average, don't take 10 readings and divide by 10, take 8 or 16. If you're setting up a table, don't make entries 10 bytes long, make them 8 or 16. It will greatly simplify things later. There's nothing magical about powers-of-ten to a microprocessor. They're set up for powers-of-two and the number of fingers we have means absolutely nothing to them.


CODE
DIRTY DAN'S TIP #2: AVOID DIVISION AT ALL COSTS!


While Addition, Subtraction and Multiplication can be done in just a few clock cycles, most division involves looping and uses a lot of processor time. Division is normally done by repeated subtractions, the "standard" 16x16 division routine from the Atmel Appnotes averages over 250 clocks cycles compare that to just 9 for a 16x16 multiply and just 2 for a 16x16 Addition.

One problem with high-level languages is that this dirty little fact is kept hidden from the programer. So while you're happily writing code that does Bilinear Interpolation or Cubic Splines, your processor is swearing at you under his breath everytime you ask for a division. You're wondering why things are grinding to a halt with smoke drifting up from the PCB.

CODE
DIRTY DAN'S TIP #3: IF YOU MUST DIVIDE, DO IT BY POWERS OF TWO!


Division by a power of two can be easily and quick accoplished with just a few right-shifts. Each time you shift your bits to the right you are essentially dividing your number in half. So to divide by four you just shift-right twice.

CODE
DIRTY DAN'S TIP #4: IF YOU MUST DIVIDE, DO IT ONLY ONCE!


If your equation has multiple divisions involved, try and re-arrange your equation so you only divide once.

For example: if you need to divide by 10 and later divide by 12, perhaps you can re-arrange things and just divide once by 120 and be done with it.

AVR_Admin- 04-13-2006
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN

PREAMBLE:

I've been working on my own driver routines for the LCD Display on the Butterfly (same as STK502) that are super small and fast. When it came time to write a little routine that displays a number in decimal, rather than Hex, I ran into my old nemesis DIVISION!

To be more precises, Division-by-Ten. Since we humans are so fond of Division-by-Ten, I thought it would make an excellent starting point for our examination of my Dirty Math Tricks.

For this routine, all we want is to take a single byte value and display it on the LCD Display in Decimal Number Form. So we need to take a number with a maximum value of $FF and break it into three: 2 / 5 / 5.

The way most of us humans would do this is to first divide by it by 100 and then by 10 and then keep the remainder;

CODE

255 / 100 = 2     (remainder of 55)
55  / 10  = 5
remainder = 5


Oh no, that means we need to have two division routines! One for division by 100 and another by ten. Mybe we should just use a general division routine and change the divisor from 100 to 10 rather than have two routines. So I go and check the Atmel Appnote 200 looking for a general 8x8 unsigned division routine and here is what I found:

CODE

;***************************************************************************
;* "div8u" - 8/8 Bit Unsigned Division
;*
;* This subroutine divides the two register variables "dd8u" (dividend) and
;* "dv8u" (divisor). The result is placed in "dres8u" and the remainder in
;* "drem8u".
;*  
;* Number of words :14
;* Number of cycles :97
;* Low registers used :1 (drem8u)
;* High registers used  :3 (dres8u/dd8u,dv8u,dcnt8u)
;***************************************************************************

.def drem8u =r15;remainder
.def dres8u =r16;result
.def dd8u =r16;dividend
.def dv8u =r17;divisor
.def dcnt8u =r18;loop counter

div8u: sub drem8u,drem8u;clear remainder and carry
ldi dcnt8u,9;init loop counter
d8u_1: rol dd8u;shift left dividend
dec dcnt8u;decrement counter
brne d8u_2;if done
ret  ;    return
d8u_2: rol drem8u;shift dividend into remainder
sub drem8u,dv8u;remainder = remainder - divisor
brcc d8u_3;if result negative
add drem8u,dv8u;    restore remainder
clc  ;    clear carry to be shifted into result
rjmp d8u_1;else
d8u_3: sec  ;    set carry to be shifted into result
rjmp d8u_1


Only 14 program steps is not bad. Almost 100 clock cycles per call and we need to call it twice fort a single byte conversion. That's about 200 clock cycles each. I'm sure we can do better than that!

AVR_Admin- 04-13-2006
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)

Previously we divided our 1 byte number by 100 then by 10 and when it comes time to expand our routine to handle 16 bit numbers we'll need to divide by 10,000, 1,000, 100 and finally by 10. At approx. 100 cycles per call that's 400 clock cycles per number.

Maybe we should look at the problem again from the "other side" the right-hand-side instead of left.

CODE
255 / 10 = 25   Remainder = 5
25  / 10 = 2    Remainder = 5
2   / 10 = 0    Remainder = 2


Well we've managed to extract the 255 again, but now we need Three calls to the division routine and for a 16 byte number we'll need Five. The only thing we have going for us using this method is that if we can find a "special" division routine for Division-by-Ten that is smaller and/or faster than the Atmel routine, we might have something we can work with.


AVR_Admin- 04-13-2006
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)

I remember someone posting a file of Division-by-a-Constant Routines, so why don't we see if we can get luck there.

Here is what I found, exactly what we're looking for, a specialized routine for Division-by-Ten:

CODE

;***************************************************************************
;* Function "Div16_10"
;* Divides an unsigned 16 bit word (XH:XL) by 10
;* Returns quotient in YH:YL and remainder in XL
;*
;* Author: Andreas Lenze (andreas.lenze@t-online.de)
;* Equations by D: W. Jones:
;*
;* Reciprocal mul w. extra precision:
;* unsigned int A;
;* unsigned int scaled_reciprocal = 0xCCCD;
;* unsigned int Q; /* the quotient */
;*
;* Q = ((A * 0xCCCD) >> 19)
;*
;* Uses: high regs: 7 (r17, r18, r19, X, Y)
;*  low regs:  3 (r0, r1, r2)
;*
;*  words:     36 (w. push/pop = 8 words)
;*  cycles:    46 (w. push/pop = 16 cycles)
;*
;* Note: Hardware multiplier required ("mul" instruction)
;*
;***************************************************************************

Div16_10:
push r2
push r19
push r18
push r17

ldi YH,0xCC ; scaled reciprocal for /10
ldi YL,0xCD

; Q = A * 0xCCCD
; (r19:r18:r17[:rXX] = XH:XL * YH:YL)
       clr     r2
       mul     XH, YH ; ah * bh
       movw    r19:r18, r1:r0
       mul     XL, YL ; al * bl
mov r17,r1 ; r0 to [rXX] is superfluous
       mul     XH, YL ; ah * bl
       add     r17, r0
       adc     r18, r1
       adc     r19, r2
       mul     YH, XL ; bh * al
       add     r17, r0
       adc     r18, r1
       adc     r19, r2

; Q = Q >> 16: use r19:r18 as word
; Q = Q >> 3
lsr r19 ; do the last 3 shifts
ror r18
lsr r19
ror r18
lsr r19
ror r18

; r19:r18 now "Q" (= result >> 19)
; R = A - 10*Q;
ldi r17,10 ; multiply r19:r18 by 10
mul     r18, r17; al * bl
sub XL,r0
clr XH
movw YL,r18 ; make copy of Q

; XL holds "R"
; YH:YL holds "Q"
pop r17
pop r18
pop r19
pop r2

ret


Wow, I looks huge! Let's see 46 program steps, that's a little on the heavy side, and 10 registers used ouch!. However, it is a 16 bit Divide-by-Ten Routine and it excecutes in about 1/2 the time that the Atmel 8 bit routine did, so we're moving in the right direction. Two or Three calls to this would be 100 to 150 cycles versus the 200 to 300 using the Atmel Routine.

If we can't come up with something on our own, perhaps we could strip this down to an 8 bit routine, or perhaps just bite the bullet because sooner-or-later we're gonna' need to convert 16 bit values anyway...

AVR_Admin- 04-13-2006
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)

A while back, while literally sitting-on-the-throne, I happened to realized that 255 is evenly divisable by 5 and that it is only one away from 256 which is a Power-of-Two. After a while, it becomes a habit to always be on-the-lookout for Powers-of-Two. Either that or I'm slowly drifting into the world of padded-rooms and nice doctors in white smocks. Of all the things I've lost, I miss my mind the most.

To divide by Ten we might be able to first Divide-by-Five then Right-Shift to Divide-by-Two and the result would be a Division-by-Ten. I stored that fact away and wondered if the 255-256 link could be put to use sometime.

Pehaps today is a good day to drag it out of the bottom drawer and dust it off...

Let's see: 255 divided by 5 is 51 and 255 is awlful close to 256 and furthermore division by 256 is super-super easy and requires ZERO program steps. The "Dirty Trick" to that is to simply ignore the least significant Byte and we've essentially Divided our number by 256.

Since 255 is almost equal to 256, then 255/256 is almost equal to 1

Since multiplication by one does not change a value
Then 1/5 times 255/256 shoud be pretty close to 1/5

Since division by 256 can be accomplished with the Dirty Math Trick of ignoring the lowest byte, we'll leave that to last and our equation becomes:

1/5 time 255 and later we'll truncate the lower byte.
CODE

1/5 x 255 = 255 / 5 = 51

So mathematically speaking if we multiply our number by 51 and later
ignore the lower byte, we should get a pretty good approximation of
Division-by-Five. That's where our first routine will start...

AVR_Admin- 04-13-2006
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)

Let's start by writing a routine that muliplies by 51 and we'll look at the high byte and see if our mathematical assumptions were correct.

Here is the code:
CODE

DIV5: LDI B,51
     MUL A,B ;ANSWER LEFT IN R1
     RET


Great it's only 3 lines long!!! Let's test it's accuracy.
We're interested in a single byte so values can go from zero to 255
so lets check a few numbers in that range and see how it performs:

CODE

10 =>  1 (off by -1)
50 =>  9 (off by -1)
100 => 19 (off by -1)
150 => 29 (off by -1)
200 => 39 (off by -1)
250 => 49 (off by -1)

Hmm, we're alway off by minus one, so if we add one, we should be right on the money!

So the new Division-by-Five Routine becomes:

CODE

DIV5: LDI B,51
     MUL A,B
     INC R1; ANSWER IN R1
     RET


So we're half-way-home, all we need to do now is divide our answer by two and we should have a "Quick and Dirty" Divide-by-Ten Routine...

AVR_Admin- 04-13-2006
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)

There's two ways we can use our Division-by-Five Routine for a Division-by-Ten Routine.

We can modify the DIV5 routine. Earlier I stated that a "Quick and Dirty" way to Divide-by-Two is to just Shift your bits to the right. This is how we modify our DIV5 Routine to get a DIV10:


CODE

DIV10: LDI B,51
      MUL A,B
      INC R1  ;R1=A/5
      LSR A   ;R1=(A/5)/2 = A/10
      RET



The other way we could have accomplished the same thing would be to have the DIV10 routine call the DIV5 routine and that way we "kill-two-birds-with-one-stone" because we get two usable routines from it:


CODE

DIV10: RCALL DIV5 ;R1=A / 5
      LSR R1     ;R1=(A/5) / 2 = A/10
      RET

DIV5:  LDI B,51
      MUL A,B
      INC R1     ;ANSWER IN R1
      RET


Just to be on-the-safe-side let's double check the accuracy of these routines:

CODE
10 =>  1 Correct!
50 =>  5 Correct!
100 => 10 Correct!
150 => 15 Correct!
200 => 20 Correct!
250 => 25 Correct!

So we did it. We have our first "Dirty Math" Routine that is 100% accurate over the range we need.
Stay tuned however, there's still more to this story...

AVR_Admin- 04-13-2006
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)

PRE-SUMMARY:

Not bad for a days work eh? We took a rather large routine that we needed and reduced it to a "Quick and Dirty" little routine using a few "Dirty Math Tricks".

Let's compare and see how well we did.
First let's compare to the Atmel 8x8 Unsigned Division Routine:
CODE

                DIV10a   DIV10b   ATMEL
REGISTERS:         2        2        4
DIFFERENCE:      -2       -2        -
 PERCENT:       -50%     -50%       -

PROGRAM WORDS:    5         7       15(incl RET)
DIFFERENCE:    -10        -8        -
 PERCENT:      -66%      -53%       -

SPEED/CYCLES:     5         8       97
DIFFERENCE:    -92       -89
 PERCENT:      -95%      -92%

So we've used 1/2 the register space, about 60% less program space and we're running at about 20 times the speed that the Atmel Routine does.

Let compare to the Lenze/Jones Routine:
CODE

                DIV10a   DIV10b   LENZE
REGISTERS:         2        2       10
DIFFERENCE:      -8       -8        -
 PERCENT:       -80%     -80%       -

PROGRAM WORDS:    5         7       36
DIFFERENCE:    -31        -29       -
 PERCENT:      -86%       -80%      -

SPEED/CYCLES:     5         8       46
DIFFERENCE:    -41       -38        -
 PERCENT:      -89%      -83%       -


Against the Lenze/Jones routine we're using 80% less Register Space, about 85% less Program Space and w'e're running at between 6 to 10 times the speed.

Many might think that this is a great accomplishment and that I should stop at this point. Especially those that probably would have just used the larger routines and not had a second thought about it.

Well, obviously some of you don't know me very well.
I think I can pull a few more "Dirty Tricks" from out of my hat and turn this Dirty Littlle Routine into something totally scandalous. I'm not happy until the routine is so small that it pulls numbers from thin air, like magick...

AVR_Admin- 04-13-2006
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Continued)

Let's re-examine what it is we're doing mathematically with our Dirty Little DIV10 Routine:

We're multiplying by 51, adding one, then dividing by two, and then truncating to get a final division of 256:

CODE

A / 5 ~= [ [ ( A x 51 ) + 1 ] / 2 ] / 256


Let's ignore that final division by 256 because it's accomplished with ZERO program steps by ignoring the lower byte and there's no way we can improve on that unless there are opcodes that can move us backwards in time.

So we want to look closely at the right side of this equation:
CODE

256 x [A / 5] ~= ( 51A + 1) / 2
256 x [A / 5] ~= 51A/2 + 1/2


Well 1/2 doesn't really mean much to a microprocessor, you either have a bit that is on or its off. We'll ignore it for now and if need be, we'll increment something in the end like we did with our DIV5 routine.

So now we have:
CODE

256 x [A / 5] ~= 51A/2
256 x [A / 5] ~= 25.5 x A
256 x [A / 5] ~= 25A


Since we dropped a 1/2 earlier and now we're stuck with another 0.5 perhaps we should round the 25.5 upto 26 rather than truncate it down to 25.
CODE

256 x [A / 5] ~= 25.5 x A
256 x [A / 5] ~= 26A

So to divide by 10 all we need to do is multiply by 26 and ignore lower byte:
CODE

MUL10: LDI B,26
      MUL A,B  ;ANSWER IN R1


It sounds incredable, but let's check the results and see what our error ratio is:

CODE

10 =>  1  Correct!
50 =>  5  Correct!
100 => 10  Correct!
150 => 15  Correct!
200 => 20  Correct!
250 => 25  Correct!


So it's 100% accurate over the range we need and it's only two program lines.

We started with rather large routines requiring 50 to 100 clock cycles per call and have something that runs in just three.

Can we reduce it even more?

Not really, not until Atmel comes out with a MULI command that allows us to multiply a register with an immedaite value rather than loading it into another register first.

Maybe next year...

It would probably look like this:
CODE

DIV10: MULI A,26 ;ANSWER IN R1


AVR_Admin- 04-13-2006
TITLE: DIRTY DAN"s MATH TRICKS: DIVISION-BY-TEN (Conclusion)

IN CONCLUSION:

By systematically applying a few Dirty Tricks and following our instincts combined with a little high-school math, we took some rather large routines and reduced them down into just a couple program steps. Something that many would say is impossible., but never say never my friend. If I can do it, so can you!

REQUEST FOR FEED BACK:

I hope you found this posting entertaining as well as educational.

If you would like to see more like it, please be sure to let the moderators know.

Thank you for your time and consideration.

AVR_Admin- 04-13-2006
QUOTE ("alenze")

This is not true. For a 'div by 10' this is an approximation which will not yield correct results for all integers between 1 and 255 decimal (try for example 169, 159 ... for 'A' to see it fail). The 'scaled reciprocal' for a correct 'division by 10' of an 8 bit integer would have to be at least 9 bits to reach the required precision.

AVR_Admin- 04-13-2006
QUOTE ("lfmorrison")
Well now, Andreas... The argument could be made that RetroDan's solution is *more correct* than the traditional integer approach.

169/10 = 16 if we follow traditional definitions.

Of course, the "true" result is 16.9. But we cannot express 16.9 in an integer. So we round. The traditional approach for computer integer arithmetic is to always round down.

Mathemeticians would tell us that we should round anything from 0.0 up to 0.49999999... as 0, and anything from 0.500000000...01 up to 0.9999999999999999... as 1. There is a gray area for 0.5 exactly. I was taught a convention that in that case, you round to the even-numbered integer.

From that prespective, RetroDan's solution comes much closer to the "mathematically" correct solution. 169/10 = 17.

As for the amount of pushing and popping required... well, that's relative to the register usage conventions at hand. In the worst case (with all destoryed registers saved and restored), RetroDan's solution is still smaller and faster than the best-case (with no destroyed registers saved and restored) "Lenze" solution.

AVR_Admin- 04-13-2006
QUOTE ("clawson")
Yup, actually I thought the *26 then ditch 8 bit thing was really clever. Obviously it works because 26/256 is 0.1015625 which for small numbers is OK as a way of multiplying by roughly 0.1. If you tried a larger value like 800 then I guess it'd be 800 * 26 = 20,800 and divided by 256 this is then 81 - so the extra 0.0015625 has come into play to create the error. But on small numbers it is a clever technique for an "integer divide"

In fact just checked and it's good for all values up to 630 in fact, the first one where it goes wrong is 631

Cliff

AVR_Admin- 04-13-2006
QUOTE ("lfmorrison")
Finding remainders:
- Multiply the result by 10. Label as 'x'.
- Subtract 'x' from the original input.
- This result is equal the the "Remainder". Using RetroDan's code, the remainder can either "positive" or "negative", but it will always have a magnitude of 10 or less. If negative, then feel free to subtract 1 from the quotient and add 10 to the remainder. And presto! You have a result that is totally identical to the "Lenze" result, for all possible 8-bit inputs.


CODE
; RetroDan's original code
DIV10: LDI B,51
      MUL A,B
      INC R1  ;R1=A/5
      LSR R1  ;R1=(A/5)/2 = A/10
; Luke's insertion
      MOV C, R1
      LDI B, 10
      MUL C, B
      SUB A, R0
     ; 'A' contains the (signed) remainder.
     ; sign of the remainder indicates whether it is an:
     ; - under-estimate (positive remainder), or
     ; - over-esitmate (negative remainder)
     ; 'C' contains the quotient
     ; Now a 100% _mathematically_ correct result for all 8-bit inputs.
     ; (Destroys the original input operand 'A'.
     ;  one could return the remainder in register
     ;  'B' at the cost of one extra MOV instruction.)
      RET





Forumer™ is Voted #1 Free Forum Hosting provider
Build your own community today with the largest message board hosting company.