Who is a clever programmer?
In 1983, this guy was a computer programmer at Lucasfilm.
He was optimizing a real time animation program and wanted to “unroll" a loop in C. This is a way to optimize code by reducing the number of times the computer has to check if the loop can be ended.
This was the original loop:
send(to, from, count) register short *to, *from; register count; { do { /* count > 0 assumed */ *to = *from++; } while(--count > 0); }
Normally, a loop is unrolled by dividing the loops by a number and copying the code within the loop that many times. So, something that loops 64 times would loop 8 times with the code repeated 8 times, eliminating 56 times that the program has to check if the loop is complete like this:
send(to, from, count) register short *to, *from; register count; { register n = count / 8; do { *to = *from++; *to = *from++; *to = *from++; *to = *from++; *to = *from++; *to = *from++; *to = *from++; *to = *from++; } while (--n > 0); }
This may be tough to understand but can you attempt these Interview Questions on C:
iq.opengenus.org/tough-questions-on-c
The problem was that he didn't know ahead of time how many times it would loop, so he couldn't be guaranteed an even division. This would normally be handled with two loops, one like the above code and one for the remainder. He wanted it more compact and even more optimized though, so he drew on his experience with assembly, looked at the spec for the switch statement in C and realized he could mix a loop and a case statement to do this:
send(to, from, count) register short *to, *from; register count; { register n = (count + 7) / 8; switch (count % 8) { case 0: do { *to = *from++; case 7: *to = *from++; case 6: *to = *from++; case 5: *to = *from++; case 4: *to = *from++; case 3: *to = *from++; case 2: *to = *from++; case 1: *to = *from++; } while (--n > 0); } }
His name is Tom Duff, and that brilliant block of code is known as Duff's device. When I was first learning C, a senior developer told me about it and explained it to me. I was so impressed, that I scoured my code for a loop to unroll and successfully implemented it.
Granted, my code was fast enough and it didn't make much of a difference in performance, but I was able to say I used Duff's device. Yes, it was premature optimization, but I didn't care. It was Duff's device!