Understanding Big O Notation in Analyzing Computing Algorithms

There are three fundamental questions that we need to ask of every algorithm.

  • Is our algorithm correct?
  • What is the operational (time) complexity or resource (memory/storage) complexity of our algorithm? In simpler terms, how much time or memory does it take as a function of the input size n?
  • Can we do better in performance?

We briefly mentioned algorithms in a previous article, in which we touched on different approaches to solving problems, but before we can dive into each technique, it is important to understand the metrics used to gauge the performance and efficiency of an algorithm.

An algorithm is said to be correct if, for every input instance, it finishes with the correct output. There are cases in which an incorrect algorithm may still be useful if its error rate can be controlled, but that is out of scope for this article.

There are measures of complexity other than time and storage, namely arithmetic complexity, communication complexity, and circuit complexity, and these are also out of the scope of this article. When the nature of the complexity is not explicitly given, the default assumption is that operational (time) complexity is being referred to.

An algorithm’s complexity may change according to input size.

An algorithm that runs ten times faster but uses ten times as much memory might be perfectly acceptable in a server environment with vast amounts of available memory, but may not be appropriate in an embedded environment where available memory is severely limited.

Analyzing what happens as the number of inputs to a problem becomes very large is called asymptotic analysis. If an algorithm takes 5 milliseconds with one thousand items, we need to keep in the back of our minds how long it will take with 1 million items. Will it take 5 seconds, 5 minutes, or 5 years? How then do we represent the complexity of an algorithm?

Enter the Big O notation

As the amount of needed resources varies with the input, algorithmic complexity is generally expressed as a function n -> f(n), where n is the size of the input. f(n) is either the worst-case complexity, the maximum amount of resources needed over all inputs of size n, or the average-case complexity. It usually makes sense to be safe rather than sorry, so we look at the worst-case scenario and code against it.

Big O notation is a way to represent how long an algorithm will take to execute as a function f(n). In its simplest terms, this is determined by the number of computer steps required to process an algorithm. It enables a software programmer to determine how efficient an approach to solving a problem is.

Different algorithms can be classified into the same Big O (complexity) band, but one may still be more efficient than the other.

In the examples that we will show below, we mainly deduce algorithmic complexity in terms of operational complexity, i.e. the rate of growth in computational operations against the size of the input. But we must remember that we can also measure complexity by looking at resource complexity, i.e. how much additional memory an algorithm needs relative to the size n of its input data set, or indeed arithmetic, communication, and circuit complexity.

A problem is regarded as inherently complex if its solution requires significant time and storage resources.

Constant O(1) Algorithms

A Constant O(1) algorithm is one whose complexity is constant regardless of input size and will always take the same amount of time to be processed. The 1 does not mean that there is only one operation or that the operation takes a small amount of time. It might take 1 millisecond or an hour. The point is that the size of input does not influence the time the operation takes.

int GetItemCount(int[] items)
{
    // Reading the Length property is a single step, whatever the array size
    return items.Length;
}

In this scenario it does not matter whether there are 10 items or 20 million items; it will take the same amount of time to get the length of the items array. Push() and Pop() operations on a stack also always take constant time. This is the most desirable complexity, and one we should always strive to achieve, though it will obviously not always be possible.

Logarithmic O(log n) Algorithms

A Logarithmic O(log n) algorithm is one whose complexity grows logarithmically with input size. Examples include many divide and conquer algorithms. The binary search tree Contains method implements an O(log n) algorithm, as sketched below.
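
As an illustration (a minimal sketch of the same idea, not the actual tree internals), an iterative binary search over a sorted array discards half of the remaining range on every pass, which is exactly what produces the O(log n) behavior:

bool Contains(int[] sortedItems, int target)
{
    int low = 0;
    int high = sortedItems.Length - 1;
    while (low <= high)
    {
        // Halve the remaining search range on every iteration
        int mid = low + (high - low) / 2;
        if (sortedItems[mid] == target) return true;
        if (sortedItems[mid] < target)
            low = mid + 1;   // discard the lower half
        else
            high = mid - 1;  // discard the upper half
    }
    return false;
}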

Linear O(n) Algorithms

A Linear O(n) algorithm is one whose complexity grows linearly and proportionally with the size of its input. We can expect that if an input size of 1 takes 5 milliseconds, then 1,000 items will take 5 seconds. We can recognize a Linear O(n) algorithm by looking for a looping mechanism that accesses each member.

public long GetSum(int[] items)
{
    long sum = 0;
    // A single pass over the input: the work grows in step with items.Length
    foreach (int i in items)
    {
        sum += i;
    }
    return sum;
}

Linearithmic O(n log n) Algorithms

A Linearithmic O(n log n) algorithm, otherwise known as log-linear, is an algorithm with complexity O(n log n). Some divide and conquer algorithms fall into this band, such as merge sort and, in its average case, quicksort. A sketch of merge sort follows.
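
As a sketch of why the band is n log n (assuming the usual textbook formulation and C# 8's range syntax), merge sort splits the input in half about log n times and does O(n) merging work at each level:

int[] MergeSort(int[] items)
{
    if (items.Length <= 1) return items;

    // Divide: split the array in half (this happens about log n times)
    int mid = items.Length / 2;
    int[] left = MergeSort(items[..mid]);
    int[] right = MergeSort(items[mid..]);

    // Conquer: merge the two sorted halves in O(n) time
    int[] merged = new int[items.Length];
    int l = 0, r = 0, m = 0;
    while (l < left.Length && r < right.Length)
        merged[m++] = left[l] <= right[r] ? left[l++] : right[r++];
    while (l < left.Length) merged[m++] = left[l++];
    while (r < right.Length) merged[m++] = right[r++];
    return merged;
}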

Quadratic O(n^2) Algorithms

A quadratic O(n^2) algorithm is one whose complexity grows with the square of its input size. There will be times when you can't avoid a quadratic algorithm, but reaching for one is often a sign that you need to reconsider your design. Quadratic algorithms do not scale well as input size grows.

Compared with a linear O(n) algorithm that loops through all n elements of a list once, a typical quadratic example is a double nested loop. More practically, most of the inefficient sorting algorithms fall under this category: selection sort, insertion sort, and bubble sort, as sketched below.
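
For instance, here is a minimal bubble sort, one of the sorts just mentioned; the two nested loops over the same input are the giveaway that roughly n * n steps are being done:

void BubbleSort(int[] items)
{
    // Outer loop runs up to n times; inner loop up to n times: about n * n steps
    for (int i = 0; i < items.Length - 1; i++)
    {
        for (int j = 0; j < items.Length - 1 - i; j++)
        {
            if (items[j] > items[j + 1])
            {
                // Swap adjacent items that are out of order
                (items[j], items[j + 1]) = (items[j + 1], items[j]);
            }
        }
    }
}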

As another example, an array with 1,000 integers would require 1,000,000 computer steps to complete, and an input with one million items would take one trillion (1,000,000,000,000) computer steps. To put this into perspective, if each step takes one millisecond, an O(n^2) algorithm that receives an input of one million items will take nearly 32 years to complete. Making that algorithm 100 times faster would still take about 116 days.

Exponential O(2^n) Algorithms

An Exponential O(2^n) algorithm is one whose work doubles with each addition to the input data set. With small inputs the growth curve looks shallow, but it blows up as the input gets larger, as the sketch below shows.
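
As an illustration (a hypothetical example, not tied to any earlier code), enumerating every subset of a set is inherently O(2^n), because a set of n items has 2^n subsets, so each extra item doubles the work:

// Assumes: using System; using System.Collections.Generic;
// Prints all 2^n subsets of the input: adding one more item doubles the output
void PrintAllSubsets(int[] items)
{
    long subsetCount = 1L << items.Length; // 2^n subsets
    for (long mask = 0; mask < subsetCount; mask++)
    {
        var subset = new List<int>();
        for (int i = 0; i < items.Length; i++)
        {
            // Bit i of the mask decides whether items[i] is in this subset
            if ((mask & (1L << i)) != 0) subset.Add(items[i]);
        }
        Console.WriteLine("{ " + string.Join(", ", subset) + " }");
    }
}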

Factorial O(n!) Algorithms

Factorial O(n!) complexity represents an algorithm which has to do n! computations to solve a problem. These algorithms take a huge performance hit for each additional input item. For example, an algorithm that takes 2 seconds to process 2 elements will take 6 seconds for 3 elements and 24 seconds for 4 elements! A sketch follows.
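
As a sketch (again a hypothetical example), generating every ordering of n items is inherently O(n!), since there are n! orderings to produce:

// Assumes: using System; using System.Collections.Generic;
// Prints every ordering of the items: there are n! of them,
// so the running time necessarily grows factorially
void PrintPermutations(List<int> remaining, List<int> chosen)
{
    if (remaining.Count == 0)
    {
        // One complete ordering; n! of these lines are printed in total
        Console.WriteLine(string.Join(", ", chosen));
        return;
    }
    for (int i = 0; i < remaining.Count; i++)
    {
        int item = remaining[i];
        remaining.RemoveAt(i);                // choose one item...
        chosen.Add(item);
        PrintPermutations(remaining, chosen); // ...and permute the rest
        chosen.RemoveAt(chosen.Count - 1);    // undo the choice (backtrack)
        remaining.Insert(i, item);
    }
}

// PrintPermutations(new List<int> { 1, 2, 3 }, new List<int>()); prints 3! = 6 lines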

Computation Comparisons
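
To put the bands above side by side, here is a rough count of computer steps for a few input sizes (the log-based and exponential figures are approximate):

n        O(1)   O(log n)   O(n)     O(n log n)   O(n^2)       O(2^n)
10       1      ~3         10       ~33          100          1,024
100      1      ~7         100      ~664         10,000       ~1.3 x 10^30
1,000    1      ~10        1,000    ~10,000      1,000,000    ~1.1 x 10^301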

Case Study: Fibonacci Sequence

The thirteenth-century Italian mathematician Leonardo Fibonacci described a unique and interesting number sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34...

These numbers can be generated by the following rule:

F(0) = 0, F(1) = 1, and F(n) = F(n-1) + F(n-2) for n > 1

This simply means that getting the next Fibonacci number F(n) in the sequence is done by adding up the two numbers before it, F(n-1) and F(n-2).

Let's have a glimpse of why this number series is interesting.

Consider the stunningly beautiful spiral staircase in one of the Vatican museums.

How did the architect of those stairs come up with that spiral with such precision? I'm positive that it wasn't just a matter of lying on a beach in Tuscany one fine evening, observing star formations in the sky.

Let's revisit the numbers from our good old friend Signore Fibonacci. If we carefully cut out square tiles using those numbers as dimensions and stack them together, we come up with an interestingly similar spiral shape.

Back to business... Let's try to write a method that calculates the nth Fibonacci number using C#.

int Fibonacci(int n)
{
    if (n == 0)
        return 0;
    if (n == 1)
        return 1;
    // Two recursive calls per invocation: this is where the blow-up comes from
    return Fibonacci(n - 1) + Fibonacci(n - 2);
}
//Fibonacci(9).Dump("Fib 9 "); using LINQPad returns 34

or if we use F#

let rec Fibonacci n =
    match n with
    | 0 -> 0
    | 1 -> 1
    | n -> Fibonacci (n - 1) + Fibonacci (n - 2)

// Call the function and print the result
printfn "(Fib 9) = %i" (Fibonacci 9)

Firstly, is our algorithm, i.e. our way of finding the nth Fibonacci number, correct? According to the given rule, it is. Secondly, let's scrutinize the number of computer steps involved in our solution. Let C(n) be the number of computer steps required to process Fibonacci(n).

If the term n is less than or equal to 1, the procedure halts almost immediately at the first or second return statement. That means the number of computer steps is C(n) ≤ 2 for n ≤ 1.

Time is usually measured in seconds, minutes, and so on, but for the sake of measuring algorithmic complexity, let's measure time as the number of computer steps needed for a calculation, and let T(n) be the time taken for the C(n) computer steps required.

Time taken for n less than or equal to 1 now becomes T(n) ≤ 2 for n ≤ 1

At this point, when n is less than 2, we are doing well in time or operational complexity.

When our term n is equal to 2 or more, we make two recursive calls to the Fibonacci method, as in the last return statement of our proposed solution above:

return Fibonacci(n - 1) + Fibonacci(n - 2);

and these will take C(n-1) and C(n-2) computer steps respectively, and therefore time T(n-1) and T(n-2).

There will be an additional 2 computer steps for checking the value of n, plus 1 step for the addition of Fibonacci(n-1) and Fibonacci(n-2), making a total of 3 additional computer steps. The total number of computer steps for terms n greater than 1 is therefore represented as:

C(n) = C(n-1) + C(n-2) + 3,

and the same holds for time taken: T(n) = T(n-1) + T(n-2) + 3 for n > 1.

If we look at the rule we defined for F(n) above, we will immediately notice that T(n) ≥ F(n), and that is not good news: the running time of our algorithm grows at least as fast as the Fibonacci numbers themselves, and trust me, Fibonaccis do grow pretty fast; F(n) is roughly proportional to 1.618^n, the golden ratio raised to the nth power.

T(n) is exponential in n, which implies that our algorithm is not recommended, as it is impractically slow except for very small values of n...

"Thundering typhoons and billions of blistering barnacles!" I'm sure that is how our coder would remark if he were Captain Haddock, and who would blame him? He does have a working solution, doesn't he?

In short, our naive recursive algorithm is correct but hopelessly inefficient. But can we do better and come up with a solution that is not exponential?

Let's look at how the recursive calls unfold in our solution above.
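
Sketching the top of the call tree for Fibonacci(n) makes the waste visible:

                           Fib(n)
                     /              \
               Fib(n-1)            Fib(n-2)
               /      \            /      \
         Fib(n-2)  Fib(n-3)  Fib(n-3)  Fib(n-4)
         /      \
   Fib(n-3)  Fib(n-4)   ...and so on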

We will immediately see that many of the computations are actually repeated. As an example, check how many times we compute F(n-3): three times! I'm sure we can do better than that. Let's try to keep each value of F(n) as soon as it has been computed in our solution:

int Fibonacci2(int n)
{
    if (n == 0) return 0;
    if (n == 1) return 1;
    int[] fibArray = new int[n + 1];
    fibArray[0] = 0;
    fibArray[1] = 1;
    // Build the sequence bottom-up, reusing the two values already computed
    for (int i = 2; i <= n; i++)
    {
        fibArray[i] = fibArray[i - 1] + fibArray[i - 2];
    }
    return fibArray[n]; // Fibonacci2(9) returns 34, matching the recursive version
}

How did we improve in this implementation, Fibonacci2, compared to the first one? Fibonacci2 takes far fewer computer steps C(n), and therefore less computation time T(n).

The body of the for loop consists of a single computer step and is executed n-1 times, so the number of computer steps used by Fibonacci2 is linear in n. That is a huge saving in running time and reduces our operational complexity from exponential O(2^n) to linear O(n).
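
As an aside, and only a sketch (not the Fibonacci3 teased below), the same saving can be achieved top-down by memoizing the original recursive version with a simple cache:

// Assumes: using System.Collections.Generic;
Dictionary<int, long> fibCache = new Dictionary<int, long>();

long FibonacciMemo(int n)
{
    if (n <= 1) return n;
    // Reuse a previously computed value instead of redoing the whole subtree
    if (fibCache.TryGetValue(n, out long cached)) return cached;
    long result = FibonacciMemo(n - 1) + FibonacciMemo(n - 2);
    fibCache[n] = result;
    return result;
}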

Can we do better with Fibonacci3 ? Yes, but we will leave that for a future article.

Happy problem solving to you...until the next article
