登录查看更多内容

On P, NP, Partitions and Interviews

Nabarun Mondal

Engineering Leader - Precision Shooter (Not that precise)

发布日期: 2017年4月3日

Roaming around various forums where IT interview questions are discussed, bears fruit sometimes! Like today : an example. This is what we are going to talk about.

Story : The Problem for Looters

Once upon a time there were 2 Dacoits. Once their loot was over, they pondered upon the problem of dividing their loots into two equal partitions. It ended up being a subtle problem:

Not all the loots have the same worth G = { 1, 11, 23, ... }
There were many looted goods : |G| ~ 100

Thus, the problem: how can one partitions the goods into two equal halves of same value! Obviously, they could not do it, and decided to kill each other, thus reducing the problem to 1 partition, hence immediately solvable. Not so much for a nice end.

But fear not, we are not here to mourn over the loss of both the dacoits. Powered by this modern device known as computer, we will be able to solve *every problem* ! Behold, AI and Machine Learning is here. All will be solved, because infinite number of humans are typing infinite code day in and day out. My friend Kiriti terms this as his infinite programmers theorem. But alas...reality differs entirely.

Being Formal

Given the dacoits are no more, Von Neumann in the heaven pondered upon the idea of combinatorics to solve this. The formal approach is as follows:

Given a multiset M ( a set that may contain repetition ) of integers ( note the absent of any sign check ) generate two partition (P1,P2) of M such that sum of all elements in one is equal to sum of all elements in the other: SUM(P1) = SUM(P2).

But being geeks of the newly found nation of Computer Science, we choose not to stop at 2 partitions, we insisted on variable: k number of partitions. What more, we found out that the exact equality is too boring and too mundane, because most probably we can never make them exactly equal. So what about sum of partitions which are close to one another?

Oh English, why you are so Terrible choice for Logic!

We probably have noticed this already, the last statement does not make any sense whatsoever. We want k partitions of a multiset. That part is certain. But what would be the property of the partitions?

"Given a sorted array with "n" elements, distributed them into "k" nearly equally weighing buckets"

Terrible, is not it? What is nearly equally? Ok, let's start being formal, by raising at least a thousand eyebrows (and our chance of flanking the interview in question) we propose:

Define sum(X) as the sum of all elements from the collection X

Define p_i as the partition indexed by i of a Multiset

Define abs( x ) as the absolute value of x

Define m = sum( abs( sum(p_i) - sum(p_j) )

Minimise this *m*.

When m is minimum, find the partitions p_i 's which caused it.

That has to be the answer. Well, we abused Einstein Notation a bit.

P is NOT (probably) NP

There are problems, then there are hard problems, and then there are harder problems. The problem class NP belongs to the HARD groups...well not entirely but you can take that. The key takeaway - given an optimisation problem is in NP, the validation of the solution also goes to NP. ( Thanks to Amleshwar to fix it here). You can also visit the Complexity Zoo - to see how much confusing things really are.

Take for example an optimisation problem that is clearly not in NP but P.

Find the maximum element of an array:

MAX = reduce(array) as def(inx,item,ctx,last){ last < item ? item : last }

To verify, we need to get the test result MAX, and compare it against all elements of the input array to check, in any case if the MAX is less than some element in the input array. Then, the MAX is wrong. One iteration. This validation is in P : Polynomial time:

// One iteration over elements of array is good enough to verify MAX
is_valid = ! exists( array ) where def(inx, item ) { item > MAX }

What about our current problem with the formulation? To verify that we need to, well, in the same way, iterate over all possible combinations of partition pairs. Somehow that does not smell like P, that actually smells like NP and moreover smells like CO-NP Complete. Software Testing is NP Hard and sometimes, not even computable ( how to test - RECAPTCHA? ).

Exactness Reduces Bounds

Hold heart, young padwans. There is really no need to switch to the Dark side immediately - because your loved problem can not be saved. If we look closely to the definition of *m* :

m = sum( abs( sum(p_i) - sum(p_j) )

It is obvious that the minimum m will be 0. Wow. That is something. That means, for exact solution, the validation is, well, all individual abs() pairs must produce 0! That is, all the sum(p_i) are equals to the average value! Now we are talking!

Define A := sum(M)/k

Verify that A is indeed an integer and the resulting sum(p_i) == A

This shows that force perhaps may be strong with P's. Indeed it is. Read carefully. There are subtle math involved, but the basic is - given there is an exact solution, the average holds the key. Even with an exact solution, sometimes even having exact solution can not be solved with any heuristics:

[771, 121, 281, 854, 885, 734, 486, 1003, 83, 62]

Why it happens? Because with average, the problem changes into subset sum problem! Just replace 0 with the average and you are good. Oh my! But then, there is a DP solution available. Cool.

Iterating by Counting via Encoding

What happens when the exact solution does not exist, by design? Anger, Fear follows. We have *surely* trespassed into the domain of NP. Albeit there are heuristics, the only way out is, iterating over all the partitions of the M, and minimising for m.

Suppose that is the problem to solve, to generate all possible partitions of a multiset M. How does one do it? Observe that to partition into k, we can imagine a number in base k whose length of digits is N, where N is the size of M.

If we go for k-base encoding for all numbers from 0 to k^N, such that all k digits are present in the resulting base k encoding, we have in fact generated all the partitions possible. How? By simply collecting for the partition 'i' the 'i'th digits indices from the multiset! Here is an example with k=2, and M = { -1,1,1,1,8, 10 } thus N = 6.

inx:  5 4  3  2  1  0 // indices 
no:   0 0  0  0  0  1 // min no such that all 2 digits are present!
arr: -1 1  1  1  8  10 // p0 = { -1,1,1,1,8} ; p2 = { 10 }

Of course we are cheating, in this case, there is a perfect solution. :-). But... well, we are generalising it never the less! Thus a solution can be found simply by:

// https://www.careercup.com/question?id=5725965217955840
k = 2
M = [ -1,1,1,1,8,10 ]
N = size(M) 
last = k ** N // the largest one k to the power N 
min = num('inf') // for min use infinity 
min_p = null // partitions store, as of now nothing 
// now we iterate - recursion is only for the divine
for ( n : [0:last] ){
  s = str(n,k) // representing number n in base k 
  // reject when all the k digits do not exist in base k rep  
  continue( size(set(s.value) ) != k )
  // left pad with zeros thus making the string N size  
  s = '0' ** ( N - size(s)) + s 
  // collect into k partitions ( change the char into int as key )
  p = mset( M ) as def(inx){ int(s[inx]) }
  // now generate m - calculating all pairs between 0,k-1
  m = sum ( comb( [0:k] , 2 ) ) as def(inx, pair){ 
    // size(x) is abs(x) 
    size( sum(p[pair[0]]) - sum(p[pair[1]])) 
  }
  // search for min m 
  if ( m < min ){  
   min = m
   min_p = p
  }
}
// wrap it up...
println(min)
println(min_p)
// Finally, Galaxy has peace! In less than 42 lines, of course

And finally, I am not going to ask for this in an coding interview, unless one is hiring a researcher, save alone poor SDE2s. Practical problems are non trivial! We end by solving the other partition problem 771 121 281 854 885 734 486 1003 83 62 - the code above generates :

{0=[771, 281, 854, 734], 1=[121, 885, 486, 1003, 83, 62]}

easily verifiable in Theta(n).

Menno Mafait

Thinknowlogy is the world's only naturally intelligent knowledge technology, based on Laws of Intelligence that are naturally found in the human language. Open souce software.

7 年

In regard to "AI and Machine Learning is here": Scientists fail to define intelligence in a natural way (as a set of natural laws). Therefore, AI is limited to engineering: specific solutions to specific problems. And machine learning is limited to (perform tasks based on) pattern recognition. Here's a challenge that scientists fail to implement: I defy anyone to beat the simplest results of my Controlled Natural Language reasoner in a generic way, from natural language, through algorithms, to natural language: https://mafait.org/challenge/.

1 次回应

查看更多评论

要查看或添加评论，请登录

Nabarun Mondal的更多文章

Created by a Machine : On Machines and Creativity

2017年5月17日

Created by a Machine : On Machines and Creativity

We are seeing a lot of petty discussions about machines capable of doing anything and everything from Quora to LinkedIn…

5 条评论
Implication, Probability, Logic : IPL

2017年4月24日

Implication, Probability, Logic : IPL

IPL is the most important thing ever happened since India became independent. No matter what the critics say, it is the…

3 条评论
Programming, Pragmatism, Nirvana

2017年4月13日

Programming, Pragmatism, Nirvana

Being Agile is one of the most persisting fashion of today. While the agile manifesto ( I sincerely ask you to read it…
Re-Setting Expectations On Testing

2017年2月28日

Re-Setting Expectations On Testing

Pledge - The Lamenting I came along this lament, not so long time ago, by a very senior Engineering Manager working for…
Currency: Puzzles into Backtracking

2016年12月17日

Currency: Puzzles into Backtracking

Money is *the* prime mover for any society. Apparently, with only a bit of push, monkeys got introduced to currency and…
Seldom been KISSED?

2016年11月15日

Seldom been KISSED?

Power of Being Simple Much power lies in simple, stupid reasoning. If one pause to study the epic The Selfish Gene, one…

1 条评论
Performance Stats - Why Average is Wrong

2016年3月13日

Performance Stats - Why Average is Wrong

This came up recently in a discussion. The idea was, is anything average can be used as a metric of something as…

13 条评论
Keep it simple, because we are stupid

2016年3月12日

Keep it simple, because we are stupid

Simplicity, has it's advantage. As Einstein said , in effect, "that everything should be as simple as it can be, but…
On Vader, Valiance and the Art Of Leadership

2016年2月6日

On Vader, Valiance and the Art Of Leadership

Star Wars Well, there are only two sorts of people who watch movies. Those who watched Star Wars, and those who will…

2 条评论
Keyword Driven Test Automation, Works?

2016年1月29日

Keyword Driven Test Automation, Works?

Onto Keywords Industry is abuzz about something we experimented a decade back - called the Keyword Driven Testing…

12 条评论

See all articles

On P, NP, Partitions and Interviews

Nabarun Mondal

Engineering Leader - Precision Shooter (Not that precise)

Story : The Problem for Looters

Being Formal

Oh English, why you are so Terrible choice for Logic!

P is NOT (probably) NP

Exactness Reduces Bounds

Iterating by Counting via Encoding

Nabarun Mondal的更多文章

社区洞察

其他会员也浏览了

KAYA Global 'Inside Voices' Series: Kanchana Wickramasinghe, VP of Engineering

The Zen Interview for Data Scientists

Algorithmic Insights: A Deep Dive into Six Fundamental Problems.

Part 5: ML Interview Preparation – Key Skills and Competencies

Investigating quick sort partitioning

ChatGPT & Human-answered Collection of 50 Toughest Interview Questions

"Master Regression Interview Questions: Crush Assumptions, Unravel Significance, and Converge with Confidence!"

Socrates Interviews A.I.

Reverse Nodes in K-Groups: Can You Solve This Tricky Linked List Interview Question?

5 Questions To Prepare You For Your Next Data Science Interview

Story : The Problem for Looters

Being Formal

Oh English, why you are so Terrible choice for Logic!

P is NOT (probably) NP

Exactness Reduces Bounds

Iterating by Counting via Encoding

Nabarun Mondal的更多文章

Created by a Machine : On Machines and Creativity

Implication, Probability, Logic : IPL

Programming, Pragmatism, Nirvana

Re-Setting Expectations On Testing

Currency: Puzzles into Backtracking

Seldom been KISSED?

Performance Stats - Why Average is Wrong

Keep it simple, because we are stupid

On Vader, Valiance and the Art Of Leadership

Keyword Driven Test Automation, Works?

社区洞察

其他会员也浏览了

KAYA Global 'Inside Voices' Series: Kanchana Wickramasinghe, VP of Engineering

The Zen Interview for Data Scientists

Algorithmic Insights: A Deep Dive into Six Fundamental Problems.

Part 5: ML Interview Preparation – Key Skills and Competencies

Investigating quick sort partitioning

ChatGPT & Human-answered Collection of 50 Toughest Interview Questions

"Master Regression Interview Questions: Crush Assumptions, Unravel Significance, and Converge with Confidence!"

Socrates Interviews A.I.

Reverse Nodes in K-Groups: Can You Solve This Tricky Linked List Interview Question?

5 Questions To Prepare You For Your Next Data Science Interview