On P, NP, Partitions and Interviews
Roaming around various forums where IT interview questions are discussed, bears fruit sometimes! Like today : an example. This is what we are going to talk about.
Story : The Problem for Looters
Once upon a time there were 2 Dacoits. Once their loot was over, they pondered upon the problem of dividing their loots into two equal partitions. It ended up being a subtle problem:
- Not all the loots have the same worth G = { 1, 11, 23, ... }
- There were many looted goods : |G| ~ 100
Thus, the problem: how can one partitions the goods into two equal halves of same value! Obviously, they could not do it, and decided to kill each other, thus reducing the problem to 1 partition, hence immediately solvable. Not so much for a nice end.
But fear not, we are not here to mourn over the loss of both the dacoits. Powered by this modern device known as computer, we will be able to solve *every problem* ! Behold, AI and Machine Learning is here. All will be solved, because infinite number of humans are typing infinite code day in and day out. My friend Kiriti terms this as his infinite programmers theorem. But alas...reality differs entirely.
Being Formal
Given the dacoits are no more, Von Neumann in the heaven pondered upon the idea of combinatorics to solve this. The formal approach is as follows:
Given a multiset M ( a set that may contain repetition ) of integers ( note the absent of any sign check ) generate two partition (P1,P2) of M such that sum of all elements in one is equal to sum of all elements in the other: SUM(P1) = SUM(P2).
But being geeks of the newly found nation of Computer Science, we choose not to stop at 2 partitions, we insisted on variable: k number of partitions. What more, we found out that the exact equality is too boring and too mundane, because most probably we can never make them exactly equal. So what about sum of partitions which are close to one another?
Oh English, why you are so Terrible choice for Logic!
We probably have noticed this already, the last statement does not make any sense whatsoever. We want k partitions of a multiset. That part is certain. But what would be the property of the partitions?
"Given a sorted array with "n" elements, distributed them into "k" nearly equally weighing buckets"
Terrible, is not it? What is nearly equally? Ok, let's start being formal, by raising at least a thousand eyebrows (and our chance of flanking the interview in question) we propose:
Define sum(X) as the sum of all elements from the collection X
Define p_i as the partition indexed by i of a Multiset
Define abs( x ) as the absolute value of x
Define m = sum( abs( sum(p_i) - sum(p_j) )
Minimise this *m*.
When m is minimum, find the partitions p_i 's which caused it.
That has to be the answer. Well, we abused Einstein Notation a bit.
P is NOT (probably) NP
There are problems, then there are hard problems, and then there are harder problems. The problem class NP belongs to the HARD groups...well not entirely but you can take that. The key takeaway - given an optimisation problem is in NP, the validation of the solution also goes to NP. ( Thanks to Amleshwar to fix it here). You can also visit the Complexity Zoo - to see how much confusing things really are.
Take for example an optimisation problem that is clearly not in NP but P.
Find the maximum element of an array:
MAX = reduce(array) as def(inx,item,ctx,last){ last < item ? item : last }
To verify, we need to get the test result MAX, and compare it against all elements of the input array to check, in any case if the MAX is less than some element in the input array. Then, the MAX is wrong. One iteration. This validation is in P : Polynomial time:
// One iteration over elements of array is good enough to verify MAX
is_valid = ! exists( array ) where def(inx, item ) { item > MAX }
What about our current problem with the formulation? To verify that we need to, well, in the same way, iterate over all possible combinations of partition pairs. Somehow that does not smell like P, that actually smells like NP and moreover smells like CO-NP Complete. Software Testing is NP Hard and sometimes, not even computable ( how to test - RECAPTCHA? ).
Exactness Reduces Bounds
Hold heart, young padwans. There is really no need to switch to the Dark side immediately - because your loved problem can not be saved. If we look closely to the definition of *m* :
m = sum( abs( sum(p_i) - sum(p_j) )
It is obvious that the minimum m will be 0. Wow. That is something. That means, for exact solution, the validation is, well, all individual abs() pairs must produce 0! That is, all the sum(p_i) are equals to the average value! Now we are talking!
Define A := sum(M)/k
Verify that A is indeed an integer and the resulting sum(p_i) == A
This shows that force perhaps may be strong with P's. Indeed it is. Read carefully. There are subtle math involved, but the basic is - given there is an exact solution, the average holds the key. Even with an exact solution, sometimes even having exact solution can not be solved with any heuristics:
[771, 121, 281, 854, 885, 734, 486, 1003, 83, 62]
Why it happens? Because with average, the problem changes into subset sum problem! Just replace 0 with the average and you are good. Oh my! But then, there is a DP solution available. Cool.
Iterating by Counting via Encoding
What happens when the exact solution does not exist, by design? Anger, Fear follows. We have *surely* trespassed into the domain of NP. Albeit there are heuristics, the only way out is, iterating over all the partitions of the M, and minimising for m.
Suppose that is the problem to solve, to generate all possible partitions of a multiset M. How does one do it? Observe that to partition into k, we can imagine a number in base k whose length of digits is N, where N is the size of M.
If we go for k-base encoding for all numbers from 0 to k^N, such that all k digits are present in the resulting base k encoding, we have in fact generated all the partitions possible. How? By simply collecting for the partition 'i' the 'i'th digits indices from the multiset! Here is an example with k=2, and M = { -1,1,1,1,8, 10 } thus N = 6.
inx: 5 4 3 2 1 0 // indices
no: 0 0 0 0 0 1 // min no such that all 2 digits are present!
arr: -1 1 1 1 8 10 // p0 = { -1,1,1,1,8} ; p2 = { 10 }
Of course we are cheating, in this case, there is a perfect solution. :-). But... well, we are generalising it never the less! Thus a solution can be found simply by:
// https://www.careercup.com/question?id=5725965217955840
k = 2
M = [ -1,1,1,1,8,10 ]
N = size(M)
last = k ** N // the largest one k to the power N
min = num('inf') // for min use infinity
min_p = null // partitions store, as of now nothing
// now we iterate - recursion is only for the divine
for ( n : [0:last] ){
s = str(n,k) // representing number n in base k
// reject when all the k digits do not exist in base k rep
continue( size(set(s.value) ) != k )
// left pad with zeros thus making the string N size
s = '0' ** ( N - size(s)) + s
// collect into k partitions ( change the char into int as key )
p = mset( M ) as def(inx){ int(s[inx]) }
// now generate m - calculating all pairs between 0,k-1
m = sum ( comb( [0:k] , 2 ) ) as def(inx, pair){
// size(x) is abs(x)
size( sum(p[pair[0]]) - sum(p[pair[1]]))
}
// search for min m
if ( m < min ){
min = m
min_p = p
}
}
// wrap it up...
println(min)
println(min_p)
// Finally, Galaxy has peace! In less than 42 lines, of course
And finally, I am not going to ask for this in an coding interview, unless one is hiring a researcher, save alone poor SDE2s. Practical problems are non trivial! We end by solving the other partition problem 771 121 281 854 885 734 486 1003 83 62 - the code above generates :
{0=[771, 281, 854, 734], 1=[121, 885, 486, 1003, 83, 62]}
easily verifiable in Theta(n).
Thinknowlogy is the world's only naturally intelligent knowledge technology, based on Laws of Intelligence that are naturally found in the human language. Open souce software.
7 年In regard to "AI and Machine Learning is here": Scientists fail to define intelligence in a natural way (as a set of natural laws). Therefore, AI is limited to engineering: specific solutions to specific problems. And machine learning is limited to (perform tasks based on) pattern recognition. Here's a challenge that scientists fail to implement: I defy anyone to beat the simplest results of my Controlled Natural Language reasoner in a generic way, from natural language, through algorithms, to natural language: https://mafait.org/challenge/.