
Balancing Partitions

21 January, 2011

I never liked Dynamic Programming.

In fact, in programming contests, where the pressure is always on winning, I would feel immensely stupid for not getting it. Anyway, after reading up on it in the CLR, I ended up understanding a bit more about how it works.

This post is about a problem given in this month’s USACO contest (Silver Division), called divgold, where you are tasked with solving the Balanced Partition problem. It’s a fairly well-known example, but I was unfamiliar with it. You can read all the problems by viewing the January contest on the USACO Contestgate. You need to register though.

Balanced-Partition

You are given N numbers, let’s call them a_1, a_2, \ldots, a_N, and you must find the number of ways to partition these numbers into two groups a_{11}, a_{12}, \ldots, a_{1k} and a_{21}, a_{22}, \ldots, a_{2l} such that:

  • l + k = N;
  • |S_1 - S_2| is minimized, where S_1 = \sum_{i=1}^{k}a_{1i} and S_2 = \sum_{j=1}^{l}a_{2j}.

And you must find the minimum difference itself, m = |S_1 - S_2|.

Let’s make some notes before actually talking about the solution (which – surprise! – involves dynamic programming). First of all, it is NP-Complete. This can be of course proven quite easily, and we will actually do it since it contains the key idea that helps solve the problem.

Warning! Theoretical stuff ahead. If you just want to learn how to solve the problem, read the official analysis!

So, when proving that Balanced-Partitions \in NP-Complete we’ll first prove that Balanced-Partitions \in NP by devising a non-deterministic algorithm that solves it.

Note however that the problem wants to find the number of ways to partition the numbers. It is not a decision problem. We’ll instead redefine it to ask whether you can get a difference of exactly m between the two sets. So we’re completely ignoring the number of ways to get that difference, we’re only interested in whether getting it is possible or not. We’ll see how to actually compute the number of ways later.

Balanced-Partitions \in NP

We’ll use choice, success and fail to describe it. I think that the simplest way to build the actual partition is to generate all possible 0/1 assignments and get the sums of the two resulting sets of numbers (those that were given a 0, and those that were given a 1).


M-Balanced-Partition(A, N, m):
  S1 = S2 = 0
  for i = 1 .. N:
    if choice(0, 1) == 1:
      S1 += A[i]
    else:
      S2 += A[i]
  if abs(S1 - S2) == m:
    success
  fail

So, what this simple algorithm does is assign each element of A a value of either 0 or 1. If A[i] is 1, it’s in the first set, otherwise, it’s in the second set. By generating all of the possibilities using choice we will clearly determine whether or not obtaining the difference we need is possible.

Each non-deterministic branch of the algorithm runs in \Theta(N) time, since it partitions the array A in one go (lines 3-7). Since \Theta(N) is a polynomial, we conclude that Balanced-Partitions \in NP.
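A deterministic machine has no choice primitive, but it can simulate one by trying every 0/1 assignment in turn, at the cost of 2^N iterations. A minimal sketch in Python of that simulation (the function name m_balanced_partition_bruteforce is made up for illustration, it is not from the contest or the official analysis):

```python
from itertools import product


def m_balanced_partition_bruteforce(a, m):
    """Return True if some 0/1 assignment of a gives |S1 - S2| == m."""
    for bits in product((0, 1), repeat=len(a)):
        # bits[i] == 1 puts a[i] in the first set, 0 puts it in the second.
        s1 = sum(x for x, b in zip(a, bits) if b == 1)
        s2 = sum(x for x, b in zip(a, bits) if b == 0)
        if abs(s1 - s2) == m:
            return True  # this branch would have ended in "success"
    return False  # every branch ended in "fail"
```

Each pass through the loop body is the \Theta(N) work of one non-deterministic branch; the exponential cost comes only from enumerating the branches.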

Balanced-Partitions \in NP-Hard

Let’s suppose that we want to know whether we can get a certain difference, m. Let’s first note that if the two sets are S_1, S_2 (note: we’ll also use S_1, S_2 when referring to the sums themselves as well!) and we call S_1 the smaller sum, S_2 - S_1 = m and S_1 + S_2 = S so therefore S_2 = \frac{S+m}2 and S_1 = \frac{S-m}2. In fact, no matter what we call these two sets, if we can form a set of sum \frac{S+m}2, we’ll clearly have obtained the other set as well (it’s this set’s complement!).

So, in fact, finding whether or not Balanced-Partitions has a solution is more or less equivalent to finding whether we can get a certain sum, let’s call it Q. This, however, is the Q-Sums problem (better known as Subset Sum), which is known to be NP-Complete. So, we’ll try reducing Q-Sums to Balanced-Partitions. If we manage that, then Balanced-Partitions \in NP-Hard: since Q-Sums is NP-Hard, K \le_p Q-Sums for any problem K \in NP, and we will have shown that Q-Sums \le_p Balanced-Partitions. By the transitive nature of the polynomial reduction relationship (that’s what the funny \le_p symbol is), it follows that K \le_p Balanced-Partitions for every K \in NP, which is exactly what being NP-Hard means.

Warning! Possible rambling ahead. Feel free to skip the following paragraph (Also, the picture is only chuckle-worthy if visiting from Facebook!) : )

Well, these people balance stuff all the time!

It’s important to note that the key element is reducing a problem that is already known to be NP-Hard to our problem. Intuitively, that means our problem is at least as hard as the original one. If we did it in the opposite direction, it would be meaningless! Why? Well, consider the problem (not a decision problem, I know, but I can’t think of a better example, maybe a helpful comment someone?) of sorting an array of N integers. This clearly has lots of great polynomial time algorithms that solve it (my favorite being quicksort with a randomized pivot). But we could equally well sort an array of numbers by generating all possible permutations of the numbers and outputting the first one we generate that is sorted. This would run in exponential time though, as generating the permutations of a set is in EXPTIME (to be taken with a grain of salt, since it’s not actually a decision problem). So, reducing a problem we know nothing about to a difficult problem achieves nothing.

Ok, so, if you’re still following at this point, let’s get on with it and try to do the reduction itself. We must prove that we can solve Q-Sums using Balanced-Partitions and that there can be no false positives or false negatives, i.e. that there is a polynomial time function F:I\rightarrow I', where I is the set of inputs for Q-Sums and I' is the set of inputs for Balanced-Partitions, such that for every input i \in I, Q-Sums(i)=1 \Leftrightarrow Balanced-Partitions(F(i))=1.

If we want to find a set of sum Q, we’ve seen that a difference of m existing is equivalent to there being a set of sum \frac{S-m}2. So, we’ll just map Q\rightarrow \frac{S-m}2 \Rightarrow m \leftarrow |S-2Q| and solve Balanced-Partitions for that value of m (the absolute value covers the case Q > \frac{S}2, where Q is the larger of the two sums). It’s pretty clear that the two problems are equivalent, I would say; to prove it, we just go “back and forth” through the relationship between Q and m.
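To make the mapping concrete, here is a small Python sketch that checks the equivalence by brute force on tiny inputs. The three function names are hypothetical, and taking the absolute value |S - 2Q| is my reading of the mapping for the case where Q is larger than half the total:

```python
from itertools import combinations


def has_subset_with_sum(a, q):
    """Q-Sums by brute force: is there a subset of a that sums to q?"""
    return any(sum(c) == q
               for r in range(len(a) + 1)
               for c in combinations(a, r))


def has_partition_with_difference(a, m):
    """Balanced-Partitions (decision version) by brute force."""
    total = sum(a)
    # If one side sums to s, the other sums to total - s.
    return any(abs(total - 2 * sum(c)) == m
               for r in range(len(a) + 1)
               for c in combinations(a, r))


def reduce_q_sums(a, q):
    """Map a Q-Sums instance (a, q) to a Balanced-Partitions instance (a, m)."""
    return a, abs(sum(a) - 2 * q)
```

The reduction itself only needs one pass over the array to compute S, so it is clearly polynomial. The two brute-force solvers are exponential and are only there so that the claim Q-Sums(i) = 1 \Leftrightarrow Balanced-Partitions(F(i)) = 1 can be tried out on small arrays.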

Now, we’re finally content. Balanced-Partition is both in NP and in NP-Hard, and we can finally conclude that it is NP-Complete.

Back to the initial problem

Why go through all the trouble proving that it’s NP-Complete anyway? Well, as a nifty exercise for one, but it also provides the essential insight that can help solve the initial problem from the USACO contest, divgold. There, we were tasked with finding the minimum difference as well as the number of ways to obtain it. Well, this illustrates the connection between this kind of optimization problem and its associated decision problem pretty clearly. To find the minimum difference, call it d, we’ll essentially start with d = 0 and keep increasing the difference until we find that Balanced-Partition(A, n, d) = 1.

How far are we supposed to go anyway? Well, for one, the difference between the two sets can be at most S, for a very loose upper bound (I really think it should be something much tighter, like perhaps the maximum value from A, but I didn’t manage to prove this, not at this late hour anyway; I’m not even sure whether or not it’s true, to be honest). Anyway, the point is we’ll stop at some point, after at most S + 1 steps.

How do we find out whether we can get a particular difference though? Well, here the insight from the NP-Hardness proof comes in handy: the transformation between Q and m can be inverted (it is bijective as long as Q \le \frac{S}2), and that means, intuitively, that we can also use it to solve Balanced-Partition using Q-Sums. In plain English, we want to know whether it’s possible to obtain a sum Q = \frac{S-d}2. What would happen if S-d is odd? Well, we’d certainly not be able to get a fractional sum out of integers. That doesn’t bother us one bit though, as we’ll soon see.

This is where dynamic programming comes in. Q-Sums is a problem with a pseudo-polynomial time algorithm. That simply means that there exists an algorithm whose running time is polynomial in the numeric value of the input. That numeric value is, however, exponential in the number of bits needed to write it down!

(neat fact: NP-Complete problems that can be solved by such algorithms are called weakly NP-Complete. These are, as far as I understand, the only NP-Complete problems where dynamic programming can be applied. If, however, the numbers in the array A had been real numbers, such a solution would not have been possible. And by real numbers I mean actual real numbers, not IEEE 754 floating point numbers, which are in fact rational.)

To use dynamic programming, we need to define an optimal sub-solution’s structure. This structure will be used to determine the larger sub-solutions, until we finally have the whole solution! If I mess up the explanation, feel free to use the extremely helpful tutorials of Brian Dean. He actually has many more dynamic programming examples there you might find interesting.

So, we’re going to solve Q-Sums. We need the following structure (usually a matrix that adequately describes what exactly a sub-problem is with a couple of indices):

N[i][j] = the number of ways to get the sum j using the first i numbers. To actually solve it, we need a recurrence relation. We need to think how to get from one state to the next, either with a top-down or a bottom-up approach (i.e. either forward or backward). Let’s look “back”, at N[i-1][j]. We’re at the i-th number and we can either choose to add it to the sum or not.

If we don’t add it, then we need to get the same sum j, without the current number, A[i], so we add N[i-1][j]. Otherwise, to get j=(j-A[i])+A[i] in total, we’ll need N[i-1][j-A[i]]:

Recursion formula – N[i][j]=N[i-1][j]+N[i-1][j - A[i]] (the second term only counts when j \ge A[i])
The base case being – N[0][0]=1, and N[0][j]=0 for every j > 0
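The recurrence can be sketched in Python as a plain table fill. This assumes the numbers are non-negative integers; the name count_subset_sums_table and the 0-based list a are mine, with a[i - 1] playing the role of A[i]:

```python
def count_subset_sums_table(a):
    """ways[i][j] = number of subsets of the first i numbers that sum to j."""
    n, total = len(a), sum(a)
    ways = [[0] * (total + 1) for _ in range(n + 1)]
    ways[0][0] = 1  # base case: only the empty subset, which sums to 0
    for i in range(1, n + 1):
        x = a[i - 1]  # A[i] in the 1-based notation of the text
        for j in range(total + 1):
            ways[i][j] = ways[i - 1][j]  # leave A[i] out
            if j >= x:
                ways[i][j] += ways[i - 1][j - x]  # put A[i] in
    return ways
```

For example, count_subset_sums_table([1, 2, 3])[3] is [1, 1, 1, 2, 1, 1, 1]: the sum 3 can be reached in two ways ({3} and {1, 2}) and every other sum from 0 to 6 in exactly one. The table has (n + 1)(S + 1) cells, which is where the pseudo-polynomial running time comes from.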

Note that we actually count the number of ways to get the desired sum here. It could have been the same decision problem we talked about if we performed the same calculations modulo 2 (or used a Boolean data type and interpreted the ‘+’ operation to mean logical OR).

At any rate, for an array A of length n, the answer to Q-Sums will be in N[n][Q]. We want to check \frac{S-d}2 in a loop starting from d=0. Also, S-d needs to be even, so that \frac{S-d}2 is an integer.

for (j = 0; j <= s; ++j)
  /* (s - j) must be even for (s - j)/2 to be a valid sum */
  if ((s - j) % 2 == 0 && N[n][(s - j)/2] > 0)
    break;

And so, the minimum difference will be j and the number of ways to get it will be N[n][(s - j)/2].

Final remarks

You can in fact optimize this solution space-wise a lot (although this completely escaped my mind in the contest and I didn’t get max for this problem 😦). Notice that we don’t use all of the rows in the matrix at once, we only use the last two, and really, we only need one row since we’re only interested in N[n][\ldots] and all of the updates can be done in place.

So, we can finally write a much better recurrence:

N[j]=N[j]+N[j - A[i]] where N[0] = 1

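where N[j] is the number of ways to get a subset of sum j. For the in-place update to be correct, j has to run downwards, from the largest sum down to A[i], so that N[j - A[i]] still holds the value from the previous row when we read it.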
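Putting the one-row recurrence and the search for the smallest difference together, a minimal Python sketch could look like this. It is not the linked implementation: the function name min_difference_and_count is hypothetical, and non-negative integers are assumed:

```python
def min_difference_and_count(a):
    """Return (d, ways): the smallest |S1 - S2| and the number of subsets
    whose sum is (S - d) / 2. Assumes non-negative integers."""
    total = sum(a)
    ways = [0] * (total + 1)
    ways[0] = 1  # the empty subset
    for x in a:
        # Go downwards so ways[j - x] is still the previous row's value.
        for j in range(total, x - 1, -1):
            ways[j] += ways[j - x]
    for d in range(total + 1):
        if (total - d) % 2 == 0 and ways[(total - d) // 2] > 0:
            return d, ways[(total - d) // 2]
```

For [2, 5] this returns (3, 1), and for [1, 2, 3] it returns (0, 2). Note that the second value counts subsets with the given sum, so for d = 0 a subset and its complement are both counted. The search always terminates, because d = S with the empty subset is always a valid (if useless) answer.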

You can find all of the implementations here.

Also, dear reader, I need your help with a couple of questions:

  1. What happens (in the NP-Hardness proof) when Balanced-Partition is taken to require the partitioning into two subsets whose sums have an absolute difference of at most m? Basically replace the equality with an inequality.
    I was thinking about calling Balanced-Partition twice, with m and m-1, to see if you get different answers. If you do, then you might not get the Q in the Q-Sum you were looking for. But does that mean it doesn’t exist?
  2. Is it true that the maximum difference between the two partitions in the optimum case can be no larger than the maximum number in the array A?
    I feel that this is true, but can’t think of any good reason right now. Assistance appreciated 😉
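For question 2, here is a sketch of one possible greedy argument, assuming all numbers are non-negative: always put the next number into the set that currently has the smaller sum. If the difference before adding x is D, the difference afterwards is |D - x| \le \max(D, x), so it never grows past the largest number seen so far. The optimal partition can only be better than this greedy one. The function name greedy_difference is made up, and the code only illustrates the invariant, it does not prove it:

```python
def greedy_difference(a):
    """Difference reached by always adding to the currently smaller set."""
    s1 = s2 = 0
    for x in a:
        if s1 <= s2:
            s1 += x
        else:
            s2 += x
    return abs(s1 - s2)
```

Comparing greedy_difference(a) against max(a) on a batch of random non-negative arrays is a quick sanity check of the bound.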
3 Comments
  1. dafis
    29 January, 2011 12:38 AM

    Re question 2: Yes, it’s true (assuming all numbers are non-negative, figure out what has to change if negative numbers are allowed).
    Suppose A is sorted in decreasing order. Put the first (largest) number in set 1.
    Put the next numbers in set 2 until either A is exhausted or the sum of set 2 exceeds the sum of set 1. In the first case, the difference is at most A[0], otherwise it’s at most A[k] if that’s the last number you put in set 2.
    If there are numbers left, put the next numbers in set 1 until either A is exhausted or the sum of set 1 exceeds the sum of set 2.
    If there are still numbers left, switch again and put the next numbers in set 2 until you-know-what.
    Iterate until A is exhausted. If the largest number is at most half the sum, the minimal difference is at most the smallest number in the array.

  2. dafis
    29 January, 2011 12:41 AM

    Arrgh, not the smallest in the array; the smallest number you put in the set with the larger sum, d’oh.

  3. danf
    30 January, 2011 7:13 PM

    Good idea. I see how this works for a 3 number set, but I’ve failed to prove that it works on a more general case with induction.
    I’ll hopefully look into it at some point. For 3 numbers though, the resulting difference is indeed smaller than the smallest possible element in the array. It’s like interspersing pluses and minuses and asking how to do that so you get the smallest difference.

    This is the actual greedy algorithm for an approximation I think. If it’s true for this case, it surely applies for the optimal one, although I still need to figure out how to actually prove it.

    Thanks for the help dafis!

    Oh, and I gave up on Criterion for measuring the performance. It turns out that I just want the runtime of an operation and Data.Time can easily do that 🙂
