Solve Knapsack Problem with Heuristics in R

In this post I’ll show a way how to solve the Knapsack Problem applying a simple greedy improvement heuristic and how to implement it in R.

What is the Knapsack Problem?

On Wikipedia, you find the following definition: “The knapsack problem is a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.”

The Knapsack Problem Instance ‘KS_P08.txt’ can be downloaded here. The format is as follows:

  • Number of objects (n)
  • Total capacity of Knapsack
  • The weight of the n objects
  • An empty line
  • The value of the n objects

The complete R script explained in the following sections can be downloaded from my github account.

How to adress the Problem?

How would you start implementing a solution to fill a knapsack with a given capacity, in order to construct proposals for good solutions having a good total value of the items packed?

Heuristic 1: The first idea one could come up with would be to sort the items descending by value, then start with the first item, if it fits, pack it, then check the next in the list and so on, until no further item can be packed.

Let’s try to solve the following Knapsack instance. Let’s say we have a total capacity of 10 kilos and 4 items with the following weights and values:

  • Item 1: Weight 11 kg, Value 10 EUR
  • Item 2: Weight 9 kg, Value 9 EUR
  • Item 3: Weight 2 kg, Value 8 EUR
  • Item 4: Weight 3 kg, Value 7 EUR

Finding: With this instance we immediately realize a downside of our first try Heuristic 1. If we were to solve the knapsack instance above, we would pack item 2 and stop with a knapsack of a weight of 9 kg and a total value of 9 EUR.

The problem of the first heuristic is obviously that it considers the item’s value but it doesn’t consider weight. Now let’s try a second heuristic, considering value and weight:

Heuristic 2: Instead of sorting the items descending only by value, we sort them descending by value per weight. We check the first item and if it fits, pack it, then check the next in the list and so on, until no further item can be packed.

Let’s try to solve the following Knapsack instance. We again have a total capacity of 10 kilos and 5 items with the following values:

  • Item 1: Weight 6 kg, Value 36 EUR, Value per Weight 6 EUR / kg
  • Item 2, 3, 4: Weight 3 kg, Value 15 EUR, Value per Weight 5 EUR / kg

Finding: Obviously our second heuristic is better than the first one. However, with the 2nd instance above we can detect a drawback of our Heuristic 2. If we were to solve the knapsack instance above, we would pack item 1 and then stop with a stop with a knapsack of a weight of 6 kg and a total value of 36 EUR. This is suboptimal, since if we didn’t greedily pick the first item with the best value/weight ratio, we could have realized a total value of 45 EUR and a total weight of 9 kg by packing the items 2, 3 and 4 instead.

Heuristic 3: We can try to work arkound this issue by determining the maximum between the value obtained by Heuristic 2 and the most valuable item fitting into the empty knapsack.

Now let’s start the implementation:

#' return either best greedy combination of objects with highest value per weight 
#' or the most valuable single item fitting into the knapsack. 
#'
#' @param knapsack 
#' @param capacity 
#'
#' @return the solution
mod_greedy <- function(knapsack, capacity) {
  # 1) "Smarter greedy"
  # sort by value per weight
  knapsack <- knapsack[order(knapsack$vpw, decreasing = TRUE),]
  # now fill knapsack with all items taken in this sequence
  # and still fitting into the knapsack
  packed_items <- c()
  packed_weight <- 0
  packed_value <- 0
  for (id in 1:nrow(knapsack)) {
    item <- knapsack[id, ]
    left_capacity <- capacity - packed_weight
    # check if next item in the sequence still fits into knapsack
    if (left_capacity > 0 && item$weight <= left_capacity) {
      packed_items <- c(packed_items, item$item_id)
      packed_weight <- packed_weight + item$weight
      packed_value <- packed_value + item$value
    } 
  }

  # 2) Addition to "modified greedy"
  # determine most valuable single item fitting into knapsack
  knapsack <- knapsack[order(knapsack$value, decreasing = TRUE),]
  best_single_item <- 0
  best_single_value <- 0
  best_single_weight <- 0
  for (id in 1:nrow(knapsack)) {
    item <- knapsack[id, ]
    # check if next item in the sequence still fits into knapsack
    if (item$weight <= capacity) {
      best_single_item <- item$item_id
      best_single_value <- item$value
      best_single_weight <- item$weight
      break
    } 
  }
  
  # now return best configuration
  if (best_single_value < packed_value) {
    ret_val <- list(items_packed = packed_items, total_value = packed_value,
                    packed_weight = packed_weight)
  } else {
    ret_val <- list(items_packed = c(best_single_item), total_value = best_single_value,
                    packed_weight = best_single_weight)
  }
  return(ret_val)
}

# load knapsack instance KSP8
dat <- read.table(file = "KS_P08.txt")
n <- dat[1,1]
capacity <- dat[2,1]
weights <- dat[3:(n+2),1]
values <- dat[(n+3):nrow(dat),1]

knapsack <- data.frame(item_id=1:n, weight=weights, value=values, 
                       vpw=values/weights)

# solve it: mod greedy
a <- mod_greedy(knapsack = knapsack, capacity = capacity)
print(a)

Method mod_greedy does what we have described as Heuristic 3. It first identifies the best configuration of items to be packed based on the itemlist sorted by value per weight in descending order. Then, in a second step, it determines the single most valuable item and compares its value with the configuration found in first place and returns the best solution of both.

How good is the Solution we’ve found?

If you run the R script above with the problem instance KS_P08.txt, you will get the following output:

$items_packed
 [1]  6 22 24 16 11  1 10  2 20 17  4 23 13  7

$total_value
[1] 13415886

$packed_weight
[1] 6323699

One of the most important key questions to ask when developing heuristics is how well the solution will perform. How far from the optimum is the solution provided by a heuristic?

For manageable problem instances, one usually tries to find an answer to this question by solving it with linear programming. In this case we already know the optimal solution:

Value: 13549094
Items: 1 1 0 1 1 1 0 0 0 1 1 0 1 0 0 1 0 0 0 0 0 1 1 1

So we know, that our solution is only about 1 percent away from the optimal solution, which is extremely good, given the fact, that we only used a construction heuristic!

  • To improve the result we could now implement a local search and define allowed moves, to modify the existing solution.
  • We could then do greedy local search and only follow the path as long the solution can be improved or use some meta heuristic like randomized local search, simulated annealing, genetic algorithm, tabu search or the like, which allow to escape from local optima applying different techniques

For now, we’re happy with the result we got by just applying this simple heuristic. In a future post I’ll probably try to improve the solution by implementing an improvement heuristic.