In industrial applications, one often has to solve many similar instances of the same mixed-integer linear programming (MIP) model. Furthermore, from one instance to the next, a large proportion of the data stays the same. Because of the heuristic nature of MIP computation, these very similar instances can behave quite differently when one tries to solve them. This post explores the variance in solving times for otherwise very similar models, using a lot-sizing case study.

### A simple experiment

For instance, let’s assume that we use optimization to support a company’s weekly production planning process. For those interested in theory, the actual model we might want to solve would be a capacitated lot-sizing problem with setup times. Although a production plan is implemented one week at a time, we would plan for two to three weeks ahead in order to have some visibility on what to schedule. For a given company, the actual demand for each product could change from day to day, but most of the problem data (setup times, processing times and capacity) should not change much from week to week. I created three sets of 100 production planning models, with respectively 30, 40 and 50 products and 14 daily periods. Within each set, the models share the same structure (capacities, setups and product data), while costs and demands vary from week to week, although they are generated from the same probability distributions. I allowed an absolute maximum of 2 hours of wall-clock time to solve these models. The cumulative proportion of models solved within a given time is shown in the figure below. I use run times instead of ticks because a user would set a time limit rather than a tick limit. Results shown are with CPLEX 12.5, but results with Gurobi 5.6 are very similar.
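As a rough sketch of how such a family of instances could be generated, the snippet below fixes the structural data across a set while redrawing costs and demands for each instance. The distributions and numeric ranges are hypothetical; the post does not give the actual ones.

```python
import random

def generate_instance(n_products, n_periods, seed):
    """Generate one weekly lot-sizing instance. Structural data
    (capacity, setup times, processing times) is identical across the
    family; only costs and demands are redrawn, from the same
    distributions each week."""
    rng = random.Random(seed)
    # Structural data: the same for every instance in the family
    # (hypothetical values, for illustration only).
    capacity = 480                      # minutes available per period
    setup_time = [30] * n_products      # minutes per setup
    proc_time = [1.0] * n_products      # minutes per unit produced
    # Week-specific data: redrawn for each instance.
    demand = [[rng.randint(0, 50) for _ in range(n_periods)]
              for _ in range(n_products)]
    setup_cost = [rng.uniform(50, 150) for _ in range(n_products)]
    holding_cost = [rng.uniform(0.5, 2.0) for _ in range(n_products)]
    return {"capacity": capacity, "setup_time": setup_time,
            "proc_time": proc_time, "demand": demand,
            "setup_cost": setup_cost, "holding_cost": holding_cost}

# One family of 100 instances with 30 products and 14 daily periods.
family = [generate_instance(30, 14, seed=s) for s in range(100)]
```

Each instance would then be fed to the solver with the same 2-hour time limit, and the solve times recorded.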

The vast majority of the models are solved within one minute, and very few require more than a few minutes. That being said, between 3 and 7 models of each set required more than one hour to solve, and some were still not solved after two hours. The curves are broadly similar for 30, 40 and 50 products, except that the larger models naturally require more effort to solve. The table below lists the median, mean and standard deviation of run times, in seconds, as well as some information about problem size. The distribution of run times is heavily skewed to the right: the standard deviations are huge, and the means are driven up by a few extreme results.
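The skew described above is easy to spot from the summary statistics alone: a mean well above the median, combined with a large standard deviation, signals a right-skewed distribution. A minimal sketch, using made-up run times rather than the actual experimental data:

```python
import statistics

def run_time_summary(run_times):
    """Summarize solve times (in seconds). A mean far above the median,
    together with a large standard deviation, indicates the right-skewed
    distribution discussed in the text."""
    return {
        "median": statistics.median(run_times),
        "mean": statistics.mean(run_times),
        "stdev": statistics.stdev(run_times),
    }

# Illustrative (made-up) run times: most instances solve fast,
# while a few hit the 2-hour (7200 s) wall-clock limit.
times = [12, 15, 18, 20, 25, 30, 40, 55, 90, 3600, 7200, 7200]
summary = run_time_summary(times)
# The mean is pulled far above the median by the few extreme results.
```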

I believe this huge variance in run times, across instances of the same model with similar data, is one phenomenon that makes it difficult to draw significant conclusions about the “best” parameters (or algorithms, for that matter) for solving a general class of instances. Next time, we will see what happens when we use “tuned” parameters for the solvers, or when we do it “the easy way” by enforcing a given time limit.

Nice post. It would be interesting to know the gap for those problems that can’t be solved in 2 hours. If it is small enough then it might be good enough for the purpose of the application you’re considering.

Another comment: given that these problems are similar, it might make sense to tune the solver. Default settings are the best we found when testing on a wide range of problems; they can probably be improved upon for a particular problem class. Tuning would probably reduce running times, but what would be really interesting is whether it also reduces the variability of those running times.

These are very good ideas! In fact, I’m running the tuning tool on the first 20 instances and checking the effect on the remaining 80, both on solution time and on the variability of run times. For production scheduling, a gap of 0.1% or better would be quite acceptable, considering the limited accuracy of the data on production and setup times.
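The tune-on-20, evaluate-on-80 protocol above can be sketched as a simple held-out comparison. The run times below are fabricated placeholders, not the experiment’s actual numbers:

```python
import statistics

def evaluate_split(run_times_default, run_times_tuned, n_tune=20):
    """Compare default and tuned solver settings on the held-out
    instances only (the first n_tune instances were used for tuning)."""
    held_out_default = run_times_default[n_tune:]
    held_out_tuned = run_times_tuned[n_tune:]
    return {
        "default_stdev": statistics.stdev(held_out_default),
        "tuned_stdev": statistics.stdev(held_out_tuned),
    }

# Hypothetical run times (seconds) for 100 instances under each setting:
# most solve quickly, a few are very slow under default settings.
default_times = [10] * 90 + [7200] * 10
tuned_times = [8] * 95 + [600] * 5
result = evaluate_split(default_times, tuned_times)
```

Evaluating only on instances not seen by the tuning tool avoids overstating the benefit of the tuned parameters.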