Facility location : not so difficult (with the proper tools)

In a previous post, I generated a few capacitated facility location instances (CFLP) and I ran these through MIP solvers. The instances were solved pretty quickly overall. In a comment, professor Matteo Fischetti suggested I compare my results with some instances from the literature, which I did. Overall, the results are quite fast; CPLEX takes only 21 minutes on average to solve the largest instances (1000 facilities, 1000 customers). When the paper was published (received for publication in 2007) these problems were intractable.

Experiments & Results

For this test, I used the 100 instances from Avella & Boccia (2009). These are grouped in 5 sets of 20 instances each, having between 300 and 1000 facilities and customers, respectively. The data for the instances are available here; I simply wrote a program to parse the input file and send them to the solver. For those who wonder, I did check that the reported optimal values in Avella & Boccia match with the values I got in my tests. Each instance is run on CPLEX 12.6.2 with default parameters on a recent 4-core machine. I computed the geometric means of solution times (in seconds) for each group of 20 instances:

CFLP results from Avella & Boccia (2009)

The 300 x 300 instances are solved pretty quickly, with 9 seconds on average. The 300 x 1000 and 500 x 500 are solved under a minute of run times and even the largest only take 21 minutes on average (the longest takes about 56 minutes). Interestingly, 35% of the 300×1000 instances are solved directly at the root node through a combination of implied bound cuts and heuristics.

What do these results mean?

If we refer to the more direct application of CFLP, that is, deciding the location of warehouses or production facilities in a supply chain, even 300 facilities is a pretty large number. Also, as facility location is considered a strategic problem, 20 minutes to get an optimal solution is very acceptable. For direct application purposes in the supply chain domain, the solvers are performing well enough that customized algorithms are not necessary. This being said, the CFLP is often embedded as a subproblem in larger supply chain network design problems, and as such, specialized algorithms may still be necessary in some specific contexts.

Of course, these results benefit from the tremendous advances in both hardware and software. It merely shows that the state-of-the-art tools used for optimization are getting better, and they are improving pretty fast. I also ran the tests on Gurobi and the results are pretty similar.

[1] P. Avella and M. Boccia. A cutting plane algorithm for the capacitated facility location problem. Computational Optimization and Applications, 43(1):39–65, 2009.

Facility location : mistake, issue and results

About two weeks ago, I generated a few capacitated facility location instances (CFLP) for some students to play with. When I ran these through the CPLEX and Gurobi solvers, all of them were solving very quickly. Gurobi in fact seemed to find the optimal solution during the presolve stage. I made a quick blog post about it, but I should have been more thorough in my validation of results.

I am commited to provide accurate results on this blog, so please accept my most sincere apologies for not validating sufficiently enough before posting it here. I also thank Professor Andrea Lodi and his students for spotting the source of the error in the data files. I also wish to thank Professor Matteo Fischetti for expressing doubts about the results of my previous post.

Source of the error

When preparing test problems like this, I usually create LP files instead of creating instances in memory as it is easier to distribute them to students and to run the models through different solvers. I also found that students have a much easier time reading LP files rather than MPS. The LP format has its disadvantages, however, as it can be interpreted differently between solvers. In particular, I forgot to put a space between the coefficient and the last variable in a constraint (for instance 1.255Y1 <= 1 instead of 1.255 Y1 <= 1).  CPLEX interprets it the way it was intended and separates the coefficient 1.255 from the variable Y1 even though there is no space, but Gurobi assumes 1.255Y1 is a new variable that is different from Y1. To be clear, the mistake is mine, not Gurobi’s, but differences in interpretation are sometimes difficult to spot.

When I generated the models, I ran them through CPLEX first, and checked the solutions. Then, out of curiosity, I ran the same set of models through Gurobi, and the results were much faster. I assumed wrongly that the models were fine, so I did not check the solutions out of Gurobi. What is very ironic is that I wrote a blog post about this very issue less than two years ago and I failed to follow my own advice this time. The corrected LP files are now available here.

Corrected results

Running the corrected moels yields a different picture. Most models are solved quite quickly and efficiently by the solvers, abeit not during the presolve stage.

  • 38% of the models were solved at the root node, usually by a combination of cuts and heuristics;
  • 68% of the models are solved in less than a second;
  • 78% of the models are solved in less than a minute.

The largest instance that is solved at the root node has 75 facilities and 250 customers, while the largest instances that are solved in less than a second are of sizes 75 x 250 and 80 x 150, respectively. The largest instances in the set (100 facilities and 2000 customers) take about 1 to 2 hours to solve on Gurobi. CPLEX is in the same ballpark in terms of solving times.

Is that fast?

The interesting question that arises is “are these solving times fast enough?” to be actually used in practice. There are many potential applications to the CFLP, but I will focus on supply chain management. While benchmark instances are sometimes much larger than the models solved in this test, I have never encountered or heard of a company who wanted to locate or relocate 100 or more facilities simultaneously. Usually, the number of potential locations is much smaller, but the problem is made complex by adding more types of decision such as alternate transport mode selection, alternative capacities or technologies and facilities, and so on. These interdependent decisions, more than sheer size of the instance, makes the problem much more difficult to solve.

Of course, in general, faster is always better, and there is place for faster customized algorithms that solve the biggest instances in a few seconds. But if you are working on a project to relocate 100 warehouses, my bet is that you can wait 90 minutes to get the optimal solution.

Facility location : presolved to optimality!

  ** IMPORTANT NOTICE ** This post has temporarily been suspended as some readers noticed potential problem with the model files. I will issue a corrected post shortly. I am sorry for any inconvenience. Marc-André … [Continue reading]

It’s time talk more about barrier in class

In today's post, I argue that optimization classes and textbooks should put a greater emphasis on interior point methods. During my many years of study, I have been exposed to quite a bit of simplex theory. I also had to perform many primal and … [Continue reading]

Just a LP, really?

In recent talks in scientific conferences, I saw some people dissmiss pure linear programming (LP) models as being easy and primitive. I often got said "oh, that's just a LP". Linear programming is put in the box of solved problems, like some network … [Continue reading]

OR in the field: two challenges

This post presents some thoughts about doing applied operations research together with companies. It's a lot of work, to be sure, but it's also a lot of fun. Some of my colleagues tease me about not being often at my office. While it's true that I … [Continue reading]

When optimal is not enough

Currently I work with Professor Mikael Rönnqvist from Université Laval on an inventory and transportation management problem from a forest products company.  As a large scale commodity products company, much of its focus is on reducing logistics … [Continue reading]

Exam design problem

For students, the final exam week is probably the most stressful time of a semester. They often have multiple exams over consecutive days and need to split their last-minute study and preparation time over these topics. I must confess that I also … [Continue reading]

The day I put my hand into the shredder

...or so to speak. Recently, I've been working (in collaboration with two people) on a production planning model for a network of sawmills. It is not yet customized for a particular company. We came to this particular model after a few meetings, and … [Continue reading]

Performance variability: consequences


If you work on difficult mixed-integer programming (MIP) models, you have experienced performance variability. This refers as the impact on performance of seemingly performance neutral changes induced in the model or the solver. Part 1 of this series … [Continue reading]