In-house testing of CPLEX distributed MIP

Distributed optimization is now part of standard CPLEX releases; in fact, it has been there since release 12.5. I decided to run some experiments and write this post after seeing Daniel Junglas’ talk at the German OR Conference in September, although writing it up was delayed until recently. In today’s post, I talk about some tests I ran with the distributed MIP algorithm and compare the results with what was discussed in that talk. In recent months I ran similar experiments with Gurobi 6.0.

Distributed CPLEX in a nutshell

Over the last several years, the CPLEX development team has achieved great success in adapting its mixed-integer programming (MIP) algorithms to the multi-core architectures that are now prevalent in personal and office computers. Going one step further, it is now possible – and in fact quite easy – to use multiple machines to work on a single problem. Two algorithmic approaches are provided:

  • In concurrent optimization, the problem is run independently on each machine, but with different parameter settings or random seeds;
  • In distributed mode, the search is divided into two phases. In the first phase, also called the ramp-up phase, the problem is started on each machine with different settings. At the end of this phase, the concurrent execution is stopped on all machines, and the MIP search tree from the machine that has made the most progress is divided among the machines, which then work collaboratively.

Note that as of CPLEX 12.6.1, unless instructed to do otherwise, the concurrent optimization mode (also called “infinite ramp-up”) is used by default.

How to do it?

Setting up a distributed MIP environment isn’t very complex. You need two or more machines, ideally of similar performance. The following steps are required (a rough sketch of the worker configuration is shown after the list):

  1. Install CPLEX on each machine;
  2. Start a CPLEX worker process on each machine;
  3. Make sure the CPLEX worker is allowed through the firewall;
  4. Create a configuration file on the master machine defining the pool of workers;
  5. Start CPLEX on the master machine, load a model, then solve!
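
To make steps 2 and 4 more concrete, here is a rough sketch of the TCP/IP setup. The host names, port numbers and file name are placeholders, and I am quoting the worker command and the layout of the virtual machine configuration (VMC) file from memory of the 12.6 documentation, so check your own release for the exact syntax. Step 2 amounts to running something like this on each worker machine:

    cplex -worker=tcpip -address=10.0.0.11:12345

Step 4 is a small XML file on the master machine (say, workers.vmc), with one machine entry per worker:

    <?xml version="1.0" encoding="UTF-8"?>
    <vmc>
      <machine name="worker1">
        <transport type="TCP/IP">
          <address host="10.0.0.11" port="12345"/>
        </transport>
      </machine>
      <machine name="worker2">
        <transport type="TCP/IP">
          <address host="10.0.0.12" port="12345"/>
        </transport>
      </machine>
    </vmc>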

How well does it perform?

Short answer: it depends. In his GOR 2014 talk, Daniel Junglas reported significant speed-ups, from 11% to 82%, on a wide range of models that take more than 10 seconds to solve on a single machine. I have tested the distributed MIP on two types of problems. The first model I ran was a large-sized deterministic supply chain design problem. It is based on data provided by one of our partner companies at FORAC, the research center where I do most of my research. Running this difficult model on four machines reduced the wall clock time by about 3% compared to the single-machine version.

I also tested a group of large lot-sizing models (set P256-T64-C2, which you can download here). These require processing a large number of nodes to solve, which is typically a sweet spot for distributed algorithms. As it turns out, the single-machine algorithm is faster on average. Out of the 10 models, 6 are solved faster on a single computer than with the four-machine version. The geometric mean of the solve times is also 12.7% higher for the 4-machine distributed MIP than for the single-computer version.

This is by no means a thorough comparison, but it shows that distributed algorithms may or may not be beneficial to the type of models you are working on. There is no substitute for testing it yourself to see how it works out.

Many thanks to Daniel Junglas, Ed Klotz and Jean-François Puget for their help in setting up the distributed MIP environment.

In-house testing of the Gurobi 6 distributed MIP [dual]

Distributed optimization is now part of the Gurobi 6.0 release. In today’s post, I talk about some tests I ran with the distributed MIP algorithm. The associated primal post discusses results presented by Gurobi at the INFORMS 2014 Annual Conference. Don’t worry, the next set of posts will cover the same topic, this time with CPLEX.

Overall, it works. Three conclusions stand out from my early experiments:

  1. Use similar machines
  2. Don’t put slower machines in the worker pool
  3. Test your assumptions

The importance of similar machines

The first problem I ran was a large-sized deterministic supply chain design problem. It is based on data provided by one of our partner companies. It is solvable by Gurobi 6 (and also by CPLEX), but it is difficult: it takes about 17.5 hours (63,840 seconds) on a 4-core machine. I set up a worker pool of the 5 available machines: four recent 4-core machines and an older 8-core machine. After a few hours of computation, the 4 recent machines were idle while the slower machine was still working. I finally killed the process 24 hours later: the algorithm was still in the ramp-up phase, as the four fast machines were waiting for their slow counterpart to finish. I removed the old machine from the worker pool, which took the solving time down to 14.5 hours (52,300 seconds), a decrease of about 3 hours. I ran some other tests on easier models, but the conclusion holds: you are better off with a single fast machine than with a fast machine teamed up with a slow one. If you feel you absolutely have to set up such a worker pool, then use concurrent MIP so your faster machines do not sit idle waiting for the turtle to finish. To be fair, Gurobi strongly advises in its distributed MIP documentation to use similar machines, but the drag on performance is stronger than what I anticipated from reading the documentation.
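
For reference, here is roughly how I point Gurobi at a worker pool from a Python script. The machine names and model file are placeholders, and the parameter names (WorkerPool, DistributedMIPJobs, ConcurrentJobs) are the ones I remember from the Gurobi 6.0 documentation; they have been reorganized in later releases, so verify them against your version before copying anything.

    from gurobipy import read

    # Load the model to solve (file name is a placeholder).
    model = read("supply_chain_design.lp")

    # Comma-separated list of worker machines; keep them similar in speed.
    model.setParam("WorkerPool", "fast1,fast2,fast3,fast4")

    # Distributed MIP: split the search tree across the four workers...
    model.setParam("DistributedMIPJobs", 4)
    # ...or use distributed concurrent MIP instead, so that a slow worker
    # cannot stall the others during ramp-up:
    # model.setParam("ConcurrentJobs", 4)

    model.optimize()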

Test your assumptions

The supply chain design model I tested requires the exploration of fewer than 5,000 nodes. That is not much, so I tested a group of large lot-sizing models (set P256-T64-C2, which you can download here). These typically require processing a large number of nodes to solve, which makes them good candidates for distributed MIP. My hypothesis was that the speedup would be greater than with the supply chain model.

As it turns out, the opposite is true. 7 of the 10 models solved faster on a single machine with default settings than on the pool of 4 machines (including the aforementioned machine). The geometric mean of the solve times is also smaller for a single machine (1,793 seconds) than for the 4 distributed machines (2,338 seconds), which is 18.4% slower. It seems the models are solved while the distributed algorithm is still in the ramp-up phase. It would be nice if there were a way to tell the algorithm when to switch from ramp-up to distributed solving mode. Maybe changing high-level settings such as the MIP emphasis towards improving the bound would help in this regard, but I haven’t tried it yet.
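
If I do try it, the change is a single parameter; the sketch below uses MIPFocus, the Gurobi counterpart of CPLEX's MIP emphasis, with the value that asks the solver to concentrate on moving the best bound (the file name is again a placeholder).

    from gurobipy import read

    model = read("lot_sizing_P256-T64-C2.lp")  # placeholder file name
    model.setParam("MIPFocus", 3)              # emphasize improving the best bound
    model.optimize()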

Overall usability

If, like me, you have ever implemented a distributed optimization algorithm on your own, using the Gurobi distributed MIP feels like a walk in the park. If you use Windows machines, installing Gurobi on the worker machines and making sure Gurobi is allowed through the firewall is what will take the most time, which is a few minutes per machine. If you ghost the machines, it is even easier to do. Just remember that the algorithm seems to be designed for clusters of identical machines, so no matter how geeky it looks, it might not be a good idea to use your laptop, your gaming desktop and your lab’s server to run Gurobi 6 in distributed mode.

Distributed MIP solving reaches Gurobi [primal]

Distributed optimization has now reached the point where it is incorporated into general-purpose mixed-integer programming (MIP) solvers such as CPLEX and Gurobi. In today's post, I will summarize the implications of some results presented at the … [Continue reading]

Inventory routing when it matters


Yesterday we had the pleasure of welcoming Prof. Leandro C. Coelho for a research seminar at the FORAC research consortium. He presented his work on the inventory routing problem (IRP), which is a combination of the vehicle routing problem (VRP) … [Continue reading]

Academia and software neutrality

When I started my PhD, the only MIP solver that was used at the lab was Cplex. At that time, it was considered the best by the people I worked with. I say "considered", because I took this assumption as a fact without actually testing it. I could … [Continue reading]

Using the CPLEX tuning tool on difficult models

The question of how a tuning tool can help solve difficult optimization models has intrigued me. For those new to the concept, the tuning tool tries different parameter settings and seeks to find good parameters for the solver for a given … [Continue reading]

OR-Enabled Collaborative Logistics

Today's post is about collaborative logistics, a field of study in which I recently started to work. It is a tricky field for many reasons, and I will try to explain why and how. Operations research is a key enabler of collaboration. Logistics … [Continue reading]

Seeds and parallel MIP solvers, extended edition


In a previous post, I showed that changing the random seed on a given problem varies solution times by 20% to 40% on average, depending on the solver and the settings. I decided to push that experiment a little further and sample the … [Continue reading]

Dynamic pricing horror stories

The hotel industry has embraced the principles behind revenue management and dynamic pricing. Many of these dynamic pricing strategies are driven by OR- or analytics-based models, too. Too much of a good idea can turn into the opposite of the … [Continue reading]

ILS 2014, a buffet of logistics and OR


This post was originally written for OR Complete. I re-post it here in case you missed the original posting. ILS, a short name for the International Conference on Information Systems, Logistics and Supply Chain, is a biennial conference focusing … [Continue reading]