In-house testing of the Gurobi 6 distributed MIP [dual]

Distributed optimization is now part of the Gurobi 6.0 release. In today’s post, I talk about some tests I ran with the distributed MIP algorithm. The associated primal post discusses results presented by Gurobi at the INFORMS 2014 Annual conference.  Don’t worry, the next set of posts will cover the same topic, this time with CPLEX.

Overall, it works. Three conclusions stand out from my early experiments:

  1. Use similar machines
  2. Don’t put slower machines in the worker pool
  3. Test your assumptions

The importance of similar machines

The first problem I ran was a large-sized deterministic supply chain design problem. It is based on the data provided by one of our partner companies. It is solvable by Gurobi 6 (and also by CPLEX), but it is difficult, as it takes about 17,5 hours (63,840 secs)on a 4-core machine. I set up a worker pool of 5 available machines : four recent 4-core machines and an older 8-core machine. After a few hours of computation, the 4 recent machines were idle while the slower machine was still working. I finally killed the process 24 hours later: the algorithm was still in the ramp-up phase, as the four fast machines were waiting for the slow counterpart to finish. I removed the old machine from the worker pool, and took the solving time down to 14,5 hours (52,300 secs), a decrease of about 3 hours. I ran some other tests on easy models, but the conclusion holds: you are better off with a single fast machine than with a fast machine teamed up with a slow machine.  If you feel you absolutely have to setup this worker pool, then use concurrent MIP so your faster machine does not get idle waiting for the turtle to finish. To be fair, Gurobi strongly advises in their distributed MIP documentation to use similar machines, but the drag on performance is stronger than what I anticipated by reading the documentation.

Test your assumptions

The supply chain design I tested requires the exploration of less than 5000 nodes. That is not much, so I tested a group of large lot-sizing models (set P256-T64-C2, which you can download here). These typically require processing a large number of nodes to solve, which makes them good candidates for distributed MIP. My hypothesis was that the speedup would be greater than with the supply chain model.

As it turns out, the opposite is true. 7 of the 10 models solved faster on a single machine with default settings than on the pool of 4 machines (including the aforementioned machine). Geometric mean is also smaller (1793 seconds) for a single machine than for the distributed 4 machines (2338 seconds), which is 18,4% slower. It seems the models are solved while the distributed algorithm is still in the ramp-up phase. It could be good if there was a way to tell the algorithm when to swicth from ramp-up to distributed solving mode. Maybe changing the high-level settings such as MIP Emphasis towards improving bounds help in this regard, but I haven’t tried yet.

Overall usability

If you are like me and you ever implemented some distributed optimization algorithm on your own before, using the Gurobi distributed MIP feels like a walk at the park. If you use Windows machines, installing Gurobi on the worker machines and making sure Gurobi has access through the firewall is what will take the most time, which is a few minutes per machine. If you ghost the machines, it is even easier to do. Just remember that the algorithm seems to be designed for clusters of identical machines, so now matter how geek it looks, it might not be a good idea to use your laptop, your gaming desktop and your lab’s server to run Gurobi 6 in distributed mode.

Distributed MIP solving reaches Gurobi [primal]

Distributed optimization has now reached the point where it is incorporated into general-purpose mixed-integer programming (MIP) solvers such as CPLEX and Gurobi. In today’s post, I will summarize the implications of some results presented at the INFORMS 2014 annual conference by Dr. Ed Rothberg. You can see the full slide deck here. The associated dual post presents some of my own experiments with Gurobi 6.0.  Don’t worry, the next set of posts will cover the same topic, this time with CPLEX.

From parallel to distributed

Parallel MIP solvers have been around for several years. In my opinion, the ability to adapt mixed-integer programming algorithms to the multi-core processor architecture is one of the industry’s major successes, leading to huge average speedups [1][2] without needing any kind of parametrization on the user’s part. Starting from release 6.0, Gurobi now offers distributed algorithms, allowing to pool resources from multiple computers to solve difficult MIPs. Gurobi allows for two types of strategic:

  • In concurrent optimization, the problem is run independently on each machine, but with different parameter settings;
  • In distributed mode, after a ramp-up phase, the search tree is divided among the machines.

What is now possible?

Up to now, solving very difficult MIPs required either writing your own distributed algorithms or running the problem on a big machine with multiple processors and high amounts of memory. Unfortunately, high-performance workstations are quite expensive these days : a machine with 16 cores and 128 GB of RAM will cost you much more (about 12,000$) than four 4-core machines with 32 GB of ram each (6,000$ total). It is also much easier to scale up by adding low-cost machines than by adding processors on a single machine. For commercial users, the distributed MIP licence is more expensive than a multi-core single-machine licence, but it allows for up to 100 worker machines (which do not need a licence). Chances are that you won’t have more than 100 machines available anyway.

Which strategy works best?

Short answer: it depends. Concurrent works best on some problems, while distributed works better on others. Four 4-core machines and a single 16-core machine provide comparable average speed-ups, being approximately 2 times faster on average than a single 4-core machine for models that take longer than 100 seconds to solve. Experiencing longer solving times using this configuration is also unlikely (happens only 2 times out of 27 on their testbed for models that take longer than 100 secs). Using concurrent optimization on 4 machines yields a similar portrait, as speed is on average 1.5 times faster than a single 4-core machine. Using 4 machines is also faster 17 times out of 20. In each case, adding more machines lead to faster solving on average, albeit with diminishing returns. If you have a 16-core machine, results indicate that you are better off – on average – to run the one-machine algorithm on 16 cores than run the distributed code on 4 four-core virtual machines.

If you really need to solve a difficult MIP from time to time, using distributed algorithms on several machines is one more option in your arsenal. It is very easy to use, and the Gurobi team has done a good job at explaining what it is and how to use it. If you don’t own this kind of machines, you can also rent some time on Amazon or Google’s cloud platforms – it is relatively cheap and allows scaling up.

In the associated dual post, I will share my experiences with using Gurobi’s distributed algorithms on my own instances. Stay tuned!

Inventory routing when it matters

Leandro CC Seminar

Yesterday we had the pleasure of welcoming Prof. Leandro C. Coelho for a research seminar at the Forac research consortium. He presented his work on the inventory routing problem (IRP) which is a combination between the vehicle routing problem (VRP) … [Continue reading]

Academia and software neutrality

When I started my PhD, the only MIP solver that was used at the lab was Cplex. At that time, it was considered the best by the people I worked with. I say "considered", because I took this assumption as a fact without actually testing it. I could … [Continue reading]

Using the CPLEX tuning tool on difficult models

The question on how a tuning tool can help solve difficult optimization models has been intriguing to me. For those new to the concept, the tuning tool tries different parameter settings and seeks to find good parameters for the solver for a given … [Continue reading]

OR-Enabled Collaborative Logistics

Today's post is about collaborative logistics, one field of study in which I recently started to work in. It is a tricky field for many reasons, and I will try to explain why and how. Operations research is a key enabler of collaboration. Logistics … [Continue reading]

Seeds and parallel MIP solvers, extended edition

SLSM-2000seeds-CPLEX126-4t

In a previous post, I showed that changing the random seed on a given problem yielded varies solution times by between 20% to 40% on average, depending on the solver and the setting. I decided to push that experiment a little further and sample the … [Continue reading]

Dynamic pricing horror stories

The hotel industry has accepted these principles behind revenue management and dynamic prices. Many of these dynamic pricing strategies are driven by OR- or analytics-based models, too. Too much of a good idea can turn into the opposite of the … [Continue reading]

ILS 2014, a buffet of logistics and OR

ESTELLE_MAERSK

This post was originally written for OR Complete. I re-post it here in case you missed the original posting. ILS - a short name for the International Conference on Information Systems, Logistics and Supply Chain, is a bi-annual conference focusing … [Continue reading]

Uncertainty, Models, and the Chaos Out There

Future is dominated by uncertainty. In terms of supply chain management – my main field of application – it comes in terms of variation in lead times or demand levels, or in the form of severe events such as hurricanes which can severely disrupt a … [Continue reading]