In-house testing of the Gurobi 6 distributed MIP [dual]

Distributed optimization is now part of the Gurobi 6.0 release. In today’s post, I talk about some tests I ran with the distributed MIP algorithm. The associated primal post discusses results presented by Gurobi at the INFORMS 2014 Annual Conference. Don’t worry, the next set of posts will cover the same topic, this time with CPLEX.

Overall, it works. Three conclusions stand out from my early experiments:

  1. Use similar machines
  2. Don’t put slower machines in the worker pool
  3. Test your assumptions

The importance of similar machines

The first problem I ran was a large deterministic supply chain design problem, based on data provided by one of our partner companies. It is solvable by Gurobi 6 (and also by CPLEX), but it is difficult: it takes about 17.5 hours (63,840 seconds) on a 4-core machine. I set up a worker pool with the 5 machines I had available: four recent 4-core machines and an older 8-core machine. After a few hours of computation, the 4 recent machines were idle while the slower machine was still working. I finally killed the process 24 hours later: the algorithm was still in the ramp-up phase, with the four fast machines waiting for their slow counterpart to finish. I then removed the old machine from the worker pool, which took the solving time down to 14.5 hours (52,300 seconds), a decrease of about 3 hours.

I ran some other tests on easy models, but the conclusion holds: you are better off with a single fast machine than with a fast machine teamed up with a slow one. If you feel you absolutely have to set up such a mixed worker pool, then use concurrent MIP so your faster machines do not sit idle waiting for the turtle to finish. To be fair, Gurobi strongly advises in its distributed MIP documentation to use similar machines, but the drag on performance is stronger than what I anticipated from reading the documentation.
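For reference, here is roughly how such a run is set up from Gurobi’s Python API. This is a minimal sketch rather than the exact script I used: the model file and machine names are placeholders, and I am relying on the distributed-MIP parameter names from the Gurobi 6 documentation (WorkerPool, DistributedMIPJobs, ConcurrentJobs), with the distributed worker service already running on each machine.

    # Minimal sketch of a distributed MIP run from the Gurobi Python API.
    # Placeholders: the model file and the worker host names.
    from gurobipy import read

    model = read("supply_chain.mps")

    # Comma-separated list of worker machines -- use similar machines!
    model.setParam("WorkerPool", "machine1,machine2,machine3,machine4")
    # model.setParam("WorkerPassword", "...")  # only if the workers require one

    # Distributed branch-and-bound across the 4 workers...
    model.setParam("DistributedMIPJobs", 4)
    # ...or, on a heterogeneous pool, distributed concurrent MIP instead, so the
    # fast machines never sit idle waiting for the slowest one:
    # model.setParam("ConcurrentJobs", 4)

    model.optimize()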

Test your assumptions

The supply chain design model I tested requires exploring fewer than 5,000 nodes. That is not much, so I also tested a group of large lot-sizing models (set P256-T64-C2, which you can download here). These typically require processing a large number of nodes, which makes them good candidates for distributed MIP. My hypothesis was that the speedup would be greater than with the supply chain model.

As it turns out, the opposite is true. 7 of the 10 models solved faster on a single machine with default settings than on the pool of 4 machines (including the aforementioned machine). The geometric mean of the solve times is also smaller for the single machine (1,793 seconds) than for the 4 distributed machines (2,338 seconds), which is 18.4% slower. It seems the models are solved while the distributed algorithm is still in the ramp-up phase. It would be nice if there were a way to tell the algorithm when to switch from ramp-up to the distributed solving mode. Maybe shifting high-level settings such as MIPFocus (Gurobi’s version of MIP emphasis) towards improving the bound would help in this regard, but I haven’t tried that yet.
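If you want to try that yourself, it is a one-line change on top of the distributed settings shown earlier. Again a sketch only, with a placeholder file name; MIPFocus=3 is the Gurobi setting that concentrates the search on proving the best bound.

    # Sketch: bias the search towards moving the best bound, in the hope of
    # making better use of the distributed workers during and after ramp-up.
    from gurobipy import read

    model = read("lot_sizing_instance.mps")   # placeholder file name
    model.setParam("WorkerPool", "machine1,machine2,machine3,machine4")
    model.setParam("DistributedMIPJobs", 4)
    model.setParam("MIPFocus", 3)             # 3 = focus on the objective bound
    model.optimize()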

Overall usability

If you are like me and have ever implemented a distributed optimization algorithm on your own, using Gurobi’s distributed MIP feels like a walk in the park. If you use Windows machines, what will take the most time is installing Gurobi on the worker machines and making sure Gurobi can get through the firewall, which amounts to a few minutes per machine. If you ghost (image) the machines, it is even easier. Just remember that the algorithm seems to be designed for clusters of identical machines, so no matter how geeky it looks, it is probably not a good idea to use your laptop, your gaming desktop and your lab’s server together to run Gurobi 6 in distributed mode.

Comments

  1. Carlos Eduardo de Andrade says:

    Hi Marc,

    I have conducted several experiments using distributed CPLEX, and I have observed that CPLEX shows the same behavior you described for Gurobi: the ramp-up phase is too long and leads to idle machines. Although the CPLEX documentation says that CPLEX should decide when to stop the ramp-up phase, it has not done so, at least in my experiments.

    My experiments consist of some hard scheduling problems. I used 30 machines, 20 of them with 8 cores (16 threads) and 10 of them with 4 cores (8 threads). Using all 30 machines, I got the same behavior you observed. Even when using only identical machines, CPLEX never comes out of the ramp-up phase.

    Because of this, I set a ramp-up time limit. In particular, I used 10% of the total time allowed (2.4 hours out of 24 hours), and the results are far better than with the default CPLEX setup.

    Could you conduct the same experiment with Gurobi and report back to us?

    My best regards,

    Carlos

    • Dear Carlos,
      Thank you for your comment. I am planning to post some results with distributed CPLEX soon. To my knowledge, there is no parameter in Gurobi 6 to limit or control the duration of the ramp-up phase. On some of the models that take longer to solve with distributed MIP than on a single machine, the solver stays in concurrent mode until termination.

      Of course, you can do it by writing some more code, such as running the concurrent MIP algorithm for x seconds and then stopping it. Controlling how the distributed algorithm runs from that point on would require using callbacks and fine-grained control of the solver, which was beyond the scope of my experiments.
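      Something along these lines, for instance. This is only a rough sketch with placeholder values; in particular, carrying the phase-1 incumbent over as a MIP start, and the way the parameters are reset between the two calls, are assumptions on my part rather than something I have benchmarked.

          # Phase 1: distributed concurrent MIP with a hard time limit ("x seconds").
          from gurobipy import read, GRB

          model = read("model.mps")                  # placeholder
          model.setParam("WorkerPool", "machine1,machine2,machine3,machine4")
          model.setParam("ConcurrentJobs", 4)
          model.setParam("TimeLimit", 600)           # placeholder value of x
          model.optimize()

          # Phase 2: switch to distributed MIP, feeding the incumbent back as a MIP start.
          if model.SolCount > 0:
              for v in model.getVars():
                  v.Start = v.X                      # warm start from the phase-1 incumbent
          model.setParam("ConcurrentJobs", 0)        # back to its default, to my understanding
          model.setParam("DistributedMIPJobs", 4)
          model.setParam("TimeLimit", GRB.INFINITY)  # remove the time limit
          model.optimize()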

    • Dear Carlos,

      CPLEX uses infinite ramp-up as the default. As you noticed, you can set wall-clock or deterministic time limits. But you can also let CPLEX decide when to switch by setting DistMIP.Rampup.Duration to CPX_RAMPUP_DYNAMIC (i.e., 1). Please refer to [1] for the details. I hope this helps.

      Best regards,
      Xavier

      [1] http://www-01.ibm.com/support/knowledgecenter/SSSA5P_12.6.1/ilog.odms.cplex.help/CPLEX/Parameters/topics/RampupDuration.html?lang=en
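      For reference, those settings look roughly like this from the CPLEX Python API (a sketch only, assuming the Python parameter path mirrors the documented name; the worker/VMC configuration that distributed CPLEX also needs is not shown):

          # Sketch: controlling the ramp-up phase of distributed CPLEX from Python.
          # Assumes the parameters are exposed under parameters.distmip.rampup.*;
          # loading the worker (VMC) configuration is omitted here.
          import cplex

          c = cplex.Cplex("model.mps")               # placeholder model file

          # Let CPLEX decide when to stop ramp-up (CPX_RAMPUP_DYNAMIC = 1) ...
          c.parameters.distmip.rampup.duration.set(1)
          # ... or cap ramp-up at a wall-clock limit instead, e.g. 10% of a
          # 24-hour run as in the experiments described above:
          # c.parameters.distmip.rampup.timelimit.set(0.1 * 24 * 3600)

          c.solve()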

