Is it really possible to compare optimization algorithms?

“In the general case, which solver/method is the best?” is a question I must have been asked 50 times, in class or while discussing with colleagues. I can only imagine how many times Hans Mittelmann has heard it. Nathan Brixius already wrote a very interesting post about this very issue, and I’m not going to repeat what he said.

Yet, I don’t believe it’s possible to give an answer that is both fair to the different optimization algorithms and correct from a scientific point of view. Some theoretical [1] and empirical [2] evidence points in that direction. Nathan’s post brilliantly outlined the inherent difficulty of defining a “better” solver. I also argue that it is very difficult to compare algorithms on a fair basis.

  • Designer experience. Let’s say François Soumis and I each design a column generation algorithm for a particular problem. François Soumis’ algorithm is likely to be faster since he’s an expert at CG and I’m not.
  • Programming language. If you code a specific optimization method in C while someone else implements it in an interpreted language like standard Python, the exact same algorithm won’t have the same performance.
  • Test environment. Aside from the fact that two research teams will almost never have access to comparable testing environments (similar machines and conditions), modern processors use techniques such as Hyper-Threading and Turbo Boost that affect the order in which instructions are executed. So much for deterministic run times.
  • Stochastic algorithms. Any algorithm that is not completely deterministic will yield different performance when run multiple times. Ask the Gurobi guys how challenging it is to achieve reproducibility for parallel algorithms nowadays.
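The last point can be illustrated with a toy random-restart search in Python. This is a hedged sketch, not any solver’s actual algorithm: the function, bounds, and restart count are all made up for illustration. Without a fixed seed, repeated runs can report different objective values, which makes head-to-head timing and quality comparisons slippery.

```python
import random

def random_restart_best(objective, n_restarts, seed=None):
    """Toy stochastic search: sample random points, keep the best value.

    With a fixed seed the result is reproducible; with seed=None,
    repeated runs may return different values.
    """
    rng = random.Random(seed)
    return min(objective(rng.uniform(-10.0, 10.0)) for _ in range(n_restarts))

# Illustrative objective: a parabola with its minimum at x = 3.
f = lambda x: (x - 3.0) ** 2

# Same seed -> identical results, so the experiment can be replicated.
a = random_restart_best(f, 100, seed=42)
b = random_restart_best(f, 100, seed=42)
assert a == b
```

Seeding only restores repeatability of a single run; a fair empirical comparison of stochastic methods still needs many independent runs and a statistical summary, not one timing.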

What about solvers?

Theoretically, it might be possible to give a fair shot to different solver codes executed over a set of instances on a given machine, but the conclusions would only be valid for that set of instances, not for the general case. Model difficulty can vary considerably between same-sized instances of the same problem. And when a user interacts with the solver, experience matters: an experienced user can detect performance problems and will know how to tune parameters or refine the model, while less experienced users won’t.

Of course, some algorithms and solvers tend to solve many models quite fast while other codes don’t. In a given context, one may be strictly better. But general conclusions are hazardous at best.

[1] Wolpert, David H., and William G. Macready, 1997. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67–82.
[2] Koch, Thorsten, Tobias Achterberg, et al., 2011. MIPLIB 2010. Mathematical Programming Computation, 3(2), 103–163.


  1. Another factor is programming experience with a given language. Knowing which data structures are fastest, or how to copy one array to another in one gulp rather than with a loop (or nested loops) can make a difference. So can coding practices such as adding checks for data errors; I generally recommend defensive programming, but it does add drag.

    • So true. It reminds me of my first version of a code for my M.Sc. in which a function copied a large array into memory each time it was called. Just changing a copy to a memory reference changed everything.
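The copy-versus-reference anecdote above can be sketched in Python (the function names and array size here are purely illustrative, not the original M.Sc. code):

```python
import copy

big = list(range(1_000_000))  # a large array, stand-in for the M.Sc. data

def process_by_copy(data):
    # Copies the whole array on every call: O(n) extra time and memory.
    local = copy.copy(data)
    return local[0]

def process_by_reference(data):
    # Python passes the list by reference; no copy is made, O(1) access.
    return data[0]

# Both return the same answer; only the hidden per-call cost differs.
assert process_by_copy(big) == process_by_reference(big)
```

Called once, the difference is invisible; called inside a hot loop, the needless copy dominates the run time, which is exactly why such implementation details muddy algorithm comparisons.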

  2. I agree with most of what you say except on determinism. Recent versions of CPLEX do offer totally reproducible behavior. I blogged about this and more here:

    • Hi JF,
      Thank you for your comment. I understand what you say and I agree with you to some extent. From my modest experience, CPLEX performance on 4+ threads is more stable in release 12.4 than it was in 12.1. I haven’t tried 12.5 yet. Providing the ability to input a seed will definitely help in assessing the reproducibility of results.

      That being said, the comment on stochastic algorithms was more aimed at stochastic local search algorithms such as GRASP, Genetic Algorithms, and so on. I will update the post to make that clarification.

  3. I addressed the issue of computer language over 20 years ago in this paper: “The Influence of Computer Language on Computational Comparisons: An Example from Network Optimization”. What was interesting back then was that not only did the comparative results depend on the chosen language (there it was C versus Fortran), but the relative results also changed based on the platform you did the testing on.
