How problematic are your Big M constraints? an experiment [primal]

This post investigates whether it is still relevant to be careful about the coefficients used in so-called Big M constraints when formulating mixed-integer programming (MIP) models. I run experiments with two-echelon supply chain design models to show that using overly large values often makes the models harder to solve. The impact is much greater on lower-performing solvers than on commercial solvers. In the associated dual post, I explain what Big M constraints are and why they tend to cause problems when solving MIPs.

A bit of context

When I started learning about integer programming in the early 2000s, I often got the following advice regarding modelling using Big M constraints:

  1. Use as few Big M constraints as possible;
  2. If you need to use them, make the coefficients (the Ms) as small as possible.

I have always followed this advice, but I never really tested the assumptions behind it in practice. It’s a fact that solvers are several orders of magnitude faster than they were when I learned modelling. So is this advice still relevant? I wanted to test it out.

The experiment

I put together three supply chain design instances. More precisely, they are two-echelon (hierarchical) uncapacitated facility location models. These are of size 30 x 30 x 50 and are identical in structure to the models shown here, except for the single-source constraints, which I did not use. Each instance thus has 60 Big M constraints, one for each facility, that ensure a facility can process products only if it is built. I used three different levels of coefficients: (i) the smallest (tightest) possible value, (ii) a larger value of the same order of magnitude as the smallest value, and (iii) a huge value that is at least 100 times larger than the best value. The instances are identical except for these 60 coefficients. I ran the models through three solvers: CPLEX, CBC and GLPK, with a time limit of 2 hours, and computed the geometric mean of run times (in seconds).
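As an aside, the geometric mean is the usual way to aggregate solver run times, since a single long run distorts it far less than it would an arithmetic mean. A minimal sketch of the computation (the run times below are made up for illustration):

```python
import math

def geometric_mean(times):
    """Geometric mean of run times (in seconds): the n-th root of
    their product, computed via logs for numerical safety."""
    return math.exp(sum(math.log(t) for t in times) / len(times))

# Hypothetical run times for three instances of the same model:
print(geometric_mean([1.0, 10.0, 100.0]))  # ≈ 10.0
```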
Keep in mind that these are the same supply chains! CBC is strongly affected by the change, needing on average 6.5 times more time for option (ii) and a whopping 15.5 times more for (iii). CPLEX is also affected, but to a much lesser extent: by a factor of about 2 and 4, respectively. GLPK struggles with these models even when they are tightly formulated. When higher values are used for the constraints, GLPK simply can’t solve them within the allotted time, finishing with a gap between 5% and 18% depending on the instance.

The commercial solver is not only faster, it is also less affected by the unnecessarily large coefficients. I didn’t post results for Gurobi, but its performance is quite comparable to CPLEX’s. This also shows the canyon between free and commercial solvers in terms of performance.

If you found this post useful or insightful, please share it with your colleagues, friends or students!

Extra: Why choose this formulation?

For full disclosure, I mentioned in the dual post that the model I use is not the most efficient one for solving this problem. I used it anyway for two reasons:

  • The model is easy to understand, compared to more complex variants of supply chain design which have many types of binary variables.
  • The best (tightest) values of the Big M coefficients are very straightforward to obtain in this model.
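On the second point: in an uncapacitated model, a facility can never ship more than the total demand of all customers, so the sum of demands is a valid (and tight) choice of M for its linking constraint. A minimal sketch, with made-up demand figures:

```python
def tightest_big_m(demands):
    """A facility in an uncapacitated location model can never ship
    more than the total customer demand, so sum(demands) is a valid
    and tight Big M for its linking constraint X1 + ... + Xn <= M Y."""
    return sum(demands)

# Hypothetical customer demands:
print(tightest_big_m([10, 25, 15]))  # -> 50
```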


What are Big M constraints? [dual]

This post presents a class of constraints that are used very often in mixed-integer programming (MIP) models. I explain what they are, why they are important and why using overly large values of M is problematic. The associated primal post presents some experiments that outline how modern solvers handle unnecessarily large values of M.

What is a “Big M” constraint?

A Big M constraint is typically used to limit the value of a set of variables (usually continuous, but sometimes integer) based on the value of a binary variable. It is very commonly used in MIPs to model if…then structures and relationships between decisions. Some examples include:

  • In supply chain design, a facility can only ship products (represented by continuous flow variables) if it is built (represented by a binary variable linked to a fixed cost).
  • In production planning, a lot of a given product can only be produced (the quantity being represented by a continuous or integer variable) if a machine setup is made to produce that product (represented by a binary variable).

The form of the constraint is usually the following, where the Xi are the associated continuous or integer variables, Y is a binary variable and M is a large enough coefficient:

X1 + X2 + … + Xn <= M Y

The M must be large enough to let the model choose appropriate values for the X variables when Y is set to 1.
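To make the on/off behaviour concrete, here is a minimal sketch checking the linking constraint (the function and variable names are mine, purely for illustration):

```python
def linking_constraint_holds(x_values, y, M):
    """Checks X1 + X2 + ... + Xn <= M * Y.
    With Y = 0 the right-hand side is 0, so all X must be 0;
    with Y = 1 the X values may sum to anything up to M."""
    return sum(x_values) <= M * y

print(linking_constraint_holds([5.0, 20.0], y=1, M=100))  # -> True (facility open, flow allowed)
print(linking_constraint_holds([5.0], y=0, M=100))        # -> False (facility closed, no flow)
```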

Why are Big M constraints important?

Besides the fact that they are widely used in many classes of models, the mere presence of these constraints usually has the effect of making a model much more difficult to solve. The common mistake is simply to put a very large number there – like 100 billion – without thinking about the consequences for the model or whether such a large value is necessary. The experiments in the primal post show that using overly large values can result in much longer solving times.

Furthermore, for many model classes, these constraints are difficult, if not impossible, to avoid, and alternative formulations often either involve additional complexity or can’t be generalised to more complex model structures (as is the case in supply chain design). Moreover, it can be difficult for solvers to spot unnecessarily large values of M, especially in real-world models with many odd constraints covering exceptions and particular cases. I will elaborate on two reasons.

Quality of Linear Relaxation

(This section assumes you are familiar with branch-and-bound (B&B) techniques. If you are not, please check this or, preferably, this first.) Also, please bear in mind that the purpose here is accessibility, not generality or purity of form.

Branch-and-bound (and branch-and-cut) usually solves the linear relaxation of a mixed-integer problem early in the solving process. Generally speaking, the closer the relaxation’s solution is to the optimal solution of the true problem, and the closer its optimal value is to the optimal value of the true problem, the easier the model is to solve. In branch-and-bound, one of the bounds is provided by the relaxation, while the other is provided by a feasible solution. The closer those two values are, the easier it is to prune nodes of the search tree.

To illustrate this, let me use the warehouse location problem, where one wants to locate a set of warehouses to service a set of customers. The Big M constraints in this model prevent using a warehouse to service customers unless it is built. The binary variable (warehouse building) is usually associated with a high fixed cost, so the model has a strong incentive to keep the values of the Y variables as low as possible.

For the same amount of products shipped from a facility, the larger the Big M value, the lower the value of Y that can be used in the relaxation while still satisfying the Big M constraint. Thus, the larger the M, the more it is possible to artificially lower the value of Y, which in turn results in a lower fraction of Y’s fixed cost being paid in the linear relaxation. We know this is purely artificial because in the real problem, location variables will either be equal to 1 (resulting in paying the full fixed cost and deactivating the constraint) or to 0 (resulting in no fixed cost and preventing use of the facility). Warehouse location is a cost minimisation problem, so this results in a lower value of the linear relaxation, which is the opposite of what we are looking for in order to close the optimality gap.
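A quick back-of-the-envelope sketch of this effect (the figures are made up): in the relaxation, the cheapest feasible value of Y for a given flow is flow / M, so only that fraction of the fixed cost appears in the objective.

```python
def fixed_cost_in_relaxation(flow, fixed_cost, M):
    """In the LP relaxation, the Big M constraint only requires
    Y >= flow / M, so a minimising solver sets Y to exactly that
    fraction and pays only that share of the fixed cost."""
    y = flow / M
    return fixed_cost * y

# Hypothetical facility shipping 40 units, with a fixed cost of 1000:
print(fixed_cost_in_relaxation(40, 1000, M=40))    # tight M -> 1000.0 (full cost counted)
print(fixed_cost_in_relaxation(40, 1000, M=4000))  # huge M  -> 10.0 (cost nearly vanishes)
```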

(I am aware that better formulations exist for the simple facility location problem (SFLP) than the one outlined above, but this model is easier to grasp than fixed-charge network design.)

Scaling and Numerical Stability

This problem occurs when you have both very small and very large numbers in the same constraint, which is common when using (too) large values in Big M constraints. For instance, the constraint 0.00000001 X1 + 0.00000001 X2 – 1000000000 Y <= 0 has 17 orders of magnitude of difference between its coefficients. This has complex implications, but it usually forces the solver to either (1) adjust its numerical tolerances when performing floating-point arithmetic, resulting in longer solving times, or (2) risk numerical problems, possibly resulting in an incorrect solution. When these coefficients are in different constraints, the solver can usually scale some constraints to circumvent the problem, but that is not so easy when the big and small numbers are in the same row.
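You can see the loss directly in double-precision arithmetic, which carries roughly 16 significant digits:

```python
# Coefficients from the constraint above: 17 orders of magnitude apart.
big = 1000000000.0   # 1e9, the Big M coefficient
tiny = 0.00000001    # 1e-8, the flow coefficients
# Adding them, the small coefficient's contribution is rounded away entirely:
print(big + tiny == big)  # -> True
```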

My colleague Paul A. Rubin has written an excellent post on this issue, which I recommend you read if you are interested in the topic.

A relevant #orms resolution for 2017: Update your solvers!

Making resolutions during the New Year has become something of a tradition. Some are followed, many are not. Here is one resolution I took years ago that has been pretty relevant. I would encourage you to adopt it as well: update your solvers … [Continue reading]

The recommendation game [dual]

People who teach or supervise students have a tremendous influence on what solvers get adopted by the community. Once they finish their studies, many students will continue to use the tools they have learned in school; it is simply more efficient. In … [Continue reading]

The recommendation game [primal]

People who teach or supervise students have a tremendous influence on what solvers get adopted by the operations research / industrial engineering community. Once they finish their studies, many students will continue to use the tools they have … [Continue reading]

Facility location : not so difficult (with the proper tools)

In a previous post, I generated a few capacitated facility location instances (CFLP) and I ran these through MIP solvers. The instances were solved pretty quickly overall. In a comment, professor Matteo Fischetti suggested I compare my results with … [Continue reading]

Facility location : mistake, issue and results

About two weeks ago, I generated a few capacitated facility location instances (CFLP) for some students to play with. When I ran these through the CPLEX and Gurobi solvers, all of them were solving very quickly. Gurobi in fact seemed to find the … [Continue reading]

Facility location : presolved to optimality!

  ** IMPORTANT NOTICE ** This post has temporarily been suspended as some readers noticed a potential problem with the model files. I will issue a corrected post shortly. I am sorry for any inconvenience. Marc-André … [Continue reading]

It’s time to talk more about barrier in class

In today's post, I argue that optimization classes and textbooks should put a greater emphasis on interior point methods. During my many years of study, I have been exposed to quite a bit of simplex theory. I also had to perform many primal and … [Continue reading]

Just a LP, really?

In recent talks at scientific conferences, I saw some people dismiss pure linear programming (LP) models as being easy and primitive. I was often told "oh, that's just an LP". Linear programming is put in the box of solved problems, like some network … [Continue reading]