This post presents a class of constraints that are used very often in mixed-integer programming (MIP) models. I explain what they are, why they are important and why using too large values for the big M is problematic. The associated primal post presents some experiments that outline how modern solvers handle unnecessarily large values of M.
What is a “Big M” constraint?
A big M constraint typically used to limit the value of a set of variables (usually continuous but sometimes integer) based on the value of a binary variable. It is very commonly used in MIPs to model if…then structures and relationships between decisions. Some examples include:
- In supply chain design, a facility can only ship products (represented by continuous flow variables) if it is built (represented by a binary variable linked to a fixed cost).
- In production planning, a lot of a given product can only be produced (the quantity being represented by a continuous or integer variable) if a machine setup is made to produce that product (represented by a binary variable).
The form of the constraints is usually the following, where X are the associated continuous or integer variables, Y a binary variable and M a large enough coefficient. The M must be large enough so as to let the model choose appropriate values for the X variables if Yj is set to 1.
Why are Big M constraints important?
Besides the fact that they are widely used in many classes of models, the mere presence of these constraints usually have the effect of making a model much more difficult to solve. The common mistake is just to put a very large number there – like 100 billion – without thinking about the consequences on the model or whether such a large value is necessary. This post shows that putting too big values can result in much longer solving times.
Furthermore, for many model classes, these constraints are difficult to avoid, if possible, and alternative formulations often involve either additional complexity or can’t be generalised into some more complex model structures (such is the case in supply chain design). Moreover, it can be difficult for solvers to spot unnecessarily large values of M, especially for real-world models with many odd constraints to cover exceptions and particular cases. I will elaborate on two reason.
Quality of Linear Relaxation
(This section assumes you are familiar with branch-and-bound (B&B) techniques. If you are not, please check this or more preferably this before.) Also, please bear in mind that the purpose here is vulgarisation and not generality or pureness of form.
Branch and bound (and branch-and-cut) usually solve the linear relaxation of a mixed-integer problem early in the solving process. Generally speaking, the more the solution of the relaxation is similar to the optimal solution of the real problem, and the more its optimal value is close to the value of the optimal solution of the real problem, the easier the model is to solve. In branch and bound, one of the bounds is provided by the relaxation, while the other is provided by a feasible solution. The closer those two values are, the easier it is to prune nodes of the search tree.
To illustrate this feature, let me use the warehouse location problem, where one wants to locate a set of warehouses to service a set of customers. The Big M constraints in this model prevents using a warehouse to service customers unless it is built. The binary variable (warehouse building) is usually associated with a high fixed cost. The model has thus a strong incentive to use as little as Y as possible. For the same amount of products shipped from a facility, the larger the big M value, the lower the corresponding value of Y that can be used in the relaxation for the Big M constraint to be satisfied. Thus, the larger the M, the more it is possible to artificially lower the value of Y, which will in turn result in a lower fraction of Y’s fixed cost being paid in the linear relaxation. We know that this purely artificial because in the real problem, location variables will either be equal to 1 (resulting in paying the full fixed cost and deactivating the constraint) or to 0 (resulting in no fixed cost and preventing to use the facility). Warehouse location is a cost minimisation problem, so it will result in a lower value of the linear relaxation, which is the opposite of what we are looking for in order to close the optimality gap.
(I am aware that better formulations exist for the SFLP than the one outlined above, but this model is easier to grasp than the fixed-charge network design).
Scaling and Numerical Stability
This problem occurs when you have both very small and very large numbers in the same constraint, which is common when using (too) large values in Big M constraints. For instance, the constraint 0.00000001 X1 + 0.00000001 X2 – 1000000000 Y <= 0 has a 10e+17 order of magnitude of difference between its coefficients. This has complex implications, but it usually forces the solver to either (1) ajust its numerical tolerance when performing floating point arithmetic, resulting in longer solving times, or (2) cause numerical problems, possibly resulting in an incorrect solution. When these coefficients are in different constraints, the solver can usually scale some constraints to circumvent this problem, but it is not so easy when the big and small numbers are in the same row.
My colleague Paul A. Rubin has written an excellent post on this issue, which I recommend you read if you are interested in the topic.