In terms of research papers, a star is a masterpiece that has a profound and lasting impact on a field. It is well written, has deep implications, provokes thought and will likely be worth reading again in several decades. It proposes either a sound technique that can be applied to many problems, or results so tremendous that they are likely to remain unbeaten for a long time. I also put in that category papers that thoroughly summarize a broad range of advances in a field and allow newcomers to understand many key concepts without having to read tens and tens of articles. Beethoven’s 9th Symphony (op. 125) is a good musical example.

*In my field (supply chain network design), there is a paper which perfectly fits that definition: A.M. Geoffrion and G.W. Graves’ Multicommodity Distribution System Design by Benders Decomposition, published in Management Science in 1974. More on this in an upcoming blog post.*

A comet is a bright, shiny object that generates a lot of attention. It is at the center of a hot topic or trend in research. However, its relevance is tied to a particular time and context, and its intrinsic value diminishes quickly over time. Either the method was replaced by something more effective or the discussion has evolved.

As a researcher, one usually remembers the papers that were comets at the time she was introduced to a particular topic. It’s important to know which comets are currently in the sky. What is worth reading in that category tends to change rather quickly, and it is a good idea to update this list frequently if you don’t want your students to waste their time on outdated approaches.

*When I started working on SCND, international supply chains were a hot topic. Multinationals could use transfer prices to affect the taxes they paid in each country, which also made the models more challenging to solve. However, the freedom to set these prices was practically removed by governments to prevent tax evasion, thereby killing the practical relevance of these problems.*

Asteroids form the vast majority of research papers. Most of them are not relevant to you (or your students) unless they are very close to the particular topic or approach you are currently working on. In many broad fields (like vehicle routing), it’s impossible to even read 10% of the asteroids out there. Don’t waste your time by reading too many papers of that kind. Only read what’s very close to your research, unless you want to spend your life writing literature reviews.

*In SCND, there are hundreds of papers proposing heuristics to solve a particular set of instances or a very specific formulation. Usually their algorithm is only slightly faster than the previous state of the art, and the article’s value drops to near zero as soon as someone publishes a faster algorithm.*

The more I think about this problem, the more I believe you can’t skip the classics (a.k.a. stars), especially at the Ph.D. level. If one of these papers is particularly difficult to understand, it can be replaced by a book chapter or another reading that is easier to digest. In terms of research, it is common for people who don’t master the stars to mistake comets for stars and asteroids for comets.

When I started learning about integer programming in the early 2000s, I often got the following advice regarding modelling using Big M constraints:

- Use as few Big M constraints as possible;
- If you need to use them, make the coefficients (the Ms) as small as possible.

I always followed this advice, but I never really tested its assumptions in practice. It’s a fact that solvers are several orders of magnitude faster than they were when I learned modelling. So is this advice still relevant? I wanted to test it out.

I put together three supply chain design instances. More precisely, they are Two-Echelon (hierarchical) Uncapacitated Facility Location models. They are of size 30 x 30 x 50 and are identical in structure to the models shown here, except for the single source constraints, which I did not use. Each instance thus has 60 Big-M constraints, one for each facility, which ensure that a facility can process products only if it is built. I used three different levels of coefficients: (i) the smallest (tightest) possible value, (ii) a larger value of the same order of magnitude as the smallest, and (iii) a huge value at least 100 times larger than the best value. The instances are identical except for these 60 coefficients. I ran the models through three solvers: CPLEX, CBC and GLPK, with a time limit of 2 hours, and computed the geometric mean of run times (in seconds).
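As a rough illustration of how the three coefficient levels can be derived (the helper name, the factor of 5 and the demand figures below are made up for the example, not taken from the actual instances): for an uncapacitated facility, the tightest valid M is the total demand it could ever serve.

```python
def big_m_levels(demands, huge_factor=100):
    """Return (tightest, same-magnitude, huge) Big-M values for one
    uncapacitated facility: it can never ship more than total demand."""
    tightest = sum(demands)
    same_magnitude = 5 * tightest      # larger, but same order of magnitude
    huge = huge_factor * tightest      # at least 100x the tightest value
    return tightest, same_magnitude, huge

demands = [120, 80, 200, 50]           # made-up customer demands
print(big_m_levels(demands))           # -> (450, 2250, 45000)
```

The only non-trivial choice is the tightest value; the other two levels just scale it up.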

Keep in mind that these are the same supply chains! CBC is strongly affected by the change, needing on average 6.5 times longer for option (ii) and a whopping 15.5 times longer for (iii). CPLEX is also affected, but to a much lesser extent, by factors of about 2 and 4. GLPK struggles with these models even when they are tightly formulated. When higher values are used in the constraints, GLPK simply can’t solve them within the allotted time, with a final gap between 5 and 18% depending on the instance.

The commercial solver is not only faster, it is also less affected by the unnecessarily large coefficients. I didn’t post results for Gurobi, but its performance is quite comparable to CPLEX’s. This also shows the canyon between free and commercial solvers in terms of performance.

If you found this post useful or insightful, please share it with your colleagues, friends or students!

For full disclosure, I mentioned in the dual post that the model I use is not the most efficient one for solving this problem. I used it anyway for two reasons:

- The model is easy to understand, compared to more complex variants of supply chain design which have many types of binary variables.
- The best (tightest) values of the Big M coefficients are very straightforward to obtain in this model.


A big M constraint is typically used to limit the value of a set of variables (usually continuous but sometimes integer) based on the value of a binary variable. It is very commonly used in MIPs to model if…then structures and relationships between decisions. Some examples include:

- In supply chain design, a facility can only ship products (represented by continuous flow variables) if it is built (represented by a binary variable linked to a fixed cost).
- In production planning, a lot of a given product can only be produced (the quantity being represented by a continuous or integer variable) if a machine setup is made to produce that product (represented by a binary variable).

The form of the constraints is usually the following, where the Xi are the associated continuous or integer variables, Y is a binary variable and M is a *large enough* coefficient:

X1 + X2 + … + Xn ≤ M · Y

The M must be large enough to let the model choose appropriate values for the X variables when Y is set to 1.
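In plain terms, the row links the flows to the binary decision: if Y = 0, the X variables are forced to 0; if Y = 1, they are free up to M. A minimal stdlib-only sketch of this semantics (the function name and tolerance are illustrative, not part of any solver API):

```python
def big_m_satisfied(flows, y, M, tol=1e-6):
    """Check the linking row  X1 + ... + Xn <= M * Y  for one facility.

    flows: continuous shipment quantities out of the facility
    y:     binary open/closed decision (0 or 1)
    M:     the Big-M coefficient
    tol:   small feasibility tolerance, as solvers use internally
    """
    return sum(flows) <= M * y + tol

# Closed facility (Y = 0): any positive flow violates the row.
print(big_m_satisfied([10.0, 5.0], y=0, M=1000))  # -> False
# Open facility (Y = 1) with a large enough M: the flows are allowed.
print(big_m_satisfied([10.0, 5.0], y=1, M=1000))  # -> True
```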

Besides the fact that they are widely used in many classes of models, the mere presence of these constraints usually has the effect of making a model much more difficult to solve. The common mistake is to just put a *very* large number there – like 100 billion – without thinking about the consequences for the model or whether such a large value is necessary. This post shows that using excessively large values can result in much longer solving times.

Furthermore, for many model classes, these constraints are difficult or impossible to avoid, and alternative formulations often either involve additional complexity or can’t be generalised to more complex model structures (such is the case in supply chain design). Moreover, it can be difficult for solvers to spot unnecessarily large values of M, especially in real-world models with many odd constraints covering exceptions and particular cases. I will elaborate on two reasons.

(This section assumes you are familiar with branch-and-bound (B&B) techniques. If you are not, please check this or, preferably, this first.) Also, please bear in mind that the purpose here is accessible explanation, not generality or purity of form.

Branch-and-bound (and branch-and-cut) usually solves the linear relaxation of a mixed-integer problem early in the solving process. Generally speaking, the closer the relaxation’s solution and optimal value are to those of the true problem, the easier the model is to solve. In branch-and-bound, one of the bounds is provided by the relaxation, while the other is provided by a feasible solution. The closer those two values are, the easier it is to prune nodes of the search tree.

To illustrate this, let me use the warehouse location problem, where one wants to locate a set of warehouses to service a set of customers. The Big M constraints in this model prevent using a warehouse to service customers unless it is built. The binary variable (warehouse building) is usually associated with a high fixed cost, so the model has a strong incentive to keep the Y variables as small as possible. For the same amount of product shipped from a facility, the larger the Big M value, the lower the value of Y that satisfies the Big M constraint in the relaxation. Thus, the larger the M, the more the relaxation can artificially lower the value of Y, which in turn means a lower fraction of Y’s fixed cost is paid in the linear relaxation. This is purely artificial: in the real problem, location variables will either be equal to 1 (paying the full fixed cost and deactivating the constraint) or to 0 (paying no fixed cost and preventing use of the facility). Warehouse location is a cost minimisation problem, so a larger M results in a lower value of the linear relaxation, which is the opposite of what we want in order to close the optimality gap.
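The effect is easy to quantify: in the relaxation, the smallest Y satisfying X1 + … + Xn ≤ M·Y is sum(X)/M, so only that fraction of the fixed cost enters the bound. A small numeric sketch with made-up figures (the function name is mine, not a solver routine):

```python
def fixed_cost_in_relaxation(total_flow, fixed_cost, M):
    """Fixed cost actually paid in the LP relaxation: the smallest Y
    satisfying  sum(X) <= M * Y  is sum(X)/M, capped at 1."""
    y_relaxed = min(1.0, total_flow / M)
    return fixed_cost * y_relaxed

flow, f = 450.0, 100_000.0
print(fixed_cost_in_relaxation(flow, f, M=450))     # tight M: full cost, 100000.0
print(fixed_cost_in_relaxation(flow, f, M=45_000))  # huge M: only about 1000.0
```

With the tight M the relaxation pays the full fixed cost; with the 100x-too-large M it pays 1% of it, weakening the bound accordingly.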

*(I am aware that better formulations exist for the SFLP than the one outlined above, but this model is easier to grasp than the fixed-charge network design).*

This problem occurs when you have both very small and very large numbers in the same constraint, which is common when using (too) large values in Big M constraints. For instance, the constraint 0.00000001 X1 + 0.00000001 X2 – 1000000000 Y <= 0 has a 17-order-of-magnitude difference between its coefficients. This has complex implications, but it usually forces the solver to either (1) adjust its numerical tolerances when performing floating point arithmetic, resulting in longer solving times, or (2) run into numerical problems, possibly resulting in an incorrect solution. When these coefficients are in different constraints, the solver can usually scale some of them to circumvent the problem, but this is not so easy when the big and small numbers are in the same row.
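You can observe the loss directly in double-precision arithmetic: the X contributions in the row above are smaller than one unit in the last place of 1e9, so they vanish entirely when the terms are summed.

```python
# The constraint row above, evaluated with X1 = X2 = Y = 1 in double precision:
row = 1e-8 * 1.0 + 1e-8 * 1.0 - 1e9 * 1.0
# 2e-8 is far below one ulp of 1e9 (about 1.2e-7), so it is rounded away
# and the small coefficients contribute nothing to the result:
print(row == -1e9)  # -> True
```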

My colleague Paul A. Rubin has written an excellent post on this issue, which I recommend you read if you are interested in the topic.

Like, *every* time. It’s worth it. Two reasons:

Every major release of a mathematical programming solver comes with pretty significant performance improvements. For instance, Gurobi reports a 22% average improvement on mixed-integer programming (MIP) models and 10% on pure linear programming models between versions 6.5 and 7. CPLEX is usually in the same ballpark of version-to-version improvements. Mosek reports a 40% mean improvement for conic quadratic problems between versions 7 and 8. These gains are pretty significant, especially when compounded over multiple versions. I give a personal example here.

Over the last few years, solver developers have added some pretty significant new functionality. Just in 2016, CPLEX included automatic Benders decomposition in its 12.7 release, and Gurobi added native support for multiobjective optimization, among other features. Before you implement your own customised algorithm, be sure to check what’s new: you could discover that it is now included in a standard release!

You’d be surprised how often I see students running models on older workstations or servers using outdated solvers, like CPLEX 12.4 or Gurobi 6.0.5. I sometimes referee papers comparing the authors’ custom algorithms against even older versions.

Here are links to where you can find information about new versions and performance improvements (in alphabetical order). These usually get presented at scientific conferences as well, the INFORMS Annual Conference in November being the biggest *rendez-vous* if you want to hear about what’s new.

- CPLEX : you can generally hear about what’s new and relevant on this blog.
- FICO Xpress : it’s a bit hard to find but they sometimes release a webinar about their new features here.
- Gurobi has a very convenient page about what’s in the latest release.
- Mosek : you can find links to copies of slides presented at conferences. They usually talk about their performance improvements there.

- In my decision support systems design class (undergrad, industrial engineering), I recommend that the students use Excel, VBA and CBC (CoinOR’s mixed-integer programming solver).
- In my supply chain design class (masters level, industrial engineering & analytics), I recommend that the students use CPLEX with a modeling language of their choice.

Most of our undergraduate students at my university come from nearby cities and will probably work in small or medium-sized businesses. CPLEX with AMPL is a good mix of tools, but the sad truth is that many small manufacturing businesses will be very reluctant to spend that much on software licences for a new employee. The DSS class is focused on a “do it yourself” approach, so we use tools that are either free (CBC) or present in almost every business (Excel). Besides, industrial engineering students learn AMPL & CPLEX in another set of two classes, so they still get exposure to these tools.

On the other hand, most masters students in analytics are likely to have more dedicated analyst/planning jobs in larger companies, where integration with other tools is important and where larger budgets are available for buying analytics software. Besides, many of them will end up working on more difficult models or in research, where it is important to use the fastest tools available.

Beyond context and usefulness, I think it’s important to teach using the right tools. If your class is about hard optimization problems, then use a state-of-the-art solver, not GLPK. If the task is to get data from a database or text file and send it to a solver for direct resolution, most people don’t need the C API, unless they are very proficient in C, a rather uncommon skill among undergrads. Use either a modeling language or the solver’s appropriate API.

When teaching optimization, many professors will require the use of one solver or programming language. This makes grading and support easier and is especially relevant if you have a teaching assistant (TA). In more advanced or project-based classes, you might want to give students more freedom to select the most appropriate tool for their project. Since there is a lot of homework and many exercises to turn in for my supply chain class, I require the use of specific tools.

The tools I recommend using are software I have worked with in the past and that I know well enough to be able to support the students if they have a problem. Otherwise, the students who work with languages or software I know would be advantaged over those who work with technologies I don’t know. I also make it very clear to students in project-based classes that if they use other technologies, I might not be able to support them.

In our rapidly changing technological environment, it is my strong belief that developing self-learning capabilities is a sound way to maintain and enhance one’s employability. However, forcing students to learn how to use tools on their own does not seem to be a good strategy. It creates frustration among the least technology-savvy students, fostering a belief that using OR tools is difficult and unpleasant, which is exactly the opposite of what we wish to accomplish. Over the last two years, I have found that creating a class environment that favours rather than forces self-learning and the use of multiple tools is much more effective. I also put much more effort into making sure students have a good understanding of the technique first, and then have them learn some tools to implement it.

What pleases me very much is that the situation is totally different than when I was an undergrad. At the time, we had to pay for educational licences for every tool we wanted to learn: SPSS, Arena, Lindo or even CPLEX. These solvers often came with severe limitations on problem size or on functionality such as data input/output from a database. Moreover, in order to reduce the cost to students, professors would coordinate to use the same solver – in my case it was Lindo – and to make sure their exercises would fit within the limitations imposed by academic licences. Having paid a couple hundred dollars to learn a specific statistical software, my first thought was not to buy another 6-month academic licence to teach myself SAS. To be fair, I’m not sure I was aware of the benefits of learning multiple technologies at that time; I was simply curious to learn about new tools and techniques.

Nowadays, while you might not find an ArcGIS licence around the corner, free academic licences have made teaching multiple technologies much easier. There is also a large number of open-source tools students can use. The pressure from free tools has also pushed the (inflation-adjusted) cost of many licences down, for those who still charge. I believe students greatly benefit from this environment. A research center can pay $5,000 for the software it needs to do research; most undergrads can’t. By the end of the bachelor’s degree, an industrial engineer from my university will have been exposed to at least the following OR software: CPLEX, AMPL, Lindo, Matlab, CBC, Baron, a couple of algorithms for network optimization and at least two programming languages, more probably three or four.

Software vendors make presentations and offer tutorials, webinars, and comprehensive, detailed examples for free so that more students can learn how to use their tools. While I understand that this change has a lot to do with business models and strategy, I think it is great, both for students and for the field of analytics and operations research in particular.

In the associated dual post, I will cover which criteria I use to determine the tools I cover in my classes.

SPOILER: I don’t use a multicriteria method to determine which multicriteria method to teach

For this test, I used the 100 instances from Avella & Boccia (2009). These are grouped in 5 sets of 20 instances, with between 300 and 1000 facilities and customers. The data for the instances are available here; I simply wrote a program to parse the input files and send them to the solver. For those who wonder, I did check that the optimal values reported by Avella & Boccia match the values I got in my tests. Each instance was run on CPLEX 12.6.2 with default parameters on a recent 4-core machine. I computed the geometric mean of solution times (in seconds) for each group of 20 instances:
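For reference, the geometric mean is the usual aggregate for solver benchmarks because it dampens the influence of a few very long runs compared to the arithmetic mean. A quick stdlib-only sketch (the run times below are made up):

```python
import math

def geometric_mean(times):
    """Geometric mean of run times: exp of the mean of the logs."""
    return math.exp(sum(math.log(t) for t in times) / len(times))

# Made-up run times (seconds) for one group of instances:
print(geometric_mean([2.0, 8.0, 4.0]))  # -> 4.0 up to rounding (arithmetic mean: ~4.67)
```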

The 300 x 300 instances are solved pretty quickly, in 9 seconds on average. The 300 x 1000 and 500 x 500 instances are solved in under a minute, and even the largest only take 21 minutes on average (the longest takes about 56 minutes). Interestingly, 35% of the 300 x 1000 instances are solved directly at the root node through a combination of implied bound cuts and heuristics.

If we consider the most direct application of the CFLP, that is, deciding the location of warehouses or production facilities in a supply chain, even 300 facilities is a pretty large number. Also, as facility location is considered a strategic problem, 20 minutes to get an optimal solution is very acceptable. For direct application in the supply chain domain, the solvers perform well enough that customized algorithms are not necessary. That being said, the CFLP is often embedded as a subproblem in larger supply chain network design problems, and as such, specialized algorithms may still be necessary in some specific contexts.

Of course, these results benefit from the tremendous advances in both hardware and software. It merely shows that the state-of-the-art tools used for optimization are getting better, and they are improving pretty fast. I also ran the tests on Gurobi and the results are pretty similar.

[1] P. Avella and M. Boccia. A cutting plane algorithm for the capacitated facility location problem. Computational Optimization and Applications, 43(1):39–65, 2009.

I am committed to providing accurate results on this blog, so please accept my most sincere apologies for not validating these results sufficiently before posting them here. I thank Professor Andrea Lodi and his students for spotting the source of the error in the data files, and Professor Matteo Fischetti for expressing doubts about the results of my previous post.

When preparing test problems like this, I usually create LP files instead of creating instances in memory, as it is easier to distribute them to students and to run the models through different solvers. I have also found that students have a much easier time reading LP files than MPS. The LP format has its disadvantages, however, as it can be interpreted differently by different solvers. In particular, I forgot to put a space between the coefficient and the last variable in a constraint (for instance 1.255Y1 <= 1 instead of 1.255 Y1 <= 1). CPLEX interprets this the way it was intended and separates the coefficient 1.255 from the variable Y1 even though there is no space, but Gurobi assumes 1.255Y1 is a new variable distinct from Y1. To be clear, the mistake is mine, not Gurobi’s, but differences in interpretation are sometimes difficult to spot.
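One way to avoid this pitfall when writing LP files by hand is to route every term through a tiny formatter that always emits the space (a hypothetical helper of my own, not part of any solver API):

```python
def lp_term(coef, var):
    """Format one LP-file term with an explicit sign and a space between
    the coefficient and the variable, so a string like '1.255Y1' can
    never be misread as a single new variable."""
    return f"{coef:+g} {var}"

print(lp_term(1.255, "Y1") + " " + lp_term(-2.0, "X3") + " <= 1")
# -> +1.255 Y1 -2 X3 <= 1
```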

When I generated the models, I ran them through CPLEX first and checked the solutions. Then, out of curiosity, I ran the same set of models through Gurobi, and they solved much faster. I wrongly assumed that the models were fine, so I did not check the solutions from Gurobi. What is very ironic is that I wrote a blog post about this very issue less than two years ago, and I failed to follow my own advice this time. The corrected LP files are now available here.

Running the corrected models yields a different picture. Most models are solved quite quickly and efficiently by the solvers, albeit not during the presolve stage.

- 38% of the models were solved at the root node, usually by a combination of cuts and heuristics;
- 68% of the models are solved in less than a second;
- 78% of the models are solved in less than a minute.

The largest instance that is solved at the root node has 75 facilities and 250 customers, while the largest instances that are solved in less than a second are of sizes 75 x 250 and 80 x 150, respectively. The largest instances in the set (100 facilities and 2000 customers) take about 1 to 2 hours to solve on Gurobi. CPLEX is in the same ballpark in terms of solving times.

The interesting question that arises is “are these solving times fast enough to actually be used in practice?” There are many potential applications of the CFLP, but I will focus on supply chain management. While benchmark instances are sometimes much larger than the models solved in this test, I have never encountered or heard of a company that wanted to locate or relocate 100 or more facilities simultaneously. Usually, the number of potential locations is much smaller, but the problem is made complex by adding more types of decisions, such as alternate transport mode selection, alternative capacities or technologies for facilities, and so on. These interdependent decisions, more than the sheer size of the instance, make the problem much more difficult to solve.

Of course, in general, faster is always better, and there is a place for faster customized algorithms that solve the biggest instances in a few seconds. But if you are working on a project to relocate 100 warehouses, my bet is that you can wait 90 minutes to get the optimal solution.

** IMPORTANT NOTICE **

This post has been temporarily suspended as some readers noticed a potential problem with the model files. I will issue a corrected post shortly.

I am sorry for any inconvenience.

Marc-André

During my many years of study, I was exposed to quite a bit of simplex theory. I also had to perform many primal and dual simplex pivots by hand, as it was considered a good way to understand how the algorithm works. The teaching was quite in line with the classical textbook used by our school, the Winston and Venkataramanan. I remember it being mentioned that “other methods exist, some of which run in polynomial time”, but that in practice the simplex was more widely used. It wasn’t until I started working with mixed-integer programming solvers that I was exposed to interior-point methods, and then I had to do quite a bit of reading to understand what they were all about.

Nowadays, interior point methods are often the most powerful for solving many linear programming models. Many LP models just couldn’t be solved efficiently if it weren’t for this class of algorithms. For instance, over the last few weeks I had to solve a set of about 60 relatively large linear programming models (1.5 million variables and about 500k constraints). These models were handled quite effectively by the solver we used (in this case CPLEX, but some earlier tests revealed that Gurobi did just as well). It took between 60 and 90 minutes to solve each of these models. Just out of curiosity, I forced the solver to solve the same model using the dual simplex, and it was still running after 3 days. If it weren’t for barrier, we would have needed to drastically reduce our model size to be able to do anything in this project.

Many syllabi I have read for linear programming courses spend a large amount of time on both primal and dual simplex, and either cover interior point methods in 3 hours at the end of the semester or don’t cover them at all. This is especially true in business schools. Students in analytics and business intelligence get very little exposure to the algorithms embedded in the tools they use daily, and they might only have one class that focuses on linear and integer programming. I think it’s time to split class time more fairly between simplex and interior point methods.

If you are interested in the topic of what optimization algorithms should be taught in optimization classes, I recommend you read Bob Fourer’s excellent post on which simplex methods to teach to engineers and computer scientists.
