- Some Problems
- General Techniques
- Constraint Programming
- Linear Constraints over Integers
- Global Constraints
- Other Kinds of Constraints
- Dual Modelling
- Arc consistency / domain consistency
- Binary Knapsack (impl w/ CP)
- Alldifferent Constraint (impl w/ CP)
- Search
- Variable-Value Labeling
- Value-Variable Labeling
- Domain Splitting
- Symmetry Breaking during Search
- Randomization and Restarts
- Local Search
- Swap Neighborhood
- x-OPT Neighborhood for TSP
- Local Search for Graph Coloring
- Connectivity
- General Local Search Procedure
- Max-Min Conflict
- Random Walk
- Iterated Local Search
- Metropolis Heuristic
- Simulated Annealing
- Tabu Search
- Other Metaheuristics
- Linear Programming

These are my lecture notes on Professor Pascal Van Hentenryck's excellent course on discrete optimization.

```
SEND
+ MORE
= MONEY
```
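As a baseline, this cryptarithm can be brute-forced over digit assignments (a sketch; a CP solver would instead propagate the carry constraints and prune domains):

```python
from itertools import permutations

def solve_send_more_money():
    """Try all assignments of distinct digits to the eight letters
    S, E, N, D, M, O, R, Y; leading letters S and M must be nonzero."""
    for s, e, n, d, m, o, r, y in permutations(range(10), 8):
        if s == 0 or m == 0:
            continue
        send = 1000 * s + 100 * e + 10 * n + d
        more = 1000 * m + 100 * o + 10 * r + e
        money = 10000 * m + 1000 * o + 100 * n + 10 * e + y
        if send + more == money:
            return send, more, money
    return None
```

The unique solution is 9567 + 1085 = 10652, but the search enumerates millions of permutations; the point of constraint programming is to avoid exactly this.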

Many problems (like this one) naturally exhibit symmetries that can be exploited.

For variable symmetries, impose an ordering on the variables (e.g. lexicographic).

- an actor plays in some of the scenes
- at most \(k\) scenes can be shot per day
- each actor is paid by the day

Days are symmetric: only consider scheduling the first scene on the first day, the second scene on either the first or the second day, and so on.

For CP: Add a redundant constraint to ensure the optimizer doesn't wait too long to start scheduling a particular category (e.g. doesn't wait until there are only 8 slots left to schedule 15 cars).

Decision variables \(x=(x_1, \ldots, x_n)\) where \(x_i = 1\) means that item \(i\) is selected and \(x_i=0\) means that it is not. The solution space size is therefore \(2^n\).

Express problem constraints in decision variables: \(\sum w_ix_i\le K\)

Objective function: \(\sum v_ix_i\) $$ \text{maximize } \sum v_ix_i\\\text{subject to } \sum w_ix_i\le K \text{ and } x_i\in\{0, 1\} $$

For the knapsack problem:

\(O(k, 0) = 0\)

\(O(k, j) = \max(O(k, j-1), v_j + O(k-w_j, j-1))\) if \(w_j \le k\)

\(O(k, j) = O(k, j-1)\) otherwise

→ then trace back to find the decision variables.

Complexity: \(\mathcal{O}(Kn)\), but since only \(\log(K)\) bits are needed to represent \(K\), the algorithm is actually exponential in the size of the input (pseudo-polynomial).
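The recurrence and the traceback can be sketched directly, with `O[k][j]` following the definition above (a sketch, not production code):

```python
def knapsack(values, weights, K):
    """0/1 knapsack DP: O[k][j] is the best value achievable with
    capacity k using only the first j items."""
    n = len(values)
    O = [[0] * (n + 1) for _ in range(K + 1)]
    for j in range(1, n + 1):
        v, w = values[j - 1], weights[j - 1]
        for k in range(K + 1):
            if w <= k:
                O[k][j] = max(O[k][j - 1], v + O[k - w][j - 1])
            else:
                O[k][j] = O[k][j - 1]
    # Trace back to recover the decision variables x_j.
    x, k = [0] * n, K
    for j in range(n, 0, -1):
        if O[k][j] != O[k][j - 1]:      # item j was taken
            x[j - 1] = 1
            k -= weights[j - 1]
    return O[K][n], x
```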

Find such an optimistic estimate through a relaxation of the problem. For the knapsack problem, for example, we could remove the weight constraint \(K\) or allow fractional items to be included, \(0 \le x_i \le 1\) (the latter is a linear relaxation).
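For instance, the linear-relaxation bound for knapsack can be computed greedily by value density (Dantzig's fractional bound; a sketch):

```python
def knapsack_bound(values, weights, K):
    """Optimistic estimate from the linear relaxation (0 <= x_i <= 1):
    greedily take items by value density, then a fraction of the next one."""
    items = sorted(zip(values, weights),
                   key=lambda vw: vw[0] / vw[1], reverse=True)
    bound, cap = 0.0, K
    for v, w in items:
        if w <= cap:
            bound, cap = bound + v, cap - w
        else:
            bound += v * cap / w        # fractional part of the last item
            break
    return bound
```

The bound is always at least the true optimum, so it can safely prune branches in branch and bound.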

- branch and prune methodology
- complete method, not a heuristic (always finds the optimal solution)

How to:

- choose the decision variables
- express constraints in terms of these variables
- use constraints to reduce the set of values each variable can take
- make a choice when no more deduction can be performed

What does a constraint do?

**Feasibility checking:** a constraint checks if it can be satisfied given the values in the domains of its variables: \(\exists v_1\in D(x_1), \ldots, v_n \in D(x_n): c(x_1=v_1, \ldots, x_n=v_n) = \text{true}\)

**Pruning:** if satisfiable, a constraint determines which values in the domains cannot be part of any solution

- every constraint has its own pruning algorithm

Consider a constraint \(a_1x_1 + \ldots + a_nx_n \ge b_1y_1 + \ldots + b_my_m\) with positive coefficients and let $$ l = a_1\max(D(x_1)) + \ldots + a_n \max(D(x_n)) \\ r = b_1\min(D(y_1)) + \ldots + b_m\min(D(y_m)) $$ Then by bringing all terms apart from \(a_ix_i\) to the right side of the constraint we get $$ a_ix_i \ge r - (l-a_i\max(D(x_i))) $$ By dividing by \(a_i\) and conservatively rounding we get $$ x_i \ge \left \lceil \frac{r - (l-a_i\max(D(x_i)))}{a_i} \right \rceil\\ $$ and symmetrically $$ y_j \le \left \lfloor \frac{l - (r-b_j\min(D(y_j)))}{b_j} \right \rfloor $$
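This bounds propagation can be sketched on interval domains (a sketch assuming integer intervals and positive coefficients; function and parameter names are illustrative):

```python
import math

def prune_linear_geq(a, xdom, b, ydom):
    """Bounds propagation for  sum_i a_i*x_i >= sum_j b_j*y_j  with
    a_i, b_j > 0.  Domains are (lo, hi) intervals; returns pruned intervals."""
    l = sum(ai * hi for ai, (lo, hi) in zip(a, xdom))   # max of the left side
    r = sum(bj * lo for bj, (lo, hi) in zip(b, ydom))   # min of the right side
    new_x = []
    for ai, (lo, hi) in zip(a, xdom):
        lb = math.ceil((r - (l - ai * hi)) / ai)        # new lower bound on x_i
        new_x.append((max(lo, lb), hi))
    new_y = []
    for bj, (lo, hi) in zip(b, ydom):
        ub = math.floor((l - (r - bj * lo)) / bj)       # new upper bound on y_j
        new_y.append((lo, min(hi, ub)))
    return new_x, new_y
```

For example, the constraint \(2x \ge 3y\) with \(x, y \in [0, 5]\) prunes \(y\) to \([0, 3]\).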

- \(\text{alldifferent}(x_1, \ldots, x_n)\): individual pairwise constraints may be satisfiable while all of them together are not; a global constraint detects this earlier. Furthermore, individual constraints may not be able to prune as much of the search space as global constraints.
- Table constraints
- \(\text{atmost}(\cdots)\)

E.g. for `Magic Series`:

- add the redundant constraint `sum(series) = len(series)`
- combine feasibility with pruning through a DP approach
- use dynamic programming to check for feasibility
- use the dynamic programming table for pruning (update dependency links to only keep feasible values)

Starting from an initial (incomplete) matching, use alternating paths to improve the matching until it reaches the required size. Use the basic property by Berge (1970) for pruning.
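The feasibility half of this check can be sketched with depth-first augmenting paths over a variable-to-value bipartite graph (a sketch; a real alldifferent propagator also performs the Berge-based pruning):

```python
def alldifferent_feasible(domains):
    """alldifferent is satisfiable iff every variable can be matched to a
    distinct value.  Grow the matching one variable at a time along
    alternating (augmenting) paths."""
    match = {}                      # value -> variable currently using it
    def augment(var, seen):
        for val in domains[var]:
            if val in seen:
                continue
            seen.add(val)
            # Free value, or the variable holding it can be re-routed.
            if val not in match or augment(match[val], seen):
                match[val] = var
                return True
        return False
    return all(augment(v, set()) for v in range(len(domains)))
```

Three variables sharing the two-value domain {1, 2} are detected as infeasible, which pairwise \(\ne\) constraints alone would miss.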

For the magic square problem: Instead of assigning a specific value to a variable, split that variable's domain in half.

For car sequencing: Don't assign a specific configuration (boolean vector of options) to a slot; instead specify whether a specific option will be selected for this slot. This narrows down which car configurations can be selected for the slot.

Randomization and restarts:

- apply a heuristic but with randomization (e.g. pick one of the three best variables)
- limit the time spent in the search
- if the limit is reached, restart, possibly increasing the limit

For satisfaction problems: Start with an infeasible configuration and move towards feasibility (turn the satisfaction problem into an optimization problem by minimizing the number of constraint violations).

Swaps ensure that some constraints remain satisfied implicitly: a swap preserves the multiset of assigned values, so e.g. an alldifferent or cardinality constraint can never become violated by a swap move.

In constrained optimization problems we can either choose to stay in the feasible space and try to minimize the optimization objective, or consider feasible and infeasible configurations at the same time.

In cases where constraint violations are binary (satisfied or violated) and unlikely to change when performing local moves, it can be better to define a graded loss function to drive the search (e.g. magic square problem: instead of the binary constraint \(\text{sum(row)} = n\) use the value of \(|\text{sum(row)}-n|\)).

(Instead of optimizing the number of violations, optimize the degree of violation)

- Find a greedy solution with \(k\) colors, then repeatedly remove a color and run local search to find a feasible solution with \(k-1\) colors. This way we force the objective function to decrease and use local search to move from infeasible to feasible space.
- Alternatively we can try to always remain in feasible space. One way to do that would be to start from a greedy solution and optimize \(\sum_i |C_i|^2\) where the color class \(C_i\) is the set of all nodes of color \(i\). Doing this encourages larger color classes which in turn makes it easier to reach a lower number of total colors.

Unfortunately, restricting the search to the feasible space severely impacts the neighborhood size, since any given configuration may only have very few feasible neighbors. Therefore it makes sense to think about larger, more interesting neighborhoods.

**Kempe chain neighborhood:** when flipping the color of node \(v\) from red to blue, find all adjacent blue nodes and flip them red, then continue recursively from there (all nodes in the connected red/blue component containing \(v\) flip colors).

- A third way is to optimize feasibility and the objective function at the same time. To do this simply minimize \(\sum_i 2|B_i||C_i| - \sum_i |C_i|^2\) where \(B_i\) is the set of bad edges of color \(i\) and \(C_i\) is the color class for color \(i\) as above. This objective function furthermore has the special property that local minima are always feasible configurations.

→ Use heuristics to drive the search towards a local minimum using local information

→ Use metaheuristics to try to escape local minima & drive the search towards global minima

→ Select the variable with the most constraint violations. Then change its value to the value with the least constraint violations for that variable.
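A minimal sketch of this heuristic on n-queens, the standard illustration (assumed here, not taken from the notes; `queens[c]` is the row of the queen in column c):

```python
def conflicts(queens, c, r):
    """Number of other queens attacking square (column c, row r)."""
    return sum(1 for c2, r2 in enumerate(queens)
               if c2 != c and (r2 == r or abs(r2 - r) == abs(c2 - c)))

def min_conflict_step(queens):
    """One step: pick the queen with the most violations, then move it to
    the row with the fewest violations (ties broken by first index)."""
    n = len(queens)
    c = max(range(n), key=lambda col: conflicts(queens, col, queens[col]))
    queens[c] = min(range(n), key=lambda row: conflicts(queens, c, row))
```

In practice ties are broken randomly; a purely deterministic rule like this one can cycle between the same configurations.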

- Accept a move if it improves the objective value
- A degrading move is accepted with probability \(\exp(\frac{-\delta}{t})\) where \(\delta\) is the difference in the objective values and \(t\) is a temperature.

With a very slow schedule simulated annealing is guaranteed to converge to a global optimum in a connected neighborhood.

Restarts and reheats are possible; simulated annealing can also be combined with tabu search.
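A generic sketch combining the Metropolis acceptance criterion with a geometric cooling schedule (the helper names and the default parameters are illustrative assumptions):

```python
import math
import random

def simulated_annealing(state, neighbor, cost, t0=10.0, alpha=0.95, steps=2000):
    """Simulated annealing sketch: improving moves are always accepted,
    a degrading move with probability exp(-delta/t), and the temperature
    follows the geometric cooling schedule t <- alpha * t."""
    cur, cur_cost = state, cost(state)
    best, best_cost = cur, cur_cost
    t = t0
    for _ in range(steps):
        cand = neighbor(cur)
        delta = cost(cand) - cur_cost
        if delta <= 0 or random.random() < math.exp(-delta / t):
            cur, cur_cost = cand, cur_cost + delta
            if cur_cost < best_cost:        # track the incumbent
                best, best_cost = cur, cur_cost
        t *= alpha
    return best, best_cost
```

As the temperature drops, the acceptance probability for degrading moves shrinks and the procedure degenerates into pure local improvement.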

Always storing entire solutions in the tabu list may be too expensive (to store & compare) → Instead store a hash/abstraction/summary/relative change of the solution (e.g. swaps)

Example: car sequencing. Keep a data structure `tabu[i, j]` which stores the next iteration at which the pair (i, j) can be swapped. A move is tabu as long as `tabu[i, j] >= current_iteration`, and after applying a move `tabu[i, j]` is set to `current_iteration + L` (e.g. `L=100`). This prevents the same configurations from being swapped back and forth repeatedly.

The abstraction in the tabu list can be more or less specific. For example, instead of just storing the swap \((a,b)\) we can store \((a, b, f_1, f_2)\) where \(f_1\) is the objective value before the swap and \(f_2\) is the objective value after the swap. This more specific tabu information would only prevent the exact same move from happening, but would not prevent the same swap in a different configuration from happening.
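The `tabu[i, j]` mechanism can be sketched as a small tabu search over swap moves (best non-tabu neighbor each iteration, no aspiration criterion; `cost` and the parameters are placeholders):

```python
import itertools

def tabu_search_swaps(cost, x, L=100, iters=1000):
    """Tabu search over swaps: tabu[i, j] stores the iteration until which
    the pair (i, j) may not be swapped again."""
    n = len(x)
    tabu = {}
    cur, cur_cost = list(x), cost(x)
    best, best_cost = list(cur), cur_cost
    for it in range(iters):
        move, move_cost = None, None
        for i, j in itertools.combinations(range(n), 2):
            if tabu.get((i, j), -1) >= it:
                continue                        # move is tabu
            cur[i], cur[j] = cur[j], cur[i]     # tentatively apply the swap
            c = cost(cur)
            cur[i], cur[j] = cur[j], cur[i]     # undo it
            if move is None or c < move_cost:
                move, move_cost = (i, j), c
        if move is None:
            break                               # every move is tabu
        i, j = move
        cur[i], cur[j] = cur[j], cur[i]
        cur_cost = move_cost
        tabu[i, j] = it + L                     # forbid this swap for L iterations
        if cur_cost < best_cost:
            best, best_cost = list(cur), cur_cost
    return best, best_cost
```

Note that the selected move may degrade the objective; the tabu list is what stops the search from immediately undoing it.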

- Variable neighborhood search
- Guided local search
- Ant colony optimization
- Hybrid evolutionary algorithms
- Scatter search
- Reactive search

All variables in a linear program can take only nonnegative values. We can however represent a variable \(x_i\) that may be negative as the difference \(x_i^+ - x_i^-\) of two nonnegative variables.

Some notes on the (slightly different) conventions used in this course: A linear program generally minimizes the given optimization term but we can maximize by minimizing the negative of the term. Furthermore equality constraints can be represented as two inequalities. Refer to MIP for integer-valued variables.

Linear programs can be solved in polynomial time with interior point methods or in worst-case exponential time with the Simplex algorithm. Both algorithms however exhibit similar performance characteristics on "typical" matrices and the Simplex algorithm is very efficient in practice.

A set is convex if it contains the line segment between any two of its points.

→ The set of solutions satisfying the constraints of a linear program is convex.

At least one vertex of this convex polytope is an optimal solution.

This reduces finding the optimal solution to a finite search problem (since there are only finitely many vertices). The number of vertices can however be extremely large.

- Express constraint inequalities as equalities by adding slack variables \(a \le b \longrightarrow a+s = b, s \ge 0\)
- Select \(m\) variables (the "basic variables")
- Re-express them in terms of the non-basic variables only (using Gaussian elimination)
- We have a basic feasible solution if the \(b\)'s (the \(x^0\) factors) are all nonnegative

Note: all non-basic variables (those on the RHS) are set to zero, while in the example the basic variables \(x_3\), \(x_4\) and \(x_5\) are set to the nonnegative values \(1\), \(2\) and \(3\) respectively.

Naive optimization algorithm: enumerate all size-\(m\) subsets of the variables and find all basic feasible solutions with the above steps. This is highly inefficient; instead use the simplex algorithm.

Starting from one basic feasible solution the simplex algorithm iteratively moves from one BFS to the next until it arrives at the global optimum. The move (swapping a basic and non-basic variable) is as follows:

- Select a non-basic variable with a negative coefficient in the objective function (the entering variable)
- Select a basic variable (the leaving variable) to maintain feasibility $$ l = \arg \min_{i:a_{ie} <0} \frac{b_i}{-a_{ie}} $$
- Perform Gaussian elimination (eliminate \(x_e\) from the right-hand side)

This operation is called pivoting: `pivot(e, l)`. A BFS is optimal if its objective function, after having eliminated all basic variables, is of the form: $$ c_0 + c_1x_1 + \ldots + c_nx_n \\ c_i \ge 0 \,\,(1 \le i \le n) $$

$$ \begin{alignat}{1} & \text{while } \exists 1 \le i \le n\colon c_i < 0 \text{ do} \\ & \quad \text{choose } e \text{ such that } c_e < 0; & \\ & \quad l = \arg \min_{i:a_{ie} <0} \frac{b_i}{-a_{ie}} & \\ & \quad \text{pivot} (e, l) & \\ \end{alignat} $$ Assume that during the execution all \(b_i\) are always strictly positive and that the objective function is bounded below.
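The loop above can be sketched as a tiny tableau simplex (a sketch under assumptions not stated in the notes: standard form \(\min c^\top x\) s.t. \(Ax \le b\), \(x \ge 0\) with \(b \ge 0\), so the slack variables provide the initial BFS; exact rationals avoid rounding issues; the tableau keeps basic rows as \(x_B = b - \bar{A}x_N\), so its ratio test uses positive entries where the notes' dictionary convention uses negative ones):

```python
from fractions import Fraction

def simplex(c, A, b):
    """Tableau simplex for  min c.x  s.t.  A x <= b, x >= 0  with b >= 0.
    Bland-style first-index entering rule guarantees termination."""
    m, n = len(A), len(c)
    # Tableau rows [A | I | b]; the identity columns are the slack variables.
    T = [[Fraction(A[i][j]) for j in range(n)]
         + [Fraction(int(i == k)) for k in range(m)]
         + [Fraction(b[i])] for i in range(m)]
    cost = [Fraction(cj) for cj in c] + [Fraction(0)] * (m + 1)
    basis = list(range(n, n + m))           # slacks start in the basis
    while True:
        # Entering variable: first column with a negative reduced cost.
        e = next((j for j in range(n + m) if cost[j] < 0), None)
        if e is None:
            break                           # all reduced costs >= 0: optimal
        # Leaving variable: minimum-ratio test over rows with T[i][e] > 0.
        rows = [i for i in range(m) if T[i][e] > 0]
        if not rows:
            raise ValueError("objective is unbounded below")
        l = min(rows, key=lambda i: (T[i][-1] / T[i][e], basis[i]))
        # pivot(e, l): normalize row l, eliminate column e everywhere else.
        piv = T[l][e]
        T[l] = [v / piv for v in T[l]]
        for i in range(m):
            if i != l and T[i][e]:
                f = T[i][e]
                T[i] = [v - f * w for v, w in zip(T[i], T[l])]
        f = cost[e]
        cost = [v - f * w for v, w in zip(cost, T[l])]
        basis[l] = e
    x = [Fraction(0)] * n
    for i, bi in enumerate(basis):
        if bi < n:
            x[bi] = T[i][-1]
    return x, -cost[-1]
```

For example, minimizing \(-x_1-x_2\) subject to \(x_1+2x_2 \le 4\) and \(3x_1+x_2 \le 6\) pivots twice and stops at the vertex \((8/5, 6/5)\) with objective \(-14/5\).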

If a non-basic variable \(x_i\) has a negative coefficient in the objective function but only positive coefficients in the constraints, the objective is unbounded below. This is because by making \(x_i\) arbitrarily large the objective can be made arbitrarily small, while the solution remains feasible since the basic variables remain nonnegative.

If one of the \(b_i\) is zero during leaving-variable selection, we may keep selecting the same variable since the arg-min ratio will always be zero. Selecting this variable leaves the objective function unchanged, so the algorithm can cycle and never terminate. We can fix this by slightly changing our pivoting rule:

- Bland's rule: always select the first entering variable lexicographically (smallest index). This solves the non-termination issue but may not be very efficient, since we generally want to select variables with large negative coefficients: assigning a large positive value to such a variable can significantly decrease the objective.
- Lexicographic pivoting rule: break ties when selecting the leaving variable by using a lexicographic pivoting rule.
- Perturbation methods

Introduce artificial variables \(y_1, \ldots, y_m\) (one per constraint), minimize \(\sum_i y_i\) subject to the original constraints with the \(y_i\) added, and optimize this problem with the simplex algorithm starting from the trivial initial BFS \(y = b\). If we find a solution with objective value 0 then the original problem is feasible, since in this case all the \(y_i\) are zero (if a \(y_i\) is still basic at value zero we can perform a pivot to replace it with an \(x_i\)). From this solution we can create an initial BFS for our original problem.