# Multi-commodity Flow

The multi-commodity flow problem is a generalization of the maximum flow problem, where we need to find a maximum $ (s_i, t_i) $- flow through the network for all commodities $ i = 1,...,k. $ while keeping the sum of flow over all commodities on each arc below its capacity. The formulation for this problem is a linear program that can be solved in strongly polynomial time.

$ \begin{align*} \max & \sum_{i=1}^{k} \Big( \sum_{a\in\delta^+(s_i)} x_a^i - \sum_{a\in\delta^-(s_i)} x_a^i\Big) & \\ s.t. & \sum_{a\in\delta^+(v)} x_a^i - \sum_{a\in\delta^-(s_i)} x_a^i = 0 & \text{ for all } v\in V \ \{s_i,t_i\}, \text{ }i = 1,\dots,k \\ & \sum_{i=1}^{k}x_a^i \leq u_a & \text{ for all } a\in A \\ & x_a^i \geq 0 & \text{ for all } a\in A , \text{ }i = 1,\dots,k\\ \end{align*} $

However due to the fact that more than one commodity travel on the same arc the problem becomes $ NP $-hard when you look for an integer solution. Unlike standard max flow problem there is also no guarantee that an integer solution exists just because you have a feasible flow. Even if you are not looking for an integer solution it turns out in practice that while polynomial time algorithms exist to solve the multi-commodity flow problem, there is a larger issue to address. Say we have a graph with 1000 nodes and 10,000 edges, and we have 1 million commodities to send through this network (sending flow from each node to every other node). Then the number of variables in this seemingly reasonable problem becomes about 10 trillion, and at a measly 8 bytes per variable which is less than the space usually allocated to each variable in this type of problem, you have already used 80 gigabytes of storage space. Ultimately this leads us to decide that we need to use a method such as column generation to greatly reduce the size of the matrix needed to perform the necessary calculations to find our multi-commodity flow.

## Column Generation

Column generation is the dual to the cutting plane method. It allows us to consider only a subset of the variables while keeping all of the constraints of the problem. Our original problem will now be called the master LP. We then consider a restricted LP which contains all the constraints of the original LP and a subset of the variables. Each iteration we solve a pricing problem (see if there are variables we should add to improve our objective function value) until we reach the optimal solution which means there are no more variables to add that will improve the objective function value.

For this to work with the multi-commodity flow problem we need to start with our master LP being the path formulation of the multi-commodity flow problem.

$ \begin{align*} \max & \sum_{p\in P} f_p &\\ s.t. & \sum_{p\in P_a} f_p \leq u_a &\text{ for all } a\in A \\ & 0 \leq f_p & \text{ for all } p\in P \\ \end{align*} $

So essentially with this formulation each column you add represents a path through the network that a commodity is sent along. So our restricted master LP is the constraints over a small starting number of paths $ P'\subseteq P $ which makes our restricted master LP:

$ \begin{align*} \max & \sum_{p\in P'} f_p &\\ s.t. & \sum_{p\in P'_a} f_p \leq u_a &\text{ for all } a\in A \\ & 0 \leq f_p & \text{ for all } p\in P' \\ \end{align*} $

And the dual of the restricted master problem is:

$ \begin{align*} \min & \sum_{a\in A} u_a \mu_a &\\ s.t. & \sum_{a\in p} \mu_a \geq 1 & \text{ for all } p\in P'\\ & \mu_a \geq 0 & \text{ for all } a\in A \\ \end{align*} $

By the duality theorem we know that a solution $ \mu $ to the restricted dual is a lower bound on a solution to the dual which also provides a lower bound to the primal problem. If both the primal and dual problem are feasible finding the optimal solution for the dual master problem will result in the optimal solution for the primal master LP. Below is a basic algorithm for solving the multi-commodity flow problem using column generation. Once the algorithm terminates with no other columns to add you have the optimal solution to the masterLP. Since the largest calculation done every iteration is coming up with the new pricing vector, if this pricing problem is solved in polynomial time then the master problem is solvable in polynomial time. Plus you no longer have the storage problems discussed above because you are only using the variables necessary to reach the optimal solution.