# Data Piques

## Towards optimal personalization: synthesizing machine learning and operations research

In my last post, I talked about how data scientists probably ought to spend some time learning about optimization (but not too much time - I need topics for my blog posts!). While I provided a basic optimization example in that post, it may not have been so interesting, and there definitely wasn't any machine learning involved.

Right now, I think that the most exciting industrial applications of optimization are those that synthesize machine learning and optimization in order to obtain optimal personalization at scale.

Here, I'll talk about a more concrete use case of this synthesis that you might see at a company.

## All the ML and nowhere to go

Let's say you start working at a Software-as-a-Service (SaaS) company, and you end up in a meeting with the Marketing team. Everybody's talking about churn. Marketing has been trying all sorts of things - they've sent coupons, they've called customers, they've sent emails - everything, in order to decrease churn. Some things work, some things are expensive, and there are lots of questions. Nobody knows SQL, so you offer to look at the data.

It turns out that there seem to be some clear differences between customers who eventually churn and customers who do not. You offer to build an algorithm to predict customer churn, broken out by intervention medium (e.g. email, phone call, no intervention, etc.).

You get the green light to hack away. Of course, this takes much longer than you or Marketing expect (because pretty much all machine learning does), but in the end you're left with multiple well-tuned classification models built on a bunch of features.

You're in a great place. You actually built machine learning models that work.

But what now?

You can go the common route. You write a long script that runs the churn models every so often and populates a database with the results. You tell Marketing and everybody else that this information is now available, and you hope that they will use it.

And they might.

Or, those numbers will sit there.

Or, Marketing will arbitrarily target the top X% of customers most likely to churn with the expensive intervention (say, a phone call) and email the rest.

None of this is optimal.

## Optimization to the rescue

When there are lots of decisions to make and a clear goal, optimization is a great friend to have. The goal here is to prevent churn, and we will have some constraints (mainly money). Let's make up some data and walk through how to solve this in Python.

## Defining (making up) the problem

We'll assume that we have 4 different types of churn prevention messages at 4 different prices:

| Medium | Price ($) |
| --- | --- |
| Email | 0.25 |
| Push notification | 0.30 |
| Text message | 0.85 |
| Phone call | 5.00 |
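For concreteness, here's a minimal sketch of how the made-up data might be generated. The names (`customer_index`, `message_index`, `message_types`, `p`, `budget`) are the ones used in the code below; the distribution for `p` is an arbitrary choice of mine - any retention probabilities in $[0, 1]$ would do.

```python
import numpy as np

np.random.seed(42)

# Indices (these names match the ones used throughout the code)
n_customers = 2000
customer_index = range(n_customers)  # i
message_index = range(4)             # j
message_types = ['email', 'push', 'text', 'phone']

# p[i, j]: made-up probability that customer i is retained if sent
# message j. The distribution is an arbitrary assumption; pricier media
# are (arbitrarily) taken to be more effective on average.
mean_effect = np.array([0.15, 0.18, 0.25, 0.45])
p = np.clip(np.random.normal(loc=mean_effect, scale=0.1,
                             size=(n_customers, 4)),
            0.0, 1.0)

budget = 2000.00  # total marketing budget in dollars
```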


Total cost: $2000.00

In [13]:
```python
obj = np.sum([m[i, j].varValue * p[i, j]
              for i in customer_index
              for j in message_index])
print('Probability sum (i.e. the objective function): {}'.format(obj))
```

Probability sum (i.e. the objective function): 616.575578582

## Tricks of the optimization trade

That was fun, but now I'd like to complicate things (and make them more realistic) by assuming that there are pricing plans for each of these message types that change with the number of messages sent:

| Medium | <500 | 500-999 | 1,000-1,499 | ≥1,500 |
| --- | --- | --- | --- | --- |
| Email | 0.25 | 0.20 | 0.15 | 0.10 |
| Push notification | 0.30 | 0.25 | 0.225 | 0.20 |
| Text message | 0.85 | 0.65 | 0.50 | 0.30 |
| Phone call | 5.00 | 5.00 | 5.00 | 5.00 |

You employ your own employees for the phones, so this cost does not benefit from economies of scale.

The budget constraint will now have to be rewritten, and it will be much more difficult. Seriously, it's pretty tricky. With the tiered pricing, we introduce step functions into the cost. What follows is the best method I could think up to solve this problem, but I would love to know if anybody has better ideas.

To start, we'll consider <500 messages to be our base tier. We will then call each subsequent tier Tier 1, Tier 2, and Tier 3 (500-999, 1,000-1,499, and ≥1,500 messages, respectively).

We must create two new types of variables. The first will be indicator variables which indicate whether or not we have "activated" a particular tier for a particular message type. Mathematically, let us define $a_{jk} \in \{0, 1\}$ to indicate whether or not we have sent at least the minimum number of messages of type $j$ in Tier $k$. The second type of variable will be integer variables $t_{jk} \in \mathbb{Z}_{\geq 0}$ which tell us how many messages of type $j$ we have sent above the minimum number of messages for Tier $k$.

The way that we will calculate the price for sending $N$ messages of type $j$ will be to calculate the base cost and then subtract "discounts" from the total cost as we move up in tiers.
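Before wrestling with constraints, it can help to see this "base cost minus discounts" bookkeeping as a plain function. The helper below is my own illustration, not part of the model; note that under this scheme only the messages beyond each threshold earn the cheaper rate.

```python
# Hypothetical helper (my illustration): total cost of sending n
# messages of one type under the base-cost-minus-discounts scheme.
# Only the messages above each threshold are billed at that tier's rate.

def tiered_cost(n, prices, thresholds=(0, 500, 1000, 1500)):
    """prices[k] is the per-message rate once tier k is reached."""
    cost = n * prices[0]  # base cost
    for k in range(1, len(thresholds)):
        if n >= thresholds[k]:
            # Subtract the marginal discount for messages above l_k.
            cost -= (prices[k - 1] - prices[k]) * (n - thresholds[k])
    return cost

email_prices = [0.25, 0.20, 0.15, 0.10]
# 600 emails: 500 at $0.25, then 100 at $0.20, i.e. $145 total.
print(tiered_cost(600, email_prices))
```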
Okay, to summarize, we need to define both $a_{jk}$, our tier activation indicators, and $t_{jk}$, our tier message counters. Let's let $l_{k}$ be the minimum (lower) number of messages that must be sent in order to move up into Tier $k$. With the following two constraints, the behavior of $a$ will be fully defined:

$$\forall j,\, k \geq 1 \quad \sum\limits_{i}m_{ij} \leq l_{k} - 1 + X a_{jk}$$

$$\forall j,\, k \geq 1 \quad \sum\limits_{i}m_{ij} \geq l_{k} - X(1 - a_{jk})$$

where $X$ is a sufficiently large number (like the maximum of $\sum\limits_{i}m_{ij}$). This technique of using a large number is called the Big M method, from the common convention of calling the large number $M$ instead of my choice of $X$ (I already used an $M$ earlier!).

The two constraints above are quite confusing. I know this much is true because it took me a long time to make sure that I had them right. I would suggest walking through an example to make sure that you understand what's going on. You can pick a single $k$, like the first Tier where $l_{1} = 500$. Now, imagine that 499 messages of some type $j$ have been sent, such that $\sum\limits_{i}m_{ij} = 499$. What do the constraints say that $a_{j1}$ must be? Now, perform the same test for 500 and 501 messages and assure yourself that the constraints never disagree with each other and that $a_{j1}$ flips from 0 to 1 when it is supposed to.

With $a$ defined, it is one more set of mental gymnastics to define $t$:

$$\forall j,\, k \geq 1 \quad t_{jk} \leq Xa_{jk}$$

$$\forall j,\, k \geq 1 \quad t_{jk} + l_{k} \geq \sum\limits_{i}m_{ij} - X (1 - a_{jk})$$

$$\forall j,\, k \geq 1 \quad t_{jk} + l_{k} \leq \sum\limits_{i}m_{ij} + X (1 - a_{jk})$$

The first constraint ensures that if $a_{jk}$ is 0 then $t_{jk}$ must be 0 as well. The other two constraints guarantee that when $a_{jk}$ is 1, then $t_{jk} = \sum\limits_{i}m_{ij} - l_{k}$, which is exactly the number of messages above the minimum threshold for Tier $k$.
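To make the suggested walkthrough concrete, here's a quick script (my own sketch, with $X$ chosen as the number of customers) that checks which values of $a_{j1}$ satisfy both constraints for 499, 500, and 501 sent messages:

```python
# For a given number of sent messages, which values of a_{j1} satisfy
# both Big-M constraints for Tier 1 (l_1 = 500)?

X = 2000   # "sufficiently large", e.g. the number of customers
l1 = 500   # Tier 1 lower threshold

def feasible_a(n_sent):
    """Return the set of values of a in {0, 1} satisfying both constraints."""
    return {a for a in (0, 1)
            if n_sent <= l1 - 1 + X * a        # constraint 1
            and n_sent >= l1 - X * (1 - a)}    # constraint 2

for n in (499, 500, 501):
    print(n, feasible_a(n))
# 499 -> {0}, 500 -> {1}, 501 -> {1}: exactly one feasible value each
# time, and a flips from 0 to 1 right at the threshold.
```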
With all of these god-forsaken constraints defined, we can finally write our budget constraint. We will take $c_{jk}$ to be the cost of sending a single message from Tier $k$, where $k=0$ is the base cost. For $k>0$, we will take $c_{jk}$ to be the extra discount obtained by reaching that tier (it is positive, and we subtract it from the base cost).

$$\sum\limits_{j}\Big(\sum\limits_{i}m_{ij}c_{j0} - \sum\limits_{k \geq 1}t_{jk}c_{jk}\Big) \leq 2000$$

Phew!

## Math $\Rightarrow$ Code Redux

Finally, we get back to the code. We can reuse some of the old variables, but we'll need to rewrite our cost matrix and add some tier information.

In [14]:
```python
# New indices
tier_index = range(4)  # k

# New matrices of constants
# Tier pricing
c = np.array([[0.25, 0.20, 0.15, 0.10],
              [0.30, 0.25, 0.225, 0.20],
              [0.85, 0.65, 0.50, 0.30],
              [5.00, 5.00, 5.00, 5.00]])

# Recall that we must transform c to get the marginal discount.
# There's probably a fancy numpy way to do this,
# but ain't nobody got time for that.
# Note we must slice in reverse if we want to do this in place.
for j in range(c.shape[1] - 1, 0, -1):
    c[:, j] -= c[:, j-1]
c = np.abs(c)

# Tier thresholds
l = [0, 500, 1000, 1500]
```

We solve the problem similarly to before but need our two new variables along with their associated constraints.
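As an aside, here's one candidate for the "fancy numpy way" alluded to in the comment above: keep the base-tier column and take absolute differences across the remaining columns. Just an alternative sketch; the in-place loop works fine.

```python
import numpy as np

# Vectorized version of the marginal-discount transform: the base-tier
# column stays as-is, and each later column becomes the absolute
# difference from its neighbor.
c_raw = np.array([[0.25, 0.20, 0.15, 0.10],
                  [0.30, 0.25, 0.225, 0.20],
                  [0.85, 0.65, 0.50, 0.30],
                  [5.00, 5.00, 5.00, 5.00]])

c_vec = np.hstack([c_raw[:, :1], np.abs(np.diff(c_raw, axis=1))])

# Same result as the reverse-order in-place loop:
c_loop = c_raw.copy()
for j in range(c_loop.shape[1] - 1, 0, -1):
    c_loop[:, j] -= c_loop[:, j - 1]
c_loop = np.abs(c_loop)

assert np.allclose(c_vec, c_loop)
```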
In [15]:
```python
# Create variables
prob = LpProblem('Tiered Churn Prevention Problem', LpMaximize)
m = {}
a = {}
t = {}
for i in customer_index:
    for j in message_index:
        m[i, j] = LpVariable('m_{}_{}'.format(i, j),
                             lowBound=0, upBound=1, cat='Binary')
for j in message_index:
    for k in tier_index[1:]:
        a[j, k] = LpVariable('a_{}_{}'.format(j, k),
                             lowBound=0, upBound=1, cat='Binary')
        t[j, k] = LpVariable('t_{}_{}'.format(j, k),
                             lowBound=0,
                             upBound=len(customer_index) - l[k],
                             cat='Integer')

# Add the objective function
prob += (lpSum([m[i, j] * p[i, j]
                for i in customer_index
                for j in message_index]),
         'Total retention probability')
```

In [16]:
```python
# Simple constraint
for i in customer_index:
    prob += (lpSum([m[i, j] for j in message_index]) == 1,
             'One message for cust {}'.format(i))
```

In [17]:
```python
# Hard pricing tier constraints
X = len(customer_index)  # Max value of sum_{i}(m_{ij})
for j in message_index:
    m_sum = lpSum([m[i, j] for i in customer_index])
    for k in tier_index[1:]:
        prob += (m_sum <= l[k] - 1 + X*a[j, k],
                 'hard constraint 1 {}_{}'.format(j, k))
        prob += (m_sum >= l[k] - X*(1 - a[j, k]),
                 'hard constraint 2 {}_{}'.format(j, k))
        prob += (t[j, k] <= X * a[j, k],
                 'hard constraint 3 {}_{}'.format(j, k))
        prob += (t[j, k] + l[k] >= m_sum - X*(1 - a[j, k]),
                 'hard constraint 4 {}_{}'.format(j, k))
        prob += (t[j, k] + l[k] <= m_sum + X*(1 - a[j, k]),
                 'hard constraint 5 {}_{}'.format(j, k))
```

In [18]:
```python
# Max budget constraint
prob += (lpSum([lpSum([m[i, j] * c[j, 0] for i in customer_index])
                - lpSum([t[j, k] * c[j, k] for k in tier_index[1:]])
                for j in message_index]) <= budget,
         'Budget')
```

Even with our relatively small problem of 2000 customers, GLPK has a bit of difficulty converging with the default optimization parameters. I'll set a small nonzero mipgap (the allowed relative gap between the best bound and the current feasible solution) so that the solver can terminate once it gets close enough to optimal. Note that this is not necessarily GLPK's fault.
There are other constraints that I could have written that help the solver, e.g.

$$\forall j,\, k \geq 2 \quad t_{j,k-1} \geq t_{jk}$$

In [19]:
```python
glpk_options = ['--log', 'glpk.log', '--mipgap', '0.0001']
prob.solve(GLPK_CMD(msg=3, options=glpk_options))
```

Out[19]: 1

In [20]:
```python
with open('glpk.log', 'r') as f:
    for line in f:
        print(line.rstrip('\n'))
```

```
GLPSOL: GLPK LP/MIP Solver, v4.52
Parameter(s) specified in the command line:
 --cpxlp /tmp/21270-pulp.lp -o /tmp/21270-pulp.sol --log glpk.log
 --mipgap 0.0001
Reading problem data from '/tmp/21270-pulp.lp'...
2061 rows, 8024 columns, 112105 non-zeros
8024 integer variables, 8012 of which are binary
28352 lines were read
GLPK Integer Optimizer, v4.52
2061 rows, 8024 columns, 112105 non-zeros
8024 integer variables, 8012 of which are binary
Preprocessing...
48 constraint coefficient(s) were reduced
2061 rows, 8024 columns, 112105 non-zeros
8024 integer variables, 8012 of which are binary
Scaling...
 A: min|aij| = 2.500e-02  max|aij| = 2.000e+03  ratio = 8.000e+04
GM: min|aij| = 4.076e-01  max|aij| = 2.454e+00  ratio = 6.020e+00
EQ: min|aij| = 1.662e-01  max|aij| = 1.000e+00  ratio = 6.016e+00
2N: min|aij| = 1.250e-01  max|aij| = 1.466e+00  ratio = 1.173e+01
Constructing initial basis...
Size of triangular part is 2061
Solving LP relaxation...
GLPK Simplex Optimizer, v4.52
2061 rows, 8024 columns, 112105 non-zeros
      0: obj = 9.052573077e+02  infeas = 3.002e+04 (0)
    500: obj = 7.318330422e+02  infeas = 1.729e+04 (0)
   1000: obj = 7.313162448e+02  infeas = 1.727e+04 (0)
   1500: obj = 5.770586355e+02  infeas = 8.593e+03 (0)
   2000: obj = 5.768708977e+02  infeas = 8.558e+03 (0)
   2500: obj = 4.002909225e+02  infeas = 1.818e+03 (0)
   3000: obj = 4.002909225e+02  infeas = 1.818e+03 (0)
*  3194: obj = 3.519623224e+02  infeas = 0.000e+00 (0)
*  3500: obj = 4.194004898e+02  infeas = 1.628e-14 (0)
*  4000: obj = 4.763028465e+02  infeas = 6.725e-15 (0)
*  4500: obj = 5.312186300e+02  infeas = 9.320e-13 (0)
*  5000: obj = 5.841207596e+02  infeas = 0.000e+00 (0)
*  5500: obj = 6.213808386e+02  infeas = 1.005e-15 (0)
*  6000: obj = 6.556284187e+02  infeas = 4.441e-16 (0)
*  6291: obj = 6.722113467e+02  infeas = 4.686e-28 (0)
OPTIMAL LP SOLUTION FOUND
Integer optimization begins...
+  6291: mip = not found yet <= +inf (1; 0)
Solution found by heuristic: 608.911217118
+ 10112: mip = 6.089112171e+02 <= 6.672873502e+02  9.6% (60; 1)
+ 10372: mip = 6.089112171e+02 <= 6.672873502e+02  9.6% (154; 1)
+ 11536: mip = 6.089112171e+02 <= 6.633982957e+02  8.9% (202; 7)
Solution found by heuristic: 612.579768421
+ 11865: mip = 6.125797684e+02 <= 6.633982957e+02  8.3% (302; 7)
+ 16301: mip = 6.125797684e+02 <= 6.518788965e+02  6.4% (184; 394)
Solution found by heuristic: 613.936474108
+ 16607: mip = 6.139364741e+02 <= 6.518788965e+02  6.2% (296; 394)
+ 16787: mip = 6.139364741e+02 <= 6.518788965e+02  6.2% (418; 394)
+ 18177: mip = 6.139364741e+02 <= 6.485283998e+02  5.6% (301; 740)
+ 18567: mip = 6.139364741e+02 <= 6.485283998e+02  5.6% (427; 740)
+ 18902: mip = 6.139364741e+02 <= 6.485283998e+02  5.6% (567; 740)
+ 20030: mip = 6.139364741e+02 <= 6.485283998e+02  5.6% (688; 740)
Solution found by heuristic: 622.714526489
Time used: 60.0 secs. Memory used: 20.9 Mb.
+ 21481: mip = 6.227145265e+02 <= 6.366592257e+02  2.2% (795; 743)
+ 21623: mip = 6.227145265e+02 <= 6.366592257e+02  2.2% (870; 743)
Solution found by heuristic: 624.961551511
Solution found by heuristic: 625.000663038
Solution found by heuristic: 625.049770205
Solution found by heuristic: 625.098838944
Solution found by heuristic: 625.147751412
+ 22462: mip = 6.251477514e+02 <= 6.251976645e+02 < 0.1% (52; 2498)
RELATIVE MIP GAP TOLERANCE REACHED; SEARCH TERMINATED
Time used: 71.6 secs
Memory used: 24.5 Mb (25716666 bytes)
Writing MIP solution to '/tmp/21270-pulp.sol'...
```

In [21]:
```python
print("Status: {}".format(LpStatus[prob.status]))
```

Status: Optimal

Took a little longer, huh?

In [22]:
```python
for j, m_type in enumerate(message_types):
    print('Total {} messages: {}'.format(
        m_type, np.sum([m[i, j].varValue for i in customer_index])))
```

Total email messages: 243
Total push messages: 691
Total text messages: 848
Total phone messages: 218

In [23]:
```python
total_cost = np.sum([np.sum([m[i, j].varValue * c[j, 0]
                             for i in customer_index])
                     - np.sum([t[j, k].varValue * c[j, k]
                               for k in tier_index[1:]])
                     for j in message_index])
print('Total cost: ${:.2f}'.format(np.round(total_cost, 2)))
```

Total cost: $1999.70

In [24]:
```python
obj = np.sum([m[i, j].varValue * p[i, j]
              for i in customer_index
              for j in message_index])
print('Probability sum (i.e. the objective function): {}'.format(obj))
```

Probability sum (i.e. the objective function): 625.147751412


Note that we reach a higher objective value with these pricing tier discounts (625.1, versus 616.6 without them).