>> Kamal Jain: Hello. It's my pleasure to introduce Rohit Khandekar. He's from IBM. He's been a researcher there for the last couple of years. He took his Ph.D. from IIT Delhi, from where I did my undergrad, so he got more out of it there. And also at IBM I just did an internship there. And here he is.

>> Rohit Khandekar: Thank you, Kamal. Really glad to be here. So I'll be talking about some recent work that I did with Baruch Awerbuch. It's on stateless distributed gradient descent for positive linear programs. And for those that don't like complicated titles, I'm essentially going to give a very simple and fast converging algorithm for solving certain families of linear programs.

So let me start with a motivating example. It's one of distributed flow control. So you're given a flow network, it's a graph with edge capacities. And you're given a bunch of commodities, where each commodity corresponds to a fixed path in the network. So the red commodity wants to flow along this fixed path, from this node to that node; the blue commodity wants to flow along the blue path and so on. And your objective is to maximize the total profit you accrue by routing the flow subject to the capacity constraints. More precisely, think of b_P as the profit you accrue by sending unit flow along path P, and f_P is the flow you're going to send along path P. So you want to maximize the total profit subject to the constraint that the total flow through any edge is at most its capacity.

Now, this is an instance of a packing linear program. It's a linear program, and in fact it's explicitly given, so there are many algorithms to solve this. But the model that we are going to look at today is somewhat more restrictive, and let me begin by describing it. So we assume that there is an agent associated with every path. And that agent decides how much flow is going to get routed along that path. But these agents have limited information about the instance.
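In symbols, the flow control problem described above can be sketched as the following linear program. The talk names only b_P (profit per unit flow on path P) and f_P (flow on path P); the capacity symbol c_e and the notation "e in P" are assumptions introduced here for illustration.

```latex
\begin{align*}
\text{maximize}   \quad & \sum_{P} b_P \, f_P \\
\text{subject to} \quad & \sum_{P \,:\, e \in P} f_P \;\le\; c_e
                          && \text{for every edge } e, \\
                        & f_P \;\ge\; 0
                          && \text{for every path } P.
\end{align*}
```

Every coefficient is nonnegative and every constraint is an upper bound, which is what makes this a packing linear program.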
They don't know the entire network, they don't know how many other paths are there in the network, they don't know what their individual flows are. All that they can see at any point is the edge congestions on the edges on their own path. And just based on this information they are supposed to update the flow values. So our question is --

>>: But what is the goal of the individual agents?

>> Rohit Khandekar: Okay. So these individual agents are going to follow some prescribed protocol that you give them, and we want to design a simple protocol such that when these agents act according to the protocol, they together converge to a global optimum solution. So there are no [inaudible] issues here. Right. So these agents are not greedy, they don't have their own objective function.

>>: [inaudible] to follow the protocol.

>> Rohit Khandekar: Right. So that will be another question. Yeah. But here we are not looking at [inaudible] issues. We just want to -- these agents are faithful and follow whatever you tell them to. The main question is --

>>: Can you tell me where we can find these agents?

>> Rohit Khandekar: We'll discuss that offline. So the question is, can these agents achieve this global optimum solution just by looking at the local feedback that they get from the system? And, moreover, we want the protocol or the algorithm to be stateless and self-stabilizing. So we really want the decisions that these agents take to depend just on the current flow situation. They should not store any history about the past execution. And such a thing is useful, especially in a distributed system, where things are not robust. If the edge capacities change over time or the instance changes because some new agents come into the system, you don't want the algorithm to start from zero or start from a well-defined state. You just want to start from the current situation, and these agents should be able to update just from the current solution and converge to a near optimum solution to the new instance.

Now, as I said, the flow control problem was really an instance of a packing linear program, and we can in fact talk in more generality about packing linear programs. So here is a linear program where we want to maximize a linear function c transpose x subject to some packing constraints, A x less than or equal to b. And all the coefficients in A, c, and b are nonnegative. And we work with the same model. In particular, we assume that there is an agent j associated with every variable x_j, and that agent knows the column corresponding to that variable, he knows the column j in the constraint matrix, and he also knows its coefficient in the objective function, c_j.
And this agent is allowed to get some feedback from the constraints that he's present in. So, for instance, in this picture the agent corresponding to this variable is present in these four constraints. So at any point he's allowed to read off the so-called congestion values, namely A_i x over b_i, for the constraints for which a_ij is nonzero. And the question is again the same: just based on this local feedback, can the agents update their x_j values such that together they converge to a near optimum solution quickly? So we'll be interested in a fast converging stateless algorithm, and that's the main topic of this talk. So in the rest of the talk hopefully I'll be able to give you enough intuition and enough details about our algorithm.

So before I start describing the algorithm, let me make some simplifications. This is just to simplify the notation and make the presentation a little easier to follow. So this was the original LP that we were working with. It's maximize c dot x subject to some packing constraints. I'm going to assume that the vectors c and b are vectors of all 1s. All of the entries are 1. And that can be done without loss of generality by scaling. To make all b's 1, you have to scale the rows; to make all c's 1, you have to scale the columns. And we also assume that all the nonzero entries in A lie in the range 1 to A_max, where A_max is some real number. And we assume that these agents know the values of m, n, and A_max. m is the number of constraints; n is the number of variables. They need not know the exact values; it's enough to know some upper bounds. These values are used to set some parameters in the algorithm.

Okay. So our algorithm is based on the primal-dual complementary slackness conditions and how they imply near optimality. So let me briefly mention that. So this is the primal LP that we're trying to solve. Let's consider its dual. So we associate a variable y_i with every ith constraint. And the objective in the dual is to minimize 1 dot y such that A transpose y is at least 1. Now, let's recall the primal and dual complementary slackness conditions. So the primal complementary slackness conditions state that if the jth primal variable is positive, then the jth dual constraint is tight. And similarly, the dual complementary slackness conditions state that if the ith dual variable is positive, then the ith primal constraint is tight. And it's very easy to prove that if x and y form feasible solutions to the primal and dual respectively, and if they satisfy both the complementary slackness conditions, then they in fact form optimum solutions to the corresponding linear programs. So our algorithm is going to try and make the solutions x and y satisfy the complementary slackness conditions. So that's the general goal. And eventually we'll be able to show that we get near optimum solutions. Okay. So with this introduction, let me describe the algorithm.
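Written out, the normalized primal, its dual, and the two complementary slackness conditions from the discussion above look as follows. A_i denotes the ith row of A and A_j the jth column; this is just a restatement in standard notation, not an additional result.

```latex
\begin{align*}
\text{(Primal)} \quad & \max \; \mathbf{1}^{\top} x
   \quad \text{s.t.} \quad A x \le \mathbf{1}, \;\; x \ge 0, \\
\text{(Dual)}   \quad & \min \; \mathbf{1}^{\top} y
   \quad \text{s.t.} \quad A^{\top} y \ge \mathbf{1}, \;\; y \ge 0, \\[4pt]
\text{(Primal CS)} \quad & x_j > 0 \;\Longrightarrow\; A_j^{\top} y = 1
   \quad \text{for every } j, \\
\text{(Dual CS)}   \quad & y_i > 0 \;\Longrightarrow\; A_i x = 1
   \quad \text{for every } i.
\end{align*}
```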
The algorithm is really very simple. It fits on a single slide and it's very simple to follow. So we start with any solution x that satisfies the primal constraints. So start with any x greater than or equal to 0 such that A x is less than or equal to 1. And now every agent is going to repeat some execution. So in every round we associate dual variables y_i with the primal constraints, and essentially the y_i capture how tightly the corresponding primal constraints are satisfied. So if the ith constraint is very tightly satisfied -- namely, A_i x is very close to 1 -- then the value of y_i is going to be large.

>>: [inaudible]

>> Rohit Khandekar: So let's understand this. So let's, on the other hand, assume that A_i x is very small as compared to 1. If the ith constraint is very well satisfied, then this quantity is going to be negative. And I'll give you the values later, but mu is a large constant. So if this quantity is negative, then y_i is going to be very close to zero.
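The behaviour of the dual weights described here can be checked numerically. The sketch below assumes the exponential form y_i = exp(mu * (A_i x - 1)), which is consistent with the description (close to zero when the constraint is slack, equal to 1 when it is tight) but is not spelled out verbatim in the transcript. The function name `dual_weight` and the value of mu used afterwards are hypothetical.

```python
import math


def dual_weight(load, mu):
    """Weight of one constraint, assuming y_i = exp(mu * (A_i x - 1)).

    `load` stands for A_i x after scaling b_i to 1, so load == 1 means tight.
    `mu` is the large constant mentioned in the talk (value chosen by the caller).
    """
    return math.exp(mu * (load - 1.0))
```

With mu = 50, for example, a half-full constraint gets a weight of exp(-25), which is negligible, while a tight constraint gets weight exactly 1.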


On the other hand, if A_i x is equal to 1, it's tight, then this is 0, so y_i is going to be 1. So the whole point here is that the y_i capture how tightly the corresponding constraints are satisfied in the primal. The larger the y_i, the tighter the constraint. So you can also think of y_i as the importance of the primal constraint. So if some constraint is very tight, then it's very important for me to focus on it. So y_i gives a high weight to that constraint.

And then every agent j is going to do some simple update on its variable. So what agent j does is as follows. It computes the value of A_j transpose y. Remember, the dual constraints were A_j transpose y greater than or equal to 1. So the jth agent is essentially computing the value of the jth dual constraint. And it's going to compare that value to 1.

>>: [inaudible] the row or the column?

>> Rohit Khandekar: So the primal agents are indexed by j. And so we have a primal variable x_j, and there is an agent j --

>>: So you have agents for each --

>> Rohit Khandekar: Each column --

>>: [inaudible]

>> Rohit Khandekar: No. We have agents just with primal variables. There are no agents with duals. We only get some feedback from the system corresponding to the constraints.

>>: [inaudible] does the agent -- the ith agent know the ith row or the [inaudible] column of the matrix?

>> Rohit Khandekar: So the ith agent knows the ith column of matrix A.

>>: Don't -- aren't you using both the columns and the rows there?

>> Rohit Khandekar: No. So -- okay. So A_j transpose y really indicates the jth entry in A transpose y. So this is -- sorry. A_j is really the jth column of A.

>>: It's like a dual constraint corresponding to --

>> Rohit Khandekar: Right. So that's the dual constraint corresponding to the jth primal variable. So if you -- let's go back to the --

>>: A_i x is also -- A_i x is not the ith row times x?

>> Rohit Khandekar: A_i x denotes -- yes. So there is -- yeah. So A_i denotes the ith row --

>>: [inaudible] A_i x [inaudible] know something about [inaudible].

>> Rohit Khandekar: Right. Right. So he knows the values of A_i x over b_i for the constraints that he's present in. And he knows the jth column. All right. So what does agent j do? He computes the value of the jth dual constraint; namely, A_j transpose y. So this is essentially the dot product of the jth column of A with y. And he's going to compare that with 1. If this value is much smaller than 1, then he's going to increase his variable by a small multiplicative factor. On the other hand, if this value is much larger than 1, then he's going to decrease his value by a small multiplicative factor. And we need to have some additive change here, because if x_j is zero to begin with, then just a multiplicative change will not be enough to change x_j. So just to bootstrap that multiplicative change, we have this small additive delta.

So what's happening here? Let's try to understand. In the flow maximization problem, we have agents associated with paths. And the y_i are associated with the edges. So A_i x denotes the total flow on edge i. And 1 corresponds to the capacity of edge i. So if the flow is very close to capacity, then the dual variable associated with an edge is going to be very large. So think of y_i as the length of edge i. All right. So if the flow is very close to the capacity, then the length of edge i is going to be large. Right? Now, what does the agent corresponding to path j do? He computes the length of path j under this length metric, and if this length is small, then he thinks that there is enough space for pushing more flow. So in that case he increases his flow by a multiplicative factor. On the other hand, if the length of his own path is much larger than the target length of 1, then there is perhaps too much congestion on that path, so he reduces his own flow.
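One round of the update just described can be sketched as below. This is a minimal illustration under stated assumptions, not the slide itself: the exponential weights y_i = exp(mu * (A_i x - 1)), the thresholds 1 - alpha and 1 + alpha, the rule max(x_j * (1 + beta), delta) for the additive bootstrap, and all function and parameter names are choices made here to match the spoken description.

```python
import math


def dual_weights(A, x, mu):
    """y_i = exp(mu * (A_i x - 1)); assumed form, with every b_i scaled to 1."""
    return [math.exp(mu * (sum(a * v for a, v in zip(row, x)) - 1.0)) for row in A]


def one_round(A, x, mu, alpha, beta, delta):
    """One synchronous round: every agent j updates x_j from the same weights y."""
    y = dual_weights(A, x, mu)
    new_x = []
    for j, xj in enumerate(x):
        # Agent j only reads y_i for the constraints it appears in (A[i][j] != 0).
        length = sum(row[j] * yi for row, yi in zip(A, y) if row[j] != 0)
        if length <= 1.0 - alpha:
            # Path looks short: multiplicative increase, bootstrapped by delta
            # so that a variable sitting at zero can start moving.
            new_x.append(max(xj * (1.0 + beta), delta))
        elif length >= 1.0 + alpha:
            # Path looks long: multiplicative decrease.
            new_x.append(xj * (1.0 - beta))
        else:
            new_x.append(xj)
    return new_x
```

As a usage example, take A = [[1, 1, 0], [0, 1, 1]] (two unit-capacity edges, three paths, the middle path using both edges) and start from the feasible point [0.2, 0.5, 0.1]. Calling `one_round` repeatedly with mu=50, alpha=0.1, beta=0.001, delta=1e-4 keeps both edge loads at or below 1, and over a few tens of thousands of rounds the total flow climbs close to the optimum of 2. These parameter values are illustrative only; the small beta trades speed for the guarantee that no weight can jump past 1 in a single round.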
So it's easy to verify that an agent j just needs to know the values of y_i for the constraints that he's present in, for the constraints for which a_ij is nonzero. Because he just needs to know the value of this product, A_j transpose y.

>>: Do you maintain a feasible flow during the process [inaudible]?

>> Rohit Khandekar: Yes. We will prove in a minute that the flow always remains feasible, so under these updates, for appropriate values of these constants mu, [inaudible], delta, the primal solution x always remains feasible. You always satisfy A x less than or equal to 1. So that's the entire algorithm. So start from any feasible point and just make these updates. And we will show some nice properties about this algorithm. So any questions regarding the algorithm?

Okay. So what do we show? What is the main result of our paper? We show that starting from any feasible solution x, the algorithm always maintains primal feasibility, A x less than or equal to 1, and converges to a 1 plus epsilon approximation in a number of rounds that is polylog in the size of the LP and poly in 1 over epsilon. Yes.

>>: The math looks very much like a summarization of Sutherland's paper [inaudible] market in computer time. Is it?

>> Rohit Khandekar: I'm not aware of that [inaudible].

>>: It's a classic. He divided up time on a PDP-1 computer, and guess how long ago [inaudible].

>> Rohit Khandekar: Um-hmm.

>>: By playing with prices per hour.

>> Rohit Khandekar: I see.

>>: Look it up on the Web.

>> Rohit Khandekar: Sure. Thank you. All right. So starting from any feasible solution --

>>: He didn't prove anything, but in one university where I put this article on the boss's desk, it was implemented on Monday and by Wednesday it was as stable as a rock. Of course, that's anecdotal evidence. It didn't prove anything. Sorry.

>> Rohit Khandekar: So we show that this algorithm converges to a near optimum solution in a polylog number of [inaudible]. And as you can see, this algorithm just depends on this local feedback that every agent needs to know from the system, and that's why it can be implemented in a distributed manner, if there is such feedback available. And, moreover, this is self-stabilizing, because we start from any feasible solution. So as soon as the instance changes, there's a very easy rule to make the current solution feasible for the new instance in a single step. And from then on you can just run this algorithm and converge to a near optimum solution quickly.

>>: So if you're nonfeasible, you cannot -- you have to do a [inaudible]?

>> Rohit Khandekar: Yes. It's actually very easy to do a preprocessing step. So let's say any variable that is present in an infeasible constraint can just set its value to zero in one step.

>>: Or zero is feasible.

>> Rohit Khandekar: Right.

>>: No, but then [inaudible].
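The one-step repair mentioned in this exchange can be sketched as follows. The function name `repair` and the list-of-rows matrix layout are hypothetical; the rule itself (an agent zeroes its variable only if it appears in some violated constraint) is the one stated by the speaker.

```python
def repair(A, x):
    """Make x feasible for A x <= 1 in one step.

    Each agent j zeroes x_j only if it takes part in a violated constraint;
    every other agent keeps its current value.
    """
    loads = [sum(a * v for a, v in zip(row, x)) for row in A]
    violated = [load > 1.0 for load in loads]
    return [
        0.0 if any(row[j] != 0 and bad for row, bad in zip(A, violated)) else xj
        for j, xj in enumerate(x)
    ]
```

The result is always feasible: every violated row has all of its variables set to zero, and since the coefficients are nonnegative, the load on every other row can only go down. Variables far from the change are left untouched, which is the point of not restarting everyone from zero.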

>> Rohit Khandekar: Right. You don't want to start from [inaudible] zero. So you should set your value to zero only if you're part of an infeasible constraint. So the whole point here is that if the instance changes in some part of the system, you don't want everyone to start from zero; you want the system to kind of self-adjust, and the effects should travel to the rest of the system slowly.

>>: [inaudible] how you get delta?

>> Rohit Khandekar: Delta is set sufficiently small that increasing variables by [inaudible] delta should not affect the system too much. And to set these parameters, the agents need to know the values of m, n, and A_max. That's the only place where they use the information about the size of the network. So I like to think of this algorithm as a [inaudible] interior point method, because starting from any point inside the feasible region, the algorithm takes an internal path and converges to a near optimum solution in a polylog number of [inaudible].

Okay. So let me say a few words about previous work on [inaudible] algorithms for solving linear programs. So there has been a lot of work on designing [inaudible] algorithms for packing and covering linear programs. And most of those algorithms are not stateless. So, for instance, in Plotkin, Shmoys, Tardos [inaudible], the algorithm starts all the variables from zero, maintains some global information, and increments some selected set of variables. The only work that I know of which is truly stateless is the work of Garg and Young, FOCS 2002, where they designed an algorithm for a multi-commodity flow problem with fixed paths, so essentially the flow control problem that I mentioned. And they thought of flow as a stream of packets injected at the source. And the only feedback that they could receive was the total packet loss rate end to end.
So the packets traveled through the network, and if some edge is more congested than it should be, then a random fraction of the packets were dropped. And the agents can read off the values of the total end-to-end drop rate. And just based on that information, they updated the packet injection rate. So they showed convergence to 1 plus epsilon approximate solutions in time proportional to c_max, where c_max is the maximum objective coefficient. And it was not clear how to generalize this algorithm to arbitrary positive linear programs. So we can think of our algorithm as somewhat extending this work to arbitrary positive linear programs, and also improving the convergence rate.

>>: Feedback [inaudible].

>> Rohit Khandekar: Feedback --

>>: [inaudible]

>> Rohit Khandekar: Feedback is somewhat related, because there -- so the end-to-end drop rate is actually related to the total length of your path under an associated exponential length function. So there is some connection, although it's not an explicit --

>>: As you stated, you need to get feedback from each edge [inaudible].

>> Rohit Khandekar: No. Actually a path just needs to know the total length of the path. We don't need separate lengths from the edges. Because we compare the total length of the path with 1, and if the path length is small, we increase the flow. If the path length is larger than 1, we decrease the flow. So we just need to know the aggregate path length.

All right. So on this slide I've briefly mentioned the key differences from previous work. So most of the previous algorithms, as I mentioned, start from zero. And they only increase the variables. And they maintain some global threshold information and increase the variables that are better than the threshold. On the other hand, we start from arbitrary feasible solutions. And since we start from arbitrary solutions, we need to have an ability to both increase and decrease the variables if we want to achieve near optimality. And we are stateless. We don't maintain any state information and we don't need any initialization. But they have better convergence dependence on epsilon. I think it's 1 over epsilon squared, while our convergence is only -- we could show only a 1 over epsilon to the 5 convergence.

Okay. So let me go on and give you some intuition about the analysis. So let me quickly argue that the algorithm always maintains primal feasibility throughout the execution. So to prove this, we show that all the y_i are at most 1. And if all the y_i are at most 1, then it follows immediately that A_i x is less than or equal to 1. Now suppose not. Suppose some y_i increases to more than 1. Now, for that to happen, some x_j with a_ij greater than zero must increase to make y_i larger than 1.
But for x_j to increase, the value of A_j transpose y must be less than 1 minus alpha, from the definition of the algorithm. Only then can x_j increase. But all the nonzero entries in A are at least 1. And in particular a_ij is at least 1. Therefore, y_i must be less than 1 minus alpha before this increase. Right? So y_i is sufficiently smaller than 1. And now comes the trick. So we set the parameters [inaudible], namely the step length parameter, the multiplicative increase by which we update the variables, [inaudible] small enough so that any y_i changes by a very small multiplicative factor. More precisely, we prove that y_i changes by at most a factor of alpha over 4 in any single round. So if y_i is less than 1 minus alpha to begin with --

>>: [inaudible]

>> Rohit Khandekar: No. Multiplicative. 1 plus alpha over 4. Right. It changes by at most a multiplicative factor of 1 plus alpha over 4. So if y_i is less than 1 minus alpha, then in a single round it cannot become more than 1. So, in a sense, the point here is that we have to set the step length parameter small enough if we want to maintain feasibility. But you should not set it too small, because you want fast convergence. So that will come next.

Now, another quick point that we should verify before proving that the algorithm converges to near optimality is that the fixed points of this algorithm are [inaudible] near optimum. Because the fixed points are fixed points, and they [inaudible] near optimum if your theorem is true. So let's quickly show that. So let's take a pair of solutions x and y. x is the solution and y is the corresponding setting of the dual variables. And assume that x comma y has entered a fixed point of our algorithm. We'll show that x and y both correspond to near optimum solutions to the primal and dual respectively.

So first let's show approximate feasibility. Since x is a fixed point, no x_j is changing; therefore the value of A_j transpose y for every j is between 1 minus alpha and 1 plus alpha. So all the dual constraints are not only approximately satisfied, they are very tight. All the dual constraints are tightly satisfied. And in fact on the previous slide, we already showed that x is always feasible. So primal feasibility is already established. So we have that both x and y are approximately feasible.

Now we argue that both the complementary slackness conditions are satisfied in an approximate sense. More precisely, if x_j is large, then the jth dual constraint is approximately tight. Well, we just argued that all the dual constraints are approximately tight, so the primal complementary slackness conditions are trivially satisfied. On the other hand, if y_i is large, then from the definition of y_i, the ith primal constraint is approximately tight. Because, recall, y_i was a very fast growing function of how tightly the corresponding constraint was satisfied.

>>: Is this true for all solutions [inaudible]?

>> Rohit Khandekar: Yes. Yes. This point is actually true for all x and y. And if you put these two things together, we can argue that x and y actually form near optimum solutions.
So the fixed points are actually near optimum solutions to both primal and dual. But since our goal is to not just analyze fixed points but to show fast convergence, we need some measure, some potential that we can track, to prove that we are making significant progress toward optimality in every step. So, to this end, we --

>>: The only place where this fixed point is the -- is feasibility of the dual or [inaudible] where is the property that the fixed point [inaudible] as being the solution?

>> Rohit Khandekar: I guess we only use it here. That's right. Approximate dual feasibility. That's right. So to show fast convergence, we have to have some [inaudible] property of the algorithm. Because the variables are going up and down, so it's very hard to track what's happening in the system. So that's why we looked at this particular potential function, which is essentially the primal objective function, summation x_j, minus a penalty for violating primal constraints. So think of y_i as the penalty for violating the ith constraint. So you get profit by routing more flow, but you get penalized for violating the capacity constraints.

And, in fact, to give some intuition about this potential, I tried to plot it. So let's consider a very simple linear program, which has two variables, x_1 and x_2, and the linear program is maximize x_1 plus x_2, subject to the constraints that x_1 is less than or equal to 1, and x_2 is less than or equal to 1. So what's happening here is that this is x_1, this is x_2, and along the z axis I have plotted the potential function. So the potential function increases with x_1 and x_2. But as we come close to the boundary of the [inaudible], the penalty shoots up and the potential function dips to minus infinity. So as you get closer and closer to the boundary, the values of y_i increase so rapidly that it starts penalizing. So the maximum point of the potential function, which is here, is actually very close to the optimum solution, which is at this vertex. Because as long as you are far from the boundary, you are doing the same thing as the objective function does. But when you go close to the boundary, your potential function dips. So what we show in our analysis is that our algorithm actually is working in hindsight with this potential function, and this potential function only [inaudible] increases during the algorithm.

So analytically this potential function has some nice properties. And this is what the next equation shows. So first of all note that this potential function is a differentiable function of the variables x_j. It's not only continuous, but since the y_i are exponential functions of A_i x minus 1, it's actually very smooth and you can differentiate. So if you look at the partial derivative of the potential function with respect to the jth variable, and if you plug in this expression and simplify, then it turns out to be equal to 1 minus A_j transpose y.
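The derivative claimed above can be verified in two lines. The sketch assumes the weights are y_i = exp(mu (A_i x - 1)) and that the penalty term carries a 1/mu normalization; the transcript states only that the potential is the objective minus a penalty built from the y_i and that the partial derivative equals 1 - A_j^T y, and this normalization is the one that makes the two statements agree.

```latex
\[
\Phi(x) \;=\; \sum_{j} x_j \;-\; \frac{1}{\mu} \sum_{i} y_i(x),
\qquad
y_i(x) \;=\; \exp\!\bigl(\mu\,(A_i x - 1)\bigr).
\]
\[
\frac{\partial \Phi}{\partial x_j}
\;=\; 1 \;-\; \frac{1}{\mu} \sum_{i} \mu\, a_{ij}\, y_i(x)
\;=\; 1 \;-\; \sum_{i} a_{ij}\, y_i(x)
\;=\; 1 \;-\; A_j^{\top} y .
\]
```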
So this is the gap -- this captures how tightly the jth dual constraint is satisfied. And if you recall, the decision of whether to increase x_j or decrease x_j was actually based on this quantity. So if you approximate the change in the potential by the first-order terms -- namely, the change in x_j values times the derivatives -- then you can show that this [inaudible] is in fact always nonnegative. Because you increase x_j only if this is positive. And you decrease x_j only if this is negative. So this is the basic reason why the potential only increases.

And, moreover, we can say something stronger. Recall that we increase x_j only if this is significantly larger than zero. This is at least alpha. And we decrease x_j only if this is at most minus alpha. So overall the increase in the potential is at least alpha times the absolute change in the x values. So if the x values are changing rapidly, no matter whether they're going up or down, you're making progress with respect to the potential. And that's why you can think of this algorithm as doing distributed gradient ascent on this potential.

Okay. So how do we show fast convergence? So we just noticed that a significant change in x values leads to a significant increase in the potential. So, similarly, you can show that a significant change in y values leads to a significant increase in the potential. Because, intuitively speaking, y values are [inaudible] by x values. So if y values are changing rapidly, then x values are also changing. And that's why the potential function increases. But the potential function is bounded. It cannot increase without bound. So such a thing cannot happen forever. So it cannot happen that x and y values are changing a lot for a long time. And then we show this important [inaudible] that says that if x and y values do not change significantly in an interval of appropriate length, then we have near optimality.

>>: [inaudible]

>> Rohit Khandekar: Epsilon is the error parameter -- so we have [inaudible] 1 plus epsilon approximation. So more precisely, if you take any interval of a logarithmic number of rounds, and if you notice that throughout the interval the x and y values did not change by too much, then we show that the solution x throughout the interval is actually a near optimum solution to the primal. So let's call this a stationary interval, an interval of logarithmic length where x and y values do not change.

And on this slide I've tried to show you an intuition of the proof of why a stationary interval implies near optimality. So consider a stationary interval and consider any x in that interval. And if x is not near optimal, then we show that there exists a variable x_j such that A_j transpose y is consistently less than 1 minus alpha throughout the interval. And if such a thing holds, then from the definition of the algorithm, x_j will be increased multiplicatively. But since the length of the interval is logarithmic, x_j will become more than 1 soon, which would contradict the feasibility that we have already shown.
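The contradiction at the end of this argument can be made quantitative. The sketch assumes the increase rule x_j <- max{x_j (1 + beta), delta}, where beta is the multiplicative step and delta the additive floor; the symbol beta and the exact interval length are assumptions made here, since the talk only says the interval is logarithmic. Suppose A_j^T y <= 1 - alpha holds in each of T consecutive rounds. After the first of those rounds x_j is at least delta, and each later round multiplies it by 1 + beta, so

```latex
\[
x_j \;\ge\; \delta\,(1+\beta)^{T-1} \;>\; 1
\qquad \text{whenever} \qquad
T \;>\; 1 + \frac{\ln(1/\delta)}{\ln(1+\beta)} .
\]
```

On the other hand, feasibility and the normalization a_ij >= 1 give, for any constraint i containing variable j,

```latex
\[
x_j \;\le\; a_{ij}\, x_j \;\le\; A_i x \;\le\; 1 ,
\]
```

so such a run of increases cannot last longer than roughly (1/beta) ln(1/delta) rounds.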
So the key point here is that assuming that we are not near optimal and assuming that the x and y values are not changing rapidly, there exists an x_j for which this property holds. And the proof of that fact is also easy. It's essentially four lines. But the intuitive reason is that if we're very far away from optimality, then there must be some variable x_j that the optimum solution is exploiting that you are missing out on. And that is the variable that you should focus on. So that's the intuition.

But, more precisely, let me just walk you through this chain of equations. So let's say x star is the optimum solution. From our assumption that x is not near optimum, we know that 1 dot x star is much larger than 1 dot x. These are the objective values. Now, since the x values are stationary, all the A_j transpose y are approximately equal to 1. So this is roughly equal to this. But since y is a fast growing function, most of the mass of y is concentrated on those constraints for which A_i x is roughly equal to 1. So that's why this is approximately equal to this. And since A x star is at most 1 -- x star is the optimum solution, so it satisfies the constraints -- this is at least y transpose A x star. So in a nutshell what we get is that 1 dot x star is much larger than y transpose A x star.
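The chain of steps just walked through can be written on one line. The symbols \gg and \approx are used informally here, as in the talk; the precise error terms are in the paper and are not reproduced.

```latex
\[
\mathbf{1}^{\top} x^{*}
\;\gg\; \mathbf{1}^{\top} x
\;\approx\; \sum_{j} x_j \,\bigl(A_j^{\top} y\bigr)
\;=\; y^{\top} A x
\;=\; \sum_{i} y_i \,\bigl(A_i x\bigr)
\;\approx\; \mathbf{1}^{\top} y
\;\ge\; y^{\top} A x^{*} .
\]
```

The first approximation uses stationarity (A_j^T y is close to 1 on the variables that carry x), the second uses the concentration of y on nearly tight constraints, and the last inequality uses A x* <= 1 together with y >= 0.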


Now, think of this quantity as an average of the values A_j transpose y under the weights x star j. So this divided by this is the average value of A_j transpose y, which is much less than 1. So in particular there will be a variable x_j for which A_j transpose y is much less than 1. So of course this is just an outline. The details are given in the paper.

So let me summarize. So we just saw a very simple stateless and gradient descent based algorithm for packing linear programs. And a very similar algorithm also works for covering linear programs, where the objective is to minimize c dot x, where A x is greater than or equal to b. Mixed packing and covering, the positive linear programs where we have both packing and covering constraints. And some convex programs where the objective is -- let's say we want to maximize some concave function of the x's subject to some packing constraints. Like in flow control, instead of maximizing the total flow, if you want to maximize a fair objective function, like log of the flows, then you can use similar ideas.

The main open question that I would like to know the answer to is what happens if the linear programs are not given explicitly. Let's say in multi-commodity flows, if the paths are not fixed, then it's an implicitly given LP because there is one variable for every path. So can you implement this algorithm in a compact way so that the running time remains polynomial? And of course look for other applications of this simple update rule, let's say for market equilibrium questions: can we compute market equilibria using such algorithms? I'll stop here. Thank you for your attention.

[applause]

>>: So you talked about the [inaudible] convex program [inaudible] the objective function's not additive, the objective function is multiplicative.

>> Rohit Khandekar: So what we use crucially in this analysis is the positiveness of the packing linear programs.
So, for instance, I don't know how to generalize this to arbitrary linear programs even.

>>: No, but the constraint [inaudible] packing constraints.

>> Rohit Khandekar: Are packing constraints.

>>: Yeah. The only thing different is the objective function, instead of saying objective function it is summation AIXI, it is summation AI log XI or you can interpret it as a multiplication XI [inaudible].

>> Rohit Khandekar: Right. So if the objective function is -- so we are trying to maximize a concave objective function.

>>: [inaudible]

>> Rohit Khandekar: Subject to packing constraints. Yes. So I think for that it should work too. Although we have to be careful regarding what kind of approximate solutions we are computing, because we may be computing a weak approximate [inaudible].

>>: Let's say you're approximating the convex program.

>> Rohit Khandekar: Right. I think some algorithm like this should work. Because all that we're really using is convexity of some potential function. And basically doing gradient descent on that.

>>: [inaudible] taking a constrained optimization and converting to [inaudible].

>> Rohit Khandekar: Right. Right. Adding a penalty function for violating constraints.

>>: But the other part is somehow you're -- by doing this you're also getting this [inaudible].

>> Rohit Khandekar: Right. So I really think that that is good for extending this work to more general mathematical programs.

>>: Could you go back one slide.

>> Rohit Khandekar: This?

>>: No, no. Yeah. The open [inaudible]. So what does it mean [inaudible]?

>> Rohit Khandekar: So here -- so our algorithm maintained a separate agent for every variable. And an agent is updating his own variable value. So here you would like to associate an agent with a commodity, a source-sink pair, and not with every path. So if you kind of try to implicitly update all the flow variables by not doing this exponential amount of work, but just try to do [inaudible] amount of work, then at least the initial calculations run into some problems. So another question is can we modify the update rule to make it more smooth and then give an implicit way of updating all the flows along the exponentially many paths.

>>: [inaudible] your update rule different on the length of the [inaudible] so you can [inaudible] --

>>: Or could the exponential number of paths --

>> Rohit Khandekar: But that exponential number of paths, so -- and you don't want to maintain flow values for every path.
So we have been able to show some results in this direction. More precisely, if the path lengths are bounded -- if, let's say, I allow you to send flows along paths of length at most H, then we can get convergence which is polynomial in H. Instead of polylog convergence, we can show convergence in something like H squared number of [inaudible]. So there you don't have to maintain -- you just maintain the flow, not the flow path decomposition. And then you update the flow along short paths in your flow. But I don't know how to show this for general -- and, more importantly, how to get polylogarithmic in N convergence.

>>: [inaudible] since you allow X to start from [inaudible] then we should be able to move X -- act quickly to the new optimal. Did you look at that?

>> Rohit Khandekar: That's a good question. Yeah. I've been asked that question before. I don't know how to show fast -- I mean, convergence which is better than this for starting from a near optimum solution. All right. That's a nice, interesting question. I don't know how to show that here.

>> Kamal Jain: Let's thank the speaker.

>> Rohit Khandekar: Thank you.

[applause]

>> Rohit Khandekar: Right. So that will be another -- another question. Yeah. But here we are not looking at [inaudible] issues. These agents are faithful and follow whatever you tell them to. The main question is --

>>: Can you tell me where we can find these agents?

>> Rohit Khandekar: We'll discuss that offline. So the question is: can these agents achieve this global optimum solution just by looking at the local feedback that they get from the system? And, moreover, we want the protocol or the algorithm to be stateless and self-stabilizing. So we really want the decisions that these agents take to depend only on the current flow situation. They should not store any history about the past execution. And such a thing is useful especially in a distributed system, where things are not robust. If the edge capacities change over time, or the instance changes because some new agents come into the system, you don't want the algorithm to start from zero or from a well-defined state. You just want to start from the current situation, and these agents should be able to update just from the current solution and converge to a near optimum solution to the new instance.

Now, as I said, the flow control problem was really an instance of a packing linear program, and we can in fact talk in more generality about packing linear programs. So here is a linear program where we want to maximize a linear function C dot X subject to some packing constraints, A X less than or equal to B. And all the coefficients in A, C, and B are nonnegative. And we work with the same model. In particular, we assume that there is an agent J associated with every variable XJ, and that agent knows the column corresponding to that variable -- he knows column J of the constraint matrix -- and he also knows its coefficient in the objective function, CJ.
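As a rough illustration of the information model just described, the following Python sketch computes what a single agent would be allowed to see: for each constraint i in which its variable appears, the ratio of the current load A_i x to the bound b_i. The function name `local_feedback` and the toy instance are made up for this example; the talk does not prescribe any particular data layout.

```python
def local_feedback(A, b, x, j):
    """Ratios (A_i . x) / b_i that agent j may read.

    Only constraints i with A[i][j] != 0 are visible to agent j;
    everything else about the instance stays hidden from it.
    """
    feedback = {}
    for i, row in enumerate(A):
        if row[j] != 0:
            load = sum(a * xv for a, xv in zip(row, x))
            feedback[i] = load / b[i]
    return feedback


# Toy instance (hypothetical): 2 constraints, 3 variables.
A = [[1.0, 1.0, 0.0],
     [0.0, 2.0, 1.0]]
b = [2.0, 4.0]
x = [0.5, 0.5, 1.0]

for j in range(3):
    print(j, local_feedback(A, b, x, j))
```

Agent 0 appears only in the first constraint, so it sees a single ratio, while agent 1 appears in both and sees two.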
And this agent is allowed to get some feedback from the constraints that he's present in. So, for instance, in this picture the agent corresponding to this variable is present in these four constraints. So at any point he's allowed to read off the so-called congestion values, namely AIX over BI, for the constraints for which AIJ is nonzero. And the question is again the same: just based on this local feedback, can the agents update their XJ values such that together they converge to a near optimum solution quickly? So we'll be interested in a fast converging stateless algorithm, and that's the main topic of this talk. So in the rest of the talk hopefully I'll be able to give you enough intuition and enough details about our algorithm.

So before I start describing the algorithm, let me make some simplifications. This is just to simplify the notation and make the presentation a little easier to follow. So this was the original LP that we were working with. It's maximize C dot X subject to some packing constraints. I'm going to assume that the vectors C and B are vectors of all 1s. All of the entries are 1. And that can be done without loss of generality by scaling. To make all Bs 1, you have to scale the rows; to make all Cs 1, you have to scale the columns. And we also assume that all the nonzero entries in A lie in the range 1 to A max, where A max is some real number. And we assume that these agents know the values of M, N, and A max. M is the number of constraints; N is the number of variables. They need not know the exact values; it's enough to know some upper bounds. These values are used to set some parameters in the algorithm.

Okay. So our algorithm is based on the primal-dual complementary slackness conditions and how they imply near optimality. So let me briefly mention that. So this is the primal LP that we're trying to solve. Let's consider its dual. So we associate a variable YI with every ith constraint. And the objective in the dual is to minimize 1 dot Y such that A transpose Y is at least 1. Now, let's recall the primal and dual complementary slackness conditions. So the primal complementary slackness conditions state that if the Jth primal variable is positive, then the Jth dual constraint is tight. And similarly, the dual complementary slackness conditions state that if the ith dual variable is positive, then the ith primal constraint is tight. And it's very easy to prove that if X and Y form feasible solutions to the primal and dual respectively, and if they satisfy both the complementary slackness conditions, then they in fact form optimum solutions to the corresponding linear programs. So our algorithm is going to try and make the solutions X and Y satisfy the complementary slackness conditions. So that's the general goal. And eventually we'll be able to show that we get near optimum solutions. Okay. So with this introduction, let me describe the algorithm.
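For reference, the normalization and the primal-dual pair described above can be written as follows. The explicit substitution is one standard way to carry out the row and column scaling and is an assumption here (it requires b_i > 0 and c_j > 0); the talk only states that scaling suffices. As in the talk, A_i denotes the i-th row and A_j the j-th column of A.

```latex
\begin{align*}
\text{original:}\quad & \max\{\, c^{\top}x \;:\; Ax \le b,\ x \ge 0 \,\}\\
\text{scaling:}\quad  & x'_j = c_j x_j,\qquad A'_{ij} = \frac{A_{ij}}{b_i\,c_j}
   \quad\Longrightarrow\quad c^{\top}x = \mathbf{1}^{\top}x',\qquad \frac{A_i x}{b_i} = A'_i x'\\
\text{primal:}\quad   & \max\{\, \mathbf{1}^{\top}x \;:\; Ax \le \mathbf{1},\ x \ge 0 \,\}\\
\text{dual:}\quad     & \min\{\, \mathbf{1}^{\top}y \;:\; A^{\top}y \ge \mathbf{1},\ y \ge 0 \,\}\\
\text{primal slackness:}\quad & x_j > 0 \;\Longrightarrow\; A_j^{\top}y = 1\\
\text{dual slackness:}\quad   & y_i > 0 \;\Longrightarrow\; A_i x = 1
\end{align*}
```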
The algorithm is really very simple -- it fits on a single slide and it's very simple to follow. So we start with any solution X that satisfies the primal constraints. So start with any X greater than or equal to 0 such that A X is less than or equal to 1. And now every agent is going to repeat some execution. So in every round we associate dual variables YI with the primal constraints, and essentially the YI capture how tightly the corresponding primal constraints are satisfied. So if the ith constraint is very tightly satisfied -- namely, AIX is very close to 1 -- then the value of YI is going to be large.

>>: [inaudible]

>> Rohit Khandekar: So let's understand this. So let's, on the other hand, assume that AIX is very small as compared to 1. If the ith constraint is very well satisfied, then this quantity is going to be negative. And I'll give you the values later, but mu is a large constant. So if this quantity is negative, then YI is going to be very close to zero.


On the other hand, if AIX is equal to 1, it's tight, then this is 0, so YI is going to be 1. So the whole point here is that the YI capture how tightly the corresponding constraints are satisfied in the primal. The larger the YI, the tighter the constraint. So you can also think of YI as the importance of the primal constraint. So if some constraint is very tight, then it's very important for me to focus on it. So YI gives a high weight to that constraint.

And then every agent J is going to do some simple update on its variable. So what agent J does is as follows. It computes the value of AJ transpose Y. Remember, the dual constraints were AJ transpose Y greater than or equal to 1. So the Jth agent is essentially computing the value of the Jth dual constraint. And it's going to compare that value to 1.

>>: [inaudible] the row or the column?

>> Rohit Khandekar: So the primal agents are indexed by J. And so we have a primal variable XJ, and there is an agent J --

>>: So you have agents for each --

>> Rohit Khandekar: Each column --

>>: [inaudible]

>> Rohit Khandekar: No. We have agents just with primal variables. There are no agents with duals. We only get some feedback from the system corresponding to the constraints.

>>: [inaudible] does the agent -- the ith agent know the ith row or the [inaudible] column of the matrix?

>> Rohit Khandekar: So the ith agent knows the ith column of matrix A.

>>: Don't -- aren't you using both the column and the rows there?

>> Rohit Khandekar: No. So -- okay. So AJ transpose Y really indicates the Jth entry in A transpose Y. So this is -- sorry. AJ transpose is really the Jth column of A.

>>: It's like a dual constraint corresponding to --

>> Rohit Khandekar: Right. So that's the dual constraint corresponding to the Jth primal variable. So if you -- let's go back to the --

>>: AIX is also -- AIX is not the ith row times X?

>> Rohit Khandekar: AIX denotes -- yes. So there is -- yeah. So AIX denotes the ith -- AI denotes the ith row --

>>: [inaudible] AIX [inaudible] know something about [inaudible].

>> Rohit Khandekar: Right. Right. So he knows the values of AIX over BI for the constraints that he's present in. And he knows his Jth column. All right.

So what does agent J do? He computes the value of the Jth dual constraint; namely, AJ transpose Y. So this is essentially the dot product of the Jth column of A with Y. And he's going to compare that with 1. If this value is much smaller than 1, then he's going to increase his variable by a small multiplicative factor. On the other hand, if this value is much larger than 1, then he's going to decrease his value by a small multiplicative factor. And we need to have some additive change here, because if XJ is zero to begin with, then just a multiplicative change will not be enough to change XJ. So just to bootstrap that multiplicative change, we have this small additive delta.

So what's happening here? Let's try to understand. In the flow maximization problem, we have agents associated with paths. And the YIs are associated with the edges. So AIX denotes the total flow on edge I. And 1 corresponds to the capacity of edge I. So if the flow is very close to capacity, then the dual variable associated with the edge is going to be very large. So think of YI as the length of edge I. All right. So if the flow is very close to the capacity, then the length of edge I is going to be large. Right? Now, what does the agent corresponding to path J do? He computes the length of path J under this length metric, and if this length is small, then he thinks that there is enough space for pushing more flow. So in that case he increases his flow by a multiplicative factor. On the other hand, if the length of his own path is much larger than the target length of 1, then there is perhaps too much congestion on that path, so he reduces his own flow.
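One round of the update described above might be sketched in Python as below. Several details are assumptions, not taken from the slide: the dual weights are taken to be y_i = exp(mu * (A_i x - 1)), which matches the spoken description (y_i equals 1 when the constraint is tight and is near zero when it is slack); the additive delta is applied as a floor via `max`; and the names `dual_weights`, `one_round`, `alpha`, `beta`, `delta` and all numeric values are illustrative only.

```python
import math


def dual_weights(A, x, mu):
    """y_i = exp(mu * (A_i . x - 1)): close to 0 when slack, 1 when tight."""
    return [math.exp(mu * (sum(a * xv for a, xv in zip(row, x)) - 1.0))
            for row in A]


def one_round(A, x, mu, alpha, beta, delta):
    """All agents update simultaneously from the same current x (no history)."""
    y = dual_weights(A, x, mu)
    new_x = []
    for j, xj in enumerate(x):
        # Value of the j-th dual constraint: (column j of A) . y
        price = sum(A[i][j] * y[i] for i in range(len(A)))
        if price <= 1.0 - alpha:
            # Room to grow: multiplicative increase; delta lets x_j leave 0.
            new_x.append(max(xj * (1.0 + beta), delta))
        elif price >= 1.0 + alpha:
            # Too congested: multiplicative decrease.
            new_x.append(xj * (1.0 - beta))
        else:
            new_x.append(xj)
    return new_x


# One constraint x0 + x1 <= 1, currently half full: both agents increase.
print(one_round([[1.0, 1.0]], [0.0, 0.5],
                mu=10.0, alpha=0.1, beta=0.05, delta=0.01))
```

In the flow interpretation, `price` is the length of path j under the edge lengths y, and the agent only needs the y values of the edges on its own path.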
So it's easy to verify that an agent J just needs to know the values of YI for the constraints that he's present in, the constraints for which AIJ is nonzero. Because he just needs to know the value of this product, AJ transpose Y.

>>: Do you maintain a feasible flow during the process [inaudible]?

>> Rohit Khandekar: Yes. We will prove in a minute that the flow always remains feasible. So under these updates, for appropriate values of these constants mu, [inaudible], delta, the primal solution X always remains feasible. You always satisfy A X less than or equal to 1. So that's the entire algorithm. So start from any feasible point and just make these updates. And we will show some nice properties of this algorithm. So any questions regarding the algorithm?

Okay. So what do we show? What is the main result of our paper? We show that, starting from any feasible solution X, the algorithm always maintains primal feasibility, A X less than or equal to 1, and converges to a 1 plus epsilon approximation in a number of rounds that is polylog in the size of the LP and polynomial in 1 over epsilon. Yes.

>>: The math looks very much like a summarization of Sutherland's paper [inaudible] market in computer time. Is it?

>> Rohit Khandekar: I'm not aware of that [inaudible].

>>: It's a classic. He divided up time on a PDP-1 computer, and guess how long ago [inaudible].

>> Rohit Khandekar: Um-hmm.

>>: By playing with prices per hour.

>> Rohit Khandekar: I see.

>>: Look it up on the Web.

>> Rohit Khandekar: Sure. Thank you. All right. So starting from any feasible solution --

>>: Didn't prove anything about that. He didn't prove anything, but in one university where I put this article on the boss's desk, it was implemented on Monday and by Wednesday it was as stable as a rock. Of course, that's anecdotal evidence. It didn't prove anything. Sorry.

>> Rohit Khandekar: So we show that this algorithm converges to a near optimum solution in polylog number of [inaudible]. And as you can see, this algorithm just depends on this local feedback that every agent needs to know from the system, and that's why it can be implemented in a distributed manner, if such feedback is available. And, moreover, this is self-stabilizing, because we start from any feasible solution. So as soon as the instance changes, there's a very easy rule to make the current solution feasible for the new instance in a single step. And from then on you can just run this algorithm and converge to a near optimum solution quickly.

>>: So if you're nonfeasible, you cannot -- you have to do a [inaudible]?

>> Rohit Khandekar: Yes. It's actually very easy to do a preprocessing step. So let's say if any variable is present in an infeasible constraint, then he can just set its value to zero in one step.

>>: Or zero is feasible.

>> Rohit Khandekar: Right.

>>: No, but then [inaudible].
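The one-step repair mentioned in this exchange could look like the following Python sketch, assuming the normalized form A x <= 1 with nonnegative entries. The function name `repair` and the tolerance argument are hypothetical; only variables that touch a violated constraint are reset, so the rest of the system keeps its current values.

```python
def repair(A, x, tol=1e-12):
    """Set x_j to 0 for every variable appearing in a violated constraint."""
    violated = [i for i, row in enumerate(A)
                if sum(a * xv for a, xv in zip(row, x)) > 1.0 + tol]
    return [0.0 if any(A[i][j] != 0 for i in violated) else xj
            for j, xj in enumerate(x)]


# The first constraint is overloaded (0.8 + 0.7 > 1); the second is fine.
A = [[1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
print(repair(A, [0.8, 0.7, 0.4]))
```

Because all entries of A are nonnegative, zeroing these variables empties every violated constraint and can only lower the load on the others, so the result is feasible.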

>> Rohit Khandekar: Right. You don't want to start from [inaudible] zero. So you should set your value to zero only if you're part of an infeasible constraint. So that's why -- I mean, the whole point here is that if the instance changes in some part of the system, you don't want everyone to start from zero; you want the system to kind of self-adjust, and the effects should travel to the rest of the system slowly.

>>: [inaudible] how you get delta?

>> Rohit Khandekar: Delta is set to -- delta is sufficiently small that increasing variables by [inaudible] delta should not affect the system too much. And to set these parameters, the agents need to know the values of M, N, and A max. That's the only place where they use the information about the size of the network.

So I like to think of this algorithm as a [inaudible] interior point method, because starting from any point inside the feasible region, the algorithm takes an interior path and converges to a near optimum solution in polylog number of [inaudible].

Okay. So let me say a few words about previous work on [inaudible] algorithms for solving linear programs. So there has been a lot of work on designing [inaudible] algorithms for packing and covering linear programs. And most of those algorithms are not stateless. So, for instance, in Plotkin, Shmoys, Tardos [inaudible], the algorithm starts all the variables from zero and then increments some -- it maintains some global information and increments some selected set of variables. The only work that I know of which is truly stateless is the work of Garg and Young, FOCS 2002, where they designed an algorithm for a multi-commodity flow problem with fixed paths, so essentially the flow control problem that I mentioned. And they thought of flow as a stream of packets injected at the source. And the only feedback that they could receive was the total packet loss rate end to end.
So the packets traveled through the network, and if some edge was more congested than it should be, then a random fraction of the packets were dropped. And the agents could read off the value of the total end-to-end drop rate. And just based on that information, they updated the packet injection rate. So they showed convergence to 1 plus epsilon approximate solutions in time proportional to C max, where C max is the maximum objective coefficient. And it was not clear how to generalize this algorithm to arbitrary positive linear programs. So we can think of our algorithm as somewhat extending this work to arbitrary positive linear programs, and also improving the convergence rate.

>>: Feedback [inaudible].

>> Rohit Khandekar: Feedback --

>>: [inaudible]

>> Rohit Khandekar: Feedback is somewhat related, because there -- so the end-to-end drop rate is actually related to the total length of your path under the associated exponential length function. So there is some connection, although it's not an explicit --

>>: As you stated, you need to get feedback from each edge [inaudible].

>> Rohit Khandekar: As I -- no. Actually a path just needs to know the total length of the path. We don't need separate lengths from the edges. Because we compare the total length of the path with 1, and if the path length is small, we increase the flow. If the path length is larger than 1, we decrease the flow. So we just need to know the aggregate path length.

All right. So on this slide I've briefly mentioned the key differences from previous work. So most of the previous algorithms, as I mentioned, start from zero. And they only increase the variables. And they maintain some global threshold information and increase the variables that are better than the threshold. On the other hand, we start from arbitrary feasible solutions. And since we start from arbitrary solutions, we need to have the ability to both increase and decrease the variables if we want to achieve near optimality. And we are stateless. We don't maintain any state information and we don't need any initialization. But they have better convergence dependence on epsilon. I think it's 1 over epsilon squared, while our convergence is only -- we could show only a 1 over epsilon to the 5 convergence.

Okay. So let me go on and give you some intuition about the analysis. So let me quickly argue that the algorithm always maintains primal feasibility throughout the execution. So to prove this, we show that all the YIs are at most 1. And if all the YIs are at most 1, then it follows immediately that AIX is less than or equal to 1. Now suppose not. Suppose some YI increases to more than 1. Now, for that to happen, some XJ with AIJ greater than zero must increase to make YI larger than 1.
But for XJ to increase, the value of AJ transpose Y must be less than 1 minus alpha, from the definition of the algorithm. Only then can XJ increase. But all the nonzero entries in A are at least 1. And in particular AIJ is at least 1. Therefore, YI must be less than 1 minus alpha before this increase. Right? So YI is sufficiently smaller than 1. And now comes the trick. So we set the parameters [inaudible] namely, the step length parameter, the multiplicative increase by which we update the variables [inaudible] small enough so that any YI changes by a very small multiplicative factor. More precisely, we prove that YI changes by at most a factor of alpha over 4 in any single round. So if YI is less than 1 minus alpha to begin with --

>>: [inaudible]

>> Rohit Khandekar: No. Multiplicative. 1 plus alpha over 4. Right. It changes by at most a multiplicative factor of 1 plus alpha over 4. So if YI is less than 1 minus alpha, then in a single round it cannot become more than 1. So, in a sense, the point here is that we have to set the step length parameter small enough if we want to maintain feasibility. But you should not set it too small, because you want fast convergence. So that will come next.

Now, another quick point that we should verify before proving that the algorithm converges to near optimality is that the fixed points of this algorithm are [inaudible] near optimum. Because the fixed points are fixed points and the [inaudible] near optimum if your theorem is true. So let's quickly show that. So let's take a pair of solutions X and Y. X is the solution and Y is the corresponding setting of dual variables. And assume that X comma Y has entered a fixed point of our algorithm. We'll show that X and Y both correspond to near optimum solutions to the primal and dual respectively.

So first let's show approximate feasibility. Since X is a fixed point, no XJ is changing; therefore the value of AJ transpose Y for every J is between 1 minus alpha and 1 plus alpha. So all the dual constraints are not only approximately satisfied, they are very tight. All the dual constraints are tightly satisfied. And in fact on the previous slide we already showed that X is always feasible. So primal feasibility is already established. So we have that both X and Y are approximately feasible.

Now we argue that both the complementary slackness conditions are satisfied in an approximate sense. More precisely, if XJ is large, then the Jth dual constraint is approximately tight. Well, we just argued that all the dual constraints are approximately tight, so the primal complementary slackness conditions are trivially satisfied. On the other hand, if YI is large, then from the definition of YI, the ith primal constraint is approximately tight. Because, recall, YI was a very fast growing function of how tightly the corresponding constraint was satisfied.

>>: Is this true for all solutions [inaudible]?

>> Rohit Khandekar: Yes. Yes. This point is actually true for all X and Y. And if you put these two things together, we can argue that X and Y actually form near optimum solutions.
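The fixed-point condition above can be checked mechanically. The Python sketch below tests whether every dual constraint value A_j^T y lies strictly between 1 - alpha and 1 + alpha, and also computes a simple certificate: if every A_j^T y is at least s > 0, then y / s is dual feasible, so by weak duality the optimum is at most (1 . y) / s. The exponential form of y and the names `dual_prices`, `is_fixed_point` and `dual_upper_bound` are assumptions made for this illustration, and the bound may be loose on general instances.

```python
import math


def dual_prices(A, x, mu):
    """Return (y, prices) with y_i = exp(mu*(A_i.x - 1)), prices_j = A_j^T y."""
    y = [math.exp(mu * (sum(a * xv for a, xv in zip(row, x)) - 1.0))
         for row in A]
    prices = [sum(A[i][j] * y[i] for i in range(len(A)))
              for j in range(len(x))]
    return y, prices


def is_fixed_point(A, x, mu, alpha):
    """No agent moves when every price lies strictly inside (1-alpha, 1+alpha)."""
    _, prices = dual_prices(A, x, mu)
    return all(1.0 - alpha < p < 1.0 + alpha for p in prices)


def dual_upper_bound(A, x, mu):
    """Weak-duality bound: y / min_j(price_j) is dual feasible."""
    y, prices = dual_prices(A, x, mu)
    return sum(y) / min(prices)


# Two-variable example: maximize x1 + x2 subject to x1 <= 1, x2 <= 1.
A = [[1.0, 0.0],
     [0.0, 1.0]]
x = [0.995, 0.995]
print(is_fixed_point(A, x, mu=20.0, alpha=0.1))
print(sum(x), dual_upper_bound(A, x, mu=20.0))
```

On this instance the primal value and the upper bound bracket the true optimum of 2.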
So the fixed points are actually near optimum solutions to both primal and dual. But since our goal is not just to analyze fixed points but to show fast convergence, we need some measure, some potential, that we can track and prove that we are making significant progress toward optimality in every step. So, to this end, we --

>>: The only place where this fixed point is the -- is feasibility of the dual or [inaudible] where is the property that the fixed point [inaudible] as being the solution?

>> Rohit Khandekar: I guess we only use it here. That's right. Approximate dual feasibility. That's right.

So to show fast convergence, we have to have some [inaudible] property of the algorithm. Because the variables are going up and down, so it's very hard to track what's happening in the system. So that's why we looked at this particular potential function, which is essentially the primal objective function, summation XJ, minus a penalty for violating primal constraints. So think of YI as the penalty for violating the ith constraint. So you get profit by routing more flow, but you get penalized for violating capacity constraints.

And, in fact, to give some intuition about this potential, I tried to plot -- so let's consider a very simple linear program, which has two variables, X1 and X2, and the linear program is maximize X1 plus X2, subject to the constraints that X1 is less than or equal to 1 and X2 is less than or equal to 1. So what's happening here is that I have plotted -- so this is X1, this is X2, and along the Z axis I have plotted the potential function. So the potential function increases with X1 and X2. But as we come close to the boundary of the [inaudible] the penalty shoots up and the potential function dips to minus infinity. So as you get closer and closer to the boundary, the values of YI increase so rapidly that it starts penalizing. So the maximum point of the potential function, which is here, is actually very close to the optimum solution, which is at this vertex. Because as long as you are far from the boundary, you are doing the same thing as the objective function does. But when you go close to the boundary, your potential function dips. So what we show in our analysis is that our algorithm actually is working in hindsight with this potential function, and this potential function only [inaudible] increases during the algorithm.

So analytically this potential function has some nice properties. And this is what the next equation shows. So first of all note that this potential function is a differentiable function of the variables XJ. It's not only continuous, but since the YI are exponential functions of AIX minus 1, it's actually very smooth and you can differentiate. So if you look at the partial derivative of the potential function with respect to the Jth variable, and if you plug in this expression and simplify, then it turns out to be equal to 1 minus AJ transpose Y.
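Written out, the computation just described looks as follows. The exact form of the penalty is inferred rather than quoted: with y_i = exp(mu (A_i x - 1)), the 1/mu factor in front of the penalty is what makes the derivative come out as 1 - A_j^T y, which is the expression stated in the talk.

```latex
\begin{align*}
y_i(x) &= \exp\!\bigl(\mu\,(A_i x - 1)\bigr), \\
\Phi(x) &= \sum_{j} x_j \;-\; \frac{1}{\mu}\sum_{i} y_i(x), \\
\frac{\partial \Phi}{\partial x_j}
  &= 1 \;-\; \frac{1}{\mu}\sum_{i} \mu\,A_{ij}\,y_i(x)
   \;=\; 1 - \sum_{i} A_{ij}\,y_i(x)
   \;=\; 1 - A_j^{\top}y .
\end{align*}
```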
So this captures how tightly the Jth dual constraint is satisfied. And if you recall, the decision of whether to increase XJ or decrease XJ was actually based on this quantity. So if you approximate the change in the potential by the first-order terms -- namely, the changes in the XJ values times the derivatives -- then you can show that this [inaudible] is in fact always nonnegative. Because you increase XJ only if this is positive. And you decrease XJ only if this is negative. So this is the basic reason why the potential only increases. And, moreover, we can say something stronger. Recall that we increase XJ only if this is significantly larger than zero. This is at least alpha. And we decrease XJ only if this is at most minus alpha. So overall the increase in the potential is at least alpha times the absolute change in the X values. So if the X values are changing rapidly, no matter whether they're going up or down, you're making progress with respect to the potential. And that's why you can think of this algorithm as doing distributed gradient ascent on this potential.

Okay. So how do we show fast convergence? So we just noticed that a significant change in X values leads to a significant increase in the potential. So, similarly, you can show that a significant change in Y values leads to a significant increase in the potential. Because, intuitively speaking, Y values are [inaudible] by X values. So if Y values are changing rapidly, then X values are also changing. And that's why the potential function increases. But the potential function is bounded. It cannot increase without bound. So it cannot happen that X and Y values are changing a lot for a long time. And then we show this important [inaudible] that says that if X and Y values do not change significantly in an interval of appropriate length, then we have near optimality.

>>: [inaudible]

>> Rohit Khandekar: Epsilon is the error parameter that -- so we have [inaudible] 1 plus epsilon approximation.

So more precisely, if you take any interval of a logarithmic number of rounds, and if you notice that throughout the interval the X and Y values did not change by too much, then we show that the solution X throughout the interval is actually a near optimum solution to the primal. So let's call this a stationary interval: an interval of logarithmic length where X and Y values do not change. And on this slide I've tried to show you an intuition of the proof of why a stationary interval implies near optimality. So consider a stationary interval and consider any X in that interval. If X is not near optimal, then we show that there exists a variable XJ such that AJ transpose Y is consistently less than 1 minus alpha throughout the interval. And if such a thing holds, then from the definition of the algorithm, XJ will be increased multiplicatively. But since the length of the interval is logarithmic, XJ will become more than 1 soon, which would contradict the feasibility that we have already shown.
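The monotone-potential claim can be tried out numerically on the two-variable example from the talk (maximize x1 + x2 subject to x1 <= 1 and x2 <= 1). The Python sketch below is an illustration under assumptions, not the paper's algorithm with its tuned parameters: the potential is taken as sum(x) - (1/mu) * sum(y) with y_i = exp(mu * (A_i x - 1)), delta is applied as a floor, and mu, alpha, beta, delta are hand-picked so that mu * beta is small (each y_i then changes by less than a factor of 1 + alpha/4 per round). The round count here therefore says nothing about the polylogarithmic bound.

```python
import math


def loads(A, x):
    return [sum(a * xv for a, xv in zip(row, x)) for row in A]


def potential(A, x, mu):
    """Objective minus exponential penalty (assumed form, see lead-in)."""
    return sum(x) - sum(math.exp(mu * (l - 1.0)) for l in loads(A, x)) / mu


def one_round(A, x, mu, alpha, beta, delta):
    y = [math.exp(mu * (l - 1.0)) for l in loads(A, x)]
    new_x = []
    for j, xj in enumerate(x):
        price = sum(A[i][j] * y[i] for i in range(len(A)))
        if price <= 1.0 - alpha:
            new_x.append(max(xj * (1.0 + beta), delta))
        elif price >= 1.0 + alpha:
            new_x.append(xj * (1.0 - beta))
        else:
            new_x.append(xj)
    return new_x


def simulate(A, x, rounds, mu=20.0, alpha=0.1, beta=0.001, delta=0.001):
    """Run the stateless update and record the potential after every round."""
    history = [potential(A, x, mu)]
    for _ in range(rounds):
        x = one_round(A, x, mu, alpha, beta, delta)
        history.append(potential(A, x, mu))
    return x, history


A = [[1.0, 0.0],
     [0.0, 1.0]]
x_final, history = simulate(A, [0.1, 0.9], rounds=3000)
print(x_final)
print(all(later >= earlier for earlier, later in zip(history, history[1:])))
```

Starting from the feasible point (0.1, 0.9), both variables climb toward the vertex (1, 1) without crossing it, and the recorded potential never decreases.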
So the key point here is that assuming that we are not near optimal and assuming that the X and Y values are not changing rapidly, [inaudible] exists an XJ for which this property holds. And the proof of that factor is also easy. It's essentially four lines. But the intuitive reason is that if we're very far away from optimality, then there must be some variable XJ that optimum solution is exploiting that you are missing on. And that is the variable that -- that is the variable that you should focus on. So that's the intuition. But, more precisely, let me just walk you through this chain of equations. So let's say X star is the optimum solution. From our assumption that X is not near optimum, we know that 1 dot X star is much larger than 1 dot X. This is the objective values. Now, since X values are stationary, all the AJ transpose Ys are approximately equal to 1. So this is roughly equal to this. But since Y is a fast growing function, most of the mass of Y is concentrated on those constraints for which AIX is roughly equal to 1. So that's why this is approximately equal to this. And since X star is at most 1, X star is the optimum solution, so it satisfies the constraints, this is at least Y transpose X star. So in a nutshell what we get is that 1 dot X star is much larger than Y transpose X star.

12

Now, think of this quantity as an average of values AJ transpose Y under the weight X star J. So this divided by this is the average value of AJ transpose Y, which is much larger -- much less than 1. So in particular there will be a variable XJ for which AJ transpose Y is much less than 1. So of course this is just an outline. The details are given in the paper. So let me summarize. So we just solve very simple stateless and gradient descent based algorithm for positive -- for packing linear programs. And a very similar algorithm also works for covering linear programs, where the object is to minimize C dot X, where X is the X squared and equal to B. Mix packing and covering, the positive linear program where we have both packing and covering constraints. And some convex programs where the objective is -- let's say we want to maximize some concave function of Xs subject to some packing constraints. Like in flow control, instead of maximizing the flow control, if you want to maximize a fair objective function, like log on the flows, then you can use similar ideas. The main open questions that I would like to know answers to is what happens if the linear programs are not given explicitly. Let's say in multi-commodity flows if the paths are not fixed, then it's an implicitly given LP because there is one variable for every path. So can you implement this algorithm in a compact way so that the running time remains polynomial. And of course look for other applications of this simple update rule, let's say from market equilibrium questions, can we compute market equilibria using such algorithms. I'll stop here. Thank you for your attention. [applause] >>: So you talked about the [inaudible] convex program [inaudible] the objective functions enough for additive, objective function is multiplicative. >> Rohit Khandekar: So what we use crucially in this analysis is the positiveness of the packing linear programs. 
So, for instance, I don't know how to generalize this to arbitrary linear programs even. >>: No, but the constraint [inaudible] packing constraints. >> Rohit Khandekar: Are packing constraints. >>: Yeah. The only thing different is the objective function, instead of saying objective function it is summation AIXI, it is summation AI log XI or you can interpret it as a multiplication XI [inaudible]. >> Rohit Khandekar: Right. So if the objective function is -- so we are trying to maximize a concave objective function. >>: [inaudible]

>> Rohit Khandekar: Subject to packing constraints. Yes. So I think for that it should work too. Although, we have to be careful regarding what kind of approximate solutions we are computing, because we may be computing a weak approximate [inaudible]. >>: Let's say you're approximating the convex program. >> Rohit Khandekar: Right. I think some algorithm like this should work. Because all that we're really using is convexity of some potential function. And basically doing gradient descent on that. >>: [inaudible] taking a constrained optimization and converting to [inaudible]. >> Rohit Khandekar: Right. Right. Adding a penalty function for violating constraints. >>: But the other part is somehow you're -- by doing this you're also getting this [inaudible]. >> Rohit Khandekar: Right. So I really think that that is good for extending this work to the more general mathematical programs. >>: Could you go back one slide. >> Rohit Khandekar: This? >>: No, no. Yeah. The open [inaudible]. So what does it mean [inaudible]? >> Rohit Khandekar: So here -- so our algorithm maintains a separate agent for every variable. And we are -- and an agent is updating his own variable value. So here you would like to associate an agent with a commodity, a source-sink pair, and not with every path. So if you -- if you try to -- if you kind of try to implicitly update all the flow variables by not doing this exponential amount of work, but just try to do [inaudible] amount of work, then at least the initial calculations run into some problems. So another question is can we modify the update rule to make it more smooth and then give an implicit way of updating all the flows along the exponential paths implicitly. >>: [inaudible] your update rule different on the length of the [inaudible] so you can [inaudible] -- >>: Or could the exponential number of paths -- >> Rohit Khandekar: But that exponential number of paths, so -- and you don't want to maintain flow values for every path. 
So we have been able to show some results in this direction. More precisely, if the path lengths are bounded -- if let's say I allow you to send flows along paths of length at most H, then we can do -- we can get convergence which is polynomial in H. So instead of polylog convergence, we can show that it converges in like H squared number of

[inaudible]. So there you don't have to maintain -- you just maintain the flow, not the flow path decomposition. And then you update the flow along short paths in your flow. But then -- but I don't know how to show this for general -- and to get -- more importantly to get polylogarithmic in N convergence. >>: [inaudible] since you allow X to start from [inaudible] then we should be able to move X quickly to the new optimal. Did you look at that? >> Rohit Khandekar: That's a good question. Yeah. I've been asked that question before. I don't know how to show fast -- I mean, convergence which is better than this for starting from a near optimum solution. All right. That's a nice -- interesting question. I don't know how to show that here. >> Kamal Jain: Let's thank the speaker. >> Rohit Khandekar: Thank you. [applause]
