PDE approach to the problem of online prediction with expert advice: a construction of potential-based strategies

We consider a sequence of repeated prediction games and formally pass to the limit. The supersolutions of the resulting non-linear parabolic partial differential equation are closely related to the potential functions in the sense of N.\,Cesa-Bianchi and G.\,Lugosi (2003). Any such supersolution gives an upper bound for the forecaster's regret and suggests a potential-based prediction strategy satisfying the Blackwell condition. The conventional upper bound for the worst-case regret is justified by a simple verification argument.


Introduction
Let $B$ be any set. In the problem of online prediction with expert advice a forecaster predicts a sequence $(b_t)_{t=0}^{n-1}$, $b_t \in B$, on the basis of expert opinions $f_t^i \in A$, $i = 1, \dots, N$, where $A$ is a convex subset of a vector space. More precisely, at round $t \in \{0, \dots, n-1\}$ the forecaster's guess is a convex combination of the expert advice, based on the available history and the current advice: $p_t = p_t((b_s)_{s=0}^{t-1}, (f_s)_{s=0}^{t})$. Let $l \colon A \times B \to [0,1]$ be a loss function. The forecaster's aim is to keep the regret small. The regret measures the quality of predictions by comparing the cumulative loss of the forecaster with that of a best expert, chosen in hindsight. We refer to [4] for more information on this problem. The basic result (see, e.g., [4, Theorem 2.2]) guarantees the existence of a prediction strategy $p^*$ achieving the uniform bound
$$R_n \le \sqrt{(n/2) \ln N} \tag{1}$$
for any $(b_t)_{t=0}^{n-1}$, $(f_t)_{t=0}^{n-1}$ under the assumption that $l$ is convex in its first argument. Moreover, this bound cannot be improved without further assumptions: [4, Theorem 3.7]. The inequality (1) implies that in the long run on average the forecaster predicts as well as a best expert: $R_n/n \to 0$, $n \to \infty$.
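To fix ideas, the round structure just described can be simulated directly. The following sketch uses the squared loss on $A = [0,1]$, $B = \{0,1\}$ (an illustrative choice: it is convex in the first argument and bounded by 1); all function names are ours, not the paper's.

```python
import numpy as np

def loss(a, b):
    """Squared loss l(a, b) = (a - b)^2: convex in a, with values
    in [0, 1] for a in [0, 1] and b in {0, 1} (illustrative choice)."""
    return (a - b) ** 2

def play_game(strategy, experts, adversary, n):
    """Run n rounds of prediction with expert advice and return the
    regret R_n: the forecaster's cumulative loss minus the cumulative
    loss of the best expert chosen in hindsight."""
    N = len(experts(0))
    forecaster_loss = 0.0
    expert_losses = np.zeros(N)
    for t in range(n):
        F = np.asarray(experts(t), dtype=float)  # advice f_t^i, revealed first
        p = strategy(F)                          # forecaster's convex combination
        b = adversary(t)                         # outcome b_t, revealed last
        forecaster_loss += loss(p, b)
        expert_losses += loss(F, b)              # vectorized over the N experts
    return forecaster_loss - expert_losses.min()
```

Note that the regret may well be negative on a particular sequence; the bounds discussed in this paper control only its worst case.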
There are plenty of strategies achieving the bound (1). In [3] it was shown that for a rather general class of online learning problems the construction of such strategies can be based on the notion of a potential function. More recently, [8] proposed a systematic way to construct potentials in the case of randomized prediction, mentioning that "The origin/recipe for "good" potential functions has always been a mystery (at least to the authors)." The authors of [8] considered a recurrence relation for the value function of a repeated game determining the optimal regret, and showed that potential functions are related to relaxations of this value function that are consistent with the mentioned recurrence relation. To obtain such relaxations they used upper bounds developed in the theory of online learning, which capture the complexity of the problem.
In this paper we show that for the problem of prediction with expert advice there is another "natural" way to arrive at potential-based algorithms. As in [8], we consider a repeated game determining the optimal regret and the corresponding recurrence relation for the value functions $v^n$. Further, in contrast to [8], we simply pass to the limit as $n \to \infty$ and get a non-linear parabolic Bellman–Isaacs type partial differential equation in $[0,1] \times \mathbb{R}^N$. A rigorous justification of this procedure can be performed within the theory of viscosity solutions. However, being interested only in the construction of prediction strategies, we need not do it! As usual, a Bellman-type equation at least formally produces optimal strategies. More precisely, we consider the strategies generated by appropriate smooth supersolutions, and then directly check the inequality (1), using an argument similar to the verification method from the theory of optimal control.
The described approach is mainly inspired by the paper [6], which studied a link between fully non-linear second-order (parabolic and elliptic) PDEs and repeated games. Its application to the problems of online learning theory was initiated in [10], where the asymptotics of the sequential Rademacher complexity (a notion introduced in [7]) of a finite function class was related to the viscosity solution of a G-heat equation. In turn, the result of [10] is based on the central limit theorem under model uncertainty, studied within the same approach in [9].

Prediction Game and the Limiting PDE
The worst-case regret is the result of a repeated game between the predictor, an adversary and the experts. In this game the adversary has an informational advantage over the predictor and the experts, since $b_t$ is chosen after the sequences $(p_j)_{j=0}^{t}$, $(f_j)_{j=0}^{t}$ are revealed. Furthermore, the predictor has an informational advantage over the experts, since the choice of $p_t$ can be based on $(f_j)_{j=0}^{t}$, $(b_j)_{j=0}^{t-1}$. Finally, the experts can use only the information contained in $(p_j)_{j=0}^{t-1}$, $(b_j)_{j=0}^{t-1}$. The adversary and the experts play against the predictor, trying to maximize his regret.
To get a recurrence formula for $R_n$ let us introduce the family of state processes
$$X^{t,r,i}_{j+1} = X^{t,r,i}_j + \frac{1}{\sqrt n}\bigl(l(p_j, b_j) - l(f^i_j, b_j)\bigr), \qquad X^{t,r,i}_t = r_i, \quad t \le j \le n-1. \tag{2}$$
Summing up the increments, we see that the (normalized) worst-case regret is determined by the terminal value $\max_{1 \le i \le N} X^{0,0,i}_n$. From the dynamic programming theory it is known that the value functions $v^n$ satisfy the recurrence relation
$$v^n\Bigl(\frac{t}{n}, r\Bigr) = \sup_{f \in A^N}\, \inf_{p \in A}\, \sup_{b \in B}\, v^n\Bigl(\frac{t+1}{n},\, \Bigl(r_i + \frac{1}{\sqrt n}\bigl(l(p,b) - l(f^i,b)\bigr)\Bigr)_{i=1}^N\Bigr), \qquad v^n(1, r) = \max_{1 \le i \le N} r_i, \tag{3}$$
$t \le n - 1$, $r = (r_1, \dots, r_N)$. We stress that we need not rigorously justify this and subsequent claims, since our goal is to formally construct prediction strategies. Their verification is delayed to the last step.
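The recurrence for the value functions can be checked on a toy example by brute force. The sketch below evaluates a one-round game ($n = 1$, $N = 2$, squared loss, coarse action grids, no scaling of the increments) purely to illustrate the order of the moves: the experts maximize first, then the forecaster minimizes, then the adversary maximizes. All parameters are illustrative choices of ours.

```python
import numpy as np
from itertools import product

P_GRID = np.linspace(0.0, 1.0, 11)  # forecaster's actions, a grid on A = [0, 1]
F_GRID = np.linspace(0.0, 1.0, 5)   # each expert's possible advice
B_SET = (0, 1)                      # adversary's outcomes, B = {0, 1}
N = 2                               # number of experts

def loss(a, b):
    return (a - b) ** 2             # squared loss, convex in a

def value(t, n, x):
    """Worst-case regret-to-go at round t, given current regrets
    x = (x_1, ..., x_N): sup over advice, inf over predictions, sup
    over outcomes, with terminal value max_i x_i."""
    if t == n:
        return max(x)
    best_f = -np.inf
    for f in product(F_GRID, repeat=N):      # experts move first
        best_p = np.inf
        for p in P_GRID:                     # then the forecaster
            worst_b = -np.inf
            for b in B_SET:                  # the adversary moves last
                nxt = tuple(x[i] + loss(p, b) - loss(f[i], b) for i in range(N))
                worst_b = max(worst_b, value(t + 1, n, nxt))
            best_p = min(best_p, worst_b)
        best_f = max(best_f, best_p)
    return best_f
```

For a single round started from zero regrets the grid value is 0.25: the experts pick the opposite advice 0 and 1, the forecaster hedges at 1/2, and either outcome costs it 1/4 more than the better expert.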
For a moment imagine that $v^n$ is a smooth function satisfying (3) on $[0, 1 - 1/n] \times \mathbb{R}^N$. Then, by Taylor's formula, we get from (3)
$$0 = \sup_{f \in A^N}\, \inf_{p \in A}\, \sup_{b \in B} \Bigl[ \frac{1}{n} v^n_t + \frac{1}{\sqrt n} \langle v^n_x, \Delta \rangle + \frac{1}{2n} \langle v^n_{xx} \Delta, \Delta \rangle + o(1/n) \Bigr], \tag{4}$$
where $\Delta = \Delta(p, f, b) = (l(p,b) - l(f^i,b))_{i=1}^N$, and $v^n_x$, $v^n_{xx}$ are the gradient vector and the Hessian matrix. We will say that the loss function $l$ satisfies the Blackwell condition if the set
$$\Gamma(\gamma, f) = \{\, p \in A : \langle \gamma,\, l(p,b)\mathbf{1} - l(f,b) \rangle \le 0 \ \text{for all } b \in B \,\} \tag{5}$$
is non-empty for all $(\gamma, f) \in \mathbb{R}_+^N \times A^N$, where $l(f,b) = (l(f^1,b), \dots, l(f^N,b))$ and $\mathbf{1} = (1, \dots, 1)$. Clearly, $\Gamma(0, f) = A$. The Blackwell condition (5) is satisfied if $l$ is convex in its first argument: in this case $p = \langle \gamma, f \rangle / \langle \gamma, \mathbf{1} \rangle \in \Gamma(\gamma, f)$ by Jensen's inequality. By their nature the functions $v^n$ are non-decreasing in each $x_i$. Indeed, $v^n(t/n, x)$ is the optimal worst-case regret if the initial regret with respect to the $i$-th expert at time moment $t$ equals $x_i$. From (4) we get the inequality (6). So, we expect that the limiting function $v$ satisfies the inequality
$$v_t + G(v_x, v_{xx}) \ge 0, \qquad G(\gamma, M) = \frac{1}{2} \sup_{f \in A^N}\, \inf_{p \in \Gamma(\gamma, f)}\, \sup_{b \in B}\, \langle M \Delta(p,f,b), \Delta(p,f,b) \rangle, \tag{7}$$
and $v_t + G(v_x, v_{xx}) = 0$ is a fully non-linear parabolic equation (see [5]). Along with (7) we consider the boundary condition
$$v(1, x) = \max_{1 \le i \le N} x_i. \tag{8}$$
The functions $v^n$ are defined on $Q_n = \{0, 1/n, \dots, (n-1)/n, 1\} \times \mathbb{R}^N$. To describe their limiting behavior in a rigorous way, one can consider the Barles–Perthame half-relaxed (weak) upper limit:
$$v(t, x) = \sup\Bigl\{ \lim_{k \to \infty} v^{n_k}(t_k, x_k) : n_k \to \infty,\ (t_k, x_k) \in Q_{n_k},\ (t_k, x_k) \to (t, x),\ \text{and } v^{n_k}(t_k, x_k) \text{ converges} \Bigr\}.$$
From the results of [1,2,6] and the above calculations we expect that $v$ is a viscosity subsolution of (7), (8). Fortunately, we need not care about the correctness of this conclusion, which is not evident in the present context. Note that, by the definition,
$$\limsup_{n \to \infty} v^n(0, 0) \le v(0, 0). \tag{9}$$

Smooth Supersolutions and Induced Weighted Average Forecasting Strategies
Take a smooth supersolution $w$ of (7), (8):
$$w_t + G(w_x, w_{xx}) \le 0, \qquad w(1, x) \ge \max_{1 \le i \le N} x_i, \tag{10}$$
which is non-decreasing in each variable $x_i$. Assuming a comparison result, $v \le w$, we conclude that the inequality (9) holds true with $w(0,0)$ in place of $v(0,0)$. We also expect that a strategy $p_t(x, f) \in \Gamma(w_x(t, x), f)$ will produce a regret satisfying this bound.
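In the convex case the induced strategy admits an explicit weighted-average form. A minimal sketch (the function name and the degenerate-case convention are ours):

```python
import numpy as np

def induced_prediction(gamma, F):
    """Prediction p = <gamma, F> / <gamma, 1> induced by the gradient
    gamma = w_x(t, x) of a smooth supersolution w; gamma is entrywise
    non-negative because w is non-decreasing.  For a loss convex in
    its first argument this p satisfies the Blackwell condition by
    Jensen's inequality."""
    gamma = np.asarray(gamma, dtype=float)
    F = np.asarray(F, dtype=float)
    s = gamma.sum()
    if s == 0.0:
        # gamma = 0 imposes no restriction; return any convex combination
        return float(F.mean())
    return float(gamma @ F / s)
```

For instance, with weights $(1, 3)$ and advice $(0, 1)$ the induced prediction is $3/4$: the better-weighted expert dominates the average.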

Let us look for supersolutions of the form
$$w(t, x) = c(1 - t) + \Phi(x),$$
where $c \ge 0$, the function $\Phi$ satisfies
$$\Phi(x) \ge \max_{1 \le i \le N} x_i, \tag{11}$$
and $\Phi$ is non-decreasing in each variable. The differential inequality (10) implies the condition $G(\Phi_x(x), \Phi_{xx}(x)) \le c$. This condition is satisfied if
$$\frac{1}{2} \langle \Phi_{xx}(x) \delta, \delta \rangle \le c \quad \text{for all } \delta \in [-1, 1]^N, \tag{12}$$
since the components of the increments $\Delta(p, f, b)$ lie in $[-1, 1]$. By the Blackwell condition (5) there exists a vector-function $p(\gamma, f) \in \Gamma(\gamma, f)$. If $l$ is convex in its first argument and $\Phi$ is strictly increasing in each variable, then, according to the remark after the formula (5), one can take $p(\gamma, f) = \langle \gamma, f \rangle / \langle \gamma, \mathbf{1} \rangle$. Consider the state process
$$X^*_{t+1} = X^*_t + \frac{1}{\sqrt n}\, \Delta(p^*_t, f_t, b_t), \qquad X^*_0 = 0, \tag{15}$$
controlled by a strategy satisfying the Blackwell condition:
$$p^*_t \in \Gamma(\Phi_x(X^*_t), f_t), \tag{16}$$
for instance,
$$p^*_t = p(\Phi_x(X^*_t), f_t). \tag{17}$$

Theorem 1. Assume that $\Phi$ is smooth, non-decreasing in each variable and satisfies (11) and (12). Then for any $(b_t)_{t=0}^{n-1}$, $(f_t)_{t=0}^{n-1}$ the strategy (17) guarantees $\max_{1 \le i \le N} X^{*,i}_n \le c + \Phi(0)$.

Proof. For $w(t, x) = c(1 - t) + \Phi(x)$ by Taylor's formula we get
$$w\Bigl(\frac{t+1}{n}, X^*_{t+1}\Bigr) - w\Bigl(\frac{t}{n}, X^*_t\Bigr) = -\frac{c}{n} + \frac{1}{\sqrt n} \langle \Phi_x(X^*_t), \Delta_t \rangle + \frac{1}{2n} \langle \Phi_{xx}(\xi_t) \Delta_t, \Delta_t \rangle \le 0$$
for some $\xi_t$, where $\Delta_t = \Delta(p^*_t, f_t, b_t)$ and the last inequality is implied by (16) and (12). Now the assertion of the theorem follows from the condition (11):
$$\max_{1 \le i \le N} X^{*,i}_n \le \Phi(X^*_n) = w(1, X^*_n) \le w(0, X^*_0) = c + \Phi(0).$$

Following [3,4] we call $\Phi$ a potential function. The most natural smooth upper bound for $\max\{x_1, \dots, x_N\}$, and hence a candidate for a potential, is the soft-maximum function
$$\Phi_\eta(x) = \frac{1}{\eta} \ln \Bigl( \sum_{i=1}^N e^{\eta x_i} \Bigr), \qquad \eta > 0. \tag{18}$$
A more general potential of the form $\Phi(x) = \psi\bigl(\sum_{i=1}^N \varphi(x_i)\bigr)$, where $\psi$ and $\varphi$ are assumed to be concave and convex respectively, was considered in [3,4]. The following inequality is also taken from [3,4]:
$$\max_{1 \le i \le N} x_i \le \Phi_\eta(x) \le \max_{1 \le i \le N} x_i + \frac{\ln N}{\eta}.$$
For $p^*_t$ generated by the potential (18), the condition (12) holds with $c = \eta/2$, and in accordance with Theorem 1 we have
$$\max_{1 \le i \le N} X^{*,i}_n \le c + \Phi_\eta(0) = \frac{\eta}{2} + \frac{\ln N}{\eta} = \sqrt{2 \ln N}$$
for the "optimal" choice $\eta = \sqrt{2 \ln N}$ (cf. [4, Corollary 2.2]). The formula (17) reduces to
$$p^*_t = \frac{\sum_{i=1}^N e^{\eta X^{*,i}_t} f^i_t}{\sum_{j=1}^N e^{\eta X^{*,j}_t}},$$
where the weights depend on the advice and outcomes only through the cumulative losses $L^i_t = \sum_{s=0}^{t-1} l(f^i_s, b_s)$ of the experts. This is a basic version of the exponentially weighted average forecaster: see [4, Chapter 2].
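Numerically, the soft-maximum potential and its normalized gradient (the exponential weights) can be sketched as follows; the log-sum-exp shift is a standard stabilization, and the names are ours:

```python
import numpy as np

def softmax_potential(x, eta):
    """Phi_eta(x) = (1/eta) * ln(sum_i exp(eta * x_i)); a smooth upper
    bound for max(x): max(x) <= Phi_eta(x) <= max(x) + ln(N)/eta."""
    x = np.asarray(x, dtype=float)
    m = x.max()                                  # log-sum-exp shift
    return m + np.log(np.exp(eta * (x - m)).sum()) / eta

def exp_weights(x, eta):
    """Normalized gradient of Phi_eta at the regret state x: the
    weights of the exponentially weighted average forecaster."""
    x = np.asarray(x, dtype=float)
    w = np.exp(eta * (x - x.max()))
    return w / w.sum()
```

Since the regret states differ from the experts' cumulative losses by a common additive term, the weights are proportional to $e^{-\eta L^i_t}$.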

Randomized Prediction
Assume that the forecaster randomly chooses a prediction by taking a sample $I_t$ from a probability distribution $p_t = (p^1_t, \dots, p^N_t)$ over $\{y_1, \dots, y_N\}$. His cumulative loss is compared with the cumulative loss of a best fixed prediction:
$$\widetilde R_n = \frac{1}{n} \Bigl( \sum_{t=0}^{n-1} l(y_{I_t}, b_t) - \min_{1 \le i \le N} \sum_{t=0}^{n-1} l(y_i, b_t) \Bigr),$$
and the regret is defined as the expectation of this quantity with respect to the induced artificial probability measure: $R_n = \mathbb{E} \widetilde R_n$. The game, where the forecaster knows the previous moves: $p_t = p_t(b_0, \dots, b_{t-1})$, and the adversary knows the prediction algorithm: $b_t = b_t(p_0, \dots, p_t)$, but not the prediction $I_t$ itself, corresponds to the case of an oblivious adversary: [4, Chapter 2]. However, the case of a non-oblivious adversary is not interesting for the problem of this form: see [4, Lemma 4.1]. The described game is simpler than the one considered above, since the "experts", corresponding to fixed predictions, do not play against the forecaster. Moreover, the condition (5) is satisfied regardless of the convexity of $l$. Repeating the reasoning of Section 2, we get the inequality (6). So, a prediction strategy satisfying a condition of the form (16), where $\Phi$ meets the conditions of Theorem 1 and $X^*_t$ is defined by the recursion of the form (15), produces the regret $R_n \le C/\sqrt n$. In particular, $C = \sqrt{2 \ln N}$ for the exponentially weighted average forecaster, discussed after Theorem 1.
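The randomized game against an oblivious adversary is easy to simulate. The sketch below pre-generates a loss matrix for the fixed predictions (zero-one losses, an illustrative choice of ours), samples $I_t$ from the exponential-weights distribution, and averages the realized regret over independent runs:

```python
import numpy as np

def randomized_forecaster(outcome_losses, eta, rng):
    """Play randomized prediction against a fixed (oblivious) loss
    sequence.  outcome_losses[t, i] = l(y_i, b_t); the forecaster
    samples I_t from the exponential-weights distribution, and then
    the full loss vector of round t is revealed.  Returns the
    realized regret against the best fixed prediction."""
    n, N = outcome_losses.shape
    cum = np.zeros(N)                    # cumulative losses of y_1, ..., y_N
    forecaster_loss = 0.0
    for t in range(n):
        w = np.exp(-eta * (cum - cum.min()))
        p = w / w.sum()                  # distribution p_t over {y_1,...,y_N}
        i = rng.choice(N, p=p)           # sample I_t
        forecaster_loss += outcome_losses[t, i]
        cum += outcome_losses[t]
    return forecaster_loss - cum.min()

rng = np.random.default_rng(0)
losses = rng.integers(0, 2, size=(200, 4)).astype(float)  # zero-one losses
avg_regret = np.mean([randomized_forecaster(losses, 0.2, np.random.default_rng(s))
                      for s in range(50)])
```

On average the realized regret stays below the $O(\sqrt{n \ln N})$ worst-case level, in line with the bounds discussed above.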
Finally, we note that the case of the internal regret (see [4,Section 4.4]) can be considered in the same way.

Acknowledgements
The research is supported by the Russian Science Foundation, project 17-19-01038.