PRIMAL-DUAL ALGORITHMS FOR SEMIDEFINITE OPTIMIZATION PROBLEMS BASED ON A GENERALIZED TRIGONOMETRIC BARRIER FUNCTION

Recently, M. Bouafia et al. [5] (Journal of Optimization Theory and Applications, August 2016) investigated a new kernel function which differs from the self-regular kernel functions; its barrier term is trigonometric. In this paper we generalize the analysis presented in that paper to semidefinite optimization problems (SDO). We show that for large-update interior-point methods based on this function the iteration bound is improved significantly, while for small-update methods the iteration bound matches the best currently known bound for primal-dual interior-point methods. The analysis for SDO deviates significantly from the analysis for linear optimization, and several new tools and techniques are derived in this paper.

AMS Subject Classification: 90C22, 90C31


Introduction
We consider the standard semidefinite optimization problem
\[
(SDP) \qquad p^* = \inf_X \bigl\{\, \mathrm{Tr}(CX) : \mathrm{Tr}(A_iX) = b_i \ (i = 1, \dots, m),\ X \succeq 0 \,\bigr\},
\]
and its dual problem
\[
(SDD) \qquad d^* = \sup_{y,S} \Bigl\{\, b^Ty : \sum_{i=1}^m y_iA_i + S = C,\ S \succeq 0 \,\Bigr\},
\]
where $C$ and the $A_i$ are symmetric $n \times n$ matrices, $b, y \in \mathbb{R}^m$, $X \succeq 0$ means that $X$ is symmetric positive semidefinite, and $\mathrm{Tr}(A)$ denotes the trace of $A$ (i.e., the sum of its diagonal elements). Without loss of generality the matrices $A_i$ are assumed to be linearly independent. Recall that for any two $n \times n$ matrices $A$ and $B$ their natural inner product is given by
\[
\langle A, B \rangle := \mathrm{Tr}(A^TB) = \sum_{i=1}^n\sum_{j=1}^n A_{ij}B_{ij}.
\]
IPMs provide a powerful approach for solving SDO problems. A comprehensive list of publications on SDO can be found in the SDO homepage maintained by Alizadeh [1]. Pioneering works are due to Alizadeh [1, 2] and Nesterov et al. [11]. Most IPMs for SDO can be viewed as natural extensions of IPMs for linear optimization (LO), and have similar polynomial complexity results. However, obtaining valid search directions is much more difficult than in the LO case. In the sequel we describe how the usual search directions are obtained for primal-dual methods for solving SDO problems. Our aim is to show that the kernel-function-based approach that we presented for LO in [7] can be generalized and applied also to SDO problems.
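As an illustration (not part of the paper), the following sketch builds a strictly feasible primal-dual pair from random data and checks weak duality, $\mathrm{Tr}(CX) - b^Ty = \mathrm{Tr}(XS) \ge 0$; all data below are hypothetical examples.

```python
# A minimal numerical sketch: construct a strictly feasible pair for
# (SDP)/(SDD) and verify that the duality gap equals Tr(XS) >= 0.
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 2

def random_sym(n):
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

A = [random_sym(n) for _ in range(m)]                 # constraint matrices A_i
X, S = np.eye(n), np.eye(n)                            # strictly feasible X, S
y = rng.standard_normal(m)
C = sum(y[i] * A[i] for i in range(m)) + S             # forces dual feasibility
b = np.array([np.trace(A[i] @ X) for i in range(m)])   # forces primal feasibility

gap = np.trace(C @ X) - b @ y                          # Tr(CX) - b^T y
print(np.isclose(gap, np.trace(X @ S)), gap >= 0)      # True True
```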

Classical search direction
We assume that (SDP) and (SDD) satisfy the interior-point condition (IPC), i.e., there exist $X^0 \succ 0$ and $(y^0, S^0)$ with $S^0 \succ 0$ such that $X^0$ is feasible for (SDP) and $(y^0, S^0)$ is feasible for (SDD). Moreover, we may assume that $X^0 = S^0 = E$, where $E$ is the $n \times n$ identity matrix [12]. Assuming the IPC, one can write the optimality conditions for the primal-dual pair of problems as follows:
\[
\mathrm{Tr}(A_iX) = b_i \ (i = 1, \dots, m), \quad X \succeq 0, \qquad
\sum_{i=1}^m y_iA_i + S = C, \quad S \succeq 0, \qquad
XS = 0. \tag{1}
\]
The basic idea of primal-dual IPMs is to replace the complementarity condition $XS = 0$ by the parameterized equation $XS = \mu E$, $X, S \succ 0$, where $\mu > 0$. The resulting system has a unique solution for each $\mu > 0$, denoted by $(X(\mu), y(\mu), S(\mu))$; $X(\mu)$ is called the $\mu$-center of (SDP) and $(y(\mu), S(\mu))$ the $\mu$-center of (SDD). The set of $\mu$-centers (with $\mu > 0$) defines a homotopy path, which is called the central path of (SDP) and (SDD) [12, 13]. The principal idea of IPMs is to follow this central path and approach the optimal set as $\mu$ goes to zero. Newton's method amounts to linearizing the system (1), yielding the following system of equations:
\[
\mathrm{Tr}(A_i\Delta X) = 0 \ (i = 1, \dots, m), \qquad
\sum_{i=1}^m \Delta y_iA_i + \Delta S = 0, \qquad
\Delta X\,S + X\,\Delta S = \mu E - XS. \tag{2}
\]
This so-called Newton system has a unique solution $(\Delta X, \Delta y, \Delta S)$. Note that $\Delta S$ is symmetric, due to the second equation in (2). However, a crucial point is that $\Delta X$ may not be symmetric. Many researchers have proposed various ways of 'symmetrizing' the third equation in the Newton system so that the new system has a unique symmetric solution. All these proposals can be described by using a symmetric nonsingular scaling matrix $P$ and by replacing (2) by the system
\[
\mathrm{Tr}(A_i\Delta X) = 0 \ (i = 1, \dots, m), \qquad
\sum_{i=1}^m \Delta y_iA_i + \Delta S = 0, \qquad
\Delta X + P\Delta SP^T = \mu S^{-1} - X. \tag{3}
\]
Now $\Delta X$ is automatically a symmetric matrix.

Nesterov-Todd direction
In this paper we consider the symmetrization scheme of Nesterov and Todd [14]. So we use
\[
P = X^{\frac12}\bigl(X^{\frac12}SX^{\frac12}\bigr)^{-\frac12}X^{\frac12}
  = S^{-\frac12}\bigl(S^{\frac12}XS^{\frac12}\bigr)^{\frac12}S^{-\frac12},
\]
where the last equality can be easily verified. Let $D = P^{\frac12}$, where $P^{\frac12}$ denotes the symmetric square root of $P$. Now the matrix $D$ can be used to scale $X$ and $S$ to the same matrix $V$, namely [12, 15]:
\[
V := \frac{1}{\sqrt{\mu}}\,D^{-1}XD^{-1} = \frac{1}{\sqrt{\mu}}\,DSD. \tag{4}
\]
Obviously the matrices $D$ and $V$ are symmetric and positive definite. Let us further define
\[
\bar A_i := DA_iD \ (i = 1, \dots, m), \qquad
D_X := \frac{1}{\sqrt{\mu}}\,D^{-1}\Delta XD^{-1}, \qquad
D_S := \frac{1}{\sqrt{\mu}}\,D\Delta SD. \tag{5}
\]
We refer to $D_X$ and $D_S$ as the scaled search directions. Now (3) can be rewritten as follows:
\[
\mathrm{Tr}(\bar A_iD_X) = 0 \ (i = 1, \dots, m), \qquad
\sum_{i=1}^m \Delta y_i\bar A_i + D_S = 0, \qquad
D_X + D_S = V^{-1} - V. \tag{6}
\]
In the sequel we use the following notational conventions. Throughout this paper, $\|\cdot\|$ denotes the 2-norm of a vector and the Frobenius norm of a matrix. The nonnegative and the positive orthants are denoted by $\mathbb{R}^n_+$ and $\mathrm{int}\,\mathbb{R}^n_+$, respectively, and $S^n$, $S^n_+$, and $\mathrm{int}\,S^n_+$ denote the cone of symmetric, symmetric positive semidefinite, and symmetric positive definite $n \times n$ matrices, respectively. For any $V \in S^n$, we denote by $\lambda(V)$ the vector of eigenvalues of $V$ arranged in increasing order, $\lambda_1(V) \le \lambda_2(V) \le \dots \le \lambda_n(V)$. For any vector $v$, $\mathrm{diag}(v)$ denotes the diagonal matrix with diagonal elements $v_i$.
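The following sketch (illustrative, with random $X, S \succ 0$, not from the paper) verifies numerically that both scalings in (4) produce the same matrix $V$.

```python
# A sketch of the Nesterov-Todd scaling: P = X^{1/2}(X^{1/2} S X^{1/2})^{-1/2} X^{1/2},
# D = P^{1/2}, and V = D^{-1} X D^{-1} / sqrt(mu) = D S D / sqrt(mu).
import numpy as np

def sym_sqrt(M, inv=False):
    """Symmetric (inverse) square root via eigendecomposition."""
    w, Q = np.linalg.eigh(M)
    d = 1.0 / np.sqrt(w) if inv else np.sqrt(w)
    return Q @ np.diag(d) @ Q.T

rng = np.random.default_rng(1)
n, mu = 4, 0.5
B = rng.standard_normal((n, n)); X = B @ B.T + n * np.eye(n)   # X > 0
B = rng.standard_normal((n, n)); S = B @ B.T + n * np.eye(n)   # S > 0

Xh = sym_sqrt(X)
P = Xh @ sym_sqrt(Xh @ S @ Xh, inv=True) @ Xh
D, Dinv = sym_sqrt(P), sym_sqrt(P, inv=True)

V1 = Dinv @ X @ Dinv / np.sqrt(mu)
V2 = D @ S @ D / np.sqrt(mu)
print(np.allclose(V1, V2))                  # both scalings give the same V
print(np.all(np.linalg.eigvalsh(V1) > 0))   # V is symmetric positive definite
```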

New search direction
In this section we introduce the new search direction, but we start with the definition of a matrix function [16, 17].

Definition 1. Let $X$ be a symmetric matrix, and let
\[
X = Q_X\,\mathrm{diag}\bigl(\lambda_1(X), \lambda_2(X), \dots, \lambda_n(X)\bigr)\,Q_X^T
\]
be an eigenvalue decomposition of $X$, where $\lambda_i(X)$, $1 \le i \le n$, denotes the $i$th eigenvalue of $X$ and $Q_X$ is orthogonal. If $\psi(t)$ is any univariate function whose domain contains $\{\lambda_i(X);\ 1 \le i \le n\}$, then the matrix function $\psi(X)$ is defined by
\[
\psi(X) = Q_X\,\mathrm{diag}\bigl(\psi(\lambda_1(X)), \psi(\lambda_2(X)), \dots, \psi(\lambda_n(X))\bigr)\,Q_X^T,
\]
and the scalar function $\Psi(X)$ is defined as follows [13]:
\[
\Psi(X) := \sum_{i=1}^n \psi(\lambda_i(X)) = \mathrm{Tr}(\psi(X)). \tag{7}
\]
The univariate function $\psi$ is called the kernel function of the scalar function $\Psi$.
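A minimal sketch of Definition 1 (illustrative only; the classical log-barrier kernel $\psi(t) = (t^2-1)/2 - \log t$ is used purely as an example):

```python
# psi applied to a symmetric matrix via an eigendecomposition, and the
# scalar barrier Psi(X) = sum_i psi(lambda_i(X)) = Tr(psi(X)).
import numpy as np

def psi(t):
    return (t**2 - 1) / 2 - np.log(t)

def matrix_function(f, X):
    w, Q = np.linalg.eigh(X)            # X = Q diag(w) Q^T
    return Q @ np.diag(f(w)) @ Q.T      # f(X) = Q diag(f(w)) Q^T

def Psi(X):
    return np.sum(psi(np.linalg.eigvalsh(X)))

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
X = B @ B.T + 4 * np.eye(4)             # symmetric positive definite argument
print(np.allclose(matrix_function(psi, np.eye(4)), np.zeros((4, 4))))  # psi(E) = 0
print(np.isclose(Psi(np.eye(4)), 0.0), Psi(X) >= 0)                    # Psi(E) = 0
```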
In this paper, when we use the function $\psi(\cdot)$ and its first three derivatives $\psi'(\cdot)$, $\psi''(\cdot)$, and $\psi'''(\cdot)$ without any specification, they denote matrix functions if the argument is a matrix and univariate functions (from $\mathbb{R}$ to $\mathbb{R}$) if the argument is in $\mathbb{R}$.
Analogous to the LO case, the kernel-function-based approach to SDO is obtained by modifying the Nesterov-Todd direction [13].
The observation underlying our approach is that the right-hand side $V^{-1} - V$ in the third equation of (6) is precisely $-\psi'(V)$ if $\psi(t) = (t^2 - 1)/2 - \log t$, the latter being the kernel function of the well-known logarithmic barrier function. Note that this kernel function is strictly convex and nonnegative, its domain contains all positive reals, and it vanishes at $t = 1$. As we will now show, any continuously differentiable kernel function $\psi(t)$ with these properties gives rise to a primal-dual algorithm for SDO.
Given such a kernel function $\psi(t)$, we replace the right-hand side $V^{-1} - V$ in the third equation of (6) by $-\psi'(V)$, with $\psi'(V)$ defined according to Definition 1. Thus we use the following system to define the (scaled) search directions $D_X$ and $D_S$:
\[
\mathrm{Tr}(\bar A_iD_X) = 0 \ (i = 1, \dots, m), \qquad
\sum_{i=1}^m \Delta y_i\bar A_i + D_S = 0, \qquad
D_X + D_S = -\psi'(V). \tag{8}
\]
Having $D_X$ and $D_S$, $\Delta X$ and $\Delta S$ can be calculated from (5). Due to the orthogonality of $\Delta X$ and $\Delta S$, it is trivial to see that $D_X \perp D_S$, and so
\[
\mathrm{Tr}(D_XD_S) = 0. \tag{9}
\]
The algorithm considered in this paper is described in Figure 1.

Figure 1: Generic Primal-Dual Algorithm for SDO

Input: a kernel function $\psi(t)$; a threshold parameter $\tau \ge 1$; an accuracy parameter $\epsilon > 0$; a barrier update parameter $\theta$, $0 < \theta < 1$.
begin
  $X := E$; $S := E$; $\mu := 1$;
  while $n\mu \ge \epsilon$ do
  begin
    $\mu := (1 - \theta)\mu$;
    while $\Psi(V) > \tau$ do
    begin
      find the search directions by solving system (8);
      determine a step size $\alpha$;
      $X := X + \alpha\Delta X$; $y := y + \alpha\Delta y$; $S := S + \alpha\Delta S$;
      compute $V$ from (4);
    end
  end
end

The inner while loop in the algorithm is called an inner iteration and the outer while loop an outer iteration, so each outer iteration consists of an update of the barrier parameter and a sequence of one or more inner iterations. Note that by using the embedding strategy [12], we can initialize the algorithm with $X = S = E$. Since then $XS = \mu E$ for $\mu = 1$, it follows from (4) that $V = E$ at the start of the algorithm, whence $\Psi(V) = 0$. We then decrease $\mu$ to $\mu := (1 - \theta)\mu$ for some $\theta \in (0, 1)$. In general this will increase the value of $\Psi(V)$ above the threshold value $\tau$. To make this value smaller again, coming closer to the current $\mu$-center, we solve the scaled search directions from (8) and unscale these directions by using (5). By choosing an appropriate step size $\alpha$, we move along the search direction and construct a new triple
\[
X_+ = X + \alpha\Delta X, \qquad y_+ = y + \alpha\Delta y, \qquad S_+ = S + \alpha\Delta S. \tag{10}
\]
If necessary, we repeat the procedure until we find iterates such that $\Psi(V)$ no longer exceeds the threshold value $\tau$, which means that the iterates are in a small enough neighborhood of $(X(\mu), y(\mu), S(\mu))$. Then $\mu$ is again reduced by the factor $1 - \theta$ and we apply the same procedure targeting the new $\mu$-centers. This process is repeated until $\mu$ is small enough, i.e., until $n\mu \le \epsilon$. At this stage we have found an $\epsilon$-solution of (SDP) and (SDD). Just as in the LO case, the parameters $\tau$, $\theta$, and the step size $\alpha$ should be chosen so that the algorithm is 'optimized', in the sense that the number of iterations required by the algorithm is as small as possible. Obviously, the resulting iteration bound depends on the kernel function underlying the algorithm, and our main task becomes to find a kernel function that minimizes the iteration bound.

The rest of the paper is organized as follows. In Section 3 we introduce the kernel function $\psi(t)$ considered in this paper and discuss some of its properties that are needed in the analysis of the corresponding algorithm. In Section 4 we derive the properties of the barrier function $\Psi(V)$. The step size $\alpha$ and the resulting decrease of the barrier function are discussed in Section 5. The total iteration bound of the algorithm and the complexity results are derived in Section 6. Finally, some concluding remarks follow in Section 7.
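For concreteness, here is a minimal numerical sketch (an illustration, not the paper's implementation) of one inner iteration: it forms the NT scaling, eliminates $D_X$ from system (8) via the Gram matrix of the $\bar A_i$, and recovers $\Delta X$, $\Delta S$ from (5). For simplicity it uses the classical log-barrier kernel, so $-\psi'(V) = V^{-1} - V$; the new kernel (11) would only change `dpsi_mat`.

```python
# One inner iteration of the generic algorithm on randomly generated data.
import numpy as np

def sym_sqrt(M, inv=False):
    w, Q = np.linalg.eigh(M)
    d = 1.0 / np.sqrt(w) if inv else np.sqrt(w)
    return Q @ np.diag(d) @ Q.T

def dpsi_mat(V):                      # psi'(V) for psi(t) = (t^2-1)/2 - log t
    w, Q = np.linalg.eigh(V)
    return Q @ np.diag(w - 1.0 / w) @ Q.T

rng = np.random.default_rng(3)
n, m, mu = 4, 2, 0.5
A = [(lambda M: (M + M.T) / 2)(rng.standard_normal((n, n))) for _ in range(m)]
X, S = np.eye(n), np.eye(n)           # feasible start of the algorithm

Xh = sym_sqrt(X)
P = Xh @ sym_sqrt(Xh @ S @ Xh, inv=True) @ Xh
D, Dinv = sym_sqrt(P), sym_sqrt(P, inv=True)
V = Dinv @ X @ Dinv / np.sqrt(mu)
Abar = [D @ Ai @ D for Ai in A]

# Eliminate D_X = -psi'(V) + sum_j dy_j Abar_j and impose Tr(Abar_i D_X) = 0:
G = np.array([[np.trace(Ai @ Aj) for Aj in Abar] for Ai in Abar])
r = np.array([np.trace(Ai @ dpsi_mat(V)) for Ai in Abar])
dy = np.linalg.solve(G, r)

D_S = -sum(dy[j] * Abar[j] for j in range(m))
D_X = -dpsi_mat(V) - D_S
dX = np.sqrt(mu) * D @ D_X @ D        # unscale as in (5)
dS = np.sqrt(mu) * Dinv @ D_S @ Dinv

print(np.allclose([np.trace(Ai @ dX) for Ai in A], 0))  # primal feasibility kept
print(np.isclose(np.trace(D_X @ D_S), 0))               # orthogonality (9)
```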

Our kernel function and some of its properties
Recently, the authors of [5] investigated a new kernel function with a trigonometric barrier term for linear optimization. The extension to the $P_*(\kappa)$-linear complementarity problem was presented in [10]. The complexity obtained for large-update methods improves significantly on the complexity obtained in [6, 7]. In this paper we consider kernel functions of the form
\[
\psi(t) = \frac{t^2 - 1}{2} + \frac{4}{p\pi}\bigl(\tan^p(h(t)) - 1\bigr), \qquad
h(t) = \frac{\pi}{2t + 2}, \quad p \ge 2, \tag{11}
\]
and show that the interior-point methods for SDO based on these functions have favorable complexity results.
Note that the growth term of our kernel function is quadratic. However, the function (11) deviates from all other kernel functions in that its barrier term, $\frac{4}{p\pi}\bigl(\tan^p(h(t)) - 1\bigr)$, is trigonometric. In order to study the new kernel function, several new arguments had to be developed for the analysis.
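The following quick numerical check (illustrative only; $p = 2$ chosen arbitrarily) confirms the basic kernel properties $\psi(1) = \psi'(1) = 0$, $\psi''(t) > 1$, the barrier behavior as $t \to 0^+$, and the quadratic growth.

```python
# A numerical sanity check of the kernel (11).
import numpy as np

p = 2.0                                   # barrier parameter, p >= 2

def psi(t):
    h = np.pi / (2 * t + 2)
    return (t**2 - 1) / 2 + (4 / (p * np.pi)) * (np.tan(h)**p - 1)

t = np.linspace(0.05, 10, 2000)
e = 1e-4
d2 = (psi(t + e) - 2 * psi(t) + psi(t - e)) / e**2       # central 2nd difference

print(np.isclose(psi(1.0), 0.0))                         # psi(1) = 0
print(abs((psi(1 + e) - psi(1 - e)) / (2 * e)) < 1e-5)   # psi'(1) = 0
print(np.all(d2 > 1 - 1e-3))                             # psi''(t) > 1 on the grid
print(psi(t[0]) > 1e2, psi(t[-1]) > 40)                  # barrier at 0+, growth at inf
```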
We start this section with some technical results, and then derive some properties of the new kernel function introduced in this paper.

Some technical results
In the analysis of the algorithm based on $\psi(t)$ we need its first three derivatives. Writing $g(t) := \tan(h(t))$, the first two derivatives are given by
\[
\psi'(t) = t + \frac{4}{\pi}\,g^{p-1}(t)\bigl(1 + g^2(t)\bigr)h'(t),
\]
\[
\psi''(t) = 1 + \frac{4}{\pi}\bigl(1 + g^2(t)\bigr)\Bigl[\Bigl((p-1)g^{p-2}(t)\bigl(1 + g^2(t)\bigr) + 2g^p(t)\Bigr)h'(t)^2 + g^{p-1}(t)h''(t)\Bigr],
\]
with $\psi'''(t)$ obtained by differentiating once more, and the first three derivatives of $h$ are given by
\[
h'(t) = \frac{-\pi}{2(t+1)^2}, \qquad h''(t) = \frac{\pi}{(t+1)^3}, \qquad h'''(t) = \frac{-3\pi}{(t+1)^4}.
\]
The next lemma serves to prove that the new kernel function (11) is eligible.

Lemma 2 (Lemma 3.2 in [5]). Let $\psi$ be as defined in (11) and $t > 0$. Then
(17-a) $\psi''(t) > 1$;
(17-b) $t\psi''(t) + \psi'(t) > 0$;
(17-c) $\psi''(t)\psi'(\beta t) - \beta\psi'(t)\psi''(\beta t) > 0$ for $t > 1$ and $\beta > 1$;
(17-d) $\psi'''(t) < 0$.
It follows that $\psi(1) = \psi'(1) = 0$ and $\psi''(t) > 0$, so that $\psi$ is completely determined by its second derivative:
\[
\psi(t) = \int_1^t\int_1^\xi \psi''(\zeta)\,d\zeta\,d\xi. \tag{18}
\]
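The closed forms above can be validated against finite differences; the sketch below (illustrative, with $p = 3$ chosen arbitrarily) also spot-checks conditions (17-b) and (17-d) on a grid.

```python
# Finite-difference check of psi' and of two eligibility conditions of Lemma 2.
import numpy as np

p = 3.0

def h(t):  return np.pi / (2 * t + 2)
def hp(t): return -np.pi / (2 * (t + 1)**2)
def g(t):  return np.tan(h(t))
def psi(t):  return (t**2 - 1) / 2 + (4 / (p * np.pi)) * (g(t)**p - 1)
def dpsi(t): return t + (4 / np.pi) * g(t)**(p - 1) * (1 + g(t)**2) * hp(t)

t = np.linspace(0.3, 6.0, 500)
e = 1e-5
print(np.allclose(dpsi(t), (psi(t + e) - psi(t - e)) / (2 * e), atol=1e-6))

e = 1e-3                                    # wider step for higher differences
d2 = (psi(t + e) - 2 * psi(t) + psi(t - e)) / e**2
d3 = (psi(t + 2*e) - 2*psi(t + e) + 2*psi(t - e) - psi(t - 2*e)) / (2 * e**3)
print(np.all(t * d2 + dpsi(t) > 0))         # (17-b): e-convexity
print(np.all(d3 < 1e-6))                    # (17-d): psi''' < 0, up to FD noise
```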
The second property (17-b) in Lemma 2 is related to Definition 2.1.1 and Lemma 2.1.2 in [13]. This property is equivalent to convexity of the composed function $z \mapsto \psi(e^z)$, and it holds if and only if
\[
\psi\bigl(\sqrt{t_1t_2}\bigr) \le \tfrac12\bigl(\psi(t_1) + \psi(t_2)\bigr) \qquad \text{for any } t_1, t_2 > 0.
\]
Following [3], we therefore say that $\psi$ is exponentially convex, or shortly e-convex, on $t > 0$.
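A direct numerical spot-check of the e-convexity inequality (illustrative only):

```python
# For random t1, t2 > 0: psi(sqrt(t1*t2)) <= (psi(t1) + psi(t2)) / 2.
import numpy as np

p = 3.0
def psi(t):
    h = np.pi / (2 * t + 2)
    return (t**2 - 1) / 2 + (4 / (p * np.pi)) * (np.tan(h)**p - 1)

rng = np.random.default_rng(4)
t1, t2 = rng.uniform(0.1, 8.0, 10000), rng.uniform(0.1, 8.0, 10000)
print(np.all(psi(np.sqrt(t1 * t2)) <= (psi(t1) + psi(t2)) / 2 + 1e-12))
```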
At some places below we apply the function $\Psi$ to a positive vector $v$. The interpretation of $\Psi(v)$ is compatible with Definition 1 when identifying the vector $v$ with the diagonal matrix $\mathrm{diag}(v)$. When applying $\Psi$ to this matrix we obtain
\[
\Psi(v) = \Psi(\mathrm{diag}(v)) = \sum_{i=1}^n \psi(v_i).
\]

Properties of Ψ(V ) and δ(V )
In this section we extend Theorem 4.9 in [4] to the cone of positive definite matrices. The next theorem gives a lower bound, in terms of $\Psi(V)$, on the norm-based proximity measure $\delta(V)$ defined by
\[
\delta(V) := \tfrac12\,\|\psi'(V)\| = \tfrac12\sqrt{\sum_{i=1}^n \psi'(\lambda_i(V))^2}. \tag{19}
\]
Since $\Psi(V)$ is strictly convex and attains its minimal value zero at $V = E$, we have
\[
\Psi(V) = 0 \iff \delta(V) = 0 \iff V = E.
\]
We denote by
\[
\varrho : [0, \infty) \to [1, \infty) \quad \text{the inverse function of } \psi(t) \text{ for } t \ge 1. \tag{20}
\]

Theorem 5. Let $\varrho$ be as defined in (20). Then
\[
\delta(V) \ge \tfrac12\,\psi'\bigl(\varrho(\Psi(V))\bigr).
\]
Proof. If $V = E$ then $\delta(V) = \Psi(V) = 0$. Since $\varrho(0) = 1$ and $\psi'(1) = 0$, the inequality holds with equality if $V = E$. Otherwise, by the definitions of $\delta(V)$ in (19) and $\Psi(V)$ in (7), we have $\delta(V) > 0$ and $\Psi(V) > 0$. Let $v = \lambda(V)$. Since $\psi(t)$ satisfies (17-d), we may apply Theorem 4.9 in [4] to the vector $v$, which gives the stated inequality, and the proof of the theorem is complete.
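The next sketch illustrates Theorem 5 numerically, computing $\varrho$ by bisection (an implementation detail of this sketch, not specified in the paper) for a random positive definite $V$.

```python
# Check delta(V) >= psi'(rho(Psi(V)))/2 for a random positive definite V.
import numpy as np

p = 3.0
def h(t):    return np.pi / (2 * t + 2)
def psi(t):  return (t**2 - 1) / 2 + (4 / (p * np.pi)) * (np.tan(h(t))**p - 1)
def dpsi(t):
    g = np.tan(h(t))
    return t + (4 / np.pi) * g**(p - 1) * (1 + g**2) * (-np.pi / (2 * (t + 1)**2))

def rho(s, lo=1.0, hi=1e6):
    """Inverse of psi on [1, inf): the unique t >= 1 with psi(t) = s."""
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if psi(mid) < s else (lo, mid)
    return (lo + hi) / 2

rng = np.random.default_rng(5)
B = rng.standard_normal((5, 5))
V = B @ B.T / 5 + 0.5 * np.eye(5)                  # a positive definite "V"
lam = np.linalg.eigvalsh(V)
Psi_V   = np.sum(psi(lam))                         # barrier value (7)
delta_V = 0.5 * np.sqrt(np.sum(dpsi(lam)**2))      # proximity (19)
print(delta_V >= 0.5 * dpsi(rho(Psi_V)))           # Theorem 5
```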
Proof. The proof of this lemma uses Theorem 5 and Lemma 4. Putting $s = \Psi(V)$, we obtain from Theorem 5 that $\delta(V) \ge \tfrac12\psi'(\varrho(s))$. Putting $t = \varrho(s)$, we have $s = \psi(t)$ with $t \ge 1$, by (20). We derive an upper bound for $t$, as this suffices for our goal. From (18) and $\psi''(t) \ge 1$ one has
\[
s = \psi(t) = \int_1^t\int_1^\xi \psi''(\zeta)\,d\zeta\,d\xi \ge \int_1^t\int_1^\xi d\zeta\,d\xi = \frac{(t-1)^2}{2},
\]
whence $t \le 1 + \sqrt{2s}$. Assuming $s \ge 1$, we get
\[
t = \varrho(s) \le \sqrt{s} + \sqrt{2s} \le 3s^{\frac12}.
\]
Now applying Lemma 4 we obtain the inequality (21), and this proves the lemma. Note that since $\tau \ge 1$, at the start of each inner iteration we have $\Psi(V) \ge 1$; substitution in (21) then gives the bound used in the sequel.

Analysis of the algorithm
In the analysis of the algorithm the concept of exponential convexity [4,8] is again a crucial ingredient.In this section we derive a default value for the step size and we obtain an upper bound for the decrease in Ψ(V ) during a Newton step.

Three technical lemmas
The next lemma is cited from [16, Lemma 3.3.14(c)].

Lemma 7. Let $A, B \in S^n$ be two nonsingular matrices and let $f(t)$ be a given real-valued function such that $f(e^t)$ is convex. Then
\[
\sum_{i=1}^n f\bigl(\eta_i(AB)\bigr) \le \sum_{i=1}^n f\bigl(\eta_i(A)\,\eta_i(B)\bigr),
\]
where $\eta_i(A)$ and $\eta_i(B)$, $i = 1, 2, \dots, n$, denote the singular values of $A$ and $B$, respectively.

Lemma 8. Let $A, A + B \in S^n_+$. Then one has
\[
\lambda_1(A + B) \ge \lambda_1(A) + \lambda_1(B).
\]
Proof. By the Rayleigh-Ritz theorem (see [18]), there exists a nonzero $X_0 \in \mathbb{R}^n$ such that
\[
\lambda_1(A + B) = \frac{X_0^T(A + B)X_0}{X_0^TX_0}.
\]
We therefore may write
\[
\lambda_1(A + B) = \frac{X_0^TAX_0}{X_0^TX_0} + \frac{X_0^TBX_0}{X_0^TX_0} \ge \lambda_1(A) + \lambda_1(B).
\]
This completes the proof of the lemma.

A consequence of condition (17-b) is that any eligible kernel function is exponentially convex [13, Eq. (2.10)]:
\[
\psi\bigl(\sqrt{t_1t_2}\bigr) \le \tfrac12\bigl(\psi(t_1) + \psi(t_2)\bigr), \qquad t_1, t_2 > 0.
\]
This implies the following lemma, which is crucial for our purpose.

Lemma 9. Let $V_1, V_2 \in \mathrm{int}\,S^n_+$. Then
\[
\Psi\Bigl(\bigl(V_1^{\frac12}V_2V_1^{\frac12}\bigr)^{\frac12}\Bigr) \le \tfrac12\bigl(\Psi(V_1) + \Psi(V_2)\bigr).
\]
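The inequality of Lemma 8 is a standard Rayleigh-Ritz/Weyl-type bound; a quick randomized check (illustrative only):

```python
# lambda_min(A + B) >= lambda_min(A) + lambda_min(B) for symmetric A, B.
import numpy as np

rng = np.random.default_rng(6)
lmin = lambda M: np.linalg.eigvalsh(M)[0]
ok = True
for _ in range(1000):
    A = rng.standard_normal((5, 5)); A = (A + A.T) / 2
    B = rng.standard_normal((5, 5)); B = (B + B.T) / 2
    ok &= lmin(A + B) >= lmin(A) + lmin(B) - 1e-12
print(bool(ok))
```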

The decrease of the proximity in the inner iteration
In this subsection we compute a default value for the step size $\alpha$ in order to yield a new triple $(X_+, y_+, S_+)$ as defined in (10). After a damped step, using (5) we have
\[
X_+ = X + \alpha\Delta X = \sqrt{\mu}\,D(V + \alpha D_X)D, \qquad
S_+ = S + \alpha\Delta S = \sqrt{\mu}\,D^{-1}(V + \alpha D_S)D^{-1}.
\]
Denoting the matrix $V$ after the step by $V_+$, note that $V_+^2$ is unitarily similar to the matrix $\frac{1}{\mu}X_+^{\frac12}S_+X_+^{\frac12}$ and hence also to
\[
(V + \alpha D_X)^{\frac12}(V + \alpha D_S)(V + \alpha D_X)^{\frac12}.
\]
This implies that the eigenvalues of $V_+$ are the same as those of the matrix
\[
\tilde V_+ := \Bigl((V + \alpha D_X)^{\frac12}(V + \alpha D_S)(V + \alpha D_X)^{\frac12}\Bigr)^{\frac12}.
\]
The definition of $\Psi(V)$ implies that its value depends only on the eigenvalues of $V$. Hence we have $\Psi(\tilde V_+) = \Psi(V_+)$.
Our aim is to find $\alpha$ such that the decrement
\[
f(\alpha) := \Psi(V_+) - \Psi(V) = \Psi(\tilde V_+) - \Psi(V) \tag{25}
\]
is as small (i.e., as negative) as possible. Due to Lemma 9, it follows that
\[
\Psi(\tilde V_+) \le \tfrac12\bigl(\Psi(V + \alpha D_X) + \Psi(V + \alpha D_S)\bigr).
\]
From the definition (25) of $f(\alpha)$, we now have $f(\alpha) \le f_1(\alpha)$, where
\[
f_1(\alpha) := \tfrac12\bigl(\Psi(V + \alpha D_X) + \Psi(V + \alpha D_S)\bigr) - \Psi(V).
\]
Obviously, $f(0) = f_1(0) = 0$. Taking the derivative with respect to $\alpha$, we get
\[
f_1'(\alpha) = \tfrac12\,\mathrm{Tr}\bigl(\psi'(V + \alpha D_X)D_X + \psi'(V + \alpha D_S)D_S\bigr).
\]
Using the last equality in (8) and also (19), this gives
\[
f_1'(0) = \tfrac12\,\mathrm{Tr}\bigl(\psi'(V)(D_X + D_S)\bigr) = -\tfrac12\,\mathrm{Tr}\bigl(\psi'(V)^2\bigr) = -2\delta(V)^2.
\]
Differentiating once more, we obtain
\[
f_1''(\alpha) = \tfrac12\,\mathrm{Tr}\bigl(\psi''(V + \alpha D_X)D_X^2 + \psi''(V + \alpha D_S)D_S^2\bigr). \tag{26}
\]
In the sequel we use the notation $v_i := \lambda_i(V)$, $1 \le i \le n$, and $\delta := \delta(V)$.

Lemma 10. One has
\[
f_1''(\alpha) \le 2\delta^2\,\psi''\bigl(\lambda_1(V) - 2\alpha\delta\bigr).
\]
Proof. The last equality in (8) and (19) imply that $\|D_X + D_S\| = \|\psi'(V)\| = 2\delta$, and since $D_X \perp D_S$,
\[
\|D_X\|^2 + \|D_S\|^2 = \|D_X + D_S\|^2 = 4\delta^2,
\]
so $\|D_X\| \le 2\delta$ and $\|D_S\| \le 2\delta$. Using Lemma 8 and $V + \alpha D_X \succ 0$,
\[
\lambda_1(V + \alpha D_X) \ge \lambda_1(V) + \alpha\lambda_1(D_X) \ge \lambda_1(V) - 2\alpha\delta,
\]
and similarly for $V + \alpha D_S$. As a consequence we have, for each $i$,
\[
\lambda_i(V + \alpha D_X) \ge \lambda_1(V) - 2\alpha\delta, \qquad
\lambda_i(V + \alpha D_S) \ge \lambda_1(V) - 2\alpha\delta.
\]
Due to (17-d), $\psi''$ is monotonically decreasing. So the above inequalities imply that
\[
\psi''\bigl(\lambda_i(V + \alpha D_X)\bigr) \le \psi''\bigl(\lambda_1(V) - 2\alpha\delta\bigr), \qquad
\psi''\bigl(\lambda_i(V + \alpha D_S)\bigr) \le \psi''\bigl(\lambda_1(V) - 2\alpha\delta\bigr).
\]
Substitution into (26) gives
\[
f_1''(\alpha) \le \tfrac12\,\psi''\bigl(\lambda_1(V) - 2\alpha\delta\bigr)\bigl(\|D_X\|^2 + \|D_S\|^2\bigr).
\]
Now, using that $D_X$ and $D_S$ are orthogonal, by (9), and also $\|D_X + D_S\|^2 = 4\delta^2$, by (19), we obtain $f_1''(\alpha) \le 2\delta^2\,\psi''(\lambda_1(V) - 2\alpha\delta)$. This proves the lemma.
Using the notation $v_i = \lambda_i(V)$, $1 \le i \le n$, again, we have
\[
f_1''(\alpha) \le 2\delta^2\,\psi''(v_1 - 2\alpha\delta),
\]
which is exactly the same inequality as in Lemma 3.1 in [7]. This means that our analysis closely resembles the analysis of the LO case in [7], and from this stage on we can apply similar arguments. In particular, the following two lemmas can be stated without proof.
For future use we define
\[
\bar\alpha := \frac{1}{\psi''\bigl(\rho(2\delta)\bigr)},
\]
where $\rho : [0, \infty) \to (0, 1]$ denotes the inverse of $-\tfrac12\psi'(t)$ restricted to $(0, 1]$. By Lemma 11 this step size satisfies (27).
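Assuming the default step size has the form $\bar\alpha = 1/\psi''(\rho(2\delta))$ reconstructed above (as in the LO analysis of [4, 7]), it can be evaluated numerically; computing $\rho$ by bisection and $\psi''$ by a finite difference are both implementation choices of this sketch.

```python
# Evaluate the default step size bar_alpha for a sample proximity value.
import numpy as np

p = 3.0
def dpsi(t):
    hh = np.pi / (2 * t + 2); g = np.tan(hh)
    return t + (4 / np.pi) * g**(p - 1) * (1 + g**2) * (-np.pi / (2 * (t + 1)**2))

def ddpsi(t, e=1e-6):
    return (dpsi(t + e) - dpsi(t - e)) / (2 * e)

def rho_bar(s, lo=1e-8, hi=1.0):
    """t in (0,1] with -psi'(t)/2 = s; note -psi'/2 is decreasing on (0,1]."""
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if -dpsi(mid) / 2 > s else (lo, mid)
    return (lo + hi) / 2

delta = 2.0                        # a sample value of the proximity measure
t_star = rho_bar(2 * delta)
alpha_bar = 1.0 / ddpsi(t_star)
print(t_star, alpha_bar)           # the step size shrinks as delta grows
```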
Lemma 12. If the step size $\alpha$ satisfies $\alpha \le \bar\alpha$, then $f(\alpha) \le -\alpha\delta^2$.

Using the above lemmas from [7] we proceed as follows.

A uniform upper bound for Ψ
In this subsection we extend Theorem 3.2 in [4] to the cone of positive definite matrices. As we will see, the proof of the next theorem follows easily from Theorem 3.2 in [4].

Theorem 14. Let $\varrho$ be as defined in (20). Then for any positive vector $v$ and any $\beta > 1$ we have
\[
\Psi(\beta v) \le n\,\psi\Bigl(\beta\,\varrho\Bigl(\frac{\Psi(v)}{n}\Bigr)\Bigr).
\]
Proof. Due to the fact that $\psi(t)$ satisfies (17-c), at this stage we may use Theorem 3.2 in [4], which gives the stated inequality, and the theorem follows.
Before the update of $\mu$ we have $\Psi(V) \le \tau$, and after the update of $\mu$ to $(1 - \theta)\mu$ the matrix $V$ is divided by the factor $\sqrt{1 - \theta}$, so that by Theorem 14 (with $\beta = 1/\sqrt{1-\theta}$),
\[
\Psi(V_+) \le n\,\psi\biggl(\frac{\varrho(\tau/n)}{\sqrt{1 - \theta}}\biggr).
\]
Therefore we define
\[
L = L(n, \theta, \tau) := n\,\psi\biggl(\frac{\varrho(\tau/n)}{\sqrt{1 - \theta}}\biggr). \tag{30}
\]
In the sequel the value $L(n, \theta, \tau)$ is simply denoted by $L$. A crucial (but trivial) observation is that during the course of the algorithm the value of $\Psi(V)$ never exceeds $L$, because during the inner iterations the value of $\Psi$ always decreases.
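For concreteness, $L$ can be evaluated numerically; the sketch below assumes the reconstructed formula (30) and illustrative large-update parameters.

```python
# Evaluate L(n, theta, tau) = n * psi(rho(tau/n) / sqrt(1 - theta)).
import numpy as np

p = 2.0
def psi(t):
    h = np.pi / (2 * t + 2)
    return (t**2 - 1) / 2 + (4 / (p * np.pi)) * (np.tan(h)**p - 1)

def rho(s, lo=1.0, hi=1e6):          # inverse of psi on [1, inf), by bisection
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if psi(mid) < s else (lo, mid)
    return (lo + hi) / 2

n, theta, tau = 100, 0.5, 100.0      # large-update style: theta = O(1), tau = O(n)
L = n * psi(rho(tau / n) / np.sqrt(1 - theta))
print(L)                             # upper bound on Psi right after a mu-update
```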

Complexity
We are now ready to derive the iteration bounds for large-update methods. An upper bound for the total number of (inner) iterations is obtained by multiplying an upper bound for the number of inner iterations between two successive updates of $\mu$ by the number of barrier parameter updates. The latter number is bounded above by (cf. [19, Lemma II.17, page 116])
\[
\frac{1}{\theta}\log\frac{n}{\epsilon}.
\]
To obtain an upper bound $K$ for the number of inner iterations between two successive updates we need a few more technical lemmas.
The following lemma is taken from Proposition 1.3.2 in [13]. Its relevance is due to the fact that the barrier function values between two successive updates of $\mu$ yield a decreasing sequence of positive numbers. We denote this sequence by $\Psi_0, \Psi_1, \dots$

Lemma 15. Let $t_0, t_1, \dots, t_K$ be a sequence of positive numbers such that
\[
t_{k+1} \le t_k - \kappa\,t_k^{1-\gamma}, \qquad k = 0, 1, \dots, K - 1,
\]
where $\kappa > 0$ and $0 < \gamma \le 1$. Then $K \le \dfrac{t_0^\gamma}{\kappa\gamma}$.

Lemma 16. If $K$ denotes the number of inner iterations between two successive updates of $\mu$, then
\[
K \le \frac{2(1+p)}{2+p}\,7920\,p\;\Psi_0^{\frac{2+p}{2(1+p)}}.
\]
Proof. The definition of $K$ implies $\Psi_{K-1} > \tau$ and $\Psi_K \le \tau$, and, according to Theorem 13,
\[
\Psi_{k+1} \le \Psi_k - \kappa\,\Psi_k^{1-\gamma}, \qquad k = 0, 1, \dots, K - 1,
\]
with $\kappa = \frac{1}{7920p}$ and $\gamma = \frac{2+p}{2(1+p)}$. Application of Lemma 15 with $t_k = \Psi_k$ yields the desired inequality.
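Lemma 15 can be illustrated by a small simulation (generic $\kappa$, $\gamma$ chosen for speed; the paper's values are $\kappa = 1/(7920p)$ and $\gamma = \frac{2+p}{2(1+p)}$):

```python
# Simulate t_{k+1} = t_k - kappa * t_k^(1-gamma) and compare the observed
# number of steps with the bound t_0^gamma / (kappa * gamma) of Lemma 15.
kappa, gamma, t0 = 0.01, 2.0 / 3.0, 500.0
t, K = t0, 0
while t > 1.0:                           # stop once below the threshold tau = 1
    t -= kappa * t**(1 - gamma)
    K += 1
print(K, t0**gamma / (kappa * gamma))    # observed K stays within the bound
```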
Using $\Psi_0 \le L$, where the number $L$ is as given in (30), and Lemma 16, we obtain the following upper bound on the total number of iterations:
\[
\frac{2(1+p)}{2+p}\,7920\,p\;L^{\frac{2+p}{2(1+p)}}\,\frac{1}{\theta}\log\frac{n}{\epsilon}.
\]
For large-update methods one takes $\tau = O(n)$ and $\theta = \Theta(1)$, and with $p = O(\log n)$ the right-hand side expression is then $O\bigl(\sqrt{n}\,(\log n)\log\frac{n}{\epsilon}\bigr)$; for small-update methods it is $O\bigl(\sqrt{n}\log\frac{n}{\epsilon}\bigr)$.
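To see how the large-update bound arises, the following short computation (a sketch; it assumes the estimate $L = O(n)$, which follows from the quadratic growth term of $\psi$) shows the effect of choosing $p = \log n$:
\[
L^{\frac{2+p}{2(1+p)}} = O\Bigl(n^{\frac{2+p}{2(1+p)}}\Bigr)
 = O\Bigl(\sqrt{n}\;n^{\frac{1}{2(1+p)}}\Bigr), \qquad
 n^{\frac{1}{2(1+p)}} = e^{\frac{\log n}{2(1+p)}} \le e^{\frac12} = O(1)
 \quad \text{for } p \ge \log n,
\]
so the total bound above becomes
\[
O\Bigl(p\,\sqrt{n}\,\log\frac{n}{\epsilon}\Bigr)
 = O\Bigl(\sqrt{n}\,(\log n)\,\log\frac{n}{\epsilon}\Bigr)
 \qquad \text{for } p = \log n,\ \theta = \Theta(1).
\]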

Concluding Remarks
In this paper we extended the results obtained for kernel-function-based IPMs in [5] from LO to semidefinite optimization problems. The analysis in this paper is new and differs from the one used for LO; several new tools and techniques were derived. The proposed function has a trigonometric barrier term, yet it is neither logarithmic nor self-regular.
For this parametric kernel function, we have shown that the best known iteration bounds for large-update and small-update methods can be achieved, namely $O\bigl(\sqrt{n}\,(\log n)\log\frac{n}{\epsilon}\bigr)$ for large-update methods and $O\bigl(\sqrt{n}\log\frac{n}{\epsilon}\bigr)$ for small-update methods.