The Modified BAPGs Method for Support Vector Machine Classifier with Truncated Loss

Abstract

In this paper, we modify the Bregman APGs (BAPGs) method proposed by Wang et al. for solving the support vector machine problem with truncated loss (HTPSVM) introduced by Zhu et al., and we add an adaptive parameter selection technique based on Ren et al. In each iteration, we use a linear approximation to obtain an explicit solution of the subproblem, and we choose a suitable kernel function to define the Bregman distance. Finally, numerical experiments are performed to verify the efficiency of the modified BAPGs method.


1. Introduction

SVM (Support Vector Machine) [1] is a supervised learning algorithm commonly used for classification tasks and has been successfully applied in many technological fields, such as text categorization [2], financial forecasting [3], image classification [4] and so on. This paper focuses on a binary classification problem. Given training samples $\{(x_i, y_i), i = 1, \dots, m\}$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, 1\}$, the objective of SVM is to identify an optimal separating hyperplane that separates the data points into two classes. Scholars have proposed several classic SVM models based on convex loss functions, such as the hinge loss (also called the L1 loss) in the classic SVM [5], the least squares loss in LSSVM [6] and the huberized pinball loss in HPSVM [7]. In practice, however, real datasets often contain noise. Since convex loss functions are generally unbounded, they are highly sensitive to outliers and easily influenced by them. Therefore, nonconvex loss functions have been proposed to improve robustness over convex losses [8]. For example, [9] proposed the ramp loss based on the hinge loss, and the truncated pinball loss was proposed in [10]. Recently, a noise-insensitive and robust support vector machine classifier with the huberized truncated pinball loss (HTPSVM) was proposed in [11]; this loss is a smooth, nonconvex loss function. The HTPSVM can be written in the "Loss + Penalty" format, in which the penalty is a hybrid of the $\ell_1$ norm and $\ell_2$ norm penalties.

Here, the HTPSVM model and the algorithm of [11] are briefly introduced. Consider a classification problem with training samples $\{x_i, y_i\}_{i=1}^m \subset \mathbb{R}^d \times \{-1, 1\}$. The HTPSVM seeks to solve the following regularization problem:

$$\min_{b\in\mathbb{R},\, w\in\mathbb{R}^d}\ \frac{1}{m}\sum_{i=1}^{m} l_{htp}\big(y_i(b + w^T x_i)\big) + \lambda\|w\|_1 + \frac{\|w\|_2^2}{2} + \frac{b^2}{2}, \qquad (1)$$

where the huberized truncated pinball loss function $l_{htp}(\cdot)$ is defined as

$$l_{htp}(u) = \begin{cases} 1, & u \le -\frac{2}{5},\\[2pt] \frac{4}{5} - u - \frac{5}{4}u^2, & -\frac{2}{5} < u \le 0,\\[2pt] \frac{4}{5} - u, & 0 < u \le \frac{3}{5},\\[2pt] \frac{5}{4}(1-u)^2, & \frac{3}{5} < u \le 1,\\[2pt] \frac{5}{8}(1-u)^2, & 1 \le u < \frac{7}{5},\\[2pt] \frac{1}{2}\left(u - \frac{6}{5}\right), & \frac{7}{5} \le u < \frac{8}{5},\\[2pt] \frac{1}{2}\left(u - \frac{6}{5}\right) - \frac{5}{8}\left(u - \frac{8}{5}\right)^2, & \frac{8}{5} \le u < 2,\\[2pt] \frac{3}{10}, & u \ge 2, \end{cases} \qquad (2)$$

which is a nonconvex and smooth function. The HTPSVM combines the benefits of both the $\ell_1$ and $\ell_2$ norm regularizers, and it has been demonstrated in [11] that it can reduce the effect of noise in the training samples. Therefore, we consider the HTPSVM model worth studying. The APG algorithm was used to solve the model in [11], and [12] applied the APGs method (first proposed in [13]) to problem (1) and obtained better convergence behavior. However, we find that the proximal operator associated with the $\ell_1$ norm causes the subproblem to be solved slowly in the APG and APGs algorithms, so we attempt to accelerate the solution process for this model. Recently, [14] proposed the Bregman APGs (BAPGs) method, which avoids the restrictive global Lipschitz gradient continuity assumption. In this paper, we adapt the BAPGs algorithm to problem (1), replacing the Lipschitz constant by an appropriate positive definite matrix, and we obtain better results in numerical experiments on 10 datasets.
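For concreteness, the short Python sketch below evaluates the loss $l_{htp}$ in (2) at a few points; it is only an illustration of the piecewise formula, and the function name is ours rather than taken from the cited implementations.

```python
import numpy as np

def l_htp(u):
    """Huberized truncated pinball loss, following the piecewise formula (2)."""
    u = np.asarray(u, dtype=float)
    conds = [u <= -2/5,
             (u > -2/5) & (u <= 0),
             (u > 0) & (u <= 3/5),
             (u > 3/5) & (u <= 1),
             (u > 1) & (u < 7/5),
             (u >= 7/5) & (u < 8/5),
             (u >= 8/5) & (u < 2),
             u >= 2]
    vals = [1.0,
            4/5 - u - 5/4 * u**2,
            4/5 - u,
            5/4 * (1 - u)**2,
            5/8 * (1 - u)**2,
            1/2 * (u - 6/5),
            1/2 * (u - 6/5) - 5/8 * (u - 8/5)**2,
            3/10]
    return np.select(conds, vals)

# The loss is bounded: it equals 1 on the left tail and 3/10 on the right tail.
print(l_htp([-1.0, 0.0, 1.0, 3.0]))   # [1.  0.8  0.  0.3]
```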

The rest of this paper is organized as follows. In the next section, we provide the preliminary material used in this work. In Section 3, we introduce the BAPGs algorithm proposed in [14] and present our algorithm, based on the BAPGs method, for solving the HTPSVM model (1); the convergence of our method is also discussed. Section 4 reports numerical experiments.

2. Preliminaries

In this paper, we let $\mathbb{R}$ denote the set of real numbers. We work in the Euclidean space $\mathbb{R}^n$, and the standard Euclidean inner product and the induced norm on $\mathbb{R}^n$ are denoted by $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$. The domain of a function $f:\mathbb{R}^n\to(-\infty,+\infty]$ is defined by $\mathrm{dom}\, f = \{x\in\mathbb{R}^n : f(x) < +\infty\}$. We say that $f$ is proper if $\mathrm{dom}\, f \neq \emptyset$. A proper function $f$ is said to be closed if it is lower semicontinuous at any $x\in\mathrm{dom}\, f$, i.e. $f(x) \le \liminf_{z\to x} f(z)$.

Definition 1. [[15], Definition 8.3] For a proper closed function $f:\mathbb{R}^n\to\mathbb{R}\cup\{+\infty\}$, the regular subdifferential of $f$ at $x\in\mathrm{dom}\, f$ is defined by

$$\hat{\partial} f(x) := \left\{ \hat{x}\in\mathbb{R}^n : \liminf_{z\to x,\, z\neq x} \frac{f(z) - f(x) - \langle \hat{x}, z - x\rangle}{\|z - x\|} \ge 0 \right\}. \qquad (3)$$

The (general) subdifferential of $f$ at $x\in\mathrm{dom}\, f$ is defined by

$$\partial f(x) := \left\{ \hat{x} : \exists\, x^k \xrightarrow{f} x,\ \hat{x}^k \to \hat{x} \text{ with } \hat{x}^k \in \hat{\partial} f(x^k) \text{ for each } k \right\}, \qquad (4)$$

where $x^k \xrightarrow{f} x$ means both $x^k \to x$ and $f(x^k)\to f(x)$. Note that if $f$ is also convex, then the general subdifferential and the regular subdifferential of $f$ at $x\in\mathrm{dom}\, f$ reduce to the classical subdifferential [[15], Proposition 8.12], that is,
$$\partial f(x) = \{\hat{x} : f(y) \ge f(x) + \langle \hat{x}, y - x\rangle \text{ for all } y\}. \qquad (5)$$

Definition 2. (Kernel Generating Distances and Bregman Distances [16] [17] [18]) Let $C$ be a nonempty, convex and open subset of $\mathbb{R}^n$. Associated with $C$, a function $\phi:\mathbb{R}^n\to(-\infty,+\infty]$ is called a kernel generating distance if it satisfies the following:

1) $\phi$ is proper, lower semicontinuous and convex, with $\mathrm{dom}\,\phi \subset \overline{C}$ and $\mathrm{dom}\,\partial\phi = C$;

2) $\phi$ is continuously differentiable on $\mathrm{int}\,\mathrm{dom}\,\phi = C$.

We denote the class of kernel generating distances by $\mathcal{G}(C)$. Given $\phi\in\mathcal{G}(C)$, the Bregman distance $D_\phi : \mathrm{dom}\,\phi \times \mathrm{int}\,\mathrm{dom}\,\phi \to [0, +\infty)$ is defined by

$$D_\phi(x, y) := \phi(x) - \phi(y) - \langle \nabla\phi(y), x - y\rangle.$$

For example, when $\phi(x) = \|x\|^2$, then $D_\phi(x,y) = \|x - y\|^2$; if $\phi(x) = x^T A x$, then $D_\phi(x,y) = (x - y)^T A (x - y)$. In this article, the gradient Lipschitz continuity condition on the function $f$ is no longer required; instead, it is replaced by the $L$-smooth adaptability of the pair $(f,\phi)$, defined as follows.

Definition 3. Let $\phi\in\mathcal{G}(C)$, and let $f:\mathbb{R}^n\to(-\infty,+\infty]$ be a proper and lower semicontinuous function with $\mathrm{dom}\,\phi\subset\mathrm{dom}\, f$ which is continuously differentiable on $C = \mathrm{int}\,\mathrm{dom}\,\phi$. The pair $(f,\phi)$ is called $L$-smooth adaptable ($L$-smad) on $C$ if there exists $L > 0$ such that $L\phi - f$ and $L\phi + f$ are convex on $C$.

Lemma 1. (Full Extended Descent Lemma [19]) A pair of functions $(f,\phi)$ is $L$-smad on $C = \mathrm{int}\,\mathrm{dom}\,\phi$ if and only if
$$|f(x) - f(y) - \langle\nabla f(y), x - y\rangle| \le L\, D_\phi(x, y), \quad \forall x, y \in \mathrm{int}\,\mathrm{dom}\,\phi.$$

Definition 4. A function $f:\mathbb{R}^n\to\mathbb{R}\cup\{+\infty\}$ is called $\mu$-relatively weakly convex with respect to $\phi$ on $C$ if there exists $\mu > 0$ such that $f + \mu\phi$ is convex on $C$ [14].
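As a small numerical illustration of these notions (our own example, not taken from the cited references), the quadratic kernel $\phi(x) = \frac{1}{2}x^T Q x$ used later in Section 3 yields the weighted squared distance $D_\phi(x,y) = \frac{1}{2}(x-y)^T Q (x-y)$:

```python
import numpy as np

def bregman_quadratic(x, y, Q):
    """D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y> for phi(x) = 0.5 x^T Q x."""
    phi = lambda v: 0.5 * v @ Q @ v
    grad_phi = lambda v: Q @ v
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
Q = A @ A.T                                  # an arbitrary positive semidefinite matrix
x, y = rng.standard_normal(3), rng.standard_normal(3)
# For this quadratic kernel the Bregman distance is a Q-weighted squared distance.
print(np.isclose(bregman_quadratic(x, y, Q), 0.5 * (x - y) @ Q @ (x - y)))   # True
```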

3. The Modified BAPGs Method for HTPSVM

In this section, we first describe the BAPGs method proposed in [14]; then the modified BAPGs method with an adaptive parameter is given for the HTPSVM.

3.1. BAPGs Method

Consider the following optimization problem:

$$\min_{x\in\mathbb{R}^n} F(x) := f(x) + P_1(x) - P_2(x), \qquad (6)$$

where $f$ is a $\mu$-relatively weakly convex (with respect to $\phi$), continuously differentiable function, $P_1$ is a proper, lower semicontinuous convex function, and $P_2$ is continuous and convex. Besides, $F$ is level-bounded, i.e., for every $\alpha\in\mathbb{R}$ the set $\{x\in\mathbb{R}^n \mid F(x)\le\alpha\}$ is bounded, and $F$ is bounded below, i.e., $\inf_{x\in\mathbb{R}^n} F(x) > -\infty$. The iterative scheme of BAPGs [14] for solving problem (6) is shown in Algorithm 1, where $D_\phi$ is a Bregman distance as defined in Section 2.

We see that when $D_\phi(x,y) = \frac{1}{2}\|x - y\|^2$, BAPGs reduces to the APGs method in [13]. In [14], the global convergence of the iterates generated by BAPGs to a limiting critical point was proved under some assumptions.

3.2. Adaptive BAPGs Method for HTPSVM

By writing the nonconvex loss $l_{htp}$ as the difference of three smooth convex functions, problem (1) can be expressed, following [12], as

$$\min_{b\in\mathbb{R},\, w\in\mathbb{R}^d} F(b,w) = f_1(b,w) - f_2(b,w) - f_3(b,w) + P_1(b,w), \qquad (8)$$

where $P_1(b,w) = \lambda\|w\|_1 + \frac{\|w\|_2^2}{2} + \frac{b^2}{2}$, $\lambda$ is the regularization parameter, $f_j(b,w) = \frac{1}{m}\sum_{i=1}^m l_j[y_i(b + w^T x_i)]$ for $j = 1, 2, 3$, and the smooth convex functions $l_j$ are defined as

$$l_1(u) = \begin{cases} \frac{4}{5} - u, & \text{if } u < \frac{3}{5},\\[2pt] \frac{5}{4}(1-u)^2, & \text{if } \frac{3}{5} \le u < 1,\\[2pt] \frac{5}{8}(1-u)^2, & \text{if } 1 \le u < \frac{7}{5},\\[2pt] \frac{1}{2}\left(u - \frac{6}{5}\right), & \text{if } u \ge \frac{7}{5}, \end{cases} \qquad (9)$$

$$l_2(u) = \begin{cases} -u - \frac{1}{5}, & \text{if } u \le -\frac{2}{5},\\[2pt] \frac{5}{4}u^2, & \text{if } -\frac{2}{5} < u \le 0,\\[2pt] 0, & \text{if } u > 0, \end{cases} \qquad (10)$$

$$l_3(u) = \begin{cases} 0, & \text{if } u < \frac{8}{5},\\[2pt] \frac{5}{8}\left(u - \frac{8}{5}\right)^2, & \text{if } \frac{8}{5} \le u < 2,\\[2pt] \frac{1}{2}\left(u - \frac{9}{5}\right), & \text{if } u \ge 2. \end{cases} \qquad (11)$$
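A short sketch (our own check, assuming the piecewise forms in (9)-(11)) verifies numerically that each $l_j$ is convex and that the difference $l_1 - l_2 - l_3$ has the bounded tails of the truncated loss (2), namely the constants 1 and 3/10:

```python
import numpy as np

def l1(u):
    return np.where(u < 3/5, 4/5 - u,
           np.where(u < 1, 5/4 * (1 - u)**2,
           np.where(u < 7/5, 5/8 * (1 - u)**2, 1/2 * (u - 6/5))))

def l2(u):
    return np.where(u <= -2/5, -u - 1/5, np.where(u <= 0, 5/4 * u**2, 0.0))

def l3(u):
    return np.where(u < 8/5, 0.0, np.where(u < 2, 5/8 * (u - 8/5)**2, 1/2 * (u - 9/5)))

u = np.linspace(-3, 4, 2001)
htp = l1(u) - l2(u) - l3(u)                 # DC decomposition of the loss in (2)
print(np.allclose(htp[u <= -2/5], 1.0))     # left tail of (2)
print(np.allclose(htp[u >= 2], 3/10))       # right tail of (2)
h = u[1] - u[0]
for lj in (l1, l2, l3):                     # convexity: second differences are nonnegative
    print(np.diff(lj(u), 2).min() / h**2 >= -1e-8)
```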

Then we can apply BAPGs to solve problem (8) in the form of (6) with either of the following two splitting schemes:

Scheme $\mathcal{P}_1$: $f = f_1 - f_3$ (nonconvex), $P_2 = f_2$ (convex);

Scheme $\mathcal{P}_2$: $f = f_1 - f_2$ (nonconvex), $P_2 = f_3$ (convex).

Next, we briefly illustrate that problem (8) can be solved by the BAPGs method of [14].

Theorem 1. Let $f$ be as defined in $\mathcal{P}_1$ and $\mathcal{P}_2$. Set $\phi(x) := \frac{1}{2}x^T Q x$, where $Q = \frac{1}{m}\sum_{i=1}^m \frac{5}{2} Q_i$ and $Q_i = (y_i, y_i x_i^T)^T (y_i, y_i x_i^T)$. Then the pair $(f,\phi)$ is $L$-smooth adaptable on $\mathbb{R}^{d+1}$ with $L = 1$.

Proof. Firstly, for $\mathcal{P}_1$, write $l_{13} := l_1 - l_3$. Since

$$l_{13}'(u) + \frac{5}{2}u = \begin{cases} \frac{5}{2}u - 1, & u \le \frac{3}{5},\\[2pt] 5u - \frac{5}{2}, & \frac{3}{5} < u \le 1,\\[2pt] \frac{15}{4}u - \frac{5}{4}, & 1 \le u < \frac{7}{5},\\[2pt] \frac{5}{2}u + \frac{1}{2}, & \frac{7}{5} \le u < \frac{8}{5},\\[2pt] \frac{5}{4}u + \frac{5}{2}, & \frac{8}{5} \le u < 2,\\[2pt] \frac{5}{2}u, & u \ge 2, \end{cases} \qquad (12)$$

and

$$\frac{5}{2}u - l_{13}'(u) = \begin{cases} \frac{5}{2}u + 1, & u \le \frac{3}{5},\\[2pt] \frac{5}{2}, & \frac{3}{5} < u \le 1,\\[2pt] \frac{5}{4}u + \frac{5}{4}, & 1 < u \le \frac{7}{5},\\[2pt] \frac{5}{2}u - \frac{1}{2}, & \frac{7}{5} \le u < \frac{8}{5},\\[2pt] \frac{15}{4}u - \frac{5}{2}, & \frac{8}{5} \le u < 2,\\[2pt] \frac{5}{2}u, & u \ge 2, \end{cases} \qquad (13)$$

are monotonically increasing, it is easy to verify that $l_{13}(u) + \frac{5}{4}u^2$ and $\frac{5}{4}u^2 - l_{13}(u)$ are convex. Then, writing $x := (b; w)$, we can easily get the convexity of

$$\begin{aligned} f(b,w) + \frac{1}{2}x^T Q x &= \frac{1}{m}\sum_{i=1}^m l_{13}[y_i(b + w^T x_i)] + \frac{1}{2m}\sum_{i=1}^m \frac{5}{2}(b;w)^T Q_i (b;w)\\ &= \frac{1}{m}\sum_{i=1}^m \Big[ l_{13}[y_i(b + w^T x_i)] + \frac{5}{4}(b;w)^T Q_i (b;w) \Big]\\ &= \frac{1}{m}\sum_{i=1}^m \Big[ l_{13}[y_i(b + w^T x_i)] + \frac{5}{4}\big[y_i(b + w^T x_i)\big]^2 \Big], \end{aligned} \qquad (14)$$

and

$$\frac{1}{2}x^T Q x - f(b,w) = \frac{1}{m}\sum_{i=1}^m \Big[ \frac{5}{4}\big[y_i(b + w^T x_i)\big]^2 - l_{13}[y_i(b + w^T x_i)] \Big]. \qquad (15)$$

The proof for $\mathcal{P}_2$ is similar. It is clear that $(f,\phi)$ is 1-smooth adaptable on $\mathbb{R}^{d+1}$, which further implies that there exists $0 < \mu \le 1$ such that $f + \mu\phi$ is convex.
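The matrix $Q$ of Theorem 1 is easy to assemble from the training data. A minimal sketch (with illustrative names and random stand-in data) builds it and checks that it is positive semidefinite:

```python
import numpy as np

def build_Q(X, y):
    """Q = (1/m) * sum_i (5/2) * Q_i with Q_i = (y_i, y_i x_i^T)^T (y_i, y_i x_i^T), as in Theorem 1."""
    m = X.shape[0]
    Z = y[:, None] * np.hstack([np.ones((m, 1)), X])   # row i is (y_i, y_i x_i^T)
    return (5 / (2 * m)) * (Z.T @ Z)                    # (d+1) x (d+1), symmetric

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))                        # 50 samples, 4 features (stand-in data)
y = rng.choice([-1.0, 1.0], size=50)
Q = build_Q(X, y)
print(Q.shape, np.all(np.linalg.eigvalsh(Q) >= -1e-10))   # (5, 5) True
```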

We can see that problem (8) satisfies the conditions required in [14] with $\phi(x) = \frac{1}{2}x^T Q x$ for the pair $(f,\phi)$, where $Q$ is defined as in Theorem 1. Therefore the BAPGs method (Algorithm 1), in which we take $\tau = 1$ and replace (7) with the following steps, can be used for solving (8):

$$\begin{aligned} y_k &= \theta_k z_k + (1 - \theta_k) x_k,\\ z_{k+1} &= \arg\min_{z\in\mathbb{R}^n} \Big\{ \langle \nabla f(y_k) - \xi_k,\, z - y_k\rangle + P_1(z) + \frac{\theta_k}{2}\big[(z - z_k)^T Q (z - z_k)\big] \Big\},\\ x_{k+1} &= \theta_k z_{k+1} + (1 - \theta_k) x_k, \end{aligned} \qquad (16)$$
where $\xi_k \in \partial P_2(y_k)$.

The parameter sequence $\{\theta_k\}$ is selected in [14] as follows: for a fixed positive integer $N$, let $\theta_0 = 1$,

$$\theta_{k+1} = \frac{\sqrt{\theta_k^4 + 4\theta_k^2} - \theta_k^2}{2}, \quad k = 1, 2, \dots, N,$$

and $\theta_k \equiv \theta_N$ for all $k > N$. However, it is not easy to determine a suitable value of the positive integer $N$ in advance. We therefore combine BAPGs with the adaptive parameter selection criterion proposed in [12]: let $\theta_0 = 1$, $\theta_k = \frac{\sqrt{\theta_{k-1}^4 + 4\theta_{k-1}^2} - \theta_{k-1}^2}{2}$ for $k \ge 1$, and compute

$$d_k := \frac{H_{k-1} - H_k}{(x_k - x_{k-1})^T (x_k - x_{k-1})} \qquad (17)$$

for $k \ge 2$, where $H_k := F(x_k) + \frac{\beta_k}{2}(x_k - x_{k-1})^T Q (x_k - x_{k-1})$ and $\beta_k = \alpha_k \theta_{k-1}^2$ (the assumption on the sequence $\{\alpha_k\}$ is given in [14, Assumption 2]). Let $N$ be the first $k$ satisfying $d_k \le d_{k+1}$. The BAPGs algorithm with this adaptive parameter for problem (8) (HTPSVM) is shown in Algorithm 2.
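The parameter rule itself is simple to implement; the sketch below shows the $\theta_k$ recursion and the scan for the first index with $d_k \le d_{k+1}$, assuming the formulas as stated above (the evaluation of $H_k$ and the choice of $\{\alpha_k\}$ are omitted and would follow [12] and [14]).

```python
import numpy as np

def theta_next(theta):
    """theta_{k+1} = (sqrt(theta_k^4 + 4*theta_k^2) - theta_k^2) / 2, starting from theta_0 = 1."""
    return (np.sqrt(theta**4 + 4 * theta**2) - theta**2) / 2

def first_adaptive_N(d):
    """First index k with d[k] <= d[k+1]; d is assumed to hold the ratios d_k from (17)."""
    for k in range(len(d) - 1):
        if d[k] <= d[k + 1]:
            return k
    return None                              # criterion never triggered

thetas = [1.0]
for _ in range(5):                           # theta_k decreases toward 0, driving the extrapolation in (16)
    thetas.append(theta_next(thetas[-1]))
print(np.round(thetas, 4))                   # [1.  0.618  0.4559  0.3637  0.3035  0.2609]
```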

4. Numerical Results

In this section, we aim to show the performance of Algorithm 2 for solving problem (1) by using MATLAB R2020b on a 64-bit PC with an Intel(R) Core(TM) i7-10870H CPU (2.20GHz) and 16GB of RAM.

First, consider the optimality condition of the subproblem (19) in Algorithm 2:

$$0 \in \nabla f(y_k) - \xi_k + \partial P_1(z_{k+1}) + \theta Q(z_{k+1} - z_k) = \nabla f(y_k) - \xi_k + \lambda\,\partial\|z_{k+1}\|_1 + z_{k+1} + \theta Q(z_{k+1} - z_k).$$

Since this subproblem has no explicit solution, we replace the $\ell_1$ norm by its linear approximation, that is, $\|z\|_1 \approx \|x_k\|_1 + v_k^T(z - x_k)$, where $v_k \in \partial\|x_k\|_1$ (here we take $v_k := \mathrm{sign}(x_k)$). We then construct a new iteration step to replace the subproblem in Algorithm 2:

$$z_{k+1} = \arg\min_{z\in\mathbb{R}^n} \Big\{ \langle \nabla f(y_k) - \xi_k,\, z - y_k\rangle + \lambda\big[\|x_k\|_1 + v_k^T(z - x_k)\big] + \frac{1}{2}z^T z + \frac{\theta}{2}\big[(z - z_k)^T Q(z - z_k)\big] \Big\}, \qquad (20)$$

it is easy to calculate its solution:

$$0 = \nabla f(y_k) - \xi_k + \lambda v_k + z_{k+1} + \theta Q(z_{k+1} - z_k),$$

which means

$$(I + \theta Q)\, z_{k+1} = \theta Q z_k - \nabla f(y_k) + \xi_k - \lambda v_k.$$

Then the updates (18) and (19) in Algorithm 2 are replaced by

$$\begin{cases} z_{k+1} = (I + \theta_k Q)^{-1}\big(\theta_k Q z_k - \nabla f(y_k) + \xi_k - \lambda v_k\big),\\[3pt] z_{k+1} = (I + \theta Q)^{-1}\big(\theta Q z_k - \nabla f(y_k) + \xi_k - \lambda v_k\big), \end{cases} \qquad (21)$$
in the experiments, where $v_k = \mathrm{sign}(x_k)$ and the two lines replace (18) and (19), respectively.
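Each update in (21) is a single linear solve with the fixed matrix $Q$. A minimal sketch of one such step (with illustrative names, random stand-in quantities, and $\xi_k$ supplied by the caller as in Algorithm 2) is:

```python
import numpy as np

def z_update(z_k, x_k, grad_f_yk, xi_k, theta, Q, lam):
    """Solve (I + theta*Q) z = theta*Q z_k - grad f(y_k) + xi_k - lam*v_k with v_k = sign(x_k), as in (21)."""
    v_k = np.sign(x_k)
    rhs = theta * (Q @ z_k) - grad_f_yk + xi_k - lam * v_k
    return np.linalg.solve(np.eye(Q.shape[0]) + theta * Q, rhs)   # avoids forming the inverse explicitly

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
Q = A @ A.T / n                               # stand-in for the matrix Q of Theorem 1
z_k, x_k = rng.standard_normal(n), rng.standard_normal(n)
g, xi = rng.standard_normal(n), rng.standard_normal(n)
print(z_update(z_k, x_k, g, xi, theta=0.5, Q=Q, lam=1e-3))
```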

The experiments are conducted on several real-world datasets. We select 10 datasets from UCI [20] to compare Algorithm 2 with APG (the method in [11]), APGs [12] and GIST [21], where in GIST we set $F = f + P_1$ with $f = f_1 - f_2 - f_3$. The corresponding parameters of these methods are set the same as in [12]. For each dataset, 21 initial points are used commonly by all methods: one zero vector, and 5 vectors drawn independently from $N(0, \sigma^2 I)$ for each $\sigma \in \{1, 2, 4, 8\}$. All algorithms stop if $\frac{\|(b_{k+1}; w_{k+1}) - (b_k; w_k)\|}{\max\{1, \|(b_k; w_k)\|\}} < 10^{-6}$ or the number of iterations reaches 3000. The average results are given in Table 1 and Table 2, including the number of iterations (iter), the objective function value (fval) and the CPU time in seconds (CPU) at termination, with $\lambda = 1\times 10^{-3}$ and $5\times 10^{-4}$, where BAPGs-$\mathcal{P}_1$ and APGs-$\mathcal{P}_1$ denote applying BAPGs (Algorithm 2) and APGs [12] to the splitting $\mathcal{P}_1$, respectively ($\mathcal{P}_1$ is described in Section 3.2).

Table 1. Comparison on 10 datasets with λ = 1 × 10 3 .

Table 2. Comparison on 10 datasets with λ = 5 × 10 4 .

From the above tables, we see that Algorithm 2 applied to $\mathcal{P}_2$ always obtains smaller function values and converges faster than the other methods, which indicates that Algorithm 2 performs well for solving the HTPSVM model (1).

5. Conclusions and Suggestions

In this paper, based on the BAPGs method proposed in [14], we construct a modified BAPGs method with the adaptive parameter selection technique introduced in [12] for solving the HTPSVM model. A linear approximation is used to simplify the subproblem in the algorithm, and a kernel function $\phi$ with a suitable matrix $Q$ is chosen to obtain the $L$-smad property. Finally, numerical experiments show that our algorithm converges faster.

Conflicts of Interest

The author declares no conflicts of interest regarding the publication of this paper.

References

[1] Cortes, C. and Vapnik, V. (1995) Support-Vector Networks. Machine Learning, 20, 273-297.
https://doi.org/10.1007/BF00994018
[2] Joachims, T. (1998) Text Categorization with Support Vector Machines: Learning with Many Relevant Features. 10th European Conference on Machine Learning, Chemnitz, 21-23 April 1998, 137-142.
https://doi.org/10.1007/BFb0026683
[3] Zhang, X., Li, A. and Pan, R. (2016) Stock Trend Prediction Based on a New Status Box Method and AdaBoost Probabilistic Support Vector Machine. Applied Soft Computing, 49, 385-398.
https://doi.org/10.1016/j.asoc.2016.08.026
[4] Chandra, M. and Bedi, S. (2021) Survey on SVM and Their Application in Image Classification. International Journal of Information Technology, 13, 1-11.
https://doi.org/10.1007/s41870-017-0080-1
[5] Fan, R.-E., Chang, K.-W., et al. (2008) LIBLINEAR: A Library for Large Linear Classification. The Journal of Machine Learning Research, 9, 1871-1874.
[6] Suykens, J. and Vandewalle, J. (1999) Least Squares Support Vector Machine Classifiers. Neural Processing Letters, 9, 293-300.
https://doi.org/10.1023/A:1018628609742
[7] Zhu, W., Song, Y. and Xiao, Y. (2021) Support Vector Machine Classifier with Huberized Pinball Loss. Engineering Applications of Artificial Intelligence, 91, Article 103635.
https://doi.org/10.1016/j.engappai.2020.103635
[8] Zhao, L., Mammadov, M., et al. (2010) From Convex to Nonconvex: A Loss Function Analysis for Binary Classification. 2010 IEEE International Conference on Data Mining Workshops, Sydney, 13 December 2010, 1281-1288.
https://doi.org/10.1109/ICDMW.2010.57
[9] Collobert, R., Sinz, F., et al. (2006) Large Scale Transductive SVMs. Journal of Machine Learning Research, 7, 1687-1712.
[10] Shen, X., Niu, L., Qi, Z. and Tian, Y. (2017) Support Vector Machine Classifier with Truncated Pinball Loss. Pattern Recognition, 68, 199-210.
[11] Zhu, W., Song, Y. and Xiao, Y. (2022) Robust Support Vector Machine Classifier with Truncated Loss Function by Gradient Algorithm. Computers & Industrial Engineering, 172, Article 108630.
[12] Ren, K., Liu, C. and Wang, L. (2024) The Modified Second APG Method for a Class of Nonconvex Nonsmooth Problems. (In Press)
[13] Lin, D. and Liu, C. (2019) The Modified Second APG Method for DC Optimization Problems. Optimization Letters, 13, 805-824.
https://doi.org/10.1007/s11590-018-1280-8
[14] Wang, L., Liu, C. and Ren, K. (2024) The Bregman Modified Second APG Method for DC Optimization Problems. (In Press)
[15] Rockafellar, R. and Wets, R. (2009) Variational Analysis. Springer Science & Business Media, Berlin.
[16] Bolte, J., Sabach, S., Teboulle, M., et al. (2017) First Order Methods beyond Convexity and Lipschitz Gradient Continuity with Applications to Quadratic Inverse Problems. SIAM Journal on Optimization, 28, 2131-2151.
https://doi.org/10.1137/17M1138558
[17] Bregman, L.M. (1967) The Relaxation Method of Finding the Common Point of Convex Sets and Its Application to the Solution of Problems in Convex Programming. USSR Computational Mathematics and Mathematical Physics, 7, 200-217.
https://api.semanticscholar.org/corpusid:121309410
[18] Wu, Z., Li, C., et al. (2021) Inertial Proximal Gradient Methods with Bregman Regularization for a Class of Nonconvex Optimization Problems. Journal of Global Optimization, 79, 617-644.
https://doi.org/10.1007/s10898-020-00943-7
[19] Bauschke, H., Bolte, J. and Teboulle, M. (2017) A Descent Lemma beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications. Mathematics of Operations Research, 42, 330-348.
https://doi.org/10.1287/moor.2016.0817
[20] Asuncion, A. and Newman, D. (2007) UCI Machine Learning Repository.
http://archive.ics.uci.edu/ml
[21] Chen, X., Lu, Z., et al. (2016) Penalty Methods for a Class of Non-Lipschitz Optimization Problems. SIAM Journal on Optimization, 26, 1465-1492.
https://doi.org/10.1137/15M1028054
