SEPARATE TRAINING OF HYBRID NEURAL NETWORK

Our research concerns the training of a hybrid neural network. The network consists of local elements, each of which solves its own problem, and a recurrent control element that produces the general solution. The local elements are perceptrons with identical inputs and outputs. It is assumed that the training sample is clustered and that each local element is trained beforehand and independently. The training of the local elements and of the control element is considered in detail. Methods of regret assessment for the control element are shown. A computational experiment on the training of the hybrid neural network is carried out on a recognition problem. As a result of the training, the control element selected the local element with the least loss. AMS Subject Classification: 68T05, 68T42, 62M20, 62M45


Introduction
The training methods applied in this work are widely used for the coordination of expert assessments [1,2] in sequential forecasting problems. The attention of researchers is primarily focused on the development of algorithms that give an estimate close to the best expert. Some successful results of such approaches were presented in the works of Cesa-Bianchi et al. [3], Hazan and Kale [4], and Chiang et al. [5], where sequence prediction based on expert predictions was demonstrated and convex online optimization problems were solved. Alexander Rakhlin and Karthik Sridharan are actively working in the field of online learning [6,7]. Their works not only present theoretical proofs of regret bounds and the choice of the control element model, but also describe practical applications of adaptive online learning methods [8].
Our research is devoted to the development of a method for the separate training of a hybrid neural network. Each neuron or separate group of neurons (a local element) of such a network solves its own task, and the control element produces the general solution. To achieve successful results, it is assumed that the training sample is clustered. Under this training scheme there is no mutual influence of the elements on each other, which usually has a good effect on the training outcomes. Each local element is trained beforehand and independently on its own data set. In this work, the training of the local and control elements is considered in detail, and the results of computational experiments on a recognition problem are presented.

Methodology of Separate Training
The control element is trained online. The task of the control element consists of the coordination of expert assessments. In our case, the experts are the local elements. Training is seen as a game with the environment (Fig. 1).

Training of a Local Element
Figure 1: The scheme of the training of the control element

We will train the local elements on a clustered sample. In our study, each cluster contains noisy images of the first or second class, distinguished by a small shift. Each local element is trained independently of the others on its own cluster. We assume that the convex hulls V^1_i and V^2_i of the two classes are linearly separable. In this case, the operation of the neuron can be represented as the predicate ⌈(W_i, X) ≥ θ_i⌉, where X is the input binary vector, W_i is the vector of weights, and θ_i is the threshold. The problem of local element training is to calculate the vector of weights and the threshold. The most common algorithm in the linearly separable case is the Kozinets algorithm [9]. The learning algorithm calculates the vector of minimal norm in the convex hull of the set of differences v^1_{i,j} − v^2_{i,j}, where v^1_{i,j} and v^2_{i,j} are elements of the sets V^1_i and V^2_i accordingly. The vector of weights is set equal to the minimal vector W*_i, and the threshold θ_i is computed from it. Only two elements of the weight vector change on each iteration of the Kozinets algorithm, so one can expect that the method of projection of the conjugate gradient with a constant step will converge faster. The problem of computing the vector of minimal norm in the convex hull of a set B = {b_1, b_2, ..., b_r} is equivalent to the problem

min ||G a||^2, a_j ≥ 0, sum_j a_j = 1, (1)

where the columns of the matrix G coincide with the elements of the set B. At each iteration of the learning algorithm the conjugate gradient g is computed, the maximum is calculated over the coordinates, and the step λ is found as the solution of a one-dimensional equation (2).
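The minimal-norm computation above can be sketched in NumPy. This is a simple Kozinets-style iteration over the hull elements under our own reading of the method, without the conjugate-gradient speed-up; the function name and stopping rule are assumptions for illustration:

```python
import numpy as np

def min_norm_in_hull(B, iters=200):
    """Find the vector of minimal norm in the convex hull of the rows of B
    by repeatedly stepping toward the most 'opposed' hull element."""
    w = B[0].astype(float).copy()
    for _ in range(iters):
        b = B[np.argmin(B @ w)]            # hull element with smallest (w, b)
        d = b - w
        denom = d @ d
        if denom == 0:
            break
        # nearest point to the origin on the segment [w, b]
        t = np.clip(-(w @ d) / denom, 0.0, 1.0)
        if t == 0:
            break
        w = w + t * d
    return w

# example: the min-norm point of the hull of (1,0) and (0,1) is (0.5, 0.5)
B = np.array([[1.0, 0.0], [0.0, 1.0]])
w_star = min_norm_in_hull(B)
```

Applied to the set of differences v^1_{i,j} − v^2_{i,j}, the returned vector plays the role of W*_i.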

Training of a Control Element
We call Y the space of the environment choices and Z the space of outputs of the local and control elements. The loss function of the hybrid neural network when interacting with the environment is defined on the Cartesian product as l : Z × Y → R+. It has the standard properties of a loss function: for example, it is convex with respect to the first argument and bounded.
For any t and for each i the cumulated losses can be calculated as L_{i,t} = sum_{s=1..t} l(z_{i,s}, y_s), where y_1, ..., y_n are the choices of the environment and z_{i,1}, ..., z_{i,n} are the outputs of the i-th local element. The accumulated losses of the hybrid network over the same period are L_t = sum_{s=1..t} l(z_s, y_s). The interaction of the hybrid neural network with the environment is as follows: at a discrete moment of local time t the outputs of the local elements z_{i,t+1} and the output of the control element z_{t+1} are calculated.
After that, the environment makes a choice y_{t+1}. In calculating z_{t+1} the control element uses the outputs of the local elements and information about the quality of their functioning. The regret is D_n = L_n − min_i L_{i,n}. The algorithm of control element functioning is considered successful if lim_{n→∞} D_n/n = 0. Successful algorithms include weighted estimation and random selection.
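As a toy numeric illustration of these quantities; the squared loss and a plain averaging control output are assumed stand-ins here, not the paper's choices:

```python
import numpy as np

# cumulative losses of M = 3 local elements and the network regret D_t,
# with the squared loss l(z, y) = (z - y)**2 as an assumed concrete choice
rng = np.random.default_rng(0)
y = rng.random(50)                        # environment choices y_1..y_n
Z = rng.random((3, 50))                   # outputs of the 3 local elements

L_local = np.cumsum((Z - y) ** 2, axis=1)    # L_{i,t} for every i and t
z_net = Z.mean(axis=0)                       # stand-in control output
L_net = np.cumsum((z_net - y) ** 2)          # accumulated network loss L_t
regret = L_net - L_local.min(axis=0)         # D_t = L_t - min_i L_{i,t}
```

A successful control algorithm is one for which `regret[t] / t` tends to zero as t grows.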
The weighted estimation is calculated as the average of the local element outputs:

z_{t+1} = sum_i p_{i,t} z_{i,t+1}. (3)

It is assumed here that the set Z is a convex set. Exponential weighting is considered good:

p_{i,t} = exp(−h L_{i,t}) / sum_j exp(−h L_{j,t}). (4)

The output of the control element with random selection is

z_{t+1} = z_{ξ_t, t+1}. (5)

In (5) ξ_t are independent random variables, ξ_t ∈ {1, ..., M} and P(ξ_t = i) = p_{i,t}. The calculation of the probabilities used in both the first and the second method can be organized in a form more suitable for the control element. The corresponding recurrence equations have the following form:

u_{i,t+1} = u_{i,t} exp(−h l(z_{i,t+1}, y_{t+1})), p_{i,t} = u_{i,t} / sum_j u_{j,t}. (6)
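A minimal sketch of the weighted estimation, the random selection, and the recursive weight update, assuming the squared loss as a concrete choice of l (function names are our own):

```python
import numpy as np

def ewa_step(u, z_experts, rng):
    """One control-element output: probabilities from the running weights,
    the weighted estimation, and random selection of a local element."""
    p = u / u.sum()
    z_weighted = p @ z_experts                      # weighted average
    z_random = z_experts[rng.choice(len(u), p=p)]   # random selection
    return p, z_weighted, z_random

def ewa_update(u, z_experts, y, h):
    """Recursive weight update u_{i,t+1} = u_{i,t} * exp(-h * l(z_i, y)),
    here with the squared loss as an assumed concrete l."""
    return u * np.exp(-h * (z_experts - y) ** 2)

rng = np.random.default_rng(0)
u = np.ones(3)                    # initial weights of M = 3 local elements
z = np.array([0.1, 0.5, 0.9])     # local element outputs at step t
p, z_w, z_r = ewa_step(u, z, rng)
u_next = ewa_update(u, z, y=0.1, h=0.5)
```

With equal starting weights the weighted estimation is the plain average; after the update, the element closest to the environment's choice keeps the largest weight.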

Regret Assessment
A very important point is the choice of the value of the parameter h. Different values for the parameter are proposed in [2]. If h = sqrt(8 ln M / T), where it is assumed that n ≤ T and l(z, y) ∈ [0, 1], and M is the number of local elements, then the regret assessment is

D_n ≤ sqrt((T/2) ln M). (7)

The advantage of the assessment (7), as well as of all subsequent ones, is its independence from the choices of the environment. The disadvantage of this choice of the parameter is the dependence on T. It is necessary to choose a large value for T, because in real problems the number of cycles n is unknown beforehand. This leads to an unjustifiably overestimated regret assessment. To get rid of this drawback, it is suggested to restart the algorithm at the points T_j = 2^j. On the interval 2^j, ..., 2^{j+1} − 1 the parameter is h_j = sqrt(8 ln M / 2^j). This choice of the parameter leads to the following regret assessment:

D_n ≤ (sqrt(2) / (sqrt(2) − 1)) sqrt((n/2) ln M). (8)

This assessment does not have the disadvantage of (7) because it does not depend on T. However, the restart of the algorithm is not always justified, because it leads to the loss of information about the behavior of the local elements. Very often the restart occurs at the beginning of training, and then less and less frequently.
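The restart schedule can be sketched as a small helper; this is a sketch of the doubling rule as we read it, with the function name assumed:

```python
import math

def h_doubling(t, M):
    """Parameter h on the doubling schedule: on the interval
    [2**j, 2**(j+1) - 1] the value h_j = sqrt(8 ln M / 2**j) is used."""
    j = int(math.log2(t)) if t >= 1 else 0
    return math.sqrt(8.0 * math.log(M) / 2 ** j)

# h stays constant within an interval and shrinks at each restart point 2**j
hs = [h_doubling(t, 12) for t in range(1, 9)]
```

Restarts occur at t = 1, 2, 4, 8, ..., so they are frequent at the beginning of training and then become rarer, matching the remark above.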
Provided that exp(−h l(z, y)) is a concave function of the first argument, the regret assessment D_n ≤ (ln M)/h is valid, which does not depend on n. Obviously, it is necessary to choose h as large as possible while still guaranteeing the exponential concavity of the loss function. For example, h = 1/(2(d+1)) for d ≥ 2 and h = 1/2 for d = 1. The regret assessment is then

D_n ≤ 2(d+1) ln M. (9)

Comparing the assessments (7) and (8) with (9), we see that the assessment (9) is advantageous for large n, since it does not grow with n. The regret of random selection is calculated as D_n = E[L_n] − min_i L_{i,n}; the expectation is used because the loss of the control element is a random variable. The regret assessments shown above for weighted estimation are also valid for random selection.

Experimental Analysis
In this section, the problem of recognizing two classes of black-and-white images is considered. The images are set on a raster of size S × S. The images contain noise and differ in shift. In addition, the images can be of different sizes and can be rotated about their centers of mass. The control element must recognize the images without preliminary noise suppression or calculation of the shift, size, and rotation. The local elements must recognize images of the same size and orientation. The experiment was implemented in Python using the PIL library for working with raster graphics and NumPy for working with multidimensional arrays. In our experiment, the training of the control element was carried out in accordance with the algorithm shown below.
1. U_0 = (1, ..., 1) // initialize the weight vector of the local elements.
2. While (isData) // continue while the training examples are being submitted:
p_t = U_t / sum(U_t) // calculate the probabilities of selecting the local elements, where the denominator is the element-by-element sum of the local element weights.
Z_t = X_t W* // calculate the outputs of the local elements, where W* is the matrix of trained local element weights.
z_t = z_{ξ_t,t} // calculate the output of the control element by the method of random selection (see formula (5)).
L_t = l(Z_t, y_t) // submit the choice of the environment to the control element and calculate the local element losses.
U_{t+1} = U_t exp(−h L_t) // calculate the weight vector of the local elements for the next step.
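The loop above can be sketched end to end. The sign predicate, the 0/1 loss, and all variable names here are assumptions for illustration, not the paper's exact code:

```python
import numpy as np

def train_control(X, y, W_star, h=0.5, seed=0):
    """Sketch of the control-element loop: random selection of a local
    element, then the exponential weight update. W_star holds the
    pretrained local-element weight vectors as columns."""
    rng = np.random.default_rng(seed)
    M = W_star.shape[1]
    u = np.ones(M)                          # step 1: initial weights U_0
    errors = 0
    for t in range(len(y)):
        p = u / u.sum()                     # selection probabilities p_t
        Z_t = np.sign(X[t] @ W_star)        # outputs of the local elements
        z_t = Z_t[rng.choice(M, p=p)]       # random selection of an output
        errors += int(z_t != y[t])
        L_t = (Z_t != y[t]).astype(float)   # 0/1 losses of local elements
        u = u * np.exp(-h * L_t)            # weight update for the next step
    return u, errors / len(y)

# toy check: one perfect local element against one random one
rng = np.random.default_rng(1)
X = rng.choice([-1.0, 1.0], size=(200, 5))
w_good = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
w_bad = rng.normal(size=5)
y = np.sign(X @ w_good)
u, err = train_control(X, y, np.column_stack([w_good, w_bad]))
```

In this toy run the weight of the lossless element stays at 1 while the other decays, so the control element concentrates its probability on the better local element.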

Experiment Setup
Step 1. Generating a training sample with linearly separable classes. A 30 × 30 raster and the letters A and B were used when generating the training sample. The sizes of the images in pixels were calculated taking into account the specified font size. It is also worth noting that when a letter is rotated, the size of the letter area increases in height, so the maximum size along the diagonal of the letter area was selected. At the first stage, a sample was generated with letters shifted to the left, up, right, and down by 2 pixels on the raster. At the second stage, the resulting sample was transformed by rotating the letters 3 times (the degree of rotation is calculated automatically) and by decreasing the size of the letters 3 times. 20 pictures with noise were added to each of the received samples. Thus a total of 12 samples of the letters A and B, each of size 105, were received (Fig. 2).

Figure 2: The examples of generated images for each sample (from 1 to 12)

Step 2. Training of local elements. 12 local elements were trained with the use of the generalized method of Kozinets.
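The paper generates the samples with PIL; as a library-free sketch of the same idea, shifts and noise can be imitated with NumPy alone (np.roll for shifts, random bit flips for noise). The patch "letter" and the toy counts below are assumptions and do not match the paper's 105 images per sample:

```python
import numpy as np

def make_samples(base, shifts, noise_frac=0.05, n_copies=20, seed=0):
    """Shift a base pattern on the raster with np.roll and flip a fraction
    of pixels as noise (a stand-in for the PIL-based generation)."""
    rng = np.random.default_rng(seed)
    samples = []
    for dx, dy in shifts:
        shifted = np.roll(np.roll(base, dy, axis=0), dx, axis=1)
        for _ in range(n_copies):
            img = shifted.copy()
            mask = rng.random(img.shape) < noise_frac
            img[mask] ^= 1            # flip the masked pixels (noise)
            samples.append(img)
    return np.array(samples)

base = np.zeros((30, 30), dtype=np.uint8)
base[10:20, 10:20] = 1                # a square patch as a stand-in "letter"
sample = make_samples(base, shifts=[(0, 0), (2, 0), (-2, 0), (0, 2), (0, -2)])
```

Each shift direction yields n_copies noisy variants, mirroring the 2-pixel left/up/right/down shifts described above.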
Step 3. Environment setup. Four environments were set up, using 12, 9, 6, and 3 of the samples respectively. Letters from the samples were submitted to the control element in random order.

Step 4. Training of a control element. The parameter h was set to a constant of 0.5. The submission of examples was stopped when the error rate of the hybrid neural network stabilized. The error rate at each iteration was calculated as the ratio of the number of incorrectly recognized sample elements to the current number of submitted examples.

Experiment Results
Fig. 3 shows the results of the four experiments. In the first two the error rate is close to 0.3; in the third and fourth it is close to 0.2, with stabilization. The first and second graphs correspond to the submission of images from 12 and 9 samples, the third and fourth graphs to the submission from 6 and 3 samples respectively. The results of the experiments are shown in Tables 1-4.
The tables show that each of the cases has a leader. The proposed method proved effective in a situation involving the recognition of very different images without preliminary processing (noise suppression, image normalization). This is confirmed by the graphs of the hybrid neural network errors (Fig. 3). As a result of the training, the control element chose the local element that was least likely to make errors on the examples given by the environment. The proposed method can also be recommended for the speech recognition of multiple speakers or for time series forecasting, when the time series model is not known in advance.

Figure 3: Graphs of errors of the hybrid neural network training. The abscissa axis is the number of examples; the ordinate axis is the error rate.

Table 1 :
Training results of the control element when submitting images from 12 samples.

Table 2 :
Training results of the control element when submitting images from 9 samples.

Table 3 :
Training results of the control element when submitting images from 6 samples.

Table 4 :
Training results of the control element when submitting images from 3 samples.