MULTIFRACTAL PARAMETRIZATION IN DIAGNOSIS OF LUNGS DISEASES

We consider the possibility of diagnosing spatially asymmetric lung diseases by means of X-ray image processing. The method of multifractal parametrization is implemented. We adjust its parameters and construct a number of classifiers (pathology/normal) using the characteristics of the multifractal spectrum. AMS Subject Classification: 92C55, 94A08, 68U10, 28A80, 37C45


Introduction
One cannot overestimate the importance of disease detection and computer-aided diagnosis. Among all lung diseases (pneumonia, tuberculosis, tumor, emphysema, cancer etc.) and their diagnosis methods (fluorography, radiography, bronchography, lung volume and contents analysis, pulmonary function tests etc.), we focus on the most prevalent symptoms, fibrosis and infiltrations, as seen on X-ray images.
Such diseases are usually marked by clustering of pocket spots and a branching structure of propagation. Therefore they form fractal subsets (self-similar, repeating themselves on a number of scales) and in most cases are distributed with varying density, which lets us study them as multifractals. The features of a multifractal are described numerically by its multifractal (MF) spectrum, a set of dimensions of its fractal subsets.
In theory, the MF spectrum should make it possible to detect a pathology with branching structure and to estimate its severity. In practice, however, the limited accuracy of the MF spectrum calculation (by the box-counting method) and the difficulty of its analysis impede the task of classification. Some success has been achieved by different authors using various MF techniques [1]-[4], mostly on computed tomography (CT) scans.
In this paper we develop simple classifiers for the detection of spatially asymmetric lung diseases, using a suitable set of MF spectrum characteristics for the task of image recognition and classification. We use fluorographic X-ray images, which are much simpler to obtain than CT scans, albeit rougher and less informative.
In Section 2, the basics of MF image parametrization are described, together with the numerical values selected for this method.
In Section 3, the development of new classifiers is discussed. We present the results of processing the training set of X-ray images and propose a set of metrics on images that involve their MF spectrum. These metrics serve as descriptors from which we construct classifiers for lung pathologies. For each classifier we find its optimal parameters.

Multifractal Formalism
Here we briefly state the basics of the MF formalism (see [5]-[10] for details). Let $M \subset [0, L] \times [0, L] \subset \mathbb{R}^2$, $L > 0$, be a non-empty compact subset of the plane, and let $\mu$ be a finite measure supported on $M$. We divide $M$ into a grid $G_\varepsilon$ of square cells $M_i$, $i \in N(G_\varepsilon) \subset \mathbb{N}$, with linear size $\varepsilon \in (0, L)$, and denote $\mu_i = \mu(M_i)$. We define the generalized correlation function and its exponent
$$\chi(q) = \sum_{i \in N(G_\varepsilon)} \mu_i^q, \qquad \tau(q) = \lim_{\varepsilon \to 0} \frac{\ln \chi(q)}{\ln \varepsilon}. \qquad (1)$$
Indeed, $\chi(q) \approx \varepsilon^{\tau(q)}$ whenever $\varepsilon$ is small. We can regard the existence of the limit in (1) as a necessary property of a multifractal. The parameter $q$ controls the contribution of areas with small and big measures to the sum $\chi(q)$: when $q \gg 1$, the cells with relatively big values of $\mu_i$ play the main role in this sum, and when $q \ll -1$, the cells with smaller $\mu_i$ become important. Thus $q$ is essentially a kind of scaling factor (a "lens") that reveals the distribution properties of homogeneous subsets in $M$.
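Consider the following functions:
$$\alpha(q) = \frac{d\tau(q)}{dq}, \qquad (2)$$
$$f(q) = q\,\alpha(q) - \tau(q). \qquad (3)$$
Equations (2)-(3) parametrically define a curve $f = f(\alpha)$ of fractal dimensions (the $f(\alpha)$-spectrum and the $f(q)$-spectrum). The value $f(\alpha)$ is the fractal dimension of the homogeneous subset $M_\alpha \subset M$ with the same singularity index $\alpha$ of cell measures, $\mu_i \approx \varepsilon^\alpha$. It is also useful to consider the generalized Rényi dimensions:
$$D_q = \frac{\tau(q)}{q - 1}, \quad q \neq 1, \qquad D_1 = \lim_{q \to 1} D_q.$$
By the MF spectrum we mean the $f(\alpha)$-spectrum together with the Rényi dimensions.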
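As a worked illustration of the Legendre-type relations between $\tau(q)$, $\alpha(q)$, $f(q)$ and $D_q$, the sketch below uses a binomial cascade on the unit interval, a textbook multifractal for which $\tau(q) = -\log_2(p^q + (1-p)^q)$ is known in closed form. The cascade, the weight `p = 0.3`, the finite-difference step `h` and all function names are assumptions made for this example; they are not taken from the X-ray data or from the procedure used in this paper.

```python
import math

def tau_binomial(q, p=0.3):
    """Exact exponent tau(q) of a binomial cascade with weights p and 1 - p (assumed toy model)."""
    return -math.log2(p ** q + (1.0 - p) ** q)

def alpha(tau, q, h=1e-5):
    """Singularity index alpha(q) = d tau / d q, approximated by a central difference."""
    return (tau(q + h) - tau(q - h)) / (2.0 * h)

def f_spectrum(tau, q, h=1e-5):
    """f(q) = q * alpha(q) - tau(q)."""
    return q * alpha(tau, q, h) - tau(q)

def renyi(tau, q, h=1e-5):
    """Renyi dimension D_q = tau(q) / (q - 1); at q = 1 the limit equals alpha(1) because tau(1) = 0."""
    if q == 1:
        return alpha(tau, 1.0, h)
    return tau(q) / (q - 1.0)

# Print alpha(q), f(q) and D_q for a few values of q.
for q in (-5, 0, 1, 5):
    print(q, alpha(tau_binomial, q), f_spectrum(tau_binomial, q), renyi(tau_binomial, q))
```

For this toy measure $f(0) = D_0 = 1$ (the dimension of the support) and $D_1$ coincides with the binary entropy of $(p, 1-p)$, which gives a quick sanity check of the sign conventions.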

Multifractal Parametrization
By means of a standardized procedure of X-ray photography followed by image preprocessing (see [11]), we obtain a digital image: a matrix $M$ of size $n \times n$, $n \in \mathbb{N}$ (in our case $n = 512$), in which each element $z_{pq} \in M$ is a function of the color of the corresponding pixel of the X-ray. In our case $z_{pq} \in [0, 1]$, which corresponds to the gray scale: black for dense structures (bones, blood vessels, and pathological formations) and white for thinner matter (lungs and cavities in them).
In order to find the MF characteristics of a digital image, we need to examine it on different scales, dividing it into a net $G_k$ of cells with linear size $l_k$. The choice of scales often determines the appearance of the resulting graphs. Moreover, each scale is responsible for the image details commensurable with it, which lets us pick out specific scales (in our case $\{4, 8, 16, 32, 64, 128\}$).
Knowing the color value $z_{pq}$, we can define the pixel measure $\mu^0_{pq} = z_{pq} / \sum_{z_{pq} \in M} z_{pq}$ and extend it naturally to a finite measure $\mu$ on subsets of $M$.
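To each net cell $M_i$ of size $l_k \times l_k$ we assign the measure
$$\mu_{ik} = \sum_{z_{pq} \in M_i} \mu^0_{pq},$$
the sum of the color function over all pixels in this cell, normalized by the measure of the whole set $M$. To estimate the MF characteristics we need the values of $\alpha(q)$ and $f(q)$, yet formula (3) is inconvenient for this purpose. We will use the following trick (see [5]). Consider a net $G_\varepsilon$, $\varepsilon > 0$, and the quantities
$$A_\varepsilon(q) = \sum_i \frac{\mu_i^q}{\chi(q)} \ln \mu_i, \qquad F_\varepsilon(q) = \sum_i \frac{\mu_i^q}{\chi(q)} \ln \frac{\mu_i^q}{\chi(q)}.$$
Omitting the justification of passing to the limit, one can derive
$$\alpha(q) = \lim_{\varepsilon \to 0} \frac{A_\varepsilon}{\ln \varepsilon}, \qquad (4)$$
$$f(q) = \lim_{\varepsilon \to 0} \frac{F_\varepsilon}{\ln \varepsilon}, \qquad (5)$$
and then the values $\alpha(q) = \lim_{\varepsilon \to 0} \partial A_\varepsilon / \partial \ln \varepsilon$ and $f(q) = \lim_{\varepsilon \to 0} \partial F_\varepsilon / \partial \ln \varepsilon$ (L'Hôpital's rule) are obtained as the slopes of the linear regressions of the sets $(\ln \varepsilon, A_\varepsilon)$ and $(\ln \varepsilon, F_\varepsilon)$, respectively. Computations (4)-(5) depend on the quotient $\varepsilon / L$, so that at another scale $\varepsilon' = K\varepsilon$, $K > 0$, we have $\partial \ln \varepsilon' = \partial(\ln \varepsilon + \ln K) = \partial \ln \varepsilon$, and $\alpha(q)$ and $f(q)$ stay the same. Therefore, instead of $\varepsilon$ we use scales $= \{l_k\}_{k=1}^m$ for the linear approximations of the points $\{(\ln l_k, A_{l_k})\}_{k=1}^m$ and $\{(\ln l_k, F_{l_k})\}_{k=1}^m$, and take $\mu_i = \mu_{ik}$ for each $l_k \in$ scales.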
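The regression-based estimation of $\alpha(q)$ and $f(q)$ described above can be sketched as follows. This is a minimal illustration, not the implementation used for the experiments: it assumes the direct (Chhabra-Jensen type) quantities $A_\varepsilon(q) = \sum_i w_i \ln \mu_i$ and $F_\varepsilon(q) = \sum_i w_i \ln w_i$ with weights $w_i = \mu_i^q / \chi(q)$, an image stored as a square list of lists whose side is divisible by every scale, and hypothetical function names. Empty cells are skipped, and the weights are computed through logarithms so that values such as $q = \pm 70$ do not overflow.

```python
import math

def cell_measures(image, size):
    """Measures of size x size cells: pixel sums normalized by the image total."""
    n = len(image)
    total = sum(sum(row) for row in image)
    return [
        sum(image[i][j] for i in range(r, r + size) for j in range(c, c + size)) / total
        for r in range(0, n, size)
        for c in range(0, n, size)
    ]

def a_and_f(measures, q):
    """A_eps(q) and F_eps(q) for one grid, using weights mu_i^q / chi(q) (assumed definitions)."""
    logs = [math.log(m) for m in measures if m > 0.0]  # cells with zero measure are skipped
    shift = max(q * v for v in logs)                   # log-sum-exp guard for large |q|
    weights = [math.exp(q * v - shift) for v in logs]
    norm = sum(weights)
    weights = [w / norm for w in weights]
    a = sum(w * v for w, v in zip(weights, logs))
    f = sum(w * math.log(w) for w in weights if w > 0.0)
    return a, f

def slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def alpha_f(image, q, scales):
    """Estimate (alpha(q), f(q)) as regression slopes against ln(l_k)."""
    xs = [math.log(l) for l in scales]
    pairs = [a_and_f(cell_measures(image, l), q) for l in scales]
    return slope(xs, [a for a, _ in pairs]), slope(xs, [f for _, f in pairs])

# Toy gradient image (made-up data, only to show the call).
image = [[1.0 + i + j for j in range(16)] for i in range(16)]
print(alpha_f(image, q=2, scales=(1, 2, 4, 8)))
```

For a uniformly gray image both slopes equal 2 for every $q$, as expected for a homogeneous planar measure.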
The continuous spectrum of $q \in \mathbb{R}$ must also be sampled. We take $q \in Q = \{0, \pm 1, \ldots, \pm 70\}$, and in each particular classifier restrict it to a smaller subset $\{q\}_{q = q_{\min}}^{q_{\max}} \subset Q$. It is customary to single out several characteristics of the $f(q)$-spectrum that have a clear informational meaning; see Table 1.

Data Analysis and an Approach to Classification
In this paper, we restrict ourselves to the diagnosis of horizontally asymmetric lung pathologies (with reference to the spine on X-ray images), in which one of the lungs is damaged more than the other.
Our goal is to find suitable characteristics of the MF spectrum and to adjust the method of MF parametrization in order to diagnose an asymmetric pathology from a given pair of symmetrically cut regions of interest (ROIs).
For this purpose, we use a training set of 55 pairs of ROIs with asymmetric pathologies (the set $P$) and 51 pairs of ROIs from healthy lungs (the set $H$) (see Fig. 1), taken from the X-ray database of the republic clinical antitubercular health center, Tatarstan, Russia. We denote the corresponding statistical populations by $P' \supset P$ and $H' \supset H$. We assume that a scanning procedure is available which generates admissible ROIs, that is, ROIs lying inside the pulmonary fields (see [12] for spotting the features on X-rays of lungs).
Let $P_{ill}$ be the set of all ROIs with pathologies taken from the pairs in $P$, and let $(P \cup H)_{norm}$ be the set of ROIs without pathologies taken from the pairs in $P$ and $H$.
We have chosen empirically the following values for MF parametrization:
Table 1: Characteristics of the $f(q)$-spectrum.
$D_{-\infty}$, $D_{+\infty}$: The bigger $D_{-\infty}$ and $D_{+\infty}$ are, the less dense $M$ is.
$K$: Describes the amplitude of singularity indices: the bigger $K$ is, the less homogeneous $M$ is.
$f_{-\infty}$: The bigger $f_{-\infty}$ is, the more homogeneous $M$ is. Corresponds to the fractal dimension of the thinnest homogeneous subset in $M$.
$f_{+\infty}$: The bigger $f_{+\infty}$ is, the more homogeneous $M$ is. Corresponds to the fractal dimension of the densest homogeneous subset in $M$.
$\triangle_\infty = D_1 - D_{+\infty}$: Points to the limit of inner symmetry violation with respect to the MF transformation of measure $\mu_i \to \mu_i^q / \chi(q)$. Also indicates the measure of order in $M$ and the degree of equilibrium of its structure.
$\sigma_{err}^{-\infty} = \sigma_{err}(q_{\min})$, $\sigma_{err}^{+\infty} = \sigma_{err}(q_{\max})$: Mean squared errors when estimating $f(q_{\min})$ and $f(q_{\max})$.
We notice that the mean squared error $\sigma_{err}(q)$ of the linear regression used to estimate $f(q)$ at each $q \in Q$, for every sample in $P_{ill} \cup (P \cup H)_{norm}$, grows when $|q|$ is large (see Fig. 2). Thus the computed values of $f(q)$ (and of $\alpha(q)$ as well) become less reliable. Therefore we adjust $Q \subset \{-70, \ldots, 70\}$ for each classifier in order to maximize the specificity (see below) of the method. To each image $I \in P_{ill} \cup (P \cup H)_{norm}$ we can assign a set of its MF characteristics (see Table 1). Statistical properties of the MF spectrum of ROIs in $P_{ill} \cup (P \cup H)_{norm}$ are given in Table 2 (left). We conclude that $D_0$ is nearly constant and can be excluded from consideration. Besides, the differences between the values in $P_{ill}$ and in $(P \cup H)_{norm}$ are small, which prevents us from using them to classify a single image (rather than a pair).
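To make the role of the regression error concrete, here is a small sketch of a least-squares fit that also reports the mean squared residual. It assumes that $\sigma_{err}(q)$ is the mean of the squared residuals of the fit of $F_{l_k}$ against $\ln l_k$; the exact normalization used in the experiments is not stated in the text, and the function name is hypothetical.

```python
def regression_with_error(xs, ys):
    """Least-squares line through (xs, ys): returns (slope, intercept, mean squared residual)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    b = mean_y - k * mean_x
    # Assumed error measure: the average squared vertical distance to the fitted line.
    mse = sum((y - (k * x + b)) ** 2 for x, y in zip(xs, ys)) / n
    return k, b, mse

# Points lying exactly on a line give a residual of (numerically) zero.
print(regression_with_error([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))
```

A large residual at some $q$ signals that the points $(\ln l_k, F_{l_k})$ do not lie on a line over the chosen scales, so the slope taken as $f(q)$ should be trusted less.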
Let $I = (I', I'') \in P \cup H$ be a pair of ROIs (the order is not important, since we do not know which lung contains the pathology). Consider the following set of metrics $d_s$, $s \in S = \{1, \ldots, 9\}$ (selected from a larger collection by the maximal values of sensitivity and specificity attained when varying $Q \subset \{-70, \ldots, 70\}$): $\ldots^{2} / (q_{\max s} - q_{\min s} + 1)$, $\sum_{q = q_{\min s}}^{q_{\max s}} (f'(q) - f''(q))^2 / (q_{\max s} - q_{\min s} + 1)$, $\ldots$, where $f'$ and $f''$ denote the $f(q)$-spectra of $I'$ and $I''$. From the statistical properties of these metrics (see Table 2, right) we notice that the descriptors of healthy pairs are slightly smaller and stay closer to 0 than those of pairs with pathologies. This justifies using a simple classifier based on two regions.
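One metric of this kind, the mean squared difference between the $f(q)$-spectra of the two ROIs over the range $q_{\min}, \ldots, q_{\max}$, can be sketched as follows. The representation of a spectrum as a dictionary from integer $q$ to $f(q)$ and the function name are assumptions of this example, and the numbers are made up for illustration.

```python
def mean_squared_spectrum_difference(f_first, f_second, q_min, q_max):
    """Average of (f'(q) - f''(q))^2 over the integers q_min..q_max (inclusive)."""
    total = sum((f_first[q] - f_second[q]) ** 2 for q in range(q_min, q_max + 1))
    return total / (q_max - q_min + 1)

# Toy spectra (not measured data): the second ROI is shifted by 0.5 at every q.
left = {q: 1.0 for q in range(-2, 3)}
right = {q: 1.5 for q in range(-2, 3)}
print(mean_squared_spectrum_difference(left, right, -2, 2))
```

The value does not change when the two ROIs are swapped, which matches the requirement that the order within a pair is irrelevant.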
We consider the simplest 2-dimensional classifier with parameters $s = (s_1, \ldots)$. As an additional parameter we use the set $Q_s = \{q\}_{q = q_{\min s}}^{q_{\max s}} \subset Q$ employed for the estimation of the MF spectrum, varying $q_{\min s}$ and $q_{\max s}$ to obtain the best results. We need two indices of effectiveness for our classification method: sensitivity (the true positive rate, i.e. the fraction of pathological pairs classified as pathological) and specificity (the true negative rate, i.e. the fraction of healthy pairs classified as healthy). Since pulmonary diseases are detected relatively rarely in mass fluorographic (screening) examinations (4 patients in 1000), we set a high value $\gamma = 0.9$ as the lower limit of sensitivity and search for classifiers with maximal specificity. Effectively analogous classifiers can be obtained from the metrics $d_s$ by using the relevance vector machine method to separate areas on graphs such as Fig. 3. It would also be interesting to apply the Mahalanobis distance for this purpose.
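The selection rule "sensitivity at least $\gamma$, specificity as large as possible" can be illustrated on a simplified one-dimensional threshold classifier. This is only a sketch: it is not the 2-dimensional classifier considered here, the rule "pathological if the descriptor is at least the threshold" is an assumption, the function names are hypothetical, and the descriptor values are made-up toy numbers rather than results from the training set.

```python
def rates(pathological, healthy, threshold):
    """Sensitivity and specificity of the rule 'pathological if descriptor >= threshold'."""
    true_pos = sum(1 for d in pathological if d >= threshold)
    true_neg = sum(1 for d in healthy if d < threshold)
    return true_pos / len(pathological), true_neg / len(healthy)

def best_threshold(pathological, healthy, gamma=0.9):
    """Among observed descriptor values, pick the threshold with maximal specificity
    subject to sensitivity >= gamma; returns (threshold, sensitivity, specificity) or None."""
    best = None
    for t in sorted(set(pathological) | set(healthy)):
        sens, spec = rates(pathological, healthy, t)
        if sens >= gamma and (best is None or spec > best[2]):
            best = (t, sens, spec)
    return best

# Toy descriptor values (illustrative only, not taken from the X-ray database).
ill = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.35, 0.3, 0.25, 0.05]
ok = [0.01, 0.02, 0.03, 0.1, 0.2, 0.28, 0.06, 0.04, 0.15, 0.45]
print(best_threshold(ill, ok))
```

Fixing the lower bound on sensitivity first and only then maximizing specificity reflects the screening setting, where missing a pathology is costlier than a false alarm.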
Pathological structures in lungs often have multifractal properties, so the presence of a pathology should be reflected in the multifractal spectrum of the X-ray image. Here we have studied diagnosis from a pair of symmetrically cut ROIs, and we leave the question of analogous classification from a single ROI for further research.
The author expresses his gratitude to R. Kuleev for helpful consultations and inspiring advice. The work has been supported by the Russian Ministry of Education and Science (agreement: 14.606.21.0002, ID: RFMEFI60614X0002).

Figure 2: Typical graph of mean squared error of regression when estimating f (q).

Table 2: Statistics for MF characteristics (left) and for metrics involving them (right). Here E is the mean value over the corresponding sample, and σ is the standard deviation.