THE SIZE DISTRIBUTION OF CITIES IN A REGION: AN EVALUATION OF PARETO, LOGNORMAL AND PPS DISTRIBUTIONS

The Pareto-Positive Stable (PPS) distribution is introduced as a new model for describing the city size data of a region in a country. The PPS distribution provides a flexible model for fitting the entire range of a set of city size data, and the classical Pareto and Zipf distributions are included as particular cases. The new distribution is compared with two classical models: the Pareto and lognormal distributions. In all the data sets considered, Newton's forward and backward formulae (equal intervals) and Lagrange's interpolation formula (unequal intervals) outperform the fits of the Pareto and lognormal distributions.
AMS Subject Classification: 62H10, 62H12


Introduction
In this work we analyze city size distribution data using the PPS distribution developed by Sarabia and Prieto (2009).
Systems with measurable entities (which can be defined by their size) are characterized by particular properties of their size distribution. There is an extensive literature, with many case studies, in this field, including work on the populations of countries, the incomes of people in the same economy, the frequency of words in languages, etc. Scholars have been addressing two problems regarding the size distribution of such systems. The first is to find a mathematical description of these distributions. The most popular suggestions are the lognormal distribution and the power law (also known as Zipf's law), yet there are other expressions that describe the observed distributions with equal success. The second problem is to develop a model which explains the size distribution. Here also several models (either analytical or computer simulations) have been proposed. These models can be divided into two classes: the first includes models with a limited number of parameters, and the second includes mostly economic models which are more complex and include numerous parameters.
The Pareto distribution was initially proposed by Auerbach (1913), followed by Zipf (1949), to fit city size data. Rosen and Resnick (1980) carried out a cross-country investigation of city sizes in 44 countries and found that the Pareto exponent lay in the interval α ∈ [0.81, 1.96]. They also tried to explain the variations in the Pareto exponent and showed that it is sensitive to the city definition and the city sample size. Based on 135 USA metropolitan areas in 1991, Krugman (1996) calculated a value of α close to one. Using the same data set, Gabaix (1999a, 1999b) derived a statistical explanation of Zipf's law for cities. Brakman et al. (1999), with data from the Netherlands, provided evidence for the Pareto distribution over a wide range of time. Nitsch (2005) used meta-analysis and concluded that the Pareto distribution is an appropriate one to fit city size data. Zanette and Manrubia (1997) developed an intermittency model for large-scale city size distributions. Davis and Weinstein (2002) found that the variation in Japanese regional population density, as well as the distribution of city sizes, obeyed a Pareto distribution at all points in time. Soo (2005) updated the interval of values to [0.73, 1.72] and tried to explain variations in the Pareto exponent. Moura and Ribeiro (2006) showed that the Pareto distribution is not valid for smaller cities. Probabilistic and economic models have been proposed by many researchers, and the central idea among these models is that Gibrat's law (proportional growth) can lead to a Pareto distribution. Simon (1955) showed that proportional growth can explain several different skew distributions, including the lognormal, Pareto and Yule. Anderson and Ge (2005) showed the superiority of the lognormal distribution over the Pareto distribution, using the size distribution of Chinese cities.
Subbarayan (2009) extensively studied the size distribution of cities in Tamil Nadu, an Indian state, for the period 1901-2001. Sarabia and Prieto (2009) have stated that the validity of the Pareto distribution disappears when the whole population is fitted, including cities of medium and small size.
In this paper we consider the model developed by Sarabia and Prieto (2009). Their descriptive model for city/town size data is called the PPS distribution. It is interesting to note that more flexible models emerge from the PPS under certain conditions. The classical Pareto and Zipf distributions are included as particular cases. The PPS distribution provides a flexible model for fitting the entire range of a set of city/town size data, where both zero-modality and unimodality are possible; that is, the probability density function either always decreases or has a local maximum.

Pareto Distribution
The relation between the populations of cities and their ranks is found to be linear on a log-log plot, that is, a power law, where the absolute value of the slope of this line is the exponent of the power law [4]. A power law is also known as a classical Pareto distribution, with cumulative distribution function (cdf)

$$F(x) = 1 - \left(\frac{x}{\sigma}\right)^{-\alpha}, \quad x \ge \sigma, \tag{1}$$

where $\alpha > 0$ is a shape parameter and $\sigma > 0$ is a scale parameter, which represents the population of the smallest city in the sample. The $\alpha$ parameter is called the Pareto coefficient. The quantity $(x/\sigma)^{-\alpha}$ represents the proportion of cities larger than a given value $x$.
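The rank-size relation described above can be illustrated with a minimal sketch. It assumes the classical Pareto form $F(x) = 1 - (x/\sigma)^{-\alpha}$ for $x \ge \sigma$; the function names and the idealised Zipf data in the usage note are hypothetical and are not taken from the data sets of this paper.

```python
import math

def pareto_cdf(x, alpha, sigma):
    """Classical Pareto cdf: 1 - (x/sigma)^(-alpha) for x >= sigma, else 0."""
    if x < sigma:
        return 0.0
    return 1.0 - (x / sigma) ** (-alpha)

def rank_size_slope(populations):
    """Ordinary least-squares slope of log(rank) on log(size).

    Cities are ranked from largest (rank 1) to smallest. Under a Pareto
    law the slope is approximately -alpha.
    """
    sizes = sorted(populations, reverse=True)
    lx = [math.log(s) for s in sizes]
    ly = [math.log(r) for r in range(1, len(sizes) + 1)]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx
```

For example, an idealised Zipf system with sizes `[1_000_000 / r for r in range(1, 51)]` lies exactly on a line of slope -1 in the log-log plot, so `rank_size_slope` returns a value equal to -1 up to rounding.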

The PPS Distribution
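Sarabia and Prieto (2009) have defined the PPS distribution in terms of its cdf,

$$F(x) = 1 - \exp\left\{-\lambda \left[\log\left(\frac{x}{\sigma}\right)\right]^{\nu}\right\}, \quad x \ge \sigma, \tag{2}$$

and $F(x) = 0$ if $x < \sigma$, where $\lambda, \sigma, \nu > 0$. A random variable with cdf given by (2) will be denoted by $X \sim PPS(\lambda, \sigma, \nu)$. It may be noted that $\lambda$ and $\nu$ are shape parameters and $\sigma$ is a scale parameter.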
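As a small illustration, the sketch below evaluates a PPS cdf and its inverse, assuming the Sarabia and Prieto (2009) form $F(x) = 1 - \exp\{-\lambda[\log(x/\sigma)]^{\nu}\}$ for $x \ge \sigma$. The function names are hypothetical. Setting $\nu = 1$ recovers the classical Pareto cdf with $\alpha = \lambda$, which is the sense in which the Pareto distribution is a particular case.

```python
import math

def pps_cdf(x, lam, sigma, nu):
    """PPS cdf: 1 - exp(-lam * log(x/sigma)^nu) for x >= sigma, else 0."""
    if x < sigma:
        return 0.0
    z = math.log(x / sigma)
    return 1.0 - math.exp(-lam * z ** nu)

def pps_quantile(p, lam, sigma, nu):
    """Inverse of pps_cdf for 0 <= p < 1, obtained by solving F(x) = p."""
    return sigma * math.exp((-math.log(1.0 - p) / lam) ** (1.0 / nu))
```

With `nu = 1`, `pps_cdf(x, lam, sigma, 1.0)` agrees with `1 - (x / sigma) ** (-lam)`, and `pps_quantile(0.5, lam, sigma, nu)` gives the median city size under the fitted model.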

PPS Based on Weibull Distribution
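The PPS distribution can also be obtained from a monotonic transformation of the Weibull distribution [5,7].
Let $Z$ be a classical Weibull random variable with cdf

$$F_Z(z) = 1 - \exp(-z^{\nu}), \quad z > 0. \tag{3}$$

Then the random variable

$$X = \sigma \exp\left(\lambda^{-1/\nu} Z\right), \tag{4}$$

where $\sigma, \lambda > 0$, is distributed according to a $PPS(\lambda, \sigma, \nu)$ distribution with cdf given by (2). Conversely, using Eq. (4), if $X$ has a PPS distribution with cdf given by Eq. (2), the random variable

$$Z = \lambda^{1/\nu} \log\left(\frac{X}{\sigma}\right)$$

is a Weibull random variable with cdf given by (3). The pdf of the PPS distribution is given by

$$f(x) = \frac{\lambda \nu \left[\log(x/\sigma)\right]^{\nu - 1}}{x} \exp\left\{-\lambda \left[\log(x/\sigma)\right]^{\nu}\right\}, \quad x \ge \sigma, \tag{5}$$

and $f(x) = 0$ if $x < \sigma$. If $\nu > 1$, the mode (a local maximum of the pdf defined by Eq. (5)) is at $\sigma \exp(z_0)$, where $z_0$ is the unique solution of the equation in $z$,

$$\lambda \nu z^{\nu} + z = \nu - 1. \tag{6}$$

The Three-Parameter Lognormal Distribution
The pdf of the three-parameter lognormal distribution [3,12] is

$$f(x) = \frac{1}{(x - \gamma)\,\sigma \sqrt{2\pi}} \exp\left\{-\frac{\left[\ln(x - \gamma) - \mu\right]^2}{2\sigma^2}\right\}, \tag{7}$$

where $x > \gamma \ge 0$, $-\infty < \mu < \infty$ and $\sigma > 0$. Here $\gamma$ is the threshold or location parameter, which defines the point where the support of the distribution begins; $\mu$ is the scale parameter, which stretches or shrinks the distribution; and $\sigma$ is the shape parameter, which affects the shape of the distribution.
If $X$ is a random variable that has a three-parameter lognormal distribution, then $Y = \ln(X - \gamma)$ has a normal distribution with mean $\mu$ and variance $\sigma^2$.
The cdf of the three-parameter lognormal distribution is

$$F(x) = \Phi\left(\frac{\ln(x - \gamma) - \mu}{\sigma}\right), \quad x > \gamma,$$

where $\Phi$ denotes the standard normal cdf. For the three-parameter lognormal distribution defined in equation (7), the value of $\gamma$ is given by the minimum population size value.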
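The three-parameter lognormal model can be sketched in a few lines, using the fact stated above that $\ln(X - \gamma)$ is normal with mean $\mu$ and variance $\sigma^2$. The function names are hypothetical. The fitting helper uses the usual maximum likelihood estimates for a known threshold (the mean and the divisor-$n$ standard deviation of $\ln(x_i - \gamma)$), and, as an assumption of this sketch, it drops observations equal to $\gamma$, whose logarithm is undefined.

```python
import math

def lognormal3_pdf(x, mu, sigma, gamma):
    """pdf of the three-parameter lognormal; zero for x <= gamma."""
    if x <= gamma:
        return 0.0
    t = math.log(x - gamma)
    norm = (x - gamma) * sigma * math.sqrt(2.0 * math.pi)
    return math.exp(-((t - mu) ** 2) / (2.0 * sigma ** 2)) / norm

def lognormal3_cdf(x, mu, sigma, gamma):
    """cdf via the standard normal cdf of (ln(x - gamma) - mu) / sigma."""
    if x <= gamma:
        return 0.0
    t = (math.log(x - gamma) - mu) / sigma
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def fit_mu_sigma(xs, gamma):
    """ML estimates of mu and sigma for a known threshold gamma.

    Assumption of this sketch: observations with x <= gamma are dropped.
    """
    logs = [math.log(x - gamma) for x in xs if x > gamma]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((t - mu) ** 2 for t in logs) / n
    return mu, math.sqrt(var)
```

The median of the fitted distribution is $\gamma + e^{\mu}$, which gives a quick sanity check on any estimates.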

Estimation
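Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$ drawn from a PPS distribution [2]. We assume that the $\sigma$ parameter is given, and we take it to be the population of the smallest city. We use the random variable $Z = \log(X/\sigma)$ and denote its observed values by $z_i = \log(x_i/\sigma)$, $i = 1, 2, \ldots, n$. The log-likelihood function is given by

$$\log L(\lambda, \nu) = \sum_{i=1}^{n} \log f(x_i) = n \log \lambda + n \log \nu + (\nu - 1) \sum_{i=1}^{n} \log z_i - \lambda \sum_{i=1}^{n} z_i^{\nu} - \sum_{i=1}^{n} \log x_i,$$

where $f(x)$ is the pdf defined in (5).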

Maximum Likelihood Estimate of λ and ν
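Taking partial derivatives of the log-likelihood with respect to $\lambda$ and $\nu$ and equating them to zero, the following normal equations are obtained [17]:

$$\frac{n}{\lambda} - \sum_{i=1}^{n} z_i^{\nu} = 0, \tag{8}$$

$$\frac{n}{\nu} + \sum_{i=1}^{n} \log z_i - \lambda \sum_{i=1}^{n} z_i^{\nu} \log z_i = 0. \tag{9}$$

If we eliminate $\lambda$ in Equations (8) and (9), the following equation in $\nu$ is obtained:

$$\frac{n}{\nu} + \sum_{i=1}^{n} \log z_i - \frac{n \sum_{i=1}^{n} z_i^{\nu} \log z_i}{\sum_{i=1}^{n} z_i^{\nu}} = 0.$$

The above equation can be solved using the Newton-Raphson method. The $\lambda$ estimator is then

$$\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} z_i^{\hat{\nu}}}.$$

As noted earlier, more flexible models emerge from the PPS under certain conditions, namely when $\nu > 1$. The value of $\nu$ is considered within the range $2.0 \le \nu \le 2.5$.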
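The estimation procedure can be sketched as follows. The sketch assumes the PPS pdf of Sarabia and Prieto (2009), for which eliminating $\lambda$ from the likelihood equations gives $g(\nu) = n/\nu + \sum \log z_i - n \sum z_i^{\nu} \log z_i / \sum z_i^{\nu} = 0$ and $\hat{\lambda} = n / \sum z_i^{\hat{\nu}}$. Function names, the starting value and the synthetic data are hypothetical. Observations equal to $\sigma$ give $z_i = 0$, whose logarithm is undefined, so this sketch drops them; that is an assumption of the example and not a step taken from the paper.

```python
import math
import random

def score_nu(z, nu):
    """Return g(nu) and its derivative g'(nu) for the profile equation in nu."""
    n = len(z)
    logs = [math.log(v) for v in z]
    pw = [v ** nu for v in z]
    s0 = sum(pw)
    s1 = sum(p * l for p, l in zip(pw, logs))
    s2 = sum(p * l * l for p, l in zip(pw, logs))
    g = n / nu + sum(logs) - n * s1 / s0
    dg = -n / nu ** 2 - n * (s2 * s0 - s1 * s1) / s0 ** 2
    return g, dg

def fit_pps(x, sigma, nu0=1.0, tol=1e-10, max_iter=100):
    """Newton-Raphson for nu, then the closed-form estimate of lambda."""
    z = [math.log(v / sigma) for v in x if v > sigma]  # drop z = 0
    nu = nu0
    for _ in range(max_iter):
        g, dg = score_nu(z, nu)
        step = g / dg
        new = nu - step
        while new <= 0.0:  # halve the step to stay inside nu > 0
            step /= 2.0
            new = nu - step
        converged = abs(new - nu) < tol
        nu = new
        if converged:
            break
    lam = len(z) / sum(v ** nu for v in z)
    return lam, nu

def sample_pps(n, lam, sigma, nu, rng):
    """Draw PPS variates as sigma * exp(lam^(-1/nu) * W), W ~ Weibull(shape nu)."""
    scale = lam ** (-1.0 / nu)
    return [sigma * math.exp(scale * rng.weibullvariate(1.0, nu))
            for _ in range(n)]
```

A quick check is to simulate data with known parameters, for instance `sample_pps(5000, 0.5, 1000.0, 2.2, random.Random(0))`, and confirm that `fit_pps` returns estimates close to the values used in the simulation.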

Maximum Likelihood Estimation for Parameters µ and σ for Three-Parameter Lognormal Distribution
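The MLEs of the parameters $\mu$ and $\sigma$ are given by [17]

$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} \ln(x_i - \gamma), \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \left[\ln(x_i - \gamma) - \hat{\mu}\right]^2.$$

Let us suppose that the values of $x$, viz. $x_0, x_1, \ldots, x_n$, are equidistant, that is, $x_i = x_0 + ih$, $i = 0, 1, 2, \ldots, n$.
Let $P_n(x)$ be a polynomial of the $n$th degree in $x$ such that $y_i = f(x_i) = P_n(x_i)$, $i = 0, 1, 2, \ldots, n$.
Let us assume $P_n(x)$ in the form given below:

$$P_n(x) = a_0 + a_1 (x - x_0)^{(1)} + a_2 (x - x_0)^{(2)} + \cdots + a_n (x - x_0)^{(n)},$$

where $(x - x_0)^{(r)} = (x - x_0)(x - x_1) \cdots (x - x_{r-1})$. The $(n+1)$ unknowns $a_0, a_1, a_2, \ldots, a_n$ can be found as follows.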
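For equally spaced nodes, a polynomial of this kind is commonly evaluated with Newton's forward difference formula. The sketch below assumes the standard coefficients $a_r = \Delta^r y_0 / (r!\, h^r)$, which are not spelled out in the text above; the function names are hypothetical and the data in the usage note are purely illustrative.

```python
def forward_differences(y):
    """Return [y_0, Δy_0, Δ²y_0, ...], the top edge of the difference table."""
    diffs = [y[0]]
    col = list(y)
    while len(col) > 1:
        col = [b - a for a, b in zip(col, col[1:])]
        diffs.append(col[0])
    return diffs

def newton_forward(x0, h, y, x):
    """Evaluate Newton's forward formula at x for nodes x0, x0 + h, ...

    With u = (x - x0) / h, the r-th term is u(u-1)...(u-r+1)/r! * Δ^r y_0.
    """
    u = (x - x0) / h
    total, term = 0.0, 1.0
    for r, d in enumerate(forward_differences(y)):
        total += term * d
        term *= (u - r) / (r + 1)
    return total
```

For instance, with the illustrative values `y = [1, 2, 5, 10]` at `x = 0, 1, 2, 3` (samples of $x^2 + 1$), `newton_forward(0.0, 1.0, y, 1.5)` reproduces the polynomial value 3.25.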

Equal Intervals for Interpolation
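Suppose $y = f(x)$ takes the values $y_0, y_1, \ldots, y_n$ corresponding to the equally spaced values $x_0, x_1, \ldots, x_n$ of $x$, with $x_i = x_0 + ih$. Now, we want to find a collocation polynomial $P_n(x)$ of degree $n$ in $x$ such that $P_n(x_i) = y_i$, $i = 0, 1, \ldots, n$. We shall find $a_0, a_1, \ldots, a_n$ such that

$$P_n(x) = a_0 + a_1 (x - x_n)^{(1)} + a_2 (x - x_n)^{(2)} + \cdots + a_n (x - x_n)^{(n)}, \tag{4.2.2}$$

where $(x - x_n)^{(r)} = (x - x_n)(x - x_{n-1}) \cdots (x - x_{n-r+1})$, so that

$$\nabla (x - x_n)^{(r)} = r h \, (x - x_n)^{(r-1)}. \tag{4.2.3}$$

Here $(x - x_n)$ is a factor in all terms of the RHS of (4.2.2) except $a_0$. Putting $x = x_n$ in (4.2.2), we get $P_n(x_n) = y_n = a_0$. Operating on (4.2.2) by $\nabla^r$ and using (4.2.3),

$$\nabla^r P_n(x) = 0 + 0 + \cdots + 0 + a_r \, r! \, h^r + (r+1) \cdot r \cdot (r-1) \cdots 2 \cdot h^r \, a_{r+1} (x - x_n)^{(1)} + \text{terms involving } (x - x_n) \text{ as a factor}.$$

Setting $x = x_n$ in this, $\nabla^r P_n(x_n) = \nabla^r y_n = a_r \, r! \, h^r$, since the other terms vanish. Hence $a_r = \nabla^r y_n / (r! \, h^r)$.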
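The result $\nabla^r y_n = a_r \, r! \, h^r$ gives the coefficients of Newton's backward formula directly. A minimal sketch follows, with hypothetical function names and purely illustrative data in the usage note; it assumes equally spaced nodes ending at $x_n$.

```python
def backward_differences(y):
    """Return [y_n, ∇y_n, ∇²y_n, ...], the bottom edge of the difference table."""
    diffs = [y[-1]]
    col = list(y)
    while len(col) > 1:
        col = [b - a for a, b in zip(col, col[1:])]
        diffs.append(col[-1])
    return diffs

def newton_backward(xn, h, y, x):
    """Evaluate Newton's backward formula at x for nodes ending at xn.

    With v = (x - xn) / h, the r-th term is v(v+1)...(v+r-1)/r! * ∇^r y_n,
    which equals a_r (x - xn)(x - x_{n-1})...(x - x_{n-r+1}).
    """
    v = (x - xn) / h
    total, term = 0.0, 1.0
    for r, d in enumerate(backward_differences(y)):
        total += term * d
        term *= (v + r) / (r + 1)
    return total
```

With the illustrative values `y = [1, 2, 5, 10]` at `x = 0, 1, 2, 3`, `newton_backward(3.0, 1.0, y, 2.5)` gives 7.25, the value of $x^2 + 1$ at 2.5. The backward form is usually preferred for points near the end of the table.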

Example 1
Find the values of y at x = 21 and x = 28 from the following data.

Numerical Results for Unequal Interval
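The Lagrange interpolation formula can also be written in terms of the product

$$\Pi(x) = (x - x_0)(x - x_1) \cdots (x - x_n).$$

Differentiating this and substituting $x = x_i$, we get $\Pi'(x_i) = \prod_{j \ne i} (x_i - x_j)$, so that

$$f(x) \approx \frac{\Pi(x)}{(x - x_0)\,\Pi'(x_0)}\, y_0 + \frac{\Pi(x)}{(x - x_1)\,\Pi'(x_1)}\, y_1 + \cdots + \frac{\Pi(x)}{(x - x_n)\,\Pi'(x_n)}\, y_n. \tag{4.3.1}$$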
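For unequally spaced nodes, Lagrange's formula can be evaluated directly, as in the minimal sketch below. Each weight $\prod_{j \ne i} (x - x_j)/(x_i - x_j)$ is the same quantity as $\Pi(x) / \{(x - x_i)\,\Pi'(x_i)\}$ with $\Pi(x) = (x - x_0)(x - x_1) \cdots (x - x_n)$, written so that it can also be evaluated at a node. The function name is hypothetical and the data in the usage note are illustrative only.

```python
def lagrange(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += weight * yi
    return total
```

With the illustrative nodes `xs = [0, 1, 3, 6]` and values `ys = [1, 2, 10, 37]` (samples of $x^2 + 1$), `lagrange(xs, ys, 2.0)` recovers the value 5 up to rounding.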

Conclusion
In this work we have analyzed city size distribution data using the PPS distribution developed by Sarabia and Prieto (2009). The PPS distribution provides a comparatively flexible model for the whole range of a set of city sizes for 6 census periods. For comparison purposes we have considered the lognormal and Pareto distributions, which are frequently used by urban researchers for fitting city size distribution data sets. We have used the maximum likelihood method to estimate the parameters of the lognormal, Pareto and PPS distributions. Comparing the results by AIC, we note that the PPS distribution outperforms the fits provided by the Pareto and lognormal distributions. This indicates that the PPS distribution is a good fit not only for country data but also for regional city size data.

For equal intervals, let $y = f(x)$ denote a function which takes the values $y_0, y_1, \ldots, y_n$ corresponding to the values $x_0, x_1, \ldots, x_n$ of $x$, respectively.