CONTRIBUTION OF CHANNEL EQUIVOCATION FOR THE DEVELOPMENT OF SOURCE CODING THEOREMS

The present communication deals with the development of new coding theorems in terms of channel equivocation; that is, coding is done for a source which selects a new set of source statistics after each output symbol is received from the channel. A new proof of Fano's bound on Shannon's equivocation is provided using the log-sum inequality. Moreover, bounds on various generalizations of Shannon's equivocation are provided.

AMS Subject Classification: 94A24, 94A15, 94A29


Introduction
The main purpose of source coding is to encode a source that produces symbols X = (x_1, x_2, ..., x_n) with probabilities p = (p_1, p_2, ..., p_n) into codewords c_i of lengths l_i expressed using the D letters of the code alphabet. The first source coding theorem, due to Shannon [4], proved that the entropy of the source provides a lower bound on the average number of code symbols needed to encode each source symbol. Making use of Kraft's [8] inequality

\sum_{i=1}^{n} D^{-l_i} \le 1,     (1.1)

which is a necessary and sufficient condition for a code with lengths l_1, ..., l_n to be uniquely decipherable, Shannon [4] proved the following result:

H(p) \le L < H(p) + 1,     (1.2)

where

L = \sum_{i=1}^{n} p_i l_i     (1.3)

is the mean codeword length and

H(p) = -\sum_{i=1}^{n} p_i \log_D p_i     (1.4)

is Shannon's [4] entropy. Later on, Campbell [9] and Kapur [7] proved source coding theorems for their own exponentiated mean codeword lengths in the form of the following inequalities:

R_\alpha(p) \le L_\alpha < R_\alpha(p) + 1     (1.5)

and

R_\alpha(p) \le L'_\alpha < R_\alpha(p) + 1     (1.6)

respectively, where

L_\alpha = \frac{\alpha}{1-\alpha} \log_D \sum_{i=1}^{n} p_i D^{l_i(1-\alpha)/\alpha}

is Campbell's [9] mean codeword length, L'_\alpha is Kapur's [7] mean codeword length, and

R_\alpha(p) = \frac{1}{1-\alpha} \log_D \sum_{i=1}^{n} p_i^\alpha

is Renyi's [1] measure of entropy. The above-mentioned Kraft inequality plays an important role in coding theory, and many source coding theorems have been proved with its help. Some contributions of this inequality to coding theory have recently been provided by Parkash and Priyanka [11], whereas Parkash and Kakkar [13] have obtained the optimum probability distribution for minimum redundancy of source coding. Recently, Parkash and Kakkar [12] introduced two mean codeword lengths and proved the following source coding theorems:

Theorem 1. For all uniquely decipherable codes (or instantaneous codes), the new mean codeword length L(β) lies between K_β(p) and K_β(p) + 1, where K_β(p) is a new measure of entropy introduced by Parkash and Kakkar [12].
Theorem 2. For all uniquely decipherable codes (or instantaneous codes), the exponentiated mean codeword length L(α, β) satisfies the following relation:

E^\alpha_\beta(p) \le L(\alpha, \beta) < E^\alpha_\beta(p) + 1,

where E^α_β(p) is Kapur's [7] two-parameter additive measure of entropy.
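To make the classical bounds concrete, the following sketch checks Kraft's inequality (1.1) and Shannon's bound (1.2) numerically, in the binary case D = 2, for the standard choice l_i = ⌈−log_2 p_i⌉; the distribution and function names are illustrative, not taken from the cited works.

```python
import math

def shannon_code_lengths(p):
    """The classical Shannon choice l_i = ceil(-log2 p_i)."""
    return [math.ceil(-math.log2(pi)) for pi in p]

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log2 p_i."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Hypothetical dyadic source distribution.
p = [0.5, 0.25, 0.125, 0.125]
lengths = shannon_code_lengths(p)

kraft = sum(2.0 ** -l for l in lengths)         # LHS of Kraft's inequality
L = sum(pi * li for pi, li in zip(p, lengths))  # mean codeword length (1.3)
H = entropy(p)

assert kraft <= 1      # the lengths are realizable as an instantaneous code
assert H <= L < H + 1  # Shannon's noiseless coding bound (1.2)
```

For a dyadic distribution such as this one, the bound is met with equality (L = H), which is the best case of (1.2).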
Basically, the above results take into account only the source that produces the symbols to be encoded, and not the channel through which the input alphabet is transmitted. It therefore becomes interesting to take the channel into account and study its role in proving new source coding theorems.
Also, one of the fundamental concepts in information theory is the relation between entropy and error probability, and one of the pioneers of such studies was Fano [15], who developed a lower bound on the error probability of decoders which plays an important role in deriving other theorems and criteria in information theory. Dong and Fan [17] extended Fano's inequality and introduced lower bounds on the mutual information between two random variables. Ho and Verdu [16] found a different upper bound on the conditional entropy (equivocation) in terms of the error probability and the marginal distribution of the random variable; moreover, these authors found a new lower bound on the conditional entropy for countably infinite alphabets. Hu and Xing [2] derived analytical upper and lower bounds between entropy and error probability based on closed-form solutions of conditional entropy, without involving any approximation. The objective of the present paper is to highlight the role of the channel in source coding theorems.
In Section 2, coding theorems are provided not for a source with a fixed set of source statistics, but for a source which selects a new set of source statistics after each output symbol is received from the channel. Section 3 provides a new proof of Fano's [15] bound on Shannon's equivocation using the log-sum inequality, and also provides new bounds on various generalizations of Shannon's equivocation.

Coding Theorems in Terms of Channel Equivocation
An information channel is described by giving an input alphabet A = {a_i, i = 1, 2, ..., r}, an output alphabet B = {b_j, j = 1, 2, ..., s} and a set of conditional probabilities p(b_j/a_i) for all i and j, where p(b_j/a_i) is the probability that the output symbol b_j will be received if the input symbol a_i is sent. Let us suppose that the input symbols occur according to the probabilities p(a_i), i = 1, 2, ..., r, for transmission through the channel, and that the output symbols occur according to the probabilities p(b_j), j = 1, 2, ..., s. The a priori entropy of A is given by

H(A) = -\sum_{A} p(a_i) \log p(a_i)

and the a posteriori entropy of A, when b_j is received, is given by

H(A/b_j) = -\sum_{A} p(a_i/b_j) \log p(a_i/b_j).

So, the average number of binary digits necessary to represent an input symbol a_i, given an output symbol, is the average a posteriori entropy

H(A/B) = \sum_{B} p(b_j) H(A/b_j).

It is to be mentioned that we restrict ourselves to the binary case, so the logarithms are taken to the base 2 in the further discussion.
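As a numerical illustration of the equivocation H(A/B), the following sketch computes it for a hypothetical binary symmetric channel with uniform input; the channel matrix and probabilities are illustrative only.

```python
import math

def equivocation(p_a, channel):
    """H(A/B) = sum_j p(b_j) H(A/b_j), with channel[i][j] = p(b_j / a_i).
    Logarithms are taken to the base 2, as in the text."""
    r, s = len(p_a), len(channel[0])
    p_b = [sum(p_a[i] * channel[i][j] for i in range(r)) for j in range(s)]
    h = 0.0
    for j in range(s):
        if p_b[j] == 0:
            continue
        for i in range(r):
            p_ab = p_a[i] * channel[i][j] / p_b[j]   # p(a_i / b_j)
            if p_ab > 0:
                h -= p_b[j] * p_ab * math.log2(p_ab)
    return h

# Binary symmetric channel with crossover probability 0.1, uniform input.
p_a = [0.5, 0.5]
bsc = [[0.9, 0.1],
       [0.1, 0.9]]
H_AB = equivocation(p_a, bsc)
```

For this symmetric case the equivocation equals the binary entropy of the crossover probability, roughly 0.469 bits per symbol.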
It has been shown in [10] that

H(A/B) \le L < H(A/B) + 1,     (2.3)

where L is the average number of binary digits per input symbol, averaged with respect to both input and output symbols. One major difference between Shannon's noiseless coding theorem and inequality (2.3) is that the former applies to all uniquely decipherable codes, instantaneous or not, whereas the latter applies only to instantaneous codes.
Next, we extend the source coding theorems given by Parkash and Kakkar [12] in the context of channel equivocation.
Extension of Theorem 1. For this purpose, we make use of the concept of escort distributions as follows: if p(a_1), p(a_2), ..., p(a_r) is the original distribution of the input alphabet, then its escort distribution is given by P(a_1), P(a_2), ..., P(a_r), where

P(a_i) = \frac{p^\beta(a_i)}{\sum_{i=1}^{r} p^\beta(a_i)}

for some parameter β > 0. Let us associate the probabilities P(a_i), i = 1, 2, ..., r, with the input alphabet A of the channel, called the a priori probabilities, and let us suppose that the output symbols occur with probabilities P(b_j), j = 1, 2, ..., s. In this case, the average number of binary digits necessary to represent an input symbol a_i for a given output symbol is the average a posteriori entropy \sum_{B} P(b_j) H(A/b_j).
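The escort construction above can be sketched in a few lines; the input distribution below is hypothetical.

```python
def escort(p, beta):
    """Escort distribution P(a_i) = p(a_i)^beta / sum_k p(a_k)^beta, beta > 0."""
    w = [pi ** beta for pi in p]
    total = sum(w)
    return [wi / total for wi in w]

p = [0.7, 0.2, 0.1]
P = escort(p, 2.0)

# beta > 1 sharpens the distribution toward the most likely symbol,
# beta < 1 flattens it, and beta = 1 recovers p itself.
assert abs(sum(P) - 1.0) < 1e-12
assert all(abs(a - b) < 1e-12 for a, b in zip(escort(p, 1.0), p))
```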
Next, we construct s binary codes, one for each of the possible received symbols b_j, and we use the j-th binary code to encode the transmitted symbol a_i when the output of the channel is b_j. Let the word lengths of the s codes be as shown in Table 1. If we require each code to be instantaneous, we may apply the first part of Theorem 1 to each code separately, which gives

K_\beta(A/b_j) \le L_j(\beta),     (2.4)

where L_j(β) is the average length of the j-th code. Here, the conditional probabilities are used instead of the marginal probabilities because the j-th code is used only when b_j is the received symbol. So, averaging with respect to the received symbols b_j will give us the average number of binary digits required for each member of the input alphabet.
Multiplying both sides of (2.4) by P(b_j) and summing over all of B gives

K_\beta(A/B) \le L(\beta),     (2.5)

where L(β) is the average number of binary digits for each member of the input alphabet, averaged with respect to both input and output symbols.
Next, we illustrate a coding procedure in order to show that the bound (2.5) can be achieved.
We select l_ij, the codeword length corresponding to the input a_i when b_j is the output of the channel, as the unique integer satisfying

-\log P(a_i/b_j) \le l_{ij} < -\log P(a_i/b_j) + 1.     (2.6)

First, we need to check that the codeword lengths chosen in this manner satisfy Kraft's inequality, and are therefore acceptable as the word lengths of an instantaneous code. Now, from the L.H.S. of inequality (2.6), we have

\sum_{A} 2^{-l_{ij}} \le \sum_{A} P(a_i/b_j) = 1.     (2.7)

So, equation (2.6) defines an acceptable set of codeword lengths l_ij for an instantaneous code for each j. Now, multiplying equation (2.6) by P(a_i, b_j) = P(a_i/b_j) P(b_j) and summing over A and B, we get

K_\beta(A/B) \le \sum_{A} \sum_{B} P(a_i, b_j) \, l_{ij} < K_\beta(A/B) + 1.     (2.8)

So, equation (2.8) can be written as

K_\beta(A/B) \le L(\beta) < K_\beta(A/B) + 1.     (2.9)

So, result (2.9) is an extension of Theorem 1 in terms of channel equivocation.
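For β = 1, where the escort distribution reduces to the original one and the bound involves Shannon's equivocation H(A/B), the coding procedure above can be sketched as follows; the posterior distributions and output probabilities are hypothetical.

```python
import math

def conditional_lengths(post):
    """The integer choice of (2.6) for beta = 1: l_ij = ceil(-log2 p(a_i/b_j))."""
    return [math.ceil(-math.log2(q)) for q in post]

# Hypothetical posteriors p(a_i / b_j), one row per received symbol b_j,
# for a 3-input, 2-output channel, with output probabilities p(b_j).
posteriors = [[0.5, 0.25, 0.25],
              [0.125, 0.125, 0.75]]
p_b = [0.6, 0.4]

avg_len = 0.0
for j, post in enumerate(posteriors):
    lengths = conditional_lengths(post)
    assert sum(2.0 ** -l for l in lengths) <= 1   # Kraft holds for each j
    avg_len += p_b[j] * sum(q * l for q, l in zip(post, lengths))

# Equivocation H(A/B) = sum_j p(b_j) H(A/b_j).
H_AB = -sum(p_b[j] * q * math.log2(q)
            for j, post in enumerate(posteriors) for q in post)

assert H_AB <= avg_len < H_AB + 1   # the beta = 1 case of (2.9)
```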
Note. One major difference between the noiseless coding Theorem 1 and inequality (2.9) is that the former applies to all uniquely decipherable codes, instantaneous or not, whereas the latter applies only to instantaneous codes. The reason is that, in the latter case, although each of the codes used is uniquely decipherable, it is not generally true that a sequence of codewords drawn from a known sequence of uniquely decipherable codes is itself uniquely decipherable. Hence, the codes must all be instantaneous.

Remark. For β = 1, K_β(A/B) becomes H(A/B). So, it is an extension of Shannon's equivocation.
Extension of Theorem 2. The exponentiated mean codeword length L(α, β) and Kapur's [7] entropy E^α_β(p), respectively, can also be written in terms of the distribution

Q(a_i) = \frac{p^{\beta/\alpha}(a_i)}{\sum_{i=1}^{r} p^{\beta/\alpha}(a_i)}, \quad \alpha > 0, \; \beta > 0.

Let us associate the probabilities Q(a_i) with the input alphabet A of the channel, and let us suppose that the output symbols occur with probabilities Q(b_j). So, \sum_{B} Q(b_j) E^\alpha_\beta(A/b_j) represents the average number of binary digits necessary to represent an input symbol a_i, given an output symbol. On the same lines as in the extension of Theorem 1, s binary codes are constructed, one for each of the possible received symbols b_j, and the j-th binary code is used to encode the transmitted symbol a_i when the output of the channel is b_j. Then, we may apply the first part of Theorem 2 to each code separately, which gives

E^\alpha_\beta(A/b_j) \le L_j(\alpha, \beta),     (2.10)

where L_j(α, β) is the average length of the j-th code. So, averaging with respect to the received symbols b_j will give us the average number of binary digits required for each member of the input alphabet.
Multiplying both sides of (2.10) by Q(b_j) and summing over all of B gives

E^\alpha_\beta(A/B) \le L(\alpha, \beta),     (2.11)

where L(α, β) is the average number of binary digits for each member of the input alphabet, averaged with respect to both input and output symbols.
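The two-parameter distribution Q used above is again an escort-type construction; a minimal sketch, with a hypothetical input distribution:

```python
def two_param_escort(p, alpha, beta):
    """Q(a_i) = p(a_i)^(beta/alpha) / sum_k p(a_k)^(beta/alpha), alpha, beta > 0."""
    w = [pi ** (beta / alpha) for pi in p]
    total = sum(w)
    return [wi / total for wi in w]

p = [0.6, 0.3, 0.1]
Q = two_param_escort(p, alpha=2.0, beta=1.0)   # exponent 1/2 flattens p

assert abs(sum(Q) - 1.0) < 1e-12
assert Q[0] > Q[1] > Q[2]   # the ordering of the symbols is preserved
```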
Next, we illustrate a coding procedure in order to show that the bound (2.11) can be achieved.
We select l_ij, the codeword length corresponding to the input a_i when b_j is the output of the channel, as the unique integer satisfying

-\log Q(a_i/b_j) \le l_{ij} < -\log Q(a_i/b_j) + 1.     (2.12)

Now, from inequality (2.12), we get

\sum_{A} 2^{-l_{ij}} \le \sum_{A} Q(a_i/b_j) = 1 \quad \text{for each } j.     (2.13)

So, equation (2.12) defines an acceptable set of codeword lengths l_ij for an instantaneous code for each j. Now, we consider the following cases:

Case I: When 0 < α < 1, we have (1 − α)/α > 0. Multiplying (2.12) by (1 − α)/α, raising 2 to the resulting powers, then multiplying by Q(a_i/b_j) and summing over all members of A, we obtain the bound (2.14), which yields the desired relation between E^α_β(A/B) and L(α, β) in this case.

Case II: When α > 1, we have (1 − α)/α < 0, so the inequalities in (2.12) are reversed upon exponentiation, and the analogous bound follows. Again, this result applies only to instantaneous codes.
Similar results can also be proved for Campbell's [9] and Kapur's [7] mean codeword lengths.
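For Campbell's [9] case, the bound (1.5) can be checked numerically; the sketch below uses the standard forms of Renyi's [1] entropy and Campbell's [9] mean codeword length in base 2, with near-optimal lengths drawn from the escort distribution, and a hypothetical source distribution.

```python
import math

def renyi_entropy(p, alpha):
    """Renyi's [1] entropy R_alpha(p) = (1/(1-alpha)) log2 sum_i p_i^alpha."""
    return math.log2(sum(pi ** alpha for pi in p)) / (1 - alpha)

def campbell_length(p, lengths, alpha):
    """Campbell's [9] mean length
    L_alpha = (alpha/(1-alpha)) log2 sum_i p_i 2^{l_i (1-alpha)/alpha}."""
    t = (1 - alpha) / alpha
    s = sum(pi * 2.0 ** (li * t) for pi, li in zip(p, lengths))
    return (alpha / (1 - alpha)) * math.log2(s)

p = [0.5, 0.25, 0.125, 0.125]
alpha = 0.5

# Near-optimal lengths come from the escort distribution p_i^alpha / sum_k p_k^alpha.
w = [pi ** alpha for pi in p]
esc = [wi / sum(w) for wi in w]
lengths = [math.ceil(-math.log2(q)) for q in esc]

R = renyi_entropy(p, alpha)
L = campbell_length(p, lengths, alpha)
assert R <= L < R + 1   # Campbell's bound (1.5)
```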

Bounds on Equivocation in Terms of Probability of Error
The probability of error for a channel with an input alphabet A = {a_i, i = 1, 2, ..., r} and an output alphabet B = {b_j, j = 1, 2, ..., s} is given by

p_E = \sum_{B} p(b_j) \left[ 1 - p(d(b_j)/b_j) \right],     (3.1)

where d(b_j) is any function specifying a unique input symbol for each output symbol, called a decision rule. Now, the conditional maximum-likelihood decision rule says that, in order to minimize the channel error probability, we should use the decision rule which chooses, for each output symbol, the input symbol with the highest conditional probability. So, (3.1) is minimized if we choose d(b_j) = a* for each j, where a* is defined by p(a*/b_j) ≥ p(a_i/b_j) for all i. So, (3.1) can be written as

p_E = \sum_{B} p(b_j) \left[ 1 - p(a^*/b_j) \right].     (3.2)

The New Proof of Fano's Bound using the Log-Sum Inequality. To prove Fano's [15] bound, we make use of the log-sum inequality given below: for non-negative numbers x_1, x_2, ..., x_n and y_1, y_2, ..., y_n,

\sum_{i=1}^{n} x_i \log \frac{x_i}{y_i} \ge \left( \sum_{i=1}^{n} x_i \right) \log \frac{\sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} y_i},

with equality if and only if x_i/y_i is constant for all i. Applying the log-sum inequality to the a posteriori probabilities yields inequality (3.7). Multiplying both sides of (3.7) by p(b_j) and summing over B, we get (3.8). Since f(V) is a concave function of V, applying Jensen's [6] inequality to the first part of the R.H.S. of (3.8), we get

H(A/B) \le -p_E \log p_E - (1 - p_E) \log(1 - p_E) + p_E \log(r - 1),

which is Fano's [15] bound on the equivocation. The equality is achieved when p(a/b) = p_E/(r − 1) for all b and all a ≠ a*, and p(a*/b) = 1 − p_E for all b.
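Fano's bound can be verified numerically under the maximum-likelihood decision rule; the following sketch uses a hypothetical 3-input, 3-output channel, with all names and numbers illustrative.

```python
import math

def h2(x):
    """Binary entropy h(x) = -x log2 x - (1-x) log2(1-x)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def fano_check(p_a, channel):
    """Return H(A/B), the ML-rule error probability p_E, and Fano's bound
    h(p_E) + p_E log2(r-1), for channel[i][j] = p(b_j / a_i)."""
    r, s = len(p_a), len(channel[0])
    p_b = [sum(p_a[i] * channel[i][j] for i in range(r)) for j in range(s)]
    H_AB, p_err = 0.0, 0.0
    for j in range(s):
        post = [p_a[i] * channel[i][j] / p_b[j] for i in range(r)]
        p_err += p_b[j] * (1 - max(post))   # d(b_j) picks the most likely input
        H_AB -= p_b[j] * sum(q * math.log2(q) for q in post if q > 0)
    return H_AB, p_err, h2(p_err) + p_err * math.log2(r - 1)

p_a = [0.4, 0.35, 0.25]
channel = [[0.8, 0.1, 0.1],
           [0.1, 0.8, 0.1],
           [0.2, 0.2, 0.6]]
H_AB, p_err, bound = fano_check(p_a, channel)
assert H_AB <= bound   # Fano's bound on the equivocation
```

For this channel the equivocation sits quite close to the bound, illustrating that Fano's inequality can be nearly tight away from the degenerate equality case.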

Table 1: Word lengths for the s codes