For time invariant discrete memoryless sources
The source coding theorem states that for any and any discrete-time i.i.d. source such as and for any rate less than the entropy of the source, there is large enough and an encoder that takes i.i.d. repetition of the source, , and maps it to binary bits such that the source symbols are recoverable from the binary bits with probability at least .
Let be the total number of possible messages. Next map each of the possible source output sequences to one of the messages randomly using a uniform distribution and independently from everything else. When a source is generated the corresponding message is then transmitted to the destination. The message gets decoded to one of the possible source strings. In order to minimize the probability of error the decoder will decode to the source sequence that maximizes , where denotes the event that message was transmitted. This rule is equivalent to finding the source sequence among the set of source sequences that map to message that maximizes . This reduction follows from the fact that the messages were assigned randomly and independently of everything else.
Thus, as an example of when an error occurs, supposed that the source sequence was mapped to message as was the source sequence . If was generated at the source, but then an error occurs.
Let denote the event that the source sequence was generated at the source, so that Then the probability of error can be broken down as Thus, attention can be focused on finding an upper bound to the .
Let denote the event that the source sequence was mapped to the same message as the source sequence and that . Thus, letting denote the event that the two source sequences and map to the same message, we have that
and using the fact that and is independent of everything else have that
A simple upper bound for the term on the left can be established as
for some arbitrary real number This upper bound can be verified by noting that either equals or because the probabilities of a given input sequence are completely deterministic. Thus, if then so that the inequality holds in that case. The inequality holds in the other case as well because
for all possible source strings. Thus, combining everything and introducing some , have that
Where the inequalities follow from a variation on the Union Bound. Finally applying this upper bound to the summation for have that:
Where the sum can now be taken over all because that will only increase the bound. Ultimately yielding that
Now for simplicity let so that Substituting this new value of into the above bound on the probability of error and using the fact that is just a dummy variable in the sum gives the following as an upper bound on the probability of error:
- and each of the components of are independent. Thus, simplifying the above equation yields
The term in the exponent should be maximized over in order to achieve the highest upper bound on the probability of error.
Letting see that the error exponent for the source coding case is: