Degrees of Freedom Region of the (M, N1, N2) MIMO Broadcast Channel with Partial CSIT: An Application of Sum-set Inequalities

The degrees of freedom (DoF) region is characterized for the 2-user multiple input multiple output (MIMO) broadcast channel (BC), where the transmitter is equipped with M antennas, the two receivers are equipped with N1 and N2 antennas, and the levels of channel state information at the transmitter (CSIT) for the two users are parameterized by β1, β2, respectively. The achievability of the DoF region was established by Hao, Rassouli and Clerckx, but no proof of optimality was heretofore available. The proof of optimality is provided in this work with the aid of sum-set inequalities based on the aligned image sets (AIS) approach.


Introduction
The availability of channel state information at the transmitter(s) (CSIT) greatly affects the capacity of wireless networks, so much so that even the coarse degrees of freedom (DoF) metric is significantly impacted. Under perfect CSIT a $K$-user interference channel has $K/2$ DoF [1] and the corresponding $K$-user MISO BC has $K$ DoF almost surely [2]. However, if CSIT is limited to finite precision, then the DoF collapse to unity in both cases. The large gap between the two extremes underscores the importance of studying partial CSIT settings. A key obstacle for these studies tends to be the proof of optimality once an achievable DoF region has been established based on the best known achievable schemes. For instance, the conjecture by Lapidoth, Shamai and Wigger [3] that the DoF collapse under finite precision CSIT remained open for nearly a decade, until it was finally settled in [4] using an unconventional (combinatorial) argument called the aligned image sets (AIS) approach. The AIS approach seeks to directly bound the number of codewords that can be resolved at one receiver while aligning at another receiver, under arbitrary levels of CSIT. Since its introduction in [4], the AIS approach has been successfully applied to construct proofs of optimality for a number of basic broadcast and interference channel settings. With each application the AIS approach has been further generalized, broadening its utility and scope. The unconventional nature of the AIS approach, in particular its reliance on combinatorial reasoning from first principles to bound the sizes of the aligned image sets, makes these generalizations quite challenging. Particularly relevant to this work is the recent effort in [5] to derive a new class of sumset inequalities based on the AIS approach, to serve as a toolkit for future DoF studies. In this work we demonstrate the utility of these sumset inequalities by providing the proof of optimality for a DoF region of the 2-user MIMO BC under partial CSIT that was shown to be achievable by Hao, Rassouli and Clerckx in [6], but whose optimality was heretofore open.

arXiv:1901.06010v1 [cs.IT] 17 Jan 2019
The setting of interest is a 2-user MIMO BC where the transmitter is equipped with $M$ antennas, the two receivers are equipped with $N_1$ and $N_2$ antennas, and the levels of CSIT for the two users are parameterized by $\beta_1, \beta_2 \in [0, 1]$, respectively, such that $\beta_i = 0$ represents no CSIT, $\beta_i = 1$ represents perfect CSIT, and the intermediate values represent corresponding levels of partial CSIT. Existing results for this channel focus primarily on the two extremes of perfect CSIT and no CSIT. The exact capacity of the MIMO BC is known if the CSIT is perfect [7]. The collapse of DoF under no CSIT has been shown for this channel under restrictive assumptions, such as isotropic fading, that essentially appeal to the degraded BC perspective [2,8,9,10]. The particular setting of the MISO BC, where each user is equipped with only one antenna, i.e., $N_1 = N_2 = 1$, has recently seen much progress based on the AIS approach [4,11,12], leading ultimately to its full GDoF characterization in [12] with arbitrary channel strengths and arbitrary channel uncertainty levels. For arbitrary antenna configurations and arbitrary levels of partial CSIT, an achievable DoF region is established by Hao, Rassouli and Clerckx in [6]. The optimality of this achievable region has been shown in [6] for certain parameter regimes (mainly $N_1 \le N_2$, $M \le N_2$), based on existing bounds as well as AIS arguments. However, the general DoF region characterization remains open. Our main goal in this work is to provide a complete DoF region characterization by providing the proof of optimality that was heretofore missing for the remaining parameter regime. Remarkably, the proof makes use of the sumset inequalities recently developed in [5].

Notation and Definitions
For $n \in \mathbb{N}$, define the notation $[n] = \{1, 2, \cdots, n\}$. The cardinality of a set $A$ is denoted as $|A|$. The notation $X^{[n]}$ stands for $\{X(1), X(2), \cdots, X(n)\}$. Moreover, $X_i^{[n]}$ stands for $\{X_i(t) : \forall t \in [n]\}$. The support of a random variable $X$ is denoted as $\mathrm{supp}(X)$. The sets $\mathbb{R}$, $\mathbb{Q}$, $\mathbb{R}^n$ and $\mathbb{Q}^n$ stand for the sets of real numbers, rational numbers, all $n$-tuples of real numbers and all $n$-tuples of rational numbers, respectively. Moreover, the set $\mathbb{R}^{2+}$ is defined as the set of all pairs of nonnegative numbers. For any set $S$, we define the set $S^c$ as the complement of the set $S$. If $A$ is a set of random variables, then $H(A)$ refers to the joint entropy of the random variables in $A$. Conditional entropies, mutual information, and joint and conditional probability densities of sets of random variables are similarly interpreted. Moreover, we use the Landau $O(\cdot)$ and $o(\cdot)$ notations as follows. For functions $f(x), g(x)$ from $\mathbb{R}$ to $\mathbb{R}$, $f(x) = O(g(x))$ denotes that $\limsup_{x \to \infty} \frac{|f(x)|}{|g(x)|} < \infty$, and $f(x) = o(g(x))$ denotes that $\limsup_{x \to \infty} \frac{|f(x)|}{|g(x)|} = 0$. We use the notation $A \doteq B$ to indicate that the difference $|A - B|$ is negligible in the DoF sense. We use $\mathbb{P}(\cdot)$ to denote the probability function $\mathrm{Prob}(\cdot)$. For any real number $x$ we define $\lfloor x \rfloor$ as the largest integer that is smaller than or equal to $x$ when $x > 0$, the smallest integer that is larger than or equal to $x$ when $x < 0$, and $x$ itself when $x$ is an integer. The number $X_{r,s}$ may be represented as $X_{rs}$ if there is no cause of ambiguity. For any vector $V = [v_1 \cdots v_k]^T$ and non-negative integers $m$ and $n$ less than $k$, let us define the notation $V_{m \to n}$ as follows, $V_{m \to n} = [v_{m+1}\ v_{m+2}\ \cdots\ v_{m+n}]^T$. For any two vectors … . Finally, for any $m \times n$ matrix $V = \ldots$ … . The following definitions, inherited from [5], are replicated here for the sake of completeness.
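The rounding operation defined above (written $\lfloor x \rfloor$ here) rounds toward zero, which is what Python's `math.trunc` does. The sketch below is an illustration only (the helper names are hypothetical): it checks the definition on a few values and spot-checks a property used later in the proofs, namely that $\lfloor A + B \rfloor - \lfloor A \rfloor - \lfloor B \rfloor \in \{-1, 0, 1\}$ for real $A$ and $B$. Floating-point samples stand in for real numbers.

```python
import math
import random


def int_part(x: float) -> int:
    """Round toward zero: floor for x > 0, ceiling for x < 0, x itself for integers."""
    return math.trunc(x)


def rounding_gap(a: float, b: float) -> int:
    """Return int_part(a + b) - int_part(a) - int_part(b)."""
    return int_part(a + b) - int_part(a) - int_part(b)


def sampled_gaps(num_samples: int = 100_000, seed: int = 0) -> set:
    """Collect the distinct gaps observed over random pairs drawn from [-50, 50]."""
    rng = random.Random(seed)
    return {
        rounding_gap(rng.uniform(-50.0, 50.0), rng.uniform(-50.0, 50.0))
        for _ in range(num_samples)
    }


print(int_part(2.7), int_part(-2.7), int_part(3.0))  # 2 -2 3
print(sorted(sampled_gaps()))
```

The set comprehension keeps only distinct gaps, so any value outside $\{-1, 0, 1\}$ would show up immediately in the printed list.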
Definition 1 (Power Levels) Consider integer-valued variables $X_i$ over alphabet $\mathcal{X}_{\lambda_i}$, …, where $\bar{P}^{\lambda_i}$ is a compact notation for $\sqrt{P^{\lambda_i}}$. We refer to $P \in \mathbb{R}^+$ as power, and are primarily interested in limits as $P \to \infty$. Quantities that do not depend on $P$ will be referred to as constants. The constant $\lambda_i \in \mathbb{R}^+$ denotes the power level of $X_i$.
Definition 2 For non-negative real numbers $X$, $\lambda_1$ and $\lambda_2$, define $(X)_{\lambda_1}$ and $(X)^{\lambda_2}_{\lambda_1}$ as, … In words, for any $X \in \mathcal{X}_{\lambda_1+\lambda_2}$, $(X)^{\lambda_2}$ retrieves the top $\lambda_2$ power levels of $X$, while $(X)_{\lambda_1}$ retrieves the bottom $\lambda_1$ levels of $X$. $(X)^{\lambda_3}_{\lambda_1}$ retrieves only the part of $X$ that lies between power levels $\lambda_1$ and $\lambda_3$. Note that $X \in \mathcal{X}_{\lambda}$ can be expressed as … . A conceptual illustration of power level partitions is shown in Figure 1. Since expressions of the form $(X)^{1}_{1-\lambda}$ appear frequently in this particular work, let us define a compact notation for this as follows.
Definition 3 For the vector …

Figure 1: Conceptual depiction of an arbitrary variable $X \in \mathcal{X}_{\lambda_1+\lambda_2+\lambda_3}$ and its power-level partitions $(X)_{\lambda_1}$, …

Definition 4 (Bounded Density Channel Set $\mathcal{G}$) Let $\mathcal{G}$ be a set of real-valued random variables which satisfies both of the following conditions.
1. The magnitudes of all the random variables in $\mathcal{G}$ are bounded away from infinity, i.e., there exists a constant $\Delta < \infty$ such that for all $g \in \mathcal{G}$ we have $|g| \le \Delta$.
2. There exists a finite positive constant $f_{\max}$ such that for all finite cardinality disjoint subsets $\mathcal{G}_1, \mathcal{G}_2$ of $\mathcal{G}$, the joint probability density function of all random variables in $\mathcal{G}_1$, conditioned on all random variables in $\mathcal{G}_2$, exists and is bounded above by $f_{\max}^{|\mathcal{G}_1|}$. Without loss of generality we will assume that $f_{\max} \ge 1$, $\Delta \ge 1$.
Definition 5 (Arbitrary Channel Set H) Let H be a set of real-valued constants with magnitudes bounded away from infinity, i.e., for all h ∈ H we have |h| ≤ ∆.
Definition 6 For real numbers …, distinct random variables $g^j_i \in \mathcal{G}$, arbitrary real-valued constants $h^j_i \in \mathcal{H}$, and arbitrary real-valued constants … . Noting that these functions are approximately linear (because of the $\lfloor \cdot \rfloor$ operations), for simplicity we refer to them as linear combinations. In particular, we will refer to $L^g$ functions as random linear combinations and to $L^h$ functions as arbitrary linear combinations. The variables $x_i$, $v_i$ will generally be used to represent different parts of transmitted signals. Note that subscripts, such as in $L_j$, will be used to distinguish among different linear combinations, and may be dropped if there is no potential for ambiguity.

Definition 7 For the linear combinations
Note that the terminology from Definition 6 is invoked in Definition 7. Figure 2 provides a visual illustration of $L^{h\,\gamma}_{\delta}$ and $T(A)$. From the definition of $T(A)$ and $T(B)$ in (16), it follows that, … This is because the magnitudes of all elements of $\mathcal{G}$, $\mathcal{H}$ are bounded from above by $\Delta$.
µ and let us bound them as follows.
Figure 2: Visual illustration of $L^{\gamma}_{\delta}$ and $T(A)$. In this example, $x_1 \in \mathcal{X}_{\eta_1}$ and $x_2 \in \mathcal{X}_{\eta_2}$ are obtained as partitions of $X_1 \in \mathcal{X}_{\eta_1+\eta_2}$. Similarly, $x_3 \in \mathcal{X}_{\eta_3}$ and $x_4 \in \mathcal{X}_{\eta_4}$ are obtained as partitions of $X_2 \in \mathcal{X}_{\eta_3+\eta_4}$. Note that $(\gamma_i, \delta_i)$ are only used to further trim the size of $x_i$, yielding $(x_i)^{\gamma_i}_{\delta_i}$ as the trimmed versions. These trimmed variables are then combined with arbitrary coefficients to produce $A = L^{\gamma}_{\delta}$. Finally, note that $T(A)$ represents the size (power level) of the largest trimmed variable involved in $L^{\gamma}_{\delta}$.
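The power-level partitions of Definition 2, and the splitting of $X_1$ into $x_1, x_2$ in Figure 2, can be mimicked with integer arithmetic. The sketch below is a simplification under stated assumptions: it uses integer power levels and replaces $\bar{P}$ by a fixed base (10, a purely hypothetical choice), so that each power level is one digit. Under that assumption, the top levels are an integer quotient and the bottom levels are a remainder. The definitions in [5] work with $\bar{P}^{\lambda}$ and real-valued $\lambda$, which this toy version does not capture.

```python
BASE = 10  # hypothetical stand-in for \bar{P}: one power level = one decimal digit


def bottom_levels(x: int, lam1: int) -> int:
    """(X)_{lam1}: the bottom lam1 power levels of x (a remainder)."""
    return x % BASE**lam1


def top_levels(x: int, lam1: int) -> int:
    """(X)^{lam2} for x in X_{lam1 + lam2}: drop the bottom lam1 levels (a quotient)."""
    return x // BASE**lam1


def level_slice(x: int, low: int, width: int) -> int:
    """Hypothetical helper: the `width` levels of x sitting just above level `low`."""
    return (x // BASE**low) % BASE**width


x = 4_718_263  # an element of X_{lam1 + lam2} with lam1 = 2, lam2 = 5 (seven "levels")
lam1 = 2
top, bottom = top_levels(x, lam1), bottom_levels(x, lam1)
print(top, bottom)              # 47182 63
print(level_slice(x, lam1, 3))  # 182
assert BASE**lam1 * top + bottom == x  # the two partitions together lose nothing
```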
The channel uses are indexed by t ∈ N.
then for any acceptable random variable $W$, … Let $\mathcal{G}(Z) \subset \mathcal{G}$ denote the set of all bounded density channel coefficients that appear in …, and let $W$ be a random variable such that, conditioned on any $\mathcal{G}_o \subset (\mathcal{G} \setminus \mathcal{G}(Z)) \cup \{W\}$, the channel coefficients $\mathcal{G}(Z)$ satisfy the bounded density assumption.
Figure 3: Illustration of an application of Theorem 1 of [5]. Note that in this figure we dropped the time index $(t)$ for convenience. On the left is the joint entropy of the sum (bounded density linear combination) of $N = 3$ dependent random variables $X_1(t), X_2(t), X_3(t) \in \mathcal{X}_{\max_{k \in [2]}\{\lambda_{k1}+\lambda_{k2}+\lambda_{k3}+\lambda_{k4}\}}$ ($M = 4$), which is bounded below by the joint entropy of …

System Model
In this work we will focus on the setting where all variables take only real values. Extensions to complex-valued settings may be cumbersome but are expected to be conceptually straightforward, as shown in [4]. We will focus on the two-user MIMO BC with $M$ antennas at the transmitter and $N_1$, $N_2$ antennas at the two receivers, with the assumption throughout that $N_1 \le N_2 \le M \le N_1 + N_2$, since this is the only non-trivial setting where the DoF remain open. For all cases where the condition $N_1 \le N_2 \le M$ is not true, the DoF are already established in [6]. For all cases where $M > N_1 + N_2$, it is easy to see that the DoF region is not affected if the number of transmit antennas is reduced to $N_1 + N_2$, as follows. First, from the achievability side, note that the DoF inner bound shown in [6] remains unaffected if the number of transmit antennas is reduced to $N_1 + N_2$. Then, from the outer bound perspective, we note that the capacity cannot be reduced if a genie informs the transmitter of the $(M - N_1 - N_2)$-dimensional transmit signal space that is not heard by either user, which allows the transmitter to discard these $M - N_1 - N_2$ transmit dimensions (antennas) without loss of generality, thereby reducing the effective number of transmit dimensions to $N_1 + N_2$. Therefore, in order to establish the DoF region for all cases where $M > N_1 + N_2$, it suffices to show that the DoF outer bound matches the DoF inner bound for the setting $M = N_1 + N_2$.
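The reduction described above amounts to a small bookkeeping step. The routine below is a hypothetical helper for illustration only; the names `reduce_config` and `is_nontrivial` do not come from the paper. It relabels the users so that $N_1 \le N_2$, caps $M$ at $N_1 + N_2$ as justified by the genie argument, and reports whether the result lies in the regime $N_1 \le N_2 \le M \le N_1 + N_2$ treated in this work.

```python
from typing import NamedTuple


class Antennas(NamedTuple):
    M: int   # transmit antennas
    N1: int  # antennas at the user with fewer antennas
    N2: int  # antennas at the user with more antennas


def reduce_config(M: int, Na: int, Nb: int) -> Antennas:
    """Relabel users so that N1 <= N2, then cap M at N1 + N2.

    The cap mirrors the genie argument: transmit dimensions heard by neither
    user can be discarded. Relabeling the users would also swap the CSIT
    levels beta_1, beta_2, which this sketch does not track.
    """
    N1, N2 = sorted((Na, Nb))
    return Antennas(min(M, N1 + N2), N1, N2)


def is_nontrivial(M: int, Na: int, Nb: int) -> bool:
    """True when the reduced setting satisfies N1 <= N2 <= M <= N1 + N2."""
    c = reduce_config(M, Na, Nb)
    return c.N1 <= c.N2 <= c.M <= c.N1 + c.N2


print(reduce_config(6, 3, 1))  # Antennas(M=4, N1=1, N2=3)
print(is_nontrivial(2, 1, 3))  # False: M < N2
```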

The Channel
If perfect CSIT were available, then for generic channel realizations in the two-user $(M, N_1, N_2)$ MIMO broadcast channel with $N_1 \le N_2 \le M \le N_1 + N_2$, there would be $M - N_1$ transmit directions available to the transmitter that are in the null space of the channel matrix between the transmitter and the first receiver. Similarly, there would be $M - N_2$ transmit directions available to the transmitter that are in the null space of the channel matrix between the transmitter and the second receiver.
A canonical representation of the channel (obtained by applying a change of basis operation at the transmitter) makes these directions explicit by mapping them to transmit antennas, so an M dimensional input vector X is partitioned as follows.
Recall that the notation $X_{m \to n}$ as used here stands for $(X_{m+1}, X_{m+2}, \cdots, X_{m+n})$. Thus, the partition $X_a$ contains transmit directions that are in the null space of User 2 but not User 1, the partition $X_c$ contains transmit directions that are in the null space of User 1 but not User 2, and $X_b$ contains transmit directions that are not in the null space of either user. Further, note that if … . Evidently, for the first user, zero forcing is possible only along the $(M - N_1)$-dimensional space corresponding to $X_c$, and for the second user, zero forcing is possible only along the $(M - N_2)$-dimensional space corresponding to $X_a$. With partial CSIT, only partial zero forcing is possible, based on channel estimates available to the transmitter. Therefore, the channel model for the two-user $(M, N_1, N_2)$ MIMO BC with partial CSIT is represented in its canonical form by the following input-output equations. … The dimensions of these symbols are listed as follows.
Here, over channel use $t \in \mathbb{N}$, the vector of symbols seen at Receiver $i$, $i \in \{1, 2\}$, is $Y_i(t)$, and the vector of symbols sent from the transmitter is $X(t)$. The channel matrices $G_{1a}(t)$, $G_{1b}(t)$, $G_{2b}(t)$, $G_{2c}(t)$ correspond to directions along which no zero forcing is possible, while $G_{1c}$, $G_{2a}$ correspond to directions that can be partially zero-forced based on channel estimates available to the transmitter. Note that, due to partial CSIT, the dimensions that can be partially zero-forced have their channel strengths diminished by negative power exponents governed by $\beta_1$ and $\beta_2$ for Users 1 and 2, respectively, relative to those directions along which no zero forcing is possible. The quality of CSIT is captured by … . As the CSIT parameters $\beta_j$, $j \in \{1, 2\}$, take values in the interval from 0 to 1, they cover the full range from no CSIT (i.e., no zero-forcing ability) to perfect CSIT (perfect zero-forcing ability) in the DoF sense. $\Gamma_i(t)$ are the zero-mean unit-variance additive white Gaussian noise terms seen at outputs $Y_i(t)$, independent of all inputs and channel realizations.
The input vector $X(t)$ is subject to a unit power constraint. All channel coefficients are distinct random variables drawn from the bounded density channel set $\mathcal{G}$ (see Definition 4); therefore all channel coefficient magnitudes are bounded above by $\Delta < \infty$. Further, in order to avoid degenerate conditions, we assume that all channel matrices have determinants bounded away from zero, i.e., the absolute value of the determinant of each square channel submatrix is greater than a positive constant.
Perfect channel state information at the receivers (CSIR) is assumed to be available for all channels. In terms of CSIT, we assume that the transmitter knows the (bounded) probability density functions of all channels, but not the actual channel realizations.
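As a quick consistency check on the canonical partition, the sketch below (a hypothetical helper, not part of the paper) computes the dimensions of $X_a$, $X_b$ and $X_c$ from $(M, N_1, N_2)$. The sizes of $X_a$ and $X_c$ follow from the null-space counts above. $X_b$ takes the remaining $N_1 + N_2 - M$ directions, so it is empty when $M = N_1 + N_2$, consistent with the deterministic models below, whose input consists of $\bar{X}_a$ and $\bar{X}_c$ only. The directions reaching each user with no zero forcing then number exactly $N_1$ and $N_2$.

```python
def partition_sizes(M: int, N1: int, N2: int) -> dict:
    """Dimensions of X_a, X_b, X_c in the canonical representation.

    X_a: null space of User 2 only   -> M - N2 dimensions
    X_b: null space of neither user  -> N1 + N2 - M dimensions
    X_c: null space of User 1 only   -> M - N1 dimensions
    """
    if not (N1 <= N2 <= M <= N1 + N2):
        raise ValueError("expected N1 <= N2 <= M <= N1 + N2")
    return {"a": M - N2, "b": N1 + N2 - M, "c": M - N1}


sizes = partition_sizes(4, 1, 3)
print(sizes)  # {'a': 1, 'b': 0, 'c': 3}
# User 1 sees X_a and X_b without zero forcing; User 2 sees X_b and X_c.
assert sizes["a"] + sizes["b"] == 1 and sizes["b"] + sizes["c"] == 3
```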

DoF
The definitions of achievable rates R i (P ) and capacity region C(P ) are standard.The DoF region is defined as
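For reference, the DoF region is commonly written as follows for real-valued channels. The normalization by $\tfrac{1}{2}\log P$, that is, by $\log \bar{P}$ with $\bar{P} = \sqrt{P}$, is our assumption about the intended convention. It is chosen to match the real-valued setting of this work and is not quoted from the original.

```latex
\mathcal{D} = \Big\{ (d_1, d_2) \in \mathbb{R}_{+}^{2} :\ \exists\, \big(R_1(P), R_2(P)\big) \in \mathcal{C}(P)\ \text{such that}\ d_k = \limsup_{P \to \infty} \frac{R_k(P)}{\tfrac{1}{2}\log P},\ k \in \{1, 2\} \Big\}
```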

Main Result
The following theorem characterizes the complete DoF region of the two-user MIMO BC with arbitrary levels of partial CSIT.

Theorem 2 Without loss of generality, assume $N_1 \le N_2$. The DoF region is expressed as follows.

1. For $N_2 \le M$: … where $\beta_o$ is defined as, …

Remark 1 Wang and Varanasi [13] studied the DoF of the two-user MIMO broadcast channel with a general message set (a common message and two private messages) under hybrid CSIT models where, for each user, the CSIT is either perfect (P), delayed (D), or not available (N). While the 'PP', 'PD', 'DP', 'DD' and 'NN' settings are fully settled, for the 'PN', 'NP', 'DN' and 'ND' settings only the linear DoF regions (i.e., the DoF regions restricted to linear achievable schemes) are found, and it is conjectured that the same regions are optimal even without the restriction to linear achievable schemes. It is further explained in [13] that the 'NP' setting (where no CSIT is available for the first user and perfect CSIT is available for the second user) is the key, i.e., if the 'NP' case can be solved, then the other three cases can be easily resolved. Furthermore, tight outer bounds that include a common message are found directly from the setting with only private messages by reducing the decoding requirement for the common message to only one of the receivers. Since the 'NP' setting corresponds to $\beta_1 = 0$, $\beta_2 = 1$, evidently Theorem 2 settles the conjectures of Wang and Varanasi [13] in the affirmative.
Remark 2 Achievability of the DoF region of Theorem 2 is established by Hao et al. in [6] based on a rate-splitting scheme that includes interesting 'space-time' scheduling aspects. Partial converse results are also presented in [6] based on relatively straightforward applications of the aligned image sets (AIS) argument [4]. The problem that remains open is the proof of the outer bound (44) for …, which is the main contribution of this work. Our proof exemplifies the utility of the 'sum-set inequalities' that were recently developed from AIS arguments in [5].

Deterministic Model
The first step of the AIS approach is to transform the channel model into the deterministic setting, such that a DoF outer bound for the deterministic setting is also a DoF outer bound for the original channel. This deterministic transformation produces a BC with input $\bar{X}(t) = [\bar{X}_a(t)\ \bar{X}_c(t)]$ and outputs $\bar{Y}_1(t)$, $\bar{Y}_2(t)$.
where $\bar{X}_a(t)$ and $\bar{X}_c(t)$ are defined as, … and $\bar{X}_m(t)$ …

A Key Lemma
The key to the proof of the bound $d_1 + d_2 \le 3 + \frac{7}{9}$ is the following lemma, which makes use of sumset inequalities from [5].
See Figure 5 for an accompanying illustration of Lemma 1. The proof of Lemma 1 is presented in Appendix A.
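Step (62) of the proof below uses the fact that if $B$ and $C$ are independent, then $I(A; B) \le I(A; B \mid C)$. This holds because $I(A; B \mid C) = I(A, C; B) - I(B; C) = I(A, C; B) \ge I(A; B)$. The following numerical sanity check is a sketch on small random discrete distributions and not part of the original argument. It computes both sides from entropies for joint distributions of the form $p(b)\,p(c)\,p(a \mid b, c)$.

```python
import itertools
import math
import random


def entropy(pmf: dict) -> float:
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint: dict, coords: tuple) -> dict:
    """Marginalize a joint pmf over tuples onto the given coordinate indices."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out


def random_pmf(size: int, rng: random.Random) -> list:
    """A random strictly positive pmf on {0, ..., size - 1}."""
    weights = [rng.random() + 1e-9 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def mi_pair(rng: random.Random, na: int = 3, nb: int = 3, nc: int = 3) -> tuple:
    """Draw p(a, b, c) = p(b) p(c) p(a | b, c), so that B and C are independent,
    and return (I(A;B), I(A;B|C)) in bits."""
    pb, pc = random_pmf(nb, rng), random_pmf(nc, rng)
    joint = {}
    for b, c in itertools.product(range(nb), range(nc)):
        pa = random_pmf(na, rng)  # A may depend on B and C arbitrarily
        for a in range(na):
            joint[(a, b, c)] = pb[b] * pc[c] * pa[a]

    def h(coords):
        return entropy(marginal(joint, coords))

    i_ab = h((0,)) + h((1,)) - h((0, 1))
    i_ab_given_c = h((0, 2)) + h((1, 2)) - h((0, 1, 2)) - h((2,))
    return i_ab, i_ab_given_c


rng = random.Random(1)
pairs = [mi_pair(rng) for _ in range(200)]
# Smallest observed I(A;B|C) - I(A;B); nonnegative up to floating-point rounding.
print(min(cond - plain for plain, cond in pairs))
```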

Proof of the Bound
1. Starting from Fano's inequality for the first receiver and suppressing $n\,o(\log P)$ terms that are inconsequential for DoF, we have, … where (58) follows from Definition 2 and (59) from the chain rule. (60) is implied by the fact that the entropy of a random variable is bounded by the logarithm of the cardinality of its support, i.e., 2H(… . (61) is obtained from Lemma 1. (62) follows from the property that, for independent random variables $B$ and $C$, $I(A; B) \le I(A; B \mid C)$. As a result, we have I((…
2. Similarly, starting from Fano's inequality for the second receiver, we have, … (64) is true for the same reason as (60), i.e., because the entropy of a discrete random variable is bounded by the logarithm of the cardinality of its support, i.e., H(…
3. Summing the inequalities (62) and (64), we obtain, … where (66) is obtained by a direct application of the sumset inequalities of [5], as explained below. From (66), the sum DoF bound $d_1 + d_2 \le 3 + \frac{7}{9}$ follows immediately.
4. Finally, we explain how (66) is implied by the sumset inequalities of [5]. Specifically, we need Theorem 4 of [5] to prove that H((… . In order to apply the result of Theorem 1 (i.e., Theorem 4 of [5]) to our setting, let us set $M = 1$, $K = 2$, $\lambda_{1,1} = \lambda_{2,1} = 1$, $l_1 = l_2 = 1$, and $I_{11} = I_{21} = \{1\}$. Then, the inequality (29) reduces to, … Let us specialize $W = W_2$ and define … Note that $L^g$ functions can be used instead of $L^h$ functions in $Z_{11}$, $Z_{21}$, because doing so only weakens the result of Theorem 1. In other words, Theorem 1 makes the stronger claim that (181) holds even if the channel coefficients in $Z_{11}$, $Z_{21}$ are chosen as arbitrary constants. Since the claim is true for arbitrary constants, it is also true for randomly chosen coefficients, i.e., $L^g$ functions may be used instead of $L^h$ functions in $Z_{11}$, $Z_{21}$. Next, consider the '$\doteq$' in (70) and (71). This is justified as follows.
Let us prove (70); (71) is similarly implied. In order to prove (70), we will show that $Z_{11}(t) = (\bar{Y}_{11}(t))^{1/3} - \delta_{11}(t)$, where $H(\delta_{11}(t))$ is bounded by a constant that does not scale with $P$. Since adding or subtracting bounded-entropy noise terms can only make a difference of the order of $n\,o(\log P)$, which is inconsequential in the DoF sense, the $\doteq$ in (70) and (71) is justified. … where … Here, $\delta_a(t)$ is a random variable that can only take values from the set $\{-1, 0, 1\}$, since $\lfloor A + B \rfloor - \lfloor A \rfloor - \lfloor B \rfloor \in \{-1, 0, 1\}$ for any real numbers $A$ and $B$. Next, consider $\delta_b$, whose entropy is bounded as follows. … Thus, from (181) we have, …

For the two-user $(M, N_1, N_2) = (4, 1, 3)$ MIMO BC with … levels of partial CSIT, from Theorem 2 the DoF region is computed as, … and $\beta_o = \frac{1}{16}$ from (45). The challenge is to prove the bound (44), i.e., $d_1 + d_2 \le 3 + \frac{1}{16}$.

Deterministic Model
Similar to Section 6.1, the deterministic transformation produces a BC with input $\bar{X}(t) = [\bar{X}_a(t)\ \bar{X}_c(t)]$ and outputs $\bar{Y}_1(t)$, $\bar{Y}_2(t)$.
where $\bar{X}_a(t)$ and $\bar{X}_c(t)$ are defined as, … and $\bar{X}_m(t)$ …

A Key Lemma
To prove the bound $d_1 + d_2 \le 3 + \frac{1}{16}$ we need the following lemma.
Lemma 2 For the two-user MIMO BC with $(M, N_1, N_2) = (4, 1, 3)$ and …

See Figure 6 for an accompanying illustration of Lemma 2. The proof of Lemma 2 is presented in Appendix B.

1. Starting with Fano's inequality for the first receiver, we have, … where (90) follows from Definition 2 and (91) is true from the chain rule. (92) holds because the entropy of a random variable is bounded by the logarithm of the cardinality of its support, i.e., 3H(… . (93) is obtained since, from Lemma 2 and the chain rule, we have … As a result, we have I((…
2. Similarly, writing Fano's inequality for the second receiver, we have, …

Scaling (96) and (97) by 3 and 1, respectively, and summing them together, we have, … , while $\bar{Y}_{21}(t)$ is a bounded density linear combination of the random variables $(\bar{X}_1(t))^{1/2}$, $\bar{X}_2(t)$, $\bar{X}_3(t)$, $\bar{X}_4(t)$. Thus, we have …

Proof of Theorem 2
To prove Theorem 2 we only need to prove the outer bound (44). The proof for the general setting follows closely along the lines of the examples presented above. We start, as before, with the corresponding deterministic model.

Deterministic Channel Model
For all t ∈ [n], the channel outputs in the deterministic model are Ȳ1 (t) and Ȳ2 (t), which are defined as follows.

Useful Lemma
The following lemma from [14] will be useful, and is reproduced here for the sake of completeness.

Split Ȳ1 into Ȳ1a, Ȳ1ã
Since $\bar{X}_a$ is a vector random variable of size $M - N_2$, using a change of basis operation at the receiver, the $N_1$-dimensional vector $\bar{Y}_1$ can be partitioned into $\bar{Y}_{1\tilde{a}}$, which is its projection into the $(N_1 + N_2 - M)$-dimensional space that does not contain $\bar{X}_a(t)$, and a projection $\bar{Y}_{1a}$ into the $(M - N_2)$-dimensional space that contains $\bar{X}_a(t)$.
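The linear-algebra part of this receiver-side change of basis can be sketched numerically. Assume the channel block multiplying $\bar{X}_a$ at User 1 (call it $G_{1a}$, of size $N_1 \times (M - N_2)$) has full column rank. Orthonormal bases of its column space and of the orthogonal complement then give an invertible change of basis for $\bar{Y}_1$. Projecting onto the complement yields the $N_1 + N_2 - M$ components that carry no $\bar{X}_a$ term, and projecting onto the column space yields the remaining $M - N_2$ components. The channel draw, the helper name `orthonormal_split`, and the choice $(M, N_1, N_2) = (6, 3, 4)$ are hypothetical, and the integer rounding of the deterministic model is ignored.

```python
import math
import random


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def orthonormal_split(cols, dim, tol=1e-9):
    """Gram-Schmidt on `cols`, then extended with the standard basis of R^dim.

    Returns (span_basis, complement_basis): orthonormal bases of the column
    space of `cols` and of its orthogonal complement in R^dim.
    """
    basis = []

    def try_add(v):
        w = list(v)
        for q in basis:  # remove components along vectors already accepted
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])

    for v in cols:
        try_add(v)
    rank = len(basis)
    for i in range(dim):
        try_add([1.0 if j == i else 0.0 for j in range(dim)])
    return basis[:rank], basis[rank:]


# Hypothetical example with (M, N1, N2) = (6, 3, 4): X_a has M - N2 = 2 entries.
M, N1, N2 = 6, 3, 4
rng = random.Random(7)
G1a_cols = [[rng.uniform(1.0, 2.0) for _ in range(N1)] for _ in range(M - N2)]
span_a, comp_a = orthonormal_split(G1a_cols, N1)

# Rows of comp_a define the projections of Y_1 that carry no X_a term.
print(len(span_a), len(comp_a))  # 2 1
print(max(abs(dot(u, g)) for u in comp_a for g in G1a_cols))  # close to zero
```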
When $\beta_1 + \beta_2 \ge 1$, the bound (44) reduces to, … where $\beta_o$ is equal to, … Corresponding to Lemma 1, in this general setting we need the following lemma, which is the key to the proof of the outer bound.
Lemma 4 For the two-user $(M, N_1, N_2)$ MIMO BC with partial CSIT where …, … where the numbers N0, N1 and N2 are defined as, … See Appendix C for the proof of Lemma 4. Now, let us prove the bound (121).
1. Starting with Fano's inequality for the first receiver and suppressing $n\,o(\log P)$ terms that are inconsequential for DoF, we have, … Here (129) follows from Definition 2. The chain rule of entropy, together with the fact that, since User 1 has only $N_1$ antennas, the entropy of $\bar{Y}_1^{[n]}$ cannot be more than $nN_1 \log \bar{P} + n\,o(\log P)$, justifies (130). Similarly, (131) is obtained because the entropy of a random variable is bounded by the logarithm of the cardinality of its support, i.e., …
2. Similarly, starting with Fano's inequality for the second receiver, we have, … (136) is justified in the same way as (131), since the entropy of a random variable is bounded by the logarithm of the cardinality of its support, i.e., $H(\bar{Y}_2^{[n]} \mid G) \le nN_2 \log \bar{P} + n\,o(\log P)$.
3. Summing the inequalities (134) and (136) results in, …, we have, … where (139) is true as all the parameters $(\lambda_{1i} - \lambda_{2i})^+$ in Lemma 3 are equal to zero. Finally, applying the DoF limit in (138), the bound (121) is obtained.

In this section we prove the bound (44) for general $(M, N_1, N_2)$ when $\beta_1 + \beta_2 < 1$. The bound (44) simplifies to, … where $\beta_o$ is equal to, … The proof relies on the following lemma.
Lemma 5 For the two-user $(M, N_1, N_2)$ MIMO BC with partial CSIT where …, … where the numbers N0, N1 and N2 are defined as, … See Figure 7 for the comparison of the two sides of (142).
See Appendix D for the proof of Lemma 5. With the aid of Lemma 5, the proof of the bound (140) proceeds along the lines of the corresponding proof for $\beta_1 + \beta_2 \ge 1$ that was presented in Section 8.3.1.
1. Starting from Fano's inequality for the first receiver and proceeding through the same set of steps as (129)-(134), we have, … where the difference between (134) and (146) is due to the difference in the definitions of (N1, N2) versus (N1, N2).
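Steps such as (131) and (136), and the justification of $\doteq$ in (70) and (71), rest on two elementary entropy facts. The first is $H(X) \le \log|\mathrm{supp}(X)|$. The second is $|H(X + \delta) - H(X)| \le H(\delta)$ for any, possibly dependent, perturbation $\delta$. The second fact follows because $H(X + \delta) \le H(X, \delta) \le H(X) + H(\delta)$ and $H(X) \le H(X + \delta, \delta) \le H(X + \delta) + H(\delta)$. The sketch below checks both facts numerically on small random distributions, with $\delta \in \{-1, 0, 1\}$ as for $\delta_a(t)$. It is an illustration, not part of the proof.

```python
import math
import random


def entropy(pmf: dict) -> float:
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def pushforward(joint: dict, f) -> dict:
    """Distribution of f(outcome) when outcome is drawn from `joint`."""
    out = {}
    for outcome, p in joint.items():
        key = f(outcome)
        out[key] = out.get(key, 0.0) + p
    return out


def entropy_facts(seed: int, support_size: int = 16) -> tuple:
    """Return (H(X), log2|supp X|, H(X + d), H(d)) for random X and a dependent d in {-1, 0, 1}."""
    rng = random.Random(seed)
    weights = [rng.random() + 1e-9 for _ in range(support_size)]
    total = sum(weights)
    px = {x: w / total for x, w in enumerate(weights)}
    joint = {}  # pmf of the pair (X, d); d is allowed to depend on X
    for x, p in px.items():
        cond = [rng.random() + 1e-9 for _ in range(3)]
        s = sum(cond)
        for d, c in zip((-1, 0, 1), cond):
            joint[(x, d)] = p * c / s
    h_sum = entropy(pushforward(joint, lambda k: k[0] + k[1]))
    h_d = entropy(pushforward(joint, lambda k: k[1]))
    return entropy(px), math.log2(support_size), h_sum, h_d


for seed in range(3):
    h_x, log_supp, h_sum, h_d = entropy_facts(seed)
    print(f"H(X)={h_x:.3f} <= {log_supp:.3f};  |H(X+d)-H(X)|={abs(h_sum - h_x):.3f} <= H(d)={h_d:.3f}")
```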

Conclusion
The DoF region of the two-user MIMO BC with arbitrary levels of partial CSIT was characterized as a function of the numbers of antennas and the levels of CSIT, under the assumption of perfect CSIR. The main challenge was deriving an outer bound that captures the difference of entropies caused by the asymmetric numbers of antennas and the asymmetric levels of partial CSIT, which was accomplished with the aid of sum-set inequalities and the AIS approach.
3. The entropy of a discrete random variable is bounded by the logarithm of the cardinality of its support, i.e., H((… where (200) is true from the chain rule.
From the above observations, in order to prove (121), it is sufficient to demonstrate the following inequality, … Before proceeding to the proof of (202), let us define the random variables $\bar{C}_i(t)$ as the components of $\bar{X}_c(t)$, i.e., … Starting from the left side of (202), we have … Let us explain how (204) follows from Lemma 3. Set $M_1 = M_2 = 2$, and define $\bar{U}_1$ and $\bar{U}_2$ as, … From (116), (204) is concluded, since all the $(\lambda_{1i} - \lambda_{2i})^+$ are zero in the right side of (116). (205) is true from the chain rule, (206) is concluded as … in (230), and (208) follows from sub-modularity properties of the entropy function, i.e., for any $m$ random variables $\{X_1, X_2, \cdots, X_m\}$, where we define $X_{k+m}$ as $X_k$ for positive integers $k$, we have, … where $\lambda_{ri}$ is derived for any $i \in \{1, 2\}$ as …

D Proof of Lemma 5
Define the set $s_\beta$ as the set $\{(\beta_1, \beta_2) \in \mathbb{R}^{2+} : \beta_1, \beta_2 \le 1\}$. We claim that, for any positive numbers $Q_1$ and $Q_2$, the real-valued function $S(\beta_1, \beta_2)$ defined as, … is a continuous function on the set $s_\beta$ under the conditions specified in the following lemma.
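Now, let us bound $\mathbb{P}(\lambda \in S_\nu)$ from above. We wish to bound the probability that the images of these two codewords align, in other words, that $U(\lambda, G) = U(\nu, G)$. Thus, we have … For a fixed value of $g_1$, the random variable $g_2(E_2 - F_2)$ must take values within an interval of length no more than 4. Thus, when $E_2 \ne F_2$, the random variable $g_2$ must take values within an interval of length no more than $\frac{4}{|E_2 - F_2|}$, the probability of which is no more than $\frac{4 f_{\max}}{|E_2 - F_2|}$.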
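The last step is an instance of a general fact: a random variable whose density is bounded by $f_{\max}$ falls into any interval of length $\ell$ with probability at most $f_{\max}\ell$. The toy check below makes several assumptions for illustration. It takes $g_2 \sim \mathrm{Uniform}[1, 2]$ (so $f_{\max} = 1$) and treats $d = E_2 - F_2 \ne 0$ as fixed. It then scans a grid of offsets $c$ for the worst-case probability that $g_2 d$ lands in $[c, c + 4]$, which should never exceed $4 f_{\max}/|d|$. The distribution and the grid of offsets are arbitrary choices.

```python
def uniform_interval_prob(lo: float, hi: float, a: float, b: float) -> float:
    """P(a <= g <= b) for g ~ Uniform[lo, hi] (density 1 / (hi - lo))."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (hi - lo)


def worst_alignment_prob(d: float, width: float = 4.0, lo: float = 1.0, hi: float = 2.0) -> float:
    """Largest P(g * d in [c, c + width]) over a grid of offsets c, assuming d != 0."""
    best = 0.0
    for k in range(-4000, 4001):
        c = 0.25 * k
        a, b = sorted((c / d, (c + width) / d))  # g must lie in [a, b]
        best = max(best, uniform_interval_prob(lo, hi, a, b))
    return best


f_max = 1.0  # density bound of Uniform[1, 2]
for d in (5.0, -20.0, 100.0):
    print(d, worst_alignment_prob(d), 4.0 * f_max / abs(d))
```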

E.1.2 Bounding the Average Size of Aligned Image Sets
Our goal is to prove that, … for some positive constant $c$ not depending on $P$, which results in $H(U) - H(U \mid G) \le \log(c \log P)$. $Z_i$ is defined as the support of the random variable $|\lambda_i - \nu_i|$. Define $\Delta_j$ as follows … Consider the following two cases: $\max_{j \in \{1, 2\}} |\Delta_j| \ge 2$ and $\max_{j \in \{1, 2\}} |\Delta_j| \le 1$.

E.2 Aligned Image Sets
Define Ū and Ū as … . We are only interested in the difference of entropies of Ū and Ū conditioned on $W$ and $G$. Similar to the AIS approach in [5], we first claim that, from the functional dependence, Ū can be made a function of Ū, $W$, $G$. Consider some instance of Ū, e.g., $\nu^{[n]}$. For a given $W$ and channel realization $G$, define the aligned image set $S_{\nu^{[n]}}(W, G)$ as the set of all Ū which result in the same Ū as $\nu^{[n]}$. Since the uniform distribution maximizes the entropy, … where $\mathbb{P}(\lambda^n)$ is defined as the probability that $\lambda^n$ and $\nu^n$ correspond to the same Ū.
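The aligned-image-set argument rests on the following fact. For a fixed channel realization, the conditional entropy of the input given its image is at most the expected logarithm of the size of the aligned image set containing the input, with equality for a uniform input. The toy sketch below illustrates this with a hypothetical image map $U(\lambda) = \lfloor g_1\lambda_1 + g_2\lambda_2 \rfloor$ over a small grid of codewords and a non-uniform input distribution. The map, the grid size, and the omission of $W$ are simplifications and do not model the channel of this appendix.

```python
import itertools
import math
import random
from collections import defaultdict


def ais_check(K: int = 8, seed: int = 0) -> tuple:
    """Return (H(Lambda | U), E[log2 |S_Lambda|]) for a toy two-symbol image map."""
    rng = random.Random(seed)
    g1, g2 = rng.uniform(1.0, 2.0), rng.uniform(1.0, 2.0)  # one fixed channel draw
    codewords = list(itertools.product(range(K), repeat=2))
    # Arguments are nonnegative, so floor coincides with rounding toward zero here.
    image = {lam: math.floor(g1 * lam[0] + g2 * lam[1]) for lam in codewords}

    aligned = defaultdict(list)  # image value u -> aligned image set S_u
    for lam, u in image.items():
        aligned[u].append(lam)

    weights = [rng.random() + 1e-9 for _ in codewords]  # non-uniform input law
    total = sum(weights)
    p = {lam: w / total for lam, w in zip(codewords, weights)}

    h_cond = 0.0  # H(Lambda | U) = -sum_lam p(lam) log2 p(lam | U(lam))
    for members in aligned.values():
        pu = sum(p[lam] for lam in members)
        h_cond -= sum(p[lam] * math.log2(p[lam] / pu) for lam in members)
    bound = sum(p[lam] * math.log2(len(aligned[image[lam]])) for lam in codewords)
    return h_cond, bound


print(ais_check())
```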

E.3 Bounding the Probability of Image Alignment
Given $G$ and $W = w$, consider two distinct instances of Ū denoted as $\lambda^{[n]} = (\lambda$ … where, for any $k \in [M - N_2]$, we assume that $a_{kl}$ are arbitrary distinct decreasing numbers belonging to the set $\{m+1, m+2, \cdots, q\}$, i.e., we assume that for any $2 \le l < l' \le e + 1$, $a_{kl} > a_{kl'}$ and $\{a_{k2}, a_{k3}$ … where (308) follows from (307) since, for any real number $x$, $|x - \lfloor x \rfloor| < 1$. For any $j \in s$ … For any $k \in [K]$, $t \in [n]$, $l \in$ … and any fixed values of $g_{k1}(t), \cdots, g_{k(l-1)}(t), g_{k(l+1)}(t), \cdots, g_{kM}(t)$, the random variable $g_{kl}(t)A_l(t)$ must take values within an interval of length no more than $2M$. Therefore, for any $k \in [K]$, $t \in [n]$, $l \in s_M$, if $A_l(t) \neq 0$, then $g_{kl}(t)$ must take values in an interval of length no more than $\frac{2M}{|A_l(t)|}$, the probability of which is no more than …

E.4 Bounding the Average Size of Aligned Image Sets
Let us assume $n = 1$, $K = 1$, as the generalization to arbitrary $n$ and $K$ follows similarly to the proof of Theorem 4 in [5]. Without loss of generality, let us drop the time index $(t)$. Thus, our goal is to prove that, …
a, b, c, d ∈ N where a + b ≤ m and c + d ≤ n, define

(275) is concluded since, for any summation $\sum_{a \in S_a, b \in S_b} f(a, b)$ and real-valued function $f(a, b)$, we have $\sum_{a \in S_a, b \in S_b} f(a, b) \le |S_a| \sum_{b \in S_b} \max_{a \in S_a} f(a, b) \le |S_a||S_b| \max_{a \in S_a, b \in S_b} f(a, b)$.
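The inequality used for (275) holds for any real-valued $f$, including negative values, because each inner sum over $S_a$ is at most $|S_a|$ times its largest term. Here is a short numerical check on a random table. The table sizes and value range are arbitrary, and the check illustrates the inequality rather than forming part of the proof.

```python
import random


def summation_bounds(table: dict, Sa: list, Sb: list) -> tuple:
    """Return (sum f, |Sa| * sum_b max_a f, |Sa| * |Sb| * max f) for f given as a table."""
    total = sum(table[(a, b)] for a in Sa for b in Sb)
    middle = len(Sa) * sum(max(table[(a, b)] for a in Sa) for b in Sb)
    outer = len(Sa) * len(Sb) * max(table[(a, b)] for a in Sa for b in Sb)
    return total, middle, outer


rng = random.Random(5)
Sa, Sb = list(range(4)), list(range(6))
f = {(a, b): rng.uniform(-3.0, 3.0) for a in Sa for b in Sb}  # arbitrary real values
total, middle, outer = summation_bounds(f, Sa, Sb)
print(total <= middle <= outer)  # True
```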