Linear and Deep Neural Network-based Receivers for Massive MIMO Systems with One-Bit ADCs

The use of one-bit analog-to-digital converters (ADCs) is a practical solution for reducing cost and power consumption in massive Multiple-Input-Multiple-Output (MIMO) systems. However, the distortion caused by one-bit ADCs makes the data detection task much more challenging. In this paper, we propose a two-stage detection method for massive MIMO systems with one-bit ADCs. In the first stage, we propose several linear receivers based on the Bussgang decomposition that show significant performance gains over existing linear receivers. Next, we reformulate the maximum-likelihood (ML) detection problem to address its non-robustness. Based on the reformulated ML detection problem, we propose a model-driven deep neural network-based (DNN-based) receiver, whose performance is comparable with that of an existing support vector machine-based receiver, albeit with a much lower computational complexity. A nearest-neighbor search method is then proposed for the second stage to refine the first-stage solution. Unlike existing search methods that typically perform the search over a large candidate set, the proposed search method generates a limited number of most likely candidates and thus limits the search complexity. Numerical results confirm the low complexity, efficiency, and robustness of the proposed two-stage detection method.


I. INTRODUCTION
Massive multiple-input multiple-output (MIMO) systems, possessing the capability of boosting the throughput and energy efficiency by several orders of magnitude over conventional MIMO systems [1], [2], are considered to be a disruptive solution for 5G-and-beyond networks [3], [4]. However, a massive MIMO system requires a large number of radio-frequency (RF) chains, which significantly increases the power consumption and hardware complexity. Among the components of an RF chain, high-resolution analog-to-digital converters (ADCs) are power-hungry devices whose power consumption increases exponentially with the number of bits per sample and linearly with the sampling rate [5]. A promising solution for reducing the power consumption and hardware complexity is to use low-resolution ADCs. The simplest architecture involving one-bit ADCs requires only one comparator and does not require an automatic gain control (AGC). Therefore, the use of one-bit ADCs can significantly reduce both the power consumption and hardware complexity. However, the severe nonlinearity of one-bit ADCs causes significant distortions in the received signals, since only the sign of the real and imaginary parts of the received signals is retained.
Due to the severe nonlinearity, data detection in one-bit massive MIMO systems becomes much more challenging. Numerous efforts have been made to address this problem, e.g., [6]-[12]. A one-bit maximum-likelihood (ML) detector was derived in [6]. For large-scale systems where ML detection is impractical, the authors of [6] proposed a so-called near-ML (nML) data detection method. The ML and nML methods are, however, non-robust at high signal-to-noise ratios (SNRs) when the channel state information (CSI) is not perfectly known. A one-bit sphere decoding (OSD) technique was proposed in [7]. However, the OSD technique requires a preprocessing stage whose computational complexity grows exponentially with the numbers of both receive and transmit antennas, which makes it difficult to implement in large-scale MIMO systems. Generalized approximate message passing (GAMP) and Bayes inference are exploited in [8], but the resulting method is sophisticated and expensive to implement. Several other data detection approaches have also been proposed in [9]-[12], but they are only applicable in systems where either a cyclic redundancy check (CRC) [9]-[11] or an error correcting code such as a low-density parity-check (LDPC) code [12] is available. In this paper, we propose a two-stage detection method for massive MIMO systems with one-bit ADCs. The proposed method is efficient and robust with low complexity, and is also applicable to large-scale systems without the need for CRC or error correcting codes.
In the first stage, we focus on a class of linear receivers. Existing work in this class has taken one of the following two strategies: (i) using standard linear receivers designed for systems with infinite-resolution ADCs, e.g., [6], [13], [14]; or (ii) using an approximate model for the one-bit ADC to construct other linear receiver designs, e.g., [15], [16]. Here, we exploit the Bussgang decomposition [17] to propose new Bussgang-based linear receivers. Then, we study a deep learning-based detector for one-bit massive MIMO systems. There has been considerable recent interest in learning-based methods for MIMO data detection [18]-[25]. While the deep learning-based detectors in [18]-[21] are designed for MIMO systems with full-resolution ADCs, the learning-based detectors in [22]-[24] are dedicated to systems with low-resolution ADCs and are "blind" in the sense that channel state information (CSI) is not required. However, these blind detection methods are restricted to MIMO systems with a small number of transmit antennas and only low-dimensional constellations. More recently, in [25] a support vector machine (SVM) was exploited for one-bit MIMO data detection, and the SVM approach was shown to achieve better performance than the above linear and learning-based receivers. In this paper, we develop new linear receiver designs, as well as a new deep neural network (DNN)-based architecture, namely OBMNet, that can be implemented for one-bit massive MIMO data detection.
The contributions of the proposed receivers for this first stage are summarized as follows:
• First, we exploit the Bussgang decomposition to mitigate the severe nonlinearity of one-bit ADCs and achieve a linear input-output relationship, which is then used to derive Bussgang-based linear receivers. Numerical results show that the high-SNR bit-error-rate (BER) floor of our proposed Bussgang-based linear receivers is significantly lower than that of existing methods.
• Next, we reformulate the ML detection problem by approximating the cumulative distribution function of a Gaussian random variable with a Sigmoid function. We show that the reformulated problem addresses the non-robustness issue of conventional ML detection. We then propose a model-driven OBMNet for data detection in one-bit massive MIMO systems. Unlike the structure of conventional DNNs where each layer contains a fixed weight matrix and a fixed bias vector, each layer of the proposed OBMNet has two adaptive weight matrices and no bias vector. Numerical results show that OBMNet outperforms the linear receivers and that its performance is comparable with that of the SVM-based method in [25]. However, the proposed OBMNet has much lower computational complexity than the SVM-based method.
In the second stage, we propose a nearest-neighbor (NN) search method to refine the solution of stage 1. The idea of using two-stage detection methods has been studied previously in [6], [25]. However, the search metric used by the second stage of [6] is susceptible to CSI errors. This issue was addressed in [25] thanks to a more robust search metric. Although the second stage in [25] is robust, its complexity can be very high since the search space spanned by the entire candidate set can be very large. The contribution of the proposed NN search method is that it searches only over a limited number of candidates that are nearest to the solution of stage 1 and thus helps contain the search complexity. The main challenge is to obtain the set of nearest candidates efficiently and quickly. To overcome this challenge, we propose a recursive strategy that can obtain this candidate set quickly so that the NN search method can be implemented in an efficient manner.
The rest of this paper is organized as follows: Section II introduces the assumed system model and presents the conventional as well as the proposed Bussgang-based linear receivers. The reformulated robust ML detection problem and OBMNet are proposed in Section III. Section IV presents the proposed NN search method. A computational complexity analysis and numerical results are given in Section V and Section VI.
The notation $\Re\{\cdot\}$ and $\Im\{\cdot\}$ respectively denotes the real and imaginary parts of the complex argument. $\mathbb{R}$ and $\mathbb{C}$ denote the sets of real and complex numbers, respectively, and $j$ is the unit imaginary number satisfying $j^2 = -1$. $\mathcal{CN}(0, \sigma^2)$ denotes a zero-mean circularly symmetric Gaussian random variable with variance $\sigma^2$. $\Phi(t) = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}} e^{-\tau^2/2}\, d\tau$ is the cumulative distribution function of the standard Gaussian random variable and $\sigma(t) = 1/(1 + e^{-t})$ is the Sigmoid function. If $\Re\{\cdot\}$, $\Im\{\cdot\}$, $\Phi(\cdot)$, and $\sigma(\cdot)$ are applied to a matrix or vector, they are applied separately to every element of that matrix or vector.

II. LINEAR RECEIVERS FOR FIRST-STAGE DETECTION
This section introduces different types of linear receivers for massive MIMO systems with one-bit ADCs. We first present conventional linear receivers and then use the Bussgang decomposition to propose three new ones: Bussgang-based maximal ratio combining (BMRC), Bussgang-based zero-forcing (BZF), and Bussgang-based minimum mean squared error (BMMSE).

A. System Model
We consider an uplink massive MIMO system as illustrated in Fig. 1 with $K$ single-antenna users and an $N$-antenna base station, where it is assumed that $N \ge K$. Let $\bar{x} = [\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_K]^T \in \mathbb{C}^K$ denote the transmitted signal vector, where $\bar{x}_k$ is the signal transmitted from the $k$th user under the power constraint $E[|\bar{x}_k|^2] = 1$. The signal $\bar{x}_k$ is drawn from a constellation $\bar{\mathcal{M}}$, e.g., QPSK or 16-QAM. Let $\bar{H} \in \mathbb{C}^{N \times K}$ denote the channel, which is assumed to be block flat fading. Let $\bar{r} = [\bar{r}_1, \bar{r}_2, \ldots, \bar{r}_N]^T \in \mathbb{C}^N$ be the unquantized received signal vector at the base station, which is given as
$$\bar{r} = \bar{H}\bar{x} + \bar{z}, \tag{1}$$
where $\bar{z} = [\bar{z}_1, \bar{z}_2, \ldots, \bar{z}_N]^T \in \mathbb{C}^N$ is a noise vector whose elements are assumed to be independent and identically distributed (i.i.d.) as $\mathcal{CN}(0, N_0)$, and $N_0$ is the noise power. Each analog received signal is then quantized by a pair of one-bit ADCs. Hence, we have the received signal
$$\bar{y} = \mathrm{sign}(\bar{r}) = \mathrm{sign}(\Re\{\bar{r}\}) + j\,\mathrm{sign}(\Im\{\bar{r}\}), \tag{2}$$
where $\mathrm{sign}(\cdot)$ represents the one-bit ADC with $\mathrm{sign}(a) = +1$ if $a \ge 0$ and $\mathrm{sign}(a) = -1$ if $a < 0$. The operator $\mathrm{sign}(\cdot)$ of a matrix or vector is applied separately to every element of that matrix or vector. The SNR is defined as $\rho = 1/N_0$.
Given a received signal vector $\bar{y}$ and a linear receiver represented by a combining matrix $W = [w_1, w_2, \ldots, w_K]^T \in \mathbb{C}^{K \times N}$, the demultiplexing task is performed as
$$\tilde{x} = W\bar{y}. \tag{3}$$
The signal $\tilde{x}$ is then equalized before symbol-by-symbol detection is performed. In the following, we present different structures for the combining matrix $W$. The discussion in the following sections assumes that the channel $\bar{H}$ is available at the base station, but in practice an estimate of the channel would be used instead.
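The system model above can be simulated in a few lines. The following is a minimal NumPy sketch, not the authors' simulation code: the function name `one_bit_uplink`, the seed, and the sizes K = 4, N = 32 and 10 dB SNR are illustrative assumptions. It draws an i.i.d. CN(0, 1) channel, a QPSK symbol vector, and CN(0, N0) noise, and applies the sign quantizer separately to the real and imaginary parts.

```python
import numpy as np

def one_bit_uplink(H, x, N0, rng):
    """Return (r, y): the unquantized and the one-bit quantized received vectors."""
    N = H.shape[0]
    # CN(0, N0) noise: real and imaginary parts each have variance N0 / 2
    z = np.sqrt(N0 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    r = H @ x + z
    # sign(a) = +1 if a >= 0 and -1 otherwise, applied element-wise
    sgn = lambda a: np.where(a >= 0, 1.0, -1.0)
    y = sgn(r.real) + 1j * sgn(r.imag)
    return r, y

rng = np.random.default_rng(0)       # illustrative seed
K, N, snr_db = 4, 32, 10.0           # illustrative sizes
N0 = 10 ** (-snr_db / 10)            # SNR is rho = 1 / N0
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)  # unit power
x = rng.choice(qpsk, size=K)
r, y = one_bit_uplink(H, x, N0, rng)
```

Every entry of `y` is one of ±1 ± j, so all amplitude information in `r` is discarded; this is the distortion the receivers in the rest of the paper have to cope with.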

B. Conventional Linear Receivers
A straightforward strategy to obtain linear receivers for one-bit massive MIMO systems is to simply ignore the nonlinear effect of the one-bit ADCs and use the conventional linear receivers designed for massive MIMO systems with infinite-resolution ADCs, as follows:
• MRC receiver: $W_{\mathrm{MRC}} = \bar{H}^H$,
• ZF receiver: $W_{\mathrm{ZF}} = (\bar{H}^H\bar{H})^{-1}\bar{H}^H$,
• MMSE receiver: $W_{\mathrm{MMSE}} = (\bar{H}^H\bar{H} + N_0 I_K)^{-1}\bar{H}^H$.
In another strategy, the nonlinear effect of the one-bit ADCs can be linearized by the Additive Quantization Noise Model (AQNM) [26], [27] as
$$\bar{y} \approx \kappa\bar{r} + \bar{d} = \kappa\bar{H}\bar{x} + \kappa\bar{z} + \bar{d}, \tag{4}$$
where $\kappa = 1 - \alpha$ and $\alpha$ is the inverse of the signal-to-quantization-noise ratio, which for one-bit ADCs is approximately given by $\alpha \approx 0.3634$ [27]. The quantization distortion $\bar{d}$ is treated as additive Gaussian noise $\bar{d} \sim \mathcal{CN}(0, \Sigma_{\bar{d}})$ that is uncorrelated with $\bar{r}$, where
$$\Sigma_{\bar{d}} = \alpha\kappa\,\mathrm{diag}(\bar{H}\bar{H}^H + N_0 I_N).$$
The MMSE receiver for the model in (4) is given in [15]. Another approximate MMSE receiver for quantized MIMO systems, referred to as the "Wiener Filter on Quantized data" (WFQ), is proposed in [16]; it is expressed in terms of $\Sigma_{\bar{r}} = \bar{H}\bar{H}^H + N_0 I_N$, the covariance matrix of $\bar{r}$.
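As a concrete illustration of the first strategy, the sketch below builds the three conventional combining matrices with NumPy. It assumes the standard textbook forms $W_{\mathrm{MRC}} = \bar{H}^H$, $W_{\mathrm{ZF}} = (\bar{H}^H\bar{H})^{-1}\bar{H}^H$ and $W_{\mathrm{MMSE}} = (\bar{H}^H\bar{H} + N_0 I)^{-1}\bar{H}^H$ for unit-power symbols; the function names, the seed, and the sizes are hypothetical.

```python
import numpy as np

def mrc(H):
    """Maximal ratio combining: W = H^H."""
    return H.conj().T

def zf(H):
    """Zero forcing: W = (H^H H)^{-1} H^H, computed with a linear solve."""
    Hh = H.conj().T
    return np.linalg.solve(Hh @ H, Hh)

def mmse(H, N0):
    """MMSE for unit-power symbols: W = (H^H H + N0 I)^{-1} H^H."""
    Hh = H.conj().T
    return np.linalg.solve(Hh @ H + N0 * np.eye(H.shape[1]), Hh)

rng = np.random.default_rng(0)       # illustrative seed
N, K, N0 = 16, 4, 0.1                # illustrative sizes
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
W_mrc, W_zf, W_mmse = mrc(H), zf(H), mmse(H, N0)
```

Using `np.linalg.solve` instead of forming an explicit inverse is a numerical-stability choice of this sketch, not something prescribed by the text. Note that none of these matrices accounts for the quantizer; they are the baselines that the Bussgang-based receivers are compared against.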
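Once a combining matrix $W$ has been computed, the demultiplexing task can be performed as in (3), i.e., $\tilde{x} = W\bar{y}$. If the combining matrix is $W_{\mathrm{MRC}}$, then the signal $\tilde{x}$ is equalized as
$$\check{x}_k = \frac{\tilde{x}_k}{w_k^T \bar{h}_k}, \tag{7}$$
where $w_k^T$ is the $k$th row of $W_{\mathrm{MRC}}$ and $\bar{h}_k$ is the $k$th column of $\bar{H}$. Since the norm squared of $\check{x} = [\check{x}_1, \check{x}_2, \ldots, \check{x}_K]^T$ may not equal $K$, the signal $\check{x}$ should be rescaled as [6]
$$\dot{x} = \sqrt{K}\,\frac{\check{x}}{\|\check{x}\|}. \tag{8}$$
Finally, the signal $\dot{x}$ can be used for symbol-by-symbol detection:
$$\hat{x}_k = \arg\min_{x \in \bar{\mathcal{M}}} |\dot{x}_k - x|. \tag{9}$$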
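The rescaling and symbol-by-symbol detection steps can be sketched as follows. This is a minimal NumPy illustration that assumes the rescaling normalizes the equalized vector to squared norm K and that detection maps each element to its nearest constellation point; the function names and the input vector `x_eq` are hypothetical.

```python
import numpy as np

def rescale(x_eq):
    """Scale the equalized vector so that its squared norm equals K."""
    K = x_eq.size
    return np.sqrt(K) * x_eq / np.linalg.norm(x_eq)

def detect(x_dot, constellation):
    """Symbol-by-symbol detection: nearest constellation point per element."""
    dist = np.abs(x_dot[:, None] - constellation[None, :])
    return constellation[np.argmin(dist, axis=1)]

qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
x_eq = np.array([0.2 + 0.3j, -0.1 + 0.4j, -0.5 - 0.1j])   # hypothetical equalizer output
x_dot = rescale(x_eq)
x_hat = detect(x_dot, qpsk)
```

For QPSK the rescaling does not change the decisions, because scaling by a positive real number keeps every element in the same quadrant. It matters for multi-level constellations such as 16-QAM, where the decision depends on the amplitude.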

C. Proposed Bussgang-Based Linear Receivers
Here, we exploit the Bussgang decomposition to linearize the system model $\bar{y} = \mathrm{sign}(\bar{r})$ and then use the linearized model to propose new MRC, ZF, and MMSE receiver structures. Following the Bussgang decomposition, the system model $\bar{y} = \mathrm{sign}(\bar{r})$ can be rewritten as $\bar{y} = V\bar{r} + \bar{e}$ [28], where $\bar{e}$ is the quantization distortion, which is uncorrelated with $\bar{r}$, i.e., $E[\bar{r}\bar{e}^H] = E[\bar{r}]E[\bar{e}^H]$, and $V$ is the linearization matrix given in [28]. Let $\bar{A} = V\bar{H}$ and $\bar{n} = V\bar{z} + \bar{e}$, so the system model becomes
$$\bar{y} = \bar{A}\bar{x} + \bar{n}, \tag{11}$$
where $\bar{A}$ is the effective channel and $\bar{n}$ is the effective noise, which is modeled as Gaussian with zero mean and covariance matrix $\Sigma_{\bar{n}}$ [28]. Note that $\arcsin(C) = \arcsin(\Re\{C\}) + j\arcsin(\Im\{C\})$ for any complex matrix $C$, and the operation $\arcsin(\cdot)$ of a real matrix is applied separately on each element of that matrix.
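Based on the effective channel $\bar{A}$, we can derive a Bussgang-based MRC (BMRC) receiver and a Bussgang-based ZF (BZF) receiver as
$$W_{\mathrm{BMRC}} = \bar{A}^H$$
and
$$W_{\mathrm{BZF}} = (\bar{A}^H\bar{A})^{-1}\bar{A}^H.$$
We now derive the MMSE receiver for this Bussgang-based system model. The Bussgang-based MMSE (BMMSE) receiver can be obtained by solving the following optimization problem:
$$W_{\mathrm{BMMSE}} = \arg\min_{W} E\big[\|\bar{x} - W\bar{y}\|^2\big],$$
whose solution is given in closed form as follows:
$$W_{\mathrm{BMMSE}} = E[\bar{x}\bar{y}^H]\big(E[\bar{y}\bar{y}^H]\big)^{-1}.$$
With unit-power symbols that are uncorrelated across users and with $\bar{n}$, we can expand $E[\bar{x}\bar{y}^H] = E[\bar{x}(\bar{A}\bar{x} + \bar{n})^H] = \bar{A}^H$. In addition, $E[\bar{y}\bar{y}^H]$ is given in [28]. Hence, the resulting BMMSE receiver is given as
$$W_{\mathrm{BMMSE}} = \bar{A}^H\big(E[\bar{y}\bar{y}^H]\big)^{-1} = \bar{A}^H\big(\bar{A}\bar{A}^H + \Sigma_{\bar{n}}\big)^{-1},$$
where the second equality comes from the equivalent model in (11) and the expression for $\Sigma_{\bar{n}}$ in (12).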
It can be seen that the structure of the BMMSE receiver is similar to that of the MMSE receiver, except that the BMMSE receiver applies a new effective channel and a new effective noise covariance. These differences come as the result of linearizing the system model with the Bussgang decomposition.
Since the effective channel is $\bar{A}$, if the BMRC receiver is used, the equalization step is now performed as
$$\check{x}_k = \frac{\tilde{x}_k}{w_k^T \bar{a}_k},$$
where $w_k^T$ is the $k$th row of $W_{\mathrm{BMRC}}$ and $\bar{a}_k$ is the $k$th column of $\bar{A}$. The rescaling step and symbol-by-symbol detection are the same as in (8) and (9).
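The three Bussgang-based receivers can be sketched directly from the effective model $\bar{y} = \bar{A}\bar{x} + \bar{n}$. The code below is a minimal NumPy illustration that assumes the receivers take the forms $\bar{A}^H$, $(\bar{A}^H\bar{A})^{-1}\bar{A}^H$ and $\bar{A}^H(\bar{A}\bar{A}^H + \Sigma_{\bar{n}})^{-1}$. It takes the effective channel and the effective noise covariance as inputs and does not compute them; the function name and the placeholder values of `A` and `Sigma_n` are hypothetical and are not the Bussgang quantities of [28].

```python
import numpy as np

def bussgang_receivers(A, Sigma_n):
    """Return (W_bmrc, W_bzf, W_bmmse) for an effective channel A (N x K)
    and a Hermitian effective noise covariance Sigma_n (N x N)."""
    Ah = A.conj().T
    W_bmrc = Ah
    W_bzf = np.linalg.solve(Ah @ A, Ah)
    # W = A^H S^{-1} with S = A A^H + Sigma_n Hermitian, so W = (S^{-1} A)^H
    W_bmmse = np.linalg.solve(A @ Ah + Sigma_n, A).conj().T
    return W_bmrc, W_bzf, W_bmmse

rng = np.random.default_rng(0)                      # illustrative seed
N, K = 16, 3                                        # illustrative sizes
A = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
Sigma_n = 0.2 * np.eye(N)                           # placeholder covariance
W_bmrc, W_bzf, W_bmmse = bussgang_receivers(A, Sigma_n)
```

In this form the BMMSE receiver solves an N × N system while BZF solves a K × K one, which matches the run-time ordering reported for the two receivers in the numerical results.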

III. DNN-BASED RECEIVER FOR FIRST-STAGE DETECTION
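In this section, we first reformulate the conventional ML rule for one-bit MIMO systems, which is then exploited to devise OBMNet. We consider the same system model as presented in Section II, but for convenience in later derivations, we convert (1) and (2) into the real domain as follows:
$$r = Hx + z,$$
$$y = \mathrm{sign}(r),$$
where $y = [\Re\{\bar{y}\}^T, \Im\{\bar{y}\}^T]^T$, $x = [\Re\{\bar{x}\}^T, \Im\{\bar{x}\}^T]^T$, $r = [\Re\{\bar{r}\}^T, \Im\{\bar{r}\}^T]^T$, $z = [\Re\{\bar{z}\}^T, \Im\{\bar{z}\}^T]^T$, and
$$H = \begin{bmatrix} \Re\{\bar{H}\} & -\Im\{\bar{H}\} \\ \Im\{\bar{H}\} & \Re\{\bar{H}\} \end{bmatrix} \in \mathbb{R}^{2N \times 2K}.$$
We also denote $y = [y_1, \ldots, y_{2N}]^T$, $x = [x_1, \ldots, x_{2K}]^T$, and $H = [h_1, \ldots, h_{2N}]^T$.
The conventional ML detection problem [6] for one-bit ADCs is given as
$$\hat{x}_{\mathrm{ML}} = \arg\max_{\bar{x} \in \bar{\mathcal{M}}^K} \prod_{n=1}^{2N} \Phi\big(\sqrt{2\rho}\, y_n \hat{h}_n^T x\big), \tag{20}$$
which can also be written as
$$\hat{x}_{\mathrm{ML}} = \arg\max_{\bar{x} \in \bar{\mathcal{M}}^K} \sum_{n=1}^{2N} \log\Phi\big(\sqrt{2\rho}\, y_n \hat{h}_n^T x\big), \tag{21}$$
where $\hat{h}_n$ is an estimate of $h_n$ for $n \in \{1, \ldots, 2N\}$. The ML detection formulations in (20) and (21) are, however, non-robust at high SNRs when $\hat{h}_n \neq h_n$, or in other words, when the CSI is imperfectly known. This non-robustness issue is due to the function $\Phi(\cdot)$, which approaches 0 exponentially fast, and has been reported in [22], [23]. A detailed explanation for this issue can be found in [25].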
To address the non-robustness of the above ML formulation, we exploit a result in [29], which shows that the function $\Phi(t)$ can be accurately approximated by the Sigmoid function $\sigma(t)$, which is a widely used activation function in machine learning research. The approximation of $\Phi(t)$ is given as
$$\Phi(t) \approx \sigma(ct) = \frac{1}{1 + e^{-ct}}, \tag{22}$$
where $c = 1.702$ is a constant. Under this approximation, maximizing $\log\Phi(t)$ is approximately equivalent to minimizing $\log(1 + e^{-ct})$.
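The quality of the Sigmoid approximation, and the numerical reason it helps, can be checked with the Python standard library alone. The sketch below is illustrative: the grid over [−8, 8], the test point t = −40, and the helper names are choices made here, not values from the paper.

```python
import math

C = 1.702  # constant of the approximation Phi(t) ~ sigmoid(C * t)

def Phi(t):
    """Standard Gaussian CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def softplus_cost(t):
    """log(1 + exp(-C t)): the surrogate for -log Phi(t)."""
    return math.log1p(math.exp(-C * t))

# Largest absolute approximation error on a grid over [-8, 8]
grid = [i / 1000.0 for i in range(-8000, 8001)]
max_err = max(abs(Phi(t) - sigmoid(C * t)) for t in grid)

# A strongly negative argument, as can occur at high SNR with imperfect CSI
t_bad = -40.0
phi_value = Phi(t_bad)             # underflows to 0.0 in double precision
robust_cost = softplus_cost(t_bad) # stays finite and grows about linearly in |t|
```

Because `Phi(t_bad)` evaluates to exactly zero in floating point, its logarithm is undefined and a single such term dominates the likelihood, whereas the SoftPlus surrogate only grows roughly linearly. This is one way to see the robustness gain that motivates the reformulation.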
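Applying the approximation in (22) to (21), we obtain the following ML detection problem:
$$\hat{x} = \arg\min_{\bar{x} \in \bar{\mathcal{M}}^K} \sum_{n=1}^{2N} \log\big(1 + e^{-c\sqrt{2\rho}\, y_n \hat{h}_n^T x}\big). \tag{23}$$
As mentioned earlier, the ML detection formulation in (20) and (21) is not robust against imperfect CSI due to the $\Phi(\cdot)$ function. However, the reformulated ML detection problem (23) does not share this robustness problem. It is interesting to note that $\log(1 + e^{t})$ is referred to as the SoftPlus activation function in the machine learning literature. Hence, the proposed robust ML detection problem in (23) can be interpreted as a minimization problem whose objective is a sum of SoftPlus activation functions.
Now, we develop a DNN-based receiver based on the proposed robust ML detection problem in (23). We relax the constraint $\bar{x} \in \bar{\mathcal{M}}^K$ in (23) to $\bar{x} \in \mathbb{C}^K$, i.e., $x \in \mathbb{R}^{2K}$, and denote the channel estimate as $\hat{H} = [\hat{h}_1, \ldots, \hat{h}_{2N}]^T$ and $G = \mathrm{diag}(y_1, \ldots, y_{2N})\hat{H} = [g_1, \ldots, g_{2N}]^T$. The relaxed objective is $P(x) = \sum_{n=1}^{2N} \log(1 + e^{-c\sqrt{2\rho}\, g_n^T x})$, whose gradient is $\nabla P(x) = -c\sqrt{2\rho}\, G^T \sigma(-c\sqrt{2\rho}\, Gx)$. A gradient-descent iteration with step size $\alpha_\ell$ therefore reads
$$x^{(\ell)} = x^{(\ell-1)} + \alpha_\ell\, c\sqrt{2\rho}\, G^T \sigma\big(-c\sqrt{2\rho}\, G x^{(\ell-1)}\big). \tag{26}$$
Fig. 3: Specific structure of layer $\ell$ where the trainable parameter is $\alpha_\ell$ and the weight matrices ($-G$ on the input side, $G^T$ on the output side) are adaptive to the channel and the received signal.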
In order to optimize the step sizes $\{\alpha_\ell\}$, we use the deep unfolding technique [30] to unfold each iteration in (26) as a layer of a deep neural network. The overall structure of the proposed OBMNet is illustrated in Fig. 2, where there are $L$ layers and each layer takes a vector of $2K$ elements as the input and generates an output vector of the same size. The specific structure for each layer $\ell$ is illustrated in Fig. 3.
It can be seen that the proposed layer structure in Fig. 3 is different from that of conventional DNNs, since it exploits the specific structure of the ML detection problem. In particular, each layer of a conventional DNN often contains a weight matrix and a bias vector to be trained. However, due to the structure of the ML detection problem, in each layer of OBMNet the only trainable parameter is the step size $\alpha_\ell$. The proposed layer structure has two weight matrices $-G$ and $G^T$ and no bias vector, and the weight matrices are defined by the channel estimate and the received signal.
Since $G \in \mathbb{R}^{2N \times 2K}$, the learning process of each layer can be interpreted as first up-converting the signal from dimension $2K$ to dimension $2N$ using the weight matrix $-G$, then applying nonlinear activation functions before down-converting the signal back to dimension $2K$ using the weight matrix $G^T$. The activation function in OBMNet is the Sigmoid function, which is also widely used in conventional DNNs. Note that the use of the Sigmoid activation function in OBMNet is not arbitrary but results from the use of the approximation in (22) and the structure of the ML detection problem.
The objective function to be minimized during the training phase is $\|\tilde{x} - x\|^2$, where
$$\tilde{x} = \sqrt{K}\,\frac{x^{(L)}}{\|x^{(L)}\|} \tag{27}$$
and $x$ is the target signal, i.e., the transmitted signal. It should also be noted that the layered structure in Fig. 3 does not contain the coefficient $c\sqrt{2\rho}$. We omit this coefficient because it is a constant throughout the layers of OBMNet, and the output of the last layer $x^{(L)}$ needs to be normalized as in (27). We found by experiments that this omission not only helps improve the detection performance but also helps the training process to converge stably.
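A forward pass through the unfolded network can be sketched in NumPy as follows. The sketch assumes that each layer computes $x \leftarrow x + \alpha_\ell G^T \sigma(-Gx)$ (two weight matrices $-G$ and $G^T$, a Sigmoid activation, no bias, and the constant coefficient omitted), that $G$ is formed by scaling the rows of the real-domain channel by the received signs, and that the final output is normalized to squared norm K. The step sizes below are untrained placeholders, and all names, seeds, and sizes are hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def obmnet_forward(G, alphas, K):
    """Run len(alphas) unfolded layers starting from x^(0) = 0, then normalize."""
    x = np.zeros(G.shape[1])
    for a in alphas:
        # up-convert with -G, apply the Sigmoid, down-convert with G^T
        x = x + a * (G.T @ sigmoid(-G @ x))
    return np.sqrt(K) * x / np.linalg.norm(x)

rng = np.random.default_rng(1)                        # illustrative seed
K, N = 2, 16                                          # illustrative sizes
Hc = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
H = np.block([[Hc.real, -Hc.imag], [Hc.imag, Hc.real]])   # real-domain channel
x_true = rng.choice([-1.0, 1.0], size=2 * K) / np.sqrt(2)  # real-domain QPSK
y = np.where(H @ x_true >= 0, 1.0, -1.0)              # noise-free one-bit observation
G = y[:, None] * H                                    # rows of H scaled by the signs
alphas = [0.1] * 10                                   # placeholder, untrained step sizes
x_tilde = obmnet_forward(G, alphas, K)
```

In a trained network the entries of `alphas` would be the learned step sizes; since they are the only trainable parameters, the same values can be reused for any new channel realization by rebuilding `G`.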
The training process is accomplished offline. A training sample can be obtained by randomly generating a channel matrix $H$, a transmitted signal $x$, and a noise vector $z$. The received signal $y$ and the channel $H$ are used to build the weight matrices and the transmitted signal $x$ is used as the target. After the offline training process, the trained step sizes $\{\alpha_\ell\}$ are ready to be used for the online detection phase. Similar to DetNet for unquantized MIMO detection [18], OBMNet for one-bit MIMO detection does not need to be retrained for a new channel realization $H$.

IV. NEAREST-NEIGHBOR SEARCH FOR SECOND-STAGE DETECTION
Given a received signal, as discussed above we can either use a linear receiver or OBMNet to obtain an estimate $\tilde{x}$ of the transmitted signal $x$. However, these receivers all ignore the constraint that the transmitted signal $x$ belongs to a known discrete set of constellation points. Ignoring this constraint can result in elements of the estimate $\tilde{x}$ that are well removed from the constellation points, and thus detection errors are likely to occur once symbol-by-symbol detection is applied. This motivates us to propose here an NN search method as a second detection stage in order to fine-tune the solution of stage 1.
The proposed NN search method first finds a limited set of symbol vectors that are nearest to $\tilde{x}$ and then searches over that set for the most likely symbol vector as the final detection solution. As mentioned in the Introduction, this idea has already been used in [6] and [25]. However, the search space for the methods in [6] and [25] is very large when the number of users is large, and so they are not efficient in terms of computational complexity. The contribution of the proposed NN search method is that it searches only over a limited number of symbol vectors that are nearest to the estimate $\tilde{x}$, and thus significantly reduces the computational load.
We denote $\mathcal{M}$ as the constellation in the real domain; for example, $\mathcal{M} = \{\pm\frac{1}{\sqrt{2}}\}$ for QPSK and $\mathcal{M} = \{\pm\frac{1}{\sqrt{10}}, \pm\frac{3}{\sqrt{10}}\}$ for 16-QAM. An illustrative example for the relative difference between $\tilde{x}_i$ and the constellation points is given in Fig. 4. This example illustrates the problem that occurs when $\tilde{x}_i$ is close to a decision boundary point, where symbol-by-symbol detection may not be reliable. Here, we use a threshold $\gamma > 0$ to decide whether symbol-by-symbol detection is used or not. More specifically, if the distance from $\tilde{x}_i$ to its nearest decision boundary point $b_i$ is greater than $\gamma$, i.e., $|\tilde{x}_i - b_i| > \gamma$, then we can use symbol-by-symbol detection for $\tilde{x}_i$. When $|\tilde{x}_i - b_i| \le \gamma$, symbol-by-symbol detection is not reliable, and so we list the two nearest constellation points to $\tilde{x}_i$ as the candidates for the transmitted signal $x_i$.
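Let $\mathcal{A}_i$ denote the set of candidates for the transmitted signal $x_i$. When $|\tilde{x}_i - b_i| > \gamma$, we apply symbol-by-symbol detection and so
$$\mathcal{A}_i = \Big\{\arg\min_{x \in \mathcal{M}} |\tilde{x}_i - x|\Big\}.$$
When $|\tilde{x}_i - b_i| \le \gamma$, the candidate set consists of the two constellation points adjacent to $b_i$, i.e., $\mathcal{A}_i = \{-\frac{1}{\sqrt{2}}, +\frac{1}{\sqrt{2}}\}$ for QPSK and $\mathcal{A}_i = \{b_i - \frac{1}{\sqrt{10}},\, b_i + \frac{1}{\sqrt{10}}\}$ for 16-QAM. Hence, $\mathcal{A}_i$ contains only one or two elements. The following example illustrates the formation of $\mathcal{A}_i$.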
Hence, in this example, $\mathcal{A}_1$ and $\mathcal{A}_3$ have two elements while $\mathcal{A}_2$ and $\mathcal{A}_4$ have only one element.
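The thresholding rule for building the per-element candidate sets can be sketched in plain Python. The function below assumes that decision boundaries lie midway between adjacent real-domain constellation levels and that an unreliable element keeps the two levels adjacent to its nearest boundary. The estimate values in `x_tilde` are hypothetical numbers chosen so that the first and third sets have two elements and the second and fourth have one.

```python
import math

def candidate_set(x_i, levels, gamma):
    """Candidates for one real-domain element; `levels` must be sorted ascending."""
    boundaries = [(a + b) / 2 for a, b in zip(levels, levels[1:])]
    b_i = min(boundaries, key=lambda v: abs(x_i - v))      # nearest boundary point
    if abs(x_i - b_i) > gamma:
        # reliable: plain symbol-by-symbol detection
        return [min(levels, key=lambda p: abs(x_i - p))]
    j = boundaries.index(b_i)
    return [levels[j], levels[j + 1]]                      # two levels next to b_i

qpsk_levels = [-1 / math.sqrt(2), 1 / math.sqrt(2)]
gamma = 1 / (2 * math.sqrt(2))
x_tilde = [0.10, -0.80, -0.05, 0.65]                       # hypothetical estimate
sets = [candidate_set(v, qpsk_levels, gamma) for v in x_tilde]
sizes = [len(s) for s in sets]
```

The same function handles 16-QAM by passing the four levels ±1/√10 and ±3/√10, in which case the boundaries are 0 and ±2/√10.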
The complete set of candidates for the transmitted signal vector is given by the Cartesian product
$$\mathcal{A} = \mathcal{A}_1 \times \mathcal{A}_2 \times \cdots \times \mathcal{A}_{2K},$$
and so the size of $\mathcal{A}$ is $|\mathcal{A}| = 2^{A}$, where $A$ is the number of sets $\mathcal{A}_i$ having two elements. The existing search methods in [6] and [25] always search over the entire set $\mathcal{A}$. However, it can be seen that the size of $\mathcal{A}$ grows exponentially with $A$. In addition, $A$ also grows as the number of users $K$ increases. Thus, searching over the entire list $\mathcal{A}$ as in [6] and [25] can be prohibitively complex when the number of users is large.
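The exponential growth of the full candidate list is easy to see by enumerating the Cartesian product. The short sketch below uses `itertools.product` on hypothetical, unscaled candidate sets (±1 stands in for the actual constellation levels).

```python
from itertools import product

# Hypothetical per-element candidate sets: two of them are ambiguous
sets = [[-1, 1], [-1], [-1, 1], [1]]

full_list = list(product(*sets))                 # every symbol vector in the product
num_ambiguous = sum(len(s) == 2 for s in sets)   # number of two-element sets
size = len(full_list)                            # equals 2 ** num_ambiguous
```

With 20 ambiguous elements the list already has more than a million entries, each of which would need a likelihood evaluation in an exhaustive search.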
On the other hand, the proposed NN search method finds a set of $M$ symbol vectors in $\mathcal{A}$ that are nearest to $\tilde{x}$, then searches over that smaller set for the final solution. In this way, the NN search method can limit the computational complexity. Note that a symbol vector in this context is any element of $\mathcal{A}$. Let $\mathcal{X}_M = \{x_1, x_2, \ldots, x_M\}$ denote the set of the $M$ nearest symbol vectors to $\tilde{x}$. The larger $M$ is, the higher the probability that the set $\mathcal{X}_M$ contains the true symbol vector. However, a large value of $M$ will result in more computation for the search. Therefore, $M$ should be chosen to achieve a good trade-off between detection accuracy and computational complexity. The value of $M$ can be chosen by empirical evaluations. The main challenge here is how to find the $M$ nearest symbol vectors to $\tilde{x}$ quickly and efficiently. To address this problem, we employ the following notation and definitions.
For any two symbol vectors $x \in \mathcal{A}$ and $x' \in \mathcal{A}$, let $d(x, x')$ denote the number of position indices at which the elements of $x$ are different from the corresponding elements of $x'$. Since each element of $x$ and $x'$ belongs to a finite set of just one or two elements, $d(x, x')$ is actually the Hamming distance between $x$ and $x'$.

Definition 1 (Neighbor of a symbol vector). A symbol vector $x$ is called a neighbor of another symbol vector $x'$, or vice versa, when the Hamming distance between them is one, i.e., $d(x, x') = 1$.
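Definition 1 translates directly into code. The sketch below represents symbol vectors as tuples and enumerates the neighbors of a vector within the product of per-element candidate sets; the candidate sets and the function names are hypothetical.

```python
def hamming(x, x_prime):
    """Number of positions at which two symbol vectors differ."""
    return sum(a != b for a, b in zip(x, x_prime))

def neighbors(x, sets):
    """All vectors in the product of `sets` at Hamming distance one from x."""
    out = []
    for i, candidates in enumerate(sets):
        for v in candidates:
            if v != x[i]:
                out.append(x[:i] + (v,) + x[i + 1:])
    return out

sets = [[-1, 1], [-1], [-1, 1], [1]]     # hypothetical candidate sets
x = (-1, -1, -1, 1)
nbrs = neighbors(x, sets)
```

Since every candidate set has at most two elements, a vector has exactly as many neighbors as there are ambiguous positions, here two.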
Definition 2 (Neighbor of a set). Given a set of symbol vectors $\mathcal{S}$ and another symbol vector $x \notin \mathcal{S}$, let
$$d_{\min}(x, \mathcal{S}) = \min_{x' \in \mathcal{S}} d(x, x').$$
The symbol vector $x$ is called a neighbor of $\mathcal{S}$ if and only if $d_{\min}(x, \mathcal{S}) = 1$, or in other words, if and only if $x$ is the neighbor of at least one member of $\mathcal{S}$.
Let $\mathcal{N}(x)$ and $\mathcal{N}(\mathcal{S})$ denote the set of neighbors of symbol vector $x$ and set $\mathcal{S}$, respectively. Let $\mathcal{X}_M = \{x_1, x_2, \ldots, x_M\}$ with $x_m \in \mathcal{A}$ and $m \in \{1, 2, \ldots, M\}$ denote the set of the $M$ nearest symbol vectors to $\tilde{x}$ satisfying
$$\|x_1 - \tilde{x}\| \le \|x_2 - \tilde{x}\| \le \cdots \le \|x_M - \tilde{x}\| \le \|x - \tilde{x}\|,$$
where $x$ is any symbol vector in $\mathcal{A}$, but not in $\mathcal{X}_M$. Hence, $x_m$ is the $m$th nearest symbol vector to $\tilde{x}$. Clearly, the nearest symbol vector $x_1$ is obtained by applying symbol-by-symbol detection to $\tilde{x}$. The problem now is how to efficiently find $x_2, \ldots, x_M$. The following proposition can be exploited to solve this problem.
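Proof: Please refer to Appendix A.
Proposition 1 indicates that we can find the $m$th nearest symbol vector $x_m$ from the neighbor set of $\mathcal{X}_{m-1}$, i.e.,
$$x_m = \arg\min_{x \in \mathcal{N}(\mathcal{X}_{m-1})} \|x - \tilde{x}\|, \tag{31}$$
where $\mathcal{N}(\mathcal{X}_{m-1})$ is the neighbor set of $\mathcal{X}_{m-1}$ and is given as
$$\mathcal{N}(\mathcal{X}_{m-1}) = \bigcup_{p=1}^{m-1} \big(\mathcal{N}(x_p) \setminus \mathcal{X}_{m-1}\big). \tag{32}$$
Hence, in order to find $x_m$, we need to accomplish two tasks: (i) find the $m - 1$ subsets $\{\mathcal{N}(x_p) \setminus \mathcal{X}_{m-1}\}_{p=1,\ldots,m-1}$ and (ii) search for $x_m$ within the subsets. The method of directly finding the $m - 1$ subsets and then searching them for $x_m$ is not efficient. In the following, we present a recursive strategy to obtain $x_m$ quickly and efficiently.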
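Note that the inner term on the right-hand side of (32) can be written as follows:
$$\mathcal{N}(x_p) \setminus \mathcal{X}_{m-1} = \big(\mathcal{N}(x_p) \setminus \mathcal{X}_{m-2}\big) \setminus \{x_{m-1}\}. \tag{33}$$
Therefore, we can exploit (33) to obtain the first $m - 2$ subsets $\{\mathcal{N}(x_p) \setminus \mathcal{X}_{m-1}\}_{p=1,\ldots,m-2}$ from the subsets $\{\mathcal{N}(x_p) \setminus \mathcal{X}_{m-2}\}_{p=1,\ldots,m-2}$, which were already obtained previously when we found $x_{m-1}$. The last subset $\mathcal{N}(x_{m-1}) \setminus \mathcal{X}_{m-1}$ is obtained by using $x_{m-1}$ and the other nearest symbol vectors. A flowchart illustrating this recursive strategy is given in Fig. 5.
Remark 1: If the elements of $\mathcal{N}(x_p) \setminus \mathcal{X}_{m-2}$ are already sorted in ascending order of distance to $\tilde{x}$, then $x_{m-1}$ can be removed from $\mathcal{N}(x_p) \setminus \mathcal{X}_{m-2}$ by simply checking the first element of $\mathcal{N}(x_p) \setminus \mathcal{X}_{m-2}$. The reason for this is that $x_{m-1}$ is the $(m-1)$th nearest symbol vector, which means the distance from $x_{m-1}$ to $\tilde{x}$ cannot be greater than the distance from any element of $\mathcal{N}(x_p) \setminus \mathcal{X}_{m-2}$ to $\tilde{x}$. In addition, the elements of $\mathcal{N}(x_p) \setminus \mathcal{X}_{m-2}$ are distinct and already sorted, and so if the first element is not $x_{m-1}$, then $x_{m-1}$ does not belong to that subset.
Remark 2: If the elements of each subset $\mathcal{N}(x_p) \setminus \mathcal{X}_{m-1}$ are already sorted in ascending order of distance to $\tilde{x}$, then the search over the $m - 1$ subsets for $x_m$ can be done by simply searching over a list of $m - 1$ candidates, where each candidate is the first element of a subset $\mathcal{N}(x_p) \setminus \mathcal{X}_{m-1}$.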
Based on the observations in Remarks 1 and 2, we propose the nearest-neighbor search method described in Algorithm 1.
The key idea is to use the recursive strategy depicted in Fig. 5 and to implement the observations made in Remarks 1 and 2. Whenever forming a set $\mathcal{N}(x_m)$, we sort its elements in ascending order of distance to $\tilde{x}$ as described in lines 8 and 18 of Algorithm 1. In this way, we only need to sort $M - 1$ times, and the remainder of the proposed algorithm only involves comparisons based on checking the first elements of the subsets. We denote $\mathcal{C}_1, \ldots, \mathcal{C}_{M-1}$ as the subsets corresponding to $x_1, \ldots, x_{M-1}$, respectively, and $\mathcal{C}_m[1]$ denotes the first element of the subset $\mathcal{C}_m$. Lines 10 and 11 implement Remark 2 to obtain $x_m$. Remark 1 is implemented starting at line 13. The last subset is obtained in lines 18-23. Finally, line 26 gives the final solution by searching for the highest-likelihood symbol vector among the $M$ nearest symbol vectors.
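The idea of growing the candidate list through neighbors can be illustrated with a simplified best-first search. The sketch below is not Algorithm 1: instead of the sorted subsets and first-element checks described above, it keeps all unexplored neighbors in a single priority queue (`heapq`), which yields the same M nearest vectors when distances are distinct. The candidate sets, the estimate, and the function names are hypothetical, and the final likelihood-based selection among the M vectors is left out.

```python
import heapq
import math
from itertools import product

def sq_dist(x, x_tilde):
    return sum((a - b) ** 2 for a, b in zip(x, x_tilde))

def m_nearest(x_tilde, sets, M):
    """Return up to M vectors of the product of `sets`, nearest to x_tilde first."""
    # x_1: element-wise nearest candidate, i.e. symbol-by-symbol detection
    start = tuple(min(c, key=lambda v: abs(v - t)) for c, t in zip(sets, x_tilde))
    heap = [(sq_dist(start, x_tilde), start)]
    seen = {start}
    found = []
    while heap and len(found) < M:
        _, x = heapq.heappop(heap)           # closest vector not yet selected
        found.append(x)
        for i, candidates in enumerate(sets):
            for v in candidates:             # push the unseen neighbors of x
                if v != x[i]:
                    nb = x[:i] + (v,) + x[i + 1:]
                    if nb not in seen:
                        seen.add(nb)
                        heapq.heappush(heap, (sq_dist(nb, x_tilde), nb))
    return found

s = 1 / math.sqrt(2)
sets = [[-s, s], [-s], [-s, s], [s], [-s, s]]        # hypothetical candidate sets
x_tilde = [0.10, -0.80, -0.05, 0.65, 0.30]           # hypothetical estimate
nearest = m_nearest(x_tilde, sets, M=3)
brute = sorted(product(*sets), key=lambda x: sq_dist(x, x_tilde))[:3]
```

The search touches only the selected vectors and their neighbors, so its cost depends on M and the vector length rather than on the size of the full product set. A second-stage detector would then evaluate its likelihood metric on the entries of `nearest` only.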

A. Computational Complexity Analysis
A computational complexity comparison in terms of big-O notation is provided in Table I. It can be seen that the computational complexity of the proposed receivers is lower than that of existing methods. In particular, the linear receivers have the lowest complexity, while the OSD method in [7] has the highest complexity, which grows exponentially with $K$ and $N$. Note that the complexity of the SVM-based method [25] is due to the decomposition techniques used to solve the SVM problem, e.g., [31]-[33]. The term $\kappa(N)$ is empirically reported to be a super-linear function of $N$. The complexity of the DNN-based OBMNet detector is only $O(KNLT_d)$, which is lower than that of the SVM-based method. The computational complexity of the proposed NN search method is $O(MK\max\{M, N\}T_d)$ in the worst case. This complexity is mainly due to the detection step for $\tilde{x}$ and the for loops as described in Algorithm 1. The complexity of the full $\mathcal{A}$-space search method is $O(|\mathcal{A}|KNT_d)$, where $|\mathcal{A}|$ can grow exponentially with $K$.

B. Numerical Results
This section presents numerical results to show the performance of the proposed two-stage detection methods. The channel elements are assumed to be i.i.d. and each channel element is generated from the distribution $\mathcal{CN}(0, 1)$.
First, we evaluate the performance of the conventional and proposed Bussgang-based linear receivers assuming perfect CSI is available (examples with estimated CSI will be given next). Fig. 6 presents BER comparisons between the proposed and existing linear receivers for QPSK signaling. Among the existing receivers, we see that the ZF and MMSE receivers obtain the same performance (blue curves with symbols), as do the AQNM-MMSE [15] and WFQ receivers [16] (black curves with symbols). The proposed Bussgang-based linear receivers significantly outperform their conventional counterparts. The high-SNR error floors of the proposed linear receivers are much lower than those of the conventional approaches. These performance improvements are achieved thanks to the exact linear input-output relationship of massive MIMO systems with one-bit ADCs obtained by the Bussgang decomposition. In Fig. 6b, we evaluate the performance as the number of users $K$ increases. Here, we omit AQNM-MMSE and WFQ since they are outperformed by ZF and MMSE. It is observed that the proposed linear receivers always yield lower BERs than the standard ones, and the performance improvement is best seen when the number of users $K$ is not too large. As $K$ increases, the gap between the error floors tends to diminish. This is due to the fact that, for large $K$, the Bussgang quantities can be approximated in terms of the scalar $\mu = 2/(\pi(K + N_0))$. These approximations result in Bussgang-based linear receivers that are equivalent to the conventional approaches to within a scaling factor.
In Fig. 7, we provide BER comparisons between the ZF, MMSE, BZF, and BMMSE linear receivers with estimated CSI for a case with $K = 2$ users and $N = 16$ antennas. Fig. 7(a) shows results for the Bussgang-based channel estimator in [13], while Fig. 7(b) employs the SVM-based channel estimator of [25]. It can be seen that the BMMSE receiver always outperforms the others. A striking observation is that ZF and MMSE with estimated CSI outperform ZF and MMSE with perfect CSI. There is a reason for this. Recall that the Bussgang-based linear receivers BZF and BMMSE use the effective channel $\bar{A} = V\bar{H}$. Let $\bar{A}_{i,:}$ and $\bar{H}_{i,:}$ denote the $i$th row of $\bar{A}$ and $\bar{H}$, respectively; then we have
$$\bar{A}_{i,:} \propto \frac{\bar{H}_{i,:}}{\sqrt{\|\bar{H}_{i,:}\|^2 + N_0}}, \quad i = 1, 2, \ldots, N.$$
This indicates that the effective channel $\bar{A}_{i,:}$ is a normalized version of the true channel. Note that the instantaneous magnitude of $\bar{H}_{i,:}$ is not identifiable in 1-bit quantized MIMO systems [34], and consequently the SVM-based [25] and BMMSE [13] channel estimators provide estimates whose magnitudes are normalized. Therefore, when using a channel estimator such as [13], [25], ZF with estimated CSI will give the same performance as BZF with estimated CSI. ZF with estimated CSI outperforms ZF with perfect CSI since the channel estimate takes into account the inherent scaling ambiguity in the observed data. For the same reason, MMSE and BMMSE with estimated CSI also outperform MMSE with perfect CSI, but MMSE performs worse than BMMSE because MMSE still applies the noise covariance matrix $N_0 I$, while BMMSE uses the covariance matrix $\Sigma_{\bar{n}}$ that includes information about the quantization noise.
For the first stage, besides the Bussgang-based linear receivers, we also proposed OBMNet, which is derived from a reformulated robust ML detection problem. In Fig. 8, we verify the robustness of the reformulated ML detection problem in (23) when implemented with estimated CSI. We carried out simulations using the BMMSE channel estimator [13] with different training lengths T_t. Fig. 8 shows that when the CSI is perfectly known, the conventional and the proposed ML detection algorithms yield almost identical performance. However, when the CSI is imperfectly known, the performance of conventional ML detection degrades significantly at high SNR, while the proposed robust ML detection algorithm remains stable. This verifies our analysis in Section III.

Fig. 9 provides a performance comparison between the proposed DNN-based, BMMSE, and BZF receivers and the SVM-based receiver in [25]. The performance of OSD is comparable to that of the SVM-based method but with much higher computational complexity. Since the SVM-based method also outperforms other prior methods, we use it as the benchmark in this paper. To implement the SVM-based receiver, we use the scikit-learn machine learning library [35], with the maximum number of iterations set to 30. For training OBMNet, we use TensorFlow [36] and the Adam optimizer [37] with a learning rate of 10^{-2}. The size of each training batch is set to 1000. The input of the first layer, x_0, is set to a zero vector. During the detection phase, the trained OBMNet performs batch detection. Batch detection is an advantage of DNNs, since the network can take a batch of multiple symbol vectors as its input, which speeds up the detection process [18]. The effect of batch size on run time can be seen in Table II. The results in Fig. 9 show that the proposed OBMNet and the SVM-based method outperform the Bussgang-based linear receivers. At high SNRs, the BER floor of the OBMNet detector is slightly lower than that of the SVM-based method. For the case of QPSK, K = 4, and N = 32, OBMNet has 10 layers (L = 10) with the following trained step sizes:

To evaluate the computational complexity of the receivers used in Fig. 9, the average run time is reported in Table II. Since run time is largely affected by implementation details and the associated hardware and platform, we implemented all the receivers on the same simulation hardware with Python 3.7 and the NumPy package to ensure fairness. Note that the run time of the SVM-based method depends on the SNR, and so we report the resulting range of run times. Table II shows that the Bussgang-based linear receivers have lower complexity than OBMNet and the SVM-based receiver. This is expected because the linear receivers require only a matrix-vector multiplication to detect each received signal. The run time of the BZF receiver is smaller than that of BMMSE because the combining matrix W_BZF involves only the inversion of a K × K matrix, while W_BMMSE requires the inverse of an N × N matrix. OBMNet is more computationally expensive than the linear receivers, but its complexity is still much lower than that of the SVM-based method. The run time of OBMNet can also be significantly reduced by increasing the batch size. A similar observation about the effect of the batch size on run time is reported in [18]. Note that the run time of the SVM-based method does not depend on the batch size, since it processes different received signals separately and must solve a new optimization problem in each time slot.
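The complexity remarks above can be made concrete with a short sketch. It assumes a generic zero-forcing combiner of the form W = (A^H A)^{-1} A^H built from a hypothetical effective channel A; this is a stand-in and not necessarily the exact W_BZF defined earlier in the paper. The sketch shows that only a K × K system is solved to form the combiner and that a whole batch of received vectors can be detected with one matrix product instead of one matrix-vector product per time slot.

```python
import time
import numpy as np

rng = np.random.default_rng(1)
K, N, B = 4, 32, 1000  # users, antennas, batch size (B is an illustrative choice)

# Hypothetical effective channel A (N x K) and a batch Y of one-bit observations (N x B).
A = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
Y = np.sign(rng.standard_normal((N, B))) + 1j * np.sign(rng.standard_normal((N, B)))

# Generic zero-forcing combiner (assumed form): only a K x K system is solved.
W = np.linalg.solve(A.conj().T @ A, A.conj().T)  # shape (K, N)

# Per-time-slot detection: one matrix-vector product per received vector.
t0 = time.perf_counter()
X_loop = np.stack([W @ Y[:, t] for t in range(B)], axis=1)
t_loop = time.perf_counter() - t0

# Batch detection: a single matrix-matrix product for all B received vectors.
t0 = time.perf_counter()
X_batch = W @ Y
t_batch = time.perf_counter() - t0

print(f"loop: {t_loop:.2e} s, batch: {t_batch:.2e} s")
print(np.allclose(X_loop, X_batch))
```

The measured times depend on the machine and on the linear algebra backend, so they are not comparable with the values in Table II; the sketch only illustrates why batching helps and why the K × K inverse is the cheaper of the two.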
For the second stage, performance comparisons are given in Fig. 10 for the case of QPSK with K = 4 and N = 32, and in Fig. 11 for the case of 16-QAM with K = 8 and N = 128. We set γ = 1/(2√2) for QPSK and γ = 1/(2√10) for 16-QAM. Here, we compare the BZF, OBMNet, and SVM-based receivers and omit BMMSE, since the performance of BZF and BMMSE is comparable and the complexity of BZF is lower than that of BMMSE. The case of M = 1 is equivalent to the use of symbol-by-symbol detection in the first stage. In this case, OBMNet provides the best performance, i.e., it yields the best initial detection results. As M increases, the proposed NN search method in the second stage significantly improves the performance compared to the first stage. In Fig. 10, the BERs obtained with a small M, e.g., M = 2, are already close to the BER of the ML detection approach. The results in Fig. 11 clearly show that the performance can be improved by increasing M, but this requires more computational resources, as seen in Table III. Thus, one should choose M to balance detection accuracy and computational complexity. It should be noted that |A| is always a power of two, but M can be any positive integer.
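The role of M in the second stage can be sketched for real-domain QPSK as follows. This is an illustrative sketch only: it uses a priority queue rather than the recursive set formation of the proposed method, it assumes that a neighbor is a symbol vector differing in exactly one entry, and the function name `m_nearest_qpsk` is hypothetical. Starting from the symbol-by-symbol decision (the case M = 1), it repeatedly selects the nearest unselected neighbor of the vectors found so far, which is the property stated in Proposition 1, and the result is checked against an exhaustive search.

```python
import heapq
import itertools
import numpy as np

def m_nearest_qpsk(x_est, M):
    """Return the M real-domain QPSK vectors nearest to x_est, nearest first."""
    a = 1.0 / np.sqrt(2.0)
    x1 = np.where(x_est >= 0, a, -a)          # symbol-by-symbol decision (M = 1)
    dist = lambda s: float(np.sum((x_est - s) ** 2))
    heap = [(dist(x1), tuple(x1))]            # candidates ordered by distance
    seen = {tuple(x1)}
    found = []
    while heap and len(found) < M:
        d, s = heapq.heappop(heap)            # nearest candidate not yet selected
        found.append((d, np.array(s)))
        for i in range(len(s)):               # neighbors: flip exactly one entry
            nb = list(s)
            nb[i] = -nb[i]
            nb = tuple(nb)
            if nb not in seen:
                seen.add(nb)
                heapq.heappush(heap, (dist(np.array(nb)), nb))
    return found

rng = np.random.default_rng(2)
K, M = 4, 5
x_est = 0.7 * rng.standard_normal(2 * K)      # stand-in for a first-stage estimate
found = m_nearest_qpsk(x_est, M)

# Exhaustive check over all 2^(2K) symbol vectors (feasible only for small K).
a = 1.0 / np.sqrt(2.0)
all_d = sorted(float(np.sum((x_est - np.array(s)) ** 2))
               for s in itertools.product([-a, a], repeat=2 * K))
print(np.allclose([d for d, _ in found], all_d[:M]))
```

In this sketch the number of distance evaluations grows with M and the vector length rather than with the full candidate set of size 2^(2K), which mirrors the accuracy and complexity trade-off in the choice of M discussed above.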

VI. CONCLUSION
In this paper, we have proposed two-stage detection methods for massive MIMO systems with one-bit ADCs. In particular, for the first stage, we proposed new linear receivers based on the Bussgang decomposition and a novel model-driven OBMNet detector, which is constructed based on a reformulated robust ML detection problem. The layered structure of OBMNet is simple, unique, and adaptive to the CSI and received signals. These receivers outperform existing approaches and also have low complexity. For the second stage, an NN search method was proposed to further improve the performance of the first stage. This NN search method allows one to limit the search complexity as desired.

APPENDIX A PROOF OF PROPOSITION 1
Since x_m is the m-th nearest symbol vector, we have the following condition: (36) for any x ∉ X_m. We prove the proposition by contradiction. Suppose that x_m is not a neighbor of X_{m−1}, i.e., x_m ∉ N(X_{m−1}) or, equivalently, d_min(x_m, X_{m−1}) > 1. For simplicity, we consider the case where d_min(x_m, X_{m−1}) = 2. The proof for the other cases, where d_min(x_m, X_{m−1}) > 2, is similar.
Let x_p ∈ X_{m−1} with p ∈ {1, 2, …, m − 1} be a symbol vector such that d(x_p, x_m) = 2. Without loss of generality, we can assume that the two positions at which the vectors differ are 1 and 2, i.e.,

\[
\begin{cases}
x_{m,1} \neq x_{p,1} \\
x_{m,2} \neq x_{p,2} \\
x_{m,i} = x_{p,i} \quad \forall i \in \{3, \ldots, 2K\}.
\end{cases}
\]

The inequality in (40) indicates that ‖x_m − x‖² > ‖x″ − x‖², which means that x″ is closer to x than x_m is, or in other words, x_m is not the m-th nearest symbol vector of x. This contradicts (36).

Fig. 1: Block diagram of a massive MIMO system with K single-antenna users and an N-antenna base station equipped with 2N one-bit ADCs.

Fig. 4: An example of the relative difference between the estimate of x_i and the constellation points: (a) the estimate of x_i is far from the boundary point b_i = 0 and close to the constellation point 1/√2, which means there is a high probability that the transmitted signal x_i is 1/√2; (b) the estimate of x_i is close to the boundary point b_i = −2/√10, so it is difficult to say whether −3/√10 or −1/√10 was transmitted.

Fig. 5: Flowchart of the proposed nearest-neighbor search method. A recursive formation of sets is exploited to reduce the computational complexity. A subset N(x_p) \ X_{m−1} with p ∈ {1, …, m−2} is obtained by removing x_{m−1} from the subset N(x_p) \ X_{m−2}, as given in (33). The last subset N(x_{m−1}) \ X_{m−1} is obtained by using x_{m−1} and the other nearest symbol vectors. The m-th nearest symbol vector x_m is then obtained by searching over the m − 1 subsets.

Fig. 6: First stage performance comparison between the proposed and existing linear receivers with QPSK signaling.

Fig. 8: Performance comparison between the conventional and the proposed ML detection problems with K = 2, N = 16, and QPSK signaling. The BMMSE channel estimator [13] is used with different training lengths T_t.

TABLE I: Computational complexity comparison. T_d is the data block length, κ(N) is a super-linear function of N, and GNs = 2N.

TABLE II: First stage average run time.

TABLE III: Second stage average run time.