An overview on modern Coherent Optical Systems - Part III

Quick recap

In Part II we discussed the main demodulation architectures and techniques implemented in modern coherent optical receivers. Although all the constituent details underlying their operation have been described, we could jokingly say that we "reckoned without our host". And the host who presents the bill is called noise, which is, generally speaking, the main cause of performance degradation in telecommunications systems. To be fair, noise has "crept" into the discussion ever since Part I, without our ever delving further into the issue. Therefore, the purpose of this third discussion is precisely to analyze how noise affects coherent optical systems and how to evaluate their overall performance in this regard.

Noise analysis in coherent telecommunication systems

In the previous two articles I inserted external links for some topics, each of which would deserve a specific discussion. In this third part, however, in order to avoid constant references to external sites with the risk of making the treatment dispersive, I have decided to gather in a single place all that is needed to make the noise analysis in coherent optical receivers, discussed further on, clear and exhaustive. Certainly some topics may be trivial or redundant for the more experienced, but I always prefer to address the "ideal" reader who does not know the subject in detail, but still wants to understand it.

Recall of basic probability theory principles

Probability density function (PDF) and random variables

Probability theory is rooted in situations where the result of an experiment is subject to chance. The experiment is said to exhibit statistical regularity if, for any sequence of n trials with n very large, the relative frequency nA/n — where nA is the number of times event A occurs — converges to a finite value:

P[A] = lim (n→∞) nA/n    [1]

This limit is defined just as the probability of event A.

Whenever we flip a coin, and lose (or win…) a bet, we are performing a random experiment. Heads or tails, however, is not a suitable mathematical representation. It is then useful to assign a number, or a range of values, to the results of the random experiment. For example, heads could correspond to 1 and tails to 0. We use the expression random variable precisely to describe the assignment of a number to the result of a random experiment. The advantage of using random variables is that probability analysis can be developed in terms of real quantities, regardless of the shape or course of events in the random experiment. Random variables can be discrete, taking only a countable set of values (like the flip of a coin). Otherwise, they are continuous and assume values in a real interval: for example, the amplitude of a noise voltage at a particular instant of time can assume any value between −∞ and +∞. The probability that a random variable X takes on values less than or equal to the real variable x defines the distribution function:

Fx(x) = P[X ≤ x]    [2]

If X is a continuous random variable and furthermore Fx is differentiable with respect to x, then the most commonly used probability density function (PDF) can be defined as follows:

fx(x) = dFx(x)/dx    [3]

It is common that the result of an experiment is described by several random variables, and we are interested in the relationships between them. If we consider X and Y as two random variables, we can extend the result [2] by defining the joint probability distribution Fxy as the probability that the random variable X is less than or equal to a fixed value x and that the random variable Y is less than or equal to a fixed value y. Similarly to the result [3], one can generalize to the case of two random variables by obtaining the joint probability density fxy. We do not elaborate further on this generalization to the case of multiple random variables, but we only use it to point out the following remarkable result: two random variables, X and Y, are statistically independent if the result of X does not affect the result of Y. Mathematically, this translates into the fact that the joint probability is given by the product of the individual probabilities:

Fxy(x, y) = Fx(x) · Fy(y),  or equivalently  fxy(x, y) = fx(x) · fy(y)    [4]

Conditional probability

Suppose we are studying a random experiment, or a signal, characterized by two random variables, X and Y, which are not independent. Then knowing the value of one random variable, X, could affect the observed values of the other random variable. Let P[Y|X] denote the probability of Y given that X has occurred; it is called the conditional probability of Y given X. Assuming that X has non-zero probability, the conditional probability takes the following form:

P[Y|X] = P[X, Y] / P[X]    [5]

where P[X, Y] is the joint probability defined in [4].
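
To make relation [5] tangible, here is a minimal Python sketch — the 80% dependency between the two binary variables is an arbitrary choice for illustration — that estimates the marginal, joint and conditional probabilities by relative frequency, in the spirit of [1]:

import numpy as np

rng = np.random.default_rng(42)
n_trials = 100_000

# X is a fair coin; Y copies X 80% of the time, so the two are NOT independent
x = rng.integers(0, 2, n_trials)
y = np.where(rng.random(n_trials) < 0.8, x, 1 - x)

p_x1 = np.mean(x == 1)                    # P[X = 1]
p_joint = np.mean((x == 1) & (y == 1))    # P[X = 1, Y = 1]
p_cond = np.mean(y[x == 1] == 1)          # P[Y = 1 | X = 1]

print(p_cond, p_joint / p_x1)             # both ~0.8, verifying relation [5]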

Expected values: mean and variance

The distribution function provides a complete description of a random variable — often more information than is actually needed. In many cases, simple statistical measures called expected values, such as the mean and the variance, are enough to describe the random variable.

In the case of a discrete random variable X, the mean μx is the sum over all possible outcomes of the random variable X, each weighted by its probability:

μx = E[X] = Σi xi · P[X = xi]    [6]

In the case of a continuous random variable with PDF fx, the analogous definition of the expected value is the following:

μx = E[X] = ∫_{−∞}^{+∞} x · fx(x) dx    [7]

For example, if X is a random variable representing the voltage observations of a random signal, then the mean value of X represents the mean voltage, or DC component, of the signal.

The variance of a random variable is a measure of the dispersion of the probability around the mean. In the case of discrete random variables, the variance is the expected value of the squared distance of each outcome from the mean of the distribution:

σx² = E[(X − μx)²] = Σi (xi − μx)² · P[X = xi]    [8]

For a continuous random variable with the usual PDF fx, the analogous definition of the variance is given by:

σx² = E[(X − μx)²] = ∫_{−∞}^{+∞} (x − μx)² · fx(x) dx    [9]

For example, if X represents the voltage observations of a random signal, then the variance takes on the meaning of the average power of the signal fluctuations. In general, in the presence of signals, the square root of the variance is called the root mean square (RMS) value and, for a zero-mean signal, the variance physically represents the total power of the signal.
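
As a concrete illustration of this physical reading — with arbitrary example values, a 1.5 V DC level plus Gaussian fluctuations — the following sketch estimates the mean [7] and the variance [9] of a sampled random voltage:

import numpy as np

rng = np.random.default_rng(0)
v = 1.5 + 0.2 * rng.standard_normal(100_000)   # noisy voltage samples

dc = v.mean()            # mean [7]: the DC component of the signal
ac_power = v.var()       # variance [9]: average power of the fluctuations
rms = np.sqrt(ac_power)  # RMS value of the fluctuations

print(f"DC = {dc:.3f} V, fluctuation power = {ac_power:.4f} V^2, RMS = {rms:.3f} V")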

Gaussian random variables

The Gaussian random variable plays a very important role in many applications and is by far the random variable that is encountered most frequently in the statistical analysis of communication systems. A Gaussian random variable is a continuous random variable with a PDF given by:

fx(x) = [1/(σx·√(2π))] · exp[−(x − μx)²/(2σx²)]    [10]

In the particular case in which the mean [7] is zero and the variance [9] is unitary, equation [10] takes the following form:

fx(x) = (1/√(2π)) · exp(−x²/2)    [11]

Such a variable is called a normalized (or standard) Gaussian random variable. Below is a Python script that draws samples from it and overlays the theoretical PDF [11].

import matplotlib.pyplot as plt
import numpy as np

mu_x, sigma_x = 0, 1
f_x = np.random.default_rng().normal(mu_x, sigma_x, 1000)

counts, bin_edges, _ = plt.hist(f_x, bins=50, rwidth=0.85, density=True, label="Samples", color='gray')
# Overlay the theoretical PDF [11], evaluated at the histogram bin edges
plt.plot(bin_edges, 1 / (sigma_x * np.sqrt(2 * np.pi)) * np.exp(- (bin_edges - mu_x) ** 2 / (2 * sigma_x ** 2)),
         label="PDF", color='red', linewidth=2)
plt.legend(loc="upper right")
plt.ylabel('Probability')
plt.xlabel('Samples')
plt.grid(True)
plt.show()        
[Figure: histogram of the 1,000 generated samples with the standard Gaussian PDF overlaid]

A function often used in communications is the Q(x) function, defined as:

Q(x) = (1/√(2π)) ∫_x^{∞} exp(−t²/2) dt    [12]

It is equal to the area subtended by the positive tail of the normalized Gaussian PDF, and is particularly useful, as we are going to see, for evaluating the effect of noise in digital communications. In the literature one often finds the (Gaussian) error function, defined as follows:

erf(x) = (2/√π) ∫_0^{x} exp(−t²) dt    [13]

A companion function, called the complementary error function, is defined by:

erfc(x) = (2/√π) ∫_x^{∞} exp(−t²) dt = 1 − erf(x)    [14]

In particular, by examining equation [12] and the left-hand side of [14], we observe that the Q(x) function and the complementary error function are related as follows:

Q(x) = ½ · erfc(x/√2)    [15]

Below is a Python script that produces the graph of the Q(x) function in semilogarithmic scale:

import matplotlib.pyplot as plt
import numpy as np
from scipy import special

start, stop, step = 0.0, 5.0, 0.1
x_values = np.arange(start, stop, step)

# Q(x) = 0.5 * erfc(x / sqrt(2)), from relation [15]
data = 0.5 * special.erfc(x_values / np.sqrt(2))

plt.yscale("log")
plt.plot(x_values, data, color='green')  # plot against x, not the sample index
plt.xlim(start, stop)
plt.ylabel('Q(x)')
plt.xlabel('x')
plt.grid(True)
plt.show()        
[Figure: the Q(x) function plotted on a semilogarithmic scale]

Random processes and correlation

As is well known, in a telecommunications system the received signal is formed by an information signal component, a random interference component, and a channel noise component. That is, the received time-varying signal is of a random nature; the combination of the concepts of time variation and random variables allows us to introduce the notion of random processes. These are processes for which it is not possible to predict in advance the exact value of the signals involved, but it is still possible to describe them in terms of statistical parameters, such as the mean power and the power spectral density. In real applications, such statistical characterizations are often found to be independent of the observation instant; in other words, if a random process is divided into a certain number of time intervals, the different sections of the process essentially have the same statistical properties: in this case we speak of stationary random processes. Furthermore, without going into analytical detail, a random process is said to be stationary of the first order if its mean and variance are time-invariant. It is instead stationary of the second order if the covariance and correlation do not depend on absolute time.

Given two random variables X and Y, the covariance is the expected value of the product of the two random variables once their means have been subtracted, that is:

Cov(X, Y) = E[(X − μx)(Y − μy)] = E[XY] − μx · μy    [16]

If the two random variables are continuous with joint density fxy, then [16] can be rewritten as follows:

E[XY] = ∫∫ x · y · fxy(x, y) dx dy    [17]

If the two random variables are statistically independent, we find that:

E[XY] = μx · μy   ⇒   Cov(X, Y) = 0    [18]

as expected from [4]. Now, let's take a step forward by applying the covariance results to a random process X(t) that varies over time. It must be premised that, while random processes are by definition not predictable, it is often observed that samples of the process at different instants of time can be correlated. Therefore, considering two samples of the process X(t) at the time instants t1 and t2, namely X(t1) and X(t2), the covariance is obtained by applying [16]:

Cov[X(t1), X(t2)] = E[X(t1) · X(t2)] − μx(t1) · μx(t2)    [19]

Precisely the first term on the right-hand side of [19] is the autocorrelation of the random process, and it is defined with the following general notation:

Rx(t1, t2) = E[X(t1) · X(t2)]    [20]

If X(t) is stationary of the second order (or greater), then equation [20] can be rewritten as follows:

Rx(t1, t2) = Rx(t2 − t1)    [21]

Second-order stationarity implies many other statistical properties; what interests us here, because it is typical of real applications, is the following:

  • The mean of the random process is a time-independent constant for any interval considered:

E[X(t)] = μx,  for every t    [22]

  • The autocorrelation of the random process depends only on the time difference τ between any two observation instants t and t + τ:

Rx(τ) = E[X(t + τ) · X(t)]    [23]

For our discussion, it is sufficient that a random process possesses properties [22] and [23]; in this case, it is called stationary in the broad sense, or weakly stationary. The physical meaning of the autocorrelation function Rx(τ) is that it provides a powerful tool for describing the interdependence between two random variables obtained by observing the random process X(t) at instants τ seconds apart. It is evident that the faster the random process X(t) changes, the faster the autocorrelation function Rx(τ) decreases from its maximum value Rx(0) as τ increases.

Spectrum of random signals

Let x(t) be a sample function of a random process X(t). The following figure shows the behavior of the waveform xT(t), i.e. x(t) observed over the interval −T < t < T.

[Figure: sample function xT(t) of the random process, observed over the interval −T < t < T]

It is possible to define the Fourier transform of the sample function xT(t) as

ΞT(f) = ∫_{−T}^{T} xT(t) · exp(−j2πft) dt    [24]

In fact, the Fourier transform [24] converts a family of random variables X(t), indexed by the parameter t, into a new family of random variables ΞT(f), indexed by the parameter f. As is known, the mean power is obtained by taking the squared magnitude of [24] and averaging it over the observation interval 2T. In dealing with a random process, the averaging requires the use of the probability distribution of the transformed family of random variables, previously defined. This allows us to define the power spectral density of the random process X(t) as

Sx(f) = lim (T→∞) (1/2T) · E[|ΞT(f)|²]    [25]

The power spectral density, defined by [25], and the autocorrelation function, defined by [23], of a wide-sense stationary random process form a Fourier-transform pair in the variables f and τ. These two equations are known as the Wiener–Khintchine relations applied to stochastic processes (they are also valid, under other hypotheses, for deterministic processes):

Sx(f) = ∫_{−∞}^{+∞} Rx(τ) · exp(−j2πfτ) dτ    [26]

Rx(τ) = ∫_{−∞}^{+∞} Sx(f) · exp(j2πfτ) df    [27]

They practically form the basis of spectral analysis for random processes, and they also show how it is possible, once the autocorrelation (or the power spectral density) is known, to determine the dual quantity exactly.

The type of random process most commonly encountered in the study of communication systems is once again Gaussian. We have in fact seen that Gaussian random variables play a fundamental role in telecommunication systems, mainly for two reasons:

  1. the probability distribution of many physical processes that generate noise in communication systems can be considered approximately Gaussian;
  2. a Gaussian variable is mathematically tractable and therefore easier to process.

Similarly, a Gaussian random process plays an important role in the study of random processes, again for two reasons:

  1. the Gaussian process has many properties that allow us to obtain analytical results;
  2. the random processes obtained from physical phenomena are often such as to make a Gaussian model appropriate.

Without going into mathematical detail, these two fundamental considerations are sufficient to exhaustively deal with noise.

White noise

The noise analysis of communication systems is often based on an ideal noise model called white noise, whose power spectral density is frequency-independent. In particular, all frequency components are present in equal measure. The power spectral density of a white noise W(t) is indicated as

Sw(f) = N0/2    [28]

where the factor 1/2 indicates that half of the power is associated with positive frequencies and half with negative frequencies, as shown in the following figure.

[Figure: power spectral density of white noise, flat at the value N0/2 over all frequencies]

The dimensions of N0 are watts per hertz of bandwidth. N0 is usually measured at the receiver input. Observing the previous figure, we note that the spectral density does not contain a Dirac delta function at the origin, so white noise has no DC power, i.e. its average value is zero. By applying the Wiener–Khintchine relations [26]–[27], we can calculate exactly the autocorrelation function of white noise:

Rw(τ) = (N0/2) · δ(τ)    [29]

Therefore the white noise autocorrelation function is constituted by a Dirac delta function weighted by the factor N0/2 and placed at the instant τ = 0, as shown below.

[Figure: autocorrelation of white noise — a Dirac delta function of weight N0/2 located at τ = 0]

We observe that the autocorrelation [29] is zero when τ ≠ 0. Consequently, two different samples of white noise, no matter how close, are uncorrelated.
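
Both properties — the delta-like autocorrelation [29] and the consequent uncorrelatedness of distinct samples — can be checked numerically. A minimal sketch (N0 = 2 is an arbitrary choice):

import numpy as np

rng = np.random.default_rng(1)
N0 = 2.0
n = 100_000
w = np.sqrt(N0 / 2) * rng.standard_normal(n)   # discrete white noise, variance N0/2

# Sample autocorrelation for the first few lags
for k in range(5):
    R_k = np.mean(w[:n - k] * w[k:])
    print(f"R[{k}] = {R_k:+.4f}")   # R[0] ~ N0/2 = 1.0; R[k] ~ 0 for k != 0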

Performance evaluation criteria of a coherent communications system in the presence of noise

In practice we observe that modulated signals, regardless of type, are disturbed by noise and imperfect channel characteristics during transmission. In general, noise is any unknown signal that disturbs the detection of the desired signal. There can be many sources of noise in a communication system, but often the main sources are the communication devices themselves, or the interference encountered during transmission. Noise can change the desired signal in a number of ways, but the most common is additive distortion. That is, the received signal r(t) is modeled as

r(t) = s(t) + w(t)    [30]

where s(t) is the random signal representing the sample function of the random process associated with the actual information we want to transmit, and w(t) is likewise the sample function associated with the random process that generates the additive noise. Equation [30] represents a transmission channel model known as the Additive White Gaussian Noise (AWGN) channel, in which all the statistical characteristics discussed so far regarding Gaussian variables and random processes are included. Therefore, given that random signals are involved — randomness being an intrinsic characteristic of both the transmitted information and the noise — how is it possible to quantify the performance of a particular communication system? The Signal-to-Noise Ratio (SNR) is a quality measure applicable to both analog and digital systems. For the latter, which we are dealing with in this series of articles, the SNR alone is not sufficient to evaluate performance: due to the binary nature of digital information, it is necessary to measure quality also in terms of the probability of error on the bit. Hence a second evaluation tool is needed, and it is precisely the Bit Error Rate (BER).

Signal-to-Noise-Ratio (SNR)

A receiver is made up of many stages placed in cascade. For bandpass systems, a common stage is a narrowband filter whose bandwidth is wide enough to pass the modulated signal undistorted, but not so wide that excessive noise passes through. Now that we know the characteristics of the noise, we need to determine how it modifies the received signal. To do this, we need to calculate the noise power, and this requires measuring the noise within a specified band. At which point should the measurement be made? Without going into analytical detail, it can be shown that the noise power N at the output of a filter with bandwidth BT (called the noise-equivalent bandwidth) is

N = N0 · BT    [31]

Clearly, the smaller the band BT, the smaller the noise power N will be. Linking this result to equation [30], it seems intuitive that we should choose BT as small as possible to minimize the noise, but it should not be smaller than the bandwidth of the signal s(t), otherwise we would distort the desired signal. The following figure shows just this situation, where the transmitted signal is corrupted by the additive white noise and their combination passes through a filter of bandwidth BT.

[Figure: signal plus additive white noise at the input of a filter of bandwidth BT]

If the filter band is at least as large as that of the signal, we preserve all the energy of the desired signal; if, at the same time, the filter is no wider than required to pass the undistorted signal, we also minimize the amount of outgoing noise. Consequently, the band BT is nothing but the signal transmission band. Indeed, matching the receive filter to the band of the transmitted signal is the basis of many optimal detection schemes. We can therefore represent the signal downstream of the initial filtering as follows:

x(t) = s(t) + n(t)    [32]

where n(t) is narrowband noise, unlike w(t), which is assumed to be white. We have seen that the average value of the noise corresponds to a DC component. In most communication systems, DC components are eliminated at the design stage, since they require power and carry little information. Consequently, it is generally assumed that both the noise and the signal have zero mean. For zero-mean random processes, a simple measure of signal quality is the ratio of the variance [9] of the desired signal to that of the unwanted signal. Based on this, the Signal-to-Noise Ratio (SNR) is formally defined by

SNR = (signal variance) / (noise variance) = E[s²(t)] / E[n²(t)]    [33]

By this point, we have learned that the square of the signal is generally proportional to its power. Consequently, the SNR is often the ratio of the average signal power to the average noise power. Equivalently, it can be considered a ratio between the average energy of the signal per unit time and the average energy of the noise per unit time; the latter interpretation is the more common one for digital communication systems. The SNR can be usefully measured at two points in the receiver:

  • at the input of the first stage of the receiver to evaluate the quality of the transmission channel and of the receiver: in this case we are talking about SNR before demodulation;
  • at the receiver output to evaluate the quality of the information signal recovered; in this case we speak of SNR after demodulation.

In the first case, however, it is necessary to adopt an ideal model of the receiver and, in order to compare different modulation/demodulation schemes, it is also necessary to introduce the idea of a reference transmission model that includes the following two hypotheses:

  1. the signal strength is the same as that of the modulated signal;
  2. the low-pass filter in the base band lets the signal pass and cuts the noise out of the band. Consequently, we can define a reference SNR as follows:

SNRref = P / (N0 · W)    [34]

where P is the average power of the modulated signal and W is the message bandwidth.
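
As a numerical illustration of definition [33] — the sinusoidal signal and the noise level are arbitrary choices — the following sketch measures the SNR of a zero-mean signal corrupted by additive noise:

import numpy as np

rng = np.random.default_rng(2)
n = 200_000
s = np.sin(2 * np.pi * 0.01 * np.arange(n))   # zero-mean desired signal
w = 0.1 * rng.standard_normal(n)              # zero-mean additive noise

snr = s.var() / w.var()                        # ratio of variances, as in [33]
print(f"SNR = {snr:.1f} ({10 * np.log10(snr):.1f} dB)")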

Probability of error and Bit Error Rate (BER)

In digital communications, an error occurs whenever a transmitted bit and the corresponding received bit do not match; this is a random process, as we discussed earlier. If we denote by n the number of errors observed in a sequence of bits of length N, then the relative-frequency definition of the Bit Error Rate (BER), in the spirit of equation [1], is

BER = n / N,  for N sufficiently large    [35]

Of course, the probability of error required of a digital communication system depends on the application. In the case of optical systems, among a group of optical receivers, a receiver is said to be more sensitive if it achieves the same performance with less optical power incident on it. A commonly used criterion for digital optical receivers requires the BER to be below 1 × 10^-9, which corresponds on average to 1 error per billion bits. Now, for digital transmission systems, quality is usually not a linear function of the SNR defined in [33]; however, the equivalent of a reference SNR can always be defined as the ratio between the energy per information bit and the one-sided noise power spectral density:

SNR = Eb / N0    [36]

Equation [36] defines a reference SNR independent of the transmission rate; being a ratio of energies, it is normalized to the bitrate. It thus provides a reference for a fair comparison between different modulation/demodulation schemes. For notational simplicity, we will refer to expression [36] simply as SNR; to be more precise, definition [36] refers to the SNR-per-bit. If we introduce Es as the energy associated with the symbol rather than with the bit, we can define an SNR-per-symbol as the ratio Es/N0. For an uncoded M-ary modulation scheme with k = log2(M) bits per symbol, the signal energy per modulated symbol is given by Es = kEb. So, ultimately:

γs = Es/N0 = k · Eb/N0 = k · γb    [37]

In practical applications, keep in mind that the SNR is usually expressed in dB. This differentiation in definitions, which might appear redundant, is actually very useful as a quality tool for multilevel modulation/demodulation schemes where, as known since Part I of this discussion, it makes more sense to talk about symbols (phase states of the carrier signal) rather than bits. In fact, one should more correctly speak of the probability of error on the symbol, or Symbol Error Rate (SER). If, simplifying, we denote the SER by Ps, the SNR-per-bit [36] by γb, and the SNR-per-symbol [37] by γs, and we recall the expression of the Q-function [12], it can be shown that, for the modulations of our interest (i.e., BPSK, QPSK, M-PSK and M-QAM), the probability of error on the symbol assumes the following expressions:

BPSK:   Ps = Q(√(2γb))    [38]
QPSK:   Ps = 2Q(√(2γb)) − Q²(√(2γb))    [39]
M-PSK:  Ps ≈ 2Q(√(2γs) · sin(π/M))    [40]
M-QAM:  Ps ≈ 4(1 − 1/√M) · Q(√(3γs/(M − 1)))    [41]

Performance simulation of a coherent M-PSK communication system affected by Gaussian noise

This paragraph shows the simulation of a communication system based on the M-PSK modulation scheme. The aim is to apply the tools discussed so far to simulate the quality of a digital telecommunications system, from transmitter to receiver. The simulation is based on four Python modules that I have developed.

The first module is called transceivers.py and implements the logic of a complex-equivalent baseband M-PSK modulator / coherent demodulator; as already known from Part I, in PSK all the information is encoded in the phase of the carrier signal. The M-PSK modulator transmits a sequence of information symbols drawn from the set m ∈ {0, 1, …, M − 1}. Each symbol holds k bits of information (k = log2(M)). The information symbols are then modulated using the M-PSK mapping. The general expression for generating the M-PSK signal set is given by

sm(t) = A · cos(2πfc·t + 2πm/M),   m = 0, 1, …, M − 1    [42]

Here, M refers to the modulation order and defines the number of points in the constellation diagram. The value of M depends on the parameter k, i.e., the number of bits we desire to squeeze into a single M-PSK symbol. As an example, if we wish to squeeze 3 bits (k = 3) into one transmitted symbol, then M = 2^k = 2^3 = 8, which results in the 8-PSK configuration. M = 2 gives the BPSK scheme; M = 4 is referred to as QPSK. The parameter A is the amplitude scaling factor. Using a trigonometric identity, equation [42] can be separated into cosine and sine basis functions as follows:

sm(t) = A · cos(2πm/M) · cos(2πfc·t) − A · sin(2πm/M) · sin(2πfc·t)    [43]

This can be expressed as a combination of in-phase and quadrature phase components on an I-Q plane as

sm = (A/√2) · [cos(2πm/M) + j · sin(2πm/M)]    [44]

The amplitude is normalized by dividing it by the square root of two. The coherent M-PSK IQ detection technique is based on a minimum-Euclidean-distance metric, where a vector simulation model is leveraged. Specifically, the transmitter and receiver agree on the same reference constellation for modulating and demodulating the information. The implemented modulator incorporates the code to generate the reference constellation for the selected modulation type; the same reference constellation must be used in order to achieve coherent detection of the received data vector.

# transceivers.py

import numpy as np
from scipy.spatial.distance import cdist

class Transceiver:

    def __init__(self, M_points, ref_const, transceiver_type):
        self.M_points = M_points
        self.transceiver_type = transceiver_type
        self.ref_const = ref_const

    def modulation(self, in_symbols):
        # Map each integer symbol to its complex reference-constellation point
        mod_signal = self.ref_const[in_symbols]
        return mod_signal

    def demodulation(self, rec_symbols):
        demod_symbols = self.quadrature_detection(rec_symbols)
        return demod_symbols

    def quadrature_detection(self, rec_symbols):
        # Minimum-Euclidean-distance decision on the I-Q plane: each received
        # point is assigned to the closest point of the reference constellation
        x_a = np.column_stack((np.real(rec_symbols), np.imag(rec_symbols)))
        x_b = np.column_stack((np.real(self.ref_const), np.imag(self.ref_const)))
        eucl_dist = cdist(x_a, x_b, metric = 'euclidean')
        demod_symbols = np.argmin(eucl_dist, axis = 1)
        return demod_symbols

class PSKTransceiver(Transceiver):
    def __init__(self, M):
        m = np.arange(0, M)
        inphase_comp = 1 / np.sqrt(2) * np.cos(m / M * 2 * np.pi)
        quadr_comp = 1 / np.sqrt(2) * np.sin(m / M * 2 * np.pi)
        constellation = inphase_comp + 1j * quadr_comp
        Transceiver.__init__(self, M, constellation, transceiver_type = 'PSK')

class QAMTransceiver(Transceiver):
    #Placeholder for QAM implementation#
    pass        

The second module called noisy_chan_models.py implements exactly the AWGN channel model discussed above.

# noisy_chan_models.py

from numpy import sum, sqrt
from numpy.random import standard_normal

def awgn_model(s, SNRps_dB, L = 1):
    # Add complex AWGN to the vector s so that the resulting SNR-per-symbol
    # equals SNRps_dB; L is the oversampling factor (1 for symbol-rate vectors)
    gamma_s = 10 ** (SNRps_dB / 10)        # linear SNR-per-symbol
    P = L * sum(abs(s) ** 2) / len(s)      # measured average signal power
    N0 = P / gamma_s                       # noise spectral density
    n = sqrt(N0 / 2) * (standard_normal(s.shape) + 1j * standard_normal(s.shape))
    r = s + n
    return r        
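
As a quick sanity check — a sketch that assumes the two modules above are on the Python path — we can verify that the noise injected by awgn_model produces exactly the requested SNR-per-symbol:

import numpy as np
from transceivers import PSKTransceiver
from noisy_chan_models import awgn_model

rng = np.random.default_rng(3)
tx = PSKTransceiver(4)                               # QPSK reference constellation
s = tx.modulation(rng.integers(0, 4, 100_000))

target_dB = 10.0
r = awgn_model(s, target_dB)
noise = r - s                                        # isolate the injected noise
measured_dB = 10 * np.log10(np.mean(np.abs(s) ** 2) / np.mean(np.abs(noise) ** 2))
print(f"requested {target_dB} dB, measured {measured_dB:.2f} dB")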

The third module, called prob_error_compute.py, implements the computation of the theoretical probability of error on the symbol, consistent with relations [38]–[40] tabulated above.

# prob_error_compute.py

from numpy import log2, sqrt, sin, pi, exp
from scipy.special import erfc

def ser_error_computation(SNRpb_dBs, mod_scheme=None, M_levels=0):
    func_dict = {'psk': psk_error_computation, 'qam': qam_error_computation}
    # SNR-per-symbol (linear): gamma_s = k * gamma_b, with k = log2(M), see [37]
    gamma_s = log2(M_levels) * (10 ** (SNRpb_dBs / 10))
    return func_dict[mod_scheme.lower()](M_levels, gamma_s)  # call appropriate function

def psk_error_computation(M_levels, gamma_s):
    gamma_b = gamma_s / log2(M_levels)

    if M_levels == 2:
        # BPSK, equation [38]: Ps = Q(sqrt(2*gamma_b)) = 0.5*erfc(sqrt(gamma_b))
        Ps = 0.5 * erfc(sqrt(gamma_b))
    elif M_levels == 4:
        # QPSK, equation [39]
        Q_func = 0.5 * erfc(sqrt(gamma_b))
        Ps = 2 * Q_func - Q_func ** 2
    else:
        # M-PSK, equation [40]: Ps ~ 2Q(sqrt(2*gamma_s)*sin(pi/M)) = erfc(sqrt(gamma_s)*sin(pi/M))
        Ps = erfc(sqrt(gamma_s) * sin(pi / M_levels))

    return Ps

def qam_error_computation(M, gamma_s):
    # Placeholder for QAM implementation#
    pass        

The fourth and last module, called system_performance_eval.py, imports the previous three, then processes and graphically displays the trend of the probability of error per symbol, Ps, versus the SNR (dB) value for different M-PSK formats, as M varies.

# system_performance_eval.py

import numpy as np
import matplotlib.pyplot as plt
from transceivers import PSKTransceiver, QAMTransceiver
from noisy_chan_models import awgn_model
from prob_error_compute import ser_error_computation

N_symbols = 10 ** 6
SNRpb_dBs = np.arange(start = -4, stop = 12, step = 2)
mod_scheme = 'PSK'
M_levels = [2, 4, 8, 16, 32]
transceivers_dict = {'psk': PSKTransceiver, 'qam': QAMTransceiver}
palette = plt.cm.jet(np.linspace(0, 1, len(M_levels)))  # colormap
figure, axis = plt.subplots(nrows=1, ncols=1)

for i, M in enumerate(M_levels):
    bits_ps = np.log2(M)
    SNRps_dBs = 10 * np.log10(bits_ps) + SNRpb_dBs
    sim_Ps = np.zeros(len(SNRpb_dBs))
    in_symbols = np.random.randint(low = 0, high = M, size = N_symbols)
    transceiver = transceivers_dict[mod_scheme.lower()](M)
    modulatedSyms = transceiver.modulation(in_symbols)

    for j, SNRps_dB in enumerate(SNRps_dBs):
        rec_symbols = awgn_model(modulatedSyms, SNRps_dB)
        demod_symbols = transceiver.demodulation(rec_symbols)
        sim_Ps[j] = np.sum(demod_symbols != in_symbols) / N_symbols

    theor_Ps = ser_error_computation(SNRpb_dBs, mod_scheme, M)
    axis.semilogy(SNRpb_dBs, sim_Ps, color=palette[i], marker='o', linestyle='',
                  label='Simulated ' + str(M) + '-' + mod_scheme.upper())
    axis.semilogy(SNRpb_dBs, theor_Ps, color=palette[i], linestyle='-',
                  label='Theoretic ' + str(M) + '-' + mod_scheme.upper())

axis.set_xlabel('SNR(dB)')
axis.set_ylabel('$P_s$')
axis.set_title('Probability of symbol error for M-' + str(mod_scheme) + ' over AWGN channel')
axis.legend()
plt.grid(True)
plt.show()        
[Figure: simulated vs. theoretical symbol-error probability for the 2-, 4-, 8-, 16- and 32-PSK formats over the AWGN channel]

Focus on physical noise mechanisms in coherent optical receivers

In both Part I and Part II, we hinted at the main sources of noise afflicting a digital optical communications system. Now that we have laid out all the statistical tools, it is possible to carry out a complete analysis, albeit not excessively in-depth, of the sources and mechanisms of noise, and of their effects on coherent optical receivers.

Shot noise

Shot noise is a manifestation of the fact that an electric current is actually a stream of electrons generated at random times. When a constant optical signal with a certain power is incident on the photodetector, the resulting current takes the following form:

I(t) = Ip + is(t)    [45]

where Ip is the well-known average photocurrent, directly proportional to the incident power Pin through the responsivity Rd, and is(t) is the random current term that accounts for the fluctuations due to shot noise, as mentioned several times. Statistically, is(t) has a Poisson distribution, but it can be approximately modeled as Gaussian noise, so we can calculate the autocorrelation function once the power spectral density Ss(f) is known, precisely by applying the Wiener–Khintchine relations [26] and [27]:

E[is(t) · is(t + τ)] = ∫ Ss(f) · exp(j2πfτ) df,   with Ss(f) = q · Ip    [46]

The two-sided (i.e., negative frequencies included) power spectral density of shot noise is a constant value given by the product of the electron charge, q, and the photocurrent Ip. By setting τ = 0 in [46], we obtain the variance of the shot noise:

σs² = 2q · Ip · Δf    [47]

where Δf is the effective noise bandwidth that corresponds to the intrinsic photodetector bandwidth if photocurrent fluctuations are measured, and its actual value depends on receiver design. Since the dark current Id also generates shot noise, the total shot noise variance is then given by

σs² = 2q · (Ip + Id) · Δf    [48]

According to definition [9], σs is therefore the RMS value of the noise current induced by shot noise.
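
To get a feel for the orders of magnitude, here is a minimal sketch with purely illustrative values (unit responsivity, 1 mW of incident power, 10 nA of dark current, 10 GHz of noise bandwidth — assumptions, not a specific receiver):

import numpy as np

q = 1.602e-19     # electron charge [C]
R_d = 1.0         # responsivity [A/W] (illustrative)
P_in = 1e-3       # incident optical power [W]
I_d = 10e-9       # dark current [A]
delta_f = 10e9    # effective noise bandwidth [Hz]

I_p = R_d * P_in                             # average photocurrent, as in [45]
sigma_s2 = 2 * q * (I_p + I_d) * delta_f     # total shot-noise variance [48]
print(f"I_p = {I_p * 1e3:.2f} mA, shot-noise RMS = {np.sqrt(sigma_s2) * 1e6:.2f} uA")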

Thermal noise

The front end of a receiver consists of a photodiode followed by a preamplifier, whose task is to amplify the electrical signal transduced by the photodiode for further processing. There are various front-end circuit configurations for optical receivers; the option that provides the best characteristics is the so-called transimpedance configuration, whose equivalent circuit is depicted below:

[Figure: equivalent circuit of the transimpedance front end — photodiode modeled as a current generator Ip, input capacitance Cin, load resistor Rl, amplifier of gain G]

In particular, the photodiode is modeled by a current generator of constant value Ip; Cin is the equivalent capacitance accounting for the capacitive contributions of both the photodiode and the input amplification stage; Rl is the load resistor across which, as the photocurrent flows, the voltage variations of the signal to be processed are taken, suitably amplified by a factor G (the gain of the amplifier). At a finite temperature, the random thermal motion of electrons in a resistor manifests itself as a fluctuating current even in the absence of an applied voltage. It is thus Rl that adds random fluctuations to the current supplied by the photodiode. If we denote by it(t) the current fluctuations due to thermal noise, the resulting current takes the following form:

I(t) = Ip + is(t) + it(t)    [49]

Statistically, it(t) is modeled as a stationary Gaussian random process, as discussed earlier, with two-sided power spectral density that is frequency independent up to 1 THz (nearly a white noise) and is given by

ST(f) = 2kB·T / Rl    [50]

where kB is the Boltzmann constant and T the absolute temperature. Applying once again the Wiener–Khintchine relations [26] and [27] for the calculation of the autocorrelation, we obtain the following formula for the variance of thermal noise:

σT² = (4kB·T / Rl) · Fn · Δf    [51]

where the quantity Fn, called the amplifier noise figure, accounts for the thermal effects of the resistive components present in the subsequent preamplification and amplification stages within the optical receiver circuitry. It can therefore be said that Fn represents the factor by which thermal noise is enhanced.

Since the current fluctuations is(t) and it(t) are statistically independent Gaussian random variables, as previously analyzed on this topic, the total variance of current fluctuations can be obtained simply by adding individual variances. The final result is equal to:

σ² = σs² + σT² = 2q · (Ip + Id) · Δf + (4kB·T / Rl) · Fn · Δf    [52]

Equation [52] is fundamental because it allows us to calculate the SNR of the photocurrent, in order to evaluate and compare the performance of different coherent optical receivers for different modulation/demodulation schemes (and other parameters, as we will see in future publications).
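
Equation [52] translates directly into a small helper function; the default values below (room temperature, a 50 Ω load, a 3 dB noise figure, and the same illustrative photodiode parameters used above) are assumptions for the sake of the example:

import numpy as np

q, k_B = 1.602e-19, 1.381e-23   # electron charge [C], Boltzmann constant [J/K]

def photocurrent_snr(P_in, R_d=1.0, I_d=10e-9, delta_f=10e9,
                     T=300.0, R_L=50.0, F_n=2.0):
    # SNR of the directly detected photocurrent, from the total variance [52]
    I_p = R_d * P_in
    sigma_s2 = 2 * q * (I_p + I_d) * delta_f        # shot-noise variance [48]
    sigma_t2 = (4 * k_B * T / R_L) * F_n * delta_f  # thermal-noise variance [51]
    return I_p ** 2 / (sigma_s2 + sigma_t2)

for P_in in (1e-6, 1e-5, 1e-4):
    print(f"P_in = {P_in * 1e6:7.1f} uW -> "
          f"SNR = {10 * np.log10(photocurrent_snr(P_in)):.1f} dB")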

SNR and BER considerations for heterodyne detectors

The responsivity Rd can be conveniently expressed in terms of another parameter, η, called quantum efficiency, defined as follows:

Rd = ηq / (hν)    [53]

where h is the Planck constant and ν the frequency of the incoming optical signal with power Pin. We also recall the equation of the photocurrent, calculated in Part II for the heterodyne detector, shown below for convenience:

I(t) = Rd · (Pin + Plo) + 2Rd · √(Pin·Plo) · cos(ωIF·t + φ)    [54]

The shot-noise variance [48] can then be rewritten by including the photocurrent [54], simply indicated with I:

σs² = 2q · (I + Id) · Δf,   with I = Rd · (Pin + Plo)    [55]

As already established, in practice Plo >> Pin, and the current I appearing in [55] — like the DC term of [54] — can be replaced by its dominant contribution Rd·Plo. The SNR is obtained by dividing the average signal power by the average noise power. In the heterodyne case, the SNR is the ratio between the mean square of the alternating (intermediate-frequency) component of equation [54] and the total noise variance [52], evaluated under the Plo >> Pin condition:

SNR = 2Rd² · P̄in · Plo / [2q · (Rd·Plo + Id) · Δf + σT²]    [56]

where P̄in denotes the mean value of the incident power. We can immediately underline one of the most important advantages of coherent detection, anticipated several times in the previous parts but never analytically demonstrated: since the local-oscillator power Plo can be controlled at the receiver, it can be made large enough that the receiver noise is dominated by shot noise. Precisely, this occurs when the following inequality is satisfied:

Plo >> σT² / (2q · Rd · Δf)    [57]

Under the same condition, the dark-current effect on the shot noise is also negligible, i.e. Id << Rd·Plo. The SNR can then be approximated as follows:

SNR ≈ Rd · P̄in / (q · Δf) = η · P̄in / (hν · Δf)    [58]

where we applied the definition of quantum efficiency given by [53]. It is convenient to link the SNR expression to the number of photons, Np, received within a single bit interval. If we fix a value B for the bitrate, the average power of the incident optical signal can be expressed as follows:

P̄in = Np · hν · B    [59]

Since the effective noise bandwidth is typically about B/2, the SNR for a heterodyne receiver can finally be simplified as follows:

SNR ≈ 2η · Np    [60]
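
Equations [58]–[60] are easy to put to work; in the sketch below the wavelength, bitrate, received power and quantum efficiency are all arbitrary assumptions chosen only to exercise the formulas:

import numpy as np

h = 6.626e-34               # Planck constant [J s]
nu = 3e8 / 1550e-9          # optical frequency at 1550 nm [Hz]
B = 10e9                    # bitrate [bit/s]
P_in = 1e-6                 # average received power [W]
eta = 0.8                   # quantum efficiency (assumed)

N_p = P_in / (h * nu * B)   # photons per bit, inverting [59]
snr = 2 * eta * N_p         # shot-noise-limited heterodyne SNR [60]
print(f"N_p = {N_p:.0f} photons/bit, SNR = {10 * np.log10(snr):.1f} dB")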

When we introduced the concept of BER through [35], we also introduced the concept of sensitivity for an optical receiver. The logical link is the following: if, for optical communications, an acceptable BER must remain below 1 × 10^-9, the sensitivity is exactly the minimum received average power, Prec, required for the receiver to achieve that BER. During this discussion we also saw how the SER can be used in place of the BER; as the two are related, we continue to use the BER even for a coherent receiver. In order to determine the BER, let's start by considering the following figure:

[Figure: fluctuating received current with Gaussian PDFs around the mean levels IH and IL, sampled at the decision instant td and compared with the threshold ID]

It schematizes the fluctuating current signal presented to the decision circuit, which is driven by the clock-recovery sub-system within the optical receiver and samples the signal at a specific decision instant td. The sampled current I fluctuates from bit to bit around a "high" mean value, IH, and a "low" mean value, IL, mapped to levels 1 and 0 of the incoming bitstream, respectively. The decision block, which includes at least a threshold comparator, compares the sampled value with a threshold value ID, deciding for the information level 1 if I > ID, or 0 if I < ID. The noise superimposed on the information signal is exactly the cause of errors during this recovery phase; indeed, an error occurs if I < ID for bit 1, as well as if I > ID for bit 0. By applying the definitions of joint and conditional probability, [4] and [5], we can calculate the BER as follows:

BER = P[1] · P[0|1] + P[0] · P[1|0]    [61]

where P[1] and P[0] are the probabilities of receiving bits 1 and 0, respectively, P[0|1] is the probability of deciding 0 when 1 is received, and P[1|0] the probability of deciding 1 when the actual value is 0. Since the information is binary, bits 1 and 0 are equiprobable in transmission, therefore P[1] = P[0] = 1/2, and [61] can be rewritten as follows:

BER = ½ · (P[0|1] + P[1|0])    [62]

Observing the previous figure, it is evident that the conditional probabilities are related to the PDF of the sampled current value I. As we have discussed extensively, the shot and thermal noise contributions can be approximated by Gaussian PDFs, so the total current fluctuation, with variance given by [52], is still Gaussian. By associating the variances σH and σL with bits 1 and 0, respectively, and applying the relations [10]–[15] valid for Gaussian random variables, we obtain:

P[0|1] = ½ · erfc[(IH − ID) / (σH·√2)]    [63]
P[1|0] = ½ · erfc[(ID − IL) / (σL·√2)]    [64]

By substituting equations [63] and [64] into [62], the BER is given by

BER = ¼ · {erfc[(IH − ID) / (σH·√2)] + erfc[(ID − IL) / (σL·√2)]}    [65]

Equation [65] is pivotal because it shows the dependence of the BER on the current threshold ID set in the decision block of the receiver. It is then possible to design the decision electronics so as to minimize the BER (that is, to reduce the errored bits relative to the total received, in the sense of the formal definition [35]). The minimum of equation [65] is obtained, as known from calculus, by taking the first derivative and setting it equal to zero; the resulting condition is:

(ID − IL)² / (2σL²) = (IH − ID)² / (2σH²) + ln(σH/σL)    [66]

Since the logarithmic term is negligible in the case of our interest, [66] can be rewritten so as to formally link it to the definition of the Q-function given by [12]. Therefore:

(ID − IL) / σL = (IH − ID) / σH ≡ Q    [67]

This leads to the expression of the current threshold ID:

ID = (σL·IH + σH·IL) / (σL + σH)    [68]

The BER with the optimum setting of the decision threshold is obtained by using equations [65] and [67], and depends only on the parameter Q as follows:

BER = ½ · erfc(Q/√2) ≈ exp(−Q²/2) / (Q·√(2π))    [69]

which is formally analogous to equation [15] (presented as a function of the SER in formulas [38]–[41]), but now obtained by reasoning precisely on the influence of the noise mechanisms on the reconstruction of the correct bit level. The parameter Q appearing in [69] — not to be confused with the Q(x) function of [12] — takes the following form:

Q = (IH − IL) / (σH + σL)    [70]

The approximate form in [69] comes from the asymptotic expansion of the complementary error function and is sufficiently accurate when Q takes values greater than 3. The figure below plots the BER as a function of Q:

[Figure: BER as a function of the Q factor, on a semilogarithmic scale]

The trend is obviously similar to that obtained previously for the Q-function using a Python script. The BER improves as Q increases, becoming lower than 10^-12 for Q > 7. The receiver sensitivity corresponds to the average optical power for which Q ≈ 6, since BER ≈ 10^-9 when Q = 6.
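
The curve is immediate to reproduce from [69]; the short script below tabulates both the exact and the asymptotic forms, confirming the two landmarks just quoted (Q = 6 → BER ≈ 10^-9, Q = 7 → BER < 10^-12):

import numpy as np
from scipy.special import erfc

for Q in range(1, 9):
    exact = 0.5 * erfc(Q / np.sqrt(2))                       # equation [69]
    approx = np.exp(-Q ** 2 / 2) / (Q * np.sqrt(2 * np.pi))  # asymptotic form, Q > 3
    print(f"Q = {Q}: BER = {exact:.2e} (approx. {approx:.2e})")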

Finally, we can relate the BER to the SNR and to the average number of photons, Np, associated with bit "1". In the thermal-noise limit, the variances σH and σL are practically equal; hence, setting IL = 0, equation [70] gives Q = IH/2σH and, in accordance with the definition of SNR provided above, we obtain:

Q = IH / (2σH) = ½ · √SNR   ⇒   SNR = 4Q²    [71]

Since Q = 6 for a BER of 10^-9, as said, it follows that SNR must be at least:

SNR = 4Q² = 144   (≈ 21.6 dB)    [72]

In the shot-noise limit σL is practically zero; in fact, since there is no thermal noise contribution, the effect of the shot noise associated with bit "0" can be neglected as long as the contribution of the photodetector dark-current can also be neglected. Under this condition we have that Q = IH/σH; to always have a BER lower than 10^-9 the following SNR is required:

Q = IH / σH = √SNR   ⇒   SNR = Q² = 36   (≈ 15.6 dB)    [73]

Still under the conditions of predominance of the shot noise, it can be shown that:

SNR ≈ η · Np    [74]

which is substantially the result already obtained in [60], apart from the factor of 2. The parameter Q can therefore be expressed as follows:

Q = √(η · Np)    [75]

By substituting [75] in equation [69] we have:

BER = ½ · erfc(√(η · Np / 2))    [76]

For a receiver with 100% quantum efficiency (η = 1), BER = 1 × 10^-9 when Np = 36. In practical applications (IM-DD detectors included), most optical receivers require Np ~ 1000 in order to achieve a BER of 10^-9, as their performance is severely limited by thermal noise. In the case of coherent receivers, values of Np < 100 have been realized, because shot noise can be made to dominate over thermal noise by increasing the local-oscillator power, as demonstrated above.
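
Equation [76] can be verified in one line — the check below confirms that Np = 36 photons/bit with η = 1 indeed sits at the 10^-9 quantum limit:

import numpy as np
from scipy.special import erfc

eta, N_p = 1.0, 36
print(f"BER = {0.5 * erfc(np.sqrt(eta * N_p / 2)):.1e}")   # ~1e-9, equation [76]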

Synchronous heterodyne detectors

In Part II we determined the following equation of the average output current from the low-pass filter, in the case of PSK modulation scheme:

Id = (Ip/2) · cos φ    [77]

Ip is constant and the phase φ takes the values 0 or π depending on whether a 0 or 1 bit is being transmitted. Id is a Gaussian random variable whose average value is either Ip/2 or −Ip/2, depending on the received bit; furthermore, IL = −IH. Hence, by applying equation [69], for the PSK format we obtain the following BER:

BER = ½ · erfc(√(SNR/2)) = ½ · erfc(√(η · Np))    [78]

Obviously, the BER can also be calculated for multilevel PSK and QAM modulation schemes; the performance analysis for such formats is long and laborious (for the curious, the bibliographic reference is [III], but various sources are also available online), so we report only the final results. The BER for the M-level PSK format is approximately given by the following equation [III]:

BER ≈ (1/k) · erfc(√γs · sin(π/M)),   γs = k · γb    [79]

with k = log2(M) bits per symbol, as known. In the QPSK case (M = 4), this result is identical to the expression [78] obtained for the binary PSK format. The case of the M-level QAM format has also been analyzed and leads to the following BER expression [III]:

BER ≈ (2/k) · (1 − 1/√M) · erfc(√(3γs / (2(M − 1))))    [80]

Logically, the forms obtained in [79] and [80] are analogous to those already presented in formulas [40] and [41], apart from the naming conventions and the fact that the latter expressed the SER directly through the Q-function, whereas here the BER is related to the complementary error function; by now, though, we have shown how all these parameters are analytically and conceptually connected to each other. The following figure shows the BER as a function of SNR for the BPSK, QPSK/4-PSK, and 16-QAM formats:

[Figure: BER versus SNR for the BPSK, QPSK/4-PSK, and 16-QAM formats]

Observe how the trend is consistent with the one simulated previously using the Python scripts.
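
The figure can be reproduced in a few lines from [78]–[80]; a minimal sketch (the SNR-per-bit range is an arbitrary choice):

import numpy as np
import matplotlib.pyplot as plt
from scipy.special import erfc

snr_db = np.arange(0, 16, 0.5)
gamma_b = 10 ** (snr_db / 10)                 # SNR-per-bit, linear

ber_bpsk = 0.5 * erfc(np.sqrt(gamma_b))       # [78]; per bit, QPSK is identical
k, M = 4, 16                                  # 16-QAM
gamma_s = k * gamma_b
ber_qam = (2 / k) * (1 - 1 / np.sqrt(M)) * erfc(np.sqrt(3 * gamma_s / (2 * (M - 1))))  # [80]

plt.semilogy(snr_db, ber_bpsk, label='BPSK / QPSK')
plt.semilogy(snr_db, ber_qam, label='16-QAM')
plt.xlabel('SNR per bit (dB)')
plt.ylabel('BER')
plt.grid(True)
plt.legend()
plt.show()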

Asynchronous heterodyne detectors

The BER analysis for asynchronous receivers is more complicated because the noise does not remain Gaussian when an envelope detector is used, as explained in Part II. In particular (to learn more, refer to [IV] and [V]), the computation of the BER involves modified Bessel functions of the first kind, Rice distributions, and the Marcum Q-function. Recalling that, for the asynchronous heterodyne receiver, the use of differential modulation allows us to adopt a delay-detection scheme in the electrical (microwave) domain, we show below only the final result for the BER of the DPSK scheme:

BER = ½ · exp(−γb)    [81]

The required SNR for a BER of 10^-9 is around 13 dB.

Self-coherent detectors

In Part II we saw that one or more MZ interferometers with a one-symbol delay are used within a self-coherent detector. It can be shown that, for the DPSK scheme, the BER is the same as that shown in equation [81]. The calculation is much more involved in the case of the DQPSK format. Although the analysis in [III] is for a heterodyne receiver with the delay implemented in the microwave domain, the results apply equally well to the case of optical delay demodulation. In particular, when Gray coding is implemented, the BER is given by

BER = Q1(a, b) − ½ · I0(ab) · exp[−(a² + b²)/2]    [82]

where Q1(a,b) is the above-mentioned Marcum Q-function defined as follows:

Q1(a, b) = ∫_b^{∞} t · exp[−(t² + a²)/2] · I0(at) dt    [83]

where I0 is the modified Bessel function of the first kind and order zero. The quantities a and b take the following expressions instead:

a = √(γb · (2 − √2)),   b = √(γb · (2 + √2))    [84]

The following plot shows the BER curves for the DPSK and DQPSK formats:

[Figure: BER curves for the DPSK and DQPSK formats; a dotted curve shows, for comparison, heterodyne detection of QPSK]

The dotted curve shows, for comparison purposes, the BER trend when a heterodyne receiver is used to detect the QPSK signal. When DPSK is employed in place of BPSK, the receiver sensitivity degrades by less than 0.5 dB; taking into account this small penalty, DPSK is sometimes used in place of BPSK because it simplifies the receiver design considerably. A penalty of around 2.4 dB occurs, instead, if the DQPSK modulation scheme is implemented in place of the QPSK format. Frankly speaking, the BER expression for DQPSK given by equation [82] is quite complicated, so that, especially for practical purposes, the following simpler approximate form can be used:

[Equation [85]: simplified approximate closed form for the DQPSK BER, reported in ref. [III]]

Consider that [85] is accurate to within 1% for BER values smaller than 3 × 10^-2.
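
SciPy does not expose the Marcum Q-function directly, but Q1(a, b) coincides with the survival function of a noncentral chi-square distribution with 2 degrees of freedom and noncentrality a², evaluated at b². The sketch below uses this identity to evaluate [81], [82] and [84] numerically, under the assumption γb = ηNp used above:

import numpy as np
from scipy.stats import ncx2
from scipy.special import i0e

def marcum_q1(a, b):
    # Q1(a, b) = survival function of a noncentral chi-square distribution
    # with 2 degrees of freedom and noncentrality a**2, evaluated at b**2
    return ncx2.sf(b ** 2, 2, a ** 2)

def ber_dqpsk(gamma_b):
    a = np.sqrt(gamma_b * (2 - np.sqrt(2)))   # equation [84]
    b = np.sqrt(gamma_b * (2 + np.sqrt(2)))
    # 0.5*I0(ab)*exp(-(a^2+b^2)/2), via the scaled Bessel i0e to avoid overflow
    bessel_term = 0.5 * i0e(a * b) * np.exp(-0.5 * (b - a) ** 2)
    return marcum_q1(a, b) - bessel_term      # equation [82]

gamma_b = 10 ** (13.0 / 10)                   # ~13 dB SNR-per-bit
print(f"DPSK  [81]: BER = {0.5 * np.exp(-gamma_b):.1e}")   # ~1e-9
print(f"DQPSK [82]: BER = {ber_dqpsk(gamma_b):.1e}")
print(f"DQPSK [82] at 14.9 dB: BER = {ber_dqpsk(10 ** 1.49):.1e}")  # ~1e-9: the ~2.4 dB penalty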

References

[I] - G. P. Agrawal, Fiber-Optic Communication Systems, 5th ed., John Wiley & Sons, 2021.

[II] - S. Haykin and M. Moher, Introduction to Analog and Digital Communications, 2nd ed., John Wiley & Sons, 2007.

[III] - J. G. Proakis and M. Salehi, Digital Communications, 5th ed., McGraw Hill, 2008.

[IV] - J. W. Goodman, Statistical Optics, Wiley, 1985.

[V] - S. O. Rice, Bell Syst. Tech. J. 23, 282 (1944); 24, 96 (1945).
