From Baby to Teenage Kyber

From Baby to Teenage Kyber

Quantum super computers will be soon available. These computers will be able to crack password hashes (Grover's algorithm) and break the RSA encryption (Shor's algorithm).

A couple of months ago, IBM announced a major advancement in quantum hardware and software. At the moment they are able to provide 400-qubit systems. These quantum processors do not have enough power to crack password hashes or factor large numbers such are the ones used in RSA, yet! But, they will, soon!

At this stage, embracing the post-quantum cryptography is crucial. In this short period the world shall adapt and upgrade the systems in use. The cryptographers and scientist believe that the proposed algorithms are quantum attack resistant. Recently, NIST has accepted and announced the first Quantum-Resistant Cryptographic algorithms.

One of the algorithms is Kyber which is a general Encryption and Key Exchange algorithm. Some companies and cloud platforms like Amazon Web Services support Kyber already in some of their systems.

How Kyber works?

Kyber relies on Lattice Based Cryptography, Module Learning With Errors - MLWE to be more specific. The theory behind MLWE is not trivial and reading directly all of the accompanying specifications and scientific papers requires deeper understanding in mathematics. At the same time, it is nice if we can understand how algorithm works without going into too much details.

Can we start with baby steps?

Yes we can. Here is a very nice, short and concise explanation how Kyber works. Enter the Baby Kyber!

In essence Kyber works with multiplication and addition of polynomials that belong to Polynomial Ring expressed as R_q = Z_q[X] / (X^n+1). Each polynomial in this ring is represented as:

a[0] + a[1]*X + a[2]*X^2 + a[3]*X^3 + . . . + a[n-1]*X^(n-1)        

The maximum degree of these polynomials is n-1. Each of the coefficients a[i] is a number in the interval [0, q-1].

Baby Kyber explains the algorithm using:

q = 17

n = 4

Simply speaking, the encryption part of the algorithm:

  • Takes the message m and converts it into polynomial
  • Multiplies and add polynomials (operations with the public key A, t)
  • Adds the message m to one of the resulting polynomials (u and v)

The decryption part of the algorithm:

  • Multiplies and subtract polynomials (operations with the secret key s)
  • The resulting polynomial represents the message m because the remaining error is not big enough to convert 0s to 1s or 1s to 0s.

Key Generation

The first step is secret key generation. The secret key is a vector of k (in our case k=2) R_q polynomials. This is the secret key vector s.

Next, we shall generate k x k matrix of R_q polynomials. This is the matrix A.

Next, generate the vector e of k R_q polynomials and calculate:

t = A * s + e

Here * represents matrix multiplication operator.

The resulting vector t is a vector of k R_q polynomials.

Finally we have the secret key s and the public key (A, t).

Encryption

Generate at random:

  • r as vector of k R_q polynomials
  • e_1 as vector of k R_q polynomials
  • e_2 as R_q polynomial

The message m is converted first into polynomial with binary coefficients and each coefficient is multiplied with floor(q/2).

Calculate u and v as:

u = transpose(A) * r + e_1

v = transpose(t) * r + e_2 + m

The cipher text is the pair (u,v).

Decryption

Calculate:

m_1 = v - transpose(s) * u

Let's see what we get with the operation above if we substitute u and v:

m_1 =

= transpose(t) * r + e_2 + m - transpose(s) * (transpose(A) * r + e_1) =

= transpose(t) * r + e_2 + m - transpose(s) * transpose(A) * r - transpose(s) * e_1 =

= transpose(t) * r + e_2 + m - transpose(A * s) * r - transpose(s) * e_1 =

= transpose(t) * r + e_2 + m - transpose(t - e) * r - transpose(s) * e_1 =

= transpose(t) * r + e_2 + m - transpose(t ) * r + transpose(e) * r - transpose(s) * e_1 =

= e_2 + transpose(e) * r - tranpose(s) * e_1 + m =

= error + m

The error is not big enough and it does not interfere with the final rounding process which is defined the following way for each decrypted coefficient:

  • If the coefficient is closer to q/2 instead of 0 or q then output 1
  • Otherwise output 0

Enough theory. Let's us check with Python code the numbers and polynomials presented in the Baby Kyber article. At least this could be a nice numpy exercise.

import numpy as np

def matrix_polymul(X, Y, f, q):
    xShape = X.shape
    yShape = Y.shape

    if len(xShape) == 2:
        X = np.expand_dims(X, axis=0)
        xShape = X.shape

    if len(yShape) == 2:
        Y = np.expand_dims(Y, axis=1)
        yShape = Y.shape

    assert (
        xShape[1] == yShape[0]
    ), f"Multiplication: Invalid matrix dimensions: X[{xShape}], Y[{yShape}]"

    result = []
    row = None
    s = None

    for i in range(xShape[0]):
        for j in range(yShape[1]):
            if j == 0:
                row = []
                result.append(row)
                s = None
            for k in range(xShape[1]):
                el = np.polymul(X[i][k], Y[k][j])
                el = np.polydiv(el, f)[1]
                el = np.remainder(el, q)
                if k == 0:
                    s = el
                else:
                    s = np.polyadd(s, el)
                    s = np.remainder(s, q)
                if k == xShape[1] - 1:
                    row.append(s)

    return np.squeeze(np.array(result)).astype(int)

def matrix_polyadd(X, Y, q):
    xShape = X.shape
    yShape = Y.shape

    while len(xShape) < 3:
        X = np.expand_dims(X, axis=0)
        xShape = X.shape

    while len(yShape) < 3:
        Y = np.expand_dims(Y, axis=0)
        yShape = Y.shape

    assert (
        xShape == yShape
    ), f"Addition: Invalid matrix dimensions: X[{xShape}], Y[{yShape}]"

    result = []
    row = None

    for i in range(xShape[0]):
        for j in range(xShape[1]):
            if j == 0:
                row = []
                result.append(row)
            el = np.polyadd(X[i][j], Y[i][j])
            el = np.remainder(el, q)
            row.append(el)

    return np.squeeze(np.array(result)).astype(int)

q = 17

f = np.array([1, 0, 0, 0, 1])

s = np.array([[-1, -1, 1, 0], [-1, 0, -1, 0]], dtype=int)

A = np.array(
    [[[6, 16, 16, 11], [9, 4, 6, 3]], [[5, 3, 10, 1], [6, 1, 9, 15]]], dtype=int
)

e = np.array([[0, 1, 0, 0], [0, 1, -1, 0]], dtype=int)

t = matrix_polyadd(matrix_polymul(A, s, f, q), e, q)

print(f"t:\n{t}")

m = np.array([1, 0, 1, 1])

m = m * q // 2

print(f"m:\n{m}")

r = np.array([[-1, 1, 0, 0], [1, 1, 0, -1]], dtype=int)

e1 = np.array([[0, 1, 1, 0], [0, 1, 0, 0]], dtype=int)

e2 = np.array([-1, -1, 0, 0])

u = matrix_polyadd(matrix_polymul(np.transpose(A, axes=(1, 0, 2)), r, f, q), e1, q)

v = matrix_polyadd(matrix_polyadd(matrix_polymul(t, r, f, q), e2, q), m, q)

print(f"u:\n{u}")

print(f"v:\n{v}")

m1 = matrix_polyadd(v, matrix_polymul(s, u, f, q) * (-1), q)

print(f"Decrypted message before rounding:\n{m1}")

m_decrypted = [1 if abs(x - q // 2) < min(x, q - 1 - x) else 0 for x in m1]

print(f"Decrypted message:\n{m_decrypted}")
        

The essence of the code are the functions matrix_polymul and matrix_polyadd. Note that, in order to have generic functions that multiply or add matrix with matrix or matrix with vector, at the beginning of the functions we expand dimensions if needed and at end we remove dimensions if needed (squeeze). This has nothing with the essence of the implementation, it is only convenience to reduce the number of the functions defined.

The most interesting part in the code is the multiplication of polynomials in R_q:

el = np.polymul(X[i][k], Y[k][j])
el = np.polydiv(el, f)[1]
el = np.remainder(el, q)        

Note: The numpy polynomial operations used in the code require representation where the coefficient for the largest degree comes first.

The execution output is:

t:
[[16 15? 0? 7]
?[10 12 11? 6]]
m:
[8 0 8 8]
u:
[[11 11 10? 3]
?[ 4? 4 13 11]]
v:
[ 7? 6? 8 15]
Decrypted message before rounding:
[ 7 14? 7? 5]
Decrypted message:
[1, 0, 1, 1]:        

The numbers (except [8,0,8,8] instead of [9,0,9,9]) match the numbers and polynomials presented in the Baby Kyber article.

So far, so good! Let's move slowly to the real Kyber specification and implementation. The specification along with the reference implementation in C can be downloaded here.

In the Kyber specification we have:

q = 3329

n = 256

Note that:

q = 13 * n + 1

In order to sample the coefficients for s, e, r, e_1 and e_2 we shall define what Centered Binomial Distribution - CBD is. For some parameter eta, it is defined as:

(a_1, a_2, a_3, ... , a_eta, b_1, b_2, b_3, ... , b_eta) <- {0,1} ^ (2*eta)

output = Sum(a_i - b_i)

where the Sum is from 1 to eta.

For example, for eta = 2 we get random number among the values [-2, -1, 0, 1, 2]. The probabilities of the numbers are not equal, i.e. Pr(0) > Pr (1) > Pr (2).

Here is a code in Python that generates an array of n coefficients where each coefficient comes from CBD(eta).

def _cbd(n, eta):
    i = 0
    while i < eta:
        p1 = np.random.randint(0, 2, n)
        if i == 0:
            p = p1
        else:
            p = p + p1
        i += 1
    return p

# centered binomial distribution
def cbd(n, eta):
    a = _cbd(n, eta)
    b = _cbd(n, eta)
    return a - b        

In Kyber512 (k=2):

  • eta_1 = 3 is used for generating polynomials s, e and r
  • eta_2 = 2 is used for generating polynomials e_1 and e_2

Next, Kyber uses Compress and Decompress functions. Compression discards the least significant bits in the polynomial coefficients (not that important bits). Decompress transforms the coefficients back into value similar, but not the same to the original coefficients. The operations are defined as:

Compress(x, q, d) = round(x * 2 ^ d / q) mod 2 ^ d

Decompress(x, q, d) = round(x * q / 2 ^ d) mod q

In Python:

def compress(x, q, d):
    q1 = 2**d
    x = np.round(q1 / q * x).astype(int)
    x = np.remainder(x, q1)
    return x

def decompress(x, q, d):
    q1 = 2**d
    x = np.round(q / q1 * x).astype(int)
    x = np.remainder(x, q)
    return x        

In Kyber512 (k=2):

  • d_u = 10 is used for compression/decompression of u
  • d_v = 4 is used for compression/decompression of v

So let's us see how the Teenage Kyber looks like.

import numpy as np

def generate_poly(n, q):
    p = np.random.randint(0, q, n)
    return p

def generate_poly_matrix(n, q, k):
    result = []
    row = None
    for i in range(k):
        for j in range(k):
            if j == 0:
                row = []
                result.append(row)
            row.append(generate_poly(n, q))
    return np.array(result, dtype=int)

def _cbd(n, eta):
    i = 0
    while i < eta:
        p1 = np.random.randint(0, 2, n)
        if i == 0:
            p = p1
        else:
            p = p + p1
        i += 1
    return p

# centered binomial distribution
def cbd(n, eta):
    a = _cbd(n, eta)
    b = _cbd(n, eta)
    return a - b

def cbd_vector(n, eta, k):
    result = []

    for i in range(k):
        result.append(cbd(n, eta))

    return np.squeeze(np.array(result, dtype=int))

def matrix_polymul(X, Y, f, q):
    xShape = X.shape
    yShape = Y.shape

    if len(xShape) == 2:
        X = np.expand_dims(X, axis=0)
        xShape = X.shape

    if len(yShape) == 2:
        Y = np.expand_dims(Y, axis=1)
        yShape = Y.shape

    assert (
        xShape[1] == yShape[0]
    ), f"Multiplication: Invalid matrix dimensions: X[{xShape}], Y[{yShape}]"

    result = []
    row = None
    s = None

    for i in range(xShape[0]):
        for j in range(yShape[1]):
            if j == 0:
                row = []
                result.append(row)
                s = None
            for k in range(xShape[1]):
                el = np.polymul(X[i][k], Y[k][j])
                el = np.polydiv(el, f)[1]
                el = np.remainder(el, q)
                if k == 0:
                    s = el
                else:
                    s = np.polyadd(s, el)
                    s = np.remainder(s, q)
                if k == xShape[1] - 1:
                    row.append(s)

    return np.squeeze(np.array(result)).astype(int)

def matrix_polyadd(X, Y, q):
    xShape = X.shape
    yShape = Y.shape

    while len(xShape) < 3:
        X = np.expand_dims(X, axis=0)
        xShape = X.shape

    while len(yShape) < 3:
        Y = np.expand_dims(Y, axis=0)
        yShape = Y.shape

    assert (
        xShape == yShape
    ), f"Addition: Invalid matrix dimensions: X[{xShape}], Y[{yShape}]"

    result = []
    row = None

    for i in range(xShape[0]):
        for j in range(xShape[1]):
            if j == 0:
                row = []
                result.append(row)
            el = np.polyadd(X[i][j], Y[i][j])
            el = np.remainder(el, q)
            row.append(el)

    return np.squeeze(np.array(result)).astype(int)

def compress(x, q, d):
    q1 = 2**d
    x = np.round(q1 / q * x).astype(int)
    x = np.remainder(x, q1)
    return x

def decompress(x, q, d):
    q1 = 2**d
    x = np.round(q / q1 * x).astype(int)
    x = np.remainder(x, q)
    return x

n = 256
q = 3329
k = 2

eta1 = 3
eta2 = 2

du = 10
dv = 4

f = np.zeros(n + 1)
f[0] = 1
f[n] = 1

s = cbd_vector(n, eta1, k)

print(f"s\n{s}")

A = generate_poly_matrix(n, q, k)

e = cbd_vector(n, eta1, k)

print(f"e:\n{e}")

t = matrix_polyadd(matrix_polymul(A, s, f, q), e, q)

print(f"t:\n{t}")

#Encryption

m = np.array(generate_poly(n, 2))

m_scaled = decompress(m, q, 1)

print(f"m_scaled:\n{m_scaled}")

r = cbd_vector(n, eta1, k)

e1 = cbd_vector(n, eta2, k)

e2 = cbd_vector(n, eta2, 1)

u = matrix_polyadd(matrix_polymul(np.transpose(A, axes=(1, 0, 2)), r, f, q), e1, q)

v = matrix_polyadd(matrix_polyadd(matrix_polymul(t, r, f, q), e2, q), m_scaled, q)

print(f"u:\n{u}")

print(f"v:\n{v}")

u = compress(u, q, du)

v = compress(v, q, dv)

print(f"u:\n{u}")

print(f"v:\n{v}")

# Decryption

u = decompress(u, q, du)

v = decompress(v, q, dv)

m1 = matrix_polyadd(v, matrix_polymul(s, u, f, q) * (-1), q)

print(f"Decrypted message before rounding:\n{m1}")

m_decrypted = compress(m1, q, 1)

print(f"Original message:\n{m}")

print(f"Decrypted message:\n{m_decrypted}")

print(np.array_equal(m, m_decrypted))
        

The above code generates polynomials in line with the CBD rules, relies on compress and decompress operations and generalises for Kyber512 (k=2), Kyber768 (k=3) and Kyber1024 (k=4) (the specification document contains the concrete k, eta_1, eta_2, d_u and d_v parameters for each Kyber version).

Is the Teenage Kyber above production ready? No, it is far from production ready code.

Consider the main ingredient in the algorithm:

f = np.zeros(n + 1)
f[0] = 1
f[n] = 1

el = np.polymul(X[i][k], Y[k][j])
el = np.polydiv(el, f)[1]
el = np.remainder(el, q)        

In order to multiply two polynomials we have to multiply each coefficient from the first polynomial with each coefficient from the second polynomial. For polynomials of order n, the number of multiplications is n * n = n ^ 2, i.e. the complexity is O(n ^ 2). Also we calculate remainders modulo q = 3329. Adding, multiplying and shifting numbers are efficient and cheap processor operations. Division and modulo operations with arbitrary numbers are expensive processor operations.

Can we do better?

Yes we can!

The Number-Theoretic Transform (NTT) comes to the rescue. With the help of NTT and it's inverse we can multiply two polynomials with complexity O(n*log(n)). Note that for n = 256 (Kyber's polynomials length), NTT will speed up the whole process 32 times. NTT is such crucial algorithm today. It is used in many blockchain algorithms as well.

Efficient modulo operations can be achieved using Barrett and Montgomery Reduction. Now we talk!

NTT, Barrett and Montgomery Reduction will be the topic of the next article in this two part series.

Marjan

great read ??

Aleksandar Polizovski

Software Engineer == Akta Manniskor

1 年

Hi Marjan, thank you for explaining the algorithms with the nice examples and your digested comments. Very insightful information. Thank you for sharing it.

Dimitar Bogatinov

Head of Cyber Security & Digital Forensics Department | Associate Professor at Military Academy 'General Mihailo Apostolski' - Skopje

1 年

Najverojatno poodamna se dostapni za government.

要查看或添加评论,请登录

Marjan Sterjev的更多文章

  • Secure Multi-Party Computation: Idea and Example

    Secure Multi-Party Computation: Idea and Example

    We are seeing massive AI adoption these days. At least everyone is talking about it.

    1 条评论
  • Theseus TryHackMe Room Writeup

    Theseus TryHackMe Room Writeup

    Theseus is an old room, almost 5 years old. Its difficulty has been specified as Insane.

  • Breaking weak Captcha: Tesseract OCR

    Breaking weak Captcha: Tesseract OCR

    Capture Returns TryHackMe room has been recently added. Sometimes CTF scenarios could be artificial.

  • Behaviour Analysis: Outlier Sequence Detection

    Behaviour Analysis: Outlier Sequence Detection

    Each production system deals with a lot of signals and sequence of events. Some of the sequences are unusual and…

    6 条评论
  • RSA Optimizations: Think Twice

    RSA Optimizations: Think Twice

    I will start here straight ahead with the summary: Resist your temptations and do not implement your customised and…

    2 条评论
  • RSA & Chinese Remainder Theorem MEGA Exploits

    RSA & Chinese Remainder Theorem MEGA Exploits

    There is a lot of data we are dealing with today. Most of us have multiple digital devices.

    4 条评论
  • A word or two about the parallelisation

    A word or two about the parallelisation

    This article will be a short one. It talks about parallelisation and efficiency, so it shall waste as less as possible…

  • Covert Tcp - Scapy Version

    Covert Tcp - Scapy Version

    Covert Tcp is one of the tools for covert communication mentioned in the white hacking courses. Instead of sending…

  • Network Traffic Anomaly Detection

    Network Traffic Anomaly Detection

    It is well known today that anomaly detection helps us spot interesting data, events, malicious behaviour, potential…

  • Kyber: NTT and Efficient Polynomial Multiplication

    Kyber: NTT and Efficient Polynomial Multiplication

    In my previous article From Baby to Teenage Kyber I've explained how Kyber encryption/decryption works. As it was…

    4 条评论

社区洞察

其他会员也浏览了