From Baby to Teenage Kyber
Marjan Sterjev
IT Engineer | CISSP | CCSP | CEH (Master): research | learn | do | MENTOR
Quantum super computers will be soon available. These computers will be able to crack password hashes (Grover's algorithm) and break the RSA encryption (Shor's algorithm).
A couple of months ago, IBM announced a major advancement in quantum hardware and software. At the moment they are able to provide 400-qubit systems. These quantum processors do not have enough power to crack password hashes or factor large numbers such are the ones used in RSA, yet! But, they will, soon!
At this stage, embracing the post-quantum cryptography is crucial. In this short period the world shall adapt and upgrade the systems in use. The cryptographers and scientist believe that the proposed algorithms are quantum attack resistant. Recently, NIST has accepted and announced the first Quantum-Resistant Cryptographic algorithms.
One of the algorithms is Kyber which is a general Encryption and Key Exchange algorithm. Some companies and cloud platforms like Amazon Web Services support Kyber already in some of their systems.
How Kyber works?
Kyber relies on Lattice Based Cryptography, Module Learning With Errors - MLWE to be more specific. The theory behind MLWE is not trivial and reading directly all of the accompanying specifications and scientific papers requires deeper understanding in mathematics. At the same time, it is nice if we can understand how algorithm works without going into too much details.
Can we start with baby steps?
Yes we can. Here is a very nice, short and concise explanation how Kyber works. Enter the Baby Kyber!
In essence Kyber works with multiplication and addition of polynomials that belong to Polynomial Ring expressed as R_q = Z_q[X] / (X^n+1). Each polynomial in this ring is represented as:
a[0] + a[1]*X + a[2]*X^2 + a[3]*X^3 + . . . + a[n-1]*X^(n-1)
The maximum degree of these polynomials is n-1. Each of the coefficients a[i] is a number in the interval [0, q-1].
Baby Kyber explains the algorithm using:
q = 17
n = 4
Simply speaking, the encryption part of the algorithm:
The decryption part of the algorithm:
Key Generation
The first step is secret key generation. The secret key is a vector of k (in our case k=2) R_q polynomials. This is the secret key vector s.
Next, we shall generate k x k matrix of R_q polynomials. This is the matrix A.
Next, generate the vector e of k R_q polynomials and calculate:
t = A * s + e
Here * represents matrix multiplication operator.
The resulting vector t is a vector of k R_q polynomials.
Finally we have the secret key s and the public key (A, t).
Encryption
Generate at random:
The message m is converted first into polynomial with binary coefficients and each coefficient is multiplied with floor(q/2).
Calculate u and v as:
u = transpose(A) * r + e_1
v = transpose(t) * r + e_2 + m
The cipher text is the pair (u,v).
Decryption
Calculate:
m_1 = v - transpose(s) * u
Let's see what we get with the operation above if we substitute u and v:
m_1 =
= transpose(t) * r + e_2 + m - transpose(s) * (transpose(A) * r + e_1) =
= transpose(t) * r + e_2 + m - transpose(s) * transpose(A) * r - transpose(s) * e_1 =
= transpose(t) * r + e_2 + m - transpose(A * s) * r - transpose(s) * e_1 =
= transpose(t) * r + e_2 + m - transpose(t - e) * r - transpose(s) * e_1 =
= transpose(t) * r + e_2 + m - transpose(t ) * r + transpose(e) * r - transpose(s) * e_1 =
= e_2 + transpose(e) * r - tranpose(s) * e_1 + m =
= error + m
领英推荐
The error is not big enough and it does not interfere with the final rounding process which is defined the following way for each decrypted coefficient:
Enough theory. Let's us check with Python code the numbers and polynomials presented in the Baby Kyber article. At least this could be a nice numpy exercise.
import numpy as np
def matrix_polymul(X, Y, f, q):
xShape = X.shape
yShape = Y.shape
if len(xShape) == 2:
X = np.expand_dims(X, axis=0)
xShape = X.shape
if len(yShape) == 2:
Y = np.expand_dims(Y, axis=1)
yShape = Y.shape
assert (
xShape[1] == yShape[0]
), f"Multiplication: Invalid matrix dimensions: X[{xShape}], Y[{yShape}]"
result = []
row = None
s = None
for i in range(xShape[0]):
for j in range(yShape[1]):
if j == 0:
row = []
result.append(row)
s = None
for k in range(xShape[1]):
el = np.polymul(X[i][k], Y[k][j])
el = np.polydiv(el, f)[1]
el = np.remainder(el, q)
if k == 0:
s = el
else:
s = np.polyadd(s, el)
s = np.remainder(s, q)
if k == xShape[1] - 1:
row.append(s)
return np.squeeze(np.array(result)).astype(int)
def matrix_polyadd(X, Y, q):
xShape = X.shape
yShape = Y.shape
while len(xShape) < 3:
X = np.expand_dims(X, axis=0)
xShape = X.shape
while len(yShape) < 3:
Y = np.expand_dims(Y, axis=0)
yShape = Y.shape
assert (
xShape == yShape
), f"Addition: Invalid matrix dimensions: X[{xShape}], Y[{yShape}]"
result = []
row = None
for i in range(xShape[0]):
for j in range(xShape[1]):
if j == 0:
row = []
result.append(row)
el = np.polyadd(X[i][j], Y[i][j])
el = np.remainder(el, q)
row.append(el)
return np.squeeze(np.array(result)).astype(int)
q = 17
f = np.array([1, 0, 0, 0, 1])
s = np.array([[-1, -1, 1, 0], [-1, 0, -1, 0]], dtype=int)
A = np.array(
[[[6, 16, 16, 11], [9, 4, 6, 3]], [[5, 3, 10, 1], [6, 1, 9, 15]]], dtype=int
)
e = np.array([[0, 1, 0, 0], [0, 1, -1, 0]], dtype=int)
t = matrix_polyadd(matrix_polymul(A, s, f, q), e, q)
print(f"t:\n{t}")
m = np.array([1, 0, 1, 1])
m = m * q // 2
print(f"m:\n{m}")
r = np.array([[-1, 1, 0, 0], [1, 1, 0, -1]], dtype=int)
e1 = np.array([[0, 1, 1, 0], [0, 1, 0, 0]], dtype=int)
e2 = np.array([-1, -1, 0, 0])
u = matrix_polyadd(matrix_polymul(np.transpose(A, axes=(1, 0, 2)), r, f, q), e1, q)
v = matrix_polyadd(matrix_polyadd(matrix_polymul(t, r, f, q), e2, q), m, q)
print(f"u:\n{u}")
print(f"v:\n{v}")
m1 = matrix_polyadd(v, matrix_polymul(s, u, f, q) * (-1), q)
print(f"Decrypted message before rounding:\n{m1}")
m_decrypted = [1 if abs(x - q // 2) < min(x, q - 1 - x) else 0 for x in m1]
print(f"Decrypted message:\n{m_decrypted}")
The essence of the code are the functions matrix_polymul and matrix_polyadd. Note that, in order to have generic functions that multiply or add matrix with matrix or matrix with vector, at the beginning of the functions we expand dimensions if needed and at end we remove dimensions if needed (squeeze). This has nothing with the essence of the implementation, it is only convenience to reduce the number of the functions defined.
The most interesting part in the code is the multiplication of polynomials in R_q:
el = np.polymul(X[i][k], Y[k][j])
el = np.polydiv(el, f)[1]
el = np.remainder(el, q)
Note: The numpy polynomial operations used in the code require representation where the coefficient for the largest degree comes first.
The execution output is:
t:
[[16 15? 0? 7]
?[10 12 11? 6]]
m:
[8 0 8 8]
u:
[[11 11 10? 3]
?[ 4? 4 13 11]]
v:
[ 7? 6? 8 15]
Decrypted message before rounding:
[ 7 14? 7? 5]
Decrypted message:
[1, 0, 1, 1]:
The numbers (except [8,0,8,8] instead of [9,0,9,9]) match the numbers and polynomials presented in the Baby Kyber article.
So far, so good! Let's move slowly to the real Kyber specification and implementation. The specification along with the reference implementation in C can be downloaded here.
In the Kyber specification we have:
q = 3329
n = 256
Note that:
q = 13 * n + 1
In order to sample the coefficients for s, e, r, e_1 and e_2 we shall define what Centered Binomial Distribution - CBD is. For some parameter eta, it is defined as:
(a_1, a_2, a_3, ... , a_eta, b_1, b_2, b_3, ... , b_eta) <- {0,1} ^ (2*eta)
output = Sum(a_i - b_i)
where the Sum is from 1 to eta.
For example, for eta = 2 we get random number among the values [-2, -1, 0, 1, 2]. The probabilities of the numbers are not equal, i.e. Pr(0) > Pr (1) > Pr (2).
Here is a code in Python that generates an array of n coefficients where each coefficient comes from CBD(eta).
def _cbd(n, eta):
i = 0
while i < eta:
p1 = np.random.randint(0, 2, n)
if i == 0:
p = p1
else:
p = p + p1
i += 1
return p
# centered binomial distribution
def cbd(n, eta):
a = _cbd(n, eta)
b = _cbd(n, eta)
return a - b
In Kyber512 (k=2):
Next, Kyber uses Compress and Decompress functions. Compression discards the least significant bits in the polynomial coefficients (not that important bits). Decompress transforms the coefficients back into value similar, but not the same to the original coefficients. The operations are defined as:
Compress(x, q, d) = round(x * 2 ^ d / q) mod 2 ^ d
Decompress(x, q, d) = round(x * q / 2 ^ d) mod q
In Python:
def compress(x, q, d):
q1 = 2**d
x = np.round(q1 / q * x).astype(int)
x = np.remainder(x, q1)
return x
def decompress(x, q, d):
q1 = 2**d
x = np.round(q / q1 * x).astype(int)
x = np.remainder(x, q)
return x
In Kyber512 (k=2):
So let's us see how the Teenage Kyber looks like.
import numpy as np
def generate_poly(n, q):
p = np.random.randint(0, q, n)
return p
def generate_poly_matrix(n, q, k):
result = []
row = None
for i in range(k):
for j in range(k):
if j == 0:
row = []
result.append(row)
row.append(generate_poly(n, q))
return np.array(result, dtype=int)
def _cbd(n, eta):
i = 0
while i < eta:
p1 = np.random.randint(0, 2, n)
if i == 0:
p = p1
else:
p = p + p1
i += 1
return p
# centered binomial distribution
def cbd(n, eta):
a = _cbd(n, eta)
b = _cbd(n, eta)
return a - b
def cbd_vector(n, eta, k):
result = []
for i in range(k):
result.append(cbd(n, eta))
return np.squeeze(np.array(result, dtype=int))
def matrix_polymul(X, Y, f, q):
xShape = X.shape
yShape = Y.shape
if len(xShape) == 2:
X = np.expand_dims(X, axis=0)
xShape = X.shape
if len(yShape) == 2:
Y = np.expand_dims(Y, axis=1)
yShape = Y.shape
assert (
xShape[1] == yShape[0]
), f"Multiplication: Invalid matrix dimensions: X[{xShape}], Y[{yShape}]"
result = []
row = None
s = None
for i in range(xShape[0]):
for j in range(yShape[1]):
if j == 0:
row = []
result.append(row)
s = None
for k in range(xShape[1]):
el = np.polymul(X[i][k], Y[k][j])
el = np.polydiv(el, f)[1]
el = np.remainder(el, q)
if k == 0:
s = el
else:
s = np.polyadd(s, el)
s = np.remainder(s, q)
if k == xShape[1] - 1:
row.append(s)
return np.squeeze(np.array(result)).astype(int)
def matrix_polyadd(X, Y, q):
xShape = X.shape
yShape = Y.shape
while len(xShape) < 3:
X = np.expand_dims(X, axis=0)
xShape = X.shape
while len(yShape) < 3:
Y = np.expand_dims(Y, axis=0)
yShape = Y.shape
assert (
xShape == yShape
), f"Addition: Invalid matrix dimensions: X[{xShape}], Y[{yShape}]"
result = []
row = None
for i in range(xShape[0]):
for j in range(xShape[1]):
if j == 0:
row = []
result.append(row)
el = np.polyadd(X[i][j], Y[i][j])
el = np.remainder(el, q)
row.append(el)
return np.squeeze(np.array(result)).astype(int)
def compress(x, q, d):
q1 = 2**d
x = np.round(q1 / q * x).astype(int)
x = np.remainder(x, q1)
return x
def decompress(x, q, d):
q1 = 2**d
x = np.round(q / q1 * x).astype(int)
x = np.remainder(x, q)
return x
n = 256
q = 3329
k = 2
eta1 = 3
eta2 = 2
du = 10
dv = 4
f = np.zeros(n + 1)
f[0] = 1
f[n] = 1
s = cbd_vector(n, eta1, k)
print(f"s\n{s}")
A = generate_poly_matrix(n, q, k)
e = cbd_vector(n, eta1, k)
print(f"e:\n{e}")
t = matrix_polyadd(matrix_polymul(A, s, f, q), e, q)
print(f"t:\n{t}")
#Encryption
m = np.array(generate_poly(n, 2))
m_scaled = decompress(m, q, 1)
print(f"m_scaled:\n{m_scaled}")
r = cbd_vector(n, eta1, k)
e1 = cbd_vector(n, eta2, k)
e2 = cbd_vector(n, eta2, 1)
u = matrix_polyadd(matrix_polymul(np.transpose(A, axes=(1, 0, 2)), r, f, q), e1, q)
v = matrix_polyadd(matrix_polyadd(matrix_polymul(t, r, f, q), e2, q), m_scaled, q)
print(f"u:\n{u}")
print(f"v:\n{v}")
u = compress(u, q, du)
v = compress(v, q, dv)
print(f"u:\n{u}")
print(f"v:\n{v}")
# Decryption
u = decompress(u, q, du)
v = decompress(v, q, dv)
m1 = matrix_polyadd(v, matrix_polymul(s, u, f, q) * (-1), q)
print(f"Decrypted message before rounding:\n{m1}")
m_decrypted = compress(m1, q, 1)
print(f"Original message:\n{m}")
print(f"Decrypted message:\n{m_decrypted}")
print(np.array_equal(m, m_decrypted))
The above code generates polynomials in line with the CBD rules, relies on compress and decompress operations and generalises for Kyber512 (k=2), Kyber768 (k=3) and Kyber1024 (k=4) (the specification document contains the concrete k, eta_1, eta_2, d_u and d_v parameters for each Kyber version).
Is the Teenage Kyber above production ready? No, it is far from production ready code.
Consider the main ingredient in the algorithm:
f = np.zeros(n + 1)
f[0] = 1
f[n] = 1
el = np.polymul(X[i][k], Y[k][j])
el = np.polydiv(el, f)[1]
el = np.remainder(el, q)
In order to multiply two polynomials we have to multiply each coefficient from the first polynomial with each coefficient from the second polynomial. For polynomials of order n, the number of multiplications is n * n = n ^ 2, i.e. the complexity is O(n ^ 2). Also we calculate remainders modulo q = 3329. Adding, multiplying and shifting numbers are efficient and cheap processor operations. Division and modulo operations with arbitrary numbers are expensive processor operations.
Can we do better?
Yes we can!
The Number-Theoretic Transform (NTT) comes to the rescue. With the help of NTT and it's inverse we can multiply two polynomials with complexity O(n*log(n)). Note that for n = 256 (Kyber's polynomials length), NTT will speed up the whole process 32 times. NTT is such crucial algorithm today. It is used in many blockchain algorithms as well.
Efficient modulo operations can be achieved using Barrett and Montgomery Reduction. Now we talk!
NTT, Barrett and Montgomery Reduction will be the topic of the next article in this two part series.
Marjan
Software Engineering
1 年great read ??
Software Engineer == Akta Manniskor
1 年Hi Marjan, thank you for explaining the algorithms with the nice examples and your digested comments. Very insightful information. Thank you for sharing it.
Head of Cyber Security & Digital Forensics Department | Associate Professor at Military Academy 'General Mihailo Apostolski' - Skopje
1 年Najverojatno poodamna se dostapni za government.