You're Killin' Me, AWS! Part II
David Hazar
Certified SANS Instructor | IANS Faculty Member | Consultant | Founder | Public Speaker
To ingress, or not to ingress, that is my question
If you missed part one of my series on the "Load Balancer Club Sandwich", you may want to check that out first, as it will save me from having to repeat myself too much.
In this article, I want to discuss what I think is an exciting, newer service from the folks at Amazon Web Services (AWS). The service is Amazon VPC Lattice (https://docs.aws.amazon.com/vpc-lattice/latest/ug/what-is-vpc-lattice.html), and while some may disagree with my assessment, if I had to come up with a tagline for the service, I would choose either "Kubernetes for the Cloud" or "Kubernetes, it's not just for containers anymore".
I use this comparison because you create services, attach them to a network of services, and then use "auth policies" to designate which services can talk to which. Services in the same service network can talk to one another as long as the auth policy allows it. While service networks are not exactly the same as namespaces, and auth policies are not quite like network policies, it seems like a good enough comparison to me. There are also ways to attach a service to multiple service networks if cross-network access is needed.
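To make the comparison a little more concrete, here is a minimal sketch of what attaching an auth policy to a service network might look like with boto3. The service network ID, account number, and role name below are hypothetical, and the service network's auth type would need to be set to AWS_IAM for the policy to be enforced:

import json
import boto3

lattice = boto3.client("vpc-lattice")

# only allow a specific role to invoke services in this service network
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:role/orders-client-role"},
        "Action": "vpc-lattice-svcs:Invoke",
        "Resource": "*",
    }],
}

lattice.put_auth_policy(
    resourceIdentifier="sn-0123456789abcdef0",  # hypothetical service network ID
    policy=json.dumps(policy),
)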
So, why not just use Kubernetes? Well, the great thing about VPC Lattice is it covers more than just Kubernetes. Here are the non-Kubernetes targets that can back services in VPC Lattice: EC2 instances, IP addresses, Lambda functions, and application load balancers. These targets, along with any pods that need to communicate with one another, can all be registered to the same service network.
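As a rough sketch of what registering one of these targets might look like (the names and ARN below are hypothetical), a Lambda function can be wrapped in a Lattice target group roughly like this:

import boto3

lattice = boto3.client("vpc-lattice")

# create a target group of type LAMBDA (other types include INSTANCE, IP, and ALB)
tg = lattice.create_target_group(name="orders-fn-tg", type="LAMBDA")

# register the function itself as the target
lattice.register_targets(
    targetGroupIdentifier=tg["id"],
    targets=[{"id": "arn:aws:lambda:us-east-1:111122223333:function:orders"}],
)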
You may be asking yourself, so what? I can already set up connectivity between all these things. How is this different? Well, what if your Lambda function is in one virtual private cloud (VPC), your application load balancer (ALB) is in another, and your Kubernetes pod is in yet another? VPC Lattice doesn't care. Once you get them all connected to the same service network, with the help of the AWS Resource Access Manager (RAM) service, where they exist in the cloud no longer matters.
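For cross-account setups, the service network itself is what you share through RAM. A minimal sketch (the ARN and account IDs are hypothetical) might look like this:

import boto3

ram = boto3.client("ram")

# share the service network with another account so its VPCs and services
# can be associated with it
ram.create_resource_share(
    name="shared-service-network",
    resourceArns=[
        "arn:aws:vpc-lattice:us-east-1:111122223333:servicenetwork/sn-0123456789abcdef0"
    ],
    principals=["444455556666"],  # the consuming account
    allowExternalPrincipals=False,
)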
VPC Lattice is an abstraction layer on top of your cloud environment. Forget peering, hub and spoke, overlapping IPs, etc.; none of that matters (well, kind of). As long as these services only need to communicate over HTTP, HTTPS, or gRPC, they only need the service network. Each service gets its own unique, service network-specific fully-qualified domain name (FQDN), but this doesn't resolve to any IP address defined for any of your subnets or VPCs. So, to what address space does it resolve? Let me show you:
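Here is a quick way to check from an instance in an associated VPC; the service FQDN below is made up, but the address range that comes back is the interesting part:

import socket

# hypothetical Lattice-generated service FQDN
fqdn = "orders-0123456789abcdef0.1a2b3c4d.vpc-lattice-svcs.us-east-1.on.aws"

addresses = {info[4][0] for info in socket.getaddrinfo(fqdn, 443)}
print(addresses)  # e.g. {'169.254.171.32'} -- a link-local address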
Wait, 169.254.171.#, that can't be right, can it? That is the magic of VPC Lattice. Traffic sent to these link-local addresses is routed to the service network behind the scenes without the need to set up any additional routing, peering, etc. You do, however, need to set up your security groups to allow traffic to and from the Lattice service network as needed. This is easily accomplished by referencing the managed prefix lists com.amazonaws.{region}.vpc-lattice and com.amazonaws.{region}.ipv6.vpc-lattice as the source or destination in your rules.
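For example, with boto3 you can look up the managed prefix list and allow inbound HTTPS from the service network on a workload's security group (the security group ID below is hypothetical):

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# find the managed prefix list for VPC Lattice in this region
pls = ec2.describe_managed_prefix_lists(
    Filters=[{"Name": "prefix-list-name", "Values": ["com.amazonaws.us-east-1.vpc-lattice"]}]
)
prefix_list_id = pls["PrefixLists"][0]["PrefixListId"]

# allow inbound HTTPS from the Lattice service network
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "PrefixListIds": [{"PrefixListId": prefix_list_id, "Description": "from VPC Lattice"}],
    }],
)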
So, what's the problem? Well . . . this works great for service-to-service communication over HTTP, HTTPS, or gRPC (no WebSocket support :-( ) within your cloud environment. But what about external users of your applications? Wait . . . you want actual people to connect to your applications? Um . . . duh!
Ok, so what are my options? How do I enable ingress traffic to my public-facing applications via the service network? There have to be options, right? Gag . . . yeah, there are some options . . . I guess.
The first, and probably best (and I use this word hesitantly), option is to set up a fleet of proxies behind a load balancer and configure them to translate the external DNS name for each public-facing service to the internal service FQDN. The other option involves Lambda. For cost reasons, in our lab environment for SEC549: Cloud Security Architecture, I chose the Lambda function. However, that means every time one of our sites is accessed, the Lambda function is invoked 10 times just to render the homepage. Some of the pages will be cached after that, but those invocations would add up in a real environment.
For those interested, here is the code I am using, which is a modified version of a sample from AWS found here (https://github.com/aws-samples/amazon-vpc-lattice-secure-apis/blob/main/api/src/client/fn.py). I am essentially just replacing the first "." character in the FQDN with "-service." to translate it to the service's FQDN on the service network, and I had to make some modifications to the example to handle images and other content types through this makeshift proxy (please go with option 1 in production).
import json
import os
import requests
import urllib.parse
import logging
import base64
import botocore.session
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

logger = logging.getLogger()
logger.setLevel("INFO")

# initialization: environment variables
region = os.environ.get("AWS_REGION", "us-east-1")

# initialization: boto
session = botocore.session.get_session()


# helper functions
def build_response(output):
    # pass through any headers returned by the backend service
    headers = output["headers"] if "headers" in output else {}
    # lambda proxy integration: base64-encode the body so images and other
    # binary content types survive the trip back through the load balancer
    body = output.get("content", output.get("reason", "").encode())
    response = {
        "isBase64Encoded": True,
        "statusCode": output["status_code"],
        "headers": dict(headers),
        "body": base64.b64encode(body).decode("utf-8")
    }
    return response


def parse_flag(event, flag):
    response = False
    if flag in event and event[flag]:
        response = True
    return response


def send_request(event, add_sigv4=False, debug=False):
    headers = event["headers"]
    # translate the external FQDN to the Lattice service FQDN by replacing
    # the first "." with "-service."
    headers["host"] = headers["host"].replace(".", "-service.", 1)
    logger.info(headers)
    params = event.get("queryStringParameters") or {}
    querystring = f"?{urllib.parse.urlencode(params)}" if params else ""
    body = event.get("body") or ""
    endpoint = f"https://{headers['host'] + event['path'] + querystring}" if "endpoint" not in body else body["endpoint"]
    logger.info(endpoint)
    method = "GET" if "httpMethod" not in event else event["httpMethod"]
    logger.info(method)
    data = "" if not body else json.dumps(body)
    logger.info(str(data))
    request = AWSRequest(method=method, url=endpoint, data=data, headers=headers)
    request.context["payload_signing_enabled"] = False
    if add_sigv4:
        if debug:
            print(json.dumps({"message": "sigv4 signing the request"}))
        # sign with the Lambda role's credentials so a Lattice auth policy
        # can evaluate the caller's identity
        sigv4 = SigV4Auth(session.get_credentials(), "vpc-lattice-svcs", region)
        sigv4.add_auth(request)
    timeout = 5
    output = {}
    try:
        if debug:
            print(json.dumps({"endpoint": endpoint}))
        prepped = request.prepare()
        # throws requests.exceptions.ReadTimeout, requests.exceptions.ConnectionError
        if method == "POST":
            response = requests.post(prepped.url, headers=prepped.headers, data=data, timeout=timeout)
        elif method == "DELETE":
            response = requests.delete(prepped.url, headers=prepped.headers, timeout=timeout)
        else:
            response = requests.get(prepped.url, headers=prepped.headers, timeout=timeout)
        # response is of type requests.models.Response
        if response.status_code == 200:
            output = {
                "status_code": response.status_code,
                "headers": response.headers,
                "content": response.content
            }
        else:
            output = {
                "status_code": response.status_code,
                "reason": response.reason
            }
    except requests.exceptions.ReadTimeout:
        output = {
            "status_code": 504,
            "reason": f"request to vpc lattice backend timed out ({timeout} seconds)"
        }
    except requests.exceptions.ConnectionError:
        output = {
            "status_code": 504,
            "reason": "connection reset by peer and aborted"
        }
    return output


def lambda_handler(event, context):
    body = event.get("body") or {}
    enable_sigv4 = parse_flag(body, "sigv4")
    enable_debug = parse_flag(body, "debug")
    logger.info(json.dumps({
        "enable_sigv4": enable_sigv4,
        "enable_debug": enable_debug
    }))
    output = send_request(event, add_sigv4=enable_sigv4, debug=enable_debug)
    response = build_response(output)
    return response
For those who read my last article, you probably already know where this is going, right? Why not let us target VPC Lattice services from the ALB? It would make things so much easier. Wait, but if we could just target an FQDN, wouldn't that work? Sure, but sadly, we cannot. I am hopeful, though, as I do appreciate AWS's velocity when it comes to change and new features compared to some other providers. I won't name names here, but browse through my articles and you might find a reference to a feature in another provider that was 4 years in the making.
One final note: VPCs, internet gateways, route tables, etc. do not go away completely. After all, your instances, pods, etc. still need to access the Internet. You can also still handle ingress the same way you always have, but now you have to worry about many of the things we were trying to avoid by using VPC Lattice.
VPC Lattice definitely reduces the need for peering, solves the IP address overlap problem, and reduces the complexity of routing, which may simplify our approach to application connectivity. It also gives you layer 7 policy controls, versus the layer 3/4 controls you get with network access control lists (NACLs) and security groups. If they could just solve the ingress problem, I would be a happy consumer! In the process, they might just make it so I can go back to buying sandwiches instead of club sandwiches (see Part I).