How to SSH Tunnel into AWS EC2 and connect to DocumentDB using Python?

Why is it needed?

Before I tell you why it's needed, I'd like to share why I had to do it. The answer is simple: to locally test things in our ML Codebase.

Now, coming to why it's needed:

Amazon DocumentDB is a managed database service that is designed to be secure. In practice, this means a DocumentDB cluster is deployed inside an Amazon Virtual Private Cloud (Amazon VPC), a logically isolated private network within AWS. In simple terms, I like to think of it as your own private corner of Amazon's network. DocumentDB can be reached directly by resources in the same VPC (an EC2 instance, for example), or by resources in other networks that have been explicitly given access.

SSH tunneling is needed when we want to access DocumentDB from outside the cluster's VPC, such as from our local machine. To do that, you typically go through a bastion host (an EC2 instance in the same VPC) using SSH. This extra layer of security means the database itself is never directly exposed to the internet, which reduces the risk of unauthorized access.

Source: AWS Docs.

What is SSH Tunneling?

SSH tunneling, also known as SSH port forwarding, is a technique for carrying another application's traffic through an encrypted SSH connection across an unsecured network, such as the Internet. It creates a secure channel (tunnel) through which data can be transferred between a local and a remote machine. In simple terms, it behaves like a lightweight VPN for a single port, although it is not a full VPN.

In our context:

  • The local machine is the one running your Python script.
  • The remote machine is an EC2 instance in your AWS environment.

The SSH tunnel allows secure communication between your local machine and the EC2 instance, providing a secure pathway for data to travel. Once the tunnel is established, you can use it to connect to DocumentDB securely, as if it were running on your local machine.
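For reference, the same kind of tunnel can be opened from a terminal with the OpenSSH client's `-L` (local forward) option. The sketch below only assembles that command so the pieces are visible; it does not run it. The helper name `build_ssh_tunnel_command` is hypothetical, and the host names, key file, and ports are the placeholder values used elsewhere in this post.

```python
import shlex


def build_ssh_tunnel_command(ssh_host, ssh_user, key_path,
                             local_port, remote_host, remote_port):
    """Return the argument list for an `ssh -L` local port forward (not executed here)."""
    return [
        "ssh",
        "-i", key_path,  # private key used to authenticate to the EC2 instance
        "-N",            # do not run a remote command, only forward ports
        "-L", f"{local_port}:{remote_host}:{remote_port}",  # local port -> remote host:port
        f"{ssh_user}@{ssh_host}",
    ]


cmd = build_ssh_tunnel_command(
    ssh_host="ec2-x-x-x-x.region.compute.amazonaws.com",  # placeholder
    ssh_user="ec2-user",
    key_path="ec2-host-key-pair.pem",                      # placeholder
    local_port=3000,
    remote_host="replica_db_name.*.*.docdb.amazonaws.com",  # placeholder
    remote_port=27017,
)

# Print a copy-pasteable, shell-quoted version of the command
print(shlex.join(cmd))
```

The `-L local_port:remote_host:remote_port` argument is the heart of it: connections to the local port are carried over SSH to the EC2 instance, which then connects to the remote host and port on your behalf.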


Code

Before we get to the code, you will need a few constants. I suggest storing them as environment variables rather than hard-coding them, so that credentials stay out of your source code.

# SSH tunnel configuration
SSH_HOST=ec2-x-x-x-x.region.compute.amazonaws.com
SSH_USER=ec2-user
SSH_KEY_PATH=path to ec2-host-key-pair.pem file
LOCAL_BIND_PORT=3000 # any port of your choice

# MongoDB server configuration
MONGO_HOST=replica_db_name.*.*.docdb.amazonaws.com
MONGO_PORT=27017
MONGO_USERNAME=your_mongodb_username
MONGO_PASSWORD=your_mongodb_password

MONGO_DB_NAME=YOUR_DB_NAME
MONGO_COLLECTION_NAME=YOUR_DEFAULT_COLLECTION_NAME

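# db parameters dict (Python): the client connects to the local end of the tunnel
DB_PARAMS = {
    "host": "127.0.0.1",
    "port": int(LOCAL_BIND_PORT),  # values read from the environment are strings, so cast to int
    "username": MONGO_USERNAME,
    "password": MONGO_PASSWORD,
}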

  • SSH_HOST: the public DNS name (or public IP address) of an EC2 instance running in the same VPC as your DocumentDB cluster.
  • SSH_KEY_PATH: the path to your key-pair .pem file. This is used to authenticate your SSH connection to the EC2 instance.
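If the constants live in environment variables, they have to be read into Python before use. Here is a minimal sketch of one way to do that with only the standard library. The `TunnelConfig` class and `load_config` function are hypothetical names introduced for illustration, and falling back to ports 3000 and 27017 when the port variables are unset is an assumption based on the values shown above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TunnelConfig:
    ssh_host: str
    ssh_user: str
    ssh_key_path: str
    local_bind_port: int
    mongo_host: str
    mongo_port: int
    mongo_username: str
    mongo_password: str


REQUIRED = (
    "SSH_HOST", "SSH_USER", "SSH_KEY_PATH",
    "MONGO_HOST", "MONGO_USERNAME", "MONGO_PASSWORD",
)


def load_config(env: Mapping[str, str] = os.environ) -> TunnelConfig:
    """Read the settings from `env`, failing early if a required one is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return TunnelConfig(
        ssh_host=env["SSH_HOST"],
        ssh_user=env["SSH_USER"],
        ssh_key_path=env["SSH_KEY_PATH"],
        # Environment variables are always strings, so ports are cast to int
        local_bind_port=int(env.get("LOCAL_BIND_PORT", "3000")),  # assumed default
        mongo_host=env["MONGO_HOST"],
        mongo_port=int(env.get("MONGO_PORT", "27017")),           # assumed default
        mongo_username=env["MONGO_USERNAME"],
        mongo_password=env["MONGO_PASSWORD"],
    )
```

Calling `load_config()` with no arguments reads from the real process environment; passing a plain dict instead makes the function easy to test.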


NOTE: Before running the code, allow inbound SSH (port 22) from your IP address in the security group attached to your EC2 instance.
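A quick way to confirm that the security group rule is in place is to check whether the SSH port on the EC2 instance accepts TCP connections from your machine. This is a small standard-library sketch; the function name `is_port_open` and the 3-second default timeout are my own choices.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and DNS resolution failures
        return False
```

For example, `is_port_open(SSH_HOST, 22)` returning False usually points to a missing security group rule or a wrong host name, and it is cheaper to find that out before starting the tunnel.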

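from pymongo import MongoClient
from sshtunnel import SSHTunnelForwarder

# forward 127.0.0.1:LOCAL_BIND_PORT through the EC2 instance to DocumentDB
tunnel = SSHTunnelForwarder(
    (SSH_HOST, 22),
    ssh_username=SSH_USER,
    ssh_pkey=SSH_KEY_PATH,
    remote_bind_address=(MONGO_HOST, int(MONGO_PORT)),
    local_bind_address=("127.0.0.1", int(LOCAL_BIND_PORT)),
)

# start the tunnel
tunnel.start()
try:
    # get mongo client; directConnection=True keeps the driver on the tunnelled address
    # (if TLS is enabled on your cluster, the TLS options must be passed here as well)
    client = MongoClient(directConnection=True, **DB_PARAMS)
    try:
        # do something
        db = client[MONGO_DB_NAME]
        collection = db[MONGO_COLLECTION_NAME]

        some_query = {}  # placeholder filter; an empty filter matches every document
        documents = list(collection.find(some_query))
        print(documents)
    finally:
        # close the client even if the query fails
        client.close()
finally:
    # stop the tunnel last, once the client is closed
    tunnel.stop()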
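One thing that can trip you up here: if TLS is enabled on your DocumentDB cluster, the client needs TLS options as well. The sketch below builds the keyword arguments for `MongoClient` under that assumption. The helper name `build_client_kwargs` is hypothetical, and `ca_file` is assumed to be the path to the Amazon CA bundle you have downloaded. Because the client connects to 127.0.0.1 rather than to the cluster's real host name, certificate host name checking is relaxed with `tlsAllowInvalidHostnames`.

```python
from typing import Any, Dict, Optional


def build_client_kwargs(local_port: int, username: str, password: str,
                        ca_file: Optional[str] = None) -> Dict[str, Any]:
    """Build MongoClient keyword arguments for a DocumentDB cluster behind a local tunnel."""
    kwargs: Dict[str, Any] = {
        "host": "127.0.0.1",        # local end of the SSH tunnel
        "port": int(local_port),
        "username": username,
        "password": password,
        "directConnection": True,   # talk only to the tunnelled address
        "retryWrites": False,       # DocumentDB does not support retryable writes
    }
    if ca_file is not None:
        # Assumption: TLS is enabled on the cluster and ca_file points to the CA bundle
        kwargs.update(
            tls=True,
            tlsCAFile=ca_file,
            tlsAllowInvalidHostnames=True,  # certificate names the cluster, not 127.0.0.1
        )
    return kwargs
```

You would then create the client with something like `MongoClient(**build_client_kwargs(LOCAL_BIND_PORT, MONGO_USERNAME, MONGO_PASSWORD, ca_file="global-bundle.pem"))`, where the file name is only an example path.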

How does this work?

Here is a simple picture to describe it:


The figure presents a simplified overview of SSH tunneling. The secure connection over the untrusted network is established between an SSH client and an SSH server. This SSH connection is encrypted, protects confidentiality and integrity, and authenticates communicating parties.

The SSH connection is used by the application (our Python code) to connect to the application server (the DocumentDB cluster). With tunneling enabled, the application connects to a port (3000 in our example) on the local host ('127.0.0.1') that the SSH client listens on. The SSH client then forwards the application's traffic over its encrypted tunnel to the SSH server (the EC2 instance). The server then connects to the actual application server (DocumentDB), which in our case sits in the same VPC as the EC2 instance. The application communication is thus secured without having to modify the application or end-user workflows.
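To make the "local port that forwards traffic" idea concrete, here is a toy relay written with only the standard library. It is not SSH and does no encryption or authentication. It only illustrates the forwarding half of the picture: a listener on 127.0.0.1 accepts a connection and copies bytes to and from a target address. All function names are hypothetical.

```python
import socket
import threading


def _pipe(src, dst):
    """Copy bytes from src to dst until src reaches end-of-stream."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass  # one side went away; stop copying
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # signal that no more data is coming
        except OSError:
            pass


def _handle(client, target):
    """Relay one accepted connection to the target in both directions."""
    try:
        upstream = socket.create_connection(target, timeout=10)
        upstream.settimeout(None)  # back to blocking mode for the relay
    except OSError:
        client.close()
        return
    back = threading.Thread(target=_pipe, args=(upstream, client), daemon=True)
    back.start()
    _pipe(client, upstream)
    back.join()
    client.close()
    upstream.close()


def start_forwarder(target_host, target_port):
    """Listen on a free 127.0.0.1 port; return (listener socket, port number)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen()

    def accept_loop():
        while True:
            try:
                client, _ = listener.accept()
            except OSError:
                return  # listener was closed
            threading.Thread(
                target=_handle, args=(client, (target_host, target_port)), daemon=True
            ).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener, listener.getsockname()[1]
```

In the real setup, the SSH client plays the role of the listener and the EC2 instance makes the final connection to DocumentDB, with everything in between travelling inside the encrypted SSH connection.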

Source: AWS Docs.


