Connecting Jupyter Notebook to a Google Cloud VM using only an Internal IP
Rubens Zimbres, Ph.D.
ML Engineer, Gen AI, Sec+, Google Developer Expert in AI/ML and Google Cloud
When you create a new VM (Virtual Machine) in Compute Engine, you face several choices: the region of the machine, its configuration (including GPUs), whether it is based on a container image, the boot disk operating system (Debian, Ubuntu, Windows Server) and its version, the size and type of the boot disk (HDD, SSD), the level of access to Cloud APIs, predefined firewall rules for HTTP and HTTPS, and also networking and management options.
Regarding networking, we have some basic options:
Public IP addresses are the same as external IP addresses. A public (or external) IP address allows you to connect to the internet. A private or internal IP address is the address your network router assigns to your device. Each device within the same network is assigned a unique private IP address and devices on the same internal network can talk to each other.
By making it more difficult for an external host or user to establish a connection, private IPs help increase security within a specific network. Take printers as an example: you can print documents via wireless connection, but your neighbor can’t access your files. The machine we will create does not exist on the public internet, because it does not have a public IP. So we are reducing the attack surface to our development and production infrastructure and isolating our internal traffic from the external world, following Google Cloud’s security best practices.
As we will communicate via internal IPs and may also need to access files in Cloud Storage or BigQuery, we also have to enable Private Google Access, which allows VM instances that have only internal IP addresses to reach Google APIs and services from within the VPC (Virtual Private Cloud) network.
Basically, if you create a VM in Compute Engine with an external IP, a malicious hacker can run the nmap command to scan it and enumerate possible vulnerabilities of your VM, based on the operating system, its version, open ports, etc. An nmap scan is one of the first steps of an intrusion, so if your VM does not have an external IP, it will be safer. For nmap options, refer to: https://nmap.org/book/man-briefoptions.html
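As a minimal illustration of such a scan (assuming nmap is installed locally, and using <EXTERNAL_IP> as a placeholder for a VM that does have a public address):

# -sV probes open ports for service/version info, -O attempts OS detection (needs root)
$ sudo nmap -sV -O <EXTERNAL_IP>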
This nmap scan on the Compute Engine VM outputs the open ports (22) and guesses about the operating system (a Linux virtual machine):
In this case, as we opted for a least-privilege approach, the only open port is 22, which is still vulnerable to brute-force attacks, though those are quite difficult to pull off. However, there are cases where you have to expose HTTP, HTTPS and other services. Remember: 70% of all cybersecurity incidents in the cloud occur due to misconfigurations, so we have to be especially careful here.
Personal computers are an even tastier target, as they may have many more unpatched vulnerabilities and open ports such as 21 (FTP), 22 (SSH), 23 (Telnet), 80 (HTTP), 443 (HTTPS), 139 and 445 (Samba shares) and 3389 (RDP, Remote Desktop Protocol).
Back to our VM: besides the configuration options presented when you create a new VM in Compute Engine, we can also define a startup script under the Management options, plus metadata in key-value format (to identify VM costs, for example). Once the VM is created, we can add SSH public keys to it so we can connect from our on-premises computer that holds the corresponding private key. For this exercise, we won't need to generate SSH keys: by default, when you connect to a VM using the console or the gcloud CLI, your SSH keys are generated automatically.
So, let's configure the environment to connect our Jupyter notebook to the VM. First, create a Google Cloud project using the Console and name it testmachine-334455. After that, open the drop-down menu (the three bars in the blue area at the top left) and choose Compute Engine. Enable the API. Now let's create the instance:
Click Create Instance and name it "machine-internal-ip". For machine configuration choose n1-standard-4. In Boot disk, click Change and choose Deep Learning on Linux, Debian 10, with version TensorFlow Enterprise 1.15, and a disk size of 150 GB. In Networking / Network Interfaces, set External IP to None and Internal IP to a static internal IP, then click Done. This leaves our machine with only a fixed internal IP and no external IP, so it is not exposed to the public internet. Leave everything else as default, which follows the security best practice of least privilege. This machine will cost you about 0.15 USD per hour. Click Create. After a few seconds, your VM will be running.
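If you prefer the CLI, a rough equivalent is sketched below. The image family name is an assumption (check the Deep Learning VM image list for the exact one), and this sketch omits reserving the static internal IP:

# --no-address creates the VM without an external IP
# the image family tf-ent-1-15-cpu is an assumed name from the deeplearning-platform-release project
$ gcloud compute instances create machine-internal-ip \
    --zone=us-central1-a \
    --machine-type=n1-standard-4 \
    --image-project=deeplearning-platform-release \
    --image-family=tf-ent-1-15-cpu \
    --boot-disk-size=150GB \
    --no-address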
Note that the machine has only an internal IP (10.128.0.2). Now we click the SSH down arrow - View gcloud command:
It will look like this:
gcloud compute ssh --zone "us-central1-a" "machine-internal-ip" --tunnel-through-iap --project "testmachine-334455"
Let's not use this for now. Press CTRL+ALT+T on your local Linux machine to open a terminal (or open CMD in Windows). We need to install the gcloud CLI (command line interface): https://cloud.google.com/sdk/docs/install#deb. On Ubuntu, run:
$ sudo apt-get install apt-transport-https ca-certificates gnupg
$ echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
$ curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -
$ sudo apt-get update && sudo apt-get install google-cloud-cli
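A quick sanity check that the installation worked is to print the installed version:

$ gcloud --version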
This assumes you have a Gmail account; if you have MFA (multi-factor authentication) enabled via an authenticator app, even better. Now, let's log in to Google Cloud. Run:
$ gcloud auth login
Select the Gmail account and, in the new browser window, click Allow. Now run the following command:
$ gcloud config set project testmachine-334455
Now, paste the gcloud SSH command of the VM into the Linux terminal in your local machine:
$ gcloud compute ssh --zone "us-central1-a" "machine-internal-ip" --tunnel-through-iap --project "testmachine-334455"
It will update the project metadata and add your key as a known host. You are now inside the instance, via an IAP tunnel. IAP (Identity-Aware Proxy) lets you establish a central authorization layer for applications accessed over HTTPS, so you can use an application-level access control model instead of relying on network-level firewalls.
When an application or resource is protected by IAP, it can only be accessed through the proxy by principals (users) who have the correct Identity and Access Management (IAM) role. When you grant a user access to an application or resource through IAP, they are subject to fine-grained access controls implemented by the product in use, without requiring a VPN. When a user tries to access an IAP-secured resource, IAP performs authentication and authorization checks.
Figure 1 shows how IAP (Identity-Aware Proxy) works.
Figure 1. Identity-Aware Proxy authorizations.
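As an aside, granting a principal permission to use IAP tunnels boils down to an IAM binding. A minimal sketch, where you@example.com is a placeholder and the predefined roles/iap.tunnelResourceAccessor role is used (later in this walkthrough we will create a custom role instead):

# grants the IAP TCP tunnel permission to a single user at project level
$ gcloud projects add-iam-policy-binding testmachine-334455 \
    --member="user:you@example.com" \
    --role="roles/iap.tunnelResourceAccessor"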
This is the first step of our task: running Jupyter in the VM via Identity-Aware Proxy. However, the Jupyter notebook will be listening only on the VM's internal IP, and we need to bring it to our browser. This second part will be achieved with a feature called TCP forwarding.
IAP's TCP forwarding feature lets you control who can access administrative services like SSH and RDP (Remote Desktop Protocol) on your backends from the public internet. TCP forwarding prevents these services from being openly exposed to the internet. Instead, requests to your services must pass authentication and authorization checks before they reach their target resource.
Exposing administrative services directly to the internet when running workloads in the cloud introduces risk. Forwarding TCP traffic with IAP allows you to reduce that risk, ensuring only authorized users gain access to these sensitive services. Users gain access to the interface and port only if they pass the authentication and authorization check of the target resource's Identity and Access Management (IAM) policy, as seen in Figure 2.
Figure 2. TCP Forwarding.
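The same mechanism can forward any TCP port. As a purely hypothetical illustration for RDP on a Windows VM named my-windows-vm (we will use the same command for Jupyter later):

# forwards the remote RDP port 3389 to localhost:3389 through IAP
$ gcloud compute start-iap-tunnel my-windows-vm 3389 \
    --local-host-port=localhost:3389 \
    --zone=us-central1-a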
Good, we can now run our Jupyter notebook. First, let’s do some basic config. Run:
$ jupyter notebook --generate-config
Writing default config to: /home/user/.jupyter/jupyter_notebook_config.py
Now, run ipython in the terminal and add a password to Jupyter:
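A minimal sketch of that step, assuming the classic notebook package (which provides notebook.auth) is installed in the VM's environment; newer notebook versions may emit a hash with a different prefix than sha1:

$ ipython
In [1]: from notebook.auth import passwd
In [2]: passwd()
Enter password:
Verify password:
Out[2]: 'sha1:e98cdaff0fc8:61abe22b2381e09a72e8f435cdf608d8c33be078'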
Copy this sha1 value to a text file. You will need it later. Now let’s edit the Jupyter configuration file with nano:
$ sudo nano /home/user/.jupyter/jupyter_notebook_config.py
Add the following and paste the sha1 value:
c = get_config()
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.password = 'sha1:e98cdaff0fc8:61abe22b2381e09a72e8f435cdf608d8c33be078'
Your nano editor will look like this:
Now press CTRL+O (the letter O), then Enter, then CTRL+X to save and exit nano. Start Jupyter:
$ jupyter-notebook --ip=0.0.0.0 --port=8081 --no-browser &
Now Jupyter is running but basically unavailable to us, as we cannot open the machine's internal IP in our browser on port 8081, the port used by Jupyter. Leave Jupyter running and open a new Linux terminal window (CTRL+ALT+T). We are going to configure IAP and TCP forwarding, which will forward the Jupyter notebook to a port on localhost for our browser.
For IAP connection, we have to check the following items:
Go to IAM / Roles / Create Role / Add Permissions. In the second filter below, type iap.tunnelInstances.accessViaIAP, then check it and click Add. Click Create role. Name it "internal".
Now, go to IAM, find the Compute Engine default service account of your VM, in the format PROJECT_NUMBER-compute@developer.gserviceaccount.com, click the pencil, Add another role, select Custom and add the role you just created (picture below). Click Save.
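If you prefer the CLI, a rough equivalent is sketched below, with PROJECT_NUMBER as a placeholder for your project's number:

# create the custom role with the single IAP tunnel permission
$ gcloud iam roles create internal \
    --project=testmachine-334455 \
    --permissions=iap.tunnelInstances.accessViaIAP
# bind the custom role to the Compute Engine default service account
$ gcloud projects add-iam-policy-binding testmachine-334455 \
    --member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
    --role="projects/testmachine-334455/roles/internal"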
Now, let's go to Security / Identity-Aware Proxy via the three bars at the top left of the console. Open the SSH and TCP Resources tab. As you will see, we have to add a firewall rule to allow connections over TCP from the source range 35.235.240.0/20.
Go to VPC Network / Firewall / Create firewall. Keep defaults, select ingress. In Targets, choose All instances in the network. In Source IPv4 ranges, add 35.235.240.0/20. In protocols and ports, check TCP, leave ports blank. Click Create.
You can also do this by running the following command in gcloud:
$ gcloud compute firewall-rules create allow-ssh-ingress-from-iap \
    --direction=INGRESS \
    --action=allow \
    --rules=tcp \
    --source-ranges=35.235.240.0/20
Now go to the drop-down menu on the left of the Console, choose VPC Network, and click on the subnet of our Compute Engine VM, in this case the one in us-central1. This opens the subnet details. Click Edit, set Private Google Access to On and Save.
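The CLI equivalent, assuming the VM sits in the subnet named default of the default network:

# turns on Private Google Access for the us-central1 subnet
$ gcloud compute networks subnets update default \
    --region=us-central1 \
    --enable-private-ip-google-access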
At this point, the IAP configuration should be ready. If it is not, configure the OAuth consent screen under IAM / Identity-Aware Proxy / HTTPS Resources.
Now, run locally in the second terminal you opened:
$ gcloud compute start-iap-tunnel machine-internal-ip 8081 --zone=us-central1-a --local-host-port=localhost:9999
Where 8081 is the port being used by Jupyter and 9999 is the port you will open in your browser.
Open your Jupyter notebook instance at your browser in the following address:
http://127.0.0.1:9999
Great! Now you can work more safely, and this configuration changes absolutely nothing in how Jupyter operates. If you need to access Cloud Storage, Private Google Access is enabled, so you can easily do it (as long as the Compute Engine service account has a role allowing it to view/create objects). In case your VM needs to access other Google Cloud APIs and services, like BigQuery, you can easily create a specific role in IAM and attach it to the Compute Engine service account.
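For instance, from inside the VM you could list a bucket with gsutil, where YOUR_BUCKET is a placeholder and the service account needs at least an object-viewer role on it:

$ gsutil ls gs://YOUR_BUCKET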
One last tip: ALWAYS remember to STOP the instance after using it, as it may incur unnecessary costs. If the instance is stopped, you will only be charged for the disk space, in this case 150 GB. At $0.04 USD per GB per month, that equals about 6 dollars a month (150 * 0.04) while the instance is stopped.
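You can stop it from the Console or with a single command:

$ gcloud compute instances stop machine-internal-ip --zone=us-central1-a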