Adding Python wheel dependencies to Glue jobs
Reference 1: Repost article
Reference 2: AWS Glue docs
I am sharing this in case someone faces a similar task. I had to run an AWS Glue PySpark job that needed an import from the fire library (https://pypi.org/project/fire/). Because the Glue runtime does not include fire by default, I had to either pip install it on the Glue executor or provide a .whl file. The corporate network does not allow downloading Python libraries on the fly from the internet, so a pip install through --additional-python-modules was not possible.
Typically, I find the .whl file on pypi.org among the project's download files. I download it, upload it to my S3 bucket, and pass the full S3 URI, 's3://bucket-name/folder/package-name.whl', under --extra-py-files.
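As a rough illustration of the --extra-py-files step, here is a minimal sketch that builds the job-parameter mapping from one or more wheel URIs. The helper name glue_default_arguments is made up for this example, and the bucket path is the placeholder from the text above. I am assuming the resulting dict would be supplied as the job's default arguments (for example the DefaultArguments field when creating the job with boto3); adapt it to however you define your jobs.

```python
def glue_default_arguments(wheel_uris):
    """Build the Glue job-parameter mapping for wheels stored in S3.

    Hypothetical helper for illustration; it is not part of any AWS SDK.
    """
    for uri in wheel_uris:
        if not uri.startswith("s3://"):
            raise ValueError(f"not an S3 URI: {uri}")
    # --extra-py-files takes complete S3 paths separated by commas.
    return {"--extra-py-files": ",".join(wheel_uris)}


# Placeholder bucket and key, as in the text above.
args = glue_default_arguments(["s3://bucket-name/folder/package-name.whl"])
print(args)
```

Rejecting anything that is not an s3:// URI up front is a small design choice: a local path typed by mistake then fails here, where the cause is obvious, instead of failing later inside the job.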
Because no .whl file has been published in the download section for Fire, I could only download fire-0.5.0.tar.gz, and that was not compatible with the Glue PySpark job.
I had to find the source of the library. It is open-sourced by Google, so it was easy to locate: https://github.com/google/python-fire/tree/master
Note: this already has a setup.py file with all the information.
All I had to do was download the git repo to my local machine (or an EC2 machine) and run the commands below from the repo root:
pip install setuptools wheel
python setup.py bdist_wheel
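If you would rather script this step, the sketch below wraps the same build command and then looks for the resulting wheel in the dist folder that bdist_wheel creates. The function names build_wheel and find_wheels, and the python-fire directory in the usage comment, are my own choices for illustration. It assumes setuptools and wheel are already installed in the interpreter running the script.

```python
import subprocess
import sys
from pathlib import Path


def build_wheel(project_dir):
    """Run `python setup.py bdist_wheel` inside project_dir."""
    subprocess.run(
        [sys.executable, "setup.py", "bdist_wheel"],
        cwd=project_dir,
        check=True,  # raise CalledProcessError if the build fails
    )


def find_wheels(project_dir):
    """Return the .whl files found in project_dir/dist, sorted by name."""
    return sorted(Path(project_dir, "dist").glob("*.whl"))


# Example usage (assumes the repo was cloned into ./python-fire):
# build_wheel("python-fire")
# print(find_wheels("python-fire"))
```

The file returned by find_wheels is the one to upload to S3 and reference under --extra-py-files.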
Creating a wheel file for our own Python code
I also had to do this for my own Python library. Here is a template for a setup.py file:
from setuptools import setup, find_packages

setup(
    name="YourPackageName",
    version="0.1.0",
    author="Your Name",
    author_email="you@example.com",
    description="A short description of the project",
    long_description=open('README.md').read(),
    long_description_content_type="text/markdown",
    url="https://github.com/yourusername/yourpackagename",
    packages=find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    python_requires='>=3.6',
    install_requires=[
        "numpy",
        "pandas>=1.1.0",
        "scikit-learn==0.24.1",
        "matplotlib",
        "requests"
    ],
)
After you've created this setup.py file and placed it in the root directory of your project, you can then generate a wheel file by running the following commands in your terminal:
pip install setuptools wheel
python setup.py sdist bdist_wheel
This creates a dist directory containing your .whl file, which is the wheel package for your project, alongside the .tar.gz source distribution produced by sdist. Make sure your project structure is set up so that setuptools can recognize it: find_packages() only picks up directories that contain an __init__.py file.
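One way to check that setuptools actually picked up your packages is to look inside the wheel, since a .whl file is an ordinary zip archive. The sketch below lists the top-level names it contains; the function name top_level_names and the wheel filename in the usage comment are hypothetical.

```python
import zipfile


def top_level_names(wheel_path):
    """Return the top-level names inside a wheel, skipping *.dist-info metadata."""
    with zipfile.ZipFile(wheel_path) as wheel:
        # Keep only the first path component of every archive member.
        roots = {name.split("/", 1)[0] for name in wheel.namelist()}
    return sorted(r for r in roots if not r.endswith(".dist-info"))


# Example usage (the filename depends on your package name and version):
# print(top_level_names("dist/YourPackageName-0.1.0-py3-none-any.whl"))
```

If the list comes back empty or is missing a package you expected, the usual cause is a missing __init__.py or a setup.py that is not at the project root.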