AWS SageMaker Hybrid Development
Subhod Lagade
GenAI Architect | Sr Solutions Architect AI Platforms (AWS GCP) | Associate Director
Develop on personal computers, to train and host in the cloud
Customers use local development environments, such as PyCharm or Jupyter installations on their laptops or personal computers, then connect to the cloud via AWS Identity and Access Management (IAM) permissions and interface with AWS service APIs through the AWS CLI or an AWS SDK (e.g., boto3). Having connected to the cloud, customers can execute training jobs and/or deploy resources.
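As a rough illustration of that workflow, here is a minimal sketch of submitting a SageMaker training job from a laptop with boto3. It assumes credentials are already configured locally (for example through the AWS CLI) and that the IAM role allows SageMaker to read and write the S3 locations. The helper names, account ID, image URI, role ARN, and bucket paths are hypothetical placeholders, not values from any real setup.

```python
from datetime import datetime, timezone


def build_training_job_request(job_name_prefix, image_uri, role_arn,
                               train_s3_uri, output_s3_uri,
                               instance_type="ml.m5.xlarge",
                               instance_count=1,
                               max_runtime_seconds=3600):
    """Build the keyword arguments for SageMaker's create_training_job call.

    This runs entirely on the local machine; nothing is sent to AWS here.
    """
    # Training job names must be unique per account and region,
    # so a UTC timestamp is appended to the prefix.
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return {
        "TrainingJobName": f"{job_name_prefix}-{timestamp}",
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": train_s3_uri,
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
    }


def submit_training_job(request, profile_name=None, region_name=None):
    """Send the request to SageMaker using locally configured credentials."""
    import boto3  # imported here so the request can be built without boto3

    session = boto3.Session(profile_name=profile_name, region_name=region_name)
    sagemaker = session.client("sagemaker")
    response = sagemaker.create_training_job(**request)
    return response["TrainingJobArn"]


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    request = build_training_job_request(
        job_name_prefix="local-dev-demo",
        image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",
        role_arn="arn:aws:iam::123456789012:role/MySageMakerExecutionRole",
        train_s3_uri="s3://my-example-bucket/train/",
        output_s3_uri="s3://my-example-bucket/output/",
    )
    print(request["TrainingJobName"])
    # Uncomment to launch the job in the cloud (this incurs cost):
    # print(submit_training_job(request, region_name="us-east-1"))
```

Keeping the request builder separate from the submit call means the configuration can be inspected and checked on the laptop before any cloud resources are started.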
Advantages - You have full control of your IDE in this scenario. You just have to open up your computer to get started. You can easily manage what's sitting in your S3 bucket versus what's running on your local laptop. You iteratively write a few lines of code in your complex models, check them locally, and only land in the cloud to scale, track, or deploy. This is ideal for frugal super users who thrive on managing virtual environments and software installation versions (more like a Linux systems admin than the average data scientist).
Disadvantages - You cannot scale beyond the compute resources of your laptop, which limits dataset size, software versions and packages, model size, and the number of experiments you can run. You lack access to GUI-centric features like Autopilot, Data Wrangler, and Pipelines. If your laptop dies and you didn't back up externally, your work is gone! Onboarding non-super-user employees becomes more painful over time as software, OS, and hardware versions change. In some worst-case scenarios, it leads to highly valued employees not getting access to Python or Pandas for multiple months!
When to move - While enticing upfront, local development is actually more challenging (and expensive) at scale. This can be scale in terms of datasets, breadth and depth of experimentation, or number of team members. If you find yourself spending a significant portion of your time managing local compute environments, it's time to move to the cloud. This move will free up your team's cycles and resources to focus on your business, not on your infrastructure. As a result, you may find your teams have the bandwidth to take on more projects, deliver projects faster, or dive deeper into the analysis of their datasets.
Reference - AWS white papers