DeepSeek on AWS Bedrock

There is a lot of talk right now about DeepSeek. I am a bit scared about running any sort of model where I don't know where my data is going (this includes free public models like OpenAI's). So I want to experiment with DeepSeek in the safest way possible. For me, that means using a Llama-compatible version in AWS Bedrock.

Why is DeepSeek so exciting?

DeepSeek has created market turmoil in the tech sector and a slew of headlines but what is different about this large language model? There are a few things:

  • DeepSeek R1 is only the latest model from DeepSeek, but it is the first one that is on a par with other leading models. It has exceeded OpenAI's o1 model in some benchmarks
  • The DeepSeek model has been trained at a fraction of the cost and without access to the latest hardware
  • The DeepSeek model is smaller, so it is more efficient for inference
  • DeepSeek is produced in China, which calls into question US dominance of the AI market
  • DeepSeek is a research model. They have open-sourced the model and published a lot of information about the training methodology. They have been very open about what they have done, so you can verify their claims. One of the anomalies of the AI market is that companies like OpenAI are not at all open
  • DeepSeek has, to a large extent, come out of nowhere. They don't have a team of world-leading AI researchers; it is a small company backed by a hedge fund
  • There are concerns around privacy and pro-Chinese propaganda

To be able to comment intelligently on DeepSeek, though, I want to test it. First, let's get the model set up...

Setup

The best way to set up DeepSeek is using a Jupyter notebook in SageMaker. This is far more efficient than downloading the model locally and then uploading it to S3.

The total model size is about 17GB. Most of this is the model weights (split across two files).

Before you start you need a working SageMaker Studio notebook that can download files and upload to S3. Because of the size of the model, make sure you have enough space: the default 5GB file system will not cut it. I set mine to 30GB (probably bigger than I needed).

You will also need an S3 bucket you can upload to.

Step 1 - install dependencies (you may not need this if using SageMaker, but it does not hurt)

!pip install huggingface_hub boto3

Step 2 - Download the model from hugging face

from huggingface_hub import snapshot_download

# Download the model snapshot from Hugging Face into a local directory
model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
local_dir = snapshot_download(repo_id=model_id, local_dir="DeepSeek-R1-Distill-Llama-8B")

Step 3 - Upload to S3

Set the name of your S3 bucket below

import boto3
import os

s3_client = boto3.client('s3', region_name='us-east-1')
local_directory = 'DeepSeek-R1-Distill-Llama-8B'
bucket_name = '...'

# Walk the local model directory and upload every file, keeping the
# directory name as the key prefix so the model does not land at the
# root of the bucket
for root, dirs, files in os.walk(local_directory):
    for file in files:
        local_path = os.path.join(root, file)
        s3_key = local_path
        s3_client.upload_file(local_path, bucket_name, s3_key)

Step 4 - create a custom model import job

  • Make sure you are using a region that supports custom model import, like us-east-1
  • Go to Bedrock > Imported models in the console
  • Click on the Import Model button
  • Give your model a name like 'DeepSeekTest'
  • Use the browse section to find your S3 bucket and the folder containing the model
  • Create an execution role if you have not already got one
  • Click the import model button.
  • Go make a cup of tea (it takes about 10 minutes)

Note: You can't import a model from the root of an S3 bucket. The job will fail after about 6 minutes.

Step 5 - experiment via the playground

In the Imported models section, make sure you are on the Models tab. Click on the name of your model and then click Open in playground.

Note: If you are too quick you will get a red banner about the model not being available yet. Your tea break was not quite long enough. Wait a little while and try again.
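Beyond the playground, you can also call the imported model programmatically via the Bedrock runtime API. Below is a minimal sketch: `model_arn` is a placeholder you copy from the Bedrock console after the import job completes, and the request body follows the Llama-style `prompt`/`max_gen_len`/`temperature` format, which I am assuming the imported Llama distil accepts.

```python
import json

def build_request(prompt, max_gen_len=512, temperature=0.6):
    # Request body in the Llama-style format used by Llama-family models
    return json.dumps({
        "prompt": prompt,
        "max_gen_len": max_gen_len,
        "temperature": temperature,
    })

# Uncomment to invoke the model (model_arn comes from the Bedrock console):
# import boto3
# bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')
# response = bedrock.invoke_model(
#     modelId=model_arn,
#     body=build_request("Who is the best character in Lord of the Rings?"),
# )
# print(json.loads(response['body'].read())['generation'])
```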

One final point - this is using the smaller of the Llama models available. If you change all the 8B references to 70B you will get the larger, more capable model. The only caveat is that it is about 150GB. You will need to increase the maximum space allowed in the SageMaker domain storage settings and then launch a space with around 160GB of storage for the download to work.

Evaluation I

First, it is worth saying there are two Llama-based versions to try. I have chosen the smaller 8B version; I will do further evaluation with the 70B version later.

I have tried DeepSeek with a few different types of query. I have so far been impressed by the answers. Here are some of the things I have been trying:

  • Who is the best character in Lord of the Rings - The model provides a reasoned answer in two parts: a 'think' section and then a final answer. It starts with Bilbo but discusses other characters' roles, how they contribute to the narrative, their personal journeys and levels of heroism. It finally lands on Bilbo.
  • Tell me about Taiwan - I picked this particular query because of some of the internet talk about censorship, misinformation and the spreading of Chinese propaganda. I wanted to see if this was trained into the model or the result of guard rails applied separately. The model is clearly aware of the one-China policy and the disputed status of Taiwan. It does not give a terrible answer, but it is clearly far more pro-Chinese than I would expect from any other LLM. Other queries about Tiananmen Square or China being an open society produce similar responses. There is clearly some additional filtering, or other differences, being applied in the app version.
  • A meaningless typo (I accidentally hit enter halfway through the first word) - I got a very strange reply about finding the best pizza in town. It consists mainly of reasoning and considerations such as cost and delivery options. Overall a pretty well-reasoned response, but not to a question I was asking.
  • What is 84 multiplied by 27 - As well as the correct answer, there is an explanation of how it breaks the sum down into 8 x 27 (then add a zero) and 4 x 27. It then adds the results together.
  • What is "how old are you" in French - It breaks this down and produces two possible translations, one formal and one informal. Sadly, neither is correct.
  • Write a 200-word synopsis of the film Legally Blonde - First, don't blame me; that was a suggestion I was given. Starting with the positives, there is a pretty well-written synopsis with a detailed think section explaining how it selected what to include. However, there is some weirdness. The first is that this was interpreted as a request to write a synopsis of both Legally Blonde 1 and 2. The second is that, while it clearly knows the film, there are some inaccuracies and multiple hallucinations.
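The multiplication breakdown described above is easy to verify. A couple of lines of Python reproduce the model's decomposition:

```python
# The model's decomposition of 84 x 27:
# 80 x 27 (i.e. 8 x 27 with a zero added) plus 4 x 27
partial_80 = 8 * 27 * 10   # 2160
partial_4 = 4 * 27         # 108
result = partial_80 + partial_4
print(result)  # 2268, which matches 84 * 27
```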

In my initial evaluation I have deliberately used very naïve prompts. The goal was to use something that was easy to demonstrate. I have also tried using some of the prompt styles in the DeepSeek documentation on GitHub and experimented with some code generation.

Evaluation II

Based on my experience I decided I really had to try the larger model, as there were a few items where the 8B version had not performed well. I tried again with the 70B version.

  • Who is the best character in Lord of the Rings - I tried this a couple of times. Once, the model got stuck in a loop and hit the maximum response length before finishing. It was also clearly hallucinating, describing 'Peter Jackson' and 'fan fiction' as characters. Other times it gave a well-reasoned response but concluded different characters were the best, with plausible reasoning.
  • Tell me about Taiwan - The response gave me a lot of information about the history of Taiwan. It did mention the one China policy but overall it was actually a quite balanced answer.
  • A meaningless typo - I could not exactly replicate what I did last time, but I tried a few different typos. Each time the results were quite unpredictable: some were about programming, a long list of authors' names or information about Spanish. I also tried a couple in Claude Sonnet 3.5 for comparison. Every time, it said it did not know what I was asking and failed gracefully.
  • What is 84 multiplied by 27 - As before the maths worked well.
  • What is "how old are you" in French - As before, it breaks this down and produces two possible translations, one formal and one informal. This time they were both correct.
  • Write a 200 word synopsis of the film Legally Blonde - This time the summary was more accurate. It only looked at the first Legally Blonde film and wrote a plausible synopsis.

I have also experimented with a few other tasks like summarising or code generation. It does seem to do very well at reasoning tasks rather than 'tell me about X' tasks where it may lack context.

Hopefully over the next few days I will be able to experiment more and do some direct comparisons.

My thoughts on DeepSeek

I have tried both of the Llama distill versions of DeepSeek. So far I have been quite impressed, although there are a few glitches. I have only been looking at the Llama distill versions, which are significantly smaller than the actual R1 model. For their size, though, they are very impressive.

  • Llama 8B distill - about 15GB
  • Llama 70B distill - about 140GB
  • Actual DeepSeek R1 - about 650GB

The Llama 8B distil version is pretty portable at that size!

There are a couple of interesting things they have done that has made this model as performant as it is:

  • The model uses chain-of-thought reasoning, and it is quite interesting that it outputs its reasoning process. The chain-of-thought process gives a better answer
  • They have used a multi-step training technique with a very small cold-start dataset (a couple of orders of magnitude smaller than is typical) and then relied on unsupervised reinforcement learning for most of the training. They then used data from their previous V3 model and synthetic data for a final round of reinforcement learning
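Because the distilled models emit their chain of thought between `<think>` tags before the final answer, it is straightforward to separate the reasoning from the answer in code. A small sketch (the tag format matches the model's output; the example text is made up):

```python
import re

def split_reasoning(text):
    # Separate the <think>...</think> reasoning from the final answer
    match = re.search(r"<think>(.*?)</think>\s*(.*)", text, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return "", text.strip()

thought, answer = split_reasoning(
    "<think>Bilbo starts the quest, but Sam is the loyal one...</think>"
    "Samwise Gamgee."
)
print(answer)  # Samwise Gamgee.
```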

One of the main things is how much they have achieved with a much smaller budget and fewer resources, and over just a couple of months. I think this may have two long-term outcomes. The first is that they have shown it is possible for smaller, more nimble players to enter the market. The second is that LLM development and inference will become more efficient.

As all of this is open source (both the model and training approach) other model providers will be incorporating these techniques into their own model training. I would be surprised if there were not improvements in other models in the next few weeks.

My thoughts on Bedrock model import

Model import is a new Bedrock feature. It is only available in a limited number of regions and supports a few model families (Flan T5, Llama or Mixtral). It is really easy to upload a model from Hugging Face or a model you have customised locally. Overall it's very easy to use, and you can start running compatible models with the benefit of Bedrock's managed infrastructure. It's a great new addition to Bedrock and has made this evaluation of DeepSeek very easy.

One issue that has caused a few problems is the model not being available. If a model is not used for a while, it appears it is no longer cached and has to be retrieved from colder storage. You can catch the exception and wait a while, but this is no use for a service that requires an instant response (like a chatbot). This may change as the feature becomes more widely available.
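The catch-and-wait approach can be sketched as a simple retry loop. This assumes the `ModelNotReadyException` that the `bedrock-runtime` client raises when an imported model has gone cold; the function takes the client as an argument so the retry logic is easy to exercise on its own:

```python
import time

def invoke_with_retry(client, model_arn, body, retries=5, delay=30):
    """Invoke an imported model, retrying while it warms up from cold storage."""
    for _ in range(retries):
        try:
            return client.invoke_model(modelId=model_arn, body=body)
        except client.exceptions.ModelNotReadyException:
            # Model is being restored from cold storage; wait and try again
            time.sleep(delay)
    raise TimeoutError("Model did not become available after retries")

# Usage sketch:
# import boto3
# bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')
# response = invoke_with_retry(bedrock, model_arn, request_body)
```

This is obviously no help for a chatbot, where the user is waiting, but it is fine for batch or background work.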

If you are using custom imported models for any interactive application you really need to use provisioned throughput. Unfortunately this does not appear to be an option for imported models yet. As imported models are a new feature, I expect them to improve over the course of the year. Some of the features I hope to see are:

  • More regional availability
  • Provisioned throughput
  • Improved cold starts
  • More models supported

I am looking forward to other projects where I can use it!

Credits

I have based my work on this AWS community blog, although I have had to make a couple of tweaks to get it to work.

https://community.aws/content/2sECf0xbpgEIaUpAJcwbrSnIGfu/deploying-deepseek-r1-model-on-amazon-bedrock

Also a big thank you to my colleague Tom Carmichael for the idea!
