System design practice: distributed ID generation
The URL shortener problem is all about links. Photo by JJ Ying on Unsplash

System design practice: distributed ID generation

It can be hard to get experience designing complex systems if you don’t previously have that experience. For that reason, I recently started a series on my personal blog on scalability concepts. Today, let’s explore distributed ID generation in the context of a very common systems design question: building a URL shortener.

The problem statement

Build a service where users can input URLs, and the service will return a short URL. Later, that short URL should redirect to the original URL. Common examples of this type of service are bit.ly and TinyURL.

At its core, a URL shortener is a mapping from an arbitrary long URL to a short ID you store in your service. Given a short ID, you need to look up the long URL for redirection. (As part of the “requirements gathering state”, you might decide you want the reverse mapping too, in order to avoid generating two IDs for the same long URL.)

Once the mapping exists, doing the redirection is straightforward, so we’ll look at the ID generation.

Generating a short ID

The important part of the short ID generation is the short ID must be unique. If you give the service a long URL, whatever ID is generated must not be in use for another long URL. For a single-threaded, single server application, the requirement is easily met: generate a random short ID, check if it already exists in the database, then write to the database only if the ID didn’t exist.

The situation becomes more complicated as you handle more traffic. Let’s evaluate the different solutions in the blog post:

  • Moving the uniqueness constraint to a single database. The database can atomically do the test and write in one step, allowing you to scale to multiple threads and even multiple application servers, as long as you have one database. You would catch the constraint violation in your application and generate a new ID.
  • Centralized writes. The write portion doesn’t change if you have only one database handling your writes. However, you have to account for replication delays to your read replicas, and you still can’t scale to multiple write databases. I’ll talk about replication delays in a future post.
  • Centralized ID generation. Certainly a possibility, but because the IDs are not sequential, you may just end up with a database containing all your IDs to back your ID generation service! Probably not a good fit for this application.
  • Pick randomly from a large range. Not a possibility for a URL shortener, because you specifically want your IDs to be small!
  • Encode the partition in the ID. As you start scaling your application geographically, you’ll probably have no choice but to pick this approach, at least across different data centers. As I mentioned in the blog post, in a single data center, you might still use a centralized approach if that makes sense (sufficient randomness in your short IDs for example).
  • Semantic IDs. You could hash the original URL to get your short ID. However, you have to be careful you don’t end up with hash collisions. To prevent this, you would probably end up using one of the above mechanisms to guarantee uniqueness anyway.

The URL shortener problem is a classic, partially because it’s easy to explain, but brings about some interesting scaling challenges if you take it to its limit. A lot of developers will never encounter these challenges, which is why I don’t think this problem allows for adequate evaluation of all candidates. But if you’re looking to bootstrap your scaling knowledge, you’ll be better prepared for big company interviews.

This article was originally published on the Hiring For Tech website. If you want to read more content from me, please subscribe either by email or on LinkedIn. And please reshare with your networks so others can find out about Hiring For Tech!

Max Hodges

CEO at Japan Rabbit and Blackship.com

4 å¹´

>?as long as you have one database. You would catch the constraint violation in your application and generate a new ID. This often isn't as straightforward as it sounds. You can have a single db but with concurrent connections, your id generator can generate the same id in a second call before the first call has finished. There are ways to handle this of course, with locking, and the table itself can absolutely prevent a duplicate, but you may still have to deal with the problem of duplicates being generated due to concurrency.

Jonathan Freeman

Experienced Software Engineering Leader

5 å¹´

Insightful stuff as usual. These problem sets are more common than most new engineers think. A blog or newsletter may only deal with a few hundred or thousand entries over its lifetime. Over the last 10 years, breaking “maxint” has become a big problem for our healthcare company and a design consideration. You’d be surprised how easy it is to get to 2.147 billion of things at scale :) If they chose database in this design example, hopefully they haven’t used an INT for primary/identity key :)

要查看或添加评论,请登录

Avik Das的更多文章

  • Acing the system design interview

    Acing the system design interview

    It’s been a while since I last wrote, but in the last year, I’ve done a lot of system design interviews. I really like…

    4 条评论
  • "It's not peaches and cream either for men"

    "It's not peaches and cream either for men"

    I spend a lot of time talking about men’s mental health because it’s what I, as a man, know about. And like with…

    1 条评论
  • It's okay to not be okay

    It's okay to not be okay

    What I’m about to say applies to everybody, but with Movember and my own experience as a man in mind, I hope my words…

    2 条评论
  • What's still wrong with tech hiring

    What's still wrong with tech hiring

    Last year, I set out with a head full of disconnected thoughts about hiring and a vision to share those thoughts with a…

    15 条评论
  • One size does not fit all

    One size does not fit all

    I’ve talked about what seem to be two conflicting topics: standardizing your interviews and accommodating different…

    5 条评论
  • Formal interview training

    Formal interview training

    A running theme in this newsletter is the idea that good software engineers don’t automatically make good interviewers.…

    1 条评论
  • Interview apprenticeship

    Interview apprenticeship

    Software engineers are well-positioned to evaluate a candidate’s technical ability, but conducting an interviews is…

    6 条评论
  • Interviewing and pattern matching

    Interviewing and pattern matching

    For candidates, a full day of interviews is grueling, but in the context of demonstrating your technical skills and how…

    3 条评论
  • Technical skills every software engineer interviewer should have

    Technical skills every software engineer interviewer should have

    There’s a lot of discussion about technical skills candidates need to have, like algorithms, systems design, technical…

  • Prepare your story

    Prepare your story

    If you’re planning on starting or continuing your job hunt this year, the beginning of the year is a good time to…

    3 条评论

社区洞察

其他会员也浏览了