System design practice: distributed ID generation

Avik Das

Engineering Leader @ LinkedIn

Published: March 23, 2020

It can be hard to get experience designing complex systems if you don't already have that experience. For that reason, I recently started a series on my personal blog on scalability concepts. Today, let's explore distributed ID generation in the context of a very common systems design question: building a URL shortener.

The problem statement

Build a service where users can input URLs, and the service will return a short URL. Later, that short URL should redirect to the original URL. Common examples of this type of service are bit.ly and TinyURL.

At its core, a URL shortener is a mapping from an arbitrary long URL to a short ID you store in your service. Given a short ID, you need to look up the long URL for redirection. (As part of the "requirements gathering stage", you might decide you want the reverse mapping too, in order to avoid generating two IDs for the same long URL.)
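To make the two mappings concrete, here is a minimal in-memory sketch. The class name `UrlMapping` and its methods are hypothetical, and a real service would keep this data in a database rather than in Python dictionaries.

```python
from typing import Optional


class UrlMapping:
    """In-memory stand-in for the service's storage (hypothetical names)."""

    def __init__(self) -> None:
        self._long_by_short = {}  # short ID -> long URL, used for redirection
        self._short_by_long = {}  # long URL -> short ID, the optional reverse mapping

    def add(self, short_id: str, long_url: str) -> str:
        """Store the pair and return the short ID that ends up in use."""
        existing = self._short_by_long.get(long_url)
        if existing is not None:
            # The reverse mapping avoids a second ID for the same long URL.
            return existing
        if short_id in self._long_by_short:
            raise KeyError(f"short ID already in use: {short_id}")
        self._long_by_short[short_id] = long_url
        self._short_by_long[long_url] = short_id
        return short_id

    def resolve(self, short_id: str) -> Optional[str]:
        """Return the long URL to redirect to, or None if the ID is unknown."""
        return self._long_by_short.get(short_id)
```

With this sketch, adding the same long URL twice returns the first short ID both times, and `resolve` is the only lookup the redirect path needs.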

Once the mapping exists, doing the redirection is straightforward, so we'll focus on the ID generation.

Generating a short ID

The key requirement for short ID generation is uniqueness: whatever ID is generated for a long URL must not already be in use for another long URL. For a single-threaded, single-server application, the requirement is easily met: generate a random short ID, check whether it already exists in the database, and write to the database only if it doesn't.
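The generate, check, then write loop can be sketched as follows. The 62-character alphabet, the 7-character length, and the use of a plain dictionary in place of the database are all assumptions made for illustration.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters (assumed alphabet)
ID_LENGTH = 7  # assumed length; 62**7 is roughly 3.5 trillion possible IDs


def random_short_id(length: int = ID_LENGTH) -> str:
    """Return a random ID drawn from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def shorten(long_url: str, store: dict) -> str:
    """Generate, check, then write. Only safe with one thread and one server."""
    while True:
        short_id = random_short_id()
        if short_id not in store:        # check
            store[short_id] = long_url   # write
            return short_id
        # Collision: loop around and try a fresh random ID.
```

The gap between the check and the write is harmless here only because nothing else can touch `store` in between, which is exactly the assumption that breaks once more threads or servers are involved.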

The situation becomes more complicated as you handle more traffic. Let's evaluate the different solutions from the blog post:

Moving the uniqueness constraint to a single database. The database can atomically do the test and write in one step, allowing you to scale to multiple threads and even multiple application servers, as long as you have one database. You would catch the constraint violation in your application and generate a new ID.
Centralized writes. The write portion doesn't change if you have only one database handling your writes. However, you have to account for replication delays to your read replicas, and you still can't scale to multiple write databases. I'll talk about replication delays in a future post.
Centralized ID generation. Certainly a possibility, but because the IDs are not sequential, you may just end up with a database containing all your IDs to back your ID generation service! Probably not a good fit for this application.
Pick randomly from a large range. Not a possibility for a URL shortener, because you specifically want your IDs to be small!
Encode the partition in the ID. As you start scaling your application geographically, you'll probably have no choice but to pick this approach, at least across different data centers. As I mentioned in the blog post, in a single data center, you might still use a centralized approach if that makes sense (sufficient randomness in your short IDs for example).
Semantic IDs. You could hash the original URL to get your short ID. However, you have to be careful you don't end up with hash collisions. To prevent this, you would probably end up using one of the above mechanisms to guarantee uniqueness anyway.
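As an illustration of the first option in the list, moving the uniqueness constraint into a single database, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `urls`, the column names, the retry limit, and the `new_id` hook are all assumptions for the example; a production service would use its own database and schema.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits  # assumed 62-character alphabet


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (hypothetical) urls table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS urls ("
        " short_id TEXT PRIMARY KEY,"  # the database enforces uniqueness here
        " long_url TEXT NOT NULL)"
    )
    return conn


def shorten(conn, long_url, id_length=7, max_attempts=5, new_id=None):
    """Insert with a fresh ID; on a constraint violation, generate a new ID and retry."""
    for _ in range(max_attempts):
        if new_id is not None:
            short_id = new_id()  # injectable generator, handy for forcing collisions
        else:
            short_id = "".join(secrets.choice(ALPHABET) for _ in range(id_length))
        try:
            with conn:  # commits on success, rolls back if the insert fails
                conn.execute(
                    "INSERT INTO urls (short_id, long_url) VALUES (?, ?)",
                    (short_id, long_url),
                )
            return short_id
        except sqlite3.IntegrityError:
            continue  # duplicate short_id: the database rejected it, so try again
    raise RuntimeError("could not find an unused short ID")
```

There is no separate existence check in this version. The insert itself is the test, so two callers that happen to generate the same ID cannot both succeed, and the loser simply retries with a new ID.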

The URL shortener problem is a classic, partly because it's easy to explain yet brings about some interesting scaling challenges if you take it to its limit. A lot of developers will never encounter these challenges, which is why I don't think this problem allows for adequate evaluation of all candidates. But if you're looking to bootstrap your scaling knowledge, working through it will leave you better prepared for big company interviews.

This article was originally published on the Hiring For Tech website. If you want to read more content from me, please subscribe either by email or on LinkedIn. And please reshare with your networks so others can find out about Hiring For Tech!

Max Hodges

CEO at Japan Rabbit and Blackship.com

4 years ago

"...as long as you have one database. You would catch the constraint violation in your application and generate a new ID." This often isn't as straightforward as it sounds. You can have a single db, but with concurrent connections, your ID generator can generate the same ID in a second call before the first call has finished. There are ways to handle this of course, with locking, and the table itself can absolutely prevent a duplicate, but you may still have to deal with the problem of duplicates being generated due to concurrency.

Jonathan Freeman

Experienced Software Engineering Leader

5 years ago

Insightful stuff as usual. These problem sets are more common than most new engineers think. A blog or newsletter may only deal with a few hundred or thousand entries over its lifetime. Over the last 10 years, breaking "maxint" has become a big problem for our healthcare company and a design consideration. You'd be surprised how easy it is to get to 2.147 billion things at scale :) If they chose a database in this design example, hopefully they haven't used an INT for the primary/identity key :)
