System Design for URL Shortener

Pre-check:-

  1. Long URL - 2 KB (2048 chars)
  2. Short URL - 17 bytes (17 chars)
  3. Created-at timestamp - 7 bytes
  4. Expiry timestamp - 7 bytes

Total:- ~2.03 KB per record

If we assume 30M new URLs/month, at ~2.03 KB per record that is ~61 GB/month ==> ~0.73 TB/year ==> ~3.65 TB of data over 5 years.
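A quick back-of-the-envelope check of that arithmetic, using the same assumptions as above:

```python
# Back-of-the-envelope check of the storage estimate above.
record_kb = (2048 + 17 + 7 + 7) / 1024      # ≈ 2.03 KB per record
urls_per_month = 30_000_000

gb_per_month = urls_per_month * record_kb / 1e6
print(round(gb_per_month, 1))                # ≈ 60.9 GB/month
print(round(gb_per_month * 12 / 1e3, 2))     # ≈ 0.73 TB/year
print(round(gb_per_month * 60 / 1e3, 2))     # ≈ 3.65 TB over 5 years
```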

Note- URLs are typically kept only for a short duration, hence the expiry field.

The main idea is to generate a 7-character ID, e.g., www.shortto.com/ABC12qb

Approach 1:

Using Base 62 Algorithm:

Why Base 62? -- 62^7 ≈ 3.5 trillion possible IDs, whereas 7 decimal digits give only 10^7 = 10 million.
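A quick sanity check of those capacities:

```python
# 7 characters in base 62 vs. 7 digits in base 10.
print(62**7)   # 3521614606208  ≈ 3.5 trillion codes
print(10**7)   # 10000000       =  10 million codes
```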

Base 62 lets us use the full alphanumeric character set mentioned below.

  • A-Z - 26
  • a-z - 26
  • 0-9 - 10

Worried about the Base 62 conversion code?

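Here is a minimal sketch of the conversion in Python; it is an illustrative implementation, and the alphabet ordering is an assumption:

```python
# Minimal sketch: convert an integer ID to/from Base 62.
# The alphabet ordering (digits, upper case, lower case) is an assumption.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)  # 62

def encode_base62(num: int) -> str:
    """Return the Base 62 string for a non-negative integer."""
    if num == 0:
        return ALPHABET[0]
    digits = []
    while num > 0:
        num, rem = divmod(num, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def decode_base62(s: str) -> int:
    """Invert encode_base62."""
    num = 0
    for ch in s:
        num = num * BASE + ALPHABET.index(ch)
    return num

# e.g. encode_base62(125) == "21" and decode_base62("21") == 125
```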

Deciding which DB to use:-

RDBMS-

  1. ACID transactions (strong consistency)

NoSQL-

  1. Eventual consistency
  2. Highly available
  3. Horizontally scalable

Suppose we have n app servers A1, A2, ..., An. A long URL comes in as input to one of these app servers, which generates a random number of length 7, converts it to a Base 62 code, and stores the (short code, long URL) key-value pair.

All of this data is stored in the DB, and the short URL is returned to the user.

Worst case: what if multiple long URLs end up with the same short URL (a collision)?

Solution- Trigger the DB write only if the key is absent (an atomic conditional insert). An RDBMS can enforce this with a unique constraint, whereas many NoSQL stores do not offer this option.
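A hedged sketch of this approach, where `db.put_if_absent` stands in for whatever atomic conditional write the chosen store provides (in SQL, an INSERT guarded by a unique constraint):

```python
import secrets

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def allocate_random_code(db, long_url: str, max_retries: int = 5) -> str:
    """Approach 1 sketch: random 7-char code, retried on collision."""
    for _ in range(max_retries):
        code = "".join(secrets.choice(ALPHABET) for _ in range(7))
        # put_if_absent is a hypothetical atomic "insert only if the key
        # is free"; it must return False when the code is already taken.
        if db.put_if_absent(code, long_url):
            return code
    raise RuntimeError("could not allocate a unique short code")
```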

For high scalability, we need to find an even better approach.

Approach 2:

MD5 Hashing Algorithm:

The MD5 algorithm produces a 128-bit digest, written as 32 hex characters. In this case, we hash the long URL and pick only the first 7 characters. If two long URLs share those first 7 characters, we get a collision.
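A minimal sketch of the truncation, using Python's standard hashlib:

```python
import hashlib

def md5_short_code(long_url: str) -> str:
    """Approach 2 sketch: first 7 hex chars of the MD5 digest.
    Different URLs can share this prefix, so collisions must be handled."""
    return hashlib.md5(long_url.encode("utf-8")).hexdigest()[:7]
```

Note also that 7 hex characters give only 16^7 ≈ 268 million codes, far fewer than Base 62's 3.5 trillion, which makes collisions even more likely.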

Approach 3:

Counter-Based Approach:

Maintain a central counter server that is incremented whenever an app server creates a short URL; the counter value is then Base 62 encoded to produce the code.
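A minimal sketch of the idea, assuming a single in-process counter (a real deployment would use a separate counter service) and reusing encode_base62 from the Base 62 sketch above:

```python
from itertools import count

_counter = count(start=1)  # stand-in for a central counter service

def next_short_code() -> str:
    # Strictly increasing IDs cannot collide; encode_base62 is defined
    # in the earlier Base 62 sketch.
    return encode_base62(next(_counter))
```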

What if the counter server goes down? We could add multiple counters, one for every one million IDs, but what if a counter reaches its maximum limit?

We would need to maintain some admin server to keep track of all these counters, which finally increases the cost.

So, in the end, this approach also falls apart.

Final Approach:

ZooKeeper service:-

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. 

Features-

  1. Distributed configuration management
  2. Leader election / consensus building
  3. Coordination and locks
  4. Key-value store
  1. Each node is called a zNode.
  2. Each zNode has a path.
  3. A zNode can be persistent or ephemeral.
  4. Each zNode can store data.
  5. Each zNode can be watched for changes.
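As a sketch, here is how those zNode operations look with kazoo, one common Python ZooKeeper client; the paths and data are illustrative assumptions:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

# A persistent zNode with a path, storing data.
zk.ensure_path("/url-shortener")
zk.create("/url-shortener/range-0", b"0-999999")

# An ephemeral zNode: it disappears automatically if this session dies.
zk.create("/url-shortener/server-A1", b"alive", ephemeral=True)

# Watch a zNode for changes.
@zk.DataWatch("/url-shortener/range-0")
def on_change(data, stat):
    print("range-0 changed:", data)

zk.stop()
```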

User requests from the client side go through a load balancer and the respective services before hitting the database.

ZooKeeper maintains the mapping of counter ranges to servers, handing out ranges of 1 million IDs at a time; with 62^7 ≈ 3.5 trillion possible codes, there is plenty of free space.

As soon as servers are added, ZooKeeper allocates a range to each of them. At 1 million codes per range, the 3.5 trillion code space gives roughly 3.5 million ordered ranges.

We can also maintain a cache for faster retrieval of data. The final key-value pairs are stored in the DB.

The service exposes two main functions:

  1. createShortUrl(longUrl)
  2. getLongUrl(shortUrl)

Note- Redis can be used for caching.
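Putting the pieces together, here is a hedged end-to-end sketch: it assumes this app server has already been leased the counter range [start, end) from ZooKeeper, reuses encode_base62 from the earlier sketch, and treats `db` and `cache` (e.g. Redis) as hypothetical key-value interfaces:

```python
class ShortUrlService:
    def __init__(self, db, cache, start: int, end: int):
        self.db, self.cache = db, cache
        self.next_id, self.end = start, end   # range leased via ZooKeeper

    def create_short_url(self, long_url: str) -> str:
        if self.next_id >= self.end:
            raise RuntimeError("range exhausted; lease a new one from ZooKeeper")
        code = encode_base62(self.next_id)    # IDs are unique, so no collisions
        self.next_id += 1
        self.db.put(code, long_url)
        return code

    def get_long_url(self, code: str) -> str:
        long_url = self.cache.get(code)       # e.g. Redis GET
        if long_url is None:
            long_url = self.db.get(code)      # cache miss: fall back to DB
            self.cache.set(code, long_url)    # e.g. Redis SET
        return long_url
```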

If some server eventually goes down, only that particular range is affected. Since ZooKeeper monitors active and failed servers, it removes ranges that are no longer needed.

Irrespective of the DB used, there is no collision issue, and the design gives us eventual consistency.


