System Design for URL Shortener
Rahul Arram
Data Engineering @Tesco | 6+ YOE | Polyglot Engineer | Ex- Wayfair, Turtlemint, Manhattan Associates, Hopscotch | Spark, Kafka, Airflow, Python, Superset
Pre-check:
- Long URL - 2 KB (2,048 chars)
- Short URL - 17 bytes (17 chars)
- Created-at timestamp - 7 bytes
- Expiry timestamp - 7 bytes
Total: ~2.03 KB per record.
If we assume 30M users/month (one URL each), that is at most ~60.9 GB/month ==> ~0.73 TB/year ==> ~3.65 TB of data over 5 years.
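A quick back-of-envelope check of these numbers (decimal units, matching the estimate above):

```python
# Back-of-envelope storage check: 30M records/month at ~2.03 KB each
record_kb = 2 + (17 + 7 + 7) / 1000              # long URL + short URL + two timestamps
monthly_gb = 30_000_000 * record_kb / 1_000_000  # ~60.9 GB/month
print(f"{monthly_gb:.1f} GB/month")
print(f"{monthly_gb * 12 / 1000:.2f} TB/year")          # ~0.73 TB/year
print(f"{monthly_gb * 60 / 1000:.2f} TB over 5 years")  # ~3.65 TB
```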
Note: URLs will be created for a short duration only, so expired records can be cleaned up.
The main idea is to create a 7-character ID, e.g., www.shortto.com/ABC12qb.
Approach 1:
Using the Base-62 Algorithm:
Why base 62? 62^7 ≈ 3.5 trillion possible IDs, whereas 7 decimal digits (10^7) give only 10 million.
The 62-character set is built from:
- A-Z - 26
- a-z - 26
- 0-9 - 10
Worried about the base-62 conversion code? It is only a few lines.
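Here is a minimal sketch in Python; the alphabet ordering and function names are assumptions, since any fixed ordering works:

```python
import string

# Base-62 alphabet: 0-9, a-z, A-Z (the ordering is a convention; any fixed one works)
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62

def encode_base62(num: int) -> str:
    """Convert a non-negative integer to its base-62 string."""
    if num == 0:
        return ALPHABET[0]
    chars = []
    while num > 0:
        num, rem = divmod(num, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

def decode_base62(s: str) -> int:
    """Convert a base-62 string back to the integer it encodes."""
    num = 0
    for ch in s:
        num = num * BASE + ALPHABET.index(ch)
    return num

# Round-trip example
assert decode_base62(encode_base62(3_000_000)) == 3_000_000
```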
Deciding which DB to use:
RDBMS:
- ACID transactions
NoSQL:
- Eventual consistency
- Highly available
- Scalable
Suppose we have n app servers A1, A2, ..., An. A long URL arrives at one of these servers, which generates a random 7-character base-62 ID and pairs it with the long URL as a key-value record.
These records are stored in the DB, and the short URL is returned.
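A minimal sketch of that random-ID step, assuming the same 62-character alphabet as above:

```python
import secrets
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def random_short_id(length: int = 7) -> str:
    # secrets gives unbiased, cryptographically strong choices;
    # collisions are still possible and must be handled at the DB layer
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(random_short_id())  # e.g. 'x9Bq2Lm'
```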
What if multiple long URLs generate the same short URL? That is the worst case: a collision.
Solution: use a conditional insert (write only if the key is absent) so the DB rejects duplicates; many NoSQL stores do not offer this kind of atomic check.
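As a sketch of "insert only if absent", here using SQLite (the table and column names are assumptions):

```python
import sqlite3

conn = sqlite3.connect("urls.db")
conn.execute("CREATE TABLE IF NOT EXISTS urls (short_id TEXT PRIMARY KEY, long_url TEXT)")

def save_if_absent(short_id: str, long_url: str) -> bool:
    # INSERT OR IGNORE is atomic: if short_id already exists, the row is skipped
    cur = conn.execute(
        "INSERT OR IGNORE INTO urls (short_id, long_url) VALUES (?, ?)",
        (short_id, long_url),
    )
    conn.commit()
    return cur.rowcount == 1  # True if we claimed the key, False on collision
```

On a False return, the app server simply generates a fresh random ID and retries.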
For high scalability, we need to find an even better approach.
Approach 2:
MD5 Hashing Algorithm:
The MD5 algorithm produces a 128-bit digest, written as 32 hex characters. We pick only the first 7 characters as the short ID, so two different long URLs can end up with the same prefix: there is a real chance of collision.
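A minimal sketch of this truncated-hash approach:

```python
import hashlib

def md5_short_id(long_url: str, length: int = 7) -> str:
    # MD5 yields a 128-bit digest (32 hex chars); keeping only a 7-char
    # prefix shrinks the ID space, so a collision check is still required
    digest = hashlib.md5(long_url.encode("utf-8")).hexdigest()
    return digest[:length]

print(md5_short_id("https://example.com/some/very/long/path"))  # 7 hex chars
```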
Approach 3:
Counter-Based Approach:
Maintain a counter server that hands out the next sequence number whenever an app server asks for one.
What if the counter server goes down? We could shard it into multiple counters, one per block of one million IDs, but then what happens when a counter hits its maximum?
We would need an admin server to keep track of all these counters, which finally increases the cost.
So this approach ultimately leaves us in a mess as well.
Final Approach:
ZooKeeper service:
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications.
Features (a short code sketch follows this list):
- Distributed Configuration Management
- Leader election / consensus building
- Coordination and locks
- Key-value store.
- Each Node is called a zNode.
- Each zNode has a path.
- A zNode can be persistent or ephemeral
- Each zNode can store data
- Each zNode can be watched for changes
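A sketch of these features using kazoo, a Python ZooKeeper client; the paths and payload are illustrative assumptions:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()
zk.ensure_path("/shortener/ranges")

# Ephemeral + sequential: the zNode disappears if this server dies,
# and ZooKeeper appends a monotonically increasing suffix to the path
path = zk.create(
    "/shortener/ranges/server-",
    b"3000000-3999999",  # the 1M ID range this server claims
    ephemeral=True,
    sequence=True,
)
print("registered at", path)

# Any client can watch a zNode for data changes
@zk.DataWatch(path)
def on_change(data, stat):
    print("range node updated:", data)

zk.stop()  # a real server keeps the session open; stopping deletes the ephemeral node
```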
User requests from the client side go through a load balancer to the respective services, which in turn hit the database.
ZooKeeper hands out the ID space in ranges of one million; since 62^7 ≈ 3.5 trillion IDs are available, there are roughly 3.5 million such ordered buckets to allocate.
As soon as a server is added, ZooKeeper allocates an unused range to it.
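A hypothetical sketch of how an app server mints IDs inside its allocated range (the range boundaries are assumptions; encode_base62 is repeated from the earlier sketch for completeness):

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode_base62(num: int) -> str:
    out = []
    while num:
        num, rem = divmod(num, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out)) or ALPHABET[0]

class RangeAllocator:
    """Mints collision-free IDs inside a range handed out by ZooKeeper."""
    def __init__(self, start: int, end: int):
        self.next_id, self.end = start, end

    def next_short_id(self) -> str:
        if self.next_id >= self.end:
            raise RuntimeError("range exhausted: request a fresh range from ZooKeeper")
        short_id = encode_base62(self.next_id)
        self.next_id += 1
        return short_id

alloc = RangeAllocator(3_000_000, 4_000_000)
print(alloc.next_short_id())  # 'cAr6' (a production system would pad to 7 chars)
```

Because every server draws from a disjoint range, no two servers can ever mint the same ID.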
We can also maintain a cache for faster retrieval of data. The final key-value pairs are stored in the DB.
The service exposes two functions:
- CreateLongUrl()
- GetshortUrl()
Note: Redis can be used for caching.
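A minimal cache-aside sketch with the redis-py client (the host, TTL, and db_lookup callback are assumptions):

```python
import redis  # redis-py client; assumes a Redis instance on localhost

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_long_url(short_id: str, db_lookup) -> str | None:
    # Cache-aside: check Redis first, fall back to the DB, then populate the cache
    long_url = r.get(short_id)
    if long_url is None:
        long_url = db_lookup(short_id)  # hypothetical DB fetch, e.g. a SELECT by key
        if long_url is not None:
            r.set(short_id, long_url, ex=24 * 3600)  # TTL, since URLs are short-lived
    return long_url
```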
If some server eventually goes down, only its particular range is affected. Since ZooKeeper monitors which servers are alive (their ephemeral zNodes vanish on failure), it can reclaim the range that is no longer needed.
Because every server mints IDs from a disjoint range, there is no collision issue irrespective of the DB used, and the system still gives us eventual consistency.