When Publicity Gets in the Way of Scalability: Dreamport Case

There’s no such thing as bad publicity, right? For apps, though, unplanned publicity can make it hard to ensure the system can receive and process the influx of new users. When several Indian YouTubers independently listed Dreamport as a great way to start an individual travel business from home, we were not ready for the million-view buzz it generated. Here’s what we learned from the experience and how we handled it.

But first – what is Dreamport?

Dreamport is an online platform, first launched in India and Uzbekistan in 2023, that allows anyone around the globe to earn as an Independent Travel Manager (ITM) from the location of their choice. It features free and automated engagement, assessment, training, examination, and onboarding operations.

Dreamport screenshot

Thanks to this high level of automation, we’ve been able to scale the number of our ITMs to over 2000, all of whom enjoy the benefits of working from anywhere, free training in travel, sales, and communication, as well as flexible hours and performance-based income.

Dreamport application process

So, what happened?

During Dreamport’s launch, we used pre-existing user login and applicant profile management systems as ready-made in-house solutions for user management and authentication. However, these legacy systems were originally built for internal user needs only, and their capabilities could not keep up with the scaling requirements as the app’s popularity grew.

HRM system view

Then, unbeknownst to us, several content creators found Dreamport’s publicity materials and started featuring them and mentioning Dreamport in their videos about making money from home. Not only that – they started posting tutorials on how to speed-run the entire Dreamport training and assessment process.

Youtube screenshot
Youtube screenshot

One after another, these videos popped up on YouTube and other social media, reaching more than a million views combined! Had it been something we had planned, we would have been better prepared; however, the sudden load caused by the videos going viral was too much for our systems to handle. After the YouTube videos, we experienced an eightfold increase in registrations per week. What's interesting, though, is that the people coming from the YouTube videos were on average 20-30% more active in progressing through our automated steps, meaning a higher percentage of registrations made it through to contracting.

Youtube screenshot

Why does it matter?

The biggest challenge these videos revealed for us was scaling: Dyninno Group plans to launch Dreamport in several new markets in 2024, increasing the expected load on the system fivefold. It was clear we could not proceed the way we had started. With the videos, we ran into a three-pronged problem that we solved with a three-pronged approach.

Limitations:

  1. All registrations went through our in-house-built authentication system, which can only support limited loads.
  2. In times of great demand, account creation requests had to be queued.
  3. This, in turn, led to delayed credentials for applicants.

Solution:

  1. Social Login: With social login, applicants can sign up directly through their social (Facebook and Google) accounts. Only applicants who have passed the assessment receive an account in our in-house legacy system. After signing up, applicants can move directly to their personal space without waiting for credential emails.
  2. Auto Deactivation: System user accounts are terminated for applicants who are not progressing or are stuck at a stage of the training or assessment system.
  3. Multiple Tech Initiatives.

Dreamport login page

What did we do on the tech side?

We deployed multiple Redis queues to manage the load and keep our systems running smoothly. The queues helped us separate the load for Dreamport’s critical functions: user creation, user deletion, and training account creation. We also added a specific delay to each request to give our systems more time to handle it. This kept us from being overwhelmed with requests without hurting the user experience.
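As a rough illustration of the queue separation and per-request delays described above, here is a minimal in-memory sketch in TypeScript. It is not the production code: the real queues are backed by Redis, while the `DelayedQueue` class, the queue names, and the delay values below are hypothetical stand-ins chosen for the example. Time is passed in explicitly so the behaviour is easy to follow.

```typescript
// In-memory stand-in for one Redis-backed queue (illustrative assumption).
interface DelayedJob<T> {
  payload: T;
  readyAt: number; // timestamp in ms from which the job may be processed
}

class DelayedQueue<T> {
  private jobs: DelayedJob<T>[] = [];

  constructor(readonly name: string, private readonly delayMs: number) {}

  // Every request is held back by this queue's fixed delay.
  enqueue(payload: T, nowMs: number): void {
    this.jobs.push({ payload, readyAt: nowMs + this.delayMs });
  }

  // Remove and return the payloads whose delay has elapsed, in arrival order.
  takeReady(nowMs: number): T[] {
    const ready = this.jobs.filter((job) => job.readyAt <= nowMs);
    this.jobs = this.jobs.filter((job) => job.readyAt > nowMs);
    return ready.map((job) => job.payload);
  }

  get pending(): number {
    return this.jobs.length;
  }
}

// One queue per critical function, so a spike in one does not block the others.
// The delay values are made up for this example.
const queues = {
  userCreation: new DelayedQueue<string>("user-creation", 2000),
  userDeletion: new DelayedQueue<string>("user-deletion", 5000),
  trainingAccount: new DelayedQueue<string>("training-account", 3000),
};

queues.userCreation.enqueue("applicant-42", 0);
queues.userCreation.takeReady(1000); // nothing is ready yet
queues.userCreation.takeReady(2000); // now returns ["applicant-42"]
```

A worker would call `takeReady` on a timer and hand the returned jobs to the downstream system, which is what smooths the spikes.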

In addition to that, we implemented an interesting data structure called a Bloom filter to minimize the impact on our in-house legacy system. Until recently, we depended on that system for user creation and authentication. We needed to relieve the load on it as much as possible, because anything going wrong there could have bigger group-wide consequences, and we didn’t want to take our chances.

Bloom filter

A Bloom filter is a probabilistic data structure based on hashing, and it is extremely space efficient. It is used to test whether an element is part of a set. When you initialize a Bloom filter, you allocate an array of bits, all set to zero, and choose several hash functions. For every value you add, the filter computes each hash function and sets the bit at each resulting position. To test a value, it checks the same positions: if any of those bits is zero, the value is definitely not in the set; if all of them are set, the value is probably in the set, with a small chance of a false positive. We store our in-house legacy system's IDs in Bloom filters, and, based on whether an ID is in the set or not, we decide whether to make the call to the database and the system. Its ability to reduce the request load coming into our in-house system was nothing short of remarkable. We found it extremely fast and storage efficient compared to other possible stores such as in-memory databases and in-memory caches. It also enabled us to reduce the load on the legacy system by 50-60% or more.
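To make the mechanics concrete, here is a minimal Bloom filter sketch in TypeScript. The bit-array size, the number of hash functions, the FNV-1a-style double hashing, and the `shouldCallLegacySystem` helper are all assumptions made for illustration; the article does not state which implementation or parameters are used in production.

```typescript
class BloomFilter {
  private readonly bits: Uint8Array;

  constructor(
    private readonly sizeInBits: number,
    private readonly hashCount: number,
  ) {
    // All bits start at zero.
    this.bits = new Uint8Array(Math.ceil(sizeInBits / 8));
  }

  // Seeded 32-bit FNV-1a-style hash (an illustrative choice of hash function).
  private static hash(value: string, seed: number): number {
    let h = (0x811c9dc5 ^ seed) >>> 0;
    for (let i = 0; i < value.length; i++) {
      h ^= value.charCodeAt(i);
      h = Math.imul(h, 0x01000193) >>> 0;
    }
    return h;
  }

  // Derive hashCount bit positions via double hashing: (h1 + i * h2) mod m.
  private positions(value: string): number[] {
    const h1 = BloomFilter.hash(value, 0);
    // Forced odd so the step is never zero.
    const h2 = (BloomFilter.hash(value, 0x9e3779b9) | 1) >>> 0;
    const result: number[] = [];
    for (let i = 0; i < this.hashCount; i++) {
      result.push((h1 + i * h2) % this.sizeInBits);
    }
    return result;
  }

  // Set the bit at every position derived from the value.
  add(value: string): void {
    for (const pos of this.positions(value)) {
      this.bits[pos >> 3] |= 1 << (pos & 7);
    }
  }

  // false: definitely not in the set. true: probably in the set.
  mightContain(value: string): boolean {
    return this.positions(value).every(
      (pos) => (this.bits[pos >> 3] & (1 << (pos & 7))) !== 0,
    );
  }
}

// Hypothetical usage: the filter holds IDs known to the legacy system.
const knownLegacyIds = new BloomFilter(8192, 4);
knownLegacyIds.add("legacy-user-1001");

function shouldCallLegacySystem(id: string): boolean {
  // false => skip the legacy call; true => the ID may exist, so verify it there.
  return knownLegacyIds.mightContain(id);
}
```

A `false` answer is definitive, so those requests never need to reach the legacy system. A `true` answer can be a false positive, so it still has to be confirmed by the real lookup.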

Monitoring - the cornerstone of prevention

When your product or service is public-facing, monitoring your system is extremely important: it not only shows you the current status but also helps you foresee problems and prevent them.

So we send all our critical system metrics through Prometheus to Grafana for real-time monitoring, which also helps us identify areas for improvement and debug any potential issues. We also use Sentry to monitor our cron jobs for duration, error occurrence, etc. In addition, we have configured alerts for each cron job so that we are notified whenever a run fails or is skipped and can take immediate action if required.

No cheating

Since the image of our group is at stake, we must make sure that the people contacting clients in our name are up to par. Naturally, the YouTube videos on how to pass the process quickly undermined the level of quality we are aiming for. So we implemented third-party solutions such as TestGorilla and Mindtickle in our assessment and training phases, and integrated AI into the certification and deactivation processes, based on the individual’s performance.

Futureproofing our systems

We aim to maximize request throughput, ensuring the system can handle as many requests as possible. To achieve this, several infrastructure-level improvements have been implemented, such as increasing the number of available pods to manage the load, transitioning Redis to persistent storage, and upgrading the database instance types. At the application level, various caching mechanisms have been introduced to minimize effort and time spent on repeated tasks. This allows many requests to be served directly from the cache, which is significantly faster than accessing the database or calling third-party APIs. Additionally, we've optimized inter-service communication by using Redis streams, enhancing reliability and reducing errors.
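The application-level caching mentioned above can be pictured as a cache-aside helper with a time-to-live. The following TypeScript sketch is an assumption about the general pattern, not a description of the actual implementation; `TtlCache`, `getOrLoad`, the key, and the TTL value are hypothetical.

```typescript
// Cache-aside with expiry: serve from memory while fresh, otherwise reload.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(private readonly ttlMs: number) {}

  getOrLoad(key: string, load: () => V, nowMs: number = Date.now()): V {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > nowMs) {
      return hit.value; // fresh entry: no database or third-party call
    }
    const value = load(); // miss or expired: do the expensive work once
    this.entries.set(key, { value, expiresAt: nowMs + this.ttlMs });
    return value;
  }
}

// Hypothetical usage: cache a slow lookup for 60 seconds.
const profileCache = new TtlCache<string>(60_000);
const profile = profileCache.getOrLoad("applicant-42", () => "loaded from database");
```

The explicit `nowMs` parameter exists only to make expiry easy to reason about and test; callers would normally omit it.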

Another step towards serving more user requests and providing a smoother user experience with minimal latency involves offloading heavy tasks and various background jobs to different services, such as back-office and Mindtickle internal services for reminders, rejections, evaluations, etc. This frees up resources on user-facing services for handling user requests. We also collaborated with third-party vendors to address the limitations of their existing APIs in handling sudden load surges. Together, we developed alternative solutions that process incoming user data in batches, which is a more efficient strategy than processing all data at once.
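Batching itself is simple to sketch. The helper below splits incoming records into fixed-size batches in TypeScript; the function name `toBatches` and the batch size are illustrative assumptions, and the actual vendor integrations are not shown.

```typescript
// Split a list of records into consecutive batches of at most batchSize items.
function toBatches<T>(items: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Hypothetical usage: five applicant IDs sent two at a time.
const applicantBatches = toBatches(["a1", "a2", "a3", "a4", "a5"], 2);
// applicantBatches is [["a1", "a2"], ["a3", "a4"], ["a5"]]
```

Each batch can then be sent to the downstream API as one call, which caps how much work any single request carries during a surge.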

What's next for Dreamport?

Now that we have resolved the scalability issues and made our systems sturdy enough to withstand virality-induced popularity, our main initiatives for 2024 are:

1. Automating the contracting process with our ITMs to eliminate the load on our internal teams and allow applicants to complete it on their own through their Dreamport profile;

2. Streamlining online training, pre-live training and live training processes to cut the long waiting times applicants face before getting into the timeslots available for those activities;

3. Extending Looker data availability to make better business decisions for Dreamport's platform operations and identify possible improvements;

4. Improving traceability of issues and problems between various services and user requests using distributed tracing.

Our stack

Frontend: Next.js, React, Material UI.
Backend: Nest.js, MariaDB, Redis, Elasticsearch, Docker.
DevOps: Argo CD, Grafana, AWS, Kubernetes, Prometheus, Kibana.

About:

Trevolution Group, which operates ASAP Tickets, SkyLux Travel, DreamPort, Oojo.com, Triplicity and other travel brands, has established itself as the market leader in the travel business, specializing in the visiting friends and relatives segment. Over 840,000 unique airline tickets and vacation packages were sold by the companies under the Trevolution Group brand in 2023, making it the fourth-largest travel consolidator in the US.
