Developing Apps/Systems at Scale: Challenges in Large Engineering Organizations and Fast-Growing Startups
via Google Android Documentation

Recently, out of the blue, I was asked how to build an app or a software system at scale. My initial response was to suggest using lambdas and autoscaling groups to accommodate exponential growth in customers. However, I soon realized that the question was about building a system in a large organization, where hundreds of engineers work together cohesively without interfering with each other's work. Having worked in a small startup for the past few years, I had almost forgotten about this challenge. It was only after the conversation ended that I recalled the lessons I had learned from working in a big company with a great engineering culture.

So, I decided to write down the rules and best practices that I remember and turn my writing into this article to use it as a reference for myself and, hopefully, help others as well. This article will mainly focus on Android app development at scale, with some references to iOS app development. However, most of the points discussed here are applicable to software development in general.


In short, the development of any app or system at scale typically relies on four main pillars:

Scalability: The ability of the system to handle increasing amounts of work or growth in its size without sacrificing performance. This involves choosing the right architectures, processes, and technologies.

Maintainability: The ease with which a system can be maintained and updated over time. This includes writing clean, modular, and well-documented code, as well as establishing clear processes for version control, code reviews, and documentation.

Reliability: Ensuring that the system operates correctly and consistently under various conditions. This involves implementing robust error handling, monitoring, and testing practices to detect and address issues before they impact users.

Flexibility: The ability of the system to adapt and evolve in response to changing requirements, technologies, and market conditions. This includes designing architectures and processes that are modular, extensible, and easily configurable.

When you think about these pillars, it's clear that picking the right mix of strategies, processes, architectures, and tools is super important for creating a system that can grow, adapt, and keep running smoothly. Here are the options, grouped into two categories: must-haves and nice-to-haves.

Must-Haves

1. Version Control:

You have to use version control and choose what best suits your organization: multiple repositories or a monorepo approach. Also, define a branching strategy to manage features, releases, and maintenance (hotfixes).

Figure: Git Flow branches

For a typical development cycle, you should have at least a dedicated release branch for each deployed/published version, a dev branch from which developers create their feature or task branches and into which they merge them back, and, of course, the main branch that contains all releases marked by tags.

2. Code Style:

Ensure your codebase looks consistent by setting the same default style for all engineers in their IDEs. A consistent code style is the first line of defense, followed by tools like Lint, SonarQube, and Qodana for Android, as well as SwiftLint for iOS. These tools perform static code analysis, with the main objective of detecting potential problems before the code is executed so that they can be fixed early. Additionally, a shared style makes the codebase look familiar to every developer, which increases the probability of catching bugs during review and reduces issues in the app.

Figure: Code style, OK vs. wrong

3. Best Practices:

Promote best practices across the organization by hosting workshops or masterclasses, tech talks, and demos. Keep onboarding documentation short, maintained, and up to date. Probably the best way to keep it current is to have newly arrived teammates update it whenever they find that a process or best practice has changed.

4. Modules:

Break down the application into smaller, manageable components or modules. One way is to split your app into as many feature-dedicated modules as possible. This will enable different teams to work on their features without affecting or being affected by other feature teams. Experiment with the best way to slice your app, but remember that there will always be tradeoffs. For example, some decisions might result in a smaller app but longer build times, or vice versa.

Figure: One of the ways to break your app into modules

The Shim interface design pattern can help untangle navigation dependencies between different modules if your app includes dedicated screens for different modules.
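To make the navigation idea more concrete, here is a minimal sketch in plain Java. It assumes one possible reading of the pattern: a small interface lives in a shared module, feature modules call it, and only the app module implements it. The module names and the names `CheckoutNavigator`, `CartScreen`, and `AppCheckoutNavigator` are hypothetical, and the implementation records a route string instead of starting a real Android screen.

```java
import java.util.ArrayList;
import java.util.List;

// Would live in a small shared module (hypothetical name: "navigation-api").
// Feature modules depend on this interface only, never on each other.
interface CheckoutNavigator {
    void openCheckout(String cartId);
}

// Would live in the "cart" feature module: it knows the interface,
// not the checkout module behind it.
final class CartScreen {
    private final CheckoutNavigator navigator;

    CartScreen(CheckoutNavigator navigator) {
        this.navigator = navigator;
    }

    void onBuyClicked(String cartId) {
        navigator.openCheckout(cartId);
    }
}

// Would live in the "app" module, the only place that sees both features.
// A real implementation would open the checkout screen here; this sketch
// only records the requested route (an assumption for illustration).
final class AppCheckoutNavigator implements CheckoutNavigator {
    final List<String> openedRoutes = new ArrayList<>();

    @Override
    public void openCheckout(String cartId) {
        openedRoutes.add("checkout/" + cartId);
    }
}

final class NavigationShimDemo {
    public static void main(String[] args) {
        AppCheckoutNavigator navigator = new AppCheckoutNavigator();
        new CartScreen(navigator).onBuyClicked("cart-42");
        System.out.println(navigator.openedRoutes);
    }
}
```

With this shape, the cart team can build and test its module without compiling the checkout module at all, which is the main point of the exercise.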

5. Adopt Clean Architecture:

Ensure the code is maintainable, scalable, and testable by adopting Clean Architecture, and pair it with one of the presentation patterns such as MVP, MVVM, MVI, or VIPER.
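As a rough illustration of the layering, the sketch below separates a domain layer, a data layer, and a presentation layer in plain Java. All names (`Article`, `ArticleRepository`, `GetPublishedTitles`, `ArticleListViewModel`) are hypothetical, and the in-memory repository stands in for whatever network or database code a real app would use.

```java
import java.util.List;
import java.util.stream.Collectors;

// Domain layer: plain types and rules, no framework or network imports.
final class Article {
    final String title;
    final boolean published;

    Article(String title, boolean published) {
        this.title = title;
        this.published = published;
    }
}

// The domain defines the interface; outer layers implement it.
interface ArticleRepository {
    List<Article> loadAll();
}

// Use case: one business rule, depending only on the domain interface.
final class GetPublishedTitles {
    private final ArticleRepository repository;

    GetPublishedTitles(ArticleRepository repository) {
        this.repository = repository;
    }

    List<String> execute() {
        return repository.loadAll().stream()
                .filter(article -> article.published)
                .map(article -> article.title)
                .collect(Collectors.toList());
    }
}

// Data layer: a stand-in for a real network or database implementation.
final class InMemoryArticleRepository implements ArticleRepository {
    @Override
    public List<Article> loadAll() {
        return List.of(new Article("Modules", true), new Article("Draft", false));
    }
}

// Presentation layer: turns use case results into something a screen can show.
final class ArticleListViewModel {
    private final GetPublishedTitles getPublishedTitles;

    ArticleListViewModel(GetPublishedTitles getPublishedTitles) {
        this.getPublishedTitles = getPublishedTitles;
    }

    String screenState() {
        List<String> titles = getPublishedTitles.execute();
        return titles.isEmpty() ? "Nothing published yet" : String.join(", ", titles);
    }
}

final class CleanArchitectureDemo {
    public static void main(String[] args) {
        ArticleListViewModel viewModel = new ArticleListViewModel(
                new GetPublishedTitles(new InMemoryArticleRepository()));
        System.out.println(viewModel.screenState());
    }
}
```

The dependencies point inward: the view model knows the use case, the use case knows the repository interface, and nothing in the domain knows how the data is actually fetched.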

6. Code Ownership and Reviews:

Designate code owners for each module and code/feature scope, and require their review and approval for any changes to their code. Enforce code reviews before merging: no code should be merged without review by several other developers, preferably those familiar with the codebase. This not only improves the quality of the code but also spreads knowledge among team members.
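Hosting platforms such as GitHub and GitLab can enforce ownership through a CODEOWNERS file. The sketch below is not their matching algorithm; it is only a small Java model of the rule itself, assuming that the most specific path prefix decides the owner. The class `CodeOwners`, the paths, and the team names are all made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class CodeOwners {
    private final Map<String, String> ownersByPrefix = new LinkedHashMap<>();

    // Registers a team as the owner of everything under a path prefix.
    CodeOwners own(String pathPrefix, String team) {
        ownersByPrefix.put(pathPrefix, team);
        return this;
    }

    // The longest matching prefix wins (an assumption of this sketch).
    String ownerOf(String path) {
        int bestLength = -1;
        String owner = "unowned";
        for (Map.Entry<String, String> entry : ownersByPrefix.entrySet()) {
            String prefix = entry.getKey();
            if (path.startsWith(prefix) && prefix.length() > bestLength) {
                bestLength = prefix.length();
                owner = entry.getValue();
            }
        }
        return owner;
    }

    // Teams that own a changed path but have not approved yet.
    // Paths without an owner show up as "unowned" so someone can claim them.
    Set<String> missingApprovals(List<String> changedPaths, Set<String> approvedBy) {
        Set<String> missing = new TreeSet<>();
        for (String path : changedPaths) {
            String owner = ownerOf(path);
            if (!approvedBy.contains(owner)) {
                missing.add(owner);
            }
        }
        return missing;
    }
}

final class CodeOwnersDemo {
    public static void main(String[] args) {
        CodeOwners owners = new CodeOwners()
                .own("features/", "mobile-platform")
                .own("features/checkout/", "payments");
        List<String> changed = List.of("features/checkout/Cart.kt", "features/search/Search.kt");
        System.out.println("Still waiting for: " + owners.missingApprovals(changed, Set.of("payments")));
    }
}
```

A merge gate built on this idea blocks the change until the set of missing approvals is empty.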

7. Unit Tests and Coverage:

Maintain high unit test coverage across all modules. Remember that 100% unit test coverage isn't the ultimate goal; instead, the focus should be on delivering a high-quality app. Certain aspects of a mobile app may not be easily testable.
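One way to raise coverage where it matters is to move logic out of screens into plain classes that can be tested without a device. Here is a minimal sketch under that assumption; `PasswordPolicy` and its rules are invented for illustration, and the hand-rolled `check` helper stands in for a real test framework such as JUnit.

```java
import java.util.Optional;

// Plain logic extracted from a sign-up screen: no UI or framework types,
// so it can be exercised on any JVM.
final class PasswordPolicy {
    static final int MIN_LENGTH = 8;

    // Returns a user-facing error message, or empty when the password is acceptable.
    static Optional<String> validate(String password) {
        if (password == null || password.length() < MIN_LENGTH) {
            return Optional.of("Use at least " + MIN_LENGTH + " characters");
        }
        if (password.chars().noneMatch(Character::isDigit)) {
            return Optional.of("Add at least one digit");
        }
        return Optional.empty();
    }
}

final class PasswordPolicyChecks {
    // Tiny stand-in for a test framework assertion.
    static void check(boolean condition, String name) {
        if (!condition) {
            throw new AssertionError("Failed: " + name);
        }
    }

    public static void main(String[] args) {
        check(PasswordPolicy.validate(null).isPresent(), "rejects missing password");
        check(PasswordPolicy.validate("short1").isPresent(), "rejects short password");
        check(PasswordPolicy.validate("longenough").isPresent(), "rejects password without digit");
        check(!PasswordPolicy.validate("longenough1").isPresent(), "accepts valid password");
        System.out.println("All checks passed");
    }
}
```

The screen itself then only forwards input and displays the returned message, which leaves very little untested code in the hard-to-test part of the app.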

8. UI Automation:

In cases where unit testing isn't feasible, UI automation can be a valuable tool for ensuring quality. The basic tools for this are Espresso for Android and XCTest for iOS.

9. Integration Tests and OpenAPI Specification (Swagger):

Consider using Swagger, a set of open-source tools built around the OpenAPI Specification, to help you design, build, document, and consume REST APIs. Use Postman or write integration tests for backend components and endpoints to validate that they function correctly within the larger system. This saves developers time in identifying the source of issues when a previously functional part of the app suddenly stops working.

10. Continuous Integration and Deployment (CI/CD):

Use CI tools such as Jenkins, BuildKite, or GitHub Actions to automate building, testing, and reporting. Ensure every merge request runs through this CI pipeline to catch issues early. Also, no merge request into a dev branch should be merged manually! Streamline the release and deployment process by automating it to reduce human errors on both iOS and Android. However, have dedicated engineers on duty to resolve conflicts and monitor release status. They should be prepared to halt the release in case of a major issue, identify the root cause or the owner of the issue, help develop a hotfix, and release it as soon as possible.

Figure: CI/CD pipeline

The CI/CD pipeline works best in combination with manual code review. Here's an example of a proven workflow: Write code in the IDE, make it functional and run it, execute unit tests, create a pull request, and once the PR passes static analysis validations, request a design review followed by a code review. Ideally, incorporate manual or automated QA in this process. If your code fails at any step, go back to the first step, fix the issue, and repeat the steps. Once everything is okay, deploy to production.
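The "fail at any step, go back and fix" rule in this workflow can be modeled as an ordered list of gates that stops at the first failure. The Java sketch below is only a toy model of that idea, not a real CI configuration; the class `MergePipeline` and the gate names are hypothetical, and each gate is a simple boolean check where a real pipeline would run a build, a linter, or wait for a reviewer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.BooleanSupplier;

final class MergePipeline {
    // LinkedHashMap keeps the gates in the order they were added.
    private final Map<String, BooleanSupplier> gates = new LinkedHashMap<>();

    MergePipeline gate(String name, BooleanSupplier check) {
        gates.put(name, check);
        return this;
    }

    // Runs the gates in order and returns the name of the first one that fails,
    // or empty when every gate passes. Later gates are not run after a failure.
    Optional<String> firstFailure() {
        for (Map.Entry<String, BooleanSupplier> gate : gates.entrySet()) {
            if (!gate.getValue().getAsBoolean()) {
                return Optional.of(gate.getKey());
            }
        }
        return Optional.empty();
    }
}

final class MergePipelineDemo {
    public static void main(String[] args) {
        MergePipeline pipeline = new MergePipeline()
                .gate("unit tests", () -> true)
                .gate("static analysis", () -> false) // simulated failure
                .gate("code review", () -> true);

        String status = pipeline.firstFailure()
                .map(name -> "Blocked at: " + name + ", fix and start over")
                .orElse("All gates passed, ready to deploy");
        System.out.println(status);
    }
}
```

Putting the cheap automated gates first means reviewers only spend time on changes that already build, pass tests, and satisfy static analysis.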

11. Monitor App Stores and Performance Metrics:

Assign dedicated team members to monitor user reviews in Google Play and the Apple App Store, promptly addressing relevant issues. Investigate crashes and ANRs (Application Not Responding), identify the responsible teams, and ensure that they address and resolve the issues.

12. Communicate Changes:

Keep stakeholders and other developers informed about changes, ideally before implementation begins. Announce timelines, consider who might be affected, and reach out to them proactively.

13. Architecture Decisions:

Discuss architectural decisions with a broad team, including architects, graphic designers, QA and project managers, to ensure everyone agrees on the changes and is aware of the timelines.

14. Developer Testing:

Require developers to test their work, not only by passing the build and the unit tests but also by running the app and manually testing new features, as well as any potentially affected screen, page, or feature.


Nice-to-Haves

15. Feature Toggles/Flags:

Implement feature toggles to test new features and easily disable them if issues arise, without the need to release an update.
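A minimal sketch of the idea in Java, assuming flags have defaults shipped with the app and optional overrides delivered by a server: the class `FeatureFlags`, the flag name `new_checkout`, and the way overrides arrive are all hypothetical, and the sketch ignores thread safety and persistence.

```java
import java.util.HashMap;
import java.util.Map;

final class FeatureFlags {
    private final Map<String, Boolean> defaults;
    private final Map<String, Boolean> remoteOverrides = new HashMap<>();

    // Defaults are compiled into the app and used until the server says otherwise.
    FeatureFlags(Map<String, Boolean> defaults) {
        this.defaults = new HashMap<>(defaults);
    }

    // Called whenever a fresh configuration arrives (the source is an assumption).
    void applyRemote(Map<String, Boolean> overrides) {
        remoteOverrides.clear();
        remoteOverrides.putAll(overrides);
    }

    // A remote override wins over the default; unknown flags count as disabled.
    boolean isEnabled(String flag) {
        Boolean override = remoteOverrides.get(flag);
        if (override != null) {
            return override;
        }
        return defaults.getOrDefault(flag, false);
    }
}

final class FeatureFlagsDemo {
    public static void main(String[] args) {
        FeatureFlags flags = new FeatureFlags(Map.of("new_checkout", true));
        System.out.println("Before kill switch: " + flags.isEnabled("new_checkout"));

        // The team notices a problem and turns the feature off remotely.
        flags.applyRemote(Map.of("new_checkout", false));
        System.out.println("After kill switch: " + flags.isEnabled("new_checkout"));
    }
}
```

Treating unknown flags as disabled is a deliberate choice here: a typo in a flag name then hides a feature instead of exposing an unfinished one.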

16. In-House A/B Testing:

Implement in-house A/B testing for greater flexibility and control over experiments.
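At the core of most in-house setups is a deterministic way to assign users to variants. The Java sketch below shows one common approach, assuming a stable hash of the experiment name and user ID is mapped to a bucket from 0 to 99. The class `Experiment` and the variant names are hypothetical, and CRC32 is used only because it is in the standard library and gives the same result everywhere; a production system may prefer a different hash.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class Experiment {
    private final String name;
    private final int treatmentPercent;

    Experiment(String name, int treatmentPercent) {
        if (treatmentPercent < 0 || treatmentPercent > 100) {
            throw new IllegalArgumentException("treatmentPercent must be between 0 and 100");
        }
        this.name = name;
        this.treatmentPercent = treatmentPercent;
    }

    // Maps a user to a stable bucket in the range 0..99. Including the experiment
    // name in the hash input gives each experiment its own split of the users.
    int bucketOf(String userId) {
        CRC32 crc = new CRC32();
        crc.update((name + ":" + userId).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }

    // The same user always gets the same variant for the same experiment.
    String variantFor(String userId) {
        return bucketOf(userId) < treatmentPercent ? "treatment" : "control";
    }
}

final class ExperimentDemo {
    public static void main(String[] args) {
        Experiment experiment = new Experiment("new_checkout", 50);
        for (String userId : new String[] {"user-1", "user-2", "user-3"}) {
            System.out.println(userId + " -> " + experiment.variantFor(userId));
        }
    }
}
```

Because the assignment is a pure function of the experiment name and user ID, no assignment table has to be stored, and raising `treatmentPercent` only moves users from control to treatment, never the other way around.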

17. Bug Reporting Tools:

Include a bug/issue reporting tool in your app, accessible to all employees and customers, allowing them to easily report issues or crashes.

18. Cross-Team Training and Collaboration:

Encourage developers from different teams, skills, or platforms to work together (temporarily exchange teams) for a few weeks to learn other technologies and better understand the entire codebase, process, flow and product itself. This can improve overall quality and enhance engineering competence across the organization.


This is obviously not everything, and each of those 18 rules can be expanded into a dedicated article, but referencing them is a good starting point.


Implementing these strategies, with careful consideration of the organization’s specific needs and goals, can significantly improve the scalability, performance, and maintainability of the app. In other words, it will enable your teams to fix bugs, add new features, and test and maintain the app more easily, whereas skipping even a few of them can slow the process down and eventually lead to a state where making any change is almost impossible.

While most of these guidelines might seem obvious to engineers and managers with experience in large organizations with strong engineering cultures, I hope they prove helpful to less experienced developers looking to improve their skills and understanding of the process in a scalable environment.

Version Control and Branching

https://www.atlassian.com/agile/software-development/branching

Code Style

https://developer.android.com/kotlin/style-guide

https://github.com/kodecocodes/swift-style-guide

Modularization

https://developer.android.com/topic/modularization

https://developer.apple.com/documentation/xcode/organizing-your-code-with-local-packages

https://github.com/android/nowinandroid/blob/main/docs/ModularizationLearningJourney.md

https://www.aboutwayfair.com/tech-blog/app-modularization-at-wayfair-how-we-unlocked-our-code-and-android-and-ios-teams-at-scale

Source Code Analyzers

https://developer.android.com/studio/write/lint

https://github.com/realm/SwiftLint

https://www.sonarsource.com/products/sonarqube/

https://blog.jetbrains.com/qodana/2024/03/what-is-static-code-analysis/

UI Automation

https://developer.android.com/training/testing/espresso

https://developer.apple.com/documentation/xctest/

Tools for API Development and Verification

https://swagger.io

https://www.postman.com

CI/CD Tools

https://docs.github.com/en/actions

https://buildkite.com

https://www.browserstack.com/guide/jenkins-for-test-automation

https://fastlane.tools

Sergey Neskoromny

Android and iOS Expert, Software Architect, Engineer, and LLM Applications Enthusiast.
