My CodeDay Labs Internship: An Open Source Journey with Zitadel

1. About the Project

This summer I participated in an eight-week software engineering internship with CodeDay Labs in collaboration with CTI Accelerate. During this internship, I had the opportunity to contribute to an open-source project called ZITADEL. ZITADEL is an identity and access management (IAM) solution designed to meet the needs of modern applications, whether they are running on public, private, or hybrid cloud environments.

ZITADEL helps developers and organizations manage who can access their applications and services securely. In today’s digital landscape, where data breaches and unauthorized access are major concerns, having a reliable IAM solution like ZITADEL is crucial. It offers features like Single Sign-On (SSO), which allows users to access multiple applications with one set of login credentials, and multi-factor authentication, adding an extra layer of security by requiring more than just a password to log in.

The typical users of ZITADEL are developers and IT teams who need to manage user identities and control access to their applications. For example, imagine a company that offers a suite of online tools to its clients. With ZITADEL, the company can ensure that each client’s employees can securely access the tools they need, without having to manage separate logins for each tool. If one employee leaves the company, their access can be revoked across all tools instantly, helping to keep the company’s data secure.

2. The Issue

We addressed two issues in the ZITADEL project, both of which enhanced its usability and functionality.

The first issue was Issue #8129 (Add tooltip to indicator of the inherit button on "Feature Settings"). Within the "Feature Settings" of a ZITADEL instance, most features are set to inherit their values from programmatic defaults. However, it was unclear what these default values were, leading to confusion. Additionally, there was a red bubble next to the "Inherit" button, but users were uncertain whether this indicated that the default value was set to false or if it was just decorative.

To resolve this, we added a tooltip to the "Inherit" button, making the feature more intuitive and helping users understand its function without needing to consult documentation. This improvement enhances the overall user experience, making the platform easier to navigate. You can view our solution in PR #8238.

Next, we tackled Issue #7966 ([cli/mirror] Allow file as destination and source). Previously, the mirror command supported only database-to-database migrations, which limited its flexibility.

By extending the command to support file-based migrations, we significantly increased the versatility of the ZITADEL CLI. This enhancement allows users to move data between files and databases seamlessly, accommodating a wider range of use cases and making the CLI more adaptable to various environments. The solution for this issue is detailed in PR #8431.

3. Codebase Overview

Tech Stack: ZITADEL's backend is written in Go, its management console is an Angular application, and it exposes gRPC and REST APIs on top of an event-sourced storage layer backed by CockroachDB or PostgreSQL.

System Diagram:

The diagram above illustrates the overall architecture of the ZITADEL system. It includes key components such as the GUI, HTTP server, various APIs, and the ZITADEL core, which contains the command and query handlers, event store, and projection spooler. These components interact within a CockroachDB cluster to manage identity and access management tasks.

Zitadel Mirror Command:

Our project involved enhancing the mirror command to handle data migration between databases and files. The diagram below shows how the mirror command works. First, we define the source and destination databases. The command then systematically copies tables from one database to the other, re-computing projections as needed. A final verification step confirms that the migration succeeded by comparing the number of entries in both databases.

Figure: Zitadel Mirror Command (Credit: Xiaoxuan Wang)

Workflow: Handling Mirror Command with Files

Let’s walk through the workflow of the mirror command in ZITADEL with the new file mirroring feature:

  1. User Action: The user initiates the mirror command via the CLI. They specify the source and destination, which could either be databases or file paths.
  2. System Processes: If the source or destination is a file, the system uses the isSrcFile and isDestFile flags to determine the type and path to the files. The process begins by initializing the necessary variables and performing error checks.
  3. Data Transfer: The mirror command calls the appropriate functions, such as copyAssetsToFile or copyAssetsFromFile, depending on the source and destination types, and systematically copies the data between the database and files. When data moves from one database to another, the system may also recompute projections.
  4. Verification: Once the data transfer is complete, the system performs a verification step. This step ensures that the number of entries in the source matches the number of entries in the destination, guaranteeing data integrity.
  5. Result: The process concludes with a successful migration, either between two databases or between a database and files. The user receives confirmation that the data has been successfully mirrored and verified.

4. Challenges

One of the technical challenges we encountered was effectively adapting the mirror command to support file and database migrations without introducing unnecessary complexity. Initially, our proposed solution involved adding three new flags: --to-files, --from-files, and --path-to-dir. These flags would manage whether the mirroring process should operate on files or databases. However, we were uncertain if this was the best approach.

First attempt:

Our first attempt involved adding the --to-files, --from-files, and --path-to-dir flags. The idea was to let users explicitly specify whether they wanted to mirror data to or from files. This solution seemed straightforward, but we were concerned that it might overcomplicate the command-line interface and make the codebase harder to maintain.

Second attempt:

We reached out to the project maintainer for guidance on whether our flag-based approach was appropriate. The maintainer advised against adding new flags and pointed out that the system could inherently detect whether the source or destination was a file based on its type. This feedback was pivotal, as it helped us pivot away from a potentially cumbersome solution.

The picture below shows the destFile.yaml configuration, where the destination is specified as a file. This configuration is a part of our refined approach, where we no longer need additional flags.

Third attempt:

Based on the maintainer's feedback, we revised our approach. We introduced global variables (isSrcFile, isDestFile, and filePath) to determine if the source or destination was a file. This approach simplified the command structure and reduced potential errors by eliminating the need for additional flags. It also streamlined the code by enabling direct checks within the mirroring functions.

Here is a code snippet of the revised mirror command implementation:
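In sketch form, the dispatch looks like this. The variable and helper names (isSrcFile, copyEventsFromFile, and so on) are the ones from our change, but the bodies below are simplified stand-ins that only record which step runs; the real functions stream data between the stores.

```go
package main

import "fmt"

// Globals described in the post: set during configuration parsing to record
// whether either end of the migration is a file rather than a database.
var (
	isSrcFile  bool
	isDestFile bool
	filePath   string // directory holding the exported files, when relevant
)

// copyEventstore dispatches to file- or database-backed copy routines
// depending on which side of the migration is a file. Here each step is
// represented by its name instead of a real implementation.
func copyEventstore() []string {
	switch {
	case isSrcFile:
		// Source is a file: only the destination database is needed.
		return []string{
			"connect destination DB",
			"copyUniqueConstraintsFromFile",
			"copyEventsFromFile",
		}
	case isDestFile:
		// Destination is a file: read from the source database, write files.
		return []string{
			"connect source DB",
			"copyEventsToFile",
			"copyUniqueConstraintsToFile",
		}
	default:
		// Traditional database-to-database migration.
		return []string{
			"connect source and destination DBs",
			"copyEventsDB",
			"copyUniqueConstraintsDB",
		}
	}
}

func main() {
	isDestFile = true
	for _, step := range copyEventstore() {
		fmt.Println(step)
	}
}
```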

In the first branch, we check whether the source is a file (isSrcFile). If it is, we connect only to the destination database, since the data is read from a file, and call copyUniqueConstraintsFromFile and copyEventsFromFile to copy it into the database.

In the second branch, we handle the case where the destination is a file (isDestFile): we connect to the source database, read the data, and write it out by calling copyEventsToFile and copyUniqueConstraintsToFile.

The default case handles the traditional database-to-database migration: we connect to both the source and destination databases and call copyEventsDB and copyUniqueConstraintsDB to transfer the data.

The terminal screenshot below shows the mirror command importing data from CSV files back into the PostgreSQL database; the logs confirm each table being copied back in.

Overall, this structure allows the copyEventstore function to flexibly handle different types of sources and destinations (databases or files) without requiring complex flag management.

5. Solution

As I have shown above, our final solution was to enable the mirror command to handle migrations between databases and files. We updated multiple files, including mirror.go, auth.go, config.go, event_store.go, system.go, and verify.go, to implement this functionality.

We defined the configuration files (destFile.yaml and srcFile.yaml) to specify whether the source or destination was a database or a set of files. For example, in destFile.yaml, the destination is configured as a local directory to store the database content as CSV files. The command now supports exporting data from tables like system.assets, system.encryption_keys, and others to CSV files, as well as re-importing this data back into a database.
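To give a sense of the shape of such a configuration, destFile.yaml looked roughly like the sketch below; the key names here are illustrative, not the exact ZITADEL configuration schema.

```yaml
# Hypothetical sketch of destFile.yaml -- key names are illustrative only.
Destination:
  file:
    path: ./export   # local directory that receives one CSV file per table
```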

The chart below illustrates the Reader/Writer Pipe mechanism, showing the process flow for copying data from files to databases (CopyFromFile) and from databases to files (CopyToFile). This mechanism ensures efficient data transfer while maintaining flexibility in the source and destination types.
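The same idea can be illustrated with Go's standard io.Pipe, which is the kind of reader/writer pipe the chart depicts: a producer goroutine writes rows while the consumer streams them out, so the full data set is never buffered in memory. The CSV content here is made up for illustration.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// streamThroughPipe copies data through an in-memory pipe. The producer
// goroutine stands in for reading rows out of the source database; the
// consumer stands in for the file (or COPY FROM STDIN) destination.
func streamThroughPipe(rows string) string {
	pr, pw := io.Pipe()

	go func() {
		defer pw.Close() // signals EOF to the reader when done
		io.Copy(pw, strings.NewReader(rows))
	}()

	out := new(strings.Builder)
	io.Copy(out, pr) // streams until the writer side is closed
	return out.String()
}

func main() {
	fmt.Print(streamThroughPipe("id,name\n1,alice\n2,bob\n"))
}
```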


Below is an example of the content in one of the CSV files generated by the export. This confirms that the data has been correctly migrated from the database to the file.

We also ensured that the reverse process—importing data back into the database from CSV files—was seamless. The terminal output confirms that data was successfully migrated from the files back into the database.


By doing so, we solved the problem of making the mirror command more flexible and user-friendly. Now, users can simply specify their source and destination, and the system handles whether it needs to read/write to a database or a file, eliminating the need for users to manage this manually.

Testing:

To ensure that our solution worked as intended, we conducted comprehensive testing across multiple scenarios:

  1. File to Database Migration: We verified that data stored in CSV files could be accurately migrated into the appropriate database tables. This was done by running the mirror command with a file as the source and a database as the destination. After migration, we checked the database to ensure that all records were correctly inserted.
  2. Database to File Migration: We tested the reverse scenario, where data from the database was exported to CSV files. Post-migration, we reviewed the files to confirm that the data was accurately captured.
  3. Database to Database Migration: To ensure that the original functionality of the mirror command was unaffected, we performed database-to-database migrations, confirming that the process remained seamless and efficient.
  4. Verification Step: A key part of our solution was the verification step, which compared the number of entries between the source and destination after migration. This step ensured data integrity, and we consistently found that the entries matched, proving the accuracy of the migration process.
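The verification step boils down to a count comparison per table. The sketch below shows that idea; the table names and counts are illustrative, and the real command obtains the counts by querying both stores.

```go
package main

import "fmt"

// verifyCounts compares per-table row counts between source and destination
// and returns a description of every mismatch; an empty result means the
// migration preserved all entries.
func verifyCounts(src, dest map[string]int) []string {
	var mismatches []string
	for table, n := range src {
		if dest[table] != n {
			mismatches = append(mismatches,
				fmt.Sprintf("%s: source has %d entries, destination has %d", table, n, dest[table]))
		}
	}
	return mismatches
}

func main() {
	src := map[string]int{"system.assets": 12, "system.encryption_keys": 3}
	dest := map[string]int{"system.assets": 12, "system.encryption_keys": 3}
	fmt.Println(len(verifyCounts(src, dest)) == 0) // true: counts match
}
```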

In the screenshot above, we see the successful execution of the mirror command, which confirms that the data, including unique constraints, has been successfully copied into the PostgreSQL database. The log outputs provide details about the number of records processed and the time taken for each operation.

Below is a code snippet from the updated mirror.go that illustrates how we determine whether the source and destination are files or databases and execute the appropriate actions:

This snippet shows the conditional logic used to determine if the data should be copied to or from files based on the configuration provided. Depending on the setup, it executes the appropriate functions for file-to-database, database-to-file, or database-to-database migrations.

6. Conclusion

After implementing and rigorously testing our solution, our first pull request was successfully approved and merged into the main project. We are now awaiting the approval of our second pull request. This summer has been a fantastic learning experience, giving me valuable insights and chances to grow while working on real-world challenges in software development.

I’m incredibly thankful for the opportunity to work with CodeDay Labs this summer. Huge thanks to Tyler Menezes, Utsab Saha, and the Computing Talent Initiative for giving me this awesome opportunity. Big appreciation to Lalla Sankara for all the guidance and support. And a special shoutout to my amazing teammates, Andy Vo and Xiaoxuan Wang—collaborating with you both was a blast! I’m excited about what the future holds and look forward to continuing my contributions to Zitadel and other open-source projects.

Thanks for reading!


Related Links:

Issue 1:

https://github.com/zitadel/zitadel/issues/8129

PR 1:

https://github.com/zitadel/zitadel/pull/8238

Issue 2:

https://github.com/zitadel/zitadel/issues/7966

PR 2:

https://github.com/zitadel/zitadel/pull/8431

Presentation Slides (overview of our project and solutions):

https://docs.google.com/presentation/d/10TFRSlJ-Z6jQ6jh2K-L6MmbDnHcux2rvUqvhzDflzeY/pub?start=false&loop=false&delayms=5000

