Every organization today aspires to become data-driven. Data-driven organizations are 162% more likely to significantly surpass revenue goals and 58% more likely to beat them than their non-data-driven counterparts.
Most organizations today are data-rich but information-poor.
This is because extracting insights today is bottlenecked by the specialized data talent required to make data consistent, interpretable, accurate, timely, standardized, and sufficient. One of the key goals of data platform modernization (especially by leveraging the cloud) is to make data self-service for both technical users (data scientists, analysts) and business users (marketers, product managers).
While there are hundreds of tools and frameworks popping up in a rapidly evolving data landscape, architects and technology leaders find it extremely difficult to navigate the plethora of technologies that are all positioned as the "next silver bullet."
Teams often get attracted to "shiny new technologies" instead of finding the right building blocks to make data self-service based on their current use cases, processes, technology, team skills, and data literacy.
This post helps you understand the technology landscape. My approach is to view the technology landscape in the context of the data user’s journey map as they transform raw data into insights. This post (hopefully) becomes your starting point to correlate the needs and pain points of your data users to available technologies in the landscape.
Addressing the specific needs of your data users and making them self-service is your golden path to becoming data-driven. Unfortunately, there are no shortcuts or one-size-fits-all solutions!
The rest of the blog gets into the details of the landscape of available technologies to make data self-service.
Journey map from Raw Data to Insights to Impact
The journey map for any data insights can be divided into four key phases: discover, prep, build, and operationalize.
One of the primary goals of data platform modernization is to minimize the Time to Insight (ToI) for both technical and business users.
ToI represents the time it takes to take raw data (e.g., a customer support call log) and generate insight (e.g., products with the highest call volume, least satisfaction, geographic distribution). Minimizing ToI requires automating the various tasks within the journey map, as engineering complexity today limits accessibility for data users. Even advanced technical users such as data scientists spend a significant amount of time on engineering activities related to aligning systems for data collection, defining metadata, wrangling data to feed ML algorithms, deploying pipelines and models at scale, and so on. These activities are outside of their core insight-extracting skills and are bottlenecked by dependency on data engineers and platform IT engineers, who typically lack the necessary business context.
Several enterprises have identified the need to automate and democratize the journey by making data self-service. Google's TensorFlow Extended (TFX), Uber's Michelangelo, Facebook's FBLearner Flow, and Airbnb's Bighead are examples of self-service Data+ML platforms.
None of these frameworks is a silver bullet that can be applied as-is. The right solution for an organization depends on use cases, type of data, existing technology building blocks, dataset quality, processes, data culture, and people skills.
The journey map can be further broken into 12 milestones as shown.
Find: Discover existing datasets along with their metadata details
Aggregate: Collect new structured, semi-structured, or unstructured data from applications and third-party sources
Standardize: Re-use standardized metrics that provide a single source of truth across insights
Wrangle: Cleanse and transform the data for building reliable insights
Govern: Ensure the usage of and access to data is within compliance
Model: Manage the global data namespace to effectively update and share
Process: Analyze data across multiple data stores
Visualize: Build dashboards and reports for visual analysis and information extraction
Orchestrate: Set up end-to-end transformation pipelines from raw data to insights
Deploy: Continuously integrate and roll out transformation pipelines
Observe: Proactively monitor the performance, cost, and quality of data applications and pipelines
Experiment: A/B test to ensure insights lead to the right business impact
Automating each milestone should be approached as crawl, walk, run.
It is important to treat self-service as having multiple levels, analogous to different levels of self-driving cars that vary in terms of the levels of human intervention required to operate them. For instance, a level-2 self-driving car accelerates, steers, and brakes by itself under driver supervision, while level 5 is fully automated and requires no human supervision.
The 2022 technology landscape is as shown. The rest of the post goes into the details of each milestone.
Discover Phase
1. Find datasets
As data grows and teams scale, silos are created across business lines, leading to no single source of truth. Data users today need to effectively navigate a sea of data resources of varying quality, complexity, relevance, and trustworthiness.
The landscape of existing technologies focuses on data catalogs, metadata management, crawling of datasets. The technologies aim to automate the following key tasks within this milestone:
Locating new datasets: Locating datasets and artifacts across silos is currently an ad-hoc process; team knowledge in the form of cheat sheets, wikis, anecdotal experiences, and so on is used to get information about datasets and artifacts.
Tracking lineage: For a given dataset, lineage includes all the dependent input tables, derived tables, and output models and dashboards. It includes the jobs that implement the transformation logic to derive the final output. For example, if a job J reads dataset D1 and produces dataset D2, then the lineage metadata for D1 contains D2 as one of its downstream datasets, and vice versa.
Extracting technical metadata of datasets: Technical metadata is extracted by crawling the individual data source without necessarily correlating across multiple sources. Technical metadata includes logical and physical metadata: physical metadata covers details related to physical layout and persistence, such as creation and modification timestamps, physical location and format, storage tiers, and retention details. Logical metadata includes dataset schema, data source details, the process of generating the dataset, and owners and users of the dataset.
Tracking operational metadata: Execution stats that capture the completion times, data processed, and errors associated with the pipelines. It also covers the state of the dataset in terms of compliance, personally identifiable information (PII) data fields, data encryption requirements, and so on. Operational metadata is not generated by connecting to the data source but rather by stitching together the metadata state across multiple systems.
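As an illustration of how these capabilities fit together, here is a minimal sketch of an in-memory catalog with tag-based search and lineage tracking; the dataset names, owners, and tags are hypothetical examples, and a production catalog would persist this metadata and populate it via crawlers.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetEntry:
    name: str
    owner: str
    tags: set = field(default_factory=set)
    downstream: set = field(default_factory=set)  # lineage: datasets derived from this one

class Catalog:
    """Minimal in-memory data catalog with tag search and lineage tracking."""
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.name] = entry

    def add_lineage(self, source, derived):
        # a job reads `source` and produces `derived`
        self._entries[source].downstream.add(derived)

    def find(self, tag):
        # locate datasets by tag instead of relying on tribal knowledge
        return [e.name for e in self._entries.values() if tag in e.tags]

catalog = Catalog()
catalog.register(DatasetEntry("support_calls_raw", "ops", {"support", "pii"}))
catalog.register(DatasetEntry("call_volume_by_product", "analytics", {"support"}))
catalog.add_lineage("support_calls_raw", "call_volume_by_product")
print(catalog.find("support"))
```

The same entries could carry operational metadata (run stats, compliance state) stitched in from other systems.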
Depending on your current state, look for solutions in the landscape that get you to the next higher level in the crawl, walk, run hierarchy. Start with a basic data and metadata catalog and add capabilities as shown in the figure.
2. Aggregate data
Datasets from applications and transaction sources are sitting across different application silos and need to be moved into a centralized repository like a data lake or data mesh. Moving data involves orchestrating the data movement across heterogeneous systems, verifying data correctness, and adapting to any schema or configuration changes that occur on the data source.
The landscape of existing technologies focuses on EL (Extract-Load) of the traditional ELT process by simplifying CDC (Change Data Capture), periodic batch ingestion, or real-time streaming event ingestion. The technologies aim to automate the following key tasks within this milestone:
Ingesting data from datastores/apps: Data must be read from the source datastore and written to a target datastore, data lake, or data mesh. A technology-specific adapter is required to read and write the data from and to the datastore.
Managing source schema changes: After the initial configuration, changes to the schema and configuration can occur at the source and target datastores. The goal is to automatically manage schema changes such that the downstream analytics are not impacted by the change. In other words, we want to reuse existing queries against evolving schemas and avoid schema mismatch errors during querying. There are different kinds of schema changes, such as renaming columns; adding columns at the beginning, middle, or end of the table; removing columns; reordering columns; and changing column data types.
Ensuring compliance for PII (Personally Identifiable Information): Before the data can be moved across systems, it must be verified for regulatory compliance. For example, if the source datastore is under regulatory compliance laws like PCI, the data movement must be documented with clear business justification. For data with PII attributes, these must be encrypted during transit and on the target datastore. Data rights laws further limit the data that can be moved from source datastores for analytics.
Verifying data quality of ingested data: Data movement needs to ensure that source and target are in parity. In real-world deployments, quality errors can arise for a multitude of reasons, such as source errors, adapter failures, aggregation issues, and so on. Monitoring of data parity during movement is a must-have to ensure that data quality errors don't go unnoticed and impact the correctness of business metrics and ML models. During data movement, data at the target may not exactly resemble the data at the source. The data at the target may be filtered, aggregated, or a transformed view of the source data. For instance, if the application data is sharded across multiple clusters, a single aggregated materialized view may be required on the target. Transformations need to be defined and verified before deploying in production.
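A minimal sketch of the source-to-target parity check described above, assuming simple order-independent row checksums; the sample rows are hypothetical, and a real pipeline would run this check incrementally per partition.

```python
import hashlib

def row_checksum(row):
    # stable fingerprint of a single record
    return hashlib.sha256("|".join(map(str, row)).encode()).hexdigest()

def verify_parity(source_rows, target_rows):
    """Compare row counts and order-independent checksums between source and target."""
    if len(source_rows) != len(target_rows):
        return False, "row count mismatch"
    src = sorted(row_checksum(r) for r in source_rows)
    tgt = sorted(row_checksum(r) for r in target_rows)
    if src != tgt:
        return False, "checksum mismatch"
    return True, "ok"

source = [("c1", 100), ("c2", 250)]
target = [("c2", 250), ("c1", 100)]   # same rows, different arrival order
print(verify_parity(source, target))  # (True, 'ok')
```

When the target is a filtered or aggregated view, the check would be applied after the expected transformation rather than on raw rows.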
Depending on your current state, look for solutions in the landscape that get you to the next higher level in the crawl, walk, run hierarchy. Start with traditional batch ingestion and move to the higher level based on use-case needs and data model maturity.
3. Standardize Metrics
The goal is to provide a repository of well-documented, governed, versioned, and curated metrics. Data users should be able to search and use metrics to build dashboards and models with minimal data engineering.
The landscape of existing technologies focuses on providing a Metrics Store. This is also similar to a Feature Store for standardizing features used in ML projects. The technologies aim to automate the following key tasks within this milestone:
Standardizing metrics computation: Pipelines extract the data from the source datastores and transform them into metrics. These pipelines have multiple transformations and need to handle corner cases that arise in production. Managing large datasets at scale requires distributed programming optimizations for scaling and performance.
Backfilling metrics: Whenever the business definition changes, a data backfill is required for calculating the new metric values for historic data.
Tracking business definitions: A version-controlled repository of business definitions ensures there is a single source of truth. Instead of implementing one-off and inconsistent metrics, a Domain Specific Language (DSL) is used to define metrics and dimensions. The DSL gets translated into an internal representation, which can then be compiled into SQL and other big data programming languages. This approach makes adding new metrics lightweight and self-service for a broad range of users.
Serving metrics offline and online: Metrics can be consumed offline (batch computed) or online (computed in real-time).
Cataloging business vocabulary: Organizing data objects and metrics in a business-intuitive hierarchy. The business rules for generating the metrics from the raw datasets are also captured.
Ensuring operational robustness: Handling scenarios such as uncoordinated source schema changes, changes in data element properties, ingestion issues, source and target systems with out-of-sync data, processing failures, and incorrect business definitions for generating metrics. Automation tracks changes to metrics over time and alerts metric owners when things change. If the lineage or definition of a metric changes, the owner is immediately notified and can see why and how the change occurred. If a spike or a dip is present in the data, the owner already knows it happened and has answers for stakeholders. No more tracking down anomalies in the data warehouse.
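As a rough sketch of the DSL idea for tracking business definitions, here is a toy metric definition compiled into SQL; the metric name, table, and fields are hypothetical, and a real metrics store would add versioning, lineage, and validation on top.

```python
# Hypothetical metric definitions; a real metrics store would version-control these.
METRICS = {
    "weekly_active_users": {
        "table": "events",
        "measure": "COUNT(DISTINCT user_id)",
        "filter": "event_type = 'login'",
    },
}

def compile_metric(name, dimensions=()):
    """Translate a declarative metric definition into a SQL query (simplified)."""
    m = METRICS[name]
    select = list(dimensions) + [f"{m['measure']} AS {name}"]
    sql = f"SELECT {', '.join(select)} FROM {m['table']}"
    if m.get("filter"):
        sql += f" WHERE {m['filter']}"
    if dimensions:
        sql += f" GROUP BY {', '.join(dimensions)}"
    return sql

print(compile_metric("weekly_active_users", ["country"]))
```

Because every consumer compiles from the same definition, dashboards and models compute the metric identically, which is the single-source-of-truth property described above.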
Depending on your current state, look for solutions in the landscape that get you to the next higher level in the crawl, walk, run hierarchy. Start with tracking business definitions and move higher.
Prep Phase
4. Wrangle
Data is seldom in the exact form required for consumption; it needs to be transformed via an iterative process of curating errors, outliers, and missing values, imputing values, balancing data, and encoding data. Applying wrangling transformations requires writing idiosyncratic scripts in programming languages like Python, Perl, and R, or engaging in tedious manual editing using tools like Microsoft Excel. Given the growing volume, velocity, and variety of the data, data users need low-level coding skills to apply the transformations at scale in an efficient, reliable, and recurring fashion. They also need to operate these transformations reliably on a day-to-day basis and proactively prevent transient issues from impacting data quality.
The landscape of existing technologies focuses on simplifying the process of wrangling for technical and business users. The technologies aim to automate the following key tasks within this milestone:
Scoping: The metadata catalog is used to understand the properties of data and schema and the wrangling transformations required for analytic explorations. It is difficult for non-expert users to determine the required transformations. The process also involves record matching—i.e., finding the relationship between multiple datasets, even when those datasets do not share an identifier or when the identifier is not very reliable.
Validating: There are multiple dimensions of validation, including verifying whether the values of a data field adhere to syntactic constraints like Boolean true/false as opposed to 1/0. Distributional constraints verify value ranges for data attributes. Cross-attribute checks verify cross-database referential integrity—for example, a credit card updated in the customer database being correctly updated in the subscription billing database.
Structuring: Data comes in all shapes and sizes. There are different data formats that may not match the requirements for downstream analysis. For example, a customer shopping transaction log may have records with one or more items while individual records of the purchased items might be required for inventory analysis. Another example is standardizing particular attributes like zip codes, state names, and so on. Similarly, ML algorithms often do not consume data in raw form and typically require encoding, such as categories encoded using one-hot encoding.
Cleaning: There are different aspects of cleaning. The most common form is removing outliers, missing values, null values, and imbalanced data that can distort the generated insights. Cleaning requires knowledge about data quality and consistency—i.e., knowing how various data values might impact your final analysis. Another aspect is the deduplication of records within the dataset.
Enriching: This involves joining with other datasets, such as enriching customer profile data. For instance, agricultural firms may enrich production predictions with weather information forecasts. Another aspect is deriving new forms of data from the dataset.
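The cleaning and structuring tasks above can be sketched in a few lines, assuming simple dictionary records with a hypothetical `state` field; a real wrangling tool would do this at scale and recommend the transformations interactively.

```python
def clean(records, field_name):
    """Drop records with missing values and deduplicate exact copies (simplified)."""
    seen, out = set(), []
    for r in records:
        if r.get(field_name) is None:
            continue  # missing value: drop rather than distort downstream insights
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out

def one_hot(records, field_name):
    """Encode a categorical field as 0/1 indicator columns for ML consumption."""
    categories = sorted({r[field_name] for r in records})
    for r in records:
        for c in categories:
            r[f"{field_name}_{c}"] = 1 if r[field_name] == c else 0
    return records

rows = [{"state": "CA"}, {"state": "NY"}, {"state": None}, {"state": "CA"}]
print(one_hot(clean(rows, "state"), "state"))
```

Enrichment would follow the same pattern: joining each cleaned record against an external dataset keyed on a shared attribute.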
Depending on your current state, look for solutions in the landscape that get you to the next higher level in the crawl, walk, run hierarchy. Start with basic exploratory tools, then move higher to interactive and ML-based recommendations.
5. Govern
There is a growing number of data rights regulations like GDPR, CCPA, the Brazilian General Data Protection Act, the India Personal Data Protection Bill, and several others. These laws require customer data to be collected, used, and deleted based on customer preferences. Data scientists and other data users want an easy way to locate all the available data for a given use case without having to worry about compliance violations. Data engineers have to ensure they have located all the customer data copies correctly and execute the rights of users in a comprehensive, timely, and auditable fashion. In addition to compliance, ensuring the right level of access control for the datasets is required.
The landscape of existing technologies focuses on simplifying data discovery, classification, access control, and enforcement of data rights requests. They aim to automate the following tasks:
Enforcing data rights for data deletion: Delete personal data from backups and third parties when it's no longer necessary or when consent has been withdrawn. You need the ability to delete a specific subset of data or all data associated with a specific customer from all systems. Given immutable storage formats, erasing data is difficult and requires an understanding of formats and namespace organization. The deletion operation has to cycle through the datasets asynchronously within the compliance SLA without affecting the performance of running jobs. Records that can't be deleted need to be quarantined and the exception records need to be manually triaged. This processing needs to scale to tens of PBs of data as well as for third-party data.
Enforcing customer preferences for the usage of data: Manage the customer's preferences for the data to be collected, behavioral data tracking, use cases for the data, and "do not sell" preferences. There are different levels of access restrictions, ranging from basic restriction (where access to a dataset is based on business need), to privacy by default (where data users shouldn't get raw PII access by default), to consent-based access (where access to data attributes is only available if the user has consented for the particular use case).
Tracking the lifecycle of customer data: This includes tracking how data is collected from the customer, how the data is stored and identified, how the customer preferences are persisted, how data is shared with third-party processors, and how data is transformed by different pipelines.
Continuously discovering usage violations: For example, datasets containing PII or highly confidential data that are incorrectly accessible to specific data users or use cases. This also covers discovering datasets that have not been purged within the SLA, or requests from unauthorized roles.
Ensuring authentication, access control, and audit tracking: Independent of compliance requirements, ensuring data access is secure and verified is critical. This also involves creating audit rules.
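A minimal sketch of the consent-based access levels described above; the consent records, attribute classifications, and use-case names are hypothetical, and a real system would evaluate these policies at query time against governed catalogs.

```python
# Hypothetical consent and classification records.
CONSENT = {"user_42": {"marketing_analytics"}}             # use cases the user consented to
CLASSIFICATION = {"email": "pii", "page_views": "behavioral"}

def can_access(user_id, attribute, use_case, role_has_pii_clearance=False):
    """Consent-based access: PII requires role clearance AND user consent for the use case."""
    if CLASSIFICATION.get(attribute) == "pii":
        if not role_has_pii_clearance:
            return False  # privacy by default: no raw PII access
        return use_case in CONSENT.get(user_id, set())
    return True  # non-PII attributes fall back to basic business-need restriction

print(can_access("user_42", "email", "marketing_analytics"))        # False: no clearance
print(can_access("user_42", "email", "marketing_analytics", True))  # True: clearance + consent
```

The same lookup tables can drive violation discovery: scan access logs for reads where `can_access` would have returned False.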
Depending on your current state, look for solutions in the landscape that get you to the next higher level in the crawl, walk, run hierarchy.
6. Model
While data lakes have become popular as central data warehouses, they lack support for traditional data modeling and life cycle management tasks. Today, multiple workarounds need to be built, leading to several pain points. First, primitive data life cycle tasks have no automated APIs and require engineering expertise for reproducibility and rollback, provisioning data-serving layers, and so on. Second, application workarounds are required to accommodate the lack of consistency in the lake for concurrent read-write operations. Also, incremental updates, such as deleting a customer's records for compliance, are highly nonoptimized. Third, unified data management combining stream and batch is not possible.
Solutions in this space address these pain points and simplify the organization, sharing, and data management tasks. In particular, they automate the following tasks:
Organizing namespace zones: Within a data lake, zones allow the logical and/or physical separation of data. The namespace can be organized into many different zones based on the current workflows, data pipeline process, and dataset properties. A typical namespace configuration, used by most enterprises in some shape or form, keeps the lake secure, organized, and agile (as shown in the figure).
Managing specialized data-serving layers: Data persisted in the lake can be structured, semi-structured, and unstructured. For semi-structured data, there are different data models such as key-value, graph, document, and so on. Depending on the data model, an appropriate datastore should be leveraged for optimal performance and scaling.
Data rollback and versioning: Data pipelines can write bad data for downstream consumers because of issues ranging from infrastructure instabilities to messy data to bugs in the pipeline. For pipelines with simple appends, rollbacks are addressed by date-based partitioning. When updates and deletes to previous records are involved, rollback becomes very complicated, requiring data engineers to deal with such complex scenarios. Versioning is required for exploration, model training, and resolving corruption due to failed jobs that have left data in an inconsistent state, resulting in a painful recovery process.
Managing data partitions: Large tables are inefficient to analyze. Partitioning helps shard the data based on attributes such as time to allow distributed processing.
Incremental data updates: Big data formats were originally designed for immutability. With the emergence of data rights compliance, where customers can request that their data be deleted, updating lake data has become a necessity. Because of the immutability of big data formats, deleting a record translates to reading all the remaining records and writing them in a new partition. Given the scale of big data, this can create significant overhead. A typical workaround today is to create fine-grained partitions to speed up the rewriting of data. Solutions such as Databricks Delta Lake, Apache Hudi (Hadoop Upserts Deletes and Incrementals), and Apache Iceberg enable applying mutations to data in HDFS on the order of a few minutes.
ACID transactions: Implementing atomicity, consistency, isolation, durability (ACID) transactions on the data lake. Today, this is accomplished by painful workarounds.
Data sharing: Sharing datasets both internally and externally is an operational nightmare requiring specialized data teams.
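A toy sketch of date-based partitioning with snapshot rollback, illustrating the ideas above; real lake formats such as Delta Lake or Hudi implement this with transaction logs on durable storage rather than in-memory copies, and the dates and rows here are hypothetical.

```python
import copy

class PartitionedTable:
    """Date-partitioned table with snapshot-based rollback (simplified sketch)."""
    def __init__(self):
        self.partitions = {}   # date -> list of rows
        self.versions = []     # snapshots taken before each commit

    def commit(self, date, rows):
        # snapshot current state so the commit can be undone
        self.versions.append(copy.deepcopy(self.partitions))
        self.partitions.setdefault(date, []).extend(rows)

    def rollback(self):
        # undo the last commit, e.g., after a bad pipeline run
        self.partitions = self.versions.pop()

t = PartitionedTable()
t.commit("2022-01-01", [{"id": 1}])
t.commit("2022-01-02", [{"id": 2, "bad": True}])  # bad data from a buggy job
t.rollback()
print(sorted(t.partitions))  # ['2022-01-01']
```

The retained snapshots double as versions for exploration and model training, which is why rollback and versioning are listed together above.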
Depending on your current state, look for solutions in the landscape that get you to the next higher level in the crawl, walk, run hierarchy. Namespace management within the data lake is the essential starting point for most organizations.
Build Phase
7. Process
This milestone focuses on effectively executing SQL queries and big data programs at large scale. There are three trends that need to be taken into account for self-service processing of data. First is the polyglot data models associated with the datasets. For instance, graph data is best persisted and queried in a graph database. Similarly, there are other models, namely key-value, wide-column, document, and so on. Polyglot persistence is applicable both for lake data as well as application transactional data. Second, the decoupling of query engines from data storage persistence allows different query engines to run queries on data persisted in the lake. For instance, short, interactive queries are run on Presto clusters, whereas long-running batch processes are on Hive or Spark. Typically, multiple processing clusters are configured for different combinations of query workloads. Selecting the right cluster types is key. Third, for a growing number of use cases like real-time BI, the data in the lake is joined with the application sources in real-time. As insights generation becomes increasingly real-time, there is a need to combine historic data in the lake with real-time data in application datastores.
The landscape of solutions focuses on specialized query engines, unifying batch and stream processing, and processing structured, semi-structured, and unstructured data across polyglot stores. The solutions automate the following tasks related to this milestone:
Simplify running query processing (batch, interactive, streaming, real-time) engines: Allows running the transformation logic as batch, streaming, or interactive depending on the requirements of the use case. This involves automating the routing of queries across clusters and query engines. The routing is based on tracking the static configuration properties (such as the number of cluster nodes and hardware configuration, namely CPU, disk, storage, and so on) as well as the dynamic load on the existing clusters (average wait time, distribution of query execution times, and so on).
Provide federated query support: Analyzing and joining data residing across different datastores in the lake as well as application microservices. Data is typically spread across a combination of relational databases, nonrelational datastores, and data lakes. Some data may be highly structured and stored in SQL databases or data warehouses. Other data may be stored in NoSQL engines, including key-value stores, graph databases, ledger databases, or time-series databases. Data may also reside in the data lake, stored in formats that may lack schema or that may involve nesting or multiple values (e.g., Parquet and JSON).
Execute batch+streaming queries: As insights are becoming real-time and predictive, they need to analyze both the ongoing stream of updates and the historic data tables. Data users can access the combined streaming and batch data through existing queries with time-window functions. This allows for processing data continuously and incrementally as new data arrives, without having to choose between batch and streaming. Streaming data ingest, batch historic backfill, and interactive queries need to be implemented. Another pattern is appending streaming events to batch tables, allowing data users to simply leverage the existing queries on the table.
Scaling the processing logic: Data users are not engineers. There is a learning curve to efficiently implementing the primitives (aggregates, filters, group by, etc.) in a scalable fashion across different systems. To increase productivity, there is a balance required between low-level and high-level business logic specifications. The low-level constructs are difficult to learn, while the high-level constructs need to be appropriately expressive.
Search and analyze unstructured and semi-structured data: Support for log analytics involving searching, analyzing, and visualizing machine data.
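The batch+streaming pattern can be illustrated with a single time-window aggregation applied to both historic and newly arriving events; the timestamps and event names are hypothetical, and engines like Spark Structured Streaming apply the same idea incrementally at scale.

```python
from collections import defaultdict

def window_counts(events, window_seconds=3600):
    """Aggregate (timestamp, key) events into fixed time windows."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)

# Historic (batch) events plus newly arriving (streaming) events
batch = [(3600, "login"), (3700, "login")]
stream = [(3900, "login"), (7300, "login")]

# The same window function serves both, so no separate batch and streaming code paths
combined = window_counts(batch + stream)
print(combined)  # {(3600, 'login'): 3, (7200, 'login'): 1}
```

In an incremental setting, only the windows touched by new events would be recomputed rather than the whole history.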
Depending on your current state, look for solutions in the landscape that get you to the next higher level in the crawl, walk, run hierarchy.
8. Visualize
Visualization is a key approach for analysis and decision-making, especially for business users. Visualization tools come with a few challenges that make it difficult to self-serve. First, visualization is difficult given multiple dimensions and growing scale. For large datasets, enabling rapid-linked selections like dynamic aggregate views is challenging. Second, different types of visualizations are best suited to different forms of structured, semi-structured, and unstructured data. Too much time is spent manipulating data just to get analysis and visualization tools to read it. Third, it is not easy for visualization tools to help reason with dirty, uncertain, or missing data. Automated methods can help identify anomalies, but determining the error is context-dependent and requires human judgment. While visualization tools can facilitate this process, analysts must often manually construct the necessary views to contextualize anomalies, requiring significant expertise.
Available tools in the landscape aim to address these pain points by providing no-code features for data slice-&-dice, descriptive and statistical analysis, ML-based predictive analysis, and forecasting. They automate the following tasks related to this milestone:
Visual representation: Enabling data analysis with charts, graphs, histograms, and other visual representations.
Visual storytelling: Using visual storytelling to share, communicate, and collaborate on insights in the flow of analysis.
Reporting: Sharing data analysis to stakeholders for decision-making.
Descriptive analytics: Using preliminary data analysis to find out what happened.
Statistical analysis: Exploring the data using statistics to understand how a trend happened and why.
NLP-based summarization: Using natural language processing to match data fields and attributes and describe the contents in a data source. This can help teams understand what the data is telling them, reducing incorrect assumptions.
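A minimal sketch of the descriptive statistics that typically sit behind such dashboard summaries, using Python's standard library; the daily call-volume numbers are hypothetical.

```python
import statistics

def describe(values):
    """Descriptive statistics of the kind surfaced by a dashboard summary panel."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }

call_volumes = [120, 135, 128, 410, 122]  # hypothetical daily support-call counts
summary = describe(call_volumes)
print(summary["median"])  # 128
```

Here the gap between the mean and the median already hints at the outlier (410), the kind of anomaly that statistical analysis then explains.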
Depending on your current state, look for solutions in the landscape that get you to the next higher level in the crawl, walk, run hierarchy.
9. Orchestrate
Orchestrating job pipelines for data processing and ML has several pain points. First, defining and managing dependencies between the jobs is ad hoc and error-prone.
Data users need to specify these dependencies and version-control them through the life cycle of the pipeline evolution. Second, pipelines invoke services across ingestion, preparation, transformation, training, and deployment. Monitoring and debugging pipelines for correctness, robustness, and timeliness across these services are complex. Third, the orchestration of pipelines is multitenant, supporting multiple teams and business use cases. Orchestration is a balancing act in ensuring pipeline SLAs and efficient utilization of the underlying resources.
The solutions in the technology landscape aim to simplify the dependency authoring of pipelines and make the execution self-healing. They aim to automate the following tasks associated with this milestone:
Authoring job dependencies: In an end-to-end pipeline, the job dependencies are represented as a DAG (Directed Acyclic Graph). Missing dependencies can lead to incorrect insights and are a significant challenge in production deployments. Changes in dependencies alongside changes in code are difficult to version-control; even when a dependent job has completed, it may have failed to process the data correctly. In addition to knowing the dependent jobs, production deployments need ways to verify the correctness of the previous steps (i.e., they need circuit breakers based on data correctness). The job dependencies are not constant but evolve during the pipeline life cycle. For instance, a change in a dashboard may create a dependency on a new table that is being populated by another job. The dependency needs to be updated appropriately to reflect the dependency on the new job.
Orchestrating synchronous and asynchronous job types: Orchestrating a wide variety of specialized jobs, such as ingestion, real-time processing, ML constructs, and so on. Deep integration to service-specific APIs can improve job execution and monitoring compared to executing as a vanilla shell request.
Checkpointing job execution: For long-running jobs, checkpointing can help recover the jobs instead of restarting. Checkpointing can also help reuse previous results if the job is invoked without any change in data. Typically, if there are long-running jobs with strict SLAs, checkpointing is a must-have.
Scaling of resources: The hardware resources allocated to the orchestrator should be able to auto-scale based on the queue depth of the outstanding requests. This is typically applicable in environments with varying numbers and types of pipelines such that static cluster sizing is either not performant or wasteful with respect to resource allocation.
Automatic audit and backfill: Configuration changes associated with pipeline orchestration, such as editing connections, editing variables, and toggling workflows, need to be saved to an audit store that can later be searched for debugging. For environments with evolving pipelines, a generic backfill feature lets data users create and easily manage backfills for any existing pipeline.
Distributed execution: Jobs are executed on a distributed cluster of machines allocated to the orchestrator. The pipeline DAGs are continuously evaluated; applicable jobs across multiple tenants are queued up for execution and scheduled in a timely fashion to ensure SLAs. The orchestrator scales the underlying resources to match execution needs, performing a balancing act between pipeline SLAs, optimal resource utilization, and fairness in resource allocation across tenants. Distributed resource management is time-consuming due to a few challenges. First, isolation must be ensured across multiple tenants such that a slowdown in one job doesn't block other unrelated jobs on the same cluster. Second, as the number of pipelines increases, a single scheduler becomes the bottleneck, causing long wait times for jobs to be executed; partitioning the jobs across parallel schedulers allows scaling across the available resources. Third, given the heterogeneous nature of the jobs, there's a need to leverage a range of custom executors for data movement, schema services, processing, and ML tasks. In addition to resource management, job execution needs to handle appropriate retries on errors, and jobs need to be recovered when machines crash. Finally, on orchestrator failure, execution needs to fail over via leader election and continue; remembering the state of the pipeline for restart is critical.
Production monitoring and alerting: Once a pipeline is deployed, it needs to be monitored to ensure SLAs and to proactively alert on issues. In production, several issues can arise, from job errors to underlying hardware problems, and detecting them proactively is critical to meeting SLAs. Trend analysis helps uncover anomalies proactively, while fine-grained monitoring combined with logging helps distinguish between a long-running job and a stalled job that is not making progress due to errors. Monitoring pipeline orchestration in production is complex: debugging for root-cause analysis requires understanding and correlating logs and metadata across multiple systems.
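The long-running-versus-stalled distinction above can be sketched with a progress heartbeat; the field names and the ten-minute threshold are illustrative assumptions.

```python
def classify_job(job, now, stall_threshold_s=600):
    """Distinguish a long-running job (still emitting progress heartbeats)
    from a stalled job (running, but no recent progress).

    `job` is a dict with `finished` (bool) and `last_progress_ts`
    (epoch seconds of the most recent heartbeat) -- hypothetical fields
    standing in for whatever a real orchestrator records."""
    if job["finished"]:
        return "done"
    if now - job["last_progress_ts"] > stall_threshold_s:
        return "stalled"  # alert: consuming resources but making no progress
    return "running"      # slow but healthy; no alert needed
```

The key design point is that the alert keys off time since last *progress*, not time since job *start*, so a legitimately long job never pages anyone.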
Depending on your current state, look for solutions in the landscape that get you to the next higher level in the crawl, walk, run hierarchy. Focusing on simplifying dependency authoring and tracking is the key starting point.
Operationalize Phase
10. Deploy
Data applications and pipelines continuously evolve to accommodate changing data schemas and business logic. The process of integrating, testing, deploying, and monitoring post-deployment is often bottlenecked by the data platform team. One growing trend (known as "reverse ETL") is to deploy processed insights from a system of record, like a warehouse, to systems of action, like CRM, ERP, and other SaaS apps, to operationalize the data.
The tools in the technology landscape aim to automate the following tasks:
Testing ETL changes: Feature pipelines are written as ETL code that reads data from different data sources and transforms it into features. ETL code evolves continuously. Some common scenarios are moving to new versions of data processing frameworks like Spark, rewriting from Hive to Spark to improve performance, changes to the source schema, and so on. ETL changes need to be validated for correctness using a comprehensive suite of unit, functional, regression, and integration tests. These tests ensure the pipeline code is robust and operates correctly for corner cases. As a first step in the integration process, unit tests and a golden suite of integration tests are run. These are also referred to as smoke tests, as they compare results against sample input-output data. Ideally, integration tests should use actual production data to test both robustness and performance; otherwise, scaling issues or inefficient implementations often go undetected until production. Today, tests can be written as a part of the code or managed separately. Additionally, if the features are consumed for generating a business metrics dashboard, data users need to verify the correctness of the results (this is known as user acceptance testing). The approach today is ad hoc, and the validation is typically done using small samples of data that aren't representative of production data.
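A golden smoke test can be as simple as the following sketch; the toy transform and its expected output are illustrative assumptions, standing in for real pipeline code and golden data committed alongside it.

```python
def transform(rows):
    """Toy ETL step (hypothetical): drop invalid rows, derive a total column."""
    return [
        {**r, "total": r["qty"] * r["price"]}
        for r in rows
        if r["qty"] > 0
    ]

# Small fixed input and its expected ("golden") output, version-controlled
# with the ETL code so any behavior change in a PR fails the smoke test.
GOLDEN_INPUT = [
    {"qty": 2, "price": 5.0},
    {"qty": 0, "price": 9.0},  # invalid row, should be dropped
]
GOLDEN_OUTPUT = [{"qty": 2, "price": 5.0, "total": 10.0}]

def test_transform_smoke():
    assert transform(GOLDEN_INPUT) == GOLDEN_OUTPUT
```

Running this on every commit catches regressions cheaply; the heavier integration tests on production-scale data then run less frequently.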
Validating schema change impact: Data source owners make changes to their source schema and typically do not coordinate with downstream ML pipeline users. These issues are typically detected in production and can have a significant impact. As a part of change tracking, source schema changes need to be detected and trigger the continuous integration service to validate the effect of these changes proactively.
Creating sandbox test environments: Spinning up multiple concurrent environments to smoke-test code changes to data applications and pipelines.
Supporting shadow mode deployment: This mode captures the inputs and inference of a new pipeline logic in production without actually serving the insights. The results can be analyzed with no significant consequences if a bug is detected.
Canary model deployment: Canary testing allows you to validate a new release with minimal risk by deploying it first for a fraction of your users. It requires mature deployment tooling, but it minimizes the impact of mistakes when they happen. The incoming requests can be split in many ways to determine whether they will be serviced by the old or new model: randomly, based on geolocation or specific user lists, and so on. There is a need for stickiness; i.e., for the duration of the test, designated users must be routed to servers running the new release. This can be achieved by setting a specific cookie for these users, allowing the web application to identify them and send their traffic to the proper servers.
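Sticky canary routing can also be achieved without cookies by hashing a stable user identifier, as in this sketch (the 5% default split is an illustrative assumption):

```python
import hashlib

def route(user_id, canary_fraction=0.05):
    """Deterministically route a user to the canary or stable release.

    Hashing the user id gives stickiness: the same user always lands on
    the same release for the duration of the test, with no server-side
    session state required."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    return "canary" if bucket < canary_fraction * 10_000 else "stable"
```

Ramping the rollout is then just raising `canary_fraction`; users already in the canary bucket stay there, so no one flip-flops between model versions mid-test.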
Depending on your current state, look for solutions in the landscape that get you to the next higher level in the crawl, walk, run hierarchy.
11. Observe
The goal of observability is to ensure that Big Data applications and pipelines complete within performance SLAs and cost budgets, and generate reliable results. Data users aren't engineers, which leads to several pain points in writing performant and cost-effective queries and programs. First, query engines like Hadoop, Spark, and Presto have a plethora of knobs. Understanding which knobs to tune and their impact is nontrivial for most data users and requires a deep understanding of the inner workings of the query engines. There are no silver bullets: the optimal knob values for a query vary based on data models, query types, cluster sizes, concurrent query load, and so on. Given the scale of data, a brute-force approach of experimenting with different knob values is not feasible either. Second, given the petabyte (PB) scale of data, writing queries optimized for distributed data processing best practices is difficult for most data users. Often, data engineering teams have to rewrite the queries to run efficiently in production. Most query engines and datastores have specialized query primitives that are specific to their implementation; leveraging these capabilities requires a learning curve across a growing number of technologies. Third, query optimization is not a one-time activity but is ongoing based on the execution pattern. The query execution profile needs to be tuned based on runtime properties in terms of partitioning, memory and CPU allocation, and so on. Query tuning is an iterative process with decreasing benefits after the initial few iterations targeting low-hanging optimizations.
Another key aspect of observability is cost. With data democratization, where data users can self-serve the journey to extract insights, there is a risk of wasted resources and unbounded costs; data users often spin up resources and then leave them underutilized. A single bad query running on high-end GPUs can accumulate thousands of dollars in a matter of hours, typically to the surprise of the data users. Cost management provides the visibility and controls needed to manage and optimize costs. It answers questions such as: How many dollars are spent per application? Which teams are projected to spend more than their allocated budgets? Are there opportunities to reduce spending without impacting performance and availability? Are the allocated resources utilized appropriately?
Finally, several things can go wrong and lead to data quality issues: uncoordinated source schema changes, changes in data element properties, ingestion issues, source and target systems with out-of-sync data, processing failures, incorrect business definitions for generating metrics, and so on. Tracking quality in production pipelines is complex. First, there is no end-to-end, unified, standardized tracking of data quality across the multiple sources in a data pipeline, which results in long delays in identifying and fixing data quality issues. There is also no standardized platform, so teams have to deploy and manage their own hardware and software infrastructure to address the problem. Second, defining quality checks and running them at scale requires a significant engineering effort. For instance, a personalization platform requires data quality validation of millions of records each day. Currently, data users rely on one-off checks that do not scale with large volumes of data flowing across multiple systems. Third, it's important not just to detect data quality issues, but also to prevent low-quality data records from mixing with the rest of the dataset partitions.
The tools in the technology landscape aim to address these pain points by providing a single pane of glass for observability. They aim to automate the following tasks related to observability:
Avoiding Cluster Clogs: Consider the scenario of a data user writing a complex query that joins tables with billions of rows on a nonindexed column value. While issuing the query, the data user may be unaware that this may take several hours or days to complete. Also, other SLA-sensitive query jobs can potentially be impacted. This scenario can occur during the exploration and production phases. Poorly written queries can clog the cluster and impact other production jobs.
Resolving Runtime Query Issues: An existing query may stop working and fail with out-of-memory (OOM) issues. A number of scenarios can arise at runtime, such as failures, stuck or runaway queries, SLA violations, changed configuration or data properties, or a rogue query clogging the cluster. There can be a range of issues to debug, such as container sizes, configuration settings, network issues, machine degradation, bad joins, bugs in the query logic, unoptimized data layout or file formats, and scheduler settings.
Speeding Up Applications: An increasing number of applications deployed in production rely on the performance of data queries. Optimizing these queries in production is critical for application performance and responsiveness for end users. Also, the development of data products requires interactive ad hoc queries during model creation, which can benefit from faster query runs during exploration phases.
Automatically Tuning Queries: For common scenarios, queries can be automatically tuned, with recommendations for improving a query based on the right primitives, table cardinalities, and other heuristics.
Monitoring Cost Usage: Cloud processing accounts are usually set up by data engineering and IT teams. A single processing account supports multiple different teams of data scientists, analysts, and users. The account hosts either shared services used by multiple teams (interleaving of requests) or dedicated services provisioned for apps with strict performance SLAs. Budgets are allocated to each team based on business needs. Data users within these teams are expected to stay within their monthly budget and ensure the queries deliver the appropriate cost-benefit. This presents multiple challenges. In a democratized platform, it is important for users to also be responsible for their allocated budgets and to be able to make trade-off decisions between budget, business needs, and processing cost. Providing cost visibility to data users is not easy for shared services; ideally, a user should be able to get the predicted cost of processing or training at the time they issue the request. Resources spun up by teams are often not tagged, making accountability difficult. A lack of knowledge of the appropriate instance types, such as reserved versus on-demand versus spot compute instances, can lead to significant wasted money.
Continuous Cost Optimization: There are several big data services in the cloud that have different cost models. Data users perform two phases of cost optimizations. The first phase takes place at the time of designing the pipeline. Here, options are evaluated for available pay-as-you-go models that best match the workload and SLA requirements. The second phase happens on an ongoing basis, analyzing the utilization and continuously optimizing the configuration.
Automating Quality Observability: Analyzing data attributes for anomalies, debugging the root cause of detected quality issues, and proactively preventing low-quality data from impacting the insights in dashboards and models. Done manually, these tasks slow down the overall time to insight associated with the pipelines.
Data profiling and anomaly detection: Statistical analysis and assessment of data values within the dataset and pre-built algorithmic functions to identify events that do not conform to an expected pattern in a dataset (indicating a data quality problem). Assessment of a dataset’s accuracy made using absolute rules based on the data schema properties, value distributions, or business-specific logic.
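A minimal sketch of the anomaly-detection idea using z-scores; the three-sigma threshold is a common but illustrative choice, and production profilers track many more statistics per column.

```python
from statistics import mean, stdev

def zscore_anomalies(values, threshold=3.0):
    """Flag values whose z-score exceeds the threshold -- a simple
    statistical check for records that don't fit the expected pattern
    of the dataset."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # constant column: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]
```

Absolute rule-based checks (schema types, allowed value ranges, business logic) complement this statistical profiling; each catches problems the other misses.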
Proactive problem avoidance: Measures to prevent low-quality data records from mixing with the rest of the dataset.
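One way to sketch such a measure: run every record through a set of named checks and quarantine the failures with their reasons, so only clean records reach the published partitions. The check names and record shapes are illustrative assumptions.

```python
def partition_by_quality(records, checks):
    """Route records failing any quality check into quarantine so
    low-quality rows never mix into the published dataset partitions.

    `checks` is a list of (name, predicate) pairs; a record is clean
    only if every predicate passes."""
    clean, quarantined = [], []
    for rec in records:
        failures = [name for name, check in checks if not check(rec)]
        if failures:
            quarantined.append({"record": rec, "failed_checks": failures})
        else:
            clean.append(rec)
    return clean, quarantined
```

Keeping the failure reasons with each quarantined record makes the later debugging step (why did quality drop today?) a query instead of a forensic exercise.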
Depending on your current state, look for solutions in the landscape that get you to the next higher level in the crawl, walk, run hierarchy. Observability has evolved in phases: the initial solutions focused on "what" is going on by tracking key metrics. The next generation focused on "why" it is going on by correlating metrics and logs. With advancements in AI/ML, we now have Observability 3.0 solutions that combine the "what," the "why," and the "how" of resolving the problem.
12. Experiment
A new data product or ML model is typically not rolled out across the entire population of users. A/B testing (also known as bucket testing, split testing, or controlled experimentation) is becoming a standard approach for evaluating user satisfaction with a product change, a new feature, or any hypothesis related to product growth. It helps compare the performance of different versions of the same feature while monitoring high-level metrics like click-through rate (CTR), conversion rate, and so on.
In A/B testing, the incoming requests can be split in many ways to determine whether they will be serviced by the old or new model: randomly, based on geolocation or specific user lists, and so on. One of the key ingredients is collecting, analyzing, and aggregating behavioral data, known as clickstream data. A clickstream is a sequence of events that represent visitor actions within the application or website. It includes clicks, views, and related context, such as page load time and the browser or device used by the visitor. Clickstream data is critical for business process insights like customer traffic analysis, marketing campaign management, market segmentation, sales funnel analysis, and so on. It also plays a key role in analyzing the product experience, understanding user intent, and personalizing the product experience for different customer segments. A/B testing uses clickstream data streams to compute business lifts or capture user feedback on new changes to the product or website.
The existing tools in the technology landscape aim to make experimentation and customer behavior analysis turn-key. In particular, they automate the following tasks related to this milestone:
Instrumentation for gathering customer behavioral data: Standardizing the creation of beacons across multiple libraries and collection frameworks. The beacons have to be updated constantly to accommodate third-party integrations, such as email marketing tools, experimentation tools, campaign tools, and so on. The tracking schema typically has inconsistent standards for event properties and attributes, resulting in dirty data.
Creating sessions of raw clickstream events: A session is a short-lived and interactive exchange between two or more devices and/or users—for instance, a user browsing and then exiting the website, or an IoT device periodically waking up to perform a job and then going back to sleep. The interactions result in a series of events that occur in sequence, with a start and an end. In web analytics, a session represents a user's actions during one particular visit to the website. Using sessions enables the answering of questions about things like the most frequent paths to purchase, how users get to a specific page, when and why users leave, whether some acquisition funnels are more efficient than others, and so on. The start and end of a session are difficult to determine and are often defined by a period of inactivity with no relevant events associated with it.
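Sessionization by inactivity gap can be sketched as follows; the 30-minute threshold is the common web-analytics convention, and the `(timestamp, event)` tuples are an illustrative assumption about the raw clickstream shape.

```python
def sessionize(events, gap_seconds=1800):
    """Group one user's clickstream events into sessions: a new session
    starts when the gap since the previous event exceeds the inactivity
    threshold (30 minutes is a common web-analytics default)."""
    sessions, current = [], []
    last_ts = None
    for ts, event in sorted(events):  # order by timestamp
        if last_ts is not None and ts - last_ts > gap_seconds:
            sessions.append(current)  # inactivity gap closes the session
            current = []
        current.append((ts, event))
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions
```

At scale the same logic runs per user key inside a streaming engine's session window, but the gap-based rule is identical.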
Identity stitching: Customers today interact using multiple devices. They may start exploring the website on a desktop machine, continue on a mobile device, and make the purchase decision using a different device. It is critical to know whether this is the same customer or a different one. By tracking all the events in a single pipeline, customer events can be correlated, for example by matching IP addresses. Another example is correlating cookie IDs: when a customer opens an email, the cookie can be tied to a hash of the email address.
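Identity stitching is essentially finding connected components over observed identifier pairs, which a union-find structure captures directly; the identifier formats here are illustrative assumptions.

```python
class IdentityGraph:
    """Union-find over identifiers (cookie ids, email hashes, device ids):
    observing two identifiers together in one event links them to the
    same underlying customer."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def link(self, a, b):
        """Record that identifiers a and b were seen for the same event."""
        self.parent[self.find(a)] = self.find(b)

    def same_customer(self, a, b):
        return self.find(a) == self.find(b)
```

Linking is transitive: stitching a cookie to an email hash, and that email hash to a device id, resolves all three to one customer.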
Bot filtering: Filtering bot traffic from real user activity, especially for use cases predicting the engagement of users in response to product changes.
Summarization of events data over different timeframes: For use cases that vary in their requirements for individual event details versus general user activity trends over longer time periods.
Enriching customer events: To effectively extract insights, clickstream events are enriched with additional context, such as user-agent details like device type, browser type, and OS version. IP2Geo enrichment adds geolocation based on IP address by leveraging lookup services.
Executing experiments: The experiment is kicked off and users are assigned to the control or variant experience. Their interaction with each experience is measured, counted, and compared to determine how each performs. While the experiment is running, the analysis must answer two key questions: a) Is the experiment causing unacceptable harm to users? and b) Are there any data quality issues yielding untrustworthy experiment results?
Analyzing experiment results: The goal is to analyze the difference between the control and variants and determine whether there is a statistically significant difference. Such monitoring should continue throughout the experiment, checking for a variety of issues, including interactions with other concurrently running experiments. During the experiment, based on the analysis, actions can be suggested, such as stopping the experiment if harm is detected, looking at metric movements, or examining a specific user segment that behaves differently from others. Overall, the analysis needs to ascertain if the data from the experiment is trustworthy, and come to an understanding of why the treatment did better or worse than control. The next steps can be a ship or no-ship recommendation or a new hypothesis to test.
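The statistical-significance part of the analysis can be sketched with a pooled two-proportion z-test, a common choice for conversion-rate experiments (real platforms add sequential testing, variance reduction, and multiple-comparison corrections on top):

```python
from math import sqrt

def ab_z_score(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test for an A/B experiment: how many
    standard errors separate the control (a) and variant (b)
    conversion rates. |z| > 1.96 corresponds to p < 0.05 (two-sided)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se
```

For example, 100/1000 conversions in control versus 150/1000 in the variant yields a z-score above 1.96, so the lift would be flagged as statistically significant at the 5% level.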
Depending on your current state, look for solutions in the landscape that get you to the next higher level in the crawl, walk, run hierarchy. Start with basic customer behavior data collection and analysis before moving higher towards advanced experimentation and real-time optimization.
To summarize, you hopefully now have a better understanding of the data platform landscape compared to when you started reading this post! Looking forward to your comments and to suggestions of up-and-coming companies to add to the landscape!
Fantastic job Sandeep Uttamchandani, Ph.D. You have defined the areas really cleanly. This is so hard as many areas overlap. But this is a good perspective and I am thankful you mentioned Flyte.org
Fg777 pro login philippines
Video poker games free download no download for android
Konnyaku jelly ingredients
Ph646bet app
Lucky Tiger 777
21.com casino no deposit bonus
Charge Buffalo free play
Super jili 777 casino Login
Royal 888 casino app
Jili slot 777 free 100
Jilibet promo code 2024 philippines
Jili live app download apk old version
online casino video slot games
Slingo originals free download
Slots the game download
118 jili casino login
Phjl55 philippines
646 jili
Ijility trabaho address new york
Rush Fever 7s Deluxe
Slot machine simulator online
Tetris free
Jili777 online casino login
Winjili ph login registration
Jili 53 casino login download
Y777 jili withdrawal limit
Ijility las vegas warehouse jobs salary
Flush Fever video poker online free
Libreng jili games login registration
ck jili casino
Pay 777 casino login register philippines
Ye7 login philippines
Casino Royale 88 login register
Please complete the required turnover for withdrawal tagalog meaning
Osm Jili Official Website
Hacker keyboard download
Ijility llc milton ga address
Jili999 register philippines download apk
List of Aristocrat slot machines
Transaction password example gcash
SUPERX Casino app
Jili ez apk mod
FBM bingo Pilipino online login
Mnl168 link login
Crown88 login
Sugal777 app apk
megapanalo
Jili update philippines today
Superaccess industrial login
Esball Online Casino com
July 9 bts song
Nexus gaming slot login download
Bingo jili ph download
Tg777aa philippines
Libreng paglalaro ng video poker online app
Lv bet app login
Jili slot machine real money legit
Jili rich download for pc
200 jili casino login register philippines
mayari ng jili
Lucky 777 Login app
Kumuha ng jili app ios apk
188 Jili Casino login Philippines
Hack mines game
Lodi 291 online casino register app
laro ng pera ng dragon
No cash in online casino
Best online casino slots kenya real money
ILI bibliography format
777 casino login register philippines download
Jiliplay 9 today
Jackpot meter jili download apk
Jili 777 lucky slot login register download
30 free slot games online slot machine no deposit philippines
Jiliko casino online games philippines
Bmw casino slot app
Osm jili gcash register online download
Yahoo daily horoscope Scorpio
BET999 Login Register
Dragon Link slots online free download
WINPH com casino
Free slots treasures of egypt no download
X570 AORUS ELITE WIFI price
Kk jili login registration app philippines
Online casino games to win real money philippines
Hot 646 ph online casino register
Mahal si jili casino login register
Lodi 291 online casino games free chips
Tongits offline mod apk
www.scatter slots.com
Casino game real money free play
3rd hand slots
Gamebato alternative
101 jili com login philippines
puwang ng dragon hatch
Pagal Khana Episode 28
Virtual browser online free download
Phlboss888 app for android
slots nigeria
JB Music moa
Crazy 777 jili login download
Yono Slots APK download latest version
Best free online slots fake money no deposit
1xBet online casino free download
Platincasino Deutschland
JILI 646 PH login
Jili 747 casino login register philippines
Zodiac Casino app
Gogo jili App download apk latest version
Play to win Casino registration online real money
Ace demo slot free download
Mahjong ways 2 tricks
Top 10 free online casino games philippines
Side quest ni jill
6bet com redeem code philippines
777 lucky slots casino login
how online casino games work
usajili yanga 2023/24
Okbet 168 login password
Jili 464 login register philippines
Casino frenzy app download for android
Jili games apk old version
Fire Joker free spins no deposit
Manila online casino
Jlbet33 login
60win asia
Free 100 casino 2024
X570 AORUS MASTER drivers
200 JILI cc
Book of ra free game apk
Good Luck Guys Netherlands
Kk jili login registration online 2021
Jilibay pro withdrawal
Baliw 777 jili login download
Chili pepper
Q25 jili login app
Slots of Vegas $300 no deposit bonus codes 2024
Tp777 download apk
Boxing king slot png free download
Coffee jelly ingredients and procedure
magicjili
Best online casino games philippines gcash
Philucky official casino
Jili cc login philippines
Jili lucky slots real money philippines
Jili super ace hack download apk
Jili777 free 100 no deposit bonus Philippines
Asia jili register mobile
Jili games gcash real money
Online casino no minimum deposit philippines gcash
LIMBO Mod APK
Jilibet download app for android latest version
Ano ang ibig sabihin ng time slot brainly
Play Dice and Roll free online kaz
777 casino real money login
Betpawa Games today Football match live
Kirin games online casino download
Www 90 jili com login register
Jili rich login philippines
Betjili bangladeshi saiet login
Dbx777 login philippines registration download
J Jill coupon codes $50 off
Helens 777 Casino login download apk
4 talisman slots elden ring bug
Jili online slots apk latest version
JILI official GCash
Jackpot Party apk
49jili casino official site philippines
Quick hits slots free download apk
Lol646one download
Kkjili com 777 login password
Wow88 malaysia login register
Golden Empire Gcash
Ano ang speed roulette online
Who invented mobile phone in which year
Jili code free 2021
Best slots free
49 jili queens register app
Jili turnover calculator philippines
Jili referencing indian law pdf
Slots 213 apk
Slot Super Ace Jili Games gameplay
Jili gcash register link
Golden empire free demo no deposit
Best slot machines to play at the casino for beginners
49jili vip login download
Electronic Bingo tablets
Jackpot meter slot philippines
Jili city 829 login password
JILI casino PH
Double Ball Roulette rules
49jili casino slots login download
Jili irich bingo app free download
49 jili today philippines login
49jili login to my account register philippines
Love Jili online casino
What day is july 2nd 2024 holiday
How to withdraw jili casino philippines
Helens gogo jili register app
Jili 365 casino login registration philippines
50jili fun withdrawal
Peso 888 register bonus
Espanyol to Tagalog words
Jili tryout free
Pagal Khana Episode 26
Ice wild slot real money
Double Rainbow game cgebet
Jili scatter download
Crazy Hour Watch price
Big bass splash strategy
Jili easy win download apk
Jilibet020 com login Register
FB777 PH login
Maritime Industry Authority function
60 jili login register mobile
Blackjack rules not 21
XXXtreme Lightning Roulette
Bloxflip Mines predictor discord
Sg777 bet login philippines app
99bet app login
Pb777 login register mobile
1xSlots no deposit bonus
Libreng slots treasures of egypt download
Mini777 download apk
Phjl casino app download
365 jili casino login philippines download
July 12 holiday Philippines proclamation
Jili8 COM log in
Super JILI asia
10 online casino games philippines
Okebet168 com login password
Jili7 jili slot register
Get jili app login philippines download
Nakakatawang palaro sa mga bata
vegas7games play online casino games https //m.vegas7games.com
BBM777 free 188
Infinity Games free 100 download
Casino Filipino Coin
El filibusterismo kabanata 30 buod
啶椸ぐ啷嵿ぎ 啶ぞ啶ㄠ 啶膏 啶溹げ啶ㄠ 啶ぐ 啶曕啶ぞ 啶侧啶距え啶?啶氞ぞ啶灌た啶?
Jili178 promotion philippines
Irich bingo slot login
Jili slot 777 real money
88jili login registration
188 jili casino login app download
Xtreme gaming casino login
Best online penny slots real money
Jili online casino apk mod
Euro slot packaging
FF16 Phoenix, Heal Thyself
Lucky Tiger Casino no deposit bonus
Royal777 slot apk
Betso88web login
Dermaplaning powder Spray
Apps na pwedeng kumita ng pera legit 2023
Singilin ang kalabaw jili withdrawal
best online casino games that pay real money
Win99 slots game real money
jili com
Jili online slot real money app
Jelly cubes food
Lodivip4 com login password
Solid bet777 com login philippines
Jigsaw Puzzles - Puzzle Games
Jili opisyal na website login philippines
8k8 online casino games downloadable content philippines
Aceph 99 review
Jili tv login
Pure swerte99 live login register
188 jili
How to get badlands cowboy skin
Demo jili try out apk mod
Jili official website login register
Jili Slot 777 login register online no deposit bonus
Jilibay pro withdrawal
Free 60 pesos online casino
Ano ang pinaka kumikitang diskarte sa baccarat?
Online casino games example for students
Heart of Vegas Slots casino
Cowboy Slots best slots
Ph sabong go perya login registration
S888 org live betting app
218aceph com login register
FC777 register
wow888 casino login
Www jilibet888 com login app
Swcup6 net live login Register
Jili 646 register philippines
Bet88 agent
1p slots Foxy games
Jili777 login register online philippines
Golden Temple JILI Slot
Journal of Tianjin University Science and Technology impact factor
Live casino slots online philippines
Pisobet88 philippines
Is casino legal in India on land
Casino Jackpot Slots early access APK
PG gaming slot login
Jili kilig casino login download
Phl vip slot download
Halimbawa ng online slot na pagsusugal app
online slot machines for fun
Max jili casino login
Zeus casino game free download
Good luck in Hindu
Jilino1aa philippines
GSN Casino free Tokens 2024
Jackpot Wins gift code list today
Phtaya download free
49jili casino games download ios
byu games casino 968 online casino
Lol646pro review
Wagi 777 download for android
yyy777web
49 jili quartz withdrawal
Please complete the required turnover for withdrawal phdream login
Voslot apk download for android
Paano maglaro ng slot88 withdrawal
Ano ang pinakamalakas na kamay sa blackjack cards
Jili jackpot 777 login app download
Jili yes casino login download
XBet app
Tmtplay pro apk
Jili live slot
Deepwoken wiki
Slot machine Plants vs Zombies
Phbwin com login password
Best online casino philippines gcash real money
online casino free games on slots
Jili link casino no deposit bonus
Pasig gems slot register
Baccarat table philippines
Jili 8888 real money login
Casino slot free no deposit
Slots Ninja match bonuses
Tadhana jili slot apk download old version
Turnover not met cannot withdraw amount meaning
How to deposit in philucky Online
How to cash out in JILIBET
Max jili App
joy slots
Taya365 bet
41 jili withdrawal
337 jili com login register mobile
Jili 8998 login register download
Winehq slot online login register
Alberta online casino games no deposit bonus
Jili999 withdrawal fee
Best free online pokie games with free spins
Rummy Culture
Saan maglaro ng baliw na coinflip?
Jilibet download for android
How to make a gel ice pack without rubbing alcohol
177bet cc register
gille helmet full face price
Jili 178 ph register app
Teen Patti Gold old version
Play Dragon Mighty Cash free
s888aa
Ggbet net registration
啶掂啶ぞ啶ぞ啶?啶啶?啶膏か啶侧い啶?啶曕 啶侧た啶?啶曕啶?啶膏ぞ 啶班い啷嵿え 啶оぞ啶班ぃ 啶曕ぐ啷囙
772 pub withdrawal
88JL Login
Qq jili ph register online casino
Jiliasia withdrawal app
Legit online casino games philippines real money
Take Action pill
Slot online game free play no deposit
Yugioh forbidden Memories Ultimate Dragon Ritual
Lucky 778 casino no deposit bonus
Mr Fortune casino login
Gogojili old version
Jili deposit 50 philippines legit
Empire slot machine free chips
9y game city casino real money
Z790 ram slots specs
JILIHOT register download
49 jili tv shows 2021 philippines
Hb888 casino login
royal ace casino "hidden" coupons
Most expensive helmet in the philippines
Dragon Link slot machine app
337 jili live
Zeus casino game free download
PHMACAO apk free download
Mnlwin game login philippines
Poki unblocked github io
J jill promo code free shipping no minimum
Example of TV show in the Philippines
Super PH casino online real money
King game Casino free 100 no deposit bonus
Pragmatikong dula pdf
Dahilan at epekto ng suliranin sa pangingisda
Jili 999 casino login registration download ios
Dream 111 login forgot password
Zili app video download apk latest version
All games free download
Real money online casino Ohio no deposit
Jackpot World free coins code
Kkjili casino login register
Tesla Roadster
Agilaplay login philippines
Egypt slots no deposit bonus codes
Scatter free play
Best slot sites for real money philippines
Yes jili com login registration form download
Boeing aircraft price
God of Wealth slot game
Tesla inventory
Helens 777 Casino login download ios free
Quick hit slots app cheats android
Taya777 bet app
SLOTVIP Download app
Jili reward login app download
Casino score Crazy Time
Jili joy casino login philippines download
777d online casino register
Mga larong wild classic slots sa casino download
Mi777 login password free
Jili188 tw no deposit bonus
Yaman777 download
啶ぞ啶椸啶?啶氞ぎ啶曕ぞ啶ㄠ 啶曕 啶熰啶熰啷?
Online betting casino real money
Vipph casino login
Bet199 APP
DALI 777 Casino legit
S888 org live betting login registration
Tesco Hampers sale
What National Day is July 10
Sizzling sevens slot machine price
Phwin666
Anong uri ng laro ang Dragon Tiger?
Igt slots download
GTA Online slot machine trick
PHLOVE Casino link app
QQ Jili Casino login
E isang verdad traduction english pdf
FF777 Casino Login Register Philippines download
Pinakamahusay na mga site ng slot register
Phbwin com login register mobile
66pgslot
Abc Jili download free
Big win 777 PAGCOR Casino login registration Philippines
Is jp7 still made reddit
Recall balance meaning
Cheat Engine slot
Superball Keno online
Legacy of Dead free spins no deposit
Jili jackpot register mobile
Lodi888 login philippines
Golden empire free demo no deposit
Jollibee philippines menu price
Stake Crash strategy
free buffalo slots
Fortune gems real money philippines
Swerte Win
Jiliko register philippines login download
July 20, 2024 Mike Tyson
Gsn laro sa casino real money
Girl andrew lyrics
Ezjili code free ios
Ano ang diskarte sa power blackjack online
Pb777 login register mobile number
Ace casino real money
Jili isa login registration
Hqwin slot app
568 Slots yono apk download
Lumulutang na dragon megaways demo apk
Lion Slots Free Spins
Jili999 online casino login app philippines legit
100 free spin and win real money
How many days till July 8th
Ano ang pagsusugal
Jili app casino download for android ios
Jiliph club withdrawal
Quick hit slots unlimited coins hack
8m8 casino login register
Starmania slot real money
Yes zili app download apk old version
best online casino games in kenya
Online casino games not real money reddit
Royal fishing demo hack
Gambling online, free
Galaxy casino login philippines
Jili 11 casino login
Pb777 login app download for android
Betso888aa register login
online slot machines nz
Galaxy Casino Frenzy
Panalo99 ph register
milton 888 casino login
RTP Gorilla Kingdom
Videoslots freeroll no deposit bonus
Jilipark login register philippines download
63win withdrawal app
335 jili casino login register
Best alkansya for paper bills
Unli scatter super ace hack download
Jili mine casino login app
Best slot machines to play online
啶班ぞ啶多た 啶班い啷嵿え 啶曕 啶ㄠぞ啶?
free 100 sign up bonus no deposit
55 JILI casino Login
Play Alberta Free Spins
J jill facebook shoes
Fruit Party slot
Khan Sir Railway Book pdf
Which RAM slots to use for 2 sticks
Jlph3333
Pop Slots free chips 4m+ today
Live RTP slot
Jili slot free try out no deposit
Jili 369 login download apk
Halimbawa ng pagganyak sa filipino
Listahan ng laro ng skillz apk download
Super Ace game download
Jili999 login Register philippines download
crown89ph.com net
Slots 555 no deposit bonus
Portuguese to english dictionary
Pragmaticplay com legit
Win99 casino no deposit bonus
Bonus 365 login register mobile
Deli zone menu boulder pdf
Online casino games for real cash philippines
Lvbet com register
Bingo Plus download
Fufafa technology ltd co register
Yes zili app download old version apk
Jili no 1 com withdrawal app
Jili tv casino
Himala director
Tongits online casino
Wild West Gold download
Mnlwin free 100 login
BetOnline Reddit
Nn777 login philippines download
Bmy88 login password
Jili city login password
335 jili casino Login
888 casino - withdrawal problems
5e sorcerer spell slots reddit
Big Bass Splash registration
Jili super ace free play app
Slot synonym and antonym
Jili fun888 login app
Is casino jackpot slots legit for real money
Games for girls 2
Bmy888web app
Jili 365 casino login register download free
C9TAYA Facebook
Lucky wheel spin and win
Get jili app login registration philippines
Royal 888 ph login register download apk
Malaking bass bonus
PG gaming casino login
Lucky jili casino login download no deposit bonus
https://www.dhirubhai.net/feed/update/urn:li:activity:7167204483209412608