Analytics Shift towards new uniformity
Sibendu Nag


Every now and then, we find ourselves surrounded by multiple clickstream datasets in our work, and we often have to compile those datasets into presentations that make them easier to understand and use. The technologies we use today for clickstream analytics do have some amazing BI interfaces; Adobe Analytics Workspace, for example, has set standards around data visualization. However, when it comes to analyzing data across multiple sets of reporting and sources, the far-from-trivial task of data uniformity arises. If the analysis is based on a single source of clickstream data, it is fine to let the BI interface pull and present the data. But what if you need a glance into other sources like offline datasets, mobile applications, kiosks, over-the-top (OTT) platforms, or other technologies like Salesforce, HubSpot, etc.? And yes, we cannot ignore the big limitations, especially around data formats and speed. We also need to understand that when multiple sets of reporting come from technologies like Adobe Analytics, Marketo or Salesforce, it is critical to build connectivity between all the systems to arrive at one single source of truth.

Now, we all know that the clickstream data Adobe Analytics produces is raw and quite huge. If we can map this data to other sources and explore it to its core, it will be of definite use, considering how many more meaningful inferences can be made from it. Almost 95% of the time, the explanation we fall back on is that the clickstream data is too big and raw to work with directly; to come out of that same old excuse, we can definitely leverage Adobe's Data Feeds here.

Today, importing data from a source into a destination is considered one of the more complicated tasks. With a process in place, though, you can easily upload, transform and present a complex set of data to your data processing engine. Recently, I had to work on transferring Adobe Analytics data to Snowflake, and the main challenge has been the fact that Adobe doesn't provide this out of the box, although you could use ETL tools such as Fivetran, Alooma, Stitch, etc. The question, however, is whether you want to go for a connector or manage this internally with some of the in-built processes in the organization. The idea here is to use AWS S3 as a medium, so that we can build a pipeline that goes from Adobe Analytics to S3 to Snowflake.

With this in view, I decided to create an approach that can help ingest data from the Analytics interface into AWS S3 and finally into the Snowflake environment. There are many ways to import data from Adobe Analytics into Snowflake; however, I find the solution below more useful. Most importantly, you don't have to go for any connector and make the process dependent on it. Here are the steps that can help design the entire flow from Adobe Analytics > S3 > Snowflake.

Basic Execution: Setting up the AWS S3 Bucket

The very first step is to log in to the AWS console and create an S3 bucket with a unique name, then follow the settings as per the requirements. Once the bucket is created, you can check it and upload or download files to and from it.
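The article follows the AWS console for this step; purely for reference, a minimal boto3 sketch of the same setup might look like the following, where the bucket name and region are placeholders rather than values from any real configuration:

```python
import boto3

# Hypothetical bucket name and region -- replace with your own values.
BUCKET = "my-adobe-clickstream-feed"
REGION = "eu-west-1"

s3 = boto3.client("s3", region_name=REGION)

# Create the bucket; outside us-east-1 a LocationConstraint must be supplied.
s3.create_bucket(
    Bucket=BUCKET,
    CreateBucketConfiguration={"LocationConstraint": REGION},
)

# Quick sanity check that we can upload to and download from the new bucket.
s3.put_object(Bucket=BUCKET, Key="test/hello.txt", Body=b"hello")
print(s3.get_object(Bucket=BUCKET, Key="test/hello.txt")["Body"].read())
```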


Next, set up an Adobe Analytics (AA) user on AWS with S3 permissions. This gives AA read and write access to S3, and the same credentials will be required when setting up the automated feed from Adobe. When naming the user, we also need to check the box labelled "Programmatic access", and we need to note down the user credentials, especially the Access Key and the Secret Access Key.
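Again, this is done in the AWS console in the article; as an illustration only, the equivalent scripted setup with boto3 could look like the sketch below, where the user name, policy name and bucket are hypothetical:

```python
import json
import boto3

iam = boto3.client("iam")

USER = "adobe-analytics-feed"          # hypothetical IAM user name
BUCKET = "my-adobe-clickstream-feed"   # the feed bucket created earlier

# Create the user and its programmatic credentials (Access Key / Secret Access Key).
iam.create_user(UserName=USER)
keys = iam.create_access_key(UserName=USER)["AccessKey"]

# Inline policy limiting the user to read/write on the feed bucket only.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
        "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
    }],
}
iam.put_user_policy(
    UserName=USER,
    PolicyName="adobe-feed-s3-access",
    PolicyDocument=json.dumps(policy),
)

# These are the credentials to paste into the Adobe data feed configuration.
print("Access Key:", keys["AccessKeyId"])
print("Secret Key:", keys["SecretAccessKey"])  # shown only once -- store it securely
```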

Getting the data from Data Feeds

As a next step, we can proceed with the Adobe Analytics data feed, where we need to set up a recurring feed. Based on the requirements, we can select the frequency for the daily feed; we also need to be sure about the size of the files.

The process starts from the Data Feed Manager in the AA interface (Analytics > Admin > Data Feeds). We need to be doubly sure about the details mentioned below:

  • Provide an email for notifications (similar to how classification notifications are shared once completed)
  • Select the Type as S3
  • Name the bucket exactly as the one created in AWS S3
  • Provide the Access Key and the Secret Key associated with the Adobe user created above


Once all the information is provided, save the data feed. Transferring the data to S3, especially the first hour of data, may take up to 30 minutes. The progress can be tracked under Job Feeds, and once it is done, you can check the S3 bucket for the migrated files.
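If you would rather verify the delivery from a script than from the console, a small boto3 listing like the one below (bucket name again hypothetical) is enough to confirm that the feed files have landed:

```python
import boto3

BUCKET = "my-adobe-clickstream-feed"   # hypothetical; the data feed destination bucket

s3 = boto3.client("s3")

# Adobe Data Feeds deliver compressed hit data along with lookup files and a manifest;
# listing the bucket confirms that the first delivery has arrived.
for obj in s3.list_objects_v2(Bucket=BUCKET).get("Contents", []):
    print(obj["Key"], obj["Size"])
```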

Once the S3 migration has completed successfully and the files are being populated, we can move on to the final step of taking this data into the warehouse.

Creating an external Stage

Now, as the next step for bulk loading the data into Snowflake, we can use the existing S3 bucket and folder paths. The process of loading from the S3 bucket is as follows:

  • In most cases, the data files have already been staged in an S3 bucket once they arrive from AA. If that hasn't happened, we can use the interfaces/utilities provided by AWS to stage the files.
  • Next, we can use the COPY INTO <table> command to load the contents of the staged file(s) into the target table in Snowflake. Although this can be done directly, Snowflake recommends creating an external stage that references the bucket and then using that external stage for the load, as shown in the sketch below.
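A minimal sketch of what this could look like, using the Snowflake Python connector, is given below; the connection parameters, stage name, table name and file pattern are assumptions for illustration and will depend on how your feed and target table are actually defined:

```python
import snowflake.connector

# Connection parameters are placeholders -- substitute your own account details.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="my_wh",
    database="analytics",
    schema="public",
)
cur = conn.cursor()

# External stage referencing the S3 bucket that receives the Adobe data feed.
# The feed's hit data files are tab-delimited and gzip-compressed.
cur.execute(r"""
    CREATE OR REPLACE STAGE adobe_feed_stage
      URL = 's3://my-adobe-clickstream-feed/'
      CREDENTIALS = (AWS_KEY_ID = '<access key>' AWS_SECRET_KEY = '<secret key>')
      FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '\t' COMPRESSION = GZIP)
""")

# Bulk load the staged hit data into a pre-created table; the PATTERN here is
# only an example and should match your feed's actual file names.
cur.execute(r"""
    COPY INTO hit_data
      FROM @adobe_feed_stage
      PATTERN = '.*hit_data.*'
""")

conn.close()
```

In practice you may prefer a Snowflake storage integration over embedding access keys directly in the stage definition, but the credential-based form above maps most directly onto the keys created for the Adobe user earlier.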


This process of data migration, which we can term the "Analytics Shift", is meant to make data modelling much easier and more efficient. Whatever the file size (rows and columns), businesses can now easily manage a shift towards reading and interpreting data more accurately, and in far less time. The shift from one of the most demanding analytics tools to Amazon Simple Storage Service (Amazon S3) is surely not a one-day course and will definitely need some understanding of the data transfer frequency, the actual data and lookup tables, and the file format. But once you are ready with the schema, specifically around server-side and client-side encryption, you are all set to access data that is business-oriented enough to handle any level of complexity and duplication. Now why wait? Start supercharging your business decision-making process with automated reports and pertinent information that goes directly from your Snowflake environment to the BI tools.

-------------- Kick start with the Integration Path ------------------
