The rollup field is your friend

The rollup field is your friend

Data engineers or report writers, has this happened to you recently?

You are querying a table to build a report, and you get an additional requirement from a stakeholder for one more thing.

“Can you add the count of (sales, users, something) per account to this report?”

In many business intelligence tools, adding this relatively simple count turns out to be a harder problem than you would expect.

When you join the additional table to the data set and group by the key (let’s say the account), you’ll get all of the rows in the set. This means that if you are getting sales data by account it’s not that hard to get a count of the number of closed won sales.

What can be challenging is adding that “roll-up” field –?called a roll-up because it is counting an aggregate for a key like accountID – to a report that contains other fields that you don’t want to group.

Grouping typically creates too much information

In most business intelligence tools, a grouping action forces you to have extra rows or to view only the aggregate information along with the main key.

Take this example from a Salesforce development environment. When you sum the opportunity count by account, you see the record count by account.

When you add an additional field to your group, it’s possible to make a new aggregate, summing the records in your expanded group.

But when you want to add ungrouped fields, you get more rows. This makes sense and is typical for the way that databases work.

The challenge happens when your stakeholder would like you to add another piece of information – for example, counting the number of engaged or product-qualified contacts at each account.

Answering this other question requires another report that uses the same shared key (the account ID) to find the number of engaged contacts for each account based on some criteria. The output is a single number, along with the same unique key you used in the first report.

Now, you’d like to union these two reports so that the field you looked up in report two (the number of engaged contacts) can easily be added to the first report (counting the number of opportunities) for each account.

It turns out this is not an easy thing to do.

Salesforce makes this possible by creating a joined report, which solves part of the problem. You can see the joined data, but it’s not easy to export it in an easy-to-use format.

If you know the key of a related record, why can’t you just append a field to your dataset?

BI tools don’t expect linked questions (but should)

In a perfect world, you would expect to be able to define a roll-up as a related metric on each object. In this case, the object is the account and the rolled-up information consists of attributes that you calculate independently for each account and then can add as a field.

In the Salesforce universe, many people use Rollup helper, a free tool that periodically calculates these metrics for the objects you specify:

  • e.g. a skinny table with the account ID and the opportunityID, amount, close date, and stage
  • Then you have the current YTD sales for that company and can add it to a company record

Rollup helper runs this query as a background item and populates a field in your object (the account, for example) so that you can reference the output of the roll-up query as an inline field rather than a set of rows that you add up.

In other BI tools like Sigma, you can write a query directly in SQL and then use it as the input for another report, using the key in that query as join criteria.

This is a great approach because instead of the messy grouped report we looked at above, you can simply add one more field to your dataset. Within reason, you can keep grouping adjacent datasets that have the same account ID.

I know DBAs everywhere are rolling their eyes at me. ??

Schema design informed by ad-hoc queries is not the way databases should be designed or built. ??

But if you put your product hat on, you’ll see that this is a very common use case for users who need to add “just one more field” to their reports.

?? Making this use case possible (and even delightful) is a worthy goal. ??

What would it look like to anticipate linked metrics?

The way that Sigma approaches this problem is a pretty solid model:

  1. Lets you write a query against known data, essentially creating a mini-model
  2. Determine how often this should run (ideally, as a view that can provide instant information as you run other reports)
  3. Name and save this dataset as a structure you can union against other data in your environment by specifying the join type and join field

What if we were to build a roll-up feature to offer more capability to a business intelligence tool?

We would want this to feel basically the same as the effort required to add a tab to your spreadsheet with new information, designate one column to be the unique (and related) key, and suggest potential roll-up combinations.

I would expect this feature to do the following:

  1. based on the unique key for the current base report, suggest reports that group information using that same unique key
  2. if there are secondary unique keys also present in that base report, optionally suggest groupings for those keys too
  3. allow you to add one of these external groupings at a time
  4. if the combined grouping is invalid, warn me that it results in too many or not enough records

To support this, you’d need a way to bucket these queries to be related based on the key, run them periodically to know if they are in an error state, and keep track of how often they are used or joined to other queries.

A bonus: snapshots of changing data

One of the side benefits of checking queries on a schedule and validating that they still work is the ability to create a “snapshot”, or a stored view of the report at a certain period in time.

For example, you might want to run your “active user count” report every week and stamp that result with the date of your combined report. Salesforce makes this capability available with snapshots so that you can build a time series for changing data.

But what if you don’t use Salesforce? There’s an interesting niche here to run reports on a schedule and store these generated reports using a common schema and then be able to visualize changes over time.

You could also use these past reports for analysis, unioning additional datasets on these historical reports for analysis. This is an example where Sigma Computing shines.

What’s the takeaway??Roll-up fields add a very useful dimension to existing reporting and enable you to create deeper analysis and filtering. By creating “skinny tables” of information you can join on a unique key, you make it possible to answer almost every “just one more thing” question your stakeholders ask about a dataset.

要查看或添加评论,请登录

Greg Meyer的更多文章

  • Create a pacing graph with Google Sheets

    Create a pacing graph with Google Sheets

    As an operator, how many times do you get asked: “how are we doing this month vs last month? (Or vs. some previous…

  • In support of "boring" software

    In support of "boring" software

    I am an unabashed technology fan and an early adopter of new things. As a kid, I loved (and still love) science fiction…

  • 5 ways to make your low-code automation more effective

    5 ways to make your low-code automation more effective

    When I started my first software job, I remember thinking two things: I am definitely not the smartest person in the…

    2 条评论
  • Turning daily improvements into milestones

    Turning daily improvements into milestones

    You’ve seen the statistic. 1% improvements daily for a year yield a 37x return.

    2 条评论
  • Building Diagrams with Computers

    Building Diagrams with Computers

    Ethan Mollick writes about AI that “the only way to figure out how useful AI might be is to use it.” This is not…

    2 条评论
  • Redefining the Customer Journey

    Redefining the Customer Journey

    Have you ever played RevOps detective? ??? The story goes something like this. There’s a closed-won (or a closed-loss)…

  • Going from 0-1 in Data Operations

    Going from 0-1 in Data Operations

    Imagine you are starting a new venture and need to describe all the data tasks that need to happen to get you from…

  • An ode to console.log()

    An ode to console.log()

    Some of the first programs I ever wrote on a computer used PRINT to echo a line to the screen. Using BASIC, I filled…

    1 条评论
  • Great performance demands mental preparation

    Great performance demands mental preparation

    The coach will see you now When I was younger I wanted to be a professional baseball player. Professional baseball…

    2 条评论
  • Data Operations, revisited

    Data Operations, revisited

    When I started writing about data operations In 2020 I suggested an example definition that focused on data shared…

社区洞察

其他会员也浏览了