The True Cost of an Excel Loader
Do you work in a regulated industry, with large volumes of source data, performing analyses in Excel? Do you join those analyses with other departments' data to aggregate into group reporting results?
Perhaps you are being burned by broken links, voting to veto VBA, or otherwise migrating from manual, macro-driven processes to a secure and scalable solution for ad-hoc business data management. You need to get your Excel data into a database, fast.
You ask around.
Can You Integrate This Excel Into Our Group Reporting Flow?
"Sure!" you may hear. "Five minutes. Maybe less. There are a thousand quick-fire ways to do this. In fact, our database has a button that lets you load data directly from Excel. It's easy - my cat could do it."
Except... in the corporate world... it's never easy. There are two main approaches to loading Excel into a cloud database (or any data platform). The first is the quick-fire method above - or should I say, quickly fired. Whilst fantastic and convenient for ad-hoc server-side analysis, such "uncontrolled" ingestion of business data into a Data Warehouse or Reporting Layer tends to spiral quickly into a "data swamp" of undocumented, highly duplicated data that nobody dares to delete, because nobody really knows where any of it came from, what it means, or what it might break. Being outside IT governance, such reporting processes are typically classified as "End User Computing", with the associated risks and additional audit costs.
The true costs of uncontrolled Excel uploads therefore include:
You could say that this approach is "buy now, pay later". So what to do instead?
You need to get your Excel data into a database, fast.
A Rock and a Hard Place
The second way could be described as the "Hard Place": the IT route. Yes - it's easy for IT to load Excel using a plethora of enterprise tools, and the development part is trivial. The hard part, particularly in large organisations, is the governance and due process - submitting requirements, testing, promoting through multiple environments, representation at Change Advisory Boards, setting up SLAs, waiting on key stakeholders.
At the end of it you will have a 'hardened' ETL process that typically expects an Excel file to arrive manually on a (hopefully secured) shared network drive, on a specific date, with a specific filename, and columns in a specific order on a specific sheet with the correct data types. What could go wrong?
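To see the fragility in miniature, here is a minimal sketch (in Python with pandas - the column names and rules are hypothetical, not taken from any real pipeline) of the kind of schema check such a hardened ETL performs before loading, and which fails the moment a user renames a column or retypes a value:

```python
import pandas as pd

# Hypothetical contract: the upload must have exactly these columns,
# in this order, and 'amount' must be numeric.
EXPECTED_COLUMNS = ["region", "account", "period", "amount"]

def validate_upload(df: pd.DataFrame) -> list:
    """Return a list of problems; an empty list means the frame passes."""
    problems = []
    if list(df.columns) != EXPECTED_COLUMNS:
        problems.append(
            f"columns are {list(df.columns)}, expected {EXPECTED_COLUMNS}"
        )
    elif not pd.api.types.is_numeric_dtype(df["amount"]):
        problems.append("'amount' is not numeric")
    return problems
```

Every new spreadsheet variant means another round trip through change control to update rules like these - which is exactly where the cost accumulates.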
Of course, all of these problems can be addressed, and a range of scenarios dealt with; it's just... expensive. Not just from a delivery and deployment perspective, but also in opportunity cost - all the time spent convincing IT to load your spreadsheet, and then waiting for them to actually do it, could have been spent delivering other strategic projects. Especially for simple spreadsheet uploads.
The result of this friction is that a large amount of business data will never make it into IT controlled databases or data lakes.
The Third Way
What if you didn't need to write code or jobs to match columns in a spreadsheet with a target table?
What if business users could drag & drop Excel into IT Controlled databases using only a browser?
And what if that same application would enforce an approval workflow to assign responsibility for data changes, and trigger additional jobs to run on data change events?
This is not the stuff of dreams. It's one of many features delivered by Data Controller for SAS®, which makes use of the powerful SAS/ACCESS engines to let business users load data safely into databases such as Postgres, Oracle, SQL Server, Teradata and Redshift, not to mention regular SAS datasets and CAS tables.
The basic flow looks like this:
An Editor changes some data - either by loading a spreadsheet, a CSV, or making changes directly in the web interface. Assuming the validation checks pass, the data is transferred securely to a backend Staging Area.
One or more Approvers must approve the requested change by reviewing the differences between the target table and the uploaded data (if 6,000 rows are uploaded but only 6 are changed, only 6 rows are reviewed). Following approval, the database is updated, and any post-approval jobs are triggered.
An Auditor can view the change history of any table, see who submitted and approved them (and any reason supplied), and even download a copy of the original Excel file.
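The "review only what changed" step above is essentially a keyed diff between the staged upload and the target table. As an illustration only - Data Controller does this inside SAS, and the table, key and column names here are hypothetical - the concept looks like this in Python with pandas:

```python
import pandas as pd

def changed_rows(target: pd.DataFrame, upload: pd.DataFrame,
                 key: str) -> pd.DataFrame:
    """Return only the uploaded rows that are new or differ from the target."""
    merged = upload.merge(target, on=key, how="left",
                          suffixes=("", "_old"), indicator=True)
    value_cols = [c for c in upload.columns if c != key]
    diff = merged["_merge"] == "left_only"        # brand-new keys
    for c in value_cols:
        # changed values (a missing old value also counts as a change)
        diff |= merged[c].ne(merged[f"{c}_old"])
    return merged.loc[diff, list(upload.columns)].reset_index(drop=True)
```

So an approver reviewing a 6,000-row upload with 6 genuine changes sees a 6-row diff, not the whole file.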
Is it Secure?
Data Controller is significantly more secure than staging data with shared network drives, or letting end users update target tables with their own credentials.
Authentication is handled using SAS Logon - the same as any other SAS Web app, fully integrated with enterprise authentication providers such as LDAP.
Most importantly, separation is made between the owners of the data model (IT), and the owners of the data itself (the business).
We don't change the data. It comes from source systems.
If that is truly the case - that all your IT-controlled data comes 'untouched' from source systems, with no business inputs - then your reporting system is almost certainly based on Excel or a business-owned database.
Even if you have somehow built a regulatory reporting system that requires no ad-hoc data, there may still be a need for adjustments to deal with data quality issues or manual overrides. The following video illustrates a potential business situation.
Our data is precious. Can we trust this?
Data Controller is battle-hardened, with dozens of deployments worldwide. Esoteric gotchas such as high-precision decimals, encoding differences, short numerics, multiple metadata repositories and more have been captured and resolved. We have many layers of testing in the product itself, and the core components have been running on customer sites for over 10 years. We have official reviews from Der Touristik, Siemens Healthineers and Allianz Insurance.
What else does it do?
The following features are also built in:
Summary
If you are:
Then Data Controller is for you. Our plans start at 5 users - if you need fewer than that, there's nothing to pay. To arrange a demo, an installation, or a discussion - do get in touch!