How to generate complex test data
Andrew Magerman
Requirements Engineer | Software Developer | Event Organizer | Python | CI/CD. I help companies kill complexity.
In 2017 I had the pleasure of working closely with Josef Bösze and Ralph Schibli, both partners at the boutique consultancy itopia.ch. The project I was on involved the production and configuration of synthetic data.
In this case, the customer, a bank, was rolling out a new version of its core software and needed a reliable source of test data.
The rapid advances in machine learning are making true data anonymisation almost impossible
Cloning and anonymising production data was not an option. First and foremost, banks are understandably very protective of their customers' data, and the rapid advances in machine learning are making true anonymisation almost impossible. Secondly, because we were testing a new system with a different data model than the previous one, some historical data simply did not exist.
The solution is synthetic data, i.e. data created from scratch. itopia created a tooling suite, iSynth, which lets you synthesise your test data at scale.
iSynth creates several abstraction layers above the actual data structure by creating a model of your system. Generating test data for the system then becomes an exercise in manipulating the model.
Practically, this means that once the system has been set up and configured, one can generate a wide variety of use cases by manipulating the model objects instead of dealing directly with data. Once you have done the intellectually challenging work of modelling the data structure, you can generate any sort of weird combination accurately and very quickly.
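To make the idea concrete, here is a minimal sketch of model-driven test data generation. iSynth is proprietary and its API is not public, so all names here (`Customer`, `Account`, `to_rows`) are my own illustrative assumptions, not iSynth's actual interface; the point is only that you manipulate model objects and the mapping to flat tables happens once, behind the scenes.

```python
# Hypothetical sketch of model-driven test data generation.
# None of these names come from iSynth; they only illustrate the principle.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Account:
    iban: str
    currency: str = "CHF"
    balance: float = 0.0

@dataclass
class Customer:
    name: str
    accounts: List[Account] = field(default_factory=list)

def to_rows(customer: Customer, customer_id: int) -> dict:
    """Map one model object onto the flat table rows the target system expects.

    This mapping is written once by people who know the schema; after that,
    test designers only ever touch the model objects above.
    """
    return {
        "customers": [(customer_id, customer.name)],
        "accounts": [
            (customer_id, a.iban, a.currency, a.balance)
            for a in customer.accounts
        ],
    }

# Manipulating the model, not the database: a new edge case is one line.
alice = Customer("Alice", [Account("CH93 0076 2011 6238 5295 7")])
rows = to_rows(alice, customer_id=1)
```

The design point is the separation of concerns: the schema knowledge lives in `to_rows`, so adding a customer with, say, ten accounts in three currencies never requires touching SQL.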
The somewhat cumbersome work of mapping the model to actual database tables is the time-consuming part of the endeavour. You need local experts who know the underlying databases and interfaces by heart: what does that particular flag in that column do, exactly? How is data consistency enforced?
More often than not, the availability of these experts is the bottleneck for generating correct test data.
Once the iSynth system is set up, however, you no longer need that expert knowledge on a continuous basis, whereas traditional approaches to test data generation must consult these experts whenever a new requirement appears.
any new requirement is usually met in a matter of hours
The upside to all this is that any new requirement is usually met in a matter of hours, not days, and without any expert knowledge of the underlying systems.
iSynth also lends itself to the automatic generation of test data for training purposes. The use cases we actively implemented generated datasets for dedicated training systems. The beauty of this is that the result is deterministic: once you have defined the use cases, generating all the required data is just one click away.
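The determinism property is easy to illustrate with a seeded generator. This is not iSynth's mechanism (which is not public), just a minimal sketch of the general idea: if the generation process is a pure function of a fixed seed, the same dataset can be reproduced on demand on any training system.

```python
# Minimal sketch of deterministic synthetic data via a seeded RNG.
# The function name and record shape are illustrative assumptions.
import random

def training_dataset(seed: int, size: int) -> list:
    """Regenerate the same training dataset from the same seed, every time."""
    rng = random.Random(seed)  # local RNG: no dependence on global state
    return [
        {"id": i, "balance": round(rng.uniform(0, 10_000), 2)}
        for i in range(size)
    ]

# Two independent runs with the same seed produce identical data.
assert training_dataset(seed=7, size=100) == training_dataset(seed=7, size=100)
```

Because nothing depends on global state or wall-clock time, "one click away" really can mean byte-identical data on every regeneration.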
The other immediately recognisable use case is the generation of test data for load testing (five million records, anybody?). But where iSynth really shines is in generating rare data constellations: perhaps your biggest customer has a really complex setup, with multiple exceptions? No problem for iSynth.
The next stage of development is integrating synthetic data generation into a continuous integration application lifecycle, with unit tests that run against a known set of test data. iSynth can be distributed as a separate Docker instance.
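A CI setup along these lines can be sketched as follows. Again, the generator and assertions are hypothetical stand-ins of my own, not iSynth code: the idea is simply that because the reference dataset is synthetic and seeded, every CI run can regenerate it from scratch and assert against it, with no shared database fixture to drift out of date.

```python
# Hypothetical sketch: unit tests running against regenerated, known test data.
# Function names and the "limit" field are illustrative assumptions.
import random

def known_test_data(seed: int = 1234) -> list:
    """Regenerate the reference dataset fresh for every CI run."""
    rng = random.Random(seed)
    return [
        {"id": i, "limit": rng.randrange(1_000, 50_000, 1_000)}
        for i in range(10)
    ]

def test_limits_within_policy():
    # Stable across CI runs and machines, because the data is seeded:
    # no snapshot files, no shared test database to keep in sync.
    assert all(0 < row["limit"] <= 50_000 for row in known_test_data())

test_limits_within_policy()
```

In a real pipeline the generation step would run inside the Docker instance mentioned above, with the test suite consuming its output.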
I had a most enjoyable experience working with the itopia team; to my knowledge this approach is unparalleled and very promising.
If you have complex requirements for synthetic test data, have a look at iSynth's fact sheet at www.itopia.ch/synthetic-data. I'd be glad to introduce you to either Ralph Schibli or Josef Bösze at itopia.