How is data flow related to SOLID principles?
Piotr Kruczkowski
Principal Solution Architect AI/ML @Autodesk | @SoftwareMinimalist on YouTube and BuyMeACoffee | Data Scientist, Coach - Agile, Lean, Kaizen, Value Stream Mapping, Capability Mapping
You can find the complete code with examples in my Git repository at the link below. I will be grateful for any feedback and collaboration opportunities. Please send me pull requests, issues, or comments. ;)
https://github.com/Primary-Key/LabVIEW-SOLID-Examples/tree/master/SOLID-Dataflow-Template
Introduction
I was working recently on a project for one of our customers. I was responsible for designing a module in LabVIEW.
The requirements were ones I had already seen hundreds of times :)
· Wrap a device driver with an API for taking new types of measurements.
· The wrapper should be callable from TestStand.
· It should be extensible for easily adding new measurements using the same or new hardware types.
· It should decouple modules and enable easy unit testing.
· It should allow for performance optimization of configuration, execution, and processing of results.
· It should allow for simulation and injection of test doubles (mock objects).
I wanted to design an architecture that could be easily explained to a team of developers working on their first real project.
I have participated in many architect summits and online discussions. I noticed that when people say “architecture” or “framework”, they often mean a message-based, parallel-process communication solution. Parallel-process communication is only one aspect of software, yet the rest are frequently dismissed as "less interesting" problems to solve. Almost no one ever talks about architecture on the level of individual software modules, e.g. how to create a SOLID Measurement Abstraction Layer in LabVIEW, a language different from the by-reference text languages that introduced the need for SOLID in the first place.
How do SOLID principles work on the low level?
What is the architecture of the small?
Beyond onboarding new developers, I decided this project should also showcase to the LabVIEW community how a well-designed, low-level module works. It should show architecture not on the level of parallel threads, but on the level of individual modules. After all, only about 5-10% of the modules in your design will abstract parallel processes. The majority are just computations flowing by-value between functions.
SOLID Is Not Good Enough
My main influence in this low-level design was a realization that object-oriented code in dataflow languages behaves differently than similar code in by-reference languages. Application development in dataflow more closely resembles development done in functional languages. I believe dataflow programmers should embrace this different way of thinking instead of trying to adopt SOLID as it is used in by-reference languages. Although the SOLID principles are universal, the particular ways to achieve them, often listed in literature, are not. The particulars were invented to mitigate problems caused by by-reference passing of data in text languages like C, C++, C#, Ruby, JavaScript, etc. Their aim was to protect against direct memory access in the global scope of the application by limiting the scope into nicely organized subsections of the applications that interact with each other through encapsulating interfaces.
The spaghetti in by-reference languages is caused by the inability to understand which function is causing what side effects and when. Mutable state is the source of all evil.
By-reference languages have a historical tendency to make code harder to write, understand, debug, and parallelize. Their natures encourage developers to couple modules, which slows down development and makes refactoring risky. Essentially, developers have freedom to make arbitrary changes to global program state from anywhere in the code, so rules had to be developed to limit that freedom. The entire idea of object-oriented programming (OOP) is an attempt to limit that freedom through encapsulation.
OOP only mitigates the problem in those languages but doesn't solve it because OOP does not come from an understanding of what the problem actually is. The SOLID principles give good guidance on generalities, but they do not tell programmers:
"Hey, y'all, the problem we are trying to solve is mutable state and direct memory access. Please make sure you treat all data and operations in the same way you treat integers and addition. That means creating functions that are pure in the mathematical sense: they do not modify the original objects passed in, but instead produce new objects, with the originals still being fundamentally accessible."
Now, if this makes you uneasy because of all the memory copies, please note two things: a dataflow compiler can optimize by reusing the same memory locations because it understands the flow of data dependencies, and some dataflow languages additionally use persistent data structures as smart copies.
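To make the "treat data like integers" idea concrete, here is a minimal sketch in Python (chosen only because LabVIEW diagrams cannot be shown as text). The Measurement record and the scaled function are hypothetical names invented for this illustration; they are not part of the template.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Measurement:
    """Hypothetical by-value record; the field names are illustrative only."""
    channel: str
    volts: float


def scaled(m: Measurement, gain: float) -> Measurement:
    # Pure function: builds a new object instead of mutating its input.
    return replace(m, volts=m.volts * gain)


raw = Measurement(channel="ai0", volts=1.5)
amplified = scaled(raw, gain=2.0)

print(raw)        # the original is still available, like a forked wire
print(amplified)  # a new value produced by the transformation
```

Because the record is frozen, any attempt to assign to raw.volts raises an error, so the only way to "change" a measurement is to produce a new one.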
The reason SOLID doesn’t specify these remedies is that they are historically difficult in by-reference languages: there are no smart memory management techniques in current compilers that can analyze the flow of data and decide when particular data is not needed anymore or when it needs to be copied. This means that by-reference code can adhere to all the SOLID principles and still have problems with correct execution because by-reference is the fundamental problem!
I believe every software developer benefits from being well versed in dataflow concepts. But understanding dataflow means working in traditional languages will then feel like working with a ball and chain. You will be aware of how much slower you are when programming by-reference, especially when debugging. If you want to future-proof your enterprise, you should be transitioning to functional, dataflow languages, e.g. Clojure, Haskell, F#, or LabVIEW.
Memory management and code modularity are ORTHOGONAL problems.
This article is not an explanation of functional programming, and I do not go into detail on SOLID principles. My explanations for the decisions behind the design for this template will assume that you have this knowledge.
By-Value Functions and Partial Application
Maintainable code comes from separating where a function is defined, where its arguments are provided, where it is executed, and where it returns the results. If you enable this separation, you are on the right path.
One of the fundamental techniques used in functional languages is passing functions as arguments into other functions. These functions-as-parameters are by-value objects with their own local state. They may be invoked as functions by supplying all of their input parameters. Alternatively, they can be transformed into new functions through partial application, wherein a caller supplies only a subset of their input parameters. But even when all input parameters are supplied, the function is not required to execute at that moment. The function may still be passed around by-value, deferring execution until the moment when the results are actually needed. That is called lazy evaluation.
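A small Python sketch of these two techniques, using the standard functools.partial. The measure function and its parameters are hypothetical stand-ins for real work, and the deferred call at the end only approximates what the paragraph above calls lazy evaluation.

```python
from functools import partial


def measure(device: str, channel: int, samples: int) -> str:
    # Stand-in for real work; returns a description of what would be measured.
    return f"{samples} samples from {device}/ch{channel}"


# Partial application: supply a subset of the inputs, get a new function back.
measure_dmm = partial(measure, "dmm")

# All inputs are now supplied, but nothing has executed yet.
pending = partial(measure_dmm, 3, 100)

# The function value can be passed around and run only when the result is needed.
result = pending()
print(result)  # 100 samples from dmm/ch3
```

Note how the definition (measure), the supply of arguments (the two partial calls), and the execution (pending()) happen in three separate places.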
These two techniques can be used to trivially implement three of the five SOLID principles: single responsibility, interface segregation, and dependency inversion. Languages that allow by-value functions encourage developers to do things right the first time.
Because LabVIEW is a dataflow language, and functional languages are also dataflow languages, LabVIEW programmers are in a pretty good place to start. But LabVIEW is missing two specific features: by-value functions and partial application. We can mimic them by creating classes whose only job is to encapsulate a single method and its parameters. We can call them callable classes or functors.
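As a rough text-language analogue of such a callable class, consider the Python sketch below. ReadVoltage, with_samples, and run are hypothetical names chosen for this illustration; in LabVIEW the equivalent would be a by-value class with private data and one dynamic dispatch method.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ReadVoltage:
    """Hypothetical callable class: one method plus its bound parameters."""
    channel: int = 0
    samples: int = 1

    def with_samples(self, samples: int) -> "ReadVoltage":
        # Mimics partial application: returns a new object with one more input fixed.
        return replace(self, samples=samples)

    def run(self, device_name: str) -> str:
        # The single encapsulated method; it executes only when called.
        return f"{device_name}: ch{self.channel} x{self.samples}"


action = ReadVoltage(channel=2).with_samples(50)
print(action.run("dmm"))  # dmm: ch2 x50
```

The object carries its parameters by value, so it can be stored in a map, passed to another module, and executed later.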
This template will show that, using callable classes, we can get Reference, Measurement, Configuration, and Result abstraction layers in a single, clean design. This design can be a starting point in every project you work on. Modules produced with this design have a much higher chance of following the SOLID principles, and it is all inspired by functional programming.
The Design
This architecture example contains four interfaces and a single top-level class responsible for combining, calling, and caching callable classes in maps.
The implementations of these interfaces flow by-value into and through the top-level class, which gives shape to their interactions.
We have here an interaction and interdependence of abstraction layers:
- Reference - encapsulates working with references and generating side effects, while limiting reference interaction to one API. It should be used for files, databases, hardware, shared memory, etc. This should also be the only abstraction in your design that works with side effects.
- Action - given a specific reference, abstracts the actions that can be performed on that reference, e.g. generation, measurement, reading, writing, setting, getting, triggering, waiting for a trigger, etc. It is not reference-based itself, but rather instantiates pure objects that flow by value.
- Config - given a specific reference, abstracts the configuration values and modes of operation of the reference, e.g. a path for file access, database connection details, a power setting for hardware generation, etc. It is not reference-based itself, but rather instantiates pure objects that flow by value.
- Result - produced as the output of an Action acting on a specific Reference; represents the results that the action can produce. It is not reference-based itself, but rather instantiates pure objects that flow by value.
Configuration, Action and Result classes are closely related and defined to interact with a specific Reference class. I was thinking about also adding an Alarm abstraction, but didn't see an obvious place to put it. It's one of those things depending on requirement details, so it's your choice where it needs to be :)
You might not always need all these interfaces in your design, certainly not for every module, but this is always a good place to start. This level of flexibility handles the vast majority of requirements you will see in real applications, and it can always be scoped down or expanded when needed. For example, if no configuration is required, simply delete the method invoking Config Core.vi and delete the interface, or just keep it and don't implement the interface (there are no NotImplemented exceptions in LabVIEW... yet). The choice is yours.
Additionally, I prepared some implementations of these interfaces in the template example. Look inside the virtual folders.
The keen-eyed might recognize the Channeling Pattern here, where the Top Module is responsible for defining common functionality around core polymorphic method calls specified by different abstraction layers. I use the word polymorphic in the object-oriented sense, not the LabVIEW sense ;)
When building a final application you will integrate multiple different Top Modules. Each of them will have clearly separated responsibilities, and they will interact by passing by-value configurations, actions and results to one another.
In your final design, composed of multiple modules following this template, there are no hard-coded dependencies, and side effects are controlled and isolated. This makes it easier to integrate modules into bigger designs and to reuse them between projects.
Since I figured out this approach, all of my projects are broken down into such modules, and they all begin from the presented template.
Let's now discuss the abstract class interfaces in more detail.
TopModule.lvclass
The responsibility of the Top Module is to establish the relationship between all interfaces and to provide common methods for the module's functionality. The internal data of this class contains maps that collect the configurations, actions, and results, storing the data required to work with the reference abstraction.
The methods expose the interface of those maps. We have here:
Set Reference - constructor of the Top Module defining the specific class implementing the Reference interface. This will be the Reference used by Config Core and Action Core polymorphic methods, defined by interfaces.
Define Config - defines possible configurations in the Config map. Examples are file paths for a File implementation of the Reference interface, database connection details, a hardware refnum, a waveform to generate, power and measurement details, etc.
Apply Config - extracts one Config by name from the map of available Configs and dispatches the Config Core method on the reference defined in Set Reference. Notice that both the reference and the config object itself might be modified in the core, since both objects are returned back into the Top Module data.
Define Action - defines possible actions in the Action map. Examples include measurements, generation of signals, setting and getting values in shared memory, writing and reading values in files, databases, etc. One unsolved question is what to do when an Action already exists in the map.
Run Action - extracts one Action by name from the map of available Actions and dispatches the Run Core method on the reference. Returns a Result object and stores it in the Result map under the same name as the Action. There is also minimal additional functionality: it monitors the timeout and preserves the result only if there was no timeout. If the requested action name does not exist in the Action map, an empty action interface will be called. You might want to modify that behavior too.
Obtain Result - extracts one Result by name from the map of available Results and returns it at the output.
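The method list above can be sketched in Python roughly as follows. This is only an approximation of the LabVIEW template, not its actual code: the snake_case names mirror the VI names, the overwrite-on-redefine behavior and the (result, timed_out) return shape are my assumptions, and TextResult and Greet are hypothetical example implementations.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Tuple


class IReference:
    """Marker interface; no methods, as in the article."""


class IResult:
    """Wrapper over whatever an action returns."""


class IConfig:
    def config_core(self, ref: IReference) -> Tuple[IReference, "IConfig"]:
        # Default: change nothing. Returns the (possibly new) reference and config.
        return ref, self


class IAction:
    def run_core(self, ref: IReference, timeout_s: float) -> Tuple[IResult, bool]:
        # Default "empty action": an empty result and no timeout.
        return IResult(), False


@dataclass(frozen=True)
class TopModule:
    reference: IReference = field(default_factory=IReference)
    configs: Dict[str, IConfig] = field(default_factory=dict)
    actions: Dict[str, IAction] = field(default_factory=dict)
    results: Dict[str, IResult] = field(default_factory=dict)

    def set_reference(self, ref: IReference) -> "TopModule":
        return replace(self, reference=ref)

    def define_config(self, name: str, config: IConfig) -> "TopModule":
        return replace(self, configs={**self.configs, name: config})

    def apply_config(self, name: str) -> "TopModule":
        ref, config = self.configs[name].config_core(self.reference)
        return replace(self, reference=ref, configs={**self.configs, name: config})

    def define_action(self, name: str, action: IAction) -> "TopModule":
        # Assumption: redefining an existing name simply overwrites it.
        return replace(self, actions={**self.actions, name: action})

    def run_action(self, name: str, timeout_s: float = 1.0) -> "TopModule":
        action = self.actions.get(name, IAction())  # unknown name: empty action
        result, timed_out = action.run_core(self.reference, timeout_s)
        if timed_out:
            return self  # results are kept only when there was no timeout
        return replace(self, results={**self.results, name: result})

    def obtain_result(self, name: str) -> IResult:
        return self.results[name]


@dataclass(frozen=True)
class TextResult(IResult):
    text: str = ""


class Greet(IAction):
    def run_core(self, ref: IReference, timeout_s: float) -> Tuple[IResult, bool]:
        return TextResult("hello"), False


top = (TopModule()
       .set_reference(IReference())
       .define_action("greet", Greet())
       .run_action("greet"))
print(top.obtain_result("greet").text)  # hello
```

Every method returns a new TopModule value instead of mutating the old one, which is the closest text-language equivalent of a by-value class wire passing through each VI.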
By no means is this example a complete framework or architecture. Many things are missing, but since you can use it in any way you want, it is more reusable. It is simply a starting point for new development. Some of the missing things are application-specific error handling, your own interface implementation classes, and optimization; your final design also might not need all the example interfaces.
IReference.lvclass
The interface starting point for the Hardware Abstraction Layer (HAL). It has no methods and, being an interface, no data. It is intended to be implemented by your specific hardware abstraction inheritance tree. Its contents depend strictly on your requirements, which is the reason I didn't specify any methods. It is, however, the class used to "construct" your Top Module, which wraps and encapsulates the HAL and passes it into the Core methods of other interfaces.
IConfig.lvclass
An interface defining two methods: Prepare Config and Config Core.
Prepare Config - can be dispatched outside the Top Module methods as a standardized way to initialize all configurations, e.g. when you need to allocate a big array or do any single-shot heavy operation.
Config Core - dispatched inside the Apply Config method of the Top Module. Each Core will expect to work with a specific reference class, and if an incompatible reference/config combination is invoked, it will result in an understandable error.
NOTE - This is the first place where we see how important the relationship between a reference and its configs, actions, and results is. These classes cannot live without each other, which is the only reason we can safely use the To More Specific Class function inside their cores. This intimate relationship was required to build a template inspired by functional programming and first-class functions passed by value.
Both the reference and the config itself are returned from Config Core, and their new values are stored again in the Top Module, since applying a configuration might modify both.
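A hedged Python sketch of what a Config Core might look like, with isinstance playing the role of the To More Specific Class function. FileReference, DatabaseReference, and FilePathConfig are hypothetical names for this illustration, and a Python exception stands in for the LabVIEW error cluster.

```python
from dataclasses import dataclass, replace


class IReference:
    """Marker interface for all references."""


@dataclass(frozen=True)
class FileReference(IReference):
    path: str = ""


@dataclass(frozen=True)
class DatabaseReference(IReference):
    dsn: str = ""


@dataclass(frozen=True)
class FilePathConfig:
    path: str

    def config_core(self, ref: IReference):
        # Narrow the generic reference to the class this config was written for,
        # or fail with a message that names both sides of the mismatch.
        if not isinstance(ref, FileReference):
            raise TypeError(
                f"FilePathConfig expects FileReference, got {type(ref).__name__}"
            )
        # Return both the new reference and the config, as Config Core does.
        return replace(ref, path=self.path), self


ref, cfg = FilePathConfig("log.txt").config_core(FileReference())
print(ref.path)  # log.txt

try:
    FilePathConfig("log.txt").config_core(DatabaseReference())
except TypeError as err:
    print(err)  # FilePathConfig expects FileReference, got DatabaseReference
```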
IAction.lvclass
Defines the Action Core method interface. Depends on the IResult interface, as the Action Core method returns an IResult object.
It seemed reasonable to include a Double timeout input, representing seconds, and a Boolean timeout output informing the user whether the Action succeeded or timed out. This was based on the assumption that it is needed in most designs.
Another assumption was that actions are single-shot and not continuous. If you have an action that continuously waits for some conditions, the template might require modification: separating the Top Module into two threads, one calling the Action Core and an asynchronous one waiting for the return of Result objects. This modification is large, and I am afraid I would not be able to get my point across here. Let's leave it for another time.
IResult.lvclass
An abstract wrapper over the data returned by Action Core, required by static typing. It is not very important for the design, and a string XML or JSON representation of the returned data might do an equally good job; however, I put it there for logical consistency. It also enables dispatching a Process Result method, which is analogous to the Prepare Config method.
Taking a step back
Analyzing the template and the by-value design in more depth, you can see three fundamental types of nodes: sources, transformations, and sinks. This is the functional inspiration for the design. If a module does not need to produce side effects, all of its methods should be considered pure transformations.
Sources - This is the entry point from the side-effect-driven world of references into the pure dataflow world of values and immutable data. You can see this behavior in all control terminals in LabVIEW, but also in all by-reference read operations like DAQmx calls, reading data from a file, etc.
Sinks - The counterpart of Sources: the exit point from the dataflow world back into the world of side effects. Without side effects there would be no reason for Sinks to exist. They are the indicators, the write operations, the signals generated on hardware.
Transformations - Pure dataflow functions, working with data from clusters, arrays, sets, maps, and simple scalars. It might seem we are modifying variables when we make our methods bundle new data into clusters, but we are in fact creating new clusters by transforming the originals. The old cluster is still available if we simply fork the wire before our operation. This simple concept of variables not being needed due to dataflow is profound and important. Every functional programmer strives to build applications following the dataflow model of computation. LabVIEW requires it.
A correct dataflow design strives to put Sources and Sinks on the edges of your algorithm, with all Transformations purely manipulating data between them. Between the side-effect-based edges we do not need any references. As everywhere, the 80/20 rule applies here, or maybe it should be a 95/5 rule, because side effects really don't need to be used that often.
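The source / transformation / sink split can be sketched in Python as below. The function names and the in-memory streams are assumptions made for the illustration; in a real module the source might be a DAQmx read and the sink a file write.

```python
import io
from typing import List


def source(stream: io.StringIO) -> List[float]:
    # Source: the only place that reads from the outside world.
    return [float(line) for line in stream if line.strip()]


def remove_offset(samples: List[float], offset: float) -> List[float]:
    # Transformation: pure, returns a new list and leaves the input untouched.
    return [s - offset for s in samples]


def mean(samples: List[float]) -> float:
    # Transformation: pure reduction of values to a value.
    return sum(samples) / len(samples)


def sink(stream: io.StringIO, value: float) -> None:
    # Sink: the only place that writes to the outside world.
    stream.write(f"mean={value:.2f}\n")


inp = io.StringIO("1.0\n2.0\n3.0\n")
out = io.StringIO()

# Side effects sit on the two edges; everything in between is by-value.
sink(out, mean(remove_offset(source(inp), offset=1.0)))
print(out.getvalue(), end="")  # mean=1.00
```

The two transformations can be unit tested with plain values and no test doubles, because they never touch a reference.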
LabVIEW has great language constructs and patterns to help you think like that, e.g. the queue-based producer/consumer, the event structure, and many more. Yes, you can create data value references, but that is often contraindicated, so please don't :)
Application Example
https://github.com/Primary-Key/LabVIEW-SOLID-Examples/tree/master/Template-Logger
Taking the template as a starting point, I will be implementing an example module for event logging. It includes a reference abstraction for different logger types, configuration abstraction for settings for these file types, action abstraction for different types of events being logged and result abstraction for errors on logging.
The video visualizes how quick it can be to switch between configurations or abstractions in a module designed this way. You gain the interfaces required to protect you from changes in your requirements.
If you did not have the reference abstraction layer, you would need to redesign your module to add another logger type. If you didn't have the configuration or action abstractions you would not be encouraged to design in an extensible way and your design would be dominated by case structures. There is nothing wrong with case structures for small jobs, but the more options you have and the more you want to distribute your work in a team, the poorer they behave.
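The difference between the two styles can be sketched in Python. All names here (log_with_cases, LoggerReference, FileLogger, ConsoleLogger, MemoryLogger, log_event) are hypothetical and only loosely modeled on the Logger example; the strings returned stand in for real logging side effects.

```python
# Case-structure style: every new logger type means editing this function.
def log_with_cases(kind: str, message: str) -> str:
    if kind == "file":
        return f"[file] {message}"
    elif kind == "console":
        return f"[console] {message}"
    raise ValueError(f"unknown logger type: {kind}")


# Abstraction-layer style: a new logger type is a new class.
class LoggerReference:
    def write(self, message: str) -> str:
        raise NotImplementedError


class FileLogger(LoggerReference):
    def write(self, message: str) -> str:
        return f"[file] {message}"


class ConsoleLogger(LoggerReference):
    def write(self, message: str) -> str:
        return f"[console] {message}"


class MemoryLogger(LoggerReference):
    # Added later by another developer; log_event below did not change.
    def write(self, message: str) -> str:
        return f"[memory] {message}"


def log_event(ref: LoggerReference, message: str) -> str:
    # Dynamic dispatch selects the implementation; no case structure needed.
    return ref.write(message)


print(log_with_cases("file", "started"))       # [file] started
print(log_event(MemoryLogger(), "started"))    # [memory] started
```

With the second style, team members can each own one logger class without ever editing a shared case structure.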
Nested Polymorphism
It is important to understand the need for the nested dynamic dispatch setup in this example. Whenever you have a hardware abstraction layer with configurations or actions on top, you have a choice. You can either hard-code the requirement that one class depends on another and use the To More Specific Class function, or you can create nested dispatching functionality. The latter requires the classes to have clean interfaces between each other. You can see this happening in Run Core.vi.
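A minimal Python sketch of the nested dispatch option, under the assumption that the reference interface exposes a write method. The class names are hypothetical and are not taken from the Template-Logger repository.

```python
class LoggerReference:
    # Clean interface between actions and references.
    def write(self, line: str) -> str:
        raise NotImplementedError


class FileLogger(LoggerReference):
    def write(self, line: str) -> str:
        return f"file <- {line}"


class ConsoleLogger(LoggerReference):
    def write(self, line: str) -> str:
        return f"console <- {line}"


class LogAction:
    def run_core(self, ref: LoggerReference, text: str) -> str:
        raise NotImplementedError


class LogWarning(LogAction):
    def run_core(self, ref: LoggerReference, text: str) -> str:
        # Outer dispatch chose this action; the inner call dispatches on the logger.
        return ref.write(f"WARN {text}")


class LogError(LogAction):
    def run_core(self, ref: LoggerReference, text: str) -> str:
        return ref.write(f"ERROR {text}")


for action in (LogWarning(), LogError()):
    for ref in (FileLogger(), ConsoleLogger()):
        print(action.run_core(ref, "disk full"))
```

No action ever downcasts the reference, so any action works with any logger, at the cost of having to agree on the write interface up front.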
Overkill Features?
There are many features of this template that the current Logger example does not benefit from, which might be viewed as overkill, gold plating, or some other creative name. I totally agree. If you do not need features like storing configurations or results in maps, get rid of them.
On the other hand, if you would like to store all possible hardware abstractions in a map, this is a feature you can add to your Top Module.
Additionally, the current design of Tester.vi and its use of the template API might suggest that this is how the package is intended to look for end users. This is not the case. You can either simplify the API or just pack Define Config, Define Action, Run Action, etc. into SubVIs.
The ideas shared in this article can be transferred to any language.
Software Architect at Siemens Gamesa | CLA | LabVIEW Champion
4 years ago: Nice article, Piotr. Very interesting approach, even more so with the use of Interfaces. One of the complaints from non-OOP programmers is the so-called mutable state and the consequences of it. I also see this problem when people try to use by-reference classes in LabVIEW. The power of LabVIEW is in the dataflow principle, and I believe we can leverage it more now that we have Interfaces.
SpaceX Principal SW Engineer/LabVIEW's Aristos Queue
4 years ago: "test doubles" is not a phrase I'm familiar with. I assume this is equivalent to either "test values" or "mock objects"?
Principal Solution Architect AI/ML @Autodesk | @SoftwareMinimalist on YouTube and BuyMeACoffee | Data Scientist, Coach - Agile, Lean, Kaizen, Value Stream Mapping, Capability Mapping
4 years ago: Stephen, Joerg, Fabiola, Allen, Steve, I view you as the apex-programmers in LabVIEW. I would greatly appreciate your feedback on the template presented above. Thank you so much!
Hey Piotr :) , I think it is a very nice article, so congrats on that - it is always a pleasure to read something well thought through, something which makes you just as well think and question. Regarding the concepts exposed -> I would have some comments to make:
1. I believe that as it was described here, this approach, breaks the concept of object encapsulation. Instead of having all of the functionality packed in a single object, you spread it across multiple types of objects, which operate on another object...so you get something akin a "horizontal" hierarchy. You are taking the "traditional" SOLID view point and steer it in another direction. Personally, I am still looking forward to see how can this improve the workflow when working on larger scale projects.
2. If I am looking through the project templates that you posted - very good examples btw - and if I am taking a configuration object - for example a "FileConfig.lvclass" object. When you are applying this configuration in the "Config Core.vi", you still need to "know" the exact "hardware" reference you operate on. The way to do this in LabVIEW, is ofc to typecast to a specific class of object, but this is also creating a static dependency between the configuration object, and the reference object that it targets ... honestly, I don't see why in this case it wouldn't be better for it to actually be a method of the HW reference object.
3. Passing functions by value - well, this is something very much done for example in JS. JS is also largely a by value programming language. Passing functions around is somewhat of a necessity in JS (actually, in any text based programming language) - primarily, you need a way to define callbacks when specific events occur, but also a lot of core JS functionality is based around this. However, since JS is a weakly typed programming language, passing functions as parameters in other functions is quite easy and it's common practice.
In LabVIEW, as you also mentioned in the article, things are somewhat different. But I still don't think you actually must wrap VIs into objects to pass them as parameters - you can open a dynamic reference to a VI, then pass that reference to another VI that needs it and can execute it. This can yield the needed code flexibility, without the extra complexity. That's my two cents on this ... otherwise, it was a pleasure to read through the article and put some thought into this topic :D .