Names adventures
Andrey Lyashin
Co-Founder & CTO @ Pruvendo | Formal Verification, Blockchain security
Today I would like to visit with some ideas on naming when performing audits and verification of software.?
As perfectly written in “Category Theory for Computing Science” by Michael Barr and Charles Wells: “Would you recognize Einstein's equation if it said p = HU2?”. That is actually what we see in program verification…
Say, we have a voting software system and somehow calculate the votes for candidates, and want to sum them to show global invariant. Then suggest we have two different procedures which give the same sum but with permuted order. Suggest that to show that invariant holds in terms of math, we need to prove in some sense the fact?
votes_for_candidate1 + votes_for_candidate2 = votes_for_candidate2 + votes_for_candidate1
That is true for sure. We know we may forget about the real amount of votes and quantify the equation for both votes values. That reads:
? (votes_for_candidate1 votes_for_candidate2: N),?
votes_for_candidate1 + votes_for_candidate2 = votes_for_candidate2 + votes_for_candidate1
where N stands for natural numbers. We know that we may change the names of quantified variables to any allowed strings. So from the mathematical point of view there is absolutely(!) no difference between the previous expression and expression like that:
? (number_of_legs_of_elefant price_for_oil_barrel: N),?
number_of_legs_of_elefant + price_for_oil_barrel =?price_for_oil_barrel + number_of_legs_of_elefant
As there is no difference, we can simply apply the later theorem to our goal about votes and perfectly prove it. However, I hardly believe that someone with a clean mind would formulate the addition commutativity property taking the names for variables like that. And again, from the mathematical point of view, there is no difference which names will be taken for quantified variables. I would like you to imagine the case when you are trying to give a formal specification for a program where votes for Candidate1 are denoted as number_of_legs_of_elefant, and votes of Candidate2 - price_for_oil_barrel. I claim that the process of writing such a specification will take more time and even more mind force to accomplish.
As computer language variables’ names usually also have no additional meaning, a correspondent program can be absolutely correct and verifiable in terms of formal properties. So we do not have any theoretical restrictions here - there are no implicit or strict rules in computer science to give specific names to variables. But not in physics… Recall two potential fields laws of Newton and Coulomb:
领英推荐
Again from a mathematical point of view there were absolutely equivalent formulas, but we know that the first one is about gravitation and the second one is about electricity. And from the first look they do not have any intersections in physical meaning. So if we pretend that we speak about gravitational force showing Coulomb's law, we’d very likely remain unclear for an audience. But if we study mathematical properties of those equations, we may easily change any name even for a number of elephant legs denoting mass or charge of the first element and oil price for the second - nothing will change in the mathematical properties. So what is the difference?
I propose that both a computer program and a mathematical formula are in some sense a modeling reflection of the suggested reality, whatever it means. And the model hides the naming internal semantics, and it implicitly forces us to denote a mass by m, and electrical charge by q. Just to be understood! Mathematics and computer programs at the same time are subject to understanding by computers, not only human beings. And when dealing with models we can forget initial “real” objects until we’ll try to return and explain to someone their revealed properties. So we can think of math as a model for physics, and programs as a model of business processes, for example.?
That means that we need to imagine reality when specifying computer programs as well as we do that when studying physics. Physics laws have no meaning without a clear understanding of objects they are about.?
Now we all know that physics is fully satisfied with one-character notations for most of the objects it studies. But that is not true for programs… Usually a developer needs to give more self-explaining names for variables in their software than just one symbol. Are business processes so different from physics? Can they have universal notations in theory? Those are open questions for me. However, I can give one more example of computer program which is satisfied with short names:
foldr f z [] ? ? = z?
foldr f z (x:xs) = f x (foldr f z xs)
That is the list right fold, expressed in Haskell. You see that almost all names are one symbolled. Why can we still understand that, does it resemble just a math formula? Suggest the same function written like
remove cat passport [] ? ? = passport?
remove dog institute (granary:bicycle) =
dog granary (remove dog institute bicycle)
Do you still understand this as well? What about
nvbekjv verrg ewfer [] ? ? = ewfer?
nvbekjv ferwf jyuytyy (ferwf1:kuikmh) =?
ferwf ferwf1 (nvbekjv ferwf jyuytyy kuikmh)? ?
I think that humans cannot get those stuff uniformly. What happens here? From a mathematical and compiler point of view, all three formulas are specifying a list right fold. And yes, they definitely are!?
Finally, I would like to emphasize that the specification of computer programs is much closer to physics than to pure mathematics; it can still be done without a clear understanding of the "real" world, but that helps not more than explaining the senses of a person fallen in love by chemical reactions in brain:)