The best of notebooks and the best of Excel, all in one?
Jose Luis Hidalgo
TL;DR: Pluto is a new notebook-like environment for working with the Julia language. While still young, it provides some of the best features of Jupyter or RMarkdown notebooks (the document format, the ease and power of a real programming language, the separation between code and data, etc.) and of Excel (the always-consistent state, the interactivity "for free", etc.), thanks to some functional programming concepts and to the performance of the Julia language. PlutoCon 2021, the first-ever conference about Pluto, is currently under way.
The two most common environments for data analysis nowadays are, by far, Excel sheets and Jupyter or RMarkdown notebooks. Many professionals move from a sophisticated use of Excel to notebooks when they decide to "up their game", and what they experience during that transition (which I have witnessed many times in my capacity as a data science professor at a business school) is quite revealing of the strengths and weaknesses of each environment.
On the plus side, commonly mentioned benefits are the power of what you can do, the size of the data you can work with (and "all at once!", without having to perform thousands of error-prone copy-and-paste operations), and the ease of querying your data and making simple visualizations. Of course, all those benefits are real, and they are a good part of the reason why more and more data analysis tasks are being performed in Python or R every day.
On the minus side, and perhaps more interestingly, a very common complaint is that they "get lost": it is hard to know what you are doing when you are changing things, and notebooks feel somehow unpredictable (you often execute the same cell twice and get different results, and you might or might not know why). In short, they complain about notebooks having "hidden state" (just not in those words), and that is, of course, a very real problem. Another common complaint is the feeling that a notebook, even a nice one, is not a document that can be shared with end users; it is not "a deliverable", in consulting speak. This is not only because it's unfamiliar, but also because the end user would not be able to "play with it", hence the popularity of solutions like Streamlit or Shiny. Let's call those "the hidden state issue" and "the deliverable issue".
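To make "the hidden state issue" concrete, here is a minimal sketch in plain Python that imitates, very loosely, a Jupyter-style kernel: every cell is run with `exec` against one shared, mutable namespace, in whatever order you click "run". The `cells` dictionary and the `run` helper are hypothetical, for illustration only; real kernels are far more complex, but the shared-namespace behaviour is the part that matters here.

```python
# Toy model of a Jupyter-style kernel: all cells share one mutable namespace
# and run in whatever order the user chooses. Names here are hypothetical.
cells = {
    "setup": "x = 1",
    "step": "x = x + 1",
}

namespace = {}


def run(cell_id):
    """Execute one cell's source against the shared namespace."""
    exec(cells[cell_id], namespace)
    return namespace.get("x")


run("setup")
first = run("step")   # x is now 2
second = run("step")  # same cell, same code, different result: x is now 3

# Deleting the cell that defined x does not remove x from the namespace,
# so later cells keep "working" against state that no visible cell explains.
del cells["setup"]
leftover = namespace["x"]

print(first, second, leftover)  # prints: 2 3 3
```

Nothing in the remaining cells says where `x` came from or why it is 3; that invisible history is exactly what makes notebooks feel unpredictable.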
I discovered Pluto, a new notebook system for Julia, during JuliaCon last year, not long after it was initially released. I was curious, but at first I didn't really understand the need for a new notebook system for Julia. After all, one of the best notebook systems, Jupyter, not only works fine with Julia, but even has Julia in its name! (Jupyter stands for JUlia, PYthon and R.) Moreover, the Julia ecosystem is great and growing fast, but there are still lots of gaps to fill, which makes "reinventing the wheel" look like a frivolous waste of effort. It looked nice, and it had some interesting features (like being completely written in Julia, which makes it much easier for the Julia community itself to evolve it, and being "reactive", which provides a shorter feedback loop and a "snappy feel"). It also seemed to me to be focused mostly on "the deliverable issue", particularly for academic use (it is very easy to include math symbols in Pluto, both in Markdown and in Julia code, reproducibility is good, Julia is a lot less verbose than R, which helps a lot to make cleaner notebooks, etc.). But nothing that, in my mind, justified the need for a whole new system.
I checked Pluto again a few weeks back, and I've been watching it in use in the excellent MIT "Intro to Computational Thinking" lectures (highly recommended!), and I think I'm starting to get it. It's not just the deliverable issue; it is also the hidden state issue, and in a way that can be a real game changer. But to understand why, we need to go back to the much-maligned (by data scientists, at least) Excel.
Something that most of its users don't realize is that Excel sheets are programs, and thus Excel itself is a programming language, albeit a strange one. And a purely functional one, no less. (Simon Peyton Jones, a key figure in the Haskell world, explains it nicely in this long presentation about the biggest Excel innovation ever that nobody cares about.) Being a functional language means, among other things, that functions are pure (given the same inputs, a function will always produce the same output) and that values are immutable (the program cannot change existing data, only generate new data). Those two things are so deeply ingrained in the way Excel works that they feel completely "natural", "logical", to Excel users. You put data in cells, you put formulas in other cells that give you new data, which you can then use in other formulas, and so on. If you change some of the data or some formula, things get updated automatically as necessary, but otherwise nothing ever changes. There are no "partially completed results", there are no out-of-date cells (at least in sheets that don't need to be recalculated manually), there are no cyclic calculations (Excel complains immediately if you try), etc. When users lose all that by moving to notebook-type environments, it is understandable that they get lost.
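Here is a minimal sketch, in Python, of the Excel model as described above: a cell holds either a constant or a formula (a pure function plus the names of the cells it reads), evaluating the sheet never modifies any cell, and a circular reference is rejected instead of producing a half-computed result. Everything here (`evaluate`, `CyclicReferenceError`, the cell names) is hypothetical. For simplicity it recomputes the whole sheet from scratch, whereas Excel tracks dependencies and recalculates only what a change affects.

```python
from typing import Callable, Dict, Tuple, Union

# A cell is either a constant or a formula: a pure function plus the names
# of the cells it reads. Formulas only produce values; they never write cells.
Formula = Tuple[Callable[..., float], Tuple[str, ...]]
Cell = Union[float, Formula]


class CyclicReferenceError(Exception):
    """Raised when a formula depends, directly or not, on itself."""


def evaluate(sheet: Dict[str, Cell]) -> Dict[str, float]:
    """Compute every cell; the same sheet always yields the same values."""
    values: Dict[str, float] = {}
    in_progress = set()

    def value_of(name: str) -> float:
        if name in values:
            return values[name]
        if name in in_progress:
            raise CyclicReferenceError(f"cycle through {name}")
        in_progress.add(name)
        cell = sheet[name]
        if isinstance(cell, tuple):
            fn, deps = cell
            result = fn(*(value_of(d) for d in deps))
        else:
            result = cell
        in_progress.discard(name)
        values[name] = result
        return result

    for name in sheet:
        value_of(name)
    return values


sheet = {
    "A1": 10.0,
    "A2": 32.0,
    "A3": (lambda a1, a2: a1 + a2, ("A1", "A2")),  # like =A1+A2
    "A4": (lambda a3: a3 * 2, ("A3",)),            # like =A3*2
}
print(evaluate(sheet))  # prints: {'A1': 10.0, 'A2': 32.0, 'A3': 42.0, 'A4': 84.0}

try:
    evaluate({"B1": (lambda b2: b2 + 1, ("B2",)), "B2": (lambda b1: b1 + 1, ("B1",))})
except CyclicReferenceError as err:
    print("rejected:", err)  # prints: rejected: cycle through B1
```

Changing `A1` and evaluating again updates `A3` and `A4`, and nothing else about the sheet can drift out of date, because no formula is allowed to change anything behind your back.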
Julia is not a functional programming language, but Pluto notebooks behave like one, and in them Julia "feels" like one. Pluto cells are automatically updated whenever anything they depend on changes, just like in Excel (this, by the way, is possible in large part thanks to the excellent performance of Julia: the first time you execute a cell, Julia needs to compile the code and that takes some time, but from then on, re-executing it happens at C-like speeds). A variable cannot be defined or reassigned in more than one cell, since that would mean it has one value "for some cells" and another value "for other cells", which would break the consistency of the notebook. If you delete the cell where a variable is created, that variable is no longer available, so there is no hidden state: everything the notebook uses is present in its cells. In that regard, it behaves much more like Excel than like Jupyter or RMarkdown. The experience of using Pluto is rather different from using other notebooks, and actually rather pleasant once you get used to it, because you feel a lot more "in control".
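The rules just described (one defining cell per variable, automatic re-execution of everything downstream, and deleted cells taking their variables with them) can be sketched as a toy dependency graph. This is not how Pluto is implemented: Pluto works out which variables each cell defines and reads from the Julia code itself, whereas in this hypothetical Python `ReactiveNotebook` class each cell has to declare them explicitly. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to re-run stale cells in dependency order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class ReactiveNotebook:
    """Toy reactive notebook; an illustration, not Pluto's implementation.

    Each cell defines exactly one variable, declares the variables it reads,
    and computes its value with a pure function of those variables.
    """

    def __init__(self):
        self.cells = {}   # cell id -> (defined name, read names, function)
        self.values = {}  # variable name -> current value

    def _owner(self, name):
        """Id of the cell that defines `name`, or None if no cell does."""
        for cell_id, (defined, _, _) in self.cells.items():
            if defined == name:
                return cell_id
        return None

    def set_cell(self, cell_id, defines, reads, fn):
        """Add or edit a cell, then re-run it and everything downstream."""
        owner = self._owner(defines)
        if owner is not None and owner != cell_id:
            # One variable, one defining cell: a second definition is rejected.
            raise ValueError(f"multiple definitions for {defines!r}")
        old = self.cells.get(cell_id)
        self.cells[cell_id] = (defines, tuple(reads), fn)
        if old is not None and old[0] != defines:
            self._rerun(old[0])  # the old name vanished; update its readers
        self._rerun(defines)

    def delete_cell(self, cell_id):
        """Remove a cell; its variable disappears and its readers update."""
        defines, _, _ = self.cells.pop(cell_id)
        self._rerun(defines)

    def _readers(self, name):
        """Variables that depend on `name`, directly or transitively."""
        found, stack = set(), [name]
        while stack:
            current = stack.pop()
            for defined, reads, _ in self.cells.values():
                if current in reads and defined not in found:
                    found.add(defined)
                    stack.append(defined)
        return found

    def _rerun(self, name):
        """Recompute `name` and its readers in dependency order, nothing else."""
        stale = self._readers(name) | {name}
        by_name = {defined: (reads, fn) for defined, reads, fn in self.cells.values()}
        graph = {
            n: [r for r in by_name[n][0] if r in stale] if n in by_name else []
            for n in stale
        }
        for n in TopologicalSorter(graph).static_order():
            if n in by_name and all(r in self.values for r in by_name[n][0]):
                reads, fn = by_name[n]
                self.values[n] = fn(*(self.values[r] for r in reads))
            else:
                # The variable has no defining cell, or one of its inputs is gone.
                self.values.pop(n, None)


nb = ReactiveNotebook()
nb.set_cell("c1", "x", [], lambda: 2)
nb.set_cell("c2", "y", ["x"], lambda x: x * 10)
nb.set_cell("c1", "x", [], lambda: 3)  # editing c1 re-runs c2, so y becomes 30
print(nb.values)  # prints: {'x': 3, 'y': 30}
try:
    nb.set_cell("c3", "x", [], lambda: 99)  # a second cell defining x
except ValueError as err:
    print(err)  # prints: multiple definitions for 'x'
nb.delete_cell("c1")  # x disappears, and so does y, which read it
print(nb.values)  # prints: {}
```

Only the edited cell and its readers are recomputed; a cell that does not read `x` is left alone. A cyclic dependency would make `static_order()` raise `graphlib.CycleError`, which plays the role of Excel's circular-reference complaint.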
There is another added benefit: interactivity is a lot easier to achieve. You define a variable in a single place, and if you change it, everything gets updated (recalculating only what is required, which might be a lot less than it seems, since purely functional code can always be memoized), including charts and so on. You can modify it in the code if you want, or you can simply bind it to a control (with Pluto's `@bind` macro) in the very same place where you define it, and then anyone can change it at any time, even without looking at the code. In that regard it is similar to how Streamlit works, but with a big difference in performance and ease of use. I love Streamlit, but to be fair, as soon as calculations get a bit complex or execution paths are not linear, you have to be rather careful and often resort to ugly hacks. Creating interactive notebooks in Pluto is not just "easy"; it is something you do naturally, without thinking about it, just like you do in Excel, and that makes a huge difference.
In my mind, Pluto addresses the two biggest issues of notebooks for data analysis, and in a very elegant and clean way. It is relatively new and surely needs further development, but it is already a perfectly valid environment for data analysis, as shown in many of the PlutoCon 2021 talks. More importantly, it is a real pleasure to use. But of course there is an elephant in the room: it only works with the Julia language. As much as I like it, Julia is a lot less popular than Python or R, and it will stay that way for quite some time at least. And it is no accident that Pluto only works with Julia: it depends heavily on some of the features and strengths of the Julia language, so you cannot just "create new kernels" for other languages like you do with Jupyter.
Will the relative lack of popularity of the Julia programming language limit the popularity of Pluto? Or might Pluto become the "killer app" that gives people new reasons to try Julia, eventually convert, make the ecosystem grow, and close the popularity gap? It is hard to tell (but what is not hard to tell is which of those two outcomes I'd be happier to see).