Stable Diffusion
I've been slowly getting the various "popular" or "flavour of the month" AI things up and running on our Graphcore systems. This week I got LLAMA 2 running, which was surprisingly easy, although in the process I determined that the issues I had earlier with making a "Dolly"-based chatbot work on the system were definitely due to a clash between langchain and the way Graphcore's own pipelines work (one to play with in more detail later).
On Wednesday I decided to try Stable Diffusion. Stable Diffusion is of course controversial for all sorts of reasons: the potential for people to use it instead of paying artists; the extremely fuzzy copyright situation (it may eventually stop being legally fuzzy, but I'd argue it remains at best morally fuzzy); and the fact that it and related tools are being used by extremely weird and unpleasant people to generate adult content of non-consenting people. The goal here, though, is merely to prove that it works on this platform and move on.
And yes, it works. I had to do some fiddling with library versions on our system, probably because we are running a newer version of Poplar than the examples were written for.
I also decided to try a version of Stable Diffusion (1.4) that was not in the examples.
I discovered a couple of interesting things.
The first is that Stable Diffusion 1.4 and 1.5 seem to be "better" than 2.0 on the same prompt and settings. In order to trigger a "graph compile" step (which you only have to do once), the examples tell the model to generate an image for the prompt "apple".
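With stock diffusers the pattern looks roughly like this (a minimal sketch, not the Graphcore example code: their version wraps the pipeline in IPU-specific plumbing, and the CompVis checkpoint id and step count here are my assumptions):

    from diffusers import StableDiffusionPipeline

    # Assumed Hugging Face Hub checkpoint for Stable Diffusion 1.4.
    pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

    # In the Graphcore examples the first generation is what triggers the one-off
    # graph compile, so it gets burned on the throwaway prompt "apple".
    # Few denoising steps, since this image is thrown away anyway.
    _ = pipe("apple", num_inference_steps=5).images

    # Subsequent calls reuse the compiled/warmed-up pipeline at full speed.
    image = pipe("an astronaut riding a horse, oil paint").images[0]
    image.save("warmup_test.png")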
Now, obviously this particular fruit is poisoned by the company existing, but the older models generated actual fruit much of the time; 2.0 basically never did. There's also a big style difference going on. Overwhelmingly, the output of 2.0 is more cartoonish and looks more like (dare I say it) the output of previous image generative models. I gave all three models the same prompt, "an Atlas battlemech from the battletech universe liberating an alien city, oil paint". None of them knew enough to draw the correct mech, but it was clear that the style of 2.0's output was much worse:
[Image comparison: outputs from Stable Diffusion 1.4, 1.5 and 2.0 for the same prompt]
(note that in all of the above I generated 12 images and picked the best one)
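Reproducing that comparison with stock diffusers is straightforward; the loop below is the sort of thing I mean (a sketch under a couple of assumptions: the three Hub model ids are the usual 1.4/1.5/2.0 checkpoints, the seed is arbitrary, and the "best one" is still picked by eye afterwards):

    import torch
    from diffusers import StableDiffusionPipeline

    PROMPT = ("an Atlas battlemech from the battletech universe "
              "liberating an alien city, oil paint")

    # Assumed Hub checkpoints for the three model versions.
    MODELS = {
        "sd-1.4": "CompVis/stable-diffusion-v1-4",
        "sd-1.5": "runwayml/stable-diffusion-v1-5",
        "sd-2.0": "stabilityai/stable-diffusion-2",
    }

    for name, model_id in MODELS.items():
        pipe = StableDiffusionPipeline.from_pretrained(model_id)
        generator = torch.Generator().manual_seed(42)  # same seed for each model
        # 12 candidates per model, saved to disk for manual inspection.
        images = pipe(PROMPT, num_images_per_prompt=12, generator=generator).images
        for i, img in enumerate(images):
            img.save(f"{name}_{i:02d}.png")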
What is going on here? Is the newer model actually worse? One friend on Twitter suggested that later versions had "copyright problematic" material pulled from the training set, which is possible.
The other thing I discovered, quite by accident (thanks to the library-version issues), is that Stable Diffusion inference is perfectly fine on CPU as long as you have a fairly beefy one: it takes 30 seconds or so on 96 cores of AMD Zen 2 and about 2 minutes on 36 cores of Intel Cascade Lake Xeon.
For comparison, once the initial graph compilation has happened, it takes less than a second on 8 of the IPUs in our Pod16.
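For anyone who wants to try the CPU route, it needs nothing exotic: plain diffusers with the pipeline left on the CPU and the thread count pinned to your core count. A minimal sketch (the step count and float32 dtype are my assumptions, not settings taken from the Graphcore examples):

    import time

    import torch
    from diffusers import StableDiffusionPipeline

    torch.set_num_threads(96)  # match the physical core count of the machine

    # Assumed Hub checkpoint; CPU inference wants float32 weights.
    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float32
    ).to("cpu")

    start = time.perf_counter()
    image = pipe("an apple on a wooden table", num_inference_steps=50).images[0]
    print(f"one 512x512 image in {time.perf_counter() - start:.1f}s")
    image.save("cpu_test.png")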