A bug that made me a better developer.
Bugs are evil.?
No developer likes working on bugs because it's boring and not rewarding.?
Not all bugs are equal. While majority of them don't involve special skills, some will require true engineering mindset and creativity.
Here is a little story about?such a special bug I recently experienced myself.
I built a minimalistic anonymous photo sharing app on my spare time -- https://www.wisaw.com?
The app is dead simple -- take a photo with a mobile device, no registration required, the photo gets uploaded to the cloud automagically, and everyone is able to see it right away. The unique aspect of this app --?it's crowd moderated. Anyone can delete any photo they don't like at any time.
The first MVP was built in 7 days. The stack I used for the MVP:
Took me another couple of years to optimize performance, and streamline the UX.?
I was re-factoring to build Minimalistic Product on Minimalistic Architecture.
My little app was nice, and simple, and beautiful. Keeping things simple always takes some extra effort.?
Everything was going great. A little slow at times, which is typical for a pet project built in spare time. In Jan 2021, I started to notice something strange -- my iPhone would randomly crash while I was using my little app! There were no Crash reports in AppStoreConnect. There weren’t any exceptions in the logs -- the screen would simply turn black and show a spinner for 30 seconds. Then it would bring you to a locked screen asking to enter the?PIN?to unlock the device.?Meanwhile, the Android app worked just fine. And on top of that, it only seemed to affect prod devices -- I was never able to reproduce the issue in dev.?
Hard to say when exactly it happened -- I started to notice it after upgrading React-Native stack to Expo, and the first thought was that there is a bug in Expo.
I also implemented my own Image Caching solution, because react-native-fast-image does not work with expo managed workflow. My caching solution worked extremely well for me, which I open-sourced?https://www.npmjs.com/package/expo-cached-image .??
It would take between couple of days and couple of weeks for the issue to start appearing, and the only way to make it go away?was to delete the app from the device and install it fresh from the app store. Restarting the app, or rebooting the device would not help.
All this lead me to believe -- there is some state that is accumulating on the file system, which eventually causes the device to crash. And I was indeed accumulating a lot of state in Image Cache, which was persisting to Storage.
I reached out to Expo for advice and a new forum topic was created: My expo app is crashing in prod, how do I troubleshoot it?
The Expo team was super helpful and explained how to get logs from a production device.?Unfortunately, these logs were cryptic and not very useful to me -- I'm not an Operating System engineer, I'm an app developer:
Jun720:29:29kernel[0]<Notice>:1805.088 memorystatus:killing_top_processpid604 [securityd] (vm-pageshortage1)6480KB - memorystatus_available_pages:7069
Jun720:29:29kernel[0]<Notice>:1805.094 memorystatus:killing_top_processpid616 [trustd] (vm-pageshortage1)3184KB - memorystatus_available_pages:6715
Jun720:29:29wifid[353]<Notice>:__WiFiServerClientTerminationCallback:Clientcarkitdterminated,cleanupstate
Jun720:29:29kernel[0]<Notice>:1805.096 memorystatus:killing_top_processpid355 [assistantd] (vm-pageshortage1)9696KB - memorystatus_available_pages:5276
Jun720:29:29kernel[0]<Notice>:1805.100 memorystatus:killing_top_processpid391 [biometrickitd] (vm-pageshortage1)2512KB - memorystatus_available_pages:5013
Jun720:29:29kernel[0]<Notice>:1805.102 memorystatus:killing_top_processpid324 [mediaremoted] (vm-pageshortage1)2976KB - memorystatus_available_pages:5042
Jun720:29:29kernel[0]<Notice>:1805.103 memorystatus:killing_top_processpid383 [cloudpaird] (vm-pageshortage1)3760KB - memorystatus_available_pages:5038
Jun720:29:29kernel[0]<Notice>:1805.104 memorystatus:killing_top_processpid483 [suggestd] (vm-pageshortage1)11616KB - memorystatus_available_pages:5079
Jun720:29:29kernel[0]<Notice>:1805.106 memorystatus:killing_top_processpid384 [searchpartyd] (vm-pageshortage1)5952KB - memorystatus_available_pages:5065
Jun720:29:29kernel[0]<Notice>:1805.109 memorystatus:killing_top_processpid331 [nanomediaremotelinkagent] (vm-pageshortage3)2752KB - memorystatus_available_pages:5078
Basically, this log indicated that at the moment of crash,?iOS thought some application is using too much memory, and silently killed it.??
I went back and forth with the expo team, insisting that it has to be something with the Storage, while they were pushing back that there is a difference between RAM and Storage, and in my case the app is using too much RAM, and that's why iOS kills it.?
It turns out we were all correct in our own ways -- the issue was related to both,?the RAM and the Storage (continue reading to the end).
But before the mystery was solved, I had to take few extra steps.?
I stored cached images on the file system, so I tried to use Expo's
领英推荐
FileSystem.documentDirectory
Instead of?
FileSystem.cacheDirectory
Strange thing about FileSystem.cacheDirectory -- you never know how much Storage it utilizes. It's another one of those mysterious iOS things (like RAM) which is handled automagically. I even went on a rant with the Expo team trying to convince them that there is some issue with how FileSystem.cacheDirectory utilizes resources -- you never know how much storage it uses per app. The iOS can clean up the files in this folder as needed, but you never know when that's going to happen, and the amount of storage utilized by the FileSystem.cacheDirectory per different app is never reflected anywhere in the device runtime stats. Of course the Expo guys pushed back again and said -- everything is fine with how FileSystem.cacheDirectory is implemented.??
All this made me question if there is something wrong with how I render the thumbnails -- and surely that's exactly where the problem?was.?
I always thought that thumbnail were invented back in the web era to conserve the network bandwidth.?Here was my excuse for being lazy with the mobile app: I thought, if I have a full size version of the image already available locally, I can simply stick it into the thumbnail view. Adding the full size image to the local cache for thumbnail URL?would also save an extra trip to the server next time. The only problem with this approach was that on iOS, rendering an image on the screen will take an amount of memory proportional to the size of the underline image file, regardless of the dimensions of the image on the screen. In other words, in order to render an image, iOS has to bring it into the memory (RAM), and it will load the whole image file, regardless of how small the image appears on the screen. And since the memory is a scarce resource -- iOS reserves the right to silently kill the app that uses too much memory.
This is what I thought was happening:
But this is what was really going on:
Finally I was able to consistently reproduce the issue.?
Here is the sequence that would cause the crash:
So, the expo team was right after all -- it was a memory issue. I was also correct, because the state (the image cache) was accumulating in the Storage.?
This issue was particularly difficult to troubleshoot, because it would only impact devices of most active users -- someone who takes?a lot of photos frequently enough to make the thumbs with underline full-size images?dominate the screen on local device. If you end up mixing these large file thumbs with the other users thumbs which have to be downloaded from the server before they cached -- the use of the memory would go up, but it would not go up high enough for iOS to kill the app.
The solution -- if you do not have a thumb version available, always resize it to the dimensions of the image on the screen before rendering.
Lessons learned:
# 1 -- Never give up.?When this issue first happened,?I had no idea where to start. I tried so many different things, which lead to drastically improving the application performance and the UX. If I knew exactly what's causing my issue in the first place -- I may have never put the same amount of effort into my app, as it was?already good enough.
# 2 -- If not you, then who? It's tempting at times to push back -- redirect the blame to 3rd party or someone else's code. I'm convinced once again -- if there is a problem, I can always find a solution. Never go it alone, always reach out for help, ask lots of questions, even if you do not understand every answer right away -- eventually the light bulb will go off. But it's always up to you to keep pushing forward.?It's simply a matter of time. Sometimes it may take you 6 months or longer, but then apply rule #1 and never give up.
-------------------------------------------------
This article was originally posted here: https://www.echowaves.com/post/a-bug-that-made-me-a-better-developer
Senior Software Leader | Fortune 100 Executive | Founder & Entrepreneur | Business-Focused Technical Strategist | Organizational Transformation | Software & Enterprise Architecture | AI/ML | Information Security
3 年Ebenezer Samuel - you might want to hire Dmitry as an engineer on your mobile team :)