Faster MCU or more HW?
When doing heavy signal processing in software, a question often comes up: should I go with a faster processor or multiple cores, or is it better to add (dedicated) hardware?
The answer is: if your platform provides the hardware features you need - go ahead. But what if we realize mid-project that the performance is not enough, or that the hardware has limitations? For example, a "must have" for me are two full bi-directional I2S interfaces, which most MCUs and embedded systems do not provide - and a Linux-based platform...? Forget it.
As an example: I want to do professional audio processing - filtering, sample rate conversion (SRC), an FFT display, a VU meter, dynamic range extension, a limiter, an equalizer, etc. I decided to go with a pretty nice Cortex-M7 platform (one of the famous Discovery boards) and ran into performance trouble: it was easy to overload the MCU. With the first versions of the project, written in my usual lazy programming style (without really taking the hardware into account), the load went up to 100%. And that was with only a few of my features implemented - just a DC blocker, FFT, and VU meter, nothing else - yet I was already at the limit of the system. So soon?
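To make the workload concrete: the DC blocker mentioned above is typically just a one-pole high-pass filter. This is a generic textbook sketch, not the code from my project:

```c
#include <stdint.h>

/* One-pole DC blocker: y[n] = x[n] - x[n-1] + R * y[n-1].
   R close to 1.0f sets the cutoff; 0.995f is a common choice
   around 48 kHz sample rate. */
typedef struct {
    float x1;   /* previous input sample  */
    float y1;   /* previous output sample */
} dc_blocker_t;

static inline float dc_blocker_process(dc_blocker_t *s, float x)
{
    const float R = 0.995f;           /* pole radius */
    float y = x - s->x1 + R * s->y1;  /* difference equation */
    s->x1 = x;
    s->y1 = y;
    return y;
}
```

Cheap per sample, but at 192 kHz on several channels even small filters like this add up quickly.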
OK, let me fight for performance when coding the software. Use all the tricks of software development, but - very important - check your hardware capabilities and adapt accordingly: How many buses and parallel paths are in the system? Is there an FPU, and how do I use it? Float or double (SP or DP)? Are there DMAs that can even be used for a mem-to-mem memcpy? CMSIS-DSP or my own code (DSP instructions)? DTCM, ITCM, caches, flash accelerators... use all the "tricks": a well-done HW-SW co-design, where the hardware seems to dictate the firmware development.
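One of those tricks, relevant on any part with a single-precision-only FPU (like the Cortex-M7 variants on the F7 Discovery boards): keep literals in float. A plain literal such as `0.5` is a double in C, so mixing it into float math silently drops you into software double-precision routines. A minimal sketch:

```c
/* On a single-precision FPU, 'x * 0.5' promotes x to double and the
   multiply runs in a software library routine; 'x * 0.5f' stays in
   float and maps to one hardware FPU instruction. */

/* Slow path: implicit promotion to double. */
static float halve_slow(float x) { return (float)(x * 0.5); }

/* Fast path: everything stays single precision. */
static float halve_fast(float x) { return x * 0.5f; }
```

The same applies to the math library: `fabsf`, `sinf`, `sqrtf` instead of their double-precision cousins.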
And: I am down to 3.5% MCU load now, with the same FFT LCD display and audio filters. Nice, but it was really tough to find all the hooks for faster firmware. Is it enough for all the features I still have in mind?
Not really: some hardware features are simply not very fast. The touch controller, for example, needs I2C, which slows down my embedded firmware. Or the LCD has to render my images, text, etc. - even a simple text print on the LCD can be slow. And feeding audio through my system as bi-directional I2S, with two sinks and two sources (any-to-any audio interface routing with 2x in and out, each at 192 kHz, 32-bit) - I am pretty sure I would overload any MCU again. I do not want clicks in the audio later just because I tapped the touch screen. It is unpredictable whether 80% CPU load with audio processing would be enough.
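A quick back-of-the-envelope calculation shows why that I2S configuration is demanding. Assuming each interface carries a stereo pair (an assumption on my part - the channel count per interface is not spelled out above), four directions of 192 kHz / 32-bit audio come to roughly 49 Mbit/s of raw sample traffic:

```c
/* Rough I2S bandwidth estimate.
   CHANNELS = 2 (stereo) is an assumption, not stated in the text. */
enum {
    SAMPLE_RATE     = 192000,  /* Hz */
    BITS_PER_SAMPLE = 32,
    CHANNELS        = 2,       /* stereo per interface (assumed) */
    DIRECTIONS      = 4        /* 2x in + 2x out */
};

static unsigned long i2s_bits_per_second(void)
{
    return (unsigned long)SAMPLE_RATE * BITS_PER_SAMPLE
           * CHANNELS * DIRECTIONS;
}
```

That is about 6 MB/s that has to cross the system buses continuously, on top of the actual DSP work - before the LCD and touch traffic even enter the picture.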
And fast MCUs often run a heavy OS, e.g. Linux. Such platforms tend to lack the hardware I wish to have: Linux-based MCUs often do not have full-blown I2S in and out - just one interface, or even just one direction, rather than two I2S hardware interfaces. And there is no way to use an external word clock, e.g. a GPS-disciplined clock reference.
Therefore my decision: extend the hardware, ideally reusing the same MCUs, to create a dual- or even quad-MCU solution - a pipeline built from the processors I am already familiar with (actually all types of ARM CPUs, including the A53 and A57, but changing from one to another is a dramatic decision, even for me). Then split the firmware I already have in place and spread it over a multi-MCU system - similar to a multi-core CPU, but with less trouble from cache coherence, system bus bottlenecks, shared memory handling, etc. Reusing the same MCU across a multi-MCU system also avoids the headaches of mixing different MCU types, which often need different BSPs, different approaches to firmware design, or even different tools (ARM vs. DSPs, or embedded bare-metal vs. Linux).
And: multi-MCU is even better for my needs. A multi-core single-chip processor sounds nice, but just think about the "cache maintenance" and "coherency" handling it requires...
My dual-MCU Lyrebird APP is ready to go: two STM32F7 Cortex-M7 MCUs acting as a single system, an Audio Processing Pipeline (APP). Let's move the audio processing to a dedicated MCU and keep the second MCU free as the UI interface, with fancier displays on the LCD, multi-touch control, etc. Or use one STM32 as one sink and source (e.g. line in and out), while the other end of the APP connects to another sink and source (e.g. AES3 in and out).
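Splitting UI from audio like this means the two MCUs need a small control protocol between them, e.g. over SPI or UART. Purely as an illustration - these message names and fields are hypothetical, not the actual Lyrebird firmware - such a link can be as simple as a framed command with a checksum:

```c
#include <stdint.h>

/* Hypothetical control message from the UI MCU to the audio MCU.
   All names and field layouts here are illustrative. */
typedef struct {
    uint8_t  magic;     /* frame start marker, e.g. 0xA5          */
    uint8_t  cmd;       /* command id: set gain, select route, ...*/
    uint16_t param;     /* command parameter                      */
    uint8_t  checksum;  /* additive checksum over the fields above */
} ui_to_dsp_msg_t;

/* Simple additive checksum so the receiver can reject corrupted frames. */
static uint8_t msg_checksum(const ui_to_dsp_msg_t *m)
{
    return (uint8_t)(m->magic + m->cmd
                     + (m->param & 0xFF) + (m->param >> 8));
}
```

The key design point is that only low-rate control data crosses this link; the high-rate audio samples stay on the dedicated I2S paths of each MCU.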
It is now quite easy to split the existing firmware - no need to start over on a new platform. In my case, additional hardware makes my life easier as a firmware/software developer. Not all issues can be solved with software alone. More people can achieve more than just one (or me working without sleep).