From batch to streaming
An unexpected complication I wish I had been aware of from the beginning.
If you come from a conventional DSP background like me, this could be particularly useful. Let’s dive in.
First, for those unfamiliar with mini-batch training, here is my short version. Instead of processing a single unit of input data at a time, the model takes a batch of input data and processes it in parallel. This not only makes learning more efficient (fewer parameter-update calculations), but also takes full advantage of parallel computing hardware (GPU). It all makes good sense.
Here is what a batch of data often looks like. It is very similar to a 3-dimensional array, and it’s called a tensor.
[Batch_size, time_steps, data_frame]
Data_frame: the input unit fed to the model.
Time_steps: the sequence of input units lined up in time order; the number of units depends on the length of a clip (or a sample) of training data.
Batch_size: the number of clips contained in a batch.
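As an example, a batch containing a total of 80 seconds of speech data, with 10 seconds per clip, 128 PCM samples per frame with no overlap, and a 16 kHz sample rate, would have dimensions of [8, 1250, 128]: 80 / 10 = 8 clips, and 10 s × 16,000 samples/s / 128 samples per frame = 1250 frames per clip.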
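The arithmetic behind those dimensions can be sketched in a few lines of NumPy. This is only an illustration: the helper name `batch_shape` and the all-zero placeholder signal are my assumptions, not part of any particular framework.

```python
import numpy as np

SAMPLE_RATE_HZ = 16_000   # 16 kHz
TOTAL_SECONDS = 80        # speech data in the whole batch
CLIP_SECONDS = 10         # length of one clip
FRAME_SIZE = 128          # PCM samples per frame, no overlap


def batch_shape(total_seconds, clip_seconds, sample_rate_hz, frame_size):
    """Return (batch_size, time_steps, data_frame) for non-overlapping frames."""
    batch_size = total_seconds // clip_seconds
    samples_per_clip = clip_seconds * sample_rate_hz
    time_steps = samples_per_clip // frame_size
    return (batch_size, time_steps, frame_size)


shape = batch_shape(TOTAL_SECONDS, CLIP_SECONDS, SAMPLE_RATE_HZ, FRAME_SIZE)

# Placeholder signal (all zeros) standing in for 80 seconds of real PCM data.
pcm = np.zeros(TOTAL_SECONDS * SAMPLE_RATE_HZ, dtype=np.float32)

# Cut the flat signal into clips, and each clip into frames.
batch = pcm.reshape(shape)
print(batch.shape)
```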
Many DSP applications have to process one or more streams of data in real time, so frame-by-frame processing is the primary way data moves through the system. A single data frame expressed in tensor format is:
[1, 1, data_frame]
In straightforward cases, setting batch_size and time_steps to 1 is exactly what needs to be done when preparing a trained NN model for deployment. Simple enough.
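As a minimal sketch of that step, here is one way to wrap an incoming frame as a [1, 1, data_frame] tensor. The function name `to_streaming_tensor` and the frame size of 128 are assumptions carried over from the example above, not a framework API.

```python
import numpy as np

FRAME_SIZE = 128  # assumed frame length, as in the batch example


def to_streaming_tensor(frame):
    """Wrap one frame of samples as a [1, 1, data_frame] tensor."""
    frame = np.asarray(frame, dtype=np.float32)
    if frame.shape != (FRAME_SIZE,):
        raise ValueError(f"expected {FRAME_SIZE} samples, got shape {frame.shape}")
    # batch_size = 1 and time_steps = 1: one clip, one time step.
    return frame.reshape(1, 1, FRAME_SIZE)


# A dummy frame standing in for 128 PCM samples from the audio driver.
frame = np.arange(FRAME_SIZE, dtype=np.float32)
x = to_streaming_tensor(frame)
print(x.shape)
```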
What can make this straightforward process tricky is data lookback. Mini-batch training processes all the input data in parallel, so when a model requires, for instance, the previous frame as part of its input, the earlier frames are already sitting in the same batch.
Simply including the data frames from earlier time steps works just fine in mini-batch training, and there are configurations to achieve that conveniently. But what happens in deployment, when the input has only a single time step, [1, 1, data_frame]? This simple way of including earlier time steps is no longer applicable, at least not straightforwardly.
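To make the problem concrete, here is a small NumPy sketch of one-frame lookback in batch mode. The helper `with_previous_frame` is hypothetical, and zero-padding the first time step is my assumption; real frameworks offer their own ways to do this.

```python
import numpy as np


def with_previous_frame(batch):
    """[B, T, F] -> [B, T, 2F]: prepend the previous frame to every frame.

    The first time step has no history, so it gets zeros (an assumption).
    """
    previous = np.zeros_like(batch)
    previous[:, 1:, :] = batch[:, :-1, :]   # shift by one step along time
    return np.concatenate([previous, batch], axis=-1)


# Tiny batch: 2 clips, 4 time steps, 3 samples per frame.
batch = np.arange(2 * 4 * 3, dtype=np.float32).reshape(2, 4, 3)
stacked = with_previous_frame(batch)

# Deployment-style input: one clip, one time step.
single = batch[:1, 2:3, :]
stacked_single = with_previous_frame(single)
```

With the full batch, every frame after the first sees its real predecessor. With the single-step input, the lookback half is always zero, because the time axis holds no earlier frame to shift in.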
Adding a buffer (to the deployment model) before every layer that requires data lookback is a logical solution. More advanced solutions are used to suit different architectures and scalability needs. The intricacy is mostly about training efficiency and keeping the code consistent between training and deployment.
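A minimal sketch of the buffer idea, assuming a one-frame lookback and zero initial history. The class name `LookbackBuffer` is hypothetical. The check at the end compares the frame-by-frame result against the batch-mode result for the same clip.

```python
import numpy as np


class LookbackBuffer:
    """Keeps the previous frame so a streaming model can look back one step."""

    def __init__(self, frame_size):
        # Zero history before the first frame (an assumption that must match
        # whatever padding was used during training).
        self.previous = np.zeros((1, 1, frame_size), dtype=np.float32)

    def step(self, frame):
        """frame: [1, 1, F] -> [1, 1, 2F] with the buffered frame prepended."""
        out = np.concatenate([self.previous, frame], axis=-1)
        self.previous = frame.copy()   # remember this frame for the next call
        return out


# One clip, 4 time steps, 3 samples per frame.
clip = np.arange(4 * 3, dtype=np.float32).reshape(1, 4, 3)

# Batch-mode reference: shift along the time axis, zero-pad the first step.
shifted = np.zeros_like(clip)
shifted[:, 1:, :] = clip[:, :-1, :]
batch_out = np.concatenate([shifted, clip], axis=-1)

# Streaming: feed one [1, 1, F] frame at a time through the buffer.
buffer = LookbackBuffer(frame_size=3)
steps = [buffer.step(clip[:, t:t + 1, :]) for t in range(clip.shape[1])]
stream_out = np.concatenate(steps, axis=1)

matches = np.array_equal(batch_out, stream_out)
print(matches)
```

The buffer holds state between calls, which is exactly what the batch version never needs. That extra state is where training code and deployment code start to diverge.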
The twist I wish I had learned earlier is this: in conventional algorithm development, we strive for a consistent implementation between simulation and deployment. In the NN world, because the data is handled differently, it’s common for the trained model to be not exactly the same as the deployed one. Fully considering this complication from the beginning of model development makes life a lot easier during later debugging and deployment.