Problems with transfer learning: First, let’s define what transfer
learning is. Transfer learning is when a model that has been trained
for one particular task (classifying vehicles, for example) has its
last layer(s) removed and replaced with new layers that are trained
from scratch, so that the model can be used for a new classification
task (classifying animals, for example).
In computer vision, there are some really powerful models, such
as Inception-v3, that have been trained on powerful GPUs for quite
some time in order to achieve the performance that they do. Instead
of training our own CNN from the ground up (most of us don’t have
the GPU hardware or the time to train an extremely deep model like
Inception-v3), we can simply take Inception-v3, which is really good
at extracting features out of images, and train it to associate the
features that it extracts with a completely new set of classes. This
process takes a lot less time because the weights in the rest of the
network are already well optimized and stay fixed, so you’re only
concerned with finding the optimal weights for the layers you are
retraining.
That’s why transfer learning is such a valuable process; it allows us
to take a pretrained, high-performance model and simply retrain
the last layer(s) with our hardware and teach the model a new
classification task (for CNNs).
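As a rough illustration of the idea, the following pure-Python sketch keeps a
"pretrained" feature extractor frozen and trains only a new output layer on a
new task. It is a toy stand-in, not Inception-v3: the names FROZEN_WEIGHTS,
extract_features, and train_head, the weight values, and the tiny dataset are
all hypothetical and chosen only to keep the example small.

```python
import math

# Stand-in for the frozen body of a pretrained network (hypothetical values).
# These weights are never updated during training below.
FROZEN_WEIGHTS = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]


def extract_features(x):
    """Frozen layer: a linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in FROZEN_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data, labels, epochs=500, lr=0.1):
    """Train only the new last layer (logistic regression on frozen features)."""
    head_w = [0.0] * len(FROZEN_WEIGHTS)
    head_b = 0.0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            feats = extract_features(x)
            pred = sigmoid(sum(w * f for w, f in zip(head_w, feats)) + head_b)
            err = pred - y  # gradient of the log loss w.r.t. the pre-activation
            head_w = [w - lr * err * f for w, f in zip(head_w, feats)]
            head_b -= lr * err
    return head_w, head_b


def predict(x, head_w, head_b):
    feats = extract_features(x)
    score = sum(w * f for w, f in zip(head_w, feats)) + head_b
    return 1 if sigmoid(score) >= 0.5 else 0


# New task (toy data): label 1 when the second input is larger than the first.
data = [(2, 0), (0, 2), (3, 1), (1, 3), (1, 0), (0, 1)]
labels = [0, 1, 0, 1, 0, 1]

head_w, head_b = train_head(data, labels)
predictions = [predict(x, head_w, head_b) for x in data]
print(predictions)
```

Only head_w and head_b change during training, which is why retraining is
cheap compared with optimizing every weight in a deep network.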
Going back to TCNs, the model might be required to remember
varying amounts of sequence history in order to make predictions.
If the old task required only a certain amount of history, but the
new task requires noticeably more or less history to make
predictions, the pretrained model may not transfer well and might
perform poorly on the new task.
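One way to make this concrete is to compare how much history the network can
see with how much each task needs. The sketch below assumes a stack of dilated
causal convolutions with a shared kernel size, where each convolution adds
(kernel_size - 1) * dilation steps of history. The function names, the
dilation schedule, and the history requirements of the two tasks are
illustrative assumptions, not values from the text.

```python
def receptive_field(kernel_size, dilations, convs_per_level=1):
    """Time steps (including the current one) visible to a single output.

    Assumes causal convolutions that all share the same kernel size, with
    `convs_per_level` convolutions at each dilation level.
    """
    return 1 + convs_per_level * (kernel_size - 1) * sum(dilations)


def covers_history(kernel_size, dilations, required_history, convs_per_level=1):
    """True if the stack can see at least `required_history` time steps."""
    return receptive_field(kernel_size, dilations, convs_per_level) >= required_history


# Hypothetical setup: the network was sized for an old task needing 30 steps.
kernel_size = 3
dilations = [1, 2, 4, 8]
old_task_history = 30
new_task_history = 100

rf = receptive_field(kernel_size, dilations)
print("receptive field:", rf)
print("covers old task:", covers_history(kernel_size, dilations, old_task_history))
print("covers new task:", covers_history(kernel_size, dilations, new_task_history))
```

Under these assumptions the pretrained layers cannot see enough history for the
new task, and retraining only the last layer(s) does not change that.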
In a one-dimensional convolutional layer, we still have the parameter
k to determine the size of our kernel, or filter. The way the
convolutional layer works is pretty similar to the two-dimensional
convolutional layer you looked at in Chapter 3, but we are only
dealing with vectors in this case.
Here’s an example of what the one-dimensional convolutional operation looks like.
Assuming an input vector defined as in Figure 7-1,
and a filter initialized as in Figure 7-2, the output of the convolutional
layer is calculated as shown in Figure 7-3, Figure 7-4, Figure 7-5, and
Figure 7-6.
x = [10  5  15  20  10  20]
Figure 7-1. A vector x defined with these corresponding values. This is the input
vector.
Filter weights = [1  0.2  0.1]
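The operation on these two vectors can be sketched in a few lines of Python.
This is a minimal version that assumes a stride of 1, no padding, no bias term,
and no flipping of the filter (the sliding dot product that deep learning
libraries usually call a convolution). The function name conv1d_valid is
made up for this illustration.

```python
def conv1d_valid(x, weights):
    """Slide the filter over x one step at a time and take dot products."""
    k = len(weights)
    return [
        sum(w * x[i + j] for j, w in enumerate(weights))
        for i in range(len(x) - k + 1)
    ]


x = [10, 5, 15, 20, 10, 20]      # input vector from Figure 7-1
filter_weights = [1, 0.2, 0.1]   # filter weights given above

output = conv1d_valid(x, filter_weights)
print([round(v, 2) for v in output])
```

With six inputs and a filter of size k = 3, the filter fits in 6 - 3 + 1 = 4
positions, so the output has four values. The first one, for example, is
10(1) + 5(0.2) + 15(0.1) = 12.5.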