II. DEEP NETWORK
A. Deep architecture
The proposed stacked network uses five layers of sigmoidal neurons organized as one input layer, three hidden layers and one output layer. To combine the higher-level features of different data behaviors, the hidden layers are trained separately and then stacked, with the output layer added on top. The third hidden layer also incorporates as input recently captured information, such as the last eight weeks' average and some fresh peak and valley values (Fig. 1).
Figure 1. Deep predictor
For operational purposes, the proposed stacked architecture is derived from three shallow networks called Autoencoder, Precursor and Gambler.
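As a minimal sketch of this stacking, assuming numpy, a logistic activation and illustrative layer widths (only the 19-input, 11-hidden sizes of the first stage are stated in Section III; the widths of the remaining hidden layers and the number of extra third-layer inputs are assumptions):

    import numpy as np

    def sigmoid(z):
        # logistic activation; all layers in the paper are sigmoidal
        return 1.0 / (1.0 + np.exp(-z))

    class Layer:
        # one fully connected sigmoidal layer
        def __init__(self, n_in, n_out, rng):
            self.W = rng.uniform(-0.1, 0.1, (n_out, n_in))
            self.b = np.zeros(n_out)
        def forward(self, x):
            return sigmoid(self.W @ x + self.b)

    rng = np.random.default_rng(0)
    N_EXTRA = 3  # assumed count: 8-week average plus peak/valley values

    hidden1 = Layer(19, 11, rng)            # Autoencoder stage (sizes from Sec. III)
    hidden2 = Layer(11, 11, rng)            # Precursor stage (width assumed)
    hidden3 = Layer(11 + N_EXTRA, 11, rng)  # Gambler stage with extra inputs
    output  = Layer(11, 1, rng)             # one-week-ahead sales prediction

    def predict(window19, extras):
        h1 = hidden1.forward(window19)
        h2 = hidden2.forward(h1)
        h3 = hidden3.forward(np.concatenate([h2, extras]))
        return output.forward(h3)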
B. Data Handling
Given three years of daily sales grouped into weeks, the network addresses the problem of predicting sales one week ahead of the current input window (one product, one outlet). The dataset is taken from the database of a real pharmaceutical company in Ecuador. For training purposes, the available data is divided into three mobile zones (Fig. 2), where time moves to the right.
The first zone, to the left, is reserved to train the first two shallow nets, the autoencoder-precursor pair, which work as a coordinated duet.
The next zone, of about 10 weeks, is reserved to train the Gambler net, which holds the final network output and provides the final prediction information. Finally, the "unknown future" zone is used to test the performance of the system and to make a real prediction when the unknown-future line reaches the end of the data. At any time, more data can be added, and the system responds by creating new predictions.
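A minimal sketch of this zone handling, assuming a simple index-based split (only the roughly 10-week gambler zone is stated in the text; the other boundaries are assumptions):

    def split_zones(weekly_sales, gambler_weeks=10, future_weeks=1):
        # Split the weekly series into the three mobile zones of Fig. 2.
        # Zone widths other than the ~10-week gambler zone are assumptions.
        n = len(weekly_sales)
        future_start = n - future_weeks
        gambler_start = future_start - gambler_weeks
        auto_zone    = weekly_sales[:gambler_start]           # autoencoder-precursor zone
        gambler_zone = weekly_sales[gambler_start:future_start]
        future_zone  = weekly_sales[future_start:]            # "unknown future"
        return auto_zone, gambler_zone, future_zone

As more data arrives, the series simply grows and the three zones slide to the right with it.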
C. Input Vector
The input vector is composed of a moving window of 16 consecutive weeks plus three additional elements given by the day/month/year at which the top right of the moving window sits at a given time instant (Fig. 2). All 19 entries are normalized to neural values inside the analog segment [0, 1]. When a target is needed, it is taken as the sale value of the week immediately to the right of the sample window (near future). The data shown ranges from January 2014 to April 2017.
Figure 2. Data handling and input vector. Weekly sales behavior of a typical pharmaceutical product, with an erratic pattern of consumption, and a moving window of 16 weeks of sales data plus the window's date location information.
For training purposes, the moving window travels in different space-time patterns for the diverse training scenarios.
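As an illustration, a sketch of how such a 19-entry vector and its target could be assembled (the max-scaling of sales and the specific day/month/year normalizations are assumptions; the text only states that all 19 entries lie in [0, 1]):

    import numpy as np

    def input_vector(sales, end_idx, dates, sales_max):
        # 16-week moving window ending at end_idx, scaled into [0, 1]
        window = np.asarray(sales[end_idx - 15:end_idx + 1], float) / sales_max
        d = dates[end_idx]  # date at the top right of the moving window
        # day/month/year normalization scheme is an assumption
        date_part = np.array([d.day / 31.0,
                              d.month / 12.0,
                              (d.year - 2014) / 3.0])  # 2014-2017 span
        x = np.concatenate([window, date_part])         # 19 entries in [0, 1]
        target = sales[end_idx + 1] / sales_max         # next week's sale
        return x, target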
III. FIRST SCENARIO: THE AUTOENCODER
Our autoencoder has 19 input, 11 hidden and 19 output neurons. To train it, the moving window is placed at a random position inside the autoencoder zone and the same input vector is used as the target.
The job of the trained autoencoder is to reproduce at its output, as exactly as possible, the image of the moving window just loaded into its inputs, for any random position in the allowed area. Since there are fewer hidden neurons than input neurons, data compression and abstract representations must occur during training. Our stacked system works with abstractions that travel from layer to layer as the main source of information, so we take special care with the quality of those abstractions.
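A minimal sketch of such a 19-11-19 autoencoder, assuming plain backpropagation with stochastic gradient descent on the squared reconstruction error (the text does not name the training algorithm, so this choice is an assumption):

    import numpy as np

    rng = np.random.default_rng(1)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))

    # 19-11-19 sigmoidal autoencoder with small initial random weights
    W1 = rng.uniform(-0.1, 0.1, (11, 19)); b1 = np.zeros(11)
    W2 = rng.uniform(-0.1, 0.1, (19, 11)); b2 = np.zeros(19)
    lr = 0.5

    for step in range(20000):
        # stand-in input; in practice x is the 19-entry window vector
        # built at a random position inside the autoencoder zone
        x = rng.random(19)
        h = sig(W1 @ x + b1)        # 11 hidden abstractions
        y = sig(W2 @ h + b2)        # reconstruction; target is the input itself
        dy = (y - x) * y * (1 - y)  # output deltas (squared error, sigmoid)
        dh = (W2.T @ dy) * h * (1 - h)
        W2 -= lr * np.outer(dy, h); b2 -= lr * dy
        W1 -= lr * np.outer(dh, x); b1 -= lr * dh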
Figure 3. The Autoencoder and the Precursor. Once the Autoencoder is trained, its hidden layer becomes the input to the Precursor, which never sees the real input windows but only the abstractions created by hidden1. Also, the learning cycles of the Precursor do not affect the weights of the Autoencoder.
We tried several metric methods to avoid overfitting-underfitting problems [15] while at the same time trying to guarantee quality abstractions from the hidden layers involved. We finally adopted the following scheme, which begins by measuring the quadratic variation V among all the outputs of the hidden neurons for two consecutive, randomly selected images at times t and t-1. That is:
$V_t = \sum_{i=1}^{n} \left( o_{i,t} - o_{i,t-1} \right)^2$   (1)

where:
$V_t$ = variation of the hidden outputs between two consecutive inputs,
$n$ = number of hidden neurons,
$o_{i,t}$ = output of hidden neuron $i$ at time $t$.
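In code, Eq. (1) amounts to a sum of squared differences between two consecutive hidden-output vectors; a minimal sketch, assuming numpy arrays of hidden outputs:

    import numpy as np

    def quadratic_variation(h_t, h_prev):
        # Eq. (1): V_t = sum over the n hidden neurons of (o_{i,t} - o_{i,t-1})^2
        return float(np.sum((np.asarray(h_t) - np.asarray(h_prev)) ** 2))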
In a typical run, with small initial random weights in the hidden layer, V starts from a small value and then grows into a random oscillatory time series. We exploit this outcome and introduce a selective peak search procedure in which the last found peak value of V is stored until a bigger peak value is found. In pseudo code:
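A Python sketch consistent with the description above; only the stored-peak rule is stated in the text, so the three-point peak test is an assumption:

    def selective_peak_search(v_series):
        # keep the last found peak of V, replacing it only when a bigger one appears
        best_peak = float("-inf")
        for t in range(1, len(v_series) - 1):
            # three-point peak test (assumed): V rises into t and falls after it
            if v_series[t - 1] < v_series[t] > v_series[t + 1]:
                if v_series[t] > best_peak:
                    best_peak = v_series[t]  # store the new, bigger peak value
        return best_peak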