II. DEEP NETWORK
A. Deep architecture
The proposed stacked network uses five layers of sigmoidal neurons organized as one input layer, three hidden layers and one output layer. To combine the higher-level features of different data behaviors, the hidden layers are trained separately and then stacked, with the output layer added on top. The third hidden layer also incorporates as input recently captured information, such as the last eight weeks' average and some fresh peak and valley values (Fig. 1).
Figure 1. Deep predictor
For operational purposes, the proposed stacked architecture is derived from three shallow networks called Autoencoder, Precursor and Gambler.
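As a minimal sketch of this stacking, assuming numpy, a logistic activation and illustrative layer widths (only the 19-input, 11-hidden sizes of the first stage are stated in Section III; the widths of the remaining hidden layers and the number of extra third-layer inputs are assumptions):

    import numpy as np

    def sigmoid(z):
        # logistic activation; all layers in the paper are sigmoidal
        return 1.0 / (1.0 + np.exp(-z))

    class Layer:
        # one fully connected sigmoidal layer
        def __init__(self, n_in, n_out, rng):
            self.W = rng.uniform(-0.1, 0.1, (n_out, n_in))
            self.b = np.zeros(n_out)
        def forward(self, x):
            return sigmoid(self.W @ x + self.b)

    rng = np.random.default_rng(0)
    N_EXTRA = 3  # assumed count: 8-week average plus peak/valley values

    hidden1 = Layer(19, 11, rng)            # Autoencoder stage (sizes from Sec. III)
    hidden2 = Layer(11, 11, rng)            # Precursor stage (width assumed)
    hidden3 = Layer(11 + N_EXTRA, 11, rng)  # Gambler stage with extra inputs
    output  = Layer(11, 1, rng)             # one-week-ahead sales prediction

    def predict(window19, extras):
        h1 = hidden1.forward(window19)
        h2 = hidden2.forward(h1)
        h3 = hidden3.forward(np.concatenate([h2, extras]))
        return output.forward(h3)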
B. Data Handling
Given three years of daily sales grouped into weeks, the network addresses the problem of predicting sales one week ahead of the current input window (one product, one outlet). The dataset is taken from the database of a real pharmaceutical company in Ecuador. For training purposes, the available data is divided into three mobile zones (Fig. 2), where time moves to the right.
The first zone, to the left, is reserved to train the first two shallow nets, the autoencoder-precursor pair, which work as a coordinated duet.
The next zone, of about 10 weeks, is reserved to train the Gambler net, which holds the final network output and provides the final prediction information. Finally, the "unknown future" zone is used to test the performance of the system and to make a real prediction when the unknown-future line reaches the end of the data. At any time, more data can be added, and the system responds by creating new predictions.
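A minimal sketch of this zone handling, assuming a simple index-based split (only the roughly 10-week gambler zone is stated in the text; the other boundaries are assumptions):

    def split_zones(weekly_sales, gambler_weeks=10, future_weeks=1):
        # Split the weekly series into the three mobile zones of Fig. 2.
        # Zone widths other than the ~10-week gambler zone are assumptions.
        n = len(weekly_sales)
        future_start = n - future_weeks
        gambler_start = future_start - gambler_weeks
        auto_zone    = weekly_sales[:gambler_start]           # autoencoder-precursor zone
        gambler_zone = weekly_sales[gambler_start:future_start]
        future_zone  = weekly_sales[future_start:]            # "unknown future"
        return auto_zone, gambler_zone, future_zone

As more data arrives, the series simply grows and the three zones slide to the right with it.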
C. Input Vector
The input vector is composed of a moving window of 16 consecutive weeks plus three additional elements given by the day/month/year at which the top right of the moving window sits at a given time instant (Fig. 2). All 19 entries are normalized to neural values inside the analog segment [0, 1]. When a target is needed, it is taken as the sale value of the week immediately to the right of the sample window (near future). The data shown ranges from January 2014 to April 2017.
Figure 2. Data handling and input vector. Weekly sales behavior of a typical pharmaceutical product, with an erratic pattern of consumption, and a moving window of 16 weeks of sales data plus the window's date location information.
For training purposes, the moving window travels in different space-time patterns for the diverse training scenarios.
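As an illustration, a sketch of how such a 19-entry vector and its target could be assembled (the max-scaling of sales and the specific day/month/year normalizations are assumptions; the text only states that all 19 entries lie in [0, 1]):

    import numpy as np

    def input_vector(sales, end_idx, dates, sales_max):
        # 16-week moving window ending at end_idx, scaled into [0, 1]
        window = np.asarray(sales[end_idx - 15:end_idx + 1], float) / sales_max
        d = dates[end_idx]  # date at the top right of the moving window
        # day/month/year normalization scheme is an assumption
        date_part = np.array([d.day / 31.0,
                              d.month / 12.0,
                              (d.year - 2014) / 3.0])  # 2014-2017 span
        x = np.concatenate([window, date_part])         # 19 entries in [0, 1]
        target = sales[end_idx + 1] / sales_max         # next week's sale
        return x, target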
III. FIRST SCENARIO: THE AUTOENCODER
Our autoencoder has 19 input, 11 hidden and 19 output neurons. To train it, the moving window is placed at a random position inside the autoencoder zone and the same input vector is used as the target.
The job of the trained autoencoder is to reproduce at its output, as exactly as possible, the image of the moving window just loaded into its inputs, for any random position in the allowed area. Since there are fewer hidden neurons than input neurons, data compression and abstract representations must occur during training. Our stacked system works with abstractions that travel from layer to layer as the main source of information, so we take special care with the quality of those abstractions.
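A minimal sketch of such a 19-11-19 autoencoder, assuming plain backpropagation with stochastic gradient descent on the squared reconstruction error (the text does not name the training algorithm, so this choice is an assumption):

    import numpy as np

    rng = np.random.default_rng(1)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))

    # 19-11-19 sigmoidal autoencoder with small initial random weights
    W1 = rng.uniform(-0.1, 0.1, (11, 19)); b1 = np.zeros(11)
    W2 = rng.uniform(-0.1, 0.1, (19, 11)); b2 = np.zeros(19)
    lr = 0.5

    for step in range(20000):
        # stand-in input; in practice x is the 19-entry window vector
        # built at a random position inside the autoencoder zone
        x = rng.random(19)
        h = sig(W1 @ x + b1)        # 11 hidden abstractions
        y = sig(W2 @ h + b2)        # reconstruction; target is the input itself
        dy = (y - x) * y * (1 - y)  # output deltas (squared error, sigmoid)
        dh = (W2.T @ dy) * h * (1 - h)
        W2 -= lr * np.outer(dy, h); b2 -= lr * dy
        W1 -= lr * np.outer(dh, x); b1 -= lr * dh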
Figure 3. The Autoencoder and the Precursor. Once the Autoencoder is trained, its hidden layer becomes the input to the Precursor, which never sees the real input windows but only the abstractions created by hidden1. Also, the learning cycles of the Precursor do not affect the weights of the Autoencoder.
We tried several metric methods to avoid overfitting-underfitting problems [15] while at the same time trying to guarantee quality abstractions from the hidden layers involved. We finally adopted the following scheme, which begins by measuring the quadratic variation V among all the outputs of the hidden neurons for two consecutive, randomly selected images at times t and t-1. That is:
$V_t = \sum_{i=1}^{n} \left( o_{i,t} - o_{i,t-1} \right)^2$   (1)

where:
$V_t$ = variation of the hidden outputs between two consecutive inputs,
$n$ = number of hidden neurons,
$o_{i,t}$ = output of hidden neuron $i$ at time $t$.
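In code, Eq. (1) amounts to a sum of squared differences between two consecutive hidden-output vectors; a minimal sketch, assuming numpy arrays of hidden outputs:

    import numpy as np

    def quadratic_variation(h_t, h_prev):
        # Eq. (1): V_t = sum over the n hidden neurons of (o_{i,t} - o_{i,t-1})^2
        return float(np.sum((np.asarray(h_t) - np.asarray(h_prev)) ** 2))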
In a typical run, with small initial random weights in the hidden layer, V starts from a small value and then grows into a random oscillatory time series. We exploit this outcome and introduce a selective peak search procedure in which the last found peak value of V is stored until a bigger peak value is found. In pseudo code:
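A Python sketch consistent with the description above; only the stored-peak rule is stated in the text, so the three-point peak test is an assumption:

    def selective_peak_search(v_series):
        # keep the last found peak of V, replacing it only when a bigger one appears
        best_peak = float("-inf")
        for t in range(1, len(v_series) - 1):
            # three-point peak test (assumed): V rises into t and falls after it
            if v_series[t - 1] < v_series[t] > v_series[t + 1]:
                if v_series[t] > best_peak:
                    best_peak = v_series[t]  # store the new, bigger peak value
        return best_peak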