will be reported independently of the request using one of the two methods
below:
Push: After the scores have been computed, they will be pushed to
the "client" in the form of a "notification".
Poll: After the scores have been computed, they will be saved in a
"low read-latency database", and the client will poll the database at a
regular interval to fetch any available predictions.
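The poll method can be sketched in a few lines. The store, key format, polling interval and timeout below are illustrative assumptions, not part of the original design; in practice the store would be a system such as Redis or DynamoDB rather than an in-process dictionary.

```python
import time

# Hypothetical low read-latency store; stands in for a real
# key-value database with fast reads (e.g. Redis, DynamoDB).
prediction_store = {}

def poll_for_prediction(request_id, interval_s=0.5, timeout_s=5.0):
    """Poll the store at a regular interval until the score for
    `request_id` appears, or give up after `timeout_s` seconds."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        score = prediction_store.get(request_id)
        if score is not None:
            return score
        time.sleep(interval_s)
    return None  # no prediction was produced in time

# Elsewhere, the scoring system writes results asynchronously:
prediction_store["req-42"] = 0.87
```

The push method inverts this flow: instead of the client checking the store, the scoring system calls the client back (for example over a webhook or message queue) as soon as the score exists.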
There are a couple of techniques, listed below, that can be used to reduce
the time taken by the system to deliver the scores once the request has
been received:
The input features can be saved in a "low read-latency in-memory
data store".
The predictions that have already been computed by an
"offline batch-scoring" task can be cached for convenient access, with a
validity period dictated by the use case, since "offline predictions"
may lose their relevance over time.
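The caching technique above can be sketched with a time-to-live (TTL) that expires stale offline predictions. The class name, field names, and the one-hour TTL are illustrative assumptions; the appropriate validity period depends on the use case.

```python
import time

class PredictionCache:
    """Cache for offline batch-scoring results. Entries expire after
    `ttl_s` seconds because offline predictions lose their relevance
    over time. The 3600s default is illustrative, not prescriptive."""

    def __init__(self, ttl_s=3600):
        self.ttl_s = ttl_s
        self._entries = {}  # key -> (score, stored_at)

    def put(self, key, score):
        self._entries[key] = (score, time.monotonic())

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        score, stored_at = entry
        if time.monotonic() - stored_at > self.ttl_s:
            del self._entries[key]  # stale offline prediction
            return None
        return score
```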
9. Performance Monitoring
A well-defined "performance monitoring solution" is necessary for
every machine learning model. For the "model serving clients", some of the
data points that you may want to observe include:
"Model Identifier"
"Deployment date and time"
The "number of times" the model was served.
The "average, min and max" of the time it took to serve the
model.
The "distribution of the features" that were utilized.
The difference between the "predicted or expected results" and
the "actual or observed results".
This metadata can be computed throughout the model scoring process and
subsequently used to monitor the model's performance.
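The per-model metadata listed above (identifier, deployment time, serve count, and latency statistics) can be accumulated with a small helper like the following sketch. The class and field names are illustrative assumptions:

```python
class ServingMonitor:
    """Accumulate per-model serving metadata: call count and the
    average/min/max serving latency. Names are illustrative."""

    def __init__(self, model_id, deployed_at):
        self.model_id = model_id
        self.deployed_at = deployed_at  # deployment date and time
        self.count = 0
        self.total_ms = 0.0
        self.min_ms = float("inf")
        self.max_ms = 0.0

    def record(self, latency_ms):
        """Call once per served prediction with its latency."""
        self.count += 1
        self.total_ms += latency_ms
        self.min_ms = min(self.min_ms, latency_ms)
        self.max_ms = max(self.max_ms, latency_ms)

    def summary(self):
        """Snapshot of the monitoring data points."""
        return {
            "model_id": self.model_id,
            "deployed_at": self.deployed_at,
            "times_served": self.count,
            "avg_ms": self.total_ms / self.count if self.count else None,
            "min_ms": self.min_ms if self.count else None,
            "max_ms": self.max_ms if self.count else None,
        }
```

Feature distributions and predicted-versus-observed differences would be tracked similarly, typically by shipping each scoring event to the monitoring pipeline rather than aggregating in-process.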
Another "offline pipeline" is the "Performance Monitoring Service", which
is notified whenever a new prediction has been served and then proceeds to
evaluate the performance, persisting the scoring result and raising any
pertinent notifications. The assessment is carried out by comparing the
scoring results to the output created by the training stage of the data
pipeline.
To implement fundamental performance monitoring of the model, a variety
of methods can be used. Some widely used options are log-analytics and
dashboarding tools such as "Kibana", "Grafana" and "Splunk".
A low-performing model that is not able to generate predictions at high
speed will trigger the scoring results to be produced by the preceding
model, to maintain the resiliency of the machine learning solution. A
strategy of being incorrect rather than being late is applied, which implies
that if the model requires an extended period of time to compute a
specific feature, it will be replaced by a preceding model instead of
blocking the prediction. Furthermore, the scoring results will be joined
to the actual results as they become accessible. This implies continuously
measuring the accuracy of the model while, at the same time, any sign of
deterioration in the speed of execution can be handled by returning to
the preceding model. In order to connect the distinct versions together, a
"chain of responsibility pattern" could be utilized. Monitoring the
performance of the models is an ongoing process, considering that a simple
prediction modification can cause the model structure to be reorganized.
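The chain of responsibility pattern mentioned above can be sketched as linked model versions, where each link falls back to the preceding model when scoring fails or exceeds its latency budget ("incorrect rather than late"). The class name, latency budget, and scoring interface are illustrative assumptions:

```python
import time

class ModelLink:
    """One link in a chain of responsibility: try this model version
    first, and fall back to the preceding (older) version when scoring
    raises an error or exceeds the latency budget."""

    def __init__(self, name, score_fn, fallback=None, budget_s=0.1):
        self.name = name
        self.score_fn = score_fn  # callable: features -> score
        self.fallback = fallback  # preceding ModelLink, or None
        self.budget_s = budget_s  # latency budget (illustrative value)

    def score(self, features):
        start = time.monotonic()
        try:
            result = self.score_fn(features)
            if time.monotonic() - start <= self.budget_s:
                return self.name, result
            # too slow: prefer the older model over a late answer
        except Exception:
            pass  # fall through to the preceding model
        if self.fallback is not None:
            return self.fallback.score(features)
        raise RuntimeError("no model in the chain could produce a score")
```

A failing or slow "v2" then transparently hands the request to "v1", keeping the solution resilient while the degradation is investigated.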
Remember, the advantages of a machine learning model are defined by its
ability to generate predictions and forecasts with high accuracy and speed,
contributing to the success of the company.