Dean of the Faculty: Assoc. Prof. J.U. Xujamov
Contents:
Introduction
Big data mining and problem statement
    What Does Big Data Look Like?
    Big Data Tools, Techniques, and Strategies
    Drivetrain Approach to Recommender Systems
    Optimizing Lifetime Customer Value
    Best Practices from Physical Data Products
Data mining
    Association rules
    Clustering
Experimental evaluation
    Experimental results
    Examples of data mining algorithms
Conclusion
References
Introduction
Recently, both inside my team and outside it, I have often come across different interpretations of the concepts of “Big Data” and “Data Mining”. Because of this, there is a growing misunderstanding between the Contractor and the Customer about the proposed technologies and the result each party expects.
The situation is aggravated by the lack of clear definitions from any generally accepted standards body, and by the very different cost of the work that a potential buyer associates with each term.
The market has formed the opinion that “Data Mining” is when a data dump is handed to the Contractor, who finds a couple of trends in it, generates a report and receives his million rubles. With “Big Data” everything is much more interesting. People think it is some kind of black magic, and magic is expensive.
The purpose of this article is to show that there is no significant difference between the interpretations of these two concepts, and to clear up the main blind spots in the understanding of the subject.
Big data in information technology is a set of approaches, tools and methods for processing structured and unstructured data of huge volume and significant diversity in order to obtain human-perceptible results. These methods remain effective under continuous data growth and distribution across numerous nodes of a computer network. They took shape in the late 2000s as an alternative to traditional database management systems and business intelligence solutions.
What do we see? A term that, by its form, should name an object (like “a large bicycle”, “a small tree” or “a scooter”) is in fact defined as a set of methods and goals, that is, as a range of processes. Can we accept such a definition, given that it amounts to calling jogging (a process) a teapot (an object)? It is hard to say, so let's try to decompose the definition.
Big Data is:
certain technologies, tools and methods;
data that can be structured or unstructured;
data that must be huge in volume;
data processing that should yield some profit.
Within these components of the definition, two things remain unclear:
what counts as unstructured data;
what counts as a huge size.
The tasks solved by Big Data methods include:
data collection (parsers, gates, etc.);
data storage (building complex data warehouses);
work with data (aggregation, analysis, description);
identifying relationships and building trends (possibly with the ultimate goal of prediction).
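The four tasks above can be illustrated with a toy pipeline. This is only a minimal sketch under stated assumptions: the log format, the sample values and all function names (`collect`, `store_records`, `aggregate`, `trends`) are invented for illustration, an in-memory SQLite table stands in for a data warehouse, and a least-squares slope stands in for "building trends". Nothing here is specific to huge volumes; the point is only to show how the stages follow one another.

```python
import re
import sqlite3
from statistics import mean

# Hypothetical raw input: semi-structured log lines, invented for this sketch.
RAW_LINES = [
    "2023-01-01 store=A sales=100",
    "2023-01-02 store=A sales=110",
    "2023-01-03 store=A sales=120",
    "2023-01-01 store=B sales=90",
    "2023-01-02 store=B sales=80",
    "2023-01-03 store=B sales=70",
    "corrupted line without fields",
]

LINE_RE = re.compile(r"(\d{4}-\d{2}-\d{2}) store=(\w+) sales=(\d+)")


def collect(lines):
    """Data collection: parse raw text into (day, store, amount) records."""
    records = []
    for line in lines:
        match = LINE_RE.fullmatch(line)
        if match:  # lines that do not fit the assumed format are skipped
            records.append((match.group(1), match.group(2), int(match.group(3))))
    return records


def store_records(records):
    """Data storage: load records into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (day TEXT, store TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def aggregate(conn):
    """Work with data: total and average sales per store."""
    rows = conn.execute(
        "SELECT store, SUM(amount), AVG(amount) FROM sales "
        "GROUP BY store ORDER BY store"
    )
    return {name: (total, avg) for name, total, avg in rows}


def trend_slope(values):
    """Least-squares slope of values against their positions 0, 1, 2, ..."""
    if len(values) < 2:
        return 0.0  # a single point has no trend
    xs = range(len(values))
    x_mean, y_mean = mean(xs), mean(values)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator


def trends(conn):
    """Identifying trends: slope of daily sales for each store."""
    names = [row[0] for row in conn.execute(
        "SELECT DISTINCT store FROM sales ORDER BY store")]
    result = {}
    for name in names:
        amounts = [row[0] for row in conn.execute(
            "SELECT amount FROM sales WHERE store = ? ORDER BY day", (name,))]
        result[name] = trend_slope(amounts)
    return result


if __name__ == "__main__":
    connection = store_records(collect(RAW_LINES))
    print(aggregate(connection))
    print(trends(connection))
```

A positive slope means sales for that store grow from day to day, a negative one means they fall; extending such a line forward is the simplest form of the prediction mentioned in the last item.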