What is Data Mining?
Data Mining is a collective term for a set of methods for discovering previously unknown, non-trivial, practically useful, and interpretable knowledge in data, needed for decision-making in various areas of human activity. The term was introduced by Gregory Piatetsky-Shapiro in 1989.
In plain terms: you already have a dataset that has been processed in some way before; now you process it again, perhaps differently than before, and obtain useful conclusions that you then use to make a profit.
So, by the Wikipedia definition, the decomposition of "Data Mining" includes:
certain technologies, tools, and methods;
data that is already structured, since it is already stored somewhere and has already been worked with in some way;
data of any size;
some profit obtained as the result of processing the data.
The tasks solved by Data Mining methods include:
working with data (aggregation, analysis, description);
identifying relationships and building trends (possibly with prediction as the ultimate goal); a minimal sketch of both task groups follows the list.
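To make these two task groups concrete, here is a minimal sketch in Python (the dataset, its column names, and all numbers are invented for illustration): first aggregation and description of raw records, then a simple least-squares trend that could be used for prediction.

```python
# A minimal sketch of both task groups on a hypothetical sales dataset.
import numpy as np
import pandas as pd

# Invented raw data: one row per sale.
sales = pd.DataFrame({
    "month":   [1, 1, 2, 2, 3, 3, 4, 4],
    "region":  ["north", "south"] * 4,
    "revenue": [100, 80, 110, 95, 130, 105, 150, 120],
})

# Task group 1: working with data (aggregation, analysis, description).
monthly = sales.groupby("month")["revenue"].sum()
print(monthly.describe())  # descriptive statistics of the monthly totals

# Task group 2: identifying a relationship and building a trend
# (a least-squares line of revenue over time, usable for prediction).
slope, intercept = np.polyfit(monthly.index, monthly.values, deg=1)
next_month = 5
print(f"trend forecast for month {next_month}: {slope * next_month + intercept:.1f}")
```

Nothing here is specific to large data: the same two task groups apply whether the table has eight rows or eight billion.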
Conclusions. Based on the decompositions of the definitions above, Data Mining "wins" over Big Data thanks to its democratic approach to the amount of data.
Based on the lists of tasks solved by Big Data and Data Mining methods, Big Data "wins", since it also covers the problems of collecting and storing data.
Thus, if we accept that it is, in principle, not worthwhile to explore small amounts of data, then the meaning of "Data Mining" is fully contained in the meaning of "Big Data". So those who insist that a task is merely "Data Mining" and not the magical "Big Data" are saying, in effect, "this is not a bird, this is just a dove", which is false from the standpoint of the formal logic we all respect so much.
As for price: in both fields, the overlapping tasks are solved with an identical stack of technologies, tools, and methods, so the cost of the work should also be of the same order.
In conclusion, it is worth adding that many people try to compare these concepts with each other and with other concepts (for example, with highload, as the author did here: habrahabr.ru/company/beeline/blog/218669) by the software stack: for example, if an RDBMS is used, then it is supposedly 100% not Big Data.
I cannot agree with this point of view, because modern RDBMSs operate on impressive amounts of data and can store data of almost any type; with proper indexing, that data is quickly aggregated and served to the application layer, and it is even possible to write your own indexing mechanism.
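As a hedged illustration of that point, here is a small sketch using Python's standard sqlite3 module (the table, columns, and data are invented): with an index on the filtered column, an aggregate query over a hundred thousand rows is answered by an index seek rather than a full table scan and handed straight to the application layer.

```python
# A minimal sketch: an RDBMS (SQLite, via the standard library) aggregating
# indexed data. Table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, kind TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(i % 1000, "purchase" if i % 3 == 0 else "click", i * 0.01)
     for i in range(100_000)],
)

# Proper indexing: lets the query below locate 'purchase' rows by index seek
# instead of scanning the whole table.
conn.execute("CREATE INDEX idx_events_kind ON events (kind)")

# Aggregation, served directly to the application layer.
count, total = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM events WHERE kind = 'purchase'"
).fetchone()
print(f"purchases: {count}, total amount: {total:.2f}")
```

The same pattern scales: on a production RDBMS, the index definition is the difference between scanning the whole table and a near-instant seek, which is exactly why the storage engine alone says little about whether a task is "Big Data".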
In general, it is wrong to classify a class of tasks by its software and hardware stack, since any unique task requires a unique approach built from the tools that are most effective for that particular task.