RESEARCH METHODOLOGY
This paper applies general methodological approaches to achieve the research goal: comparison, comparative analysis, the dialectical approach and generalization.
ANALYSIS AND RESULTS
Initially, the set of approaches and technologies included tools for massively parallel processing of loosely structured data, such as NoSQL DBMSs, MapReduce algorithms and Hadoop project tools. MapReduce is a model of distributed parallel computing on computer clusters introduced by Google. Under this model, the application is divided into a large number of identical elementary tasks that are executed on the nodes of the cluster and then combined into the final result. NoSQL (Not Only SQL) is a generic term for various non-relational databases and data stores; it does not refer to a particular technology or product.
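The MapReduce model described above can be illustrated with a minimal word-count sketch in pure Python. The function names and the in-memory "shuffle" step are illustrative simplifications of what a real distributed framework such as Hadoop performs across cluster nodes:

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit an intermediate (word, 1) pair for every word in the input split
    return [(word, 1) for word in document.split()]

def shuffle(mapped_pairs):
    # Shuffle: group intermediate values by key (handled by the framework)
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values into the final result
    return {word: sum(counts) for word, counts in groups.items()}

# Each string stands in for an input split processed on a separate node
documents = ["big data tools", "big clusters"]
mapped = [pair for doc in documents for pair in map_phase(doc)]
result = reduce_phase(shuffle(mapped))
print(result)  # {'big': 2, 'data': 1, 'tools': 1, 'clusters': 1}
```

In a real cluster the map tasks run independently on different nodes and the framework sorts and routes the intermediate pairs; only the aggregation logic above is what the programmer supplies.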
Hadoop is a freely distributed set of utilities, libraries and frameworks for developing and executing distributed programs running on clusters of hundreds or thousands of nodes. It is considered one of the foundational technologies of big data. Big Data is usually described by the following characteristics [13]:
1. Volume - The amount of data generated and stored. The size of the data determines the significance and potential of the data, as well as whether it can be considered Big Data.
2. Variety - The type of data. Big Data can consist of text, images, audio, video.
3. Velocity - The speed at which the data is generated and processed.
4. Variability - The inconsistency of data sets can hinder data processing and management.
5. Veracity - The quality of the data, which directly affects the accuracy of data analysis.
Big Data analysis methods:
Cluster analysis - a statistical method of classifying objects that divides a diverse set of objects into smaller groups of similar objects.
Crowdsourcing - a method of collecting, categorizing and enriching data by a wide range of individuals engaged on the basis of a public offer, without entering into labor relations, usually through the use of online media.
Machine learning. A class of artificial intelligence methods whose characteristic feature is not solving a problem directly, but learning from the solutions of a set of similar problems.
Neural networks. A mathematical model built on the principle of the organization and functioning of biological neural networks - the networks of nerve cells in a living organism.
Network analysis. A set of methods used to describe and analyze the relationships between discrete nodes in a graph or network.
Predictive analytics. A class of data analysis techniques, which focuses on predicting the future behavior of objects and actors in order to make optimal decisions.
Simulation modeling. A research method in which the system under study is replaced by a model that accurately describes the real system; experiments are performed on the model in order to obtain information about the system itself.
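As a concrete illustration of the cluster analysis method listed above, the following sketch implements a minimal k-means loop in pure Python. The one-dimensional toy data and the fixed initial centroids are hypothetical, chosen only to keep the example small and deterministic; real Big Data clustering would run a distributed implementation over many dimensions:

```python
def kmeans_1d(points, centroids, iterations=10):
    """Minimal 1-D k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned cluster."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids; keep the old one if a cluster is empty
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Two well-separated groups of measurements (illustrative values)
points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 10.0])
print(centroids)  # [1.0, 9.0]
```

The diverse input set is thus divided into smaller groups of similar objects, exactly as the definition of cluster analysis above describes.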
Big Data Platforms:
Hadoop http://hadoop.apache.org/ Provides a Java interface; a freely distributed software package under the Apache License 2.0 and GNU GPL licenses, consisting of the Hadoop Common management module, the HDFS distributed file system, the YARN job scheduler and the Hadoop MapReduce computing platform. Developed since 2005.
Spark https://spark.apache.org/ Provides interfaces to Scala, Java, Python and R; distributed under the Apache License 2.0. A computing platform that has been in development since 2014.
Elasticsearch https://www.elastic.co/products/elasticsearch Together with the Logstash collection system and the Kibana analytics platform, it forms an integrated system for data collection, storage, search and analytics.
Hortonworks Data Platform (HDP) https://hortonworks.com/products/data-platforms/hdp/ A data management platform including HDFS, Hadoop, HBase, HCatalog, Pig, Hive, Oozie, ZooKeeper, Ambari, WebHDFS, TalentOS, Sqoop, Flume, and Mahout [14].
All of the information collected by Big Data can be classified according to the sources from which it was obtained [15]:
1. Transactional data. This is data about customers, suppliers, partners and employees available from online transaction processing and/or obtained from an online analytics database.
2. Dark data. Information that is not specifically stored or collected by organizations, but is generated incidentally in the course of doing business or interacting with online services and remains in Internet archives.
3. Commercial data. Prior to the advent of Big Data technology capabilities, there were aggregators of commercially valuable information in various industries.
4. Official data. Information disseminated by government agencies, open public registries and published regulations is the most reliable and structured.
5. Information from social networks and services. The involvement of businesses and individuals in the functionality of major social networks (Facebook, VKontakte, LinkedIn, Twitter, Instagram, etc.) has created another source of data on demand, trends in certain segments of market relations, and new and promising products, services and companies [16].
Big data undoubtedly has advantages over traditional human resource management, but it also has disadvantages, which include the following risks:
Big data is heterogeneous, so it is difficult to process it for statistical inference. The more parameters required for prediction, the more errors accumulate in the analysis;
Storage and processing of Big Data is associated with increased vulnerability to cyberattacks and all kinds of leaks. A prime example is the Facebook profile scandals;
Untimely use of data (information often does not go beyond individual business units);
Problems with confidentiality of information about employees and customers;
Increased risk of information leakage due to the use of a large number of data storage devices;
Shortage of HR managers with sufficient skills to analyze the data and use specialized software.
Pros and Prospects:
Dynamic schema: NoSQL DBMSs allow flexible work with the data schema without having to change the data itself;
Scalability: Big Data systems are horizontally scalable, which makes it easy to reduce the load on individual servers when handling large amounts of data.
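The dynamic-schema property noted above can be sketched with a minimal in-memory document collection in Python. The `DocumentCollection` class and its `insert`/`find` helpers are hypothetical stand-ins for the API of a real document-oriented NoSQL DBMS, intended only to show that records in one collection may carry different fields without any schema migration:

```python
class DocumentCollection:
    """Toy document store: records in the same collection may have
    different fields, so adding a field needs no schema change."""
    def __init__(self):
        self.docs = []

    def insert(self, doc):
        self.docs.append(dict(doc))

    def find(self, **criteria):
        # Return documents whose fields match all given criteria
        return [d for d in self.docs
                if all(d.get(k) == v for k, v in criteria.items())]

employees = DocumentCollection()
employees.insert({"name": "Alice", "dept": "HR"})
# A later record introduces a new field without altering existing records
employees.insert({"name": "Bob", "dept": "HR", "skills": ["Spark", "Hive"]})
print(employees.find(dept="HR")[0]["name"])  # Alice
```

In a relational DBMS the second insert would require an `ALTER TABLE`; here the older document simply lacks the `skills` field, which is the flexibility the "dynamic schema" advantage refers to.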
In essence, a Big Data solution is a hardware and software complex that allows information about the state of objects to be collected, processed, stored and displayed in real time.
Contact elements are useful for quickly "grasping" the Big Data situation when some of the original data has already been entered, and for understanding how to enter the remaining data most quickly and efficiently [17].