Published on: 01 February 2021 · 3 min reading time

Machine learning, big data and data science are technical terms that have become very popular in recent years. Although the scopes of these terms overlap, their meanings differ, which can lead to confusion. This article describes the workflow a data scientist follows to build a data product, and shows how to better manage data science models.

## What is Data Science?

Data science, also known as the science of data, is a recent discipline in computer science. This specialisation allows companies to take advantage of their internal and external data, including data from the Internet, to make effective strategic decisions. Its fields of application are numerous and varied: product recommendation, detection of fraudulent behaviour and pricing assistance, among others. These intelligent tools, known as data products, are predictive: they rely on statistical models built with machine learning algorithms.

## Data mining

The data mining phase facilitates the understanding of the data. Understanding the data means knowing its composition, its interactions and its distribution. One of the easiest ways to explore data is through descriptive statistics, using indicators such as the median, mean, quantiles, standard deviation and variance. These indicators provide a concise view of how a feature is distributed.

During a univariate study, data can be visualised using tools such as pie charts, box plots (box-and-whisker plots) and histograms. Cross-referencing different features in visualisations makes it possible to spot less obvious relationships worth examining. 2D scatter plots and even 3D plots make it possible to see how the data are distributed across several dimensions.

Ultimately, the goal of data exploration and visualisation is to understand the data. Another objective of these processes is to verify that the data set is clean and ready to be used by machine learning algorithms.
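The descriptive indicators mentioned above can be computed directly with Python's standard library. The sketch below uses a small hypothetical feature (customer ages, invented for illustration) and the built-in `statistics` module:

```python
import statistics

# Hypothetical feature values (e.g. customer ages) used only for illustration
ages = [23, 25, 29, 31, 31, 35, 40, 42, 47, 58]

mean = statistics.mean(ages)            # central tendency
median = statistics.median(ages)        # central value, robust to outliers
stdev = statistics.stdev(ages)          # spread around the mean
variance = statistics.variance(ages)    # squared spread
quartiles = statistics.quantiles(ages, n=4)  # Q1, Q2, Q3 cut points

print(f"mean={mean:.1f} median={median} stdev={stdev:.1f}")
print(f"quartiles={quartiles}")
```

In practice a data scientist would typically use pandas or NumPy for this on real data sets, but the indicators themselves are exactly those listed above.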

## A predictive data science model

After the data preparation, exploration and cleaning phases comes the modelling phase. The objective of this stage is to design a statistical model able to predict the occurrence of a given phenomenon. The model is built from a reference data set representing the phenomenon to be modelled: the machine learning algorithm learns from these data to construct a statistical model, which is then used to predict the outcome for observations it has not yet seen.

The goal of this phase is to obtain a model that approximates the real phenomenon as closely as possible. The data scientist therefore tries out several hypotheses and compares them in order to produce the best feasible model.

Once a statistical model has been trained, its performance must be measured: we want to know how well the predictive model generalises to new data. To do this, the data scientist uses a testing set, which makes it possible to evaluate the model on data not seen during the learning phase. A performance metric quantifies how well the model behaves; ideally it is a single, easily interpretable value, so that the data scientist knows at a glance how the model is performing.
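The train/test workflow described above can be sketched in pure Python. The data set, the deterministic split and the threshold "model" below are all invented for illustration; a real project would use a library such as scikit-learn for splitting, training and scoring:

```python
# Hypothetical labelled data set: (feature, label) pairs, label 1 when x > 5
data = [(x, 1 if x > 5 else 0) for x in range(10)]

# Hold out every third example as the testing set, unseen during learning
test = data[::3]
train = [d for i, d in enumerate(data) if i % 3 != 0]

# "Train" a minimal model: learn the midpoint between the class means
pos = [x for x, y in train if y == 1]
neg = [x for x, y in train if y == 0]
threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(x):
    """Predict the positive class when the feature exceeds the learned threshold."""
    return 1 if x > threshold else 0

# Performance metric: accuracy, computed only on the held-out testing set
accuracy = sum(predict(x) == y for x, y in test) / len(test)
print(f"threshold={threshold:.2f} accuracy={accuracy:.2f}")
# → threshold=5.25 accuracy=1.00 (the toy data are trivially separable)
```

The point of the sketch is the separation of roles: the training set shapes the model, the testing set measures it, and the metric condenses the result into one interpretable number.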