Why Should Statistics Be the First Step to Data Science?

Data Science is an interdisciplinary subject that draws on several fields, such as Statistics, Python, Machine Learning and more. That raises a question: what should the first step towards Data Science be? Which discipline should you focus on first, and why? You will read everywhere that all of these disciplines are equally important. As far as the first one is concerned, however, I think you should concentrate on STATISTICS, for the following reasons.

 

Data pre-processing is one of the preliminary steps in any Data Science project. It includes cleaning the data, imputing missing values, checking the importance of the variables, and examining their correlation with each other and with the dependent variable. For any action you perform on the dataset, you need a logical reason behind it, be it imputing the data with the mean, median or mode, or omitting a piece of data. We make those decisions by doing univariate/multivariate analysis on the data. For example, the median is usually a safer fill value than the mean when a variable is heavily skewed, because the mean is pulled towards extreme values. This analysis relies on statistical methods, and this is where a knowledge of statistics comes in handy.
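To make the imputation decision concrete, here is a minimal sketch in Python using only the standard library. The function name `impute_missing`, the 0.5 skewness cut-off and the sample numbers are my own assumptions for illustration, not a fixed rule. It fills the gaps with the mean when the data look roughly symmetric and with the median when they look skewed.

```python
import statistics

def impute_missing(values, skew_threshold=0.5):
    """Fill None entries with the mean or the median, depending on skewness.

    The threshold of 0.5 is an illustrative assumption, not a standard value.
    """
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    median = statistics.median(observed)
    spread = statistics.stdev(observed)  # needs at least two observed values

    # Pearson's second skewness coefficient: 3 * (mean - median) / stdev
    skew = 3 * (mean - median) / spread if spread > 0 else 0.0

    if abs(skew) < skew_threshold:
        fill, strategy = mean, "mean"
    else:
        fill, strategy = median, "median"

    filled = [fill if v is None else v for v in values]
    return filled, strategy

# Hypothetical sample: one very large value makes the variable right-skewed.
incomes = [30, 32, 35, None, 31, 400]
filled_incomes, chosen = impute_missing(incomes)
print(chosen, filled_incomes)
```

In a real project you would also look at a histogram and at why the values are missing before settling on a strategy, but the idea is the same: the choice follows from a statistic, not from habit.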

 

Data visualization is another important step in the project. Plots and graphs help us see the trends of the variables in a dataset and understand what the data is trying to say. Visualizing the data helps us draw insights from it, which we then use for forecasting and predictions. Statistical graphs help us understand the trends, the correlations amongst the variables, the outliers, and the patterns in the dataset. Knowing and applying statistics is therefore what lets us gain insights from the data.
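Behind the plots sit a couple of simple statistics. As a rough sketch (the function names `pearson_r` and `iqr_outliers` and the sample data are made up for illustration), the snippet below computes the Pearson correlation that a scatter plot hints at, and flags the points a box plot would draw as outliers using the common 1.5 × IQR rule.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical data for illustration only.
hours = [1, 2, 3, 4]
marks = [2, 4, 6, 8]
print(pearson_r(hours, marks))

readings = [10, 12, 11, 13, 12, 11, 10, 13, 95]
print(iqr_outliers(readings))
```

Note that `statistics.quantiles` needs Python 3.8 or later, and that different quartile conventions can give slightly different fences on small samples.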

 

Knowledge of statistics is the base of Machine Learning. Statistical tests and estimation statistics help in model selection. Techniques such as regression and classification are statistical methods used in predictive analytics. Statistical methods are used throughout to gain insights and make predictions. Almost every step in a project requires statistical methods, and hence I think having a grasp of statistics will help you learn Data Science more easily and swiftly.
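To show how directly regression comes out of basic statistics, here is a small sketch of simple linear regression by ordinary least squares, written with nothing but means and sums. The names `fit_line` and `predict` and the sample numbers are assumptions for illustration; in practice you would normally reach for a library.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    # slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept

# Hypothetical data: hours studied vs. score.
hours = [1, 2, 3, 4]
scores = [3, 5, 7, 9]
slope, intercept = fit_line(hours, scores)
print(slope, intercept, predict(5, slope, intercept))
```

The slope is just the covariance of the two variables divided by the variance of the predictor, which is why the statistics you learn first keep showing up later in Machine Learning.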

What are your thoughts on it? Do let us know, and stay tuned for more such blogs!

          

 
