Why Should Statistics Be the First Step to Data Science?
Data Science is an interdisciplinary subject that requires knowledge of several fields, such as Statistics, Python, Machine Learning and much more. So the question arises: what should the first step towards Data Science be? Which discipline should you focus on first? What is the building block of Data Science, and which discipline matters most, and why? You will read everywhere that all the disciplines hold equal importance. However, as far as the first discipline is concerned, you should concentrate on STATISTICS, for the following reasons.
Data pre-processing is one of the preliminary steps in any Data Science project. This step includes cleaning the data, imputing missing values, and checking the importance of the various variables and their correlation with the other variables or with the dependent variable. Here you need to know that every action you perform on the dataset needs a logical reason or explanation behind it, be it imputing the data with the mean, median or mode, or omitting a piece of data. We take those decisions by doing univariate/multivariate analysis on the data. We do this using various statistical methods, and this is where the knowledge of statistics comes in handy.
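To make the "logical reason" behind an imputation choice concrete, here is a minimal sketch that picks between the mean and the median by checking how skewed a column is. It uses Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation. The function name `choose_imputation`, the 0.5 cut-off and the sample `incomes` values are assumptions made up for illustration, not a fixed rule.

```python
import statistics

def choose_imputation(values, skew_cutoff=0.5):
    """Fill missing entries (None) with the mean or the median.

    Assumption: a column counts as skewed when Pearson's second skewness
    coefficient, 3 * (mean - median) / stdev, exceeds skew_cutoff in
    absolute value. The cut-off is an illustrative choice.
    """
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    median = statistics.median(observed)
    stdev = statistics.stdev(observed)

    skew = 3 * (mean - median) / stdev if stdev > 0 else 0.0
    if abs(skew) > skew_cutoff:
        # Skewed data: the mean is pulled towards the extreme values,
        # so the median is the safer fill value.
        method, fill = "median", median
    else:
        method, fill = "mean", mean

    return method, [fill if v is None else v for v in values]

# Hypothetical income column with two missing entries and one extreme value.
incomes = [32, 35, None, 38, 40, 41, None, 400]
method, filled = choose_imputation(incomes)
print(method, filled)
```

The single extreme value drags the mean far above the typical income, so the sketch falls back to the median. On a roughly symmetric column the mean would be chosen instead.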
Data visualization is another important step in the project. Plots and graphs help us see the trends of the variables in a dataset and understand what the data is trying to say. Visualizing the data on a graph helps us draw insights from it, which we then use for forecasting and predictions. Statistical graphs help us here to understand the trends, the correlations amongst the variables, the outliers, and the patterns in the dataset. Hence knowing and applying Statistics helps us gain insights from the data.
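Behind every statistical graph there are a few numbers. As a rough sketch, the code below computes what a box plot draws (the quartiles and the usual 1.5 × IQR fences for outliers) and what a scatter plot hints at (the Pearson correlation). The helper names `iqr_outliers` and `pearson`, and the sample data, are hypothetical and only here for illustration.

```python
import math
import statistics

def iqr_outliers(data):
    """Return the points outside the 1.5 * IQR fences, as a box plot would mark them."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles of the sample
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical delivery times in minutes, with one unusually long delivery.
delivery_times = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]
outliers = iqr_outliers(delivery_times)
print("outliers:", outliers)

# Hypothetical ad spend vs. sales, chosen to be perfectly linear.
ad_spend = [1, 2, 3, 4]
sales = [2, 4, 6, 8]
r = pearson(ad_spend, sales)
print("correlation:", r)
```

Knowing what these numbers mean is what lets you read a box plot or a scatter plot and say why a point is an outlier or how strongly two variables move together.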
Knowledge of Statistics is the base of Machine Learning. Statistical tests and estimation statistics help in model selection. Techniques such as regression and classification are statistical methods used in predictive analytics. Statistical methods are used throughout to gain insights and make predictions. Almost every step in the project requires the use of statistical methods, and hence I think having a grasp of Statistics will help you learn Data Science more easily and swiftly.
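To show how directly a Machine Learning technique rests on statistics, here is a minimal sketch of simple linear regression fitted by ordinary least squares, where the slope is just the covariance of x and y divided by the variance of x. The function name `fit_line` and the study-hours data are made up for illustration. In practice a library would usually do this for you.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [3, 5, 7, 9, 11]

slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept  # prediction for 6 hours of study
print(slope, intercept, predicted)
```

The "model" here is nothing more than means, a variance and a covariance, which is why a grounding in statistics makes regression much easier to understand.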
What are your thoughts on it? Do let us know, and stay tuned for more such blogs!