What is Data Science?
Data Science is used in both commercial and non-commercial settings, with use cases across industries such as Finance, Banking, Ecommerce, and Social Media. Data is now generated every second, and organizations need to leverage it to make sound business decisions. Data Science is used in everyday operations to gain insights into customers, products, and processes. Corporations use it to improve the customer experience and to upsell or cross-sell their products.
Ecommerce websites use Data Science to recommend products based on what we have searched for. Advertisements for products that match a user's interests are also selected on the basis of Data Science.
How to Become a Data Scientist?
A Data Scientist requires knowledge of:
1.) Programming Language (R/Python)
2.) Statistics
3.) Mathematics
4.) Machine Learning
5.) Deep Learning
Data Science Project Lifecycle
The lifecycle of a Data Science project includes six steps:
1.) Defining the Goal of the Research
2.) Data Capturing
3.) Data Preparation/Pre-processing
4.) Exploratory Data Analysis and Visualization
5.) Model Selection
6.) Data Presentation & Automation
Goal of the Research
A project starts with a goal in mind: why the research is needed, why it is being conducted, and what the client or company expects from it. For example, a project may need to predict sales for the next 5 years on the basis of previous years' data, identify potential markets for a certain product, or predict stock prices. The ultimate goal can be anything, and understanding it clearly is the first and best way to start a project.
Data Capturing
Most organizations already hold the data required for the research. However, that data may not be sufficient, and you may need to look to third parties to obtain more. Data can be collected from multiple sources and may arrive in a structured or unstructured format, which can require a lot of pre-processing before the subsequent steps.
Data Pre-processing
Data collected from different sources may have many anomalies, such as missing values, wrong values, outliers, and incorrect data types. It requires a lot of cleaning, including imputing missing values with meaningful data, before it becomes useful for generating insights. Data Pre-processing usually consumes most of the time in a project.
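As a rough illustration of the cleaning steps described above, here is a minimal sketch using pandas. The column names (`age`, `income`), the sample values, and the `preprocess` helper are all made up for this example. Median imputation and the 1.5 * IQR outlier rule are only one common choice among many, not a prescribed method.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: column names and values are invented for illustration.
raw = pd.DataFrame({
    "age": ["25", "31", None, "47", "29"],                       # numbers stored as text, one missing
    "income": [46000.0, 51000.0, 48000.0, np.nan, 1_000_000.0],  # one missing, one extreme value
})

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Fix data types, impute missing values, and flag outliers."""
    out = df.copy()
    # Check and fix data types: text becomes numbers, unparseable entries become NaN.
    out["age"] = pd.to_numeric(out["age"], errors="coerce")
    # Impute missing values with the column median (an assumed, simple strategy).
    for col in ["age", "income"]:
        out[col] = out[col].fillna(out[col].median())
    # Flag outliers with the 1.5 * IQR rule instead of silently dropping them.
    q1, q3 = out["income"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["income_outlier"] = (out["income"] < q1 - 1.5 * iqr) | (out["income"] > q3 + 1.5 * iqr)
    return out

clean = preprocess(raw)
print(clean)
```

Flagging outliers in a separate column, rather than deleting the rows, keeps the decision about what to do with them visible for the later steps.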
Exploratory Data Analysis & Visualization
Once the data is ready for analysis, we try to identify the relationships between the variables with the help of several statistical measures. We visualize the data using histograms, boxplots, and scatterplots to gather meaningful insights. Several Python libraries, such as matplotlib and seaborn, are used here for visualization.
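The sketch below shows what this step might look like with pandas and matplotlib. The dataset is synthetic and the variable names (`ad_spend`, `sales`) and the output file name are assumptions made for the example. seaborn offers higher-level versions of the same plots, but only matplotlib is used here to keep the example small.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data, invented for illustration: sales roughly proportional to ad spend.
rng = np.random.default_rng(0)
ad_spend = rng.uniform(10, 100, size=200)
sales = 3.0 * ad_spend + rng.normal(0, 20, size=200)
df = pd.DataFrame({"ad_spend": ad_spend, "sales": sales})

# Statistical measures: summary statistics and the Pearson correlation.
print(df.describe())
corr = df["ad_spend"].corr(df["sales"])
print(f"Correlation between ad_spend and sales: {corr:.2f}")

# The three plot types mentioned above, side by side.
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].hist(df["sales"], bins=20)
axes[0].set_title("Histogram of sales")
axes[1].boxplot(df["sales"])
axes[1].set_title("Boxplot of sales")
axes[2].scatter(df["ad_spend"], df["sales"], s=10)
axes[2].set_title("ad_spend vs sales")
fig.tight_layout()
fig.savefig("eda_overview.png")
```

The histogram and boxplot describe one variable at a time, while the scatterplot and the correlation describe the relationship between two variables.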
Model Selection
Machine Learning comes into use at this stage. Model Selection is driven by our goal: we identify whether the problem is, for example, classification or regression, and choose candidate models to train on our datasets. Algorithms such as Linear Regression, Logistic Regression, and K-means clustering are used here. Feature selection is also done at this stage, and as a final step the candidate models are compared using diagnostic measures to select the best one.
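One common way to compare candidate models is cross-validation, sketched below with scikit-learn for a classification problem. The dataset is synthetic, the Decision Tree candidate is an assumption added only to have something to compare Logistic Regression against, and accuracy is just one possible comparison metric.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data, generated for illustration only.
X, y = make_classification(n_samples=300, n_features=8, n_informative=4, random_state=0)

# Candidate models (the second one is an assumed extra, not from the text above).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

# Score each candidate with 5-fold cross-validation and keep the mean accuracy.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: mean accuracy = {score:.3f}")
print(f"Selected model: {best_name}")
```

For a regression problem the same pattern would apply with regression models and a regression metric in place of accuracy.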
Data Presentation and Automation
The final step is presenting the final insights to the stakeholders and automating the data analysis process for repetitive business use and integration with other tools.
This is the complete lifecycle of a Data Science project. Although the lifecycle looks linear, in a real project we move back and forth through these steps several times. Hence the lifecycle may differ for every individual and every project.