What is Data Science?

Data Science is used almost everywhere in commercial and non-commercial settings, and its use cases can be seen across industries such as finance, banking, Ecommerce and social media. In today’s world, where data is generated every second, we need to leverage this data to make sound business decisions. Data Science is used in everyday operations to gain insights into customers, products and processes, and corporations use it to improve their customer experience as well as to upsell or cross-sell their products.

Ecommerce websites use Data Science to recommend products based on what we have searched for, and advertisements for products matching a user’s interests are also selected using Data Science.

How to Become a Data Scientist?

A Data Scientist requires knowledge of:

1.)  Programming Language (R/Python)

2.)  Statistics

3.)  Mathematics

4.)  Machine Learning

5.)  Deep Learning

Data Science Project Lifecycle

The lifecycle of a Data Science project includes six steps:

1.)  Defining the Goal of the Research

2.)  Data Capturing

3.)  Data Preparation/Pre-processing

4.)  Exploratory Data Analysis and Visualization

5.)  Model Selection

6.)  Data Presentation & Automation

 

Goal of the Research

A project starts with a goal in mind: what the research is needed for, why it is being conducted, and what the client or company expects from it. For example, a project may need to predict sales for the next 5 years based on previous years’ data, identify potential markets for a certain product, or predict stock prices. The ultimate goal can be anything, and a clear understanding of it is the first and most important step in starting the project.

Data Capturing

Most organizations already hold much of the data required for the research; however, it may not be sufficient, and you may need to turn to third parties to supply additional data. Data can be collected from multiple sources and may arrive in structured or unstructured formats, which require a lot of pre-processing before the subsequent steps.
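A minimal sketch of combining an in-house source with a third-party source, assuming pandas. The inline CSV and JSON strings stand in for real files, databases or APIs, and the column names (`customer_id`, `region`, `sales`, `age`) are hypothetical.

```python
import io

import pandas as pd

# Hypothetical in-house sales data (structured CSV). In practice this would be
# a file path, a database query or an API response rather than an inline string.
internal_csv = io.StringIO(
    "customer_id,region,sales\n"
    "1,North,250\n"
    "2,South,180\n"
    "3,North,320\n"
)

# Hypothetical third-party data supplied as JSON records.
third_party_json = io.StringIO(
    '[{"customer_id": 1, "age": 34},'
    ' {"customer_id": 2, "age": 45},'
    ' {"customer_id": 4, "age": 29}]'
)

internal = pd.read_csv(internal_csv)
external = pd.read_json(third_party_json)

# A left join keeps every in-house record; customers with no third-party data
# get NaN in the new column, which the pre-processing step must then handle.
combined = internal.merge(external, on="customer_id", how="left")
print(combined)
```

The left join is one design choice among several: an inner join would silently drop customer 3, which is often not what you want at the capture stage.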

Data Pre-processing

Data collected from different sources may contain many anomalies, such as missing values, wrong values, outliers and incorrect data types. The data needs a lot of cleaning, including imputing missing values with meaningful substitutes, before it becomes useful for generating insights. Most of the time in a project is consumed by Data Pre-processing.
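The cleaning tasks described above can be sketched in pandas as follows. This is a minimal illustration on hypothetical data; median imputation and the 1.5 × IQR outlier rule are common choices assumed here, not the only options.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data containing the anomalies described above.
raw = pd.DataFrame({
    "age": [34, np.nan, 29, -5, 41],                 # a missing value and an impossible value
    "sales": ["250", "180", "320", "210", "9999"],   # numbers stored as text; 9999 looks suspicious
})

df = raw.copy()

# 1. Fix data types: convert text to numbers (unparseable entries become NaN).
df["sales"] = pd.to_numeric(df["sales"], errors="coerce")

# 2. Treat impossible (wrong) values as missing.
df.loc[df["age"] < 0, "age"] = np.nan

# 3. Impute missing ages with the median of the valid values.
df["age"] = df["age"].fillna(df["age"].median())

# 4. Flag outliers using the 1.5 * IQR rule instead of deleting them outright.
q1, q3 = df["sales"].quantile([0.25, 0.75])
iqr = q3 - q1
df["sales_outlier"] = (df["sales"] < q1 - 1.5 * iqr) | (df["sales"] > q3 + 1.5 * iqr)

print(df)
```

Flagging outliers rather than dropping them leaves the decision to the analyst, since an extreme value may be a data-entry error or a genuine large order.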

Exploratory Data Analysis & Visualization

Once the data is ready for analysis, we try to identify relationships between the variables using several statistical measures, such as summary statistics and correlation. We visualize the data using histograms, boxplots and scatterplots to gather meaningful insights. Python libraries such as matplotlib and seaborn are commonly used for visualization.
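A minimal EDA sketch, assuming pandas, NumPy and matplotlib. The `ad_spend`/`sales` data is synthetic and hypothetical; seaborn offers higher-level versions of the same plots (`sns.histplot`, `sns.boxplot`, `sns.scatterplot`), but only matplotlib is used here to keep dependencies small.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: advertising spend and the resulting sales, with noise.
rng = np.random.default_rng(seed=0)
ad_spend = rng.uniform(10, 100, size=200)
sales = 3 * ad_spend + rng.normal(0, 20, size=200)
df = pd.DataFrame({"ad_spend": ad_spend, "sales": sales})

# Statistical measures: summary statistics and the Pearson correlation.
print(df.describe())
corr = df["ad_spend"].corr(df["sales"])
print(f"Pearson correlation: {corr:.2f}")

# Histogram, boxplot and scatterplot side by side.
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].hist(df["sales"].to_numpy(), bins=20)
axes[0].set_title("Histogram of sales")
axes[1].boxplot(df["sales"].to_numpy())
axes[1].set_title("Boxplot of sales")
axes[2].scatter(df["ad_spend"], df["sales"], s=10)
axes[2].set_xlabel("ad_spend")
axes[2].set_ylabel("sales")
axes[2].set_title("Sales vs. ad spend")
fig.tight_layout()
fig.savefig("eda.png")  # or plt.show() in an interactive session
```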

Model Selection

This is where Machine Learning comes into use. Models are selected based on our goal: we first identify whether the problem is one of classification, regression or clustering, and then choose models to train on our datasets. Algorithms such as Linear Regression, Logistic Regression and K-means clustering are commonly used. Feature selection is also done in this step, and as a final step the candidate models are compared using diagnostic metrics to select the best one.
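A minimal sketch of feature selection plus model comparison for a classification problem, assuming scikit-learn. The data is synthetic, k-nearest neighbours is an illustrative second candidate chosen here (not one named in the text), and 5-fold cross-validated accuracy is an assumed comparison metric. For a clustering problem, `sklearn.cluster.KMeans` with a metric such as `silhouette_score` would take the place of the classifiers and accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic classification data: 20 features, only 5 of them informative.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=42)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in candidates.items():
    # Feature selection (keep the 5 features with the highest ANOVA F-score) sits
    # inside the pipeline, so it is refit on each training fold and never sees test data.
    pipeline = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=5), model)
    scores[name] = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy").mean()

best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("selected model:", best)
```

Putting scaling and feature selection inside the pipeline is deliberate: selecting features on the full dataset before cross-validation would leak information and inflate the scores.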

Data Presentation and Automation

The final insights are presented to the stakeholders, and the data analysis process is automated for repetitive business use and integrated with other tools.
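A minimal sketch of automating a repetitive analysis, assuming pandas. `run_report`, the `region`/`sales` columns and the output file name are hypothetical; the idea is to wrap loading, cleaning and summarising in one function that a scheduler such as cron could rerun on fresh data, writing a CSV that other tools can pick up.

```python
import io

import pandas as pd


def run_report(source) -> pd.DataFrame:
    """Hypothetical repeatable job: load, clean and summarise sales by region."""
    df = pd.read_csv(source)
    # Same cleaning rule every run: numeric sales only, drop rows without a value.
    df["sales"] = pd.to_numeric(df["sales"], errors="coerce")
    df = df.dropna(subset=["sales"])
    return df.groupby("region")["sales"].agg(["count", "sum", "mean"]).reset_index()


# Hypothetical input; "n/a" is read as missing by pandas and dropped by the cleaning rule.
sample = io.StringIO("region,sales\nNorth,250\nSouth,180\nNorth,320\nSouth,n/a\n")
report = run_report(sample)
report.to_csv("weekly_sales_report.csv", index=False)  # hand-off point for other tools
print(report)
```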

This is the complete lifecycle of a Data Science project. Although the lifecycle looks linear, in a real project we move back and forth through these steps several times, so the lifecycle may differ from person to person and from project to project.
