Posts

Regression Analysis

Regression analysis is a statistical technique with the following objectives and benefits. 1.) It indicates whether the relationship between the variables is statistically significant. 2.) It indicates the relative strength of the effect of each independent variable (x) on the dependent variable (y). In simple terms, it helps us determine which variable matters most for predicting the dependent variable. 3.) It makes predictions. Suppose we have a dataset in which we have to predict the house price from a few variables such as the size and location of the house. Regression analysis will tell us whether these variables (location, size) are actually significant for predicting the price of the house. It will also tell us which variable is more important for predicting the price, that is, the strength of each variable, and it will help us predict the future price of the houses on t
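As a rough illustration of the prediction objective described above, here is a minimal sketch of simple linear regression with a single predictor, fitted by ordinary least squares. The function names and the house data are hypothetical, invented only for this example. It does not cover significance testing or multiple predictors such as location.

```python
import statistics


def fit_simple_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_mean = statistics.mean(xs)
    y_mean = statistics.mean(ys)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def predict(slope, intercept, x):
    """Predict the dependent variable for a new value of x."""
    return intercept + slope * x


# Hypothetical data: house size (square metres) and price (thousands).
sizes = [50, 70, 90, 110]
prices = [150, 200, 250, 300]

slope, intercept = fit_simple_regression(sizes, prices)
print("slope:", slope, "intercept:", intercept)
print("predicted price for size 100:", predict(slope, intercept, 100))
```

The slope estimates how much the price changes per unit of size. With more than one predictor, a statistics library would normally be used instead of hand-written sums.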

Why Should Statistics Be the First Step to Data Science?

As we all know, Data Science is an interdisciplinary subject that requires knowledge of several other fields such as Statistics, Python, Machine Learning and much more. The question arises: what should be the first step to Data Science? Which discipline should you focus on first? What is the building block of Data Science? Which discipline is more important, and why? What are the first steps towards Data Science? As we have read everywhere, all the disciplines hold equal importance; however, as far as the first discipline is concerned, you should concentrate on STATISTICS for the following reasons. We all know that data pre-processing is one of the preliminary steps in any Data Science project. This step includes cleaning the data, imputing the missing data, and checking the importance of the various variables and their correlation with the other variables or with the dependent variable. Here, you need to know that

Quick Mind Map for Statistics - Part 2

Statistics is one of the pillars of Data Science and a prerequisite for it, and is used in data analysis and visualization in the lifecycle of a Data Science project. This series presents a quick mind map of Statistics for Data Science. Statistics needs to be studied in the following order. Statistics is subdivided into 1.) Descriptive Statistics 2.) Inferential Statistics. This diagram represents the mind map of Inferential Statistics. Inferential statistics is used to make inferences about population parameters using sample statistics. For example, we use the sample mean to estimate the population mean. Unlike descriptive statistics, inferential statistics does not have any further subdivisions; it consists of concepts that help us make inferences or predictions about population parameters. So the first thing you need to know when you are learning inferential statistics is the difference between 1.) Population 2.) Sample You can find the difference betwee

Type I and Type II Errors, All About P-Value

This post continues our last blog about hypothesis testing, which allows us to formulate a hypothesis and use it to perform a statistical test. We then reject or fail to reject the null hypothesis according to the significance level selected by the researcher. What is a Type I error? I’ll explain with the help of an example. Suppose a patient experiencing a headache visits a doctor, and after a preliminary examination the doctor formulates a hypothesis. Null Hypothesis: The patient has migraine. Alternative Hypothesis: The patient doesn’t have migraine. Suppose the doctor decides that the patient doesn’t have a migraine and sends him home, which means he rejected the null hypothesis, whereas in reality the patient was suffering from migraine. This could prove fatal, as the patient’s condition may worsen. This is the Type I error. In the Type II error, the patient in reality doesn’t have a migraine, but the doctor ends up concluding that he is suffering from migrain

Hypothesis Testing

What is a hypothesis? A hypothesis is a claim about a population parameter such as the mean, standard deviation, or proportion. It’s a claim we can test. Suppose the claim is that the average age of the students in the city is 23. This becomes the null hypothesis, average age of students = 23; we assign this claim to the null hypothesis because it has an equality sign. The alternative hypothesis becomes the opposite, average age of students ≠ 23. We use this pair of hypotheses for a TWO TAIL TEST, because the alternative hypothesis means that the average age of students is either more or less than 23. Suppose instead we test the claim that the population mean µ<23. This claim does not have an equality sign, so we make it the alternative hypothesis and formulate the null hypothesis that the population mean µ≥23. This pair of hypotheses is used for a ONE TAIL TEST; this is a left-tailed test. Similarly, if we test that the population mean µ>23, we will make it an Alternative Hy
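To make the two-tailed and one-tailed setups above concrete, here is a minimal sketch of a test on a sample mean. It assumes a normal (z) approximation, which is only reasonable for fairly large samples; a t-test is the usual choice for small ones. The function name `z_test_mean`, its `alternative` labels, and the sample ages are all hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def z_test_mean(sample, mu0, alternative="two-sided"):
    """Return (z, p_value) for H0: population mean == mu0.

    alternative: "two-sided" (mean != mu0), "less" (mean < mu0),
    or "greater" (mean > mu0). Uses a normal approximation.
    """
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    z = (statistics.mean(sample) - mu0) / standard_error
    cdf = NormalDist().cdf(z)
    if alternative == "two-sided":
        p_value = 2 * min(cdf, 1 - cdf)  # both tails
    elif alternative == "less":
        p_value = cdf                    # left tail
    elif alternative == "greater":
        p_value = 1 - cdf                # right tail
    else:
        raise ValueError("unknown alternative: " + alternative)
    return z, p_value


# Hypothetical sample of student ages, invented for illustration.
ages = [21, 22, 24, 25, 23, 26, 22, 24, 25, 23]
z, p = z_test_mean(ages, mu0=23, alternative="two-sided")
print("z =", z, "p-value =", p)
```

The null hypothesis is rejected when the p-value falls below the chosen significance level, for example 0.05.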

Quick Mind Map for Statistics - Part 1

Statistics is one of the pillars of Data Science and a prerequisite for it, and is used in data analysis and visualization in the lifecycle of a Data Science project. This series presents a quick mind map of Statistics for Data Science. Statistics needs to be studied in the following order. Mind Map for Statistics: Statistics is subdivided into 1.) Descriptive Statistics 2.) Inferential Statistics. This diagram represents the mind map of Descriptive Statistics. Descriptive statistics describes the characteristics of the data set. It is further subdivided into three categories 1.) Measures of Central Tendency 2.) Measures of Variability 3.) Measures of Asymmetry. Measures of Central Tendency tell us about the center of the dataset and are 1.) Mean 2.) Median 3.) Mode. Measures of Variability tell us about the dispersion of data points around the mean and are 1.) Variance 2.) Standard Deviation 3.) Covariance 4.) Correlation Coefficient
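The measures listed above can be computed with Python's standard `statistics` module. This is a minimal sketch: the data are hypothetical, and the covariance and correlation helpers are written by hand here (sample versions, dividing by n - 1) for illustration only.

```python
import statistics

# Hypothetical data, invented purely for illustration.
ages = [21, 22, 22, 23, 24, 25, 27]
heights_cm = [160, 162, 165, 168, 170, 174, 180]

# Measures of central tendency.
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
mode_age = statistics.mode(ages)

# Measures of variability (sample versions).
variance_age = statistics.variance(ages)
std_age = statistics.stdev(ages)


def sample_covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    x_mean = statistics.mean(xs)
    y_mean = statistics.mean(ys)
    total = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


def correlation_coefficient(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return sample_covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


print("mean:", mean_age, "median:", median_age, "mode:", mode_age)
print("variance:", variance_age, "standard deviation:", std_age)
print("covariance:", sample_covariance(ages, heights_cm))
print("correlation:", correlation_coefficient(ages, heights_cm))
```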

Point Estimator and Confidence Interval

In inferential statistics, we study a sample and use the results to estimate the population parameters. Suppose we want to know the average age of college students: we randomly select a sample and calculate the sample mean. This calculated sample mean is the point estimator. The point estimator is a single value, a statistic computed from a sample and used to estimate the population parameter. We cannot be confident that the calculated sample mean (the point estimator) represents the mean of the entire population, so in inferential statistics, statisticians prefer an interval or range of values rather than a single value (the point estimator). This range of values is known as the confidence interval. A confidence interval is a range of values that, at a stated confidence level, is expected to contain the population mean. The confidence levels primarily used are 90%, 95%, and 99%; the most commonly used is 95%. Confidence Interval is calculated
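One common way to build such an interval around the sample mean is sketched below. It assumes a normal (z) approximation with the sample standard deviation standing in for the population one, which suits larger samples; for small samples a t-based interval is usually preferred. The function name and the data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) bounds for the population mean.

    Assumes a normal approximation: point estimate +/- z * standard error.
    """
    n = len(sample)
    point_estimate = statistics.mean(sample)  # the point estimator
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Critical value, e.g. about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return point_estimate - margin, point_estimate + margin


# Hypothetical sample of college student ages.
ages = [19, 20, 21, 21, 22, 22, 23, 24, 20, 21]

for level in (0.90, 0.95, 0.99):
    low, high = mean_confidence_interval(ages, confidence=level)
    print(f"{int(level * 100)}% interval: ({low:.2f}, {high:.2f})")
```

A higher confidence level gives a wider interval, because more certainty requires a larger range of values.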

Types Of Data & Graphs for Data Analysis

Once the goal of the research is determined, the next step in the lifecycle of Data Science is data retrieval; data can be captured from many sources and in many forms. Once the data is captured, we identify the types of data, or variables, we have in the dataset. There are two types of data 1.) Categorical 2.) Numerical. Categorical data consists of variables that form categories or groups, such as Gender, or a variable whose answer is Yes/No. They usually take a fixed number of possible values. A categorical variable that can take exactly two values is also known as a binary or dichotomous variable, and one with more than two values is known as a polytomous variable. Categorical data can be further classified as nominal or ordinal. A nominal variable has no particular order, and an ordinal variable has a particular order. Numerical data deals with numbers. It is further divided into Discrete and Contin

What is Data Science?

Data Science is used almost everywhere in commercial and non-commercial settings; its use cases can be seen in industries such as Finance, Banking, Ecommerce, and Social Media. In today’s world, where data is generated every second, we need to leverage this data to make sound business decisions. It is used in everyday operations to gain insights into customers, products, and processes. Corporations use Data Science to improve their customer experience as well as to upsell or cross-sell their products. Ecommerce websites use data science to recommend products based on what we have searched for, and advertisements for products of interest to the user are also displayed on the basis of Data Science. How to Become a Data Scientist? A Data Scientist requires knowledge of 1.) a Programming Language (R/Python) 2.) Statistics 3.) Mathematics 4.) Machine Learning 5.) Deep Learning. Data Science Project Lifecycle The lifecycle of a Da