Types Of Data & Graphs for Data Analysis

Once the goal of the research is determined, the next step in the Data Science lifecycle is Data Retrieval. Data can be captured from many sources and in many forms. Once the data is captured, we identify the types of data (the variables) we have in the dataset.

There are two types of data:

1.)  Categorical

2.)  Numerical

Categorical data consists of variables that form categories or groups, such as Gender, or a variable whose answer is Yes/No. They usually take a fixed number of possible values. A categorical variable that can take exactly two values is known as a binary or dichotomous variable, and one that can take more than two values is known as a polytomous variable.

Categorical data can be further divided into Nominal and Ordinal data types. A Nominal variable has no particular order (for example, Gender), whereas an Ordinal variable has a particular order (for example, a rating of Low/Medium/High).

Numerical data deals with numbers. It is further divided into Discrete and Continuous data types. A Discrete variable takes only countable, whole-number values, such as the number of children, whereas a Continuous variable may take any value within a range. Weight is a good example: it can take values such as 45.5 kg, so it is Continuous data.
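As a rough illustration of the types described above, the sketch below labels a list of values by inspecting them. The function name `describe_variable` and the sample lists are hypothetical, and the rules are only a heuristic, not a standard method.

```python
def describe_variable(values):
    """Heuristically label a list of values with one of the data types above."""
    distinct = set(values)
    # Text or True/False answers are treated as categories.
    if all(isinstance(v, (str, bool)) for v in values):
        if len(distinct) == 2:
            return "categorical (binary)"
        if len(distinct) > 2:
            return "categorical (polytomous)"
        return "categorical"
    # Whole numbers only: counts such as number of children.
    if all(isinstance(v, int) for v in values):
        return "numerical (discrete)"
    # Any other numeric values (e.g. 45.5 kg) are treated as continuous.
    return "numerical (continuous)"


# Hypothetical sample columns.
gender = ["Male", "Female", "Female", "Male"]
children = [0, 2, 1, 3]
weight_kg = [45.5, 60.2, 72.0]

print(describe_variable(gender))
print(describe_variable(children))
print(describe_variable(weight_kg))
```

A heuristic like this cannot tell Nominal from Ordinal, and it will mislabel categories that are stored as numbers (for example, postal codes), so the final decision still needs knowledge of what the variable means.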

Graphs for the Data

Different data types can be visualized using different graphs. We will first deal with the Categorical data type. A variable that is categorical in nature can be visualized using

1.)  BAR Graphs

2.)  Frequency Distribution graphs

3.)  Pareto Chart

4.)  Pie Charts

Numerical data types can be visualized using

1.)  Histogram

2.)  Box Plots

Measuring the Relationship between two Variables

When we want to analyse the relationship between two variables in a dataset, we use

1.)  Scatterplots

2.)  Cross Tables/ Contingency Tables

BAR Graphs

A Bar Chart is a graph used to represent categorical data. It has rectangular bars plotted vertically or horizontally. Vertical bar charts are also referred to as Column Charts. Bar charts compare categories or groups, typically by showing the count of each category.

A Bar Chart

Image by OpenClipart-Vectors from Pixabay

Frequency Distribution Charts

A frequency distribution table/chart/graph lists the distinct values in a sample together with their respective frequencies. Each entry in the table holds the frequency (count) of one value. Frequency tables can be of two types:

Univariate Frequency Table (Single Variable Frequency Table)

The table contains the names of the dog breeds and the number of pets of each breed.
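A univariate frequency table of this kind can be sketched with Python's standard `collections.Counter`. The list of breeds below is hypothetical sample data, not the data from the original table.

```python
from collections import Counter

# Hypothetical observations: one entry per pet.
pets = ["Labrador", "Pug", "Labrador", "Beagle", "Pug", "Labrador"]

# Counter maps each distinct breed to the number of times it occurs.
frequency = Counter(pets)

# Print the table, most frequent breed first.
for breed, count in frequency.most_common():
    print(f"{breed:<10}{count}")
```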


A Bivariate Frequency Table (Joint Frequency Distribution) is presented as a two-way contingency table (cross table). The table below shows the investments made by different individuals in Stocks and Bonds.

 

            Asim Syed   Pratiksha   Megha   Pankaj

Stocks      20          5           10      20

Bonds       10          25          10      10
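A cross table is often read together with its row and column totals. The sketch below stores the Stocks/Bonds values from the table above in a nested dictionary and computes those totals. The variable names are illustrative assumptions.

```python
# Values copied from the Stocks/Bonds cross table above.
investments = {
    "Stocks": {"Asim Syed": 20, "Pratiksha": 5, "Megha": 10, "Pankaj": 20},
    "Bonds": {"Asim Syed": 10, "Pratiksha": 25, "Megha": 10, "Pankaj": 10},
}

# Row totals: total invested in each asset type.
row_totals = {asset: sum(by_person.values()) for asset, by_person in investments.items()}

# Column totals: total invested by each person across both asset types.
people = list(investments["Stocks"])
column_totals = {
    person: sum(investments[asset][person] for asset in investments)
    for person in people
}

grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```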

 

Pareto Charts

A Pareto chart contains both a bar graph and a line graph: individual values are represented in descending order by bars, and the cumulative total is represented by the line. The left vertical axis represents frequency and the right vertical axis represents the cumulative percentage of the total number of occurrences. It is based on the Pareto Principle, also called the 80/20 rule, which states that roughly 80% of consequences come from 20% of causes.

Pareto Chart
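The two series behind a Pareto chart (bars in descending order and the cumulative percentage line) can be computed as in the sketch below. The complaint categories and counts are hypothetical.

```python
from itertools import accumulate

# Hypothetical counts per cause.
counts = {"Wrong item": 15, "Late delivery": 50, "Other": 5, "Damaged item": 30}

# Bars: categories sorted by frequency, largest first.
ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Line: running total expressed as a percentage of all occurrences.
total = sum(counts.values())
running = list(accumulate(count for _, count in ordered))
cumulative_pct = [100 * r / total for r in running]

for (cause, count), pct in zip(ordered, cumulative_pct):
    print(f"{cause:<15}{count:>5}{pct:>8.1f}%")
```

The `ordered` counts would go on the left axis as bars and `cumulative_pct` on the right axis as the line.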

Pie Charts

A Pie Chart is a circular statistical graphic used to visualize the share/proportion/relative frequency of each category. Pie charts are often avoided in practice, so you will see fewer of them while doing Data Analysis.

Pie Chart
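The relative frequency shown by each slice is the category count divided by the total, and the slice angle is that share of 360 degrees. A minimal sketch with hypothetical counts:

```python
# Hypothetical category counts.
counts = {"Dogs": 50, "Cats": 25, "Birds": 25}

total = sum(counts.values())

# Share of each category (relative frequency, between 0 and 1).
shares = {category: count / total for category, count in counts.items()}

# Angle of each slice in degrees; all slices together cover 360 degrees.
angles = {category: 360 * share for category, share in shares.items()}

for category in counts:
    print(f"{category:<6}{shares[category]:>6.0%}{angles[category]:>8.1f} deg")
```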

Histograms

Histograms are used to visualize numerical data. They were introduced by Karl Pearson. The most important element of a histogram is its bins (also called buckets), which are ranges of values. We divide the entire range of values into intervals and then count the frequency of values falling in each interval. The bins are consecutive, non-overlapping intervals of the variable.

We calculate the interval width = (Max - Min) / Number of Required Bins.

Histogram
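The binning step can be sketched as follows, using interval width = (Max - Min) / number of bins. The weights and the choice of 4 bins are hypothetical. Placing the maximum value in the last bin is one common convention, assumed here.

```python
# Hypothetical weights in kg.
values = [45.5, 52.0, 58.3, 61.0, 63.7, 70.2, 74.9, 80.0, 85.5]
num_bins = 4

smallest, largest = min(values), max(values)
width = (largest - smallest) / num_bins  # (85.5 - 45.5) / 4

# Count how many values fall into each consecutive, non-overlapping bin.
counts = [0] * num_bins
for v in values:
    # The maximum would land at index num_bins, so clamp it into the last bin.
    index = min(int((v - smallest) / width), num_bins - 1)
    counts[index] += 1

for i, count in enumerate(counts):
    start = smallest + i * width
    print(f"{start:.1f} to {start + width:.1f}: {count}")
```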

Box Plots

A Box Plot represents numerical data graphically and makes it easy to spot outliers. Box plots are non-parametric: they make no assumption about the underlying distribution. They display the five-number summary:

Minimum (Q0, 0th percentile)

Maximum (Q4, 100th percentile)

Median (Q2, 50th percentile)

First Quartile (Q1, 25th percentile)

Third Quartile (Q3, 75th percentile)


A Box-Plot
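The five-number summary can be computed with Python's standard `statistics` module, as sketched below on hypothetical data. Quartile values depend on the interpolation method, and `method="inclusive"` is just one choice. The 1.5 x IQR rule used to flag outliers is a common convention that the text above does not specify, so treat it as an assumption.

```python
import statistics

# Hypothetical measurements, sorted so min and max are the first and last items.
data = sorted([7, 2, 40, 4, 9, 5, 4, 10, 8])

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

summary = {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(summary)
print(outliers)
```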

Scatter Plots

Scatter plots are used to show the relationship between two numerical variables in a dataset. If the points are color-coded, a third variable can also be shown. A scatter plot illustrates the correlation between the two variables.

ScatterPlots
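The correlation that a scatter plot shows visually can also be quantified. The sketch below computes the Pearson correlation coefficient by hand. The helper name `pearson_r` and the height/weight values are hypothetical, and Pearson's r is only one possible measure, chosen here as an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data in which weight rises linearly with height.
height_cm = [150, 160, 170, 180, 190]
weight_kg = [50, 58, 66, 74, 82]

r = pearson_r(height_cm, weight_kg)
print(round(r, 3))
```

A value of r near +1 or -1 corresponds to points lying close to a straight line, while a value near 0 corresponds to no linear pattern.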

Hope this helps you understand the different kinds of variables you encounter in a dataset and which graph to use with which kind of variable. For any questions, please feel free to comment below!


 





