Descriptive Statistics and Inferential Statistics
- Overview
A descriptive statistic is a summary statistic that quantitatively describes or summarizes features from a collection of information, while descriptive statistics is the process of using and analysing those statistics.
Descriptive statistics is distinguished from inferential statistics (or inductive statistics) by its aim to summarize a sample, rather than use the data to learn about the population that the sample of data is thought to represent.
This generally means that descriptive statistics, unlike inferential statistics, are not developed on the basis of probability theory and are frequently nonparametric. Even when a data analysis draws its main conclusions using inferential statistics, descriptive statistics are generally also presented.
For example, in papers reporting on human subjects, typically a table is included giving the overall sample size, sample sizes in important subgroups (e.g., for each treatment or exposure group), and demographic or clinical characteristics such as the average age, the proportion of subjects of each sex, the proportion of subjects with related co-morbidities, etc.
Some measures that are commonly used to describe a data set are measures of central tendency and measures of variability or dispersion. Measures of central tendency include the mean, median and mode, while measures of variability include the standard deviation (or variance) and the minimum and maximum values of the variables. Kurtosis and skewness are often reported alongside these; strictly speaking, they describe the shape of the distribution rather than its spread.
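As a rough illustration of the less familiar measures named above, the sketch below computes the minimum, maximum, skewness and excess kurtosis with Python's standard library. The function name `describe_shape` and the sample ages are hypothetical, and the population (divide-by-n) form of the moments is an assumption; statistical packages often apply bias corrections that give slightly different values.

```python
import statistics

def describe_shape(values):
    """Return min, max, skewness and excess kurtosis (population moments).

    Assumes the values are not all identical; otherwise the second
    moment is zero and the ratios below are undefined.
    """
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments of order 2, 3 and 4, each divided by n.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "min": min(values),
        "max": max(values),
        "skewness": m3 / m2 ** 1.5,
        # Subtracting 3 makes a normal distribution score 0.
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }

ages = [23, 25, 31, 35, 41, 52, 67]  # hypothetical ages
shape = describe_shape(ages)
print(shape)
```

A positive skewness, as these ages produce, indicates a longer tail toward higher values.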
Please refer to the following for more information:
- Wikipedia: Descriptive Statistics
- Wikipedia: Inferential Statistics
- Descriptive Statistics
Simply put, descriptive statistics help describe and understand the characteristics of a specific data set by providing a brief summary of data samples and measurements. The most widely recognized descriptive statistics are the measures of center: the mean, median, and mode, which are used at nearly all levels of mathematics and statistics. The mean (or average) is calculated by adding all the numbers in the data set and dividing by how many numbers the set contains.
- Descriptive statistics summarizes or describes the characteristics of a data set.
- Descriptive statistics consists of three basic categories of measures: measures of central tendency, measures of variability (or spread), and frequency distribution.
- Measures of central tendency describe the center of the data set (mean, median, mode).
- Measures of variability describe the dispersion of the data set (variance, standard deviation).
- Measures of frequency distribution describe the occurrence of data within the data set (count).
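The three categories above can be sketched with Python's standard library. The scores are hypothetical, and the sample (n - 1) form of the variance is an assumption; `statistics.pvariance` would give the population form instead.

```python
import statistics
from collections import Counter

scores = [70, 85, 85, 90, 100]  # hypothetical exam scores

# Measures of central tendency: where the data is centered.
central = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
}

# Measures of variability: how spread out the data is (sample form).
spread = {
    "variance": statistics.variance(scores),
    "stdev": statistics.stdev(scores),
}

# Frequency distribution: how often each value occurs.
frequency = Counter(scores)

print(central)
print(spread)
print(frequency)
```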
For example, the sum of the data set (2, 3, 4, 5, 6) is 20, so the mean is 4 (20/5). The mode of a data set is the value that occurs most often, and the median is the value in the middle of the sorted data set: it separates the higher half of the values from the lower half. In this example the median is 4, and because no value repeats there is no single mode. However, there are some less common types of descriptive statistics that are still very important.
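The arithmetic in this example can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module; `multimode` is used rather than `mode` because every value in this data set occurs exactly once.

```python
import statistics

data = [2, 3, 4, 5, 6]

total = sum(data)                    # 2 + 3 + 4 + 5 + 6
mean = total / len(data)             # sum divided by the number of values
median = statistics.median(data)     # middle value of the sorted data
modes = statistics.multimode(data)   # every value tied for most frequent

print(total, mean, median, modes)
# prints: 20 4.0 4 [2, 3, 4, 5, 6]
```

Because all five values tie, `multimode` returns the whole data set, which is another way of saying the data has no single mode.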
Descriptive statistics are used to condense hard-to-digest quantitative information from large data sets into brief descriptions. For example, a student's grade point average (GPA) is a familiar descriptive statistic. The idea of GPA is that it takes data points from a variety of tests, courses, and grades and averages them to give a single, overall view of the student's academic performance.
- Inferential Statistics
Inferential statistics is a process that uses data analysis to infer properties of a population. It helps researchers generalize conclusions from a sample to the population the sample was drawn from.
Inferential statistics can be used to:
- Examine differences among groups
- Examine relationships among variables
- Make inferences about a population of measurements
- Estimate some characteristic in a large population
- Test a research hypothesis about a given population
Inferential statistics are particularly useful because it's unlikely that a researcher has access to an entire population.
The two basic types of statistical inference are estimation and hypothesis testing.
Inferential statistics use laws of probability to make inferences about a population based on information gleaned from a sample.
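To make the estimation side concrete, here is a minimal sketch of a confidence interval for a population mean. The function name `mean_confidence_interval` and the sample values are hypothetical, and the normal approximation is an assumption that suits large samples; for small samples a t-distribution would usually be more appropriate.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Uses the normal approximation: mean +/- z * standard error.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    # The standard error of the mean shrinks as the sample grows.
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * std_err, mean + z * std_err

sample = [10, 12, 14, 16, 18]  # hypothetical measurements
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

The interval is a statement about the unknown population mean, which is what separates it from a purely descriptive summary of the sample.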
Common inferential techniques include the t-test, the z-test, and linear regression.
Here are some examples of inferential statistics:
- T-test: A statistical hypothesis test that determines if there is a significant difference between the means of two groups. For example, you might use a t-test to see if there's a statistically significant difference between the exam scores of two mathematics classes taught by different teachers.
- Z-test: A statistical test that determines whether two population means are different when the variances are known and the sample size is large.
- Linear regression: The most common type of regression used in inferential statistics. Linear regression investigates the response of the dependent variable to a unit change in the independent variable.
- Hypothesis testing: A type of inferential statistics where sample data is used to test a claim or hypothesis about a population. For example, a researcher may want to determine if the approval rate of a president is above or below a certain percentage.
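The techniques listed above can be sketched with the standard library alone. All function names and data in the usage below are hypothetical. The t statistic uses Welch's unequal-variance form, which is one common variant and an assumption here; turning it into a p-value needs a t-distribution, which the standard library does not provide, so only the statistic is computed. The z-test is shown for a single proportion, matching the approval-rate example.

```python
import math
import statistics
from statistics import NormalDist

def welch_t_statistic(group_a, group_b):
    """Two-sample t statistic (Welch form, unequal variances allowed)."""
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    std_err = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / std_err

def proportion_z_test(successes, n, p0):
    """One-sample z-test for a proportion; returns (z, two-sided p-value)."""
    p_hat = successes / n
    # Standard error under the null hypothesis that the true rate is p0.
    std_err = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

def least_squares_line(x, y):
    """Slope and intercept of the ordinary least-squares line y = a + b*x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx  # change in y per unit change in x
    return slope, mean_y - slope * mean_x

# Hypothetical poll: 540 of 1000 respondents approve; test against 50%.
z, p_value = proportion_z_test(540, 1000, 0.5)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the observed approval rate would be unlikely if the true rate were 50%; it does not by itself measure how large or important the difference is.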