What is descriptive statistics?
Descriptive statistics is a branch of statistics that deals with the description of a data set. It includes techniques used to summarize and describe the distribution of data, as well as to identify relationships between variables.
In quantitative research, descriptive statistics are used to describe the basic features of a data set. This includes measures of central tendency (e.g., mean, median, mode) and variability (e.g., standard deviation, variance). In qualitative research, descriptive statistics can be used to summarize the distribution of data or to identify relationships between variables.
By contrast, inferential statistics is the branch of statistics that deals with inferring characteristics of a population from a sample. It includes techniques for testing hypotheses about the population and for estimating population parameters, such as t-tests, ANOVA, and regression. In qualitative research, inferential statistics can likewise be used to estimate population parameters.
Some common methods used to generate descriptive statistics include:
- Frequency tables
- Scatter plots
- Box plots
- Correlation coefficients
Descriptive statistics can be helpful in understanding the characteristics of a data set, but it is important to note that they cannot be used to infer causation. For example, just because two variables are correlated does not mean that one variable causes the other. Additionally, descriptive statistics only provide a limited view of a data set and should not be used in place of more detailed statistical analysis.
Types of descriptive statistics
There are three types of descriptive statistics:
- Frequency distribution,
- Measures of central tendency,
- Measures of variability.
We will now discuss each type of descriptive statistics in detail.
Frequency distribution
Frequency distribution is a way of organizing data so that it is easy to see the distribution of values. It shows how often each value occurs in the data set.
A frequency distribution is a table or graph that displays the frequency of occurrence of each value in a data set. The frequencies can be displayed as absolute frequencies (the number of times a value occurs), relative frequencies (the proportion or percentage of total observations that a value represents), or cumulative frequencies (the number or percentage of total observations that fall at or below a particular value).
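The three kinds of frequency can be computed in a few lines. The following is a minimal sketch in Python using only the standard library; the function name `frequency_table` and the sample data are made up for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, absolute, relative, cumulative relative), sorted by value."""
    counts = Counter(values)
    total = len(values)
    rows = []
    running = 0  # running count of observations at or below the current value
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value], counts[value] / total, running / total))
    return rows

# Hypothetical data: number of siblings reported by ten respondents.
siblings = [0, 1, 1, 2, 2, 2, 3, 3, 4, 1]
table = frequency_table(siblings)

for value, absolute, relative, cumulative in table:
    print(f"{value}: n={absolute}, relative={relative:.0%}, cumulative={cumulative:.0%}")
```

The last row always has a cumulative relative frequency of 100%, which is a quick check that no observation was lost.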
Measures of central tendency
Measures of central tendency describe the center, or typical value, of a data set. The three most common measures of central tendency are the mean, the median, and the mode.
These measures of central tendency provide different information about a data set.
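The mean is the most common measure of central tendency. It is the arithmetic average of a data set: the sum of the values divided by the number of values.
The median is the middle value when the data are sorted (or the average of the two middle values when the number of observations is even), and the mode is the most frequently occurring value.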
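To see how the three measures can differ, here is a short sketch using Python's standard `statistics` module. The income figures are invented for illustration; the single large value is there to show how an outlier pulls the mean but not the median or mode.

```python
import statistics

# Hypothetical annual incomes in thousands; 120 is a deliberate outlier.
incomes = [30, 32, 35, 35, 38, 40, 120]

mean_income = statistics.mean(incomes)      # sum of values / number of values
median_income = statistics.median(incomes)  # middle value of the sorted data
mode_income = statistics.mode(incomes)      # most frequently occurring value

print(f"mean={mean_income:.2f}, median={median_income}, mode={mode_income}")
```

Here the mean is about 47, while the median and mode are both 35, so the median and mode describe a typical income in this made-up sample better than the mean does.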
Measures of variability
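Measures of variability describe how spread out the values in a data set are. The most common measures of variability are the range, the standard deviation, and the variance. Kurtosis is discussed alongside them here, although strictly speaking it describes the shape of a distribution rather than its spread.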
Range
Range is the simplest measure of variability. It is the difference between the highest and lowest values in a data set.
Example of range:
The range of the data set is 7. This means that the highest value in the data set is 7 more than the lowest value, for example a highest value of 7 and a lowest value of 0.
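As a small sketch in Python, with two made-up data sets, the range is simply the maximum minus the minimum. Both data sets below have a range of 7 even though they start at different values.

```python
def value_range(values):
    """Difference between the highest and lowest value."""
    return max(values) - min(values)

# Hypothetical data sets with the same range but different locations.
scores_a = [0, 2, 3, 5, 7]
scores_b = [3, 5, 6, 8, 10]

print(value_range(scores_a))  # 7
print(value_range(scores_b))  # 7
```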
Standard deviation
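Standard deviation is the most common measure of variability. It is the square root of the variance and can be read as the typical distance of the values from the mean. The population standard deviation is represented by the symbol σ (sigma); the sample standard deviation is usually written s.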
Example of standard deviation:
The standard deviation of the data set is 2. This means that the values in the data set typically lie about 2 units away from the mean.
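A data set with a standard deviation of exactly 2 can be checked with Python's standard `statistics` module. This is a minimal sketch with invented values; `pstdev` treats the data as a whole population, while `stdev` treats it as a sample and divides by one less than the number of observations.

```python
import statistics

# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_sd = statistics.pstdev(data)  # divides by n
sample_sd = statistics.stdev(data)       # divides by n - 1

print(population_sd)        # 2.0
print(round(sample_sd, 3))  # 2.138
```

Which of the two is appropriate depends on whether the data are the full population of interest or a sample drawn from it.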
Variance
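Variance is a more advanced measure of variability. It is the average of the squared deviations from the mean: the sum of squared deviations divided by the number of observations (or by one less than that number when the data are a sample). The standard deviation is the square root of the variance.
The population variance is represented by the symbol σ² (sigma squared); the sample variance is usually written s².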
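The calculation can be written out step by step. The sketch below, using made-up data, computes the population variance by hand and compares it with the standard `statistics` module; the variable names are illustrative only.

```python
import math
import statistics

# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]

# Population variance: average of the squared deviations.
manual_variance = sum(squared_deviations) / len(data)
# Sample variance: divide by n - 1 instead of n.
manual_sample_variance = sum(squared_deviations) / (len(data) - 1)

print(manual_variance)             # 4.0
print(statistics.pvariance(data))  # same value from the standard library
print(math.sqrt(manual_variance))  # 2.0, the population standard deviation
```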
Kurtosis
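Kurtosis is a more advanced measure. Strictly, it describes the shape of a distribution rather than its spread: it quantifies how heavy the tails of the distribution are compared with a normal distribution, and it is often loosely described as the degree of peakedness or flatness.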
Example of kurtosis:
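The kurtosis of the data set is 3. This is the kurtosis of a normal distribution, so the tails are neither heavier nor lighter than those of a normal curve. Values above 3 indicate heavier tails and a sharper peak, while values below 3 indicate lighter tails and a flatter shape.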
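One common way to compute kurtosis is the fourth central moment divided by the square of the second central moment. The sketch below assumes that population-style definition, under which a normal distribution scores 3; some software instead reports "excess kurtosis", which subtracts 3. The function name and data are invented for illustration.

```python
def kurtosis(values):
    """Population kurtosis: fourth central moment / (second central moment squared)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # population variance
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

k = kurtosis(data)
excess = k - 3  # negative: lighter tails than a normal distribution

print(k)       # 2.78125
print(excess)  # -0.21875
```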
Univariate descriptive statistics
Univariate descriptive statistics are used to describe a single variable. They are typically used to generate summary statistics, such as the mean or standard deviation, which provide information about the distribution of that variable.
Bivariate descriptive statistics
Bivariate descriptive statistics are used to describe two variables. They are typically used to generate summary statistics, such as the correlation coefficient, which provide information about the relationship between two variables.
In bivariate analysis, the independent variable is typically plotted on the horizontal axis and the dependent variable is plotted on the vertical axis. This kind of graph is known as a scatterplot.
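The correlation coefficient mentioned above is most often Pearson's r. The following is a minimal sketch that computes it directly from its definition, assuming two equal-length lists of numbers; the function name `pearson_r` and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by their individual variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)

# Hypothetical data: hours studied and quiz score for five students.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
print(round(r, 3))  # 0.775
```

A value near 1 or -1 indicates a strong linear relationship and a value near 0 a weak one. As with any correlation, it describes association only, not causation.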
Multivariate descriptive statistics
Multivariate descriptive statistics are used to describe more than two variables. They are typically used to generate summary statistics, such as the multiple correlation coefficient, which provide information about the relationships among multiple variables.
In multivariate analysis, the distribution of each variable is described in terms of the other variables. This can be done by plotting the data on a three-dimensional scatterplot or by constructing a multidimensional graph.
Contingency table
A contingency table (also called a cross-tabulation) is a table that displays the relationship between two categorical variables. To create a contingency table, each variable is divided into categories, and the number of observations that fall into each combination of categories is recorded. The table itself is descriptive, but it is often the starting point for a test, such as the chi-square test, of whether there is a statistically significant relationship between the two variables.
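Building such a table amounts to counting pairs of categories. Here is a minimal sketch using only the Python standard library; the function name `contingency_table`, the variable names, and the survey responses are all made up for illustration.

```python
from collections import Counter

def contingency_table(row_values, col_values):
    """Count observations for every combination of row category and column category."""
    counts = Counter(zip(row_values, col_values))
    row_levels = sorted(set(row_values))
    col_levels = sorted(set(col_values))
    # Counter returns 0 for combinations that never occur.
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}

# Hypothetical survey of six people: smoking status and exercise level.
smoker = ["yes", "no", "no", "yes", "no", "no"]
exercise = ["low", "high", "low", "low", "high", "high"]

table = contingency_table(smoker, exercise)
for row_label, cells in table.items():
    print(row_label, cells)
```

Each inner dictionary is one row of the table, and the cell counts add up to the total number of observations.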
Scatter plots
A scatter plot is a graph that shows the relationship between two variables by plotting each observation as a point.
When drawing a scatter plot, place one variable on each axis: the independent variable should be on the x-axis and the dependent variable should be on the y-axis.
Descriptive statistics are a helpful way to summarize and understand data sets. However, they only provide a limited view of a data set. They should not be used in place of more detailed statistical analysis, and on their own they should not be used to make predictions or inferences about a wider population.