
Introduction to Biostatistics

Statistics are simply a collection of tools that researchers use to help answer research questions.

DESCRIPTIVE STATISTICS

MEASURES OF CENTRAL TENDENCY

• A measure of central tendency is a single number used to represent the centre of a set of data.
• The basic measures are:
• Mean, Median and Mode
• For any symmetrical distribution with a single peak (such as the normal distribution), the mean, median, and mode will be identical.
• Each measure is designed to represent a typical score.
• The choice of which measure to use depends on:
• the shape of the distribution (whether normal or skewed), and
• the variable’s “level of measurement” (data are nominal, ordinal or interval).
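To make the point about symmetry concrete, here is a minimal Python sketch using the standard-library `statistics` module (`multimode` needs Python 3.8 or later). Both data sets are invented for illustration, and `centre_summary` is a hypothetical helper name, not part of any library.

```python
from statistics import mean, median, multimode

def centre_summary(data):
    """Hypothetical helper: return (mean, median, list of modes) for a list of numbers."""
    # multimode returns every most-frequent value, so ties are not hidden
    return mean(data), median(data), multimode(data)

# Hypothetical data, invented for illustration only
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]   # symmetrical, single peak at 3
right_skewed = [1, 2, 2, 2, 3, 4, 15]      # one long tail to the right

print(centre_summary(symmetric))     # all three measures equal 3
print(centre_summary(right_skewed))  # mean is pulled toward the tail; median and mode stay at 2
```

In the skewed set, the single extreme value 15 raises the mean to about 4.14, while the median and mode both stay at 2.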

Mean

• The mean (or average) is found by adding all the numbers and then dividing by how many numbers you added together.
• Most common measure of central tendency.
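• Formula for calculation of the mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where $x_i$ are the individual values and $n$ is the number of values.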
• Best for making predictions.
• Applicable under two conditions:
• scores are measured at the interval level, and
• distribution is more or less normal [symmetrical].
Example: 3, 4, 5, 6, 7
3 + 4 + 5 + 6 + 7 = 25
25 divided by 5 = 5
The mean is 5.
• Mathematical center of a distribution.
• Good for interval and ratio data.
• Does not ignore any information.
• Many inferential statistics are based on the mathematical properties of the mean.
• Influenced by extreme scores and skewed distributions.
• May not be an actual value in the data set.
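The worked example, and the mean's sensitivity to extreme scores, can be checked with a minimal Python sketch. `arithmetic_mean` is a hypothetical helper (the standard library's `statistics.mean` does the same job), and the second data set, with 70 in place of 7, is invented to show the effect of one extreme score.

```python
def arithmetic_mean(values):
    """Hypothetical helper: add all the values, then divide by how many were added."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

print(arithmetic_mean([3, 4, 5, 6, 7]))   # 5.0, as in the worked example
print(arithmetic_mean([3, 4, 5, 6, 70]))  # 17.6: one extreme score drags the mean upward
```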

Median

• When the numbers are arranged in numerical order, the middle one is the median.
• 50% of observations are above the Median, 50% are below it.
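• Formula: the median is the value at position (n + 1)/2 in the ordered list, where n is the number of observations. When n is even, it is the average of the two middle values.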
Example: 3, 6, 2, 5, 7
Arrange in order: 2, 3, 5, 6, 7
The number in the middle is 5, so the median is 5.
• Not influenced by extreme scores or skewed distributions.
• Good with ordinal data.
• Easier to compute than the mean.
• Often considered the typical observation.
• May not be an actual value in the data (e.g. when n is even).
• Does not take the actual values into account, only their order.
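A minimal Python sketch of the positional rule, including the even-n case. `median_value` is a hypothetical helper (the standard library's `statistics.median` behaves the same way); the second and third data sets are invented to show robustness to an extreme score and a median that is not in the data.

```python
def median_value(values):
    """Hypothetical helper: middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2  # 0-based index of the (n + 1)/2-th value when n is odd
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_value([3, 6, 2, 5, 7]))    # 5, as in the worked example
print(median_value([3, 6, 2, 5, 700]))  # still 5: the extreme score does not move it
print(median_value([2, 3, 5, 6]))       # 4.0, a value that is not in the data
```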

Mode

• The number that occurs most frequently is the mode.
• We usually find the mode by creating a frequency distribution in which we count how often each value occurs.
• If we find that every value occurs only once, the distribution has no mode.
• If we find that two or more values are tied as the most common, the distribution has more than one mode.
Example: 2, 2, 2, 4, 5, 6, 7, 7, 7, 7, 8
The number that occurs most frequently is 7 (four times), so the mode is 7.
•  Good with nominal data.
• A bimodal distribution might support clinical observations (e.g. pre- and post-menopausal breast cancer).
•  Easy to compute and understand.
•  The score exists in the data set.
• Ignores most of the information in a distribution.
• Small samples may not have a mode.
•  More than one mode might exist.
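The frequency-distribution approach can be sketched in Python with `collections.Counter`. `modes` is a hypothetical helper that follows the conventions stated above (no mode when every value occurs once; all tied values reported). Note that the standard library's `statistics.multimode` differs: when no value repeats, it returns every value rather than an empty list.

```python
from collections import Counter

def modes(values):
    """Hypothetical helper: every most-frequent value, or [] if no value repeats."""
    counts = Counter(values)  # frequency distribution: value -> how often it occurs
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs only once, so there is no mode
    return [value for value, count in counts.items() if count == top]

print(modes([2, 2, 2, 4, 5, 6, 7, 7, 7, 7, 8]))  # [7]
print(modes([1, 3, 5, 9]))                       # [] (no mode)
print(modes([1, 1, 2, 2, 3]))                    # [1, 2] (bimodal)
```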

Appropriate Measures of Central Tendency

• Nominal variables - Mode
• Ordinal variables - Median
• Interval level variables - Mean, if the distribution is normal (the median is better with a skewed distribution)
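The guidance above can be encoded as a small lookup, sketched here in Python. `suggested_average` is a hypothetical helper, and treating ratio data like interval data is an assumption drawn from the earlier note that the mean is good for interval and ratio data.

```python
def suggested_average(level, skewed=False):
    """Hypothetical helper: map a level of measurement to a measure of central tendency."""
    level = level.lower()
    if level == "nominal":
        return "mode"
    if level == "ordinal":
        return "median"
    if level in ("interval", "ratio"):  # assumption: ratio handled like interval
        # The mean suits roughly normal data; the median is preferred when skewed
        return "median" if skewed else "mean"
    raise ValueError(f"unknown level of measurement: {level!r}")

print(suggested_average("Nominal"))                # mode
print(suggested_average("interval"))               # mean
print(suggested_average("interval", skewed=True))  # median
```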

MEASURES OF VARIABILITY

“If there is no variability within populations there would be no need for statistics.”

• Three indices are used to measure variation or dispersion among scores:
• range
• variance, and
• standard deviation (Cozby, 2000).
• These indices answer the question: how spread out is the distribution?
• Dispersion/Deviation/Spread tells us a lot about how a variable is distributed.

Range

• Range is the simplest method of examining variation among scores.
• It refers to the difference between the highest and lowest values produced.
• For continuous variables, the range is the arithmetic difference between the highest and lowest observations in the sample. In the case of counts or discrete measurements, 1 is often added to the difference so that the range includes both extreme observations.
• Another statistic, the interquartile range (IQR = Q3 − Q1), describes the interval of scores bounded by the 25th and 75th percentile ranks; it covers the middle 50 percent of the distribution.
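A minimal Python sketch of the range, the inclusive range, and the interquartile range. `spread_summary` is a hypothetical helper; the quartiles come from the standard library's `statistics.quantiles` (Python 3.8+), whose default "exclusive" method is only one of several quartile conventions, so other software may give slightly different Q1 and Q3 values.

```python
from statistics import quantiles

def spread_summary(values):
    """Hypothetical helper: range, inclusive range, quartiles and IQR of a data set."""
    low, high = min(values), max(values)
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3;
    # the default method is "exclusive", and other software may use other rules
    q1, _, q3 = quantiles(values, n=4)
    return {
        "range": high - low,
        "inclusive_range": high - low + 1,  # only meaningful for whole-number counts
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }

print(spread_summary([3, 6, 2, 5, 7]))
```

For the median example's data (2, 3, 5, 6, 7), this gives a range of 5, an inclusive range of 6, Q1 = 2.5, Q3 = 6.5 and an IQR of 4.0 under Python's default quartile rule.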

Percentiles (or quartiles)

• The first quartile is the 25th percentile (denoted Q1),
• the median value is the 50th percentile (denoted Median), and
• the third quartile is the 75th percentile (denoted Q3).
• “A percentile is a value at or below which a given percentage or fraction of the variable values lie.
• The p-th percentile is the value that has p% of the measurements below it and (100 − p)% above it.
• Thus, the 20th percentile is the value such that one fifth of the data lie below it. It is higher than 20% of the data values and lower than 80% of the data values.”
• For example, if your GMAT score for a section is in the 80th percentile, you scored better on that section than 80% of the students who took the GMAT.
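The percentile-rank idea can be sketched in Python as the percentage of scores strictly below a given score. `percentile_rank` is a hypothetical helper and the ten class scores are invented; texts differ on how to treat tied scores (some count half of them), so this is one convention among several.

```python
def percentile_rank(scores, score):
    """Hypothetical helper: percentage of `scores` that are strictly below `score`."""
    if not scores:
        raise ValueError("cannot rank against an empty list of scores")
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Hypothetical test scores for ten students
class_scores = [55, 60, 62, 68, 70, 74, 78, 81, 88, 95]
print(percentile_rank(class_scores, 88))  # 80.0: better than 8 of the 10 scores
```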

Standard deviation

• The standard deviation is the most widely applied measure of variability.
• It shows how much variation there is from the "average" (mean).
• Large standard deviations suggest that scores are probably widely scattered.
• Small standard deviations suggest that there is very little difference among scores.
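• Formula for the population S.D.: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$, where $\mu$ is the population mean and $N$ is the number of values. For a sample, the S.D. $s$ uses the sample mean $\bar{x}$ and divides by $n - 1$ instead of $N$.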
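• Consider a population consisting of the following values:
2, 4, 4, 4, 5, 5, 7, 9
• There are eight data points in total, with a mean (or average) value of 5:
$\mu = \frac{2 + 4 + 4 + 4 + 5 + 5 + 7 + 9}{8} = \frac{40}{8} = 5$
• To calculate the population standard deviation, first compute the difference of each data point from the mean, and square the result:
$(2-5)^2 = 9,\ (4-5)^2 = 1,\ (4-5)^2 = 1,\ (4-5)^2 = 1,\ (5-5)^2 = 0,\ (5-5)^2 = 0,\ (7-5)^2 = 4,\ (9-5)^2 = 16$
• Next divide the sum of these values by the number of values and take the square root to give the standard deviation:
$\sigma = \sqrt{\frac{9 + 1 + 1 + 1 + 0 + 0 + 4 + 16}{8}} = \sqrt{\frac{32}{8}} = \sqrt{4} = 2$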
• Therefore, the above has a population standard deviation of 2.
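The same calculation can be sketched in Python. The data are assumed to be the eight values 2, 4, 4, 4, 5, 5, 7, 9 (a standard textbook set with mean 5 and population standard deviation 2, matching the description); `population_sd` is a hypothetical helper, checked against the standard library's `statistics.pstdev`.

```python
import math
import statistics

def population_sd(values):
    """Hypothetical helper: square root of the mean squared deviation from the mean (divides by N)."""
    mu = sum(values) / len(values)
    squared_deviations = [(x - mu) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / len(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # assumed example values (mean 5)
print(population_sd(data))       # 2.0
print(statistics.pstdev(data))   # 2.0, the standard-library equivalent
print(statistics.stdev(data))    # sample SD (divides by n - 1), slightly larger
```

Dividing by N treats the eight values as the whole population; `statistics.stdev` divides by n − 1, the usual choice when the values are a sample used to estimate a population's spread.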

Variance

• The square of the standard deviation is the variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$.
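A short Python check that the variance is the square of the standard deviation, using the standard library's `statistics` module. The eight data values are an assumption (the textbook set 2, 4, 4, 4, 5, 5, 7, 9, whose squared deviations from the mean of 5 sum to 32).

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # assumed example values (mean 5)

population_variance = statistics.pvariance(data)  # sum of squared deviations / N = 32 / 8
sample_variance = statistics.variance(data)       # sum of squared deviations / (n - 1) = 32 / 7

# The variance is the square of the standard deviation, and vice versa
assert math.isclose(population_variance, statistics.pstdev(data) ** 2)
assert math.isclose(sample_variance, statistics.stdev(data) ** 2)
print(population_variance, sample_variance)
```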
