

TECHNICAL NOTE 

Year: 2010, Volume: 1, Issue: 2, Pages: 61-63


How to select appropriate statistical test?
Jaykaran
Department of Pharmacology, Government Medical College, Surat, Gujarat, India
Date of Web Publication: 15-Jan-2011
Correspondence Address: Jaykaran, Department of Pharmacology, Government Medical College, Surat, Gujarat, India
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/0976-9234.75708
How to cite this article: Jaykaran. How to select appropriate statistical test? J Pharm Negative Results 2010;1:61-3.
Selecting the appropriate statistical test is very important for the analysis of research data. Use of wrong or inappropriate statistical tests is a common phenomenon in articles published in biomedical journals.[1],[2],[3],[4] Wrong statistical tests are seen in many situations, such as use of a paired test for unpaired data, use of parametric tests for data that do not follow the normal distribution, or incompatibility of the statistical test with the type of data.[5] Because of the availability of many kinds of statistical software, performing statistical tests has become easy, but selecting the appropriate test is still a problem.
Selection of an appropriate statistical test depends on the following three questions:
1. What kind of data are we dealing with?
2. Do our data follow the normal distribution or not?
3. What is the aim of the study?
What Kind of Data Are We Dealing With?
Research data usually fall into one of the following four types: nominal, ordinal, interval, and ratio data.
Nominal data
In nominal data, observations are assigned to named categories: for example, a person is recorded as 'male' or 'female', or a drug as generic or brand. Nominal data cannot be measured or ordered, only counted. They are categorical data in which the order of the categories is meaningless. Data with two classes, such as male/female or dead/alive, are called binomial (binary) data, and data with more than two classes, such as tablet/capsule/syrup, are called multinomial data. Such data are usually presented as contingency tables, for example 2 × 2 tables.
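Because nominal data can only be counted, building a contingency table amounts to tallying how many observations fall into each combination of categories. The minimal Python sketch below assumes hypothetical (group, outcome) records; the data and variable names are illustrative only.

```python
from collections import Counter

# Hypothetical nominal observations: one (group, outcome) pair per patient.
observations = [
    ("drug", "alive"), ("drug", "alive"), ("drug", "dead"),
    ("placebo", "alive"), ("placebo", "dead"), ("placebo", "dead"),
]

# Nominal data can only be counted: tally the observations in each cell.
# Counter returns 0 for combinations that never occur.
counts = Counter(observations)

groups = ["drug", "placebo"]
outcomes = ["alive", "dead"]

# Arrange the counts as a 2 x 2 contingency table (rows = groups, columns = outcomes).
table = [[counts[(g, o)] for o in outcomes] for g in groups]

for g, row in zip(groups, table):
    print(g, row)
```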
Ordinal data
Ordinal data are also categorical, but their categories are ordered logically, so the data can be ranked in order of magnitude. One can say that one measurement is equal to, less than, or greater than another, but not by how much. Most scores and scales used in research are ordinal, for example rating scales for the color, taste, smell, or ease of application of a product.
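Rank-based (nonparametric) tests such as the Mann-Whitney test work on the ranks of ordinal data rather than on the scores themselves. As an illustration only, the sketch below ranks hypothetical rating scores and gives tied scores the average of the ranks they occupy, which is the usual convention; `average_ranks` is a hypothetical helper, not a library function.

```python
def average_ranks(scores):
    """Assign ranks 1..n to ordinal scores; tied scores get the mean of their ranks."""
    # Indices of the scores, sorted from smallest to largest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with the score at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        # Sorted positions i..j (0-based) correspond to ranks i+1..j+1; use their mean.
        mean_rank = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

# Hypothetical taste ratings on a 1-5 scale (1 = poor, 5 = excellent).
ratings = [3, 5, 2, 3, 4]
print(average_ranks(ratings))
```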
Interval data
Interval data have a meaningful order, and equal intervals between measurements represent equal changes in the quantity being measured. However, these data have no natural zero; the zero point of an interval scale is chosen arbitrarily. An example is temperature on the Celsius scale: because 0°C is not a natural zero, we cannot say that 70°C is twice as hot as 35°C. IQ scores are also interval data, as they have no natural zero.
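A quick arithmetic check shows why ratios are meaningless on an interval scale: the same two temperatures give a different ratio once they are expressed in Fahrenheit, because converting between the scales shifts the zero point, not only the unit. Differences, by contrast, remain consistent. The sketch below uses only the standard conversion formula.

```python
def celsius_to_fahrenheit(c):
    # F = C * 9/5 + 32: a change of unit AND a shift of the zero point.
    return c * 9 / 5 + 32

hot, warm = 70.0, 35.0

# On the Celsius scale the ratio looks like "double" ...
ratio_c = hot / warm
# ... but the same two temperatures give a different ratio in Fahrenheit (158 F vs 95 F).
ratio_f = celsius_to_fahrenheit(hot) / celsius_to_fahrenheit(warm)
print(f"Celsius ratio: {ratio_c:.3f}, Fahrenheit ratio: {ratio_f:.3f}")
# prints: Celsius ratio: 2.000, Fahrenheit ratio: 1.663

# Differences are consistent: a 35-degree Celsius gap is always 63 Fahrenheit degrees.
diff_f = celsius_to_fahrenheit(hot) - celsius_to_fahrenheit(warm)
```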
Ratio data
Ratio data have all the qualities of interval data (natural order, equal intervals) plus a natural zero point. This type of data is observed to be used most frequently.[4] Examples of ratio data are height, weight, and length. With this type of data it is meaningful to say that a length of 10 m is twice a length of 5 m, and this ratio holds regardless of the unit of measurement (e.g., meters or yards). The reason is the presence of a natural zero.
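For contrast with the temperature example, converting a ratio-scale quantity between units only multiplies it by a constant, and zero length stays zero, so ratios survive the conversion. A minimal sketch, using the exact international definition of the yard (1 yard = 0.9144 m):

```python
import math

METERS_PER_YARD = 0.9144  # 1 yard = 0.9144 m exactly (international yard)

def meters_to_yards(m):
    # Pure rescaling: no offset, so 0 m maps to 0 yd.
    return m / METERS_PER_YARD

long_m, short_m = 10.0, 5.0
ratio_m = long_m / short_m
ratio_yd = meters_to_yards(long_m) / meters_to_yards(short_m)

print(f"ratio in meters: {ratio_m:.3f}, ratio in yards: {ratio_yd:.3f}")
print("same ratio:", math.isclose(ratio_m, ratio_yd))
```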
Do Our Data Follow the Normal Distribution or Not?
This is the second prerequisite for selecting an appropriate statistical test. If you know the type of data (nominal, ordinal, interval, or ratio) and the distribution of the data (normal or not normal), selecting a statistical test becomes straightforward. There is no need to check the distribution of ordinal and nominal data; the distribution should be checked only for interval and ratio data. If the data follow the normal distribution, a parametric test should be used; nonparametric tests should be used only when the data do not follow the normal distribution.
There are various methods for checking whether data are normally distributed: plotting a histogram, a box-and-whisker plot, or a Q-Q plot; measuring skewness and kurtosis; and using a formal statistical test for normality (Kolmogorov-Smirnov test, Shapiro-Wilk test, etc.). Formal tests such as the Kolmogorov-Smirnov and Shapiro-Wilk tests are frequently used to check the distribution of data. These tests are based on the null hypothesis that the data are drawn from a population that follows the normal distribution, and a P value is calculated for this null hypothesis. If the P value is less than 0.05, the null hypothesis is rejected: the data are considered not to follow the normal distribution, and a nonparametric test should be used. When the sample size is small, the chance of a non-normal distribution increases.
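Of the methods listed above, skewness and kurtosis are simple enough to compute by hand. The sketch below uses the moment-based estimators g1 (skewness) and g2 (excess kurtosis), both of which are close to 0 for normally distributed data; the sample values are hypothetical. The same estimators are, to our understanding, what SciPy's `scipy.stats.skew` and `scipy.stats.kurtosis` return with their default settings, and a formal Shapiro-Wilk test is available there as `scipy.stats.shapiro`. This text gives no cut-off values for g1 or g2, so none are assumed here.

```python
def skewness_and_excess_kurtosis(data):
    """Moment-based sample skewness g1 and excess kurtosis g2 (about 0 for normal data)."""
    n = len(data)
    mean = sum(data) / n
    # Central moments of order 2, 3 and 4 (divided by n, not n - 1).
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    g1 = m3 / m2 ** 1.5      # > 0: long right tail; < 0: long left tail
    g2 = m4 / m2 ** 2 - 3.0  # > 0: heavier tails than normal; < 0: lighter tails
    return g1, g2

# Hypothetical samples: one symmetric, one with a long right tail.
samples = {"symmetric": [1, 2, 3, 4, 5], "right-skewed": [1, 1, 2, 2, 3, 9]}
for name, values in samples.items():
    g1, g2 = skewness_and_excess_kurtosis(values)
    print(f"{name}: skewness = {g1:.3f}, excess kurtosis = {g2:.3f}")
```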
What is the Aim of the Study?
This is the third prerequisite for selecting an appropriate statistical test. What do we want to compare? Do we want to compare a drug with placebo, or the effect of an intervention by comparing pre-intervention endpoints with post-intervention endpoints? The first compares two independent (unpaired) groups; the second compares paired measurements made on the same subjects.
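The paired/unpaired distinction matters because the two analyses can give very different answers for the same numbers. The sketch below computes both the paired t statistic and the pooled-variance unpaired t statistic for hypothetical pre/post blood pressure readings in five patients; only the paired version respects the design. It uses only Python's standard `statistics` module and stops at the t statistic; P values would need the t distribution (for example SciPy's `scipy.stats.ttest_rel` and `scipy.stats.ttest_ind`).

```python
import math
import statistics

# Hypothetical systolic blood pressure (mmHg) in the SAME five patients
# before and after an intervention: a paired (pre/post) design.
pre = [120, 130, 125, 140, 135]
post = [115, 126, 120, 134, 131]

# Paired t statistic: analyse the within-patient differences.
diffs = [a - b for a, b in zip(pre, post)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Unpaired (Student's pooled-variance) t statistic: wrongly treats the two
# columns as independent groups, so between-patient variability swamps the effect.
n1, n2 = len(pre), len(post)
pooled_var = ((n1 - 1) * statistics.variance(pre)
              + (n2 - 1) * statistics.variance(post)) / (n1 + n2 - 2)
t_unpaired = (statistics.mean(pre) - statistics.mean(post)) / math.sqrt(
    pooled_var * (1 / n1 + 1 / n2))

print(f"paired t = {t_paired:.2f} (df = {len(diffs) - 1})")
print(f"unpaired t = {t_unpaired:.2f} (df = {n1 + n2 - 2})")
```

Here every patient's pressure falls by 4 to 6 mmHg, which the paired statistic captures, while the unpaired statistic is dominated by the large differences between patients.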
If a researcher is clear about all three questions above, an appropriate statistical test can be selected from the flowchart [Figure 1]; this flowchart is modified from a table given on the website of the GraphPad software (www.graphpad.com).
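Figure 1 itself is not reproduced in this text, so the sketch below does not encode it. It only illustrates how the three questions combine into a decision for the simplest case, a comparison of two groups, using widely used textbook choices. The function `suggest_test` and its parameters are hypothetical, and other aims (three or more groups, survival data, association) are deliberately left out.

```python
def suggest_test(data_type: str, normal: bool, design: str, large_sample: bool = False) -> str:
    """Suggest a test for comparing TWO groups (hypothetical helper, partial sketch).

    data_type:    "nominal", "ordinal", "interval" or "ratio"
    normal:       result of the normality check (ignored for nominal/ordinal data)
    design:       "unpaired" (independent groups) or "paired" (e.g. pre/post, same subjects)
    large_sample: only used for unpaired nominal data
    """
    if design not in ("unpaired", "paired"):
        raise ValueError("design must be 'unpaired' or 'paired'")
    if data_type == "nominal":
        # Counts in a contingency table; no normality check needed.
        if design == "paired":
            return "McNemar's test"
        return "chi-square test" if large_sample else "Fisher's exact test"
    if data_type == "ordinal" or (data_type in ("interval", "ratio") and not normal):
        # Rank-based (nonparametric) tests.
        return "Wilcoxon signed-rank test" if design == "paired" else "Mann-Whitney test"
    if data_type in ("interval", "ratio"):
        # Normally distributed interval/ratio data: parametric tests.
        return "paired t test" if design == "paired" else "unpaired t test"
    raise ValueError(f"unknown data type: {data_type!r}")

print(suggest_test("nominal", normal=False, design="unpaired"))
```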
To understand this better, consider the following example.
The incidence of tumors observed in control and treated mice in a preclinical study is as follows:
- Controls: 1 of 14 animals
- Treated: 6 of 14 animals
Use an appropriate statistical test to determine whether the incidence differs significantly between the two groups.
What is the aim of the study?
Our aim is to compare the incidence of tumors between two unpaired groups (control vs. treated).
What kind of data are we dealing with?
In the present study, the data are nominal: each animal either has a tumor or does not.
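Do our data follow the normal distribution or not?
There is no need to check the distribution of nominal data. Such data are analysed as counts, and it is the chi-square test statistic computed from these counts (not the data themselves) that approximately follows a chi-square distribution.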
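The chi-square approximation can be checked by hand for this 2 × 2 table. The sketch below computes the expected count for each cell under the null hypothesis of no difference, then the Pearson chi-square statistic (without continuity correction) and its approximate P value for 1 degree of freedom. Two expected counts are only 3.5, below the commonly used minimum of 5, which is why the chi-square approximation is unreliable for a sample this small.

```python
import math

# 2 x 2 table from the example: rows = groups, columns = (tumor, no tumor).
observed = [[1, 13],   # controls: 1 of 14 animals with tumors
            [6, 8]]    # treated:  6 of 14 animals with tumors

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell if tumor incidence were the same in both groups.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square statistic (no continuity correction), 1 degree of freedom.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

# For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)); only an approximation here.
p_approx = math.erfc(math.sqrt(chi2 / 2))

print("expected counts:", expected)
print(f"chi-square = {chi2:.3f}, approximate P = {p_approx:.3f}")
```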
Therefore, from the flowchart [Figure 1], the most appropriate test in this situation is Fisher's exact test (or the chi-square test if the sample size is large).
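Fisher's exact test can be sketched with the standard library alone. With all row and column totals fixed, the number of tumors in the control group follows a hypergeometric distribution; the two-sided P value below sums the probabilities of every table that is no more likely than the observed one, a common definition of the two-sided test (SciPy's `scipy.stats.fisher_exact` is a library alternative). Integer arithmetic keeps the comparison of probabilities exact; `fisher_exact_two_sided` is a hypothetical name.

```python
from fractions import Fraction
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(x):
        # Numerator of P(top-left cell = x); all tables share the denominator comb(n, col1).
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    # Range of top-left values compatible with the fixed margins.
    lo, hi = max(0, col1 - row2), min(row1, col1)
    tail = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return Fraction(tail, comb(n, col1))

# Controls: 1 tumor, 13 without; treated: 6 tumors, 8 without.
p = fisher_exact_two_sided(1, 13, 6, 8)
print(f"two-sided P = {float(p):.4f}")
# prints: two-sided P = 0.0768
```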
References   
1.  Lang T. Twenty statistical errors even you can find in biomedical research articles. Croat Med J 2004;45:361-70.
2.  Jaykaran, Yadav P, Chavda N, Kantharia ND. Some issues related to the reporting of statistics in clinical trials published in Indian medical journals: A survey. Int J Pharmacol 2010;6:3549. 
3.  Karan J, Kantharia ND, Yadav P, Bhardwaj P. Reporting statistics in clinical trials published in Indian journals: A survey. Pak J Med Sci 2010;26:2126. 
4.  Karan J, Goyal JP, Bhardwaj P, Yadav P. Statistical reporting in Indian Pediatrics. Indian Pediatr 2009;46:811-2.
5.  Strasak AM, Zaman Q, Karl PP, Gobel G, Ulmer H. Statistical errors in medical research: a review of common pitfalls. Swiss Med Wkly 2007;137:44-9.
[Figure 1]
