Medical Statistics, Electronic Textbook: Miscellaneous

Source: Southern Medical University Quality Course Network

Content

Miscellaneous

Risk (prospective)

Risk (retrospective)

Attributable risk

Diagnostic tests

Likelihood ratios

Number needed to treat

Kappa and Maxwell

Screening test errors

Mantel-Haenszel test

Randomization

Crosstabs

Miscellaneous

· Risk (prospective)

· Risk (retrospective)

· Diagnostic test 2 by 2 table

· Likelihood ratios (2 by k)

· Number needed to treat

· Kappa (2 raters)

· Screening test errors

· Standardized mortality ratios

· Mantel-Haenszel test

· Incidence rates

· Randomization

Menu location: Analysis_Miscellaneous.

Download a free 10 day StatsDirect trial

Risk (prospective)

Menu location: Analysis_Miscellaneous_Risk (Prospective).

This function calculates relative risk, risk difference and population attributable risk difference with confidence intervals.

You can examine the risk of an outcome, such as disease, given the incidence of the outcome in relation to an exposure, such as a suspected risk or protection factor for a disease. The study design should be prospective. If you need information on retrospective studies see risk (retrospective).

The type of data used by this function is counts or frequencies (number of individuals with a study characteristic). If you want to analyse person-time data (e.g. months of follow up) instead of counts then please see incidence rates.

In studies of the incidence of a particular outcome in two groups of individuals, defined by the presence or absence of a particular characteristic, the odds ratio for the resultant fourfold table becomes the relative risk. Relative risk is used for prospective studies where you follow groups with different characteristics to observe whether or not a particular outcome occurs:

		EXPOSED	UNEXPOSED
OUTCOME	YES	a	b
	NO	c	d

Outcome rate exposed (Pe) = a/(a+c)

Outcome rate not exposed (Pu) = b/(b+d)

Relative risk (RR) = Pe/Pu

Risk difference (RD) = Pe-Pu

Estimate of population exposure (Px) = (a+c)/(a+b+c+d)

Population attributable risk % = 100*(Px*(RR-1))/(1+(Px*(RR-1)))

In retrospective studies, where you select subjects by outcome and not by group characteristic, you would use the odds ratio ((a/c)/(b/d)) and not the relative risk. See risk (retrospective) for more information.

In addition to the relative measure of effect (relative risk) you may wish to express the absolute effect size in your study as the risk difference. Risk difference is sometimes referred to as attributable risk and, when expressed in percent terms, it is also referred to as attributable proportion, attributable rate percent and preventive fraction. Attributable risk or risk difference is used to quantify risk in the exposed group that is attributable to the exposure.

Population attributable risk estimates the proportion of disease in the study population that is attributable to the exposure. In order to calculate population attributable risk, the incidence of exposure in the study population must be known or estimated; StatsDirect prompts you to enter this value or to default to an estimate made from your study data. Population attributable risk is presented as a percentage with a confidence interval when the relative risk is greater than or equal to one (Sahai and Khurshid, 1996).

Technical validation

Koopman's likelihood-based approximation recommended by Gart and Nam is used to construct confidence intervals for relative risk (Gart and Nam, 1988; Koopman, 1984). Please note that relative risk, risk ratio and likelihood ratio are all calculations for ratios of binomial probabilities; therefore the approach to confidence intervals is the same for each of them.

The confidence interval for risk difference is constructed using the robust approximation of Miettinen and Nurminen (Miettinen and Nurminen, 1985; Mee, 1984; Anbar, 1983; Gart and Nam, 1990; Newcombe, 1998b).

Approximate power is calculated as the power achieved with the given sample size to detect the observed effect with a two-sided probability of type I error of (100-CI%)% based on analysis with Fisher's exact test or a continuity corrected chi-square test of independence in a fourfold contingency table (Dupont, 1990).

Walter's approximate variance formula is used to construct the confidence interval for population attributable risk (Walter, 1978; Leung and Kupper, 1981).

Example

From Sahai and Khurshid (1996, p. 208).

The following data are a subset of the Framingham study results showing the number of cases of coronary heart disease (CHD) becoming clinically apparent six years after follow up of a cohort of 1329 men in the 40 to 59 age group. The men are divided by their level of serum cholesterol (a suspected risk factor) at the start of the study:

	Cholesterol >=220 mg%	Cholesterol < 220 mg%
CHD	72	20
No CHD	684	553

To analyse these data in StatsDirect you must select risk (prospective) from the miscellaneous section of the analysis menu. Choose the default 95% confidence interval. Then enter the above frequencies into the 2 by 2 table on the screen.

For this example:

Risk ratio (relative risk in incidence study) = 2.728571

Approximate (Koopman) 95% confidence interval = 1.694347 to 4.412075

Approximate power (for 5% significance) = 99.13%

Risk difference = 0.060334

Approximate (Miettinen) 95% confidence interval = 0.034379 to 0.086777

Population exposure % = 56.884876

Population attributable risk % = 49.578875

Approximate (Walter) 95% confidence interval = 30.469457 to 68.688294

Here we can say that the risk of CHD in men of this age is around two and a half times greater for those of them with serum cholesterol above 220 mg% compared with those with lower cholesterol levels. The confidence interval excludes one, indicating a significant result, and with 97.5% confidence we can say that this relative risk is at least 1.7 if the cohort is typical of men of this age in the wider population to which we are applying these results.

The population attributable risk estimates the proportion of disease (or other outcome) in the population that is attributable to the exposure. From these results we can say, with 95% confidence, that somewhere between 30% and 70% of the cases of CHD in 40 to 59 year old men are associated with high cholesterol (above 220 mg%).
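
The point estimates in this example can be checked with a short Python sketch of the formulas given earlier in this section. The function name `prospective_risk` and the dictionary keys are illustrative assumptions, not part of StatsDirect, and the confidence intervals (Koopman, Miettinen, Walter) are not reproduced here.

```python
def prospective_risk(a, b, c, d):
    """Point estimates for a prospective fourfold table.

    a, c = exposed subjects with / without the outcome
    b, d = unexposed subjects with / without the outcome
    """
    pe = a / (a + c)                 # outcome rate in the exposed
    pu = b / (b + d)                 # outcome rate in the unexposed
    rr = pe / pu                     # relative risk
    rd = pe - pu                     # risk difference
    px = (a + c) / (a + b + c + d)   # population exposure estimated from the study
    par = 100 * (px * (rr - 1)) / (1 + px * (rr - 1))  # population attributable risk %
    return {"RR": rr, "RD": rd, "Px%": 100 * px, "PAR%": par}


# Framingham subset: cholesterol >= 220 mg% is the exposure, CHD is the outcome
result = prospective_risk(a=72, b=20, c=684, d=553)
for name, value in result.items():
    print(f"{name} = {value:.6f}")
```

The printed values should agree with the risk ratio, risk difference, population exposure and population attributable risk listed above to the precision shown.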

Risk (retrospective)

Menu location: Analysis_Miscellaneous_Risk (Retrospective).

This function calculates odds ratios and population attributable risk with confidence intervals.

You can examine the likelihood of an outcome such as disease in relation to an exposure such as a suspected risk or protection factor. The study design considered here is retrospective and usually a case-control study. If you need information on prospective studies see risk (prospective).

The type of data used by this function is counts or frequencies (number of individuals with a study characteristic). You should organise these data into a fourfold table divided by outcome (e.g. disease status) in one dimension and the presence or absence of a characteristic factor (e.g. risk factor) in the other:

		EXPOSED	UNEXPOSED
OUTCOME	YES	a	b
	NO	c	d

Odds ratio (OR) = (a*d)/(b*c)

Estimate of population exposure (Px) = c/(c+d)

Estimate of population attributable risk % = 100*(Px*(OR-1))/(1+(Px*(OR-1)))

In retrospective studies you select subjects by outcome and look back to see if they have a characteristic factor such as a risk factor or a protection factor for a disease. The odds ratio ((a/c)/(b/d)) looks at the likelihood of an outcome in relation to a characteristic factor. In epidemiological terms, the odds ratio is used as a point estimate of the relative risk in retrospective studies. Odds ratio is the key statistic for most case-control studies.

In prospective studies, attributable risk or risk difference is used to quantify risk in the exposed group that is attributable to the exposure. In retrospective studies, attributable risk cannot be calculated directly but population attributable risk can be estimated. Population attributable risk estimates the proportion of disease in the study population that is attributable to the exposure. In order to calculate population attributable risk, the incidence of exposure in the study population must be known or estimated; StatsDirect prompts you to enter this value or to default to an estimate made from your study data. Population attributable risk is presented as a percentage with a confidence interval when the odds ratio is greater than or equal to one (Sahai and Khurshid, 1996).

Technical validation

A confidence interval (CI) for the odds ratio is calculated using an exact conditional likelihood method (Martin and Austin, 1991). The exact calculations can take an appreciable amount of time with large numbers.

Example

From Sahai and Khurshid (1996, p. 209).

The following data represent a retrospective investigation of smoking in relation to oral cancer.

	Smoking status (cigarettes per day)
	≥ 16	< 16
Cases	255	49
Controls	93	46

To analyse these data in StatsDirect you must select risk (retrospective) from the miscellaneous section of the analysis menu. Choose the default 95% confidence interval. Then enter the above frequencies into the 2 by 2 table on the screen.

For this example:

Risk analysis (retrospective)

		Characteristic factor
		Present	Absent
Outcome	Positive	255	49
	Negative	93	46

Observed odds ratio = 2.574062

Approximate power (for 5% significance) = 96.84%

Approximate (Woolf, logit) 95% confidence interval = 1.613302 to 4.106976

Conditional maximum likelihood estimates:

Conditional estimate of odds ratio = 2.56799

Exact Fisher 95% confidence interval = 1.566572 to 4.213082

Exact Fisher one sided P = 0.000065, two sided P = 0.000096

Exact mid-P 95% confidence interval = 1.606435 to 4.107938

Exact mid-P one sided P = 0.000044, two sided P = 0.000088

Population exposure % = 66.906475

Population attributable risk % = 51.294336

Approximate 95% confidence interval = 34.307694 to 68.280978

From these data we have evidence that the odds of developing oral cancer are around two and a half times higher for heavy smokers compared with lighter smokers (fewer than 16 cigarettes per day) or non-smokers. With 95% confidence we infer that the true population value for this statistic lies between one and a half and four times. Using an estimate of 67% heavy smoking for the population studied in this 1957 investigation, we can infer with 95% confidence that the proportion of oral cancer cases in that population that were due to heavy smoking lay between 34 and 68 percent.
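
As a rough check on the observed odds ratio, the Woolf (logit) interval and the population attributable risk in this example, the calculation can be sketched in Python. The function name `retrospective_risk` is an illustrative assumption, and the interval uses the standard logit formula exp(ln(OR) ± z*sqrt(1/a+1/b+1/c+1/d)); the exact Fisher and mid-P intervals are not attempted.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def retrospective_risk(a, b, c, d, confidence=0.95):
    """Odds ratio, Woolf (logit) interval and population attributable risk %.

    a, b = subjects with the outcome (cases), exposed / unexposed
    c, d = subjects without the outcome (controls), exposed / unexposed
    """
    odds_ratio = (a * d) / (b * c)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    px = c / (c + d)                                 # exposure among controls
    par = 100 * (px * (odds_ratio - 1)) / (1 + px * (odds_ratio - 1))
    return odds_ratio, (lower, upper), 100 * px, par


# Oral cancer data: exposure is smoking 16 or more cigarettes per day
odds_ratio, woolf_ci, px_pct, par_pct = retrospective_risk(255, 49, 93, 46)
print(f"OR = {odds_ratio:.6f}, Woolf CI = {woolf_ci[0]:.6f} to {woolf_ci[1]:.6f}")
print(f"Population exposure % = {px_pct:.6f}, PAR % = {par_pct:.6f}")
```

Passing a different value of `px` would be the way to model a known population exposure; this sketch only covers the default estimate from the study data.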

Attributable risk and risk difference

Attributable risk (AR) or risk difference is the difference between the incidence rates in exposed and non-exposed groups. In a cohort study, AR is calculated as the difference in cumulative incidences (risk difference) or incidence densities (rate difference). This reflects the absolute risk of the exposure or the excess risk of the outcome (e.g. disease) in the exposed group compared with the non-exposed group. AR is sometimes referred to as attributable risk in the exposed because it is used to quantify risk in the exposed group that is attributable to the exposure.

Population attributable risk (PAR) is different from AR. PAR estimates the proportion of disease in the study population that is attributable to the exposure. In order to calculate PAR, the incidence of exposure in the study population must be known or estimated. PAR is usually expressed as a percentage. See Risk (prospective) for more information.

Diagnostic test 2 by 2 table

Menu location: Analysis_Miscellaneous_Diagnostic Test (2 by 2).

This function gives predictive values (post-test likelihood) with change, prevalence (pre-test likelihood), sensitivity, specificity and likelihood ratios with robust confidence intervals (Sackett et al., 1983, 1991; Zhou et al., 2002).

The quality of a diagnostic test is often expressed in terms of sensitivity and specificity. Sensitivity is the ability of the test to pick up what it is testing for and specificity is the ability of the test to reject what it is not testing for.

		DISEASE/OUTCOME
		Present	Absent
TEST	+	a (true +ve)	b (false +ve)
	-	c (false -ve)	d (true -ve)

Sensitivity = a/(a+c)

Specificity = d/(b+d)

+ve predictive value = a/(a+b)

-ve predictive value = d/(d+c)

Likelihood ratio of a positive test = [a/(a+c)]/[b/(b+d)]

Likelihood ratio of a negative test = [c/(a+c)]/[d/(b+d)]

Likelihood ratios have become useful because they enable one to quantify the effect a particular test result has on the probability of a certain diagnosis or outcome. Using a simplified form of Bayes' theorem:

posterior odds = prior odds * likelihood ratio

where:

odds = probability/(1-probability)

probability = odds/(odds+1)
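
The odds form of Bayes' theorem above can be sketched as a small Python helper. The function name `post_test_probability` and the input values (a pre-test probability of 0.25 and a likelihood ratio of 6) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    prior_odds = pre_test_probability / (1 - pre_test_probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (posterior_odds + 1)


# Hypothetical inputs: prior odds are 0.25/0.75 = 1/3, so posterior odds are 2
p = post_test_probability(0.25, 6)
print(f"post-test probability = {p:.4f}")
```

A likelihood ratio of 1 leaves the probability unchanged, which is a quick sanity check on any implementation of this conversion.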

This function is not truly Bayesian because it does not use any starting/prior probability. Likelihood ratios, however, are provided and these can be used to direct the flow of probabilities in Bayesian analysis. For an excellent account of this approach in medical diagnosis, see Sackett (1991).

Another way to summarise diagnostic test performance is via the diagnostic odds ratio:

Diagnostic odds ratio = true/false = (a * d)/(b * c)

Technical validation

The confidence intervals for the likelihood ratios are constructed using the likelihood-based approach to binomial proportions of Koopman (1984) suggested by Gart and Nam (1988). The confidence intervals for all other statistics are exact binomial confidence intervals constructed using the method of Clopper and Pearson (Newcombe, 1998c).

Example

In a hypothetical example of a diagnostic test, serum levels of a biochemical marker of a particular disease were compared with the known diagnosis of the disease. 100 international units of the marker or greater was taken as an arbitrary positive test result:

	Disease	No Disease
Marker ≥ 100	431	30
Marker < 100	29	116

To analyse these data in StatsDirect you must select diagnostic test 2 by 2 table from the miscellaneous section of the analysis menu. Choose the default 95% confidence interval. Then enter the above frequencies into the 2 by 2 table on the screen.

For this example:

		Disease / Feature:
		Present	Absent	Totals
Test:	Positive	431	30	461
	Negative	29	116	145
	Totals	460	146	606

Including 95% confidence intervals:

Prevalence (pre-test likelihood of disease)

0.759076 (0.722988 to 0.792617), 75.91% (72.3% to 79.26%)

Predictive value of +ve test (post-test likelihood of disease)

0.934924 (0.908401 to 0.955666), 93.49% (90.84% to 95.57%), {change = 17%}

Predictive values of -ve test

(post-test likelihood of no disease)

0.8 (0.725563 to 0.861777), 80% (72.56% to 86.18%), {change = 56%}

(post-test disease likelihood despite -ve test)

0.2 (0.274437 to 0.138223), 20% (27.44% to 13.82%), {change = -56%}

Sensitivity (true positive rate)

0.936957 (0.910711 to 0.957376), 93.7% (91.07% to 95.74%)

Specificity (true negative rate)

0.794521 (0.719844 to 0.856862), 79.45% (71.98% to 85.69%)

Likelihood Ratio

LR (positive test) = 4.559855 (3.364957 to 6.340323)

LR (negative test) = 0.079348 (0.055211 to 0.113307)

Diagnostic Odds Ratio

Odds ratio = 56.64839 (32.064885 to 103.54465)

Here we can say with 95% confidence that marker results of ≥ 100 are at least three (3.365) times more likely to come from patients with disease than those without disease. Also with 95% confidence we can say that marker results of < 100 are at most only about one tenth (0.113) as likely to come from patients with disease as from those without disease.
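
The point estimates in this example follow directly from the fourfold table definitions given earlier in this section. The Python sketch below recomputes them; the function name `diagnostic_summary` and its dictionary keys are illustrative assumptions, and neither the confidence intervals nor the diagnostic odds ratio are reproduced.

```python
def diagnostic_summary(a, b, c, d):
    """Point estimates for a diagnostic test 2 by 2 table.

    a = true +ve, b = false +ve, c = false -ve, d = true -ve
    """
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return {
        "prevalence": (a + c) / (a + b + c + d),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": a / (a + b),                            # +ve predictive value
        "npv": d / (d + c),                            # -ve predictive value
        "lr_positive": sensitivity / (b / (b + d)),    # [a/(a+c)]/[b/(b+d)]
        "lr_negative": (c / (a + c)) / specificity,    # [c/(a+c)]/[d/(b+d)]
    }


# Marker >= 100 counted as a positive test
summary = diagnostic_summary(a=431, b=30, c=29, d=116)
for name, value in summary.items():
    print(f"{name} = {value:.6f}")
```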

Likelihood ratios

Menu location: Analysis_Miscellaneous_Likelihood Ratios.

This function gives likelihood ratios and their confidence intervals for each of two or more levels of results from a test (Sackett et al., 1983, 1991).

The quality of a diagnostic test can be expressed in terms of sensitivity and specificity. Sensitivity is the ability of the test to pick up what it is testing for and specificity is the ability of the test to reject what it is not testing for.

		DISEASE/OUTCOME
		Present	Absent
TEST	+	a (true +ve)	b (false +ve)
	-	c (false -ve)	d (true -ve)

Sensitivity = a/(a+c)

Specificity = d/(b+d)

Likelihood ratio of a positive test = [a/(a+c)]/[b/(b+d)]

Likelihood ratio of a negative test = [c/(a+c)]/[d/(b+d)]

Likelihood ratios enable you to quantify the effect that a particular test result has on the probability of an outcome (e.g. diagnosis of a disease). Using a simplified form of Bayes' theorem:

posterior odds = prior odds * likelihood ratio

where:

odds = probability/(1-probability)

probability = odds/(odds+1)

These methods can be generalised to more than two possible test outcomes, in which case the data can be arranged into a two by k table (k is the number of test outcomes studied). If one test outcome is called test level j then the likelihood ratio at level j is given by:

likelihood ratio j = p(tj | disease) / p(tj | no disease)

where p(tj | group) is the proportion of that group displaying the relevant test result at level j.

Technical validation

The confidence intervals for the likelihood ratios are constructed using the iterative method suggested by Gart and Nam (1991).

Example

From Sackett et al. (1991, p. 111).

Initial creatine phosphokinase (CK) levels were related to the subsequent diagnosis of acute myocardial infarction (MI) in a group of patients with suspected MI. Four ranges of CK result were chosen for the study:

	MI	No MI
CK ≥ 280	97	1
CK = 80-279	118	15
CK = 40-79	13	26
CK = 1-39	2	88

To analyse these data in StatsDirect you must select likelihood ratios for 2 by k tables from the miscellaneous section of the analysis menu. Choose the default 95% confidence interval. Enter the number of test levels as 4 then enter the above frequencies as prompted on the screen.

For this example:

Result	+ Feature	- Feature	Likelihood Ratio	95% CI (Koopman)
1	97	1	54.826087	9.923105 to 311.581703
2	118	15	4.446377	2.772565 to 7.31597
3	13	26	0.282609	0.151799 to 0.524821
4	2	88	0.012846	0.003513 to 0.046227

Here we can say with 95% confidence that CK results of ≥ 280 are at least ten (9.9) times more likely to come from patients who have had an MI than they are to come from those who have not had an MI.
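
The likelihood ratio column in this example can be recomputed with a few lines of Python. This is a minimal sketch of the level-by-level ratio defined earlier in this section; the function name `level_likelihood_ratios` is an illustrative assumption, the Koopman intervals are not reproduced, and a level with a zero count in the group without the feature would need separate handling.

```python
def level_likelihood_ratios(with_feature, without_feature):
    """Likelihood ratio for each test level of a 2 by k table.

    with_feature[j] and without_feature[j] are the counts at test level j
    among subjects with and without the feature (e.g. MI and no MI).
    """
    total_with = sum(with_feature)
    total_without = sum(without_feature)
    return [
        (x / total_with) / (y / total_without)   # p(tj | feature) / p(tj | no feature)
        for x, y in zip(with_feature, without_feature)
    ]


# CK levels from highest to lowest range, MI versus no MI
ratios = level_likelihood_ratios([97, 118, 13, 2], [1, 15, 26, 88])
for level, ratio in enumerate(ratios, start=1):
    print(f"Result {level}: likelihood ratio = {ratio:.6f}")
```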

Number needed to treat

Menu location: Analysis_Miscellaneous_Number Needed to Treat.

This function gives relative risk, relative risk reduction, absolute risk reduction (risk difference) and number needed to treat (NNT) with exact or near-exact confidence intervals. These statistics are usually presented in the context of health care interventions but they apply equally to other forms of treatment; NNT in general terms is the number of treated subjects needed to produce one outcome (Cook and Sackett, 1995; Deeks and Altman, 2001).

For example, in the Veterans Administration Trial, drugs used to treat high blood pressure were investigated over three years for their effect on damage rates to organs of the body typically affected by high blood pressure (Laupacis et al., 1988).

If the intervention studied has an adverse effect on outcome then the same calculations used here for NNT may be expressed instead as number needed to harm (NNH). The notation of harm or benefit suggested by Doug Altman (1998) is used here instead of quoting signed NNT estimates and confidence limits.

Some authors discourage the use of NNT, due mainly to the assumptions made when converting rate differences into numbers of individuals (Hutton, 2000). In some situations, it may be preferable to quote absolute risk reduction (ARR) multiplied by a constant, say 100, as the main summary measure of effect from a clinical trial.

Definitions

		TREATED	CONTROLS
ADVERSE EVENT	YES	a	b
	NO	c	d

LET:

pc = proportion of subjects in control group who suffer an event

pc = b / (b+d)

pt = proportion of subjects in treated group who suffer an event

pt = a / (a+c)

er = expected/baseline risk in untreated subjects

THEN:

Relative risk of event (RRe) = pt / pc

Relative risk of no event (RRne) = (1-pt) / (1-pc)

Odds ratio (OR) = (a*d) / (b*c)

Relative risk reduction (RRR) = (pc-pt) / pc = 1-RRe

Absolute risk reduction (ARR) / risk difference (RD) = pc-pt

Number needed to treat (NNT):

NNT [risk difference] = 1 / RD

NNT [relative risk of event] = 1 / (pc*RRR)

NNT [relative risk of no event] = 1 / ((1-pc)*(RRne-1))

NNT [odds ratio] = (1-(pc*(1-OR))) / (pc*(1-pc)*(1-OR))

Adjusted NNT statistics can be calculated with er substituted for pc.

Consensus regarding the rounding of NNT statistics is to round up (Sackett et al., 1996a,b); StatsDirect gives rounded up and unrounded NNT statistics.

The most commonly quoted NNT statistic is NNT [risk difference] or the empirical NNT, which assumes a constant risk difference over different expected event rates. The other NNT statistics assume that a relative measure (RRe, RRne or OR) is constant over different expected event rates; therefore these NNTs vary with the expected event rate. You might want to calculate a range of NNTs for the range of control event rates observed in all relevant studies. Careful thought must be given to the assumption that a relative measure is constant across different studies and populations, as this may be incorrect (Sharp et al., 1996; Ioannidis et al., 1997; Smith et al., 1997; Thompson et al., 1997; Smeeth et al., 1999; Altman and Deeks, 2002; Deeks 2002). If in doubt, please consult with a Statistician.

If you wish to calculate NNTs across a number of studies then you might consider applying one of the relative NNT formulae above to a relative effect statistic calculated using meta-analysis. This is best done with the guidance of a Statistician (Smeeth et al., 1999; Sharp et al. 1996; Smith et al., 1997).

Technical validation

Confidence intervals for individual risks/proportions are calculated using the Clopper-Pearson method (Newcombe, 1998c). Confidence intervals for relative risk are calculated using Koopman's likelihood-based approach advocated by Gart and Nam (Gart and Nam, 1988; Koopman, 1984; Haynes and Sackett, 1993). Confidence intervals for risk difference and number needed to treat are based on the iterative method of Miettinen and Nurminen (Mee, 1984; Anbar, 1983; Gart and Nam 1990; Miettinen and Nurminen, 1985) for constructing confidence intervals for differences between independent proportions. Exact Fisher confidence intervals are used for odds ratios (Martin and Austin, 1991).

Example

From Haynes and Sackett (1993):

In a trial of a drug for the treatment of severe congestive heart failure, 607 patients were treated with a new angiotensin converting enzyme inhibitor (ACEi) and 607 other patients were treated with a standard non-ACEi régime. 123 out of 607 patients on the non-ACEi régime died within six months and 94 out of the 607 ACEi treated patients died within six months.

To analyse these data in StatsDirect you must select number needed to treat from the miscellaneous section of the analysis menu. Choose the default 95% confidence interval. Enter the number of controls as 607 with 123 suffering an event and enter the number treated as 607 with 94 suffering an event.

For this example:

Number needed to treat (empirical results using observed counts only)

Estimates with 95% confidence intervals:

Risk of event in controls = 123/607 = 0.202636 (0.171346 to 0.23685)

Risk of event in treated = 94/607 = 0.15486 (0.126993 to 0.186133)

Relative risk of event = 0.764228 (0.598898 to 0.974221)

Risk of no event in controls = 484/607 = 0.797364 (0.76315 to 0.828654)

Risk of no event in treated = 513/607 = 0.84514 (0.813867 to 0.873007)

Relative risk of no event = 1.059917 (1.005736 to 1.118137)

Odds ratio of event in treated cf. controls = 0.721026 (0.529913 to 0.979347)

Relative risk reduction (controls-treated) = 0.235772 (0.025779 to 0.401102)

Risk difference (controls-treated) = 0.047776 (0.004675 to 0.090991)

NNT [risk difference] = 20.931034_benefit (10.990105_benefit to 213.910426_benefit)

NNT [risk difference] (rounded up) = 21_benefit (11_benefit to 214_benefit)

Here we infer, with 95% confidence, that you need to treat as many as 214 or as few as 11 patients in severe congestive heart failure with this ACEi in order to prevent one death that would not have been prevented with the standard non-ACEi therapy in six months of treatment.
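
The definitions in this section can be sketched in Python to recompute the point estimates for this example. The function name `nnt_statistics` and the dictionary keys are illustrative assumptions; confidence intervals are not reproduced, and the rounding shown assumes a benefit (positive risk difference). With the observed control rate as baseline, the four NNT formulae are algebraically identical, so they only diverge when a different expected risk `er` is supplied.

```python
from math import ceil


def nnt_statistics(treated_events, treated_total, control_events, control_total, er=None):
    """Relative and absolute effect measures plus the four NNT variants."""
    pt = treated_events / treated_total       # event risk in treated
    pc = control_events / control_total       # event risk in controls
    baseline = pc if er is None else er       # er replaces pc for adjusted NNTs
    rre = pt / pc                             # relative risk of event
    rrne = (1 - pt) / (1 - pc)                # relative risk of no event
    odds_ratio = (pt / (1 - pt)) / (pc / (1 - pc))   # same as (a*d)/(b*c)
    rrr = 1 - rre                             # relative risk reduction
    rd = pc - pt                              # absolute risk reduction
    return {
        "RRe": rre, "RRne": rrne, "OR": odds_ratio, "RRR": rrr, "RD": rd,
        "NNT_rd": 1 / rd,
        "NNT_rre": 1 / (baseline * rrr),
        "NNT_rrne": 1 / ((1 - baseline) * (rrne - 1)),
        "NNT_or": (1 - baseline * (1 - odds_ratio))
                  / (baseline * (1 - baseline) * (1 - odds_ratio)),
    }


# ACEi trial: 94/607 deaths in treated, 123/607 deaths in controls
stats = nnt_statistics(94, 607, 123, 607)
rounded_up = ceil(stats["NNT_rd"])
print(f"RD = {stats['RD']:.6f}, NNT = {stats['NNT_rd']:.6f}, rounded up = {rounded_up}")

# Hypothetical adjusted NNT for a lower baseline risk of 0.10
adjusted = nnt_statistics(94, 607, 123, 607, er=0.10)
print(f"Adjusted NNT [relative risk of event] = {adjusted['NNT_rre']:.2f}")
```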

Kappa and Maxwell

Menu location: Analysis_Miscellaneous_Kappa & Maxwell.

Agreement Analysis

For the case of two raters, this function gives Cohen's kappa (weighted and unweighted) and Scott's pi as measures of inter-rater agreement for two raters' categorical assessments (Fleiss, 1981; Altman, 1991; Scott 1955). For three or more raters, this function gives extensions of the Cohen kappa method, due to Fleiss and Cuzick (1979) in the case of two possible responses per rater, and Fleiss, Nee and Landis (1979) in the general case of three or more responses per rater.

If you have only two categories then Scott's pi is the statistic of choice (with confidence interval constructed by the Donner-Eliasziw (1992) method) for inter-rater agreement (Zwick, 1988).

Weighted kappa partly compensates for a problem with unweighted kappa, namely that it is not adjusted for the degree of disagreement. Disagreement is weighted in decreasing priority from the top left (origin) of the table. StatsDirect uses the following definitions for weight (1 is the default):

1. w(ij)=1-abs(i-j)/(g-1)

2. w(ij)=1-[(i-j)/(g-1)]²

3. User defined (this is only available via workbook data entry)

g = categories

w = weight

i = category for one observer (from 1 to g)

j = category for the other observer (from 1 to g)

In broad terms a kappa below 0.2 indicates poor agreement and a kappa above 0.8 indicates very good agreement beyond chance.

Guide (Landis and Koch, 1977):

Kappa	Strength of agreement
< 0.2	Poor
> 0.2 to ≤ 0.4	Fair
> 0.4 to ≤ 0.6	Moderate
> 0.6 to ≤ 0.8	Good
> 0.8 to ≤ 1	Very good

N.B. You cannot reliably compare kappa values from different studies because kappa is sensitive to the prevalence of different categories, i.e. if one category is observed more commonly in one study than another then kappa may indicate a difference in inter-rater agreement which is not due to the raters.

Agreement analysis with more than two raters is a complex and controversial subject; see Fleiss (1981, p. 225).

Disagreement Analysis

StatsDirect uses the methods of Maxwell (1970) to test for differences between the ratings of the two raters (or k nominal responses with paired observations).

Maxwell's chi-square statistic tests for overall disagreement between the two raters. The general McNemar statistic tests for asymmetry in the distribution of subjects about which the raters disagree, i.e. disagreement more over some categories of response than others.
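
The symmetry statistic can be sketched in Python in its usual Bowker form, summing (nij - nji)² / (nij + nji) over each pair of off-diagonal cells. This is an assumed formulation rather than StatsDirect's documented code, and the function name `symmetry_chi_square` is illustrative; it uses the RAST/MAST counts from the example later in this section. Maxwell's marginal homogeneity statistic needs a matrix inversion and is not attempted here, nor is the P value.

```python
def symmetry_chi_square(table):
    """Symmetry (generalised McNemar, Bowker form) statistic for a square table."""
    k = len(table)
    statistic = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            discordant = table[i][j] + table[j][i]
            if discordant > 0:   # a pair with no disagreements contributes nothing
                statistic += (table[i][j] - table[j][i]) ** 2 / discordant
    df = k * (k - 1) // 2        # one degree of freedom per off-diagonal pair
    return statistic, df


# Rows are MAST categories, columns are RAST categories (negative to very high)
mast_by_rast = [
    [86, 3, 14, 0, 2],
    [26, 0, 10, 4, 0],
    [20, 2, 22, 4, 1],
    [11, 1, 37, 16, 14],
    [3, 0, 15, 24, 48],
]
chi_square, df = symmetry_chi_square(mast_by_rast)
print(f"Symmetry chi-square = {chi_square:.6f}, df = {df}")
```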

Data preparation

You may present your data for the two-rater methods as a fourfold table in the interactive screen data entry menu option. Otherwise, you may present your data as responses/ratings in columns and rows in a worksheet, where the columns represent raters and the rows represent subjects rated. If you have more than two raters then you must present your data in the worksheet column (rater) row (subject) format. Missing data can be used where raters did not rate all subjects.

Technical validation

All formulae for kappa statistics and their tests are as per Fleiss (1981):

For two raters (m=2) and two categories (k=2):

- where n is the number of subjects rated, w is the weight for agreement or disagreement, po is the observed proportion of agreement, pe is the expected proportion of agreement, pij is the fraction of ratings i by the first rater and j by the second rater, and so is the standard error for testing that the kappa statistic equals zero.

For three or more raters (m>2) and two categories (k=2):

- where xi is the number of positive ratings out of mi raters for subject i of n subjects, and so is the standard error for testing that the kappa statistic equals zero.

For three or more raters and categories (m>2, k>2):

- where soj is the standard error for testing kappa equal to zero for each rating category separately, and so bar is the standard error for testing kappa equal to zero for the overall kappa across the k categories. Kappa hat is calculated as for the m>2, k=2 method shown above.

Example

From Altman (1991).

Altman quotes the results of Brostoff et al. in a comparison not of two human observers but of two different methods of assessment. These methods are RAST (radioallergosorbent test) and MAST (multi-RAST) for testing the sera of individuals for specifically reactive IgE in the diagnosis of allergies. Five categories of result were recorded using each method:

		RAST
		negative	weak	moderate	high	very high
MAST	negative	86	3	14	0	2
	weak	26	0	10	4	0
	moderate	20	2	22	4	1
	high	11	1	37	16	14
	very high	3	0	15	24	48

To analyse these data in StatsDirect you may select kappa from the miscellaneous section of the analysis menu. Choose the default 95% confidence interval. Enter the above frequencies as directed on the screen and select the default method for weighting.

For this example:

General agreement over all categories (2 raters)

Cohen's kappa (unweighted)

Observed agreement = 47.38%

Expected agreement = 22.78%

Kappa = 0.318628 (se = 0.026776)

95% confidence interval = 0.266147 to 0.371109

z (for k = 0) = 11.899574

P < 0.0001

Cohen's kappa (weighted by 1 - Abs(i - j)/(k - 1))

Observed agreement = 80.51%

Expected agreement = 55.81%

Kappa = 0.558953 (se = 0.038019)

95% confidence interval for kappa = 0.484438 to 0.633469

z (for kw = 0) = 14.701958

P < 0.0001

Scott's pi

Observed agreement = 47.38%

Expected agreement = 24.07%

Pi = 0.30701

Disagreement over any category and asymmetry of disagreement (2 raters)

Marginal homogeneity (Maxwell) chi-square = 73.013451, df = 4, P < 0.0001

Symmetry (generalised McNemar) chi-square = 79.076091, df = 10, P < 0.0001

Note that for calculation of standard errors for the kappa statistics, StatsDirect uses a more accurate method than that which is quoted in most textbooks (e.g. Altman, 1990).

The statistically highly significant z tests indicate that we should reject the null hypothesis that the ratings are independent (i.e. kappa = 0) and accept the alternative that agreement is better than one would expect by chance. Do not put too much emphasis on the kappa statistic test; it makes a lot of assumptions and falls into error with small numbers.

The statistically highly significant Maxwell test statistic above indicates that the raters disagree significantly in at least one category. The generalised McNemar statistic indicates the disagreement is not spread evenly.
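
The agreement figures in this example can be recomputed with a short Python sketch. It uses the textbook definitions of kappa, (observed - expected)/(1 - expected), with weight definition 1 from this section for the weighted version, and pooled marginal proportions for Scott's pi. The function names are illustrative assumptions, and the standard errors, z tests and confidence intervals are not reproduced.

```python
def identity_weight(i, j, k):
    """Unweighted kappa: only exact agreement counts."""
    return 1.0 if i == j else 0.0


def linear_weight(i, j, k):
    """Weight definition 1: w(ij) = 1 - abs(i - j)/(k - 1)."""
    return 1 - abs(i - j) / (k - 1)


def cohen_kappa(table, weight):
    """Return (observed, expected, kappa) for a square table of counts."""
    k = len(table)
    n = sum(map(sum, table))
    rows = [sum(table[i]) / n for i in range(k)]
    cols = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    observed = sum(weight(i, j, k) * table[i][j] / n
                   for i in range(k) for j in range(k))
    expected = sum(weight(i, j, k) * rows[i] * cols[j]
                   for i in range(k) for j in range(k))
    return observed, expected, (observed - expected) / (1 - expected)


def scotts_pi(table):
    """Return (observed, expected, pi) using pooled marginal proportions."""
    k = len(table)
    n = sum(map(sum, table))
    observed = sum(table[i][i] for i in range(k)) / n
    pooled = [(sum(table[i]) + sum(row[i] for row in table)) / (2 * n)
              for i in range(k)]
    expected = sum(p * p for p in pooled)
    return observed, expected, (observed - expected) / (1 - expected)


# Rows are MAST categories, columns are RAST categories (negative to very high)
mast_by_rast = [
    [86, 3, 14, 0, 2],
    [26, 0, 10, 4, 0],
    [20, 2, 22, 4, 1],
    [11, 1, 37, 16, 14],
    [3, 0, 15, 24, 48],
]
unweighted = cohen_kappa(mast_by_rast, identity_weight)
weighted = cohen_kappa(mast_by_rast, linear_weight)
pi_result = scotts_pi(mast_by_rast)
print(f"kappa = {unweighted[2]:.6f}, weighted kappa = {weighted[2]:.6f}, pi = {pi_result[2]:.5f}")
```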

Screening test errors

Menu location: Analysis_Miscellaneous_Screening Test Errors.

This function gives the probability of false positive and false negative results with a test of given true and false positive rates and a given prevalence of disease (Fleiss, 1981).

When considering a diagnostic test for screening populations it is important to consider the number of false negative and false positive results you will have to deal with. The quality of a diagnostic test is often expressed in terms of sensitivity and specificity. Sensitivity is the ability of the test to pick up what you are looking for and specificity is the ability of the test to reject what you are not looking for.

		DISEASE
		Present	Absent
TEST	+	a (true +ve)	b (false +ve)
	-	c (false -ve)	d (true -ve)

Sensitivity = a/(a+c)

Specificity = d/(b+d)

We can apply Bayes' theorem if we know the approximate likelihood that a subject has the disease before they come for screening; this is given by the prevalence of the disease. For low prevalence diseases the false negative rate (the probability that a negative result is wrong) will be low and the false positive rate (the probability that a positive result is wrong) will be high. For high prevalence diseases the false negative rate will be high and the false positive rate will be lower. People are often surprised by the high numbers of projected false positives; you need a highly specific test to keep this number low. The false positive rate of a screening test can be reduced by repeating the test. In some cases a test is performed three times and the patient is declared positive if at least two out of the three component tests were positive.
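The effect of the two-out-of-three rule can be sketched numerically. The example below assumes that the three component tests are independent given the true disease status, which the text does not state and which may not hold in practice (repeat tests on the same patient are often correlated). The function name and the per-test rates are illustrative.

```python
def at_least_two_of_three(p):
    """Probability that at least two of three independent tests are positive,
    when each single test is positive with probability p."""
    return 3 * p**2 * (1 - p) + p**3


# Illustrative per-test rates (assumed values)
single_false_positive_rate = 0.011
single_true_positive_rate = 0.951

combined_false_positive_rate = at_least_two_of_three(single_false_positive_rate)
combined_true_positive_rate = at_least_two_of_three(single_true_positive_rate)

print(f"false +ve rate: {single_false_positive_rate} -> {combined_false_positive_rate:.6f}")
print(f"true +ve rate:  {single_true_positive_rate} -> {combined_true_positive_rate:.6f}")
```

Under the independence assumption the combined false positive rate falls to roughly 3p² for small p, while the true positive rate of an already sensitive test rises slightly.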

Technical Validation

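Results are calculated as:

PF+ = P(B bar|A) = P(A|B bar)·P(B bar) / [P(A|B bar)·P(B bar) + P(A|B)·P(B)]

PF- = P(B|A bar) = P(A bar|B)·P(B) / [P(A bar|B)·P(B) + P(A bar|B bar)·P(B bar)]

- where PF+ is the false positive rate, PF- is the false negative rate, P(A|B) is the probability of A given B, A is a positive test result, A bar is a negative test result, B is disease present and B bar is disease absent. P(B) is the prevalence, P(A|B) is the sensitivity and P(A bar|B bar) is the specificity.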

Example

In a hypothetical example 4000 patients were tested with a screening test for a disease. Of these 4000 patients 2000 were known to have the disease and 2000 were known to be free of the disease:

		DISEASE
		Present	Absent
TEST	+	1902 (true +ve)	22 (false +ve)
	-	98 (false -ve)	1978 (true -ve)

To analyse these data in StatsDirect you must select false result probabilities from the miscellaneous sub-menu of the screen data functions section of the analysis menu. Enter the true +ve rate as 0.951 (1902/(1902+98)) and the false +ve rate as 0.011 (22/(1978+22)). Enter the prevalence as 1 in 100 by entering n as 100.

For this example:

For an overall case rate of 100 per ten thousand population tested:

Test SENSITIVITY = 95.1%

Probability of a FALSE POSITIVE result = 0.533824

Test SPECIFICITY = 98.9%

Probability of a FALSE NEGATIVE result = 0.0005

Here we see that more than half of the patients who test positive will not have the disease. This is clearly not acceptable for a full screening method but could be used as pre-screening before further tests if there was no better initial test available.
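The false result probabilities in this example can be reproduced with a direct application of Bayes' theorem. This is a sketch under the assumption that the false positive probability is P(disease absent | test positive) and the false negative probability is P(disease present | test negative); the function and variable names are illustrative and are not part of StatsDirect.

```python
def false_result_probabilities(true_positive_rate, false_positive_rate, prevalence):
    """Return (P(no disease | test +ve), P(disease | test -ve)) by Bayes' theorem."""
    sensitivity = true_positive_rate
    specificity = 1.0 - false_positive_rate

    # Joint probabilities of each test result with each disease state
    positive_and_healthy = false_positive_rate * (1.0 - prevalence)
    positive_and_diseased = sensitivity * prevalence
    negative_and_diseased = (1.0 - sensitivity) * prevalence
    negative_and_healthy = specificity * (1.0 - prevalence)

    pf_positive = positive_and_healthy / (positive_and_healthy + positive_and_diseased)
    pf_negative = negative_and_diseased / (negative_and_diseased + negative_and_healthy)
    return pf_positive, pf_negative


# Counts from the example table: a, b = test +ve; c, d = test -ve
a, b, c, d = 1902, 22, 98, 1978
true_positive_rate = a / (a + c)    # 0.951
false_positive_rate = b / (b + d)   # 0.011

pf_positive, pf_negative = false_result_probabilities(
    true_positive_rate, false_positive_rate, prevalence=1 / 100
)
print(f"P(false positive) = {pf_positive:.6f}")
print(f"P(false negative) = {pf_negative:.6f}")
```

With a prevalence of 1 in 100 this gives values that agree with the output above to the precision shown; raising the prevalence in the call shows how quickly the false positive probability falls.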


Mantel-Haenszel test and odds ratio meta-analysis

Menu locations:

Analysis_Chi-Square_MantelHaenszel;

Analysis_Meta-analysis_Odds Ratio.

Case-control studies of dichotomous outcomes (e.g. healed or not healed) can be represented by arranging the observed counts into fourfold (2 by 2) tables. The separation of data into different tables or strata represents a sub-grouping, e.g. into age bands. Stratification of this kind is sometimes used to reduce confounding.

The Mantel-Haenszel method provides a pooled odds ratio across the strata of fourfold tables. Meta-analysis is used to investigate the combination or interaction of a group of independent studies, for example a series of fourfold tables from similar studies conducted at different centres.

This StatsDirect function examines the odds ratio for each stratum (a single fourfold table) and for the group of studies as a whole. Exact methods are used here in addition to conventional approximations.

For a single stratum the odds ratio is estimated as follows:

		Exposed	Non-Exposed
OUTCOME	Cases	a	b
	Non-cases	c	d

Sample estimate of the odds ratio = (ad)/(bc)

For each table, the observed odds ratio is displayed with an exact confidence interval (Martin and Austin, 1991; Sahai and Khurshid, 1996). With very large numbers these calculations can take an appreciable amount of time. If the ‘try exact’ option is not selected then the logit (Woolf) interval is given instead.
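For orientation, the sample odds ratio and a logit (Woolf) interval for one fourfold table can be sketched as follows. This uses the standard textbook form, exp(ln(OR) ± z·sqrt(1/a + 1/b + 1/c + 1/d)); it is not taken from StatsDirect's source and does not reproduce the exact interval. The counts are those of the first study in the lung cancer example later in this section, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def odds_ratio_woolf(a, b, c, d, level=0.95):
    """Sample odds ratio (ad)/(bc) with a logit (Woolf) confidence interval."""
    if min(a, b, c, d) <= 0:
        # The logit interval is undefined when any cell is zero
        raise ValueError("all four cell counts must be positive")
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# a = cases exposed, b = cases non-exposed, c = non-cases exposed, d = non-cases non-exposed
estimate, lower, upper = odds_ratio_woolf(83, 3, 72, 14)
print(f"odds ratio = {estimate:.4f}, 95% CI = {lower:.4f} to {upper:.4f}")
```

The interval is symmetric on the log scale, so it is wide and skewed on the odds ratio scale when any cell count is small, which is one reason the exact interval is preferred for sparse tables.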

The Mantel-Haenszel method is used to estimate the pooled odds ratio for all strata, assuming a fixed effects model:

Pooled odds ratio = Σ(ai·di/ni) / Σ(bi·ci/ni)

- where ni = ai+bi+ci+di and the sums are taken over the strata i.

Alternative methods, such as Woolf and inverse variance, can be used to estimate the pooled odds ratio with fixed effects but the Mantel-Haenszel method is generally the most robust. A confidence interval for the Mantel-Haenszel odds ratio in StatsDirect is calculated using the Robins, Breslow and Greenland variance formula (Robins et al., 1986) or by the method of Sato (1990) if the estimate of the odds ratio cannot be determined. A chi-square test statistic is given with its associated probability that the pooled odds ratio is equal to one.

If any cell count in a table is zero then a continuity correction is applied to each cell in that table – if you have selected the ‘delay continuity correction’ option then no continuity correction is applied to the Mantel-Haenszel calculation unless all of the ‘a’ cells or all of the ‘d’ cells are zero across the studies. The type of continuity correction used is set in the options.

An exact conditional likelihood method is optionally used to evaluate the pooled odds ratio (Martin and Austin, 2000). The exact method may take an appreciable time to compute with large numbers. The exact results should be used in preference to the Mantel-Haenszel approximation, especially if some categories involve few observations (less than 15 or so).

The inconsistency of results across studies is summarised in the I² statistic, which is the percentage of variation across studies that is due to heterogeneity rather than chance – see the heterogeneity section for more information.

Note that the results from StatsDirect may differ slightly from other software or from those quoted in papers; this is due to differences in the variance formulae. StatsDirect employs the most robust practical approaches to variance according to accepted statistical literature.
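As a rough illustration of how I² relates to Cochran's Q, the commonly used Higgins and Thompson form is I² = 100% × (Q - df)/Q, truncated at zero. The sketch below assumes that form; the text does not state which formula StatsDirect uses, and the function name is illustrative.

```python
def i_squared(q, df):
    """I² (percent) from Cochran's Q and its degrees of freedom, truncated at 0."""
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


# Q smaller than its degrees of freedom gives I² = 0%
print(i_squared(6.641235, 9))
# A hypothetical Q well above its degrees of freedom gives a positive I²
print(i_squared(20.0, 9))
```

Whenever Q is no larger than its degrees of freedom, the observed spread of study results is no more than chance alone would produce, and I² is reported as 0%.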

DATA INPUT:

Observed frequencies may be entered in a workbook (see example in relative risk meta-analysis) or directly via the screen as multiple fourfold tables:

	feature present	feature absent
outcome positive	a	b
outcome negative	c	d

Example

From Armitage and Berry (1994, p. 516).

The following data compare the smoking status of lung cancer patients with controls. Ten different studies are combined in an attempt to improve the overall estimate of relative risk. The matching of controls has been ignored because there was not enough information about matching from each study to be sure that the matching was the same in each study.

Lung cancer		Controls	
smoker	non-smoker	smoker	non-smoker
83	3	72	14
90	3	227	43
129	7	81	19
412	32	299	131
1350	7	1296	61
60	3	106	27
459	18	534	81
499	19	462	56
451	39	1729	636
260	5	259	28

To analyse these data in StatsDirect you may select the Mantel-Haenszel function from the chi-square section of the analysis menu. Select the default 95% confidence interval. Enter the number of tables as 10. Then enter each row of the table above as a separate 2 by 2 contingency table:

i.e. the first row is entered as:

	Smkr	Non
Lung cancer	83	3
Control	72	14

... this is then repeated for each of the ten rows.

For this example:

Fixed effects (Mantel-Haenszel, Robins-Breslow-Greenland)

Pooled odds ratio = 4.681639 (95% CI = 3.865935 to 5.669455)

Chi² (test odds ratio differs from 1) = 292.379352 P < 0.0001

Fixed effects (conditional maximum likelihood)

Pooled odds ratio = 4.713244

Exact Fisher 95% CI = 3.888241 to 5.747141

Exact Fisher one sided P < 0.0001, two sided P < 0.0001

Exact mid-P 95% CI = 3.904839 to 5.719768

Exact mid-P one sided P < 0.0001, two sided P < 0.0001

Non-combinability of studies

Breslow-Day = 6.766765 (df = 9) P = 0.6614

Cochran Q = 6.641235 (df = 9) P = 0.6744

Moment-based estimate of between studies variance = 0

I² (inconsistency) = 0% (95% CI = 0% to 52.7%)

Random effects (DerSimonian-Laird)

Pooled odds ratio = 4.625084 (95% CI = 3.821652 to 5.597423)

Chi² (test odds ratio differs from 1) = 247.466729 (df = 1) P < 0.0001

Bias indicators

Begg-Mazumdar: Kendall's tau = 0.111111 P = 0.7275 (low power)

Egger: bias = 0.476675 (95% CI = -0.786168 to 1.739517) P = 0.4094

Harbord-Egger: bias = 0.805788 (92.5% CI = -0.686033 to 2.297609) P = 0.3013

Here we can say with 95% confidence that the true population odds in favour of being a smoker were between 3.9 and 5.7 times greater in patients who had lung cancer compared with controls.
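The fixed effects result for this example can be checked with a short script. The sketch below computes the Mantel-Haenszel pooled odds ratio and a confidence interval from the Robins, Breslow and Greenland variance of the log odds ratio, written in its usual published form; it is an independent illustration rather than StatsDirect's implementation, applies no continuity correction, and omits the exact methods. The function and variable names are illustrative.

```python
import math
from statistics import NormalDist

# One tuple per study: (cases smokers, cases non-smokers,
#                       controls smokers, controls non-smokers)
tables = [
    (83, 3, 72, 14), (90, 3, 227, 43), (129, 7, 81, 19),
    (412, 32, 299, 131), (1350, 7, 1296, 61), (60, 3, 106, 27),
    (459, 18, 534, 81), (499, 19, 462, 56), (451, 39, 1729, 636),
    (260, 5, 259, 28),
]


def mantel_haenszel(tables, level=0.95):
    """Pooled odds ratio with a Robins-Breslow-Greenland confidence interval."""
    sum_r = sum_s = 0.0
    sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        r, s = a * d / n, b * c / n          # numerator and denominator terms
        p, q = (a + d) / n, (b + c) / n      # concordant and discordant fractions
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    pooled = sum_r / sum_s
    # Robins-Breslow-Greenland variance of ln(pooled odds ratio)
    var_log = (sum_pr / (2 * sum_r**2)
               + sum_ps_qr / (2 * sum_r * sum_s)
               + sum_qs / (2 * sum_s**2))
    half_width = NormalDist().inv_cdf(0.5 + level / 2) * math.sqrt(var_log)
    log_pooled = math.log(pooled)
    return pooled, math.exp(log_pooled - half_width), math.exp(log_pooled + half_width)


pooled, lower, upper = mantel_haenszel(tables)
print(f"pooled odds ratio = {pooled:.6f} (95% CI = {lower:.6f} to {upper:.6f})")
```

Run on the ten tables above, this agrees with the fixed effects (Mantel-Haenszel, Robins-Breslow-Greenland) line of the output to the precision expected from rounding.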


Randomization

Menu location: Analysis_Miscellaneous_Randomization.

This section provides random allocations for randomized study designs:

·Series x to y

·Intervention-control pairs

·Two independent groups

·Block randomization to k treatments

·Preference allocation (menu item only shows with workbook open)

A good quality pseudo-random number generator is used to randomize series of numbers for each of the types of allocation. The random number generator is reseeded each time it is used; therefore, there is extremely little risk of using the same (pseudo-)random number series for different randomizations unless you specify the same seed for the random number generator. For technical information on the random number generator used here please see random number generator.

Another section of StatsDirect generates random deviates from different probability distributions (uniform, normal, gamma etc.), see random numbers.
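To illustrate the idea of block randomization and the role of the seed, here is a minimal sketch using Python's standard library generator (the Mersenne Twister), which is not the generator used by StatsDirect. The function name, treatment labels and seed value are all illustrative assumptions.

```python
import random


def block_randomize(treatments, n_blocks, per_block=1, seed=None):
    """Allocate treatments in shuffled blocks.

    Each block contains every treatment exactly `per_block` times, so group
    sizes stay balanced after every completed block.
    """
    rng = random.Random(seed)  # seed=None reseeds from the operating system
    allocation = []
    for _ in range(n_blocks):
        block = list(treatments) * per_block
        rng.shuffle(block)
        allocation.extend(block)
    return allocation


# Four blocks of three treatments with a fixed seed, so the schedule is reproducible
schedule = block_randomize(["A", "B", "C"], n_blocks=4, seed=2024)
print(schedule)
```

Passing the same seed reproduces the same schedule, which is useful for audit; leaving the seed unset gives a fresh series on each call, mirroring the reseeding behaviour described above.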


Crosstabs

Menu location: Analysis_Crosstabs.

This is a two or three way cross tabulation function. If you have two columns of numbers that correspond to different classifications of the same individuals then you can use this function to give a two way frequency table for the cross classification. This can be stratified by a third classification variable.

For two way crosstabs, StatsDirect offers a range of analyses appropriate to the dimensions of the contingency table. For more information see chi-square tests and exact tests.

For three way crosstabs, StatsDirect offers either odds ratio (for case-control studies) or relative risk (for cohort studies) meta-analyses for 2 by 2 by k tables, and generalised Cochran-Mantel-Haenszel tests for r by c by k tables.

Example

A database of test scores contains two fields of interest, sex (M=1, F=0) and grade of skin reaction to an antigen (none = 0, weak + = 1, strong + = 2). Here is a list of those fields for 10 patients:

Sex	Reaction
0	0
1	1
1	2
0	2
1	2
0	1
0	0
0	1
1	2
1	0

In order to get a cross tabulation of these from StatsDirect you should enter these data in two workbook columns. Then choose crosstabs from the analysis menu.

For this example:

		Reaction
		0	1	2
Sex	0	2	2	1
	1	1	1	3
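The same two way table can be built by hand from the two columns, which shows what the cross tabulation is counting. This is a small sketch using only the Python standard library; the function name is illustrative.

```python
from collections import Counter

# The two workbook columns from the example, one entry per patient
sex = [0, 1, 1, 0, 1, 0, 0, 0, 1, 1]
reaction = [0, 1, 2, 2, 2, 1, 0, 1, 2, 0]


def crosstab(rows, cols):
    """Count how often each (row level, column level) pair occurs."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


row_levels, col_levels, table = crosstab(sex, reaction)
for level, row in zip(row_levels, table):
    print(f"sex={level}", row)
```

Each row of `table` corresponds to one sex and each column to one grade of reaction, matching the layout of the table above.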

We could then proceed to an r by c (2 by 3) contingency table analysis to look for association between sex and reaction to this antigen:

Contingency table analysis

Observed	2	2	1	5
% of row	40%	40%	20%
% of col	66.67%	66.67%	25%	50%

Observed	1	1	3	5
% of row	20%	20%	60%
% of col	33.33%	33.33%	75%	50%

Total	3	3	4	10
% of n	30%	30%	40%