Subgroup analyses in randomised controlled trials: quantifying the risks of false-positives and false-negatives

Brookes S T, Whitley E, Peters T J, Mulheran P A, Egger M, Davey Smith G
Record ID 32001000972
Authors' objectives:

  • To quantify the extent to which subgroup analyses may be misleading.
  • To compare the relative merits and weaknesses of the two most common approaches to subgroup analysis: separate (subgroup-specific) analyses of treatment effect and formal statistical tests of interaction.
  • To establish what factors affect the performance of the two approaches.
  • To provide estimates of the increase in sample size required to detect differential subgroup effects.
  • To provide recommendations on the analysis and interpretation of subgroup analyses.

Authors' results and conclusions: While there was some variation for smaller sample sizes, the results for the three types of outcome were very similar for simulations with a total sample size of >= 200. With simulated simplest-case data with no differential subgroup effects, the formal tests of interaction were significant in 5% of cases as expected, while subgroup-specific tests were less reliable and identified effects in 7–66% of cases depending on whether there was an overall treatment effect. The most common type of subgroup effect identified in this way was where the treatment effect was seen to be significant in one subgroup only. When a simulated differential subgroup effect was included, the results were dependent on the nominal power of the simulated data and the type and magnitude of the subgroup effect. However, the performance of the formal interaction test was generally superior to that of the subgroup-specific analyses, with more differential effects correctly identified. In addition, the subgroup-specific analyses often suggested the wrong type of differential effect. The ability of formal interaction tests to (correctly) identify subgroup effects improved as the size of the interaction increased relative to the overall treatment effect. When the size of the interaction was twice the overall effect or greater, the interaction tests had at least the same power as the overall treatment effect. However, power was considerably reduced for smaller interactions, which are much more likely in practice. The inflation factor required to increase the sample size to enable detection of the interaction with the same power as the overall effect varied with the size of the interaction.
For an interaction of the same magnitude as the overall effect, the inflation factor was 4, and this increased dramatically to [...]. Formal interaction tests were generally robust to alterations in the number and size of the treatment groups and subgroups and, for continuous data, the variance in the treatment groups, with the only exception being a change in the variance in one of the subgroups. In contrast, the performance of the subgroup-specific tests was affected by almost all of these factors, with only a change in the number of treatment groups having no impact at all.
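The contrast between the two approaches in the simplest case (no true differential subgroup effect) can be illustrated with a small simulation. The sketch below is not the authors' simulation code. It assumes a two-arm trial with two equal subgroups, normally distributed outcomes with a known standard deviation of 1, and z-tests at the two-sided 5% level. The function name `simulate` and all parameter values are hypothetical choices made for illustration.

```python
import math
import random

Z_CRIT = 1.959964  # two-sided 5% critical value of the standard normal


def simulate(effect, n_per_cell=50, n_trials=20000, seed=1):
    """Simulate 2x2 trials (treatment x subgroup) with NO true interaction.

    Assumption: outcomes are normal with known SD 1, so each cell mean is
    drawn directly from N(mu, 1/n_per_cell) rather than generating patients.
    Returns (interaction_rate, subgroup_specific_rate), where the second rate
    counts trials in which exactly one subgroup shows a significant effect.
    """
    rng = random.Random(seed)
    se_mean = 1 / math.sqrt(n_per_cell)
    se_diff = math.sqrt(2 / n_per_cell)   # SE of a within-subgroup treatment effect
    se_inter = math.sqrt(4 / n_per_cell)  # SE of the difference of two such effects
    inter_hits = 0
    specific_hits = 0
    for _ in range(n_trials):
        effects = []
        for _subgroup in range(2):
            control = rng.gauss(0.0, se_mean)
            treated = rng.gauss(effect, se_mean)  # same true effect in both subgroups
            effects.append(treated - control)
        # Formal test of interaction: do the two subgroup effects differ?
        if abs(effects[0] - effects[1]) / se_inter > Z_CRIT:
            inter_hits += 1
        # Subgroup-specific analysis: is the effect significant in one subgroup only?
        significant = [abs(d) / se_diff > Z_CRIT for d in effects]
        if significant[0] != significant[1]:
            specific_hits += 1
    return inter_hits / n_trials, specific_hits / n_trials


if __name__ == "__main__":
    # effect=0.4 with 50 patients per cell gives roughly 80% power overall.
    for effect in (0.0, 0.4):
        inter, specific = simulate(effect)
        print(f"effect={effect}: interaction test {inter:.3f}, "
              f"one-subgroup-only {specific:.3f}")
```

Under these assumptions the interaction test should reject in about 5% of trials whatever the overall effect. The "significant in one subgroup only" pattern should appear in about 9.5% of trials when there is no overall effect (2 x 0.05 x 0.95) and in close to half of trials when the overall effect has about 80% power, since each subgroup then has only about 50% power on its own.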
Authors' recommendations: While it is generally recognised that subgroup analyses can produce spurious results, the extent of the problem is almost certainly under-estimated. This is particularly true when subgroup-specific analyses are used. In addition, the increase in sample size required to identify differential subgroup effects may be substantial and the commonly used rule of four may not always be sufficient, especially when interactions are relatively subtle, as is often the case.
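The rule of four mentioned above can be recovered from a short variance argument. The derivation below is a sketch under stated assumptions and is not taken from the report itself: two equal treatment arms, two equal subgroups, a continuous outcome with common variance \sigma^2, and a normal approximation. The symbols N (total sample size), \Delta (overall treatment effect) and k (interaction size as a fraction of the overall effect) are introduced here for illustration.

```latex
% Overall treatment effect: difference of two arm means, N/2 patients per arm.
\operatorname{Var}(\hat{\Delta}) = \frac{\sigma^2}{N/2} + \frac{\sigma^2}{N/2} = \frac{4\sigma^2}{N}

% Subgroup-specific effect: N/4 patients per cell.
\operatorname{Var}(\hat{\Delta}_g) = \frac{\sigma^2}{N/4} + \frac{\sigma^2}{N/4} = \frac{8\sigma^2}{N},
\qquad g = 1, 2

% Interaction: difference between the two independent subgroup-specific effects.
\operatorname{Var}(\hat{\Delta}_1 - \hat{\Delta}_2) = \frac{16\sigma^2}{N}
= 4 \operatorname{Var}(\hat{\Delta})

% Equal power for an interaction of size k\Delta at sample size N_{\mathrm{int}}:
\frac{k\Delta}{\sqrt{16\sigma^2 / N_{\mathrm{int}}}} = \frac{\Delta}{\sqrt{4\sigma^2 / N}}
\quad\Longrightarrow\quad
\frac{N_{\mathrm{int}}}{N} = \frac{4}{k^2}
```

With k = 1 this gives the inflation factor of 4, and with k = 2 it gives 1, consistent with the finding that interaction tests had at least the same power as the overall test once the interaction was twice the overall effect. Because the factor grows as 1/k^2, an interaction half the size of the overall effect would already need a 16-fold increase under these assumptions.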
Authors' methods: Data simulation
Project Status: Completed
Year Published: 2001
English language abstract: An English language summary is available
Publication Type: Not Assigned
Country: England, United Kingdom
MeSH Terms
  • False Negative Reactions
  • False Positive Reactions
  • Randomized Controlled Trials as Topic
Organisation Name: NIHR Health Technology Assessment programme
Contact Address: NIHR Journals Library, National Institute for Health and Care Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK
Copyright: 2009 Queen's Printer and Controller of HMSO
This is a bibliographic record of a published health technology assessment from a member of INAHTA or other HTA producer. No evaluation of the quality of this assessment has been made for the HTA database.