Feb-2005 [email protected]@uab.es 1
CPMP/EWP/1776/99:
Points to Consider (PtC) on Missing Data
Subject progress through the study

Patient  Screening  Baseline  RND  First dose  Visit 1  Visit 2  Visit 3  Visit 4
1        X          X         A    X           X        X        X        X
2        X          X         A
3        X          X         B    X           X        X
4        X          X         A    X
5        X          X         B    X           X        X        X/A @    X/A @
6        X          X         B    X           X
7        X          X         A    X           X        X        X
8        X
9        X          X         B    X           X        X        X        X
10       X          X         B    X           X/ERR.   X        X        X

(RND: randomization to arm A or B)
Missing data (1)
What are missing data? Empty fields in the CRFs!
They violate the strict ITT principle.
Possible causes include, for example:
– Loss to follow-up
– Treatment failure or success
– Adverse event
– Relocation of the subject
Not all reasons for withdrawal are related to treatment.
Missing data (2)
Missingness may affect:
– a single data point
– several data points within a visit
– an entire visit
– several visits
– an entire variable
– all visits after enrolment
Missing data (3)
Why are they a problem? They are a potential source of bias in the analysis:
– the larger the proportion of affected data, the greater the problem
– the less random the missingness, the greater the bias
– the more closely the missingness is related to treatment, the greater the interference
– they prevent a true ITT analysis
Example: description of analysis populations (1)
Patient disposition:

Population                      Definition                                        n (%)
All-randomized                  Patients with a randomization code                1208 (100%)
Safety                          Receiving any study medication                    1190 (99%)
Intent to treat                 Receiving study medication and a baseline VA      1186 (98%)
Per-protocol                    ...and without a major protocol violation         1144 (95%)
Per-protocol week 54 observed   ...and with a week 54 VA                          1055 (87%)

Flow-chart annotations: "Patients withdrawing before treatment"; "Patients without Baseline VA"; "No Major Protocol Violation"; "E.g., Cataract"; "E.g., Only a Baseline VA".
Example 2: incorrect use of analysis populations (1)
Design: surgery vs medical treatment in bilateral carotid stenosis (Sackett et al., 1985)
Primary variable: number of patients experiencing a TIA, stroke or death
Patient disposition:
Randomized patients: 167 (surgical treatment: 94; medical treatment: 73)
– Patients who did not complete the study because of a stroke during the initial hospitalization phase: surgical treatment: 15 patients; medical treatment: 1 patient
Example 2: incorrect use of analysis populations (2)
First analysis performed:
Per-protocol (PP) population: patients who completed the study
Analysis
– Surgical treatment: 43 / (94 - 15) = 43 / 79 = 54%
– Medical treatment: 53 / (73 - 1) = 53 / 72 = 74%
– Risk reduction: 27%, p = 0.02
Example 2: incorrect use of analysis populations (3)
The definitive analysis is as follows:
Intention-to-treat (ITT) population: all randomized patients (the early strokes now count as events: 58 = 43 + 15; 54 = 53 + 1)
Analysis
– Surgical treatment: 58 / 94 = 62%
– Medical treatment: 54 / 73 = 74%
– Risk reduction: 18%, p = 0.09 (PP: 27%, p = 0.02)
Conclusions: the correct analysis population is the ITT population. Surgical treatment has not been shown to be significantly superior to medical treatment.
Relationship of missing values with:
1) Treatment
2) Outcome
[Figure: three example efficacy data grids for arms A and B (X = value recorded, "." = missing): no missing values; three missing values in each arm; three missing values in arm A and one in arm B. They are accompanied by bar charts of the success rate by arm for observed data (Obs) and missing data (MD).]
Missingness related to treatment: no; related to outcome: no
                               A        B
Success rate (observed data)   5.0%     12.0%
Missingness                    -        -
Success rate (missing data)    -        -

            A                 B
        n       %         n       %
S       5       5.0%      12      12.0%
F       95      95.0%     88      88.0%
Total   100     100%      100     100%
% dif (A - B) -7.0%; OR 0.386; RR 0.417
(S = success; F = failure)

[Figure: efficacy data grid for arms A and B with no missing values, and bar chart of the success rate by arm (Obs vs MD).]
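The three summary measures in this series of tables can be computed directly from the 2 × 2 counts. Below is a minimal Python sketch; the function name is hypothetical. It assumes that % dif is the difference in success rate A − B in percentage points, and that OR and RR are A relative to B. These assumptions reproduce the values shown.

```python
def summary_measures(s_a, n_a, s_b, n_b):
    """Return (% dif, OR, RR) for arm A vs arm B from success counts and arm sizes.

    Counts may be fractional, as in the expected-count tables on these slides.
    """
    f_a, f_b = n_a - s_a, n_b - s_b                  # failures in each arm
    pct_dif = 100 * (s_a / n_a - s_b / n_b)          # difference in success rate, A - B
    odds_ratio = (s_a * f_b) / (f_a * s_b)
    risk_ratio = (s_a / n_a) / (s_b / n_b)
    return round(pct_dif, 1), round(odds_ratio, 3), round(risk_ratio, 3)

print(summary_measures(5, 100, 12, 100))   # (-7.0, 0.386, 0.417)
```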
Missingness related to treatment: no; related to outcome: no
                               A        B
Success rate (observed data)   5.0%     12.0%
Missingness                    30.0%    30.0%
Success rate (missing data)    5.0%     12.0%

Observed data:
            A                 B
        n       %         n       %
S       3.5     5.0%      8.4     12.0%
F       66.5    95.0%     61.6    88.0%
Total   70      100%      70      100%
% dif -7.0%; OR 0.386; RR 0.417

All data (observed + missing):
            A                 B
        n       %         n       %
S       5       5.0%      12      12.0%
F       95      95.0%     88      88.0%
Total   100     100%      100     100%
% dif -7.0%; OR 0.386; RR 0.417

[Figure: efficacy data grid with three missing values (".") in each arm, and bar chart of the success rate by arm (Obs vs MD).]
Missingness related to treatment: yes; related to outcome: no
                               A        B
Success rate (observed data)   5.0%     12.0%
Missingness                    30.0%    10.0%
Success rate (missing data)    5.0%     12.0%

Observed data:
            A                 B
        n       %         n       %
S       3.5     5.0%      10.8    12.0%
F       66.5    95.0%     79.2    88.0%
Total   70      100%      90      100%
% dif -7.0%; OR 0.386; RR 0.417

All data (observed + missing):
            A                 B
        n       %         n       %
S       5       5.0%      12      12.0%
F       95      95.0%     88      88.0%
Total   100     100%      100     100%
% dif -7.0%; OR 0.386; RR 0.417

[Figure: efficacy data grid with three missing values in arm A and one in arm B, and bar chart of the success rate by arm (Obs vs MD).]
Missingness related to treatment: no; related to outcome: yes
                               A        B
Success rate (observed data)   5.0%     12.0%
Missingness                    30.0%    30.0%
Success rate (missing data)    10.0%    17.0%

Observed data:
            A                 B
        n       %         n       %
S       3.5     5.0%      8.4     12.0%
F       66.5    95.0%     61.6    88.0%
Total   70      100%      70      100%
% dif -7.0%; OR 0.386; RR 0.417

All data (observed + missing):
            A                 B
        n       %         n       %
S       6.5     6.5%      13.5    13.5%
F       93.5    93.5%     86.5    86.5%
Total   100     100%      100     100%
% dif -7.0%; OR 0.445; RR 0.481

[Figure: efficacy data grid with three missing values in each arm, and bar chart of the success rate by arm (Obs vs MD, 0 to 18%).]
Missingness related to treatment: no; related to outcome: yes
                               A        B
Success rate (observed data)   5.0%     12.0%
Missingness                    30.0%    30.0%
Success rate (missing data)    50.0%    50.0%

Observed data:
            A                 B
        n       %         n       %
S       3.5     5.0%      8.4     12.0%
F       66.5    95.0%     61.6    88.0%
Total   70      100%      70      100%
% dif -7.0%; OR 0.386; RR 0.417

All data (observed + missing):
            A                 B
        n       %         n       %
S       18.5    18.5%     23.4    23.4%
F       81.5    81.5%     76.6    76.6%
Total   100     100%      100     100%
% dif -4.9%; OR 0.743; RR 0.791

[Figure: efficacy data grid with three missing values in each arm, and bar chart of the success rate by arm (Obs vs MD, 0 to 60%).]
Missingness related to treatment: yes; related to outcome: yes
                               A        B
Success rate (observed data)   5.0%     12.0%
Missingness                    30.0%    10.0%
Success rate (missing data)    50.0%    50.0%

Observed data:
            A                 B
        n       %         n       %
S       3.5     5.0%      10.8    12.0%
F       66.5    95.0%     79.2    88.0%
Total   70      100%      90      100%
% dif -7.0%; OR 0.386; RR 0.417

All data (observed + missing):
            A                 B
        n       %         n       %
S       18.5    18.5%     15.8    15.8%
F       81.5    81.5%     84.2    84.2%
Total   100     100%      100     100%
% dif +2.7%; OR 1.210; RR 1.171

[Figure: efficacy data grid with three missing values in arm A and one in arm B, and bar chart of the success rate by arm (Obs vs MD, 0 to 60%).]
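The tables on the last few slides follow from simple expected-count arithmetic. In each arm, the observed patients keep the observed success rate, and the patients with missing outcomes are given their own, unobservable success rate. The sketch below reproduces the tables under that reading; the function and scenario names are ours.

```python
def measures(s_a, n_a, s_b, n_b):
    """(% dif A - B, OR, RR) from success counts and arm sizes."""
    pct_dif = 100 * (s_a / n_a - s_b / n_b)
    odds_ratio = (s_a * (n_b - s_b)) / ((n_a - s_a) * s_b)
    risk_ratio = (s_a / n_a) / (s_b / n_b)
    return round(pct_dif, 1), round(odds_ratio, 3), round(risk_ratio, 3)

def scenario(p_obs, p_missing, p_success_missing, n=100):
    """Expected summaries for arms (A, B): observed cases only vs all randomized patients.

    p_obs: success rate among patients whose outcome is observed
    p_missing: proportion of patients with a missing outcome
    p_success_missing: success rate the patients with missing outcomes actually have
    n: patients randomized per arm
    """
    observed, all_data = [], []
    for po, pm, psm in zip(p_obs, p_missing, p_success_missing):
        n_obs, n_mis = n * (1 - pm), n * pm
        s_obs = n_obs * po
        observed += [s_obs, n_obs]
        all_data += [s_obs + n_mis * psm, n]
    return measures(*observed), measures(*all_data)

# (observed success rate, missingness, success rate in missing) per arm (A, B)
SCENARIOS = {
    "no missing data":          ((0.05, 0.12), (0.0, 0.0), (0.05, 0.12)),
    "unrelated to trt/outcome": ((0.05, 0.12), (0.3, 0.3), (0.05, 0.12)),
    "related to treatment":     ((0.05, 0.12), (0.3, 0.1), (0.05, 0.12)),
    "related to outcome (a)":   ((0.05, 0.12), (0.3, 0.3), (0.10, 0.17)),
    "related to outcome (b)":   ((0.05, 0.12), (0.3, 0.3), (0.50, 0.50)),
    "related to both":          ((0.05, 0.12), (0.3, 0.1), (0.50, 0.50)),
}
for name, args in SCENARIOS.items():
    obs, full = scenario(*args)
    print(f"{name:26s} observed {obs}  all data {full}")
```

The observed-data summaries are identical in every scenario. The all-data summaries stay put when missingness depends only on treatment, but they move as soon as missingness is related to the outcome, so the observed data alone cannot reveal the bias.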
MCAR
– Missing completely at random
The probability of a value being missing is completely independent of:
– observed values: baseline variables, other measurements of the same variable...
– unobserved (missing) values
Example: change of geographical location
MAR
– Missing at random
The probability of a value being missing depends on:
– observed values: yes
– unobserved (missing) values: no
Example: subjects with a worse baseline score drop out of the study regardless of the outcome
Non-ignorable
The probability of a value being missing depends on:
– unobserved (missing) values
– Example: very poor or excellent responses are associated with a higher drop-out rate
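A small simulation can make the three mechanisms concrete. In this hypothetical setup, the outcome depends on a baseline score that is always observed. Values are deleted in one of three ways:
– completely at random (MCAR)
– depending on the baseline (MAR)
– depending on the outcome itself (the non-ignorable case, labelled "MNAR" in the code)

The sketch compares two estimates with the full-data mean: the complete-case mean, and a mean stratified on the observed baseline. The effect size 0.7, the missingness probabilities and the seed are arbitrary assumptions.

```python
import random
from statistics import fmean

def simulate(mechanism, n=20_000, seed=1):
    """Return (full-data mean, complete-case mean, baseline-stratified mean) of the outcome."""
    rng = random.Random(seed)
    rows = []                                        # (baseline above 0?, outcome, observed?)
    for _ in range(n):
        baseline = rng.gauss(0, 1)
        outcome = 0.7 * baseline + rng.gauss(0, 1)   # outcome correlated with baseline
        if mechanism == "MCAR":
            p_missing = 0.3                          # same probability for everybody
        elif mechanism == "MAR":
            p_missing = 0.6 if baseline > 0 else 0.1  # depends on an observed value only
        elif mechanism == "MNAR":
            p_missing = 0.6 if outcome > 0 else 0.1   # depends on the unobserved value itself
        else:
            raise ValueError(mechanism)
        rows.append((baseline > 0, outcome, rng.random() >= p_missing))

    full = fmean(y for _, y, _ in rows)
    complete_case = fmean(y for _, y, seen in rows if seen)
    # Average the observed outcomes within each baseline stratum, weighted by stratum size.
    stratified = 0.0
    for stratum in (True, False):
        members = [(y, seen) for high, y, seen in rows if high == stratum]
        stratified += len(members) / n * fmean(y for y, seen in members if seen)
    return full, complete_case, stratified

for mechanism in ("MCAR", "MAR", "MNAR"):
    full, cc, strat = simulate(mechanism)
    print(f"{mechanism}: complete-case bias {cc - full:+.3f}, stratified bias {strat - full:+.3f}")
```

Under MCAR, both estimates agree with the full data. Under MAR, the complete-case mean is biased, but conditioning on the observed baseline removes the bias. Under the non-ignorable mechanism, neither estimate recovers the full-data mean.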
Handling of missing values
General strategies
Complete-case analysis
“Weighting methods”
Imputation methods
Analysing data as incomplete
Other methods
Complete-case analysis
Analyse only subjects with complete data
– Restrict the analysis to subjects with no missing data on the variables of interest
– Also called ADO (Available Data Only)
– Assumes incomplete cases are like complete cases
– Gives unbiased estimates if the reduced sample resulting from listwise deletion is a random subsample of the original sample (MCAR)
Complete-case analysis
Disadvantages:
– Ignores possible systematic differences between complete and incomplete cases.
– Loss of power: standard errors will generally be larger in the reduced sample because less information is used.
– Gives biased estimates if the reduced sample is NOT a random subsample of the original sample.
– Against the ITT principle
“Weighting methods” (sometimes considered a form of imputation)
To construct weights for incomplete cases:
– each patient belongs to a subgroup in which all subjects share the same characteristics
– within each subgroup, a proportion of the subjects are destined to complete the study
Heyting et al.; Robins et al.
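One simple way to turn the subgroup idea into weights is inverse probability weighting. Each completer is weighted by the inverse of the completion proportion in its subgroup, so completers stand in for the non-completers of the same subgroup. This is a minimal sketch of that general idea, not necessarily the specific methods of Heyting et al. or Robins et al. The subgroups and values are hypothetical.

```python
from collections import defaultdict

def weighted_mean(records):
    """Mean of the completers, each weighted by 1 / (completion proportion in its subgroup).

    records: (subgroup, outcome) pairs; outcome is None for an incomplete case.
    """
    n_total, n_complete = defaultdict(int), defaultdict(int)
    for group, y in records:
        n_total[group] += 1
        n_complete[group] += int(y is not None)
    num = den = 0.0
    for group, y in records:
        if y is not None:
            w = n_total[group] / n_complete[group]   # inverse of the completion proportion
            num += w * y
            den += w
    return num / den

# Hypothetical data: the "severe" subgroup has higher values and more drop-outs.
data = [("mild", 10), ("mild", 12), ("mild", None), ("mild", 14),
        ("severe", 20), ("severe", None), ("severe", None), ("severe", 24)]
completers = [y for _, y in data if y is not None]
print(sum(completers) / len(completers))   # complete-case mean
print(weighted_mean(data))                 # weighted mean
```

In this example the complete-case mean is 16.0 and the weighted mean is about 17. The severe subgroup loses more patients, so the unweighted completers under-represent it.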
Missing data: handling methods (2)

Patient  Baseline  Visit 1  Visit 2  Visit 3  Visit 4
0010     75        72       -        60       55
0005     76        78       -        -        -
0101     80        -        -        70       66
0201     81        75       75       78       80
0060     78        -        -        -        -

Figure labels: Randomization; Start of treatment
Subjects with missing values in the efficacy variable
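The table above can be encoded directly, with None for each missing visit. Complete-case (listwise) deletion then keeps only the patients without any missing value. In this small example only patient 0201 survives, which shows how much information a complete-case analysis can discard. The dict layout keyed by patient number is just one convenient choice.

```python
# Efficacy values from the table above (baseline, visits 1-4); None marks a missing value.
DATA = {
    "0010": [75, 72, None, 60, 55],
    "0005": [76, 78, None, None, None],
    "0101": [80, None, None, 70, 66],
    "0201": [81, 75, 75, 78, 80],
    "0060": [78, None, None, None, None],
}

def complete_cases(data):
    """Listwise deletion: keep only patients with a value at every visit."""
    return {pid: values for pid, values in data.items() if None not in values}

print(list(complete_cases(DATA)))   # ['0201']
```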
Missing data: handling methods (3)
The LOCF (Last Observation Carried Forward) method is applied:

Patient  Baseline  Visit 1  Visit 2  Visit 3  Visit 4
0010     75        72       72       60       55
0005     76        78       78       78       78
0101     80        -        -        70       66
0201     81        75       75       78       80
0060     78        Excluded from the ITT and PP populations

Figure labels: Randomization; Start of treatment
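The LOCF table above can be reproduced in a few lines of Python. As on the slide, only post-baseline values are carried forward. Gaps before the first post-baseline value stay missing (patient 0101), and a patient with no post-baseline value is excluded (0060). The data layout and function name are assumptions for illustration.

```python
# Efficacy values (baseline, visits 1-4) from the previous table; None marks a missing value.
DATA = {
    "0010": [75, 72, None, 60, 55],
    "0005": [76, 78, None, None, None],
    "0101": [80, None, None, 70, 66],
    "0201": [81, 75, 75, 78, 80],
    "0060": [78, None, None, None, None],
}

def locf(data):
    """LOCF as in the table above: carry the last post-baseline value forward.

    Gaps before the first post-baseline value stay missing, and patients with no
    post-baseline value are dropped (excluded from the ITT and PP populations).
    """
    result = {}
    for pid, (baseline, *post) in data.items():
        if all(v is None for v in post):
            continue
        filled, last = [], None
        for v in post:
            if v is not None:
                last = v
            filled.append(last)
        result[pid] = [baseline] + filled
    return result

for pid, values in locf(DATA).items():
    print(pid, values)
```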
Missing data: handling methods (4)
The BOCF (Baseline Observation Carried Forward) method is applied:

Patient  Baseline  Visit 1  Visit 2  Visit 3  Visit 4
0010     75        72       72       60       55
0005     76        78       78       78       78
0101     80        80       80       70       66
0201     81        75       75       78       80
0060     78        78       78       78       78

Figure labels: Randomization; Start of treatment
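In the table above, each gap is filled with the most recent available value. That value is the baseline only when no post-baseline value exists, as for patients 0101 and 0060. The sketch reproduces the table under that reading. A stricter BOCF that always imputes the baseline after a missing visit would use `values[0]` in place of the last available value. The function name is hypothetical.

```python
# Efficacy values (baseline, visits 1-4) from the original table; None marks a missing value.
DATA = {
    "0010": [75, 72, None, 60, 55],
    "0005": [76, 78, None, None, None],
    "0101": [80, None, None, 70, 66],
    "0201": [81, 75, 75, 78, 80],
    "0060": [78, None, None, None, None],
}

def carry_forward(values):
    """Replace each None with the most recent available value, baseline included."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

result = {pid: carry_forward(values) for pid, values in DATA.items()}
for pid, values in result.items():
    print(pid, values)
```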
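Example: LOCF and linear extrapolation
[Figure: ADAS-Cog score (0 to 36; higher = worse, lower = better) against time (0 to 18 months). It compares the LOCF and linear regression (extrapolation) trajectories after drop-out; the gap between them is labelled as bias.]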
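With a worsening course, LOCF freezes a patient's score at the drop-out value, whereas linear extrapolation continues the observed trend. The sketch uses hypothetical ADAS-Cog values (higher = worse) for a patient who drops out after month 6. Both the numbers and the helper name are invented for illustration.

```python
def fit_line(times, scores):
    """Ordinary least-squares intercept and slope."""
    n = len(times)
    mt, ms = sum(times) / n, sum(scores) / n
    sxy = sum((t - mt) * (s - ms) for t, s in zip(times, scores))
    sxx = sum((t - mt) ** 2 for t in times)
    slope = sxy / sxx
    return ms - slope * mt, slope

# Hypothetical patient: ADAS-Cog worsens by 2 points every 3 months; drop-out after month 6.
times, scores = [0, 3, 6], [20, 22, 24]
intercept, slope = fit_line(times, scores)
locf_18 = scores[-1]                      # LOCF: month-18 value = last observed value
extrap_18 = intercept + slope * 18        # linear extrapolation of the observed trend
print(locf_18, round(extrap_18, 1))       # 24 32.0
```

The 8-point gap at month 18 is the kind of bias shown in the figure: LOCF understates the deterioration if the decline would in fact have continued. Whether extrapolation is closer to the truth depends on whether the course really is linear.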
Example: early drop-out due to AE
[Figure: ADAS-Cog score (0 to 36; higher = worse, lower = better) against time (0 to 18 months) for placebo and active treatment. Bias: favours active.]
Example: early drop-out due to lack of efficacy
[Figure: ADAS-Cog score (0 to 36; higher = worse, lower = better) against time (0 to 18 months) for placebo and active treatment. Bias: favours placebo.]
Drop-outs and missing data
[Figure: after randomization (RND) to arms A and B, patients are followed from baseline through visits 1 and 2 to the last visit; the frequency of drop-outs differs between the arms (≠ frequencies).]
Drop-outs and missing data
[Figure: after randomization (RND) to arms A and B, patients are followed from baseline through visits 1 and 2 to the last visit; the timing of drop-outs differs between the arms (≠ timing).]
Imputation methods
LOCF and variants
– Bias: depends on the amount and timing of drop-outs, e.g., when the condition under study has a worsening course:
  – Conservative: drop-outs because of lack of efficacy in the control group
  – Anticonservative: drop-outs because of intolerance in the test group
– Others: interpolation, extrapolation
Example: the ADAS-Cog result is missing at some time points
Imputation by regression
[Figure: ADAS-Cog score (0 to 36) against time (0 to 18 months).]
Imputation methods
Worst-case analysis:
– Impute:
  the worst response for the test treatment
  the best response for the control
– Ultraconservative; increases the variability.
– Robustness of results:
  second approach: “sensitivity analysis”
  lower bound of efficacy
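For a binary outcome the worst-case rule is easy to apply: every missing outcome becomes a failure in the test arm and a success in the control arm. The counts below are hypothetical, as is the helper name. They show how a clear complete-case advantage of about 22 percentage points can vanish under the worst-case assumption, which is why this analysis is read as a lower bound of efficacy.

```python
def rate_after_imputation(successes, observed, missing, missing_as_success):
    """Success rate over all randomized patients after imputing every missing outcome."""
    imputed_successes = missing if missing_as_success else 0
    return (successes + imputed_successes) / (observed + missing)

# Hypothetical counts per arm: (successes, patients observed, patients missing)
test, control = (60, 80, 20), (45, 85, 15)

complete_case_diff = test[0] / test[1] - control[0] / control[1]
# Worst case: missing = failure in the test arm, success in the control arm.
worst_case_diff = (rate_after_imputation(*test, missing_as_success=False)
                   - rate_after_imputation(*control, missing_as_success=True))
print(round(complete_case_diff, 3), round(worst_case_diff, 3))   # 0.221 0.0
```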
Group Means
Continuous variable:
– group mean derived from a grouping variable
Categorical variable (nominal or ordinal):
– mode
– if there is no unique mode:
  – nominal: a value is selected at random
  – ordinal: the ‘middle’ category, or a value chosen at random from the middle two (even case)
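The rules on this slide translate into two small functions. This is a minimal sketch with several assumptions of ours: the function names, the use of None for missing values, and the `order` argument that marks a variable as ordinal. The ‘middle’ category is read as the middle one among the tied modal categories.

```python
import random
from collections import Counter
from statistics import fmean

def group_mean_impute(values, groups):
    """Continuous variable: replace None with the mean of the observed values of its group.

    Raises statistics.StatisticsError if a group has no observed value at all.
    """
    means = {g: fmean(v for v, gg in zip(values, groups) if gg == g and v is not None)
             for g in set(groups)}
    return [means[g] if v is None else v for v, g in zip(values, groups)]

def mode_impute(values, order=None, rng=None):
    """Categorical variable: replace None with the mode.

    order: list of categories from lowest to highest; if given, the variable is ordinal.
    Ties: nominal -> random choice among the tied modes; ordinal -> the middle tied
    category, or a random one of the middle two when the number of ties is even.
    """
    rng = rng or random.Random(0)
    counts = Counter(v for v in values if v is not None)
    top = max(counts.values())
    tied = [c for c, k in counts.items() if k == top]
    if len(tied) == 1:
        fill = tied[0]
    elif order is None:
        fill = rng.choice(tied)
    else:
        tied.sort(key=order.index)
        mid = len(tied) // 2
        fill = tied[mid] if len(tied) % 2 else rng.choice(tied[mid - 1:mid + 1])
    return [fill if v is None else v for v in values]

print(group_mean_impute([1.0, None, 3.0, 10.0, None], ["a", "a", "a", "b", "b"]))
print(mode_impute(["mild", "severe", "moderate", None], order=["mild", "moderate", "severe"]))
```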
Predicted Mean
Continuous or ordinal variables: a least-squares multiple regression algorithm imputes the most likely value.
Binary or categorical variable: a discriminant method is applied to impute the most likely value.
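For a continuous variable, predicted-mean imputation means fitting a least-squares regression on the complete cases and replacing each missing value with its fitted value. The sketch uses NumPy (`numpy.linalg.lstsq`) and hypothetical data generated exactly from y = 2 + 3·x1 − x2, so the imputed value should be 3. The function name is ours, and the discriminant step for binary or categorical variables is not shown.

```python
import numpy as np

def regression_impute(X, y):
    """Replace missing y (NaN) with the least-squares prediction from the covariates X."""
    design = np.column_stack([np.ones(len(X)), X])    # add an intercept column
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)
    beta, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    y_imputed = y.copy()
    y_imputed[~observed] = design[~observed] @ beta
    return y_imputed

X = np.array([[0, 0], [1, 0], [0, 1], [2, 1], [1, 2]], dtype=float)
y = [2, 5, 1, 7, np.nan]                 # hypothetical: y = 2 + 3*x1 - x2, last value missing
print(regression_impute(X, y))
```

Because every imputed value sits exactly on the regression line, this method shares the variance problem of other single imputations.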
Imputation class methods
Values are imputed from respondents who are similar with respect to a set of auxiliary variables.
– Clinical experience
– Statistical methods: hot-decking
  Respondents and non-respondents are sorted into a number of imputation subsets according to a user-specified set of covariates.
  An imputation subset comprises cases with the same values of the user-specified covariates.
  Missing values are then replaced with values taken from matching respondents.
– Options: the first respondent's value (similar in time); a randomly selected respondent's value
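Here is a hot-deck sketch. Imputation cells are defined by the covariate values, and each missing value is taken from a respondent in the same cell. The donor is either the first respondent (assuming the records are already sorted, e.g. by time) or one drawn at random. The function, field names and data are hypothetical.

```python
import random

def hot_deck(records, keys, target, first=False, rng=None):
    """Fill missing `target` values from respondents in the same imputation cell.

    A cell is the combination of values of the covariates in `keys`. With first=True the
    first respondent of the cell is used (records assumed already sorted, e.g. by time);
    otherwise a donor is drawn at random. Cells without any respondent stay missing.
    """
    rng = rng or random.Random(0)
    donors = {}
    for r in records:
        if r[target] is not None:
            donors.setdefault(tuple(r[k] for k in keys), []).append(r[target])
    completed = []
    for r in records:
        r = dict(r)                                  # do not modify the input records
        pool = donors.get(tuple(r[k] for k in keys))
        if r[target] is None and pool:
            r[target] = pool[0] if first else rng.choice(pool)
        completed.append(r)
    return completed

RECORDS = [
    {"sex": "F", "age": "<65", "score": 10},
    {"sex": "F", "age": "<65", "score": 12},
    {"sex": "M", "age": "<65", "score": 20},
    {"sex": "F", "age": "<65", "score": None},
    {"sex": "M", "age": ">=65", "score": None},
]
for r in hot_deck(RECORDS, ("sex", "age"), "score"):
    print(r)
```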
Some problems in single imputation
Mean estimation
– Replace missing data with the mean of the non-missing values.
– Standard deviations and standard errors are underestimated (there is no variation in the imputed values).
Hot-deck imputation
– Stratify and sort by key covariates; replace missing data with values from another record in the same stratum.
– Underestimation of standard errors can be a problem.
Predicting missing values from regression
– Impute each independent variable on the basis of the other independent variables in the model.
– Produces biased estimates.
Disadvantage:
– In general, single imputation overestimates the effective sample size and underestimates the variance and standard errors.
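The first problem listed above, underestimated SD and SE after mean imputation, can be seen with a handful of hypothetical values. Imputing two missing values with the observed mean leaves the mean at 7. However, it lowers the SD from about 2.58 to 2.00, and the naive SE drops from about 1.29 to 0.82 because n is counted as 6 instead of 4.

```python
from statistics import mean, stdev

observed = [4, 6, 8, 10]                       # hypothetical non-missing values
imputed = observed + [mean(observed)] * 2      # two missing values replaced by the mean (7)

for label, values in (("observed only", observed), ("mean-imputed", imputed)):
    sd = stdev(values)
    se = sd / len(values) ** 0.5
    print(f"{label}: n={len(values)} mean={mean(values)} SD={sd:.2f} SE={se:.2f}")
```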
Multiple Imputation
Requires the Missing At Random (MAR) or Missing Completely At Random (MCAR) assumption.
Combines results from repeated single imputations.
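The results of the m completed-data analyses are usually combined with Rubin's rules. The pooled estimate is the average of the m estimates. The total variance is the average within-imputation variance plus the between-imputation variance inflated by (1 + 1/m). A minimal sketch; the function name and input values are hypothetical.

```python
from statistics import mean, variance

def rubin_pool(estimates, variances):
    """Combine m completed-data estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)           # pooled estimate
    w = mean(variances)               # average within-imputation variance
    b = variance(estimates)           # between-imputation variance (denominator m - 1)
    total = w + (1 + 1 / m) * b       # total variance of q_bar
    return q_bar, total

# Hypothetical results from m = 3 imputed datasets (estimate, squared standard error)
q, t = rubin_pool([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(f"estimate {q:.2f}, SE {t ** 0.5:.3f}")
```

The between-imputation term is what single imputation leaves out. Here it more than doubles the variance compared with the within-imputation part alone.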
Multiple Imputation
Replaces each missing value in the dataset with several imputed values instead of just one (Rubin, 1970s).
Steps:
– use the complete data to estimate the imputation model
– combine the estimators (e.g., regression coefficients) to compute predicted values
– randomly simulate a set of residuals and add them to the regression predictions to impute m values
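The three steps above can be sketched for a single covariate:
– fit a least-squares line on the complete cases
– compute predicted values for the missing ones
– add a normally distributed residual with the estimated residual SD, repeating m times

This is a simplified sketch under a normal-error assumption, and the function names and data are hypothetical. A proper multiple imputation would also draw the regression parameters from their distribution rather than reusing a single fit.

```python
import random
from statistics import fmean

def fit_line(xs, ys):
    """Least-squares intercept, slope and residual SD from the complete cases."""
    mx, my = fmean(xs), fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (rss / (len(xs) - 2)) ** 0.5

def impute_m(x, y, m=5, seed=0):
    """Return m completed copies of y (None = missing): prediction + simulated residual."""
    rng = random.Random(seed)
    complete = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    intercept, slope, sd = fit_line(*zip(*complete))
    return [[yi if yi is not None else intercept + slope * xi + rng.gauss(0, sd)
             for xi, yi in zip(x, y)]
            for _ in range(m)]

x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, None, 9.8, None]     # hypothetical data with two missing values
for completed in impute_m(x, y):
    print([round(v, 2) for v in completed])
```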
MI: assumptions (2)
The data model:
– a probability model for the observed data (multivariate normal, loglinear...)
– prediction of the missing data
The distribution:
– specification of the distribution for the parameters of the imputation models
– use of likelihood/Bayesian techniques for analysis
– noninformative prior distribution
The mechanism of nonresponse
Multiple Imputation
S-PLUS
SOLAS
Gary King: Amelia
Joe Schafer: web software; The multiple imputation FAQ page
Analysing data as incomplete
Time-to-event variables
Mixed models (random and fixed effects)
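Time-to-event methods handle incomplete follow-up by treating a patient who drops out as censored at the last contact, instead of imputing an outcome. Below is a minimal Kaplan-Meier sketch with hypothetical follow-up times. As with any censoring-based method, it assumes that drop-out is unrelated to the event that would have occurred.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) at each event time.

    times: follow-up time per patient; events: 1 = event observed, 0 = censored
    (e.g. drop-out). Patients censored at an event time still count as at risk.
    """
    survival, s = {}, 1.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if u == t and e)
        s *= 1 - n_events / at_risk
        survival[t] = s
    return survival

# Hypothetical follow-up in months; the patients censored at 3 and 8 dropped out.
print(kaplan_meier([2, 3, 3, 5, 8], [1, 0, 1, 1, 0]))
```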
Other
Gould 1980
– Converts the variable into an ordinal score.
– Imputes according to a predefined value (e.g., a percentile) and the time and cause of drop-out (lack of efficacy, cure, adverse effects...).
Miscellanea: missing data indicators, pairwise deletion...
Missing Data in Clinical Trials – A Regulatory View
ICH E3, E6, E9
Key points:
– Potential source of bias
– Common in clinical trials
– Avoiding MD
– Importance of the methods of dealing with MD
– Pre-specification, re-definition
– Lack of a universally accepted method for handling MD
– Sensitivity analysis
– Identification and description of missingness
Points to Consider on Biostatistical/Methodological Issues Arising from Recent CPMP Discussion on Licensing Applications:
PtC on Missing Data
Structure
1. Introduction
2. The effect of MD on data analysis
3. Handling of MD
4. General recommendations
Introduction
Potential source of bias
Many possible sources and different degrees of incompleteness
MD violate the ITT principle:
– full set analysis requires imputation
The strategy employed might in itself provide a source of bias
The effect of missing values on data analysis and interpretation
Effect on data analysis (1)
Power:
– fewer cases available for analysis, hence reduced power
Variability:
– non-completers are more likely to have extreme values; their loss leads to an underestimate of variability
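The loss of power can be quantified with the usual normal approximation for a two-sample comparison. In the hypothetical design below, 64 patients per arm give about 81% power for a standardized effect of 0.5. If 30% of outcomes are missing (45 per arm analysed), power falls to about 66%. The function name and design values are assumptions for illustration.

```python
from statistics import NormalDist

def power_two_sample(effect_size, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for a standardized effect size.

    Uses the usual normal approximation and ignores the (negligible) opposite tail.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * (n_per_arm / 2) ** 0.5 - z_alpha)

planned = power_two_sample(0.5, 64)                     # hypothetical design: 64 per arm
after_losses = power_two_sample(0.5, round(64 * 0.7))   # 30% missing: 45 per arm
print(f"{planned:.2f} -> {after_losses:.2f}")           # 0.81 -> 0.66
```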
Effect on data analysis (2)
Bias in:
– the estimation of the treatment effect
– the comparability of treatment groups
– the representativeness of the sample
The reduction in statistical power is mainly related to the number of missing values.
The risk of bias depends on the relationship between:
– missingness
– treatment
– outcome
Effect on data analysis (3)
Not expected to lead to bias:
– if MD are only related to the treatment (an observation is more likely to be missing in one treatment arm than in another)
– but not to the outcome, i.e., the real value of the unobserved measurement (poor outcomes are no more likely to be missing than good outcomes).
Effect on data analysis (4)
Bias:
– if MD (unmeasured observations) are related to the real value of the outcome (e.g., the unobserved measurements have a higher proportion of poor outcomes),
– this will lead to bias even if the missing values are not related to treatment (i.e., missing values are equally likely in all treatment arms).
Effect on data analysis (5)
Bias:
– if MD are related to both the treatment and the unobserved outcome variable (e.g., missing values are more likely in one treatment arm because it is not as effective).
Effect on data analysis (6)
Pragmatic approach:
– In most cases it is difficult or impossible to establish whether the relationship between missing values and the unobserved outcome variable is completely absent.
– It is therefore sensible to adopt a conservative approach, considering missing values as a potential source of bias.
Handling of MD (1)
Avoidance of missingness:
– In the design and conduct of a clinical trial, all efforts should be directed towards minimising the amount of missing data likely to occur.
– Despite these efforts, some missing values will generally be expected.
The way these missing observations are handled may substantially affect the conclusions of the study.
Handling of MD (2)
Complete case analysis:
– Bias, power and variability
– Not generally appropriate. Exceptions:
  – exploratory studies, especially in the initial phases of drug development
  – secondary supportive analysis in confirmatory trials (robustness)
It violates the ITT principle and cannot be recommended as the primary analysis in a confirmatory trial.
Handling of MD (3)
Imputation of missing data:
– Scope of imputation: not restricted to the main outcomes (secondary efficacy, safety, baseline covariates...)
– Methods for imputation: many techniques; no gold standard for every situation
Handling of MD (4)
Methods for imputation (cont):
– The Points to Consider do not describe the individual methods
– All methods may be valid:
  Simple methods to more complex:
  – From LOCF to multiple imputation methods
  But their appropriateness has to be justified
  – e.g., LOCF is acceptable if measurements are expected to be relatively constant over time. In Alzheimer's disease, where the patient's condition is expected to deteriorate over time, the LOCF method is less acceptable.
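A minimal sketch of last observation carried forward (LOCF) on one subject's visit series, assuming None marks a missing visit. The function name `locf` is hypothetical; it also returns a per-visit flag so imputed values can later be listed and identified.

```python
from typing import List, Optional, Tuple

def locf(values: List[Optional[float]]) -> Tuple[List[Optional[float]], List[bool]]:
    """Last observation carried forward.

    Returns the filled series plus a flag per visit marking imputed values.
    Visits missing before the first observation stay None (nothing to carry).
    """
    filled: List[Optional[float]] = []
    imputed: List[bool] = []
    last: Optional[float] = None
    for v in values:
        if v is None and last is not None:
            filled.append(last)       # carry the last observed value forward
            imputed.append(True)
        else:
            filled.append(v)          # observed value, or leading missing value
            imputed.append(False)
            if v is not None:
                last = v
    return filled, imputed

# Hypothetical score for a subject who drops out after visit 2
print(locf([24, 22, None, None]))
```

Here 22 is carried to visits 3 and 4. If the condition is expected to keep deteriorating after dropout, as in the Alzheimer's example, the carried-forward value is likely more favourable than the unobserved ones, which is why LOCF needs justification there.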
Handling of MD (5)
Statistical approaches less sensitive to MD:
– Mixed models
– Survival models
They assume no relationship between treatment and the missing outcome, and in general this cannot be assumed.
General recommendations (1)
Avoidance of missing data
– Try to reduce the number of MD
  Anticipate sources and try to avoid them in the design
  Strategies to obtain measurements
If a large amount of MD is expected:
– Relevance of blinding (assignment and evaluation)
  Anticipation of the "acceptable amount of MD"
– Sample size
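One common planning heuristic (not prescribed by the slides) for the sample-size point is to inflate the number of evaluable subjects by the anticipated dropout proportion d, i.e. n / (1 − d), rounded up. The sketch below assumes the dropout rate is given as an integer percentage; the function name `inflate_for_dropout` is hypothetical.

```python
def inflate_for_dropout(n_evaluable: int, dropout_pct: int) -> int:
    """Subjects to randomise so that about n_evaluable remain after dropout.

    Computes ceil(n_evaluable * 100 / (100 - dropout_pct)) with integer
    arithmetic to avoid floating-point rounding surprises.
    """
    if not 0 <= dropout_pct < 100:
        raise ValueError("dropout_pct must be in [0, 100)")
    return -(-n_evaluable * 100 // (100 - dropout_pct))

print(inflate_for_dropout(100, 20))  # 100 evaluable subjects, 20% expected dropout
```

This only protects the number of evaluable subjects (power). It does nothing about bias if dropout is related to treatment or outcome, so it complements rather than replaces efforts to avoid MD.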
General recommendations (2)
Avoidance of missing data (cont)
– "Acceptable amount" of MD: no general rule; it depends on
  – Nature of the variable: mortality vs sophisticated methods of diagnosis
  – Length of the clinical trial
  – Condition under study: e.g., psychiatric disorders, with low adherence of patients to the study protocol
General recommendations (3)
Avoidance of missing data (cont)
– Continue data collection after patient withdrawal: ITT based on real data
– Alternatives: analysis on incomplete data or analysis on imputed data
General recommendations (4)
Design of the study. Relevance of predefinition
– Pre-specify in the protocol:
  Description and justification of the method
  Anticipation of the expected amount of MD
– Deviations documented and justified
The method should be conservative:
– Avoid minimisation of differences in non-inferiority trials and overestimation of the effect in superiority trials
General recommendations (5)
Design of the study. Relevance of predefinition (cont)
– Update, since some problems cannot be predicted:
  Statistical Analysis Plan
  During the Blind Review
– Deviations and amendments documented (traceability)
– Identification of the blinding
General recommendations (6)
Analysis of missing data
– Pattern of MD: time and proportion
  Investigate whether there is any indication of differences between the treatment groups.
– Elucidate whether patients with and without missing values have different characteristics at baseline.
  This might help to establish:
  – whether the missing values have led to baseline imbalance, and
  – whether the process generating missing values has differentially influenced the treatment groups.
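A minimal descriptive sketch of the checks listed above, assuming hypothetical records of (arm, baseline score, number of visits completed out of 4). It tabulates the proportion of subjects with missing visits and the last visit completed per arm (time and proportion), then compares mean baseline scores by arm and completeness. The names `missing_pattern` and `baseline_by_group` are illustrative, and no formal test is implied.

```python
from collections import Counter
from statistics import mean

# Hypothetical records: (arm, baseline score, visits completed out of N_VISITS)
RECORDS = [
    ("A", 30, 4), ("A", 28, 1), ("A", 35, 2), ("A", 31, 4),
    ("B", 29, 4), ("B", 33, 3), ("B", 27, 4), ("B", 36, 2),
]
N_VISITS = 4

def missing_pattern(records, n_visits):
    """Per arm: proportion with missing visits and when they were last seen."""
    out = {}
    for arm in sorted({r[0] for r in records}):
        rows = [r for r in records if r[0] == arm]
        incomplete = [r for r in rows if r[2] < n_visits]
        out[arm] = {
            "prop_missing": len(incomplete) / len(rows),
            "last_visit": Counter(r[2] for r in incomplete),
        }
    return out

def baseline_by_group(records, n_visits):
    """Mean baseline score by (arm, complete/missing)."""
    groups = {}
    for arm, baseline, done in records:
        key = (arm, "complete" if done == n_visits else "missing")
        groups.setdefault(key, []).append(baseline)
    return {k: mean(v) for k, v in sorted(groups.items())}

print(missing_pattern(RECORDS, N_VISITS))
print(baseline_by_group(RECORDS, N_VISITS))
```

In these invented data both arms lose half their subjects, but subjects with missing values in arm B have a clearly higher mean baseline than completers in arm B; a difference of this kind is what the recommendation asks the analyst to look for.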
General recommendations (7)
Sensitivity analysis: a set of analyses showing the influence of different methods of handling missing data on the study results
– Some examples:
  Imputation of best plausible vs worst plausible values
  Best possible in control and worst possible in experimental, and inversely
  Full analysis set vs complete case analysis
– Pre-defined and designed to assess the repercussion on the results of the particular assumptions made in imputation
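A minimal sketch of the extreme-case sensitivity analysis for a binary responder endpoint, assuming hypothetical counts of responders, non-responders and missing outcomes per arm. It compares the difference in response rates (experimental minus control) under a complete case analysis, a full-set analysis counting missing as failures, and the two best/worst-possible combinations. The counts and the names `response_rate` and `sensitivity` are illustrative.

```python
from fractions import Fraction

# Hypothetical counts per arm: (responders, non-responders, missing)
ARMS = {"experimental": (40, 45, 15), "control": (35, 55, 10)}

def response_rate(resp, nonresp, missing, missing_as_responder):
    """Response rate over all randomised subjects, imputing every missing outcome one way."""
    n = resp + nonresp + missing
    return Fraction(resp + (missing if missing_as_responder else 0), n)

def sensitivity(arms):
    """Difference in response rates (experimental - control) per scenario."""
    exp, ctl = arms["experimental"], arms["control"]
    scenarios = {
        "complete_case": (Fraction(exp[0], exp[0] + exp[1]),
                          Fraction(ctl[0], ctl[0] + ctl[1])),
        "all_missing_failures": (response_rate(*exp, False), response_rate(*ctl, False)),
        "best_exp_worst_ctl": (response_rate(*exp, True), response_rate(*ctl, False)),
        "worst_exp_best_ctl": (response_rate(*exp, False), response_rate(*ctl, True)),
    }
    return {name: float(e - c) for name, (e, c) in scenarios.items()}

for name, diff in sensitivity(ARMS).items():
    print(f"{name}: {diff:+.3f}")
```

With these invented counts the difference ranges from -0.05 to +0.20, and even changes sign, so conclusions would depend on the imputation assumptions. Best/worst plausible values would normally give a narrower range than these best/worst possible bounds.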
General recommendations (8)
Final Report
– Detailed description of the planned methods and of any amendments to the predefined methods
– Discussion of the MD: number, time and pattern; possible implications for efficacy and safety
– Imputed values must be listed and identified
– A sensitivity analysis may support the robustness of the conclusions