Derives a new variable according to a specified function of synthesised data.

Usage

syn.passive(data, func)

Arguments

data

a data frame with synthesised data.

func

a formula specifying transformations on data. It is specified as a string starting with ~.

Details

Any function of the synthesised data can be specified. Note that several operators, such as +, -, * and ^, have different meanings in formula syntax. Wrap the expression in the identity function I() if they should be interpreted as arithmetic operators, e.g. "~I(age^2)".

The function syn() checks whether the passive assignment holds in the original data and fails with a warning if it does not. Variables synthesised passively can be used to predict later variables in the synthesis, except when they are numeric variables with missing data. In that case a warning is produced and the variable is not used as a predictor.
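The two points above (the role of I() and the consistency check against the original data) can be illustrated in base R without running a synthesis. This is a minimal sketch under stated assumptions: `toy` is a made-up data frame and `is_consistent()` is a hypothetical helper, not part of the package, and the check that syn() actually performs may differ in detail.

```r
# Toy data (hypothetical values, for illustration only)
toy <- data.frame(height = c(170, 182, NA), weight = c(65, 90, 70))

# Without I(), ^ is a formula operator: height^2 reduces to the term "height"
attr(terms(~ height^2), "term.labels")      # "height"

# With I(), ^ is arithmetic and the term is the squared variable
attr(terms(~ I(height^2)), "term.labels")   # "I(height^2)"

# One stored column that matches its formula, and one that does not
toy$hsq <- toy$height^2
toy$bmi <- c(22.5, 27.2, NA)   # rounded, so not equal to the formula result

# Hypothetical helper: recompute the formula and compare with the stored column
is_consistent <- function(data, var, func) {
  derived <- model.frame(as.formula(func), data = data,
                         na.action = na.pass)[[1]]
  isTRUE(all.equal(as.numeric(derived), as.numeric(data[[var]])))
}

is_consistent(toy, "hsq", "~I(height^2)")                    # TRUE
is_consistent(toy, "bmi", "~I(weight / height^2 * 10000)")   # FALSE
```

A stored variable that fails a check of this kind is the sort of inconsistency that the examples on this page trigger with the bmi and agegr columns of SD2011.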

Value

A list with two components:

res

a vector of length k containing the result of applying the formula.

fit

the name of the method used for synthesis ("passive").
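The shape of this return value can be mimicked with a small stand-in. The function `passive_sketch()` below is hypothetical and written only for illustration. It is not the package's implementation, and in this sketch the `res` vector simply has one element per row of the made-up input data.

```r
# Hypothetical stand-in that mimics the documented two-component list
passive_sketch <- function(data, func) {
  res <- model.frame(as.formula(func), data = data,
                     na.action = na.pass)[[1]]
  list(res = as.numeric(res), fit = "passive")
}

toy <- data.frame(height = c(170, 182, NA))   # made-up values
out <- passive_sketch(toy, "~I(height^2)")

names(out)   # "res" "fit"
out$res      # squared heights, with NA kept where height is missing
out$fit      # "passive"
```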

References

Van Buuren, S. and Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. doi:10.18637/jss.v045.i03

Author

Gillian Raab, 2021, based on Stef van Buuren, Karin Groothuis-Oudshoorn, 2000

See also

Examples

### the examples show how inconsistencies in the SD2011 data are picked up
### by syn.passive()
ods <- SD2011[, c("height", "weight", "bmi", "age", "agegr")]
ods$hsq <- ods$height^2
ods$sex <- SD2011$sex
meth <- c("cart", "cart", "~I(weight / height^2 * 10000)",
          "cart", "~I(cut(age, c(15, 24, 34, 44, 59, 64, 120)))",
          "~I(height^2)", "logreg")

if (FALSE) {
### fails for bmi
s1 <- syn(ods, method = meth, seed = 6756, models = TRUE)

### fails for agegr
ods$bmi <- ods$weight / ods$height^2 * 10000
s2 <- syn(ods, method = meth, seed = 6756, models = TRUE)

### fails because of wrong order
ods$agegr <- cut(ods$age, c(15, 24, 34, 44, 59, 64, 120))
s3 <- syn(ods, method = meth, visit.sequence = 7:1,
          seed = 6756, models = TRUE)
}

### runs without errors
ods$bmi   <- ods$weight / ods$height^2 * 10000
ods$agegr <- cut(ods$age, c(15, 24, 34, 44, 59, 64, 120))
s4 <- syn(ods, method = meth, seed = 6756, models = TRUE)
#> 
#> Variable bmi with passive synthesis has missing values
#> so it will not be used to predict other variables.
#> 
#> Variable hsq with passive synthesis has missing values
#> so it will not be used to predict other variables.
#> 
#> Method "cart" is not valid for a variable without predictors (height)
#> Method has been changed to "sample"
#> 
#> 
#> Synthesis
#> -----------
#>  height weight bmi age agegr hsq sex
### bmi and hsq do not predict sex because of missing values
s4$models$sex
#> 
#> Call:
#> NULL
#> 
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)    
#>               0.428892   0.042636  10.059  < 2e-16 ***
#> age          -0.024160   0.010212  -2.366 0.017991 *  
#> height.0     -0.269075   0.008603 -31.276  < 2e-16 ***
#> weight.0     -0.044899   0.003669 -12.237  < 2e-16 ***
#> agegr.1       0.495299   0.188263   2.631 0.008516 ** 
#> agegr.2       0.758444   0.251910   3.011 0.002606 ** 
#> agegr.3       0.371103   0.360359   1.030 0.303097    
#> agegr.4       0.305319   0.460557   0.663 0.507372    
#> agegr.5       0.415576   0.568737   0.731 0.464963    
#> height.NA.1 -46.366887   1.674739 -27.686  < 2e-16 ***
#> weight.NA.1  -2.718794   0.763433  -3.561 0.000369 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 6865.8  on 5039  degrees of freedom
#> Residual deviance: 3592.1  on 5029  degrees of freedom
#> AIC: 3596.8
#> 
#> Number of Fisher Scoring iterations: 6
#> 

### hsq with no missing values used to predict sex
ods2 <- ods[!is.na(ods$height),]
s5 <- syn(ods2, method = meth, seed = 6756, models = TRUE)
#> 
#> Variable bmi with passive synthesis has missing values
#> so it will not be used to predict other variables.
#> 
#> Method "cart" is not valid for a variable without predictors (height)
#> Method has been changed to "sample"
#> 
#> 
#> Synthesis
#> -----------
#>  height weight bmi age agegr hsq sex
s5$models$sex
#> 
#> Call:
#> NULL
#> 
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)    
#>              0.1524028  0.0468528   3.253  0.00114 ** 
#> height       1.5071387  0.1231117  12.242  < 2e-16 ***
#> age         -0.0258619  0.0101953  -2.537  0.01119 *  
#> hsq         -0.0052908  0.0003767 -14.044  < 2e-16 ***
#> weight.0    -0.0434550  0.0036847 -11.793  < 2e-16 ***
#> agegr.1      0.4968707  0.1963467   2.531  0.01139 *  
#> agegr.2      0.7103598  0.2565054   2.769  0.00562 ** 
#> agegr.3      0.3307380  0.3631623   0.911  0.36244    
#> agegr.4      0.3029624  0.4621868   0.655  0.51215    
#> agegr.5      0.4038381  0.5707149   0.708  0.47919    
#> weight.NA.1 -2.4363326  0.8099956  -3.008  0.00263 ** 
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 6815.9  on 5004  degrees of freedom
#> Residual deviance: 3463.6  on 4994  degrees of freedom
#> AIC: 3456.6
#> 
#> Number of Fisher Scoring iterations: 6
#> 

### agegr with missing values used to predict sex because not numeric
ods3 <- ods
ods3$age[1:4] <- NA
ods3$agegr <- cut(ods3$age, c(15, 24, 34, 44, 59, 64, 120))
s6 <- syn(ods3, method = meth, seed = 6756, models = TRUE)
#> 
#> Variable bmi with passive synthesis has missing values
#> so it will not be used to predict other variables.
#> 
#> Variable hsq with passive synthesis has missing values
#> so it will not be used to predict other variables.
#> 
#> Method "cart" is not valid for a variable without predictors (height)
#> Method has been changed to "sample"
#> 
#> 
#> Synthesis
#> -----------
#>  height weight bmi age agegr hsq sex
s6$models$sex
#> 
#> Call:
#> NULL
#> 
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)    
#>               0.429533   0.042657  10.069  < 2e-16 ***
#> height.0     -0.269177   0.008604 -31.285  < 2e-16 ***
#> weight.0     -0.044978   0.003671 -12.254  < 2e-16 ***
#> age.0        -0.024540   0.010217  -2.402 0.016309 *  
#> agegr.1       0.499779   0.188350   2.653 0.007967 ** 
#> agegr.2       0.766891   0.252030   3.043 0.002343 ** 
#> agegr.3       0.379872   0.360472   1.054 0.291966    
#> agegr.4       0.322277   0.460761   0.699 0.484274    
#> agegr.5       0.435642   0.568974   0.766 0.443878    
#> agegr.6       2.010719   2.251562   0.893 0.371840    
#> height.NA.1 -46.384462   1.674914 -27.694  < 2e-16 ***
#> weight.NA.1  -2.722389   0.763576  -3.565 0.000363 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 6867.2  on 5043  degrees of freedom
#> Residual deviance: 3591.2  on 5032  degrees of freedom
#> AIC: 3596.5
#> 
#> Number of Fisher Scoring iterations: 6
#>