Passive synthesis
syn.passive.Rd
Derives a new variable according to a specified function of synthesised data.
Arguments
- data
a data frame with synthesised data.
- func
a formula specifying transformations on data. It is specified as a string starting with ~.
Details
Any function of the synthesised data can be specified. Note that several operators such as +, -, * and ^ have different meanings in formula syntax. Use the identity function I() if they should be interpreted as arithmetic operators, e.g. "~I(age^2)".
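The difference between the two readings of ^ can be seen with base R alone. The sketch below assumes the formula string is evaluated with model.frame(), as is done for passive imputation in mice; the toy data frame d is made up for illustration.

```r
# Toy data; only the column name 'age' comes from the example in the text
d <- data.frame(age = c(20, 35, 50))

# Without I(), ^ is a formula operator: age^2 reduces to the single term 'age'
plain <- model.frame(~age^2, d, na.action = na.pass)

# With I(), ^ is ordinary arithmetic and the values are squared
squared <- model.frame(~I(age^2), d, na.action = na.pass)

names(plain)               # "age"
as.numeric(squared[[1]])   # 400 1225 2500
```

Without I() the derived column would simply be a copy of age, which is rarely what is intended.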
The function syn() checks whether the passive assignment holds in the original data and fails with a warning if it does not. Passively synthesised variables can be used to predict later variables in the synthesis, except when they are numeric variables with missing data. In that case a warning is produced.
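The consistency check described above can be approximated before calling syn(). The helper check_passive() below is hypothetical and is not the check that syn() performs internally; it only illustrates the idea of recomputing the formula on the original data and comparing the result with the stored column. The toy data are made up.

```r
# Hypothetical helper: TRUE if column `var` of `data` equals the result of
# evaluating the passive formula `func` on the same data
check_passive <- function(data, var, func) {
  derived <- model.frame(as.formula(func), data, na.action = na.pass)[[1]]
  isTRUE(all.equal(as.numeric(derived), as.numeric(data[[var]])))
}

toy <- data.frame(height = c(160, 175, 182), weight = c(55, 70, 90))
toy$bmi_ok  <- toy$weight / toy$height^2 * 10000  # consistent with the formula
toy$bmi_bad <- round(toy$bmi_ok)                  # rounded, so inconsistent

f <- "~I(weight / height^2 * 10000)"
check_passive(toy, "bmi_ok", f)
check_passive(toy, "bmi_bad", f)
```

A column that fails such a check can be recomputed from its source variables before synthesis, as the examples for bmi and agegr do.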
Value
A list with two components:
- res
a vector of length k including the result of applying the formula.
- fit
the name of the method used for synthesis ("passive").
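The shape of the returned list can be illustrated with a stand-in written in base R. passive_sketch() is a hypothetical name and not the package source; it assumes the formula is evaluated with model.frame() and that missing values are passed through rather than dropped.

```r
# Hypothetical stand-in that mimics the documented return value
passive_sketch <- function(data, func) {
  res <- model.frame(as.formula(func), data, na.action = na.pass)[[1]]
  list(res = as.numeric(res), fit = "passive")
}

# Made-up "synthesised" data with one missing height
syn_data <- data.frame(height = c(160, 175, NA), weight = c(55, 70, 90))
out <- passive_sketch(syn_data, "~I(weight / height^2 * 10000)")

names(out)       # "res" "fit"
length(out$res)  # 3, one value per row of syn_data
anyNA(out$res)   # TRUE, the missing height propagates to the derived value
```

The missing value in res is the situation in which a numeric passive variable is not used to predict later variables.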
References
Van Buuren, S. and Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. doi:10.18637/jss.v045.i03
Examples
### this example shows how inconsistencies in the SD2011 data are picked up
### by syn.passive()
ods <- SD2011[, c("height", "weight", "bmi", "age", "agegr")]
ods$hsq <- ods$height^2
ods$sex <- SD2011$sex
### one method per column of ods, in column order:
### height, weight, bmi, age, agegr, hsq, sex
meth <- c("cart", "cart", "~I(weight / height^2 * 10000)",
          "cart", "~I(cut(age, c(15, 24, 34, 44, 59, 64, 120)))",
          "~I(height^2)", "logreg")
if (FALSE) {
  ### fails for bmi
  s1 <- syn(ods, method = meth, seed = 6756, models = TRUE)
  ### fails for agegr
  ods$bmi <- ods$weight / ods$height^2 * 10000
  s2 <- syn(ods, method = meth, seed = 6756, models = TRUE)
  ### fails because of wrong order
  ods$agegr <- cut(ods$age, c(15, 24, 34, 44, 59, 64, 120))
  s3 <- syn(ods, method = meth, visit.sequence = 7:1,
            seed = 6756, models = TRUE)
}
### runs without errors
ods$bmi <- ods$weight / ods$height^2 * 10000
ods$agegr <- cut(ods$age, c(15, 24, 34, 44, 59, 64, 120))
s4 <- syn(ods, method = meth, seed = 6756, models = TRUE)
#>
#> Variable bmi with passive synthesis has missing values
#> so it will not be used to predict other variables.
#>
#> Variable hsq with passive synthesis has missing values
#> so it will not be used to predict other variables.
#>
#> Method "cart" is not valid for a variable without predictors (height)
#> Method has been changed to "sample"
#>
#>
#> Synthesis
#> -----------
#> height weight bmi age agegr hsq sex
### bmi and hsq do not predict sex because of missing values
s4$models$sex
#>
#> Call:
#> NULL
#>
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)
#>               0.428892   0.042636  10.059  < 2e-16 ***
#> age          -0.024160   0.010212  -2.366 0.017991 *
#> height.0     -0.269075   0.008603 -31.276  < 2e-16 ***
#> weight.0     -0.044899   0.003669 -12.237  < 2e-16 ***
#> agegr.1       0.495299   0.188263   2.631 0.008516 **
#> agegr.2       0.758444   0.251910   3.011 0.002606 **
#> agegr.3       0.371103   0.360359   1.030 0.303097
#> agegr.4       0.305319   0.460557   0.663 0.507372
#> agegr.5       0.415576   0.568737   0.731 0.464963
#> height.NA.1 -46.366887   1.674739 -27.686  < 2e-16 ***
#> weight.NA.1  -2.718794   0.763433  -3.561 0.000369 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 6865.8 on 5039 degrees of freedom
#> Residual deviance: 3592.1 on 5029 degrees of freedom
#> AIC: 3596.8
#>
#> Number of Fisher Scoring iterations: 6
#>
### hsq with no missing values used to predict sex
ods2 <- ods[!is.na(ods$height),]
s5 <- syn(ods2, method = meth, seed = 6756, models = TRUE)
#>
#> Variable bmi with passive synthesis has missing values
#> so it will not be used to predict other variables.
#>
#> Method "cart" is not valid for a variable without predictors (height)
#> Method has been changed to "sample"
#>
#>
#> Synthesis
#> -----------
#> height weight bmi age agegr hsq sex
s5$models$sex
#>
#> Call:
#> NULL
#>
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)
#>              0.1524028  0.0468528   3.253  0.00114 **
#> height       1.5071387  0.1231117  12.242  < 2e-16 ***
#> age         -0.0258619  0.0101953  -2.537  0.01119 *
#> hsq         -0.0052908  0.0003767 -14.044  < 2e-16 ***
#> weight.0    -0.0434550  0.0036847 -11.793  < 2e-16 ***
#> agegr.1      0.4968707  0.1963467   2.531  0.01139 *
#> agegr.2      0.7103598  0.2565054   2.769  0.00562 **
#> agegr.3      0.3307380  0.3631623   0.911  0.36244
#> agegr.4      0.3029624  0.4621868   0.655  0.51215
#> agegr.5      0.4038381  0.5707149   0.708  0.47919
#> weight.NA.1 -2.4363326  0.8099956  -3.008  0.00263 **
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 6815.9 on 5004 degrees of freedom
#> Residual deviance: 3463.6 on 4994 degrees of freedom
#> AIC: 3456.6
#>
#> Number of Fisher Scoring iterations: 6
#>
### agegr with missing values used to predict sex because not numeric
ods3 <- ods
ods3$age[1:4] <- NA
ods3$agegr <- cut(ods3$age, c(15, 24, 34, 44, 59, 64, 120))
s6 <- syn(ods3, method = meth, seed = 6756, models = TRUE)
#>
#> Variable bmi with passive synthesis has missing values
#> so it will not be used to predict other variables.
#>
#> Variable hsq with passive synthesis has missing values
#> so it will not be used to predict other variables.
#>
#> Method "cart" is not valid for a variable without predictors (height)
#> Method has been changed to "sample"
#>
#>
#> Synthesis
#> -----------
#> height weight bmi age agegr hsq sex
s6$models$sex
#>
#> Call:
#> NULL
#>
#> Coefficients:
#>               Estimate Std. Error z value Pr(>|z|)
#>               0.429533   0.042657  10.069  < 2e-16 ***
#> height.0     -0.269177   0.008604 -31.285  < 2e-16 ***
#> weight.0     -0.044978   0.003671 -12.254  < 2e-16 ***
#> age.0        -0.024540   0.010217  -2.402 0.016309 *
#> agegr.1       0.499779   0.188350   2.653 0.007967 **
#> agegr.2       0.766891   0.252030   3.043 0.002343 **
#> agegr.3       0.379872   0.360472   1.054 0.291966
#> agegr.4       0.322277   0.460761   0.699 0.484274
#> agegr.5       0.435642   0.568974   0.766 0.443878
#> agegr.6       2.010719   2.251562   0.893 0.371840
#> height.NA.1 -46.384462   1.674914 -27.694  < 2e-16 ***
#> weight.NA.1  -2.722389   0.763576  -3.565 0.000363 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 6867.2 on 5043 degrees of freedom
#> Residual deviance: 3591.2 on 5032 degrees of freedom
#> AIC: 3596.5
#>
#> Number of Fisher Scoring iterations: 6
#>