Fitting (generalized) linear models to synthetic data
glm.synds.Rd
Fits generalized linear models or simple linear models to the synthesised
data set(s) using the glm and lm functions, respectively.
Usage
glm.synds(formula, family = "binomial", data, ...)
lm.synds(formula, data, ...)
# S3 method for fit.synds
print(x, msel = NULL, ...)
Arguments
- formula
a symbolic description of the model to be estimated. A typical model has the form response ~ predictors. See the documentation of glm and formula for details.
- family
a description of the error distribution and link function to be used in the model. See the documentation of glm and family for details.
- data
an object of class synds, which stands for 'synthesised data set'. It is typically created by function syn and it includes data$m synthesised data set(s).
- ...
- x
an object of class fit.synds.
- msel
index or indices of synthetic data copies for which coefficient estimates are to be displayed. If NULL (default), the combined (average) coefficient estimates are printed.
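The effect of msel can be pictured with a small base R sketch. It is only an illustration of the behaviour described above, not the package's code: the matrix values are made up, and show_coefs() is a hypothetical helper that does not exist in synthpop.

```r
# Hypothetical stand-in for per-synthesis coefficient estimates:
# one row per synthesis, one column per model term (values are made up).
mcoef <- matrix(c(1.0, 2.0, 3.0,
                  0.5, 0.7, 0.9),
                nrow = 3,
                dimnames = list(paste0("syn=", 1:3), c("(Intercept)", "age")))

# show_coefs() is an illustrative helper, not a synthpop function.
show_coefs <- function(mcoef, msel = NULL) {
  if (is.null(msel)) {
    colMeans(mcoef)                  # combined (average) estimates
  } else {
    mcoef[msel, , drop = FALSE]      # estimates for the selected syntheses
  }
}

avg <- show_coefs(mcoef)             # msel = NULL: one averaged value per term
sel <- show_coefs(mcoef, msel = 1:2) # msel = 1:2: rows for syntheses 1 and 2
print(avg)
print(sel)
```

With msel = NULL the result has one value per model term; with msel = 1:2 it keeps one row per selected synthesis, which is the layout shown by print(f1, msel = 1:2) in the Examples.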
Value
The summary function (summary.fit.synds) can be used to obtain the combined results of models fitted to each of the m synthetic data sets.
An object of class fit.synds. It is a list with the following components:
- call
the original call to glm.synds or lm.synds.
- mcoefavg
combined (average) coefficient estimates.
- mvaravg
combined (average) variance estimates of mcoef.
- analyses
a summary.glm or summary.lm object, respectively, or a list of m such objects.
- fitting.function
the function used to fit the model.
- n
the number of cases in the original data.
- k
the number of cases in the synthesised data.
- proper
a logical value indicating whether synthetic data were generated using proper synthesis.
- m
the number of synthetic versions of the observed data.
- method
a vector of synthesising methods applied to each variable in the saved synthesised data.
- incomplete
a logical value indicating whether the dependent variable in the model was not synthesised.
- mcoef
a matrix of coefficient estimates from all m syntheses.
- mvar
a matrix of variance estimates from all m syntheses.
Examples
### Logit model
library(synthpop)  # provides SD2011, syn(), glm.synds() and lm.synds()
ods <- SD2011[1:1000, c("sex", "age", "edu", "marital", "ls", "smoke")]
s1 <- syn(ods, m = 3)
#>
#> Synthesis number 1
#> --------------------
#> sex age edu marital ls smoke
#>
#> Synthesis number 2
#> --------------------
#> sex age edu marital ls smoke
#>
#> Synthesis number 3
#> --------------------
#> sex age edu marital ls smoke
f1 <- glm.synds(smoke ~ sex + age + edu + marital + ls, data = s1, family = "binomial")
f1
#> Note: To get more details of the fit see vignette on inference.
#>
#> Call:
#> glm.synds(formula = smoke ~ sex + age + edu + marital + ls, family = "binomial",
#> data = s1)
#>
#> Average coefficient estimates from 3 syntheses:
#> (Intercept) sexFEMALE
#> 0.48783978 0.33628188
#> age eduVOCATIONAL/GRAMMAR
#> 0.03328396 -0.20169295
#> eduSECONDARY eduPOST-SECONDARY OR HIGHER
#> 0.52613311 0.74998347
#> maritalMARRIED maritalWIDOWED
#> -0.97882316 -1.20624768
#> maritalDIVORCED maritalDE FACTO SEPARATED
#> -1.84919903 3.01509355
#> lsPLEASED lsMOSTLY SATISFIED
#> -0.19061178 -0.59893295
#> lsMIXED lsMOSTLY DISSATISFIED
#> -0.65800593 -1.10577603
#> lsUNHAPPY lsTERRIBLE
#> -1.02349566 -0.91511305
#> maritalLEGALLY SEPARATED
#> 5.95406597
print(f1, msel = 1:2)
#> Note: To get more details of the fit see vignette on inference.
#>
#> Call:
#> glm.synds(formula = smoke ~ sex + age + edu + marital + ls, family = "binomial",
#> data = s1)
#>
#> Coefficient estimates for selected synthetic data set(s):
#> (Intercept) sexFEMALE age eduVOCATIONAL/GRAMMAR eduSECONDARY
#> syn=1 0.6275968 0.37704099 0.03289964 -0.4033997 0.6835348
#> syn=2 0.1188924 -0.03760743 0.02535927 -0.1095308 0.7072583
#> eduPOST-SECONDARY OR HIGHER maritalMARRIED maritalWIDOWED maritalDIVORCED
#> syn=1 0.5999478 -0.9007848 -1.3374957 -1.792095
#> syn=2 0.7795410 -0.6208188 -0.4858115 -1.063505
#> maritalDE FACTO SEPARATED lsPLEASED lsMOSTLY SATISFIED lsMIXED
#> syn=1 -0.4369683 -0.2182708 -0.92649295 -1.0272515
#> syn=2 -3.2836108 0.3655260 -0.06202271 0.1711328
#> lsMOSTLY DISSATISFIED lsUNHAPPY lsTERRIBLE maritalLEGALLY SEPARATED
#> syn=1 -1.2438716 -0.4729003 -1.038164 NA
#> syn=2 -0.5161799 -0.4428476 -1.529707 -1.229137
### Linear model
ods <- SD2011[1:1000, c("sex", "age", "income", "marital", "depress")]
ods$income[ods$income == -8] <- NA
s2 <- syn(ods, m = 3)
#>
#> Synthesis number 1
#> --------------------
#> sex age income marital depress
#>
#> Synthesis number 2
#> --------------------
#> sex age income marital depress
#>
#> Synthesis number 3
#> --------------------
#> sex age income marital depress
f2 <- lm.synds(depress ~ sex + age + log(income) + marital, data = s2)
f2
#> Note: To get more details of the fit see vignette on inference.
#>
#> Call:
#> lm.synds(formula = depress ~ sex + age + log(income) + marital,
#> data = s2)
#>
#> Average coefficient estimates from 3 syntheses:
#> (Intercept) sexFEMALE age
#> 4.7949298 0.6695879 0.1435322
#> log(income) maritalMARRIED maritalWIDOWED
#> -0.9340511 -1.0134384 0.2349639
#> maritalDIVORCED maritalLEGALLY SEPARATED maritalDE FACTO SEPARATED
#> -1.1182066 -1.7268073 -0.1270227
print(f2, msel = 1:3)
#> Note: To get more details of the fit see vignette on inference.
#>
#> Call:
#> lm.synds(formula = depress ~ sex + age + log(income) + marital,
#> data = s2)
#>
#> Coefficient estimates for selected synthetic data set(s):
#> (Intercept) sexFEMALE age log(income) maritalMARRIED maritalWIDOWED
#> syn=1 5.209308 0.7014139 0.1397755 -0.9672165 -0.9595500 -0.02452132
#> syn=2 3.249343 0.7188651 0.1579204 -0.8050770 -0.8519202 -0.11734146
#> syn=3 5.926138 0.5884847 0.1329007 -1.0298598 -1.2288451 0.84675449
#> maritalDIVORCED maritalLEGALLY SEPARATED maritalDE FACTO SEPARATED
#> syn=1 -1.5082447 -1.261336 0.04095237
#> syn=2 -0.1017294 NA 0.05252519
#> syn=3 -1.7446458 -2.192279 -0.47454558
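As a check on what "average" means here, the sketch below recomputes a few of the averages printed for f2 from the per-synthesis rows shown above. The numbers are copied from that output. The rule that NA entries are skipped when averaging is an assumption inferred from the output (the maritalLEGALLY SEPARATED average matches the mean of the two non-missing values); this page does not state it.

```r
# Per-synthesis estimates copied from the print output for f2 above:
# the first three terms plus maritalLEGALLY SEPARATED (NA for syn=2).
per_syn <- rbind(
  "syn=1" = c(5.209308, 0.7014139, 0.1397755, -1.261336),
  "syn=2" = c(3.249343, 0.7188651, 0.1579204, NA),
  "syn=3" = c(5.926138, 0.5884847, 0.1329007, -2.192279)
)
colnames(per_syn) <- c("(Intercept)", "sexFEMALE", "age",
                       "maritalLEGALLY SEPARATED")

# Average over syntheses, skipping NA (assumed rule, see the text above).
avg <- colMeans(per_syn, na.rm = TRUE)

# Averages printed by `f2` above, for comparison.
printed <- c(4.7949298, 0.6695879, 0.1435322, -1.7268073)

# Differences should be no larger than the rounding of the printed values.
print(abs(avg - printed))
```

The per-synthesis values are printed to about seven significant digits, so agreement is only expected to that precision.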