Fitting (generalized) linear models to synthetic data

Fits generalized linear models or simple linear models to the synthesised data set(s) using the glm and lm functions, respectively.

Usage

glm.synds(formula, family = "binomial", data, ...)
lm.synds(formula, data, ...)

# S3 method for fit.synds
print(x, msel = NULL, ...)

Arguments

formula: a symbolic description of the model to be estimated. A typical model has the form response ~ predictors. See the documentation of glm and formula for details.
family: a description of the error distribution and link function to be used in the model. See the documentation of glm and family for details.
data: an object of class synds, which stands for 'synthesised data set'. It is typically created by the function syn and contains data$m synthesised data set(s).
...: additional parameters passed to glm or lm.
x: an object of class fit.synds.
msel: index or indices of the synthetic data copies for which coefficient estimates are to be displayed. If NULL (default), the combined (average) coefficient estimates are printed.
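
The effect of msel can be illustrated on a plain coefficient matrix. The following is a minimal base-R sketch, not the package's print method: show_coefs is a hypothetical helper and the numbers are made up for illustration. It assumes one row per synthesis and one column per coefficient.

```r
# Hypothetical helper mimicking the documented msel behaviour:
# NULL -> averaged coefficients, otherwise the selected rows.
show_coefs <- function(mcoef, msel = NULL) {
  if (is.null(msel)) {
    colMeans(mcoef, na.rm = TRUE)   # average over syntheses, skipping NAs
  } else {
    mcoef[msel, , drop = FALSE]     # keep matrix shape even for one row
  }
}

# Made-up estimates from three syntheses; "b" is missing in the first one.
mcoef <- matrix(c(1, 3, 5, NA, 4, 6), nrow = 3,
                dimnames = list(paste0("syn=", 1:3), c("a", "b")))

avg <- show_coefs(mcoef)              # averaged estimates for a and b
sel <- show_coefs(mcoef, msel = 1:2)  # rows syn=1 and syn=2 only
```

With msel = NULL the result is a named vector of averages; with msel = 1:2 it is a two-row matrix labelled syn=1 and syn=2.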

Value

The summary function (summary.fit.synds) can be used to obtain the combined results of models fitted to each of the m synthetic data sets.

An object of class fit.synds. It is a list with the following components:

call: the original call to glm.synds or lm.synds.
mcoefavg: combined (average) coefficient estimates.
mvaravg: combined (average) variance estimates of the coefficients in mcoef.
analyses: a summary.glm or summary.lm object, respectively, or a list of m such objects.
fitting.function: function used to fit the model.
n: the number of cases in the original data.
k: the number of cases in the synthesised data.
proper: a logical value indicating whether synthetic data were generated using proper synthesis.
m: the number of synthetic versions of the observed data.
method: a vector of synthesising methods applied to each variable in the saved synthesised data.
incomplete: a logical value indicating whether the dependent variable in the model was not synthesised.
mcoef: a matrix of coefficient estimates from all m syntheses.
mvar: a matrix of variance estimates from all m syntheses.
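
The relationship between mcoef, mvar and their averaged counterparts can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: bootstrap resamples of the built-in mtcars data stand in for synthetic data sets, and the averages are taken as per-coefficient means that ignore missing values. That assumption is consistent with the printed output in the linear-model example, where the average for maritalLEGALLY SEPARATED matches the mean of the two non-missing per-synthesis values up to rounding.

```r
# Stand-ins for m synthetic data sets: bootstrap resamples of mtcars
# (an assumption for illustration; real ones would come from syn()).
set.seed(1)
m <- 3
syn_list <- lapply(seq_len(m), function(i) {
  mtcars[sample(nrow(mtcars), replace = TRUE), ]
})

# Fit the same model to every data set and keep the summary objects.
analyses <- lapply(syn_list, function(d) summary(lm(mpg ~ wt + hp, data = d)))

# One row per data set, one column per coefficient.
mcoef <- t(sapply(analyses, function(a) a$coefficients[, "Estimate"]))
mvar  <- t(sapply(analyses, function(a) a$coefficients[, "Std. Error"]^2))
rownames(mcoef) <- rownames(mvar) <- paste0("syn=", seq_len(m))

# Combined (average) estimates, skipping any NA entries.
mcoefavg <- colMeans(mcoef, na.rm = TRUE)
mvaravg  <- colMeans(mvar, na.rm = TRUE)
```

Here the variance of each coefficient is taken as the squared standard error reported by summary.lm.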

Examples

### Logit model
ods <- SD2011[1:1000, c("sex", "age", "edu", "marital", "ls", "smoke")]
s1 <- syn(ods, m = 3)
#> 
#> Synthesis number 1
#> --------------------
#>  sex age edu marital ls smoke
#> 
#> Synthesis number 2
#> --------------------
#>  sex age edu marital ls smoke
#> 
#> Synthesis number 3
#> --------------------
#>  sex age edu marital ls smoke
f1 <- glm.synds(smoke ~ sex + age + edu + marital + ls, data = s1, family = "binomial")
f1
#> Note: To get more details of the fit see vignette on inference.
#> 
#> Call:
#> glm.synds(formula = smoke ~ sex + age + edu + marital + ls, family = "binomial", 
#>     data = s1)
#> 
#> Average coefficient estimates from 3 syntheses:
#>                 (Intercept)                   sexFEMALE 
#>                  0.48783978                  0.33628188 
#>                         age       eduVOCATIONAL/GRAMMAR 
#>                  0.03328396                 -0.20169295 
#>                eduSECONDARY eduPOST-SECONDARY OR HIGHER 
#>                  0.52613311                  0.74998347 
#>              maritalMARRIED              maritalWIDOWED 
#>                 -0.97882316                 -1.20624768 
#>             maritalDIVORCED   maritalDE FACTO SEPARATED 
#>                 -1.84919903                  3.01509355 
#>                   lsPLEASED          lsMOSTLY SATISFIED 
#>                 -0.19061178                 -0.59893295 
#>                     lsMIXED       lsMOSTLY DISSATISFIED 
#>                 -0.65800593                 -1.10577603 
#>                   lsUNHAPPY                  lsTERRIBLE 
#>                 -1.02349566                 -0.91511305 
#>    maritalLEGALLY SEPARATED 
#>                  5.95406597 
print(f1, msel = 1:2)
#> Note: To get more details of the fit see vignette on inference.
#> 
#> Call:
#> glm.synds(formula = smoke ~ sex + age + edu + marital + ls, family = "binomial", 
#>     data = s1)
#> 
#> Coefficient estimates for selected synthetic data set(s):
#>       (Intercept)   sexFEMALE        age eduVOCATIONAL/GRAMMAR eduSECONDARY
#> syn=1   0.6275968  0.37704099 0.03289964            -0.4033997    0.6835348
#> syn=2   0.1188924 -0.03760743 0.02535927            -0.1095308    0.7072583
#>       eduPOST-SECONDARY OR HIGHER maritalMARRIED maritalWIDOWED maritalDIVORCED
#> syn=1                   0.5999478     -0.9007848     -1.3374957       -1.792095
#> syn=2                   0.7795410     -0.6208188     -0.4858115       -1.063505
#>       maritalDE FACTO SEPARATED  lsPLEASED lsMOSTLY SATISFIED    lsMIXED
#> syn=1                -0.4369683 -0.2182708        -0.92649295 -1.0272515
#> syn=2                -3.2836108  0.3655260        -0.06202271  0.1711328
#>       lsMOSTLY DISSATISFIED  lsUNHAPPY lsTERRIBLE maritalLEGALLY SEPARATED
#> syn=1            -1.2438716 -0.4729003  -1.038164                       NA
#> syn=2            -0.5161799 -0.4428476  -1.529707                -1.229137

### Linear model
ods <- SD2011[1:1000, c("sex", "age", "income", "marital", "depress")]
ods$income[ods$income == -8] <- NA
s2 <- syn(ods, m = 3)
#> 
#> Synthesis number 1
#> --------------------
#>  sex age income marital depress
#> 
#> Synthesis number 2
#> --------------------
#>  sex age income marital depress
#> 
#> Synthesis number 3
#> --------------------
#>  sex age income marital depress
f2 <- lm.synds(depress ~ sex + age + log(income) + marital, data = s2)
f2
#> Note: To get more details of the fit see vignette on inference.
#> 
#> Call:
#> lm.synds(formula = depress ~ sex + age + log(income) + marital, 
#>     data = s2)
#> 
#> Average coefficient estimates from 3 syntheses:
#>               (Intercept)                 sexFEMALE                       age 
#>                 4.7949298                 0.6695879                 0.1435322 
#>               log(income)            maritalMARRIED            maritalWIDOWED 
#>                -0.9340511                -1.0134384                 0.2349639 
#>           maritalDIVORCED  maritalLEGALLY SEPARATED maritalDE FACTO SEPARATED 
#>                -1.1182066                -1.7268073                -0.1270227 
print(f2, msel = 1:3)
#> Note: To get more details of the fit see vignette on inference.
#> 
#> Call:
#> lm.synds(formula = depress ~ sex + age + log(income) + marital, 
#>     data = s2)
#> 
#> Coefficient estimates for selected synthetic data set(s):
#>       (Intercept) sexFEMALE       age log(income) maritalMARRIED maritalWIDOWED
#> syn=1    5.209308 0.7014139 0.1397755  -0.9672165     -0.9595500    -0.02452132
#> syn=2    3.249343 0.7188651 0.1579204  -0.8050770     -0.8519202    -0.11734146
#> syn=3    5.926138 0.5884847 0.1329007  -1.0298598     -1.2288451     0.84675449
#>       maritalDIVORCED maritalLEGALLY SEPARATED maritalDE FACTO SEPARATED
#> syn=1      -1.5082447                -1.261336                0.04095237
#> syn=2      -0.1017294                       NA                0.05252519
#> syn=3      -1.7446458                -2.192279               -0.47454558
