
Produces graphical comparisons of a variable (var) between the synthesised data set and the original (observed) data set, within subgroups defined by the variables in the vector by. The variable var can be a factor or a continuous variable, and the type of plot produced depends on its class: bar plots for factors, and histograms or boxplots for continuous variables. The variables in by will usually be factors or variables with only a few distinct values.

Usage

multi.compare(object, data, var = NULL, by = NULL, msel = NULL,
  barplot.position = "fill", cont.type = "hist", y.hist = "count",
  boxplot.point = TRUE, binwidth = NULL, ...)

Arguments

object

an object of class synds, which stands for 'synthesised data set'. It is typically created by the function syn() and contains object$m synthesised data set(s).

data

an original (observed) data set.

var

variable to be compared between observed and synthetic data within subgroups.

by

variables to be tabulated or cross-tabulated to form groups.

barplot.position

type of bar plot used when var is a factor. The default "fill" gives a single stacked bar showing the proportions in each category (the segments sum to 1), while "dodge" gives side-by-side bars showing the numbers in each category.
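The two bar plot types display different quantities computed from the same data: "dodge" shows counts, "fill" shows proportions. The base R sketch below uses a small made-up factor standing in for var within one subgroup (hypothetical data, not synthpop output) and computes the numbers each bar type would display. It illustrates the arithmetic only and is not synthpop's plotting code.

```r
# Hypothetical values of a factor var within one subgroup (made-up data)
smoke <- factor(c("YES", "NO", "NO", "NO", "YES", "NO"))

# "dodge": side-by-side bars whose heights are the counts per category
counts <- table(smoke)

# "fill": one stacked bar whose segments are proportions summing to 1
props <- prop.table(counts)

print(counts)
print(props)
```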

cont.type

type of plot used when var is continuous: the default "hist" gives histograms and "boxplot" gives boxplots.

y.hist

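defines the y scale for histograms: "count" (the default) gives the number of observations in each bin; "density" rescales the bars so that their total area is 1, so each bar's height multiplied by the bin width gives the proportion of observations in that bin.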
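The relationship between the "count" and "density" scales can be checked with base R's hist(), assuming ggplot2's density statistic follows the same convention (count divided by the number of observations times the bin width). The ages below are made-up values for a single subgroup, and the bin width of 5 mirrors binwidth = 5 in the examples.

```r
# Hypothetical ages within one subgroup (made-up data)
age <- c(23, 27, 31, 34, 38, 41, 44, 52, 58, 63)
binwidth <- 5

# Bin edges covering the data in steps of binwidth
breaks <- seq(20, 65, by = binwidth)
h <- hist(age, breaks = breaks, plot = FALSE)

# "count" scale: number of observations in each bin
print(h$counts)

# "density" scale: count / (n * binwidth), so the bar areas sum to 1
print(h$density)
print(sum(h$density * binwidth))
```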

boxplot.point

if TRUE (the default), individual data points are added to the boxplots.

msel

indices of the synthetic data sets to be used; must be values in the range 1:object$m. If NULL (the default), the pooled synthetic data copies are compared with the original data.
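To make the difference between pooling and selecting concrete, the sketch below stacks several synthetic copies into one data frame (as with msel = NULL) and, alternatively, picks a single copy by index. It assumes, as a simplification, that the m > 1 copies are held in a list of data frames (synthpop keeps them in object$syn); the list syn_list and its contents are hypothetical stand-ins, not the package's internals.

```r
# Two hypothetical synthetic copies (stand-ins for the copies in a synds object)
syn_list <- list(
  data.frame(sex = c("MALE", "FEMALE"), age = c(30, 45)),
  data.frame(sex = c("FEMALE", "FEMALE", "MALE"), age = c(52, 28, 61))
)

# msel = NULL: pool all copies by stacking their rows
pooled <- do.call(rbind, syn_list)

# msel = 2: use only the selected copy; indices must lie in 1:m
msel <- 2
stopifnot(all(msel %in% seq_along(syn_list)))
selected <- syn_list[[msel]]

print(nrow(pooled))
print(nrow(selected))
```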

binwidth

width of the bins used in histograms.

...

additional parameters that can be supplied to ggplot.

Value

Plots as specified above. A table of the numbers of observed records in each subgroup is also printed to the R console.
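The printed table is a cross-tabulation of the by variables in the observed data, with one cell per subgroup and hence per plot. A base R sketch of that tabulation on a small made-up data frame (hypothetical values, not SD2011) is shown below; it reproduces the idea of the table, not synthpop's own printing code.

```r
# Hypothetical observed data with two grouping variables (made-up values)
obs <- data.frame(
  sex = factor(c("MALE", "FEMALE", "FEMALE", "MALE", "FEMALE")),
  edu = factor(c("PRIMARY", "SECONDARY", "PRIMARY", "SECONDARY", "SECONDARY"))
)
by <- c("sex", "edu")

# Cross-tabulate the by variables: each cell is the size of one subgroup
tab <- table(obs[, by])
print(tab)
```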

Numeric variables with fewer than 6 distinct values are changed to factors in order to make plots more readable.
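The conversion rule can be expressed as a small helper. The function to_factor_if_few below is hypothetical and only illustrates the documented threshold (fewer than 6 distinct values); ignoring NA when counting distinct values is an assumption, not confirmed package behaviour.

```r
# Hypothetical helper illustrating the documented rule: numeric variables
# with fewer than 6 distinct values are treated as factors for plotting.
# Assumption: missing values are not counted as a distinct value.
to_factor_if_few <- function(x, min_distinct = 6) {
  if (is.numeric(x) && length(unique(x[!is.na(x)])) < min_distinct) {
    factor(x)
  } else {
    x
  }
}

children <- c(0, 1, 2, 1, 0, 3)                    # 4 distinct values -> factor
income   <- c(1200, 1500, 980, 2100, 1750, 1320)   # 6 distinct values -> numeric

print(class(to_factor_if_few(children)))
print(class(to_factor_if_few(income)))
```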

Examples

library(synthpop)

### default synthesis of selected variables
vars <- c("sex", "age", "edu", "smoke")
ods  <- na.omit(SD2011[1:1000, vars])
s1 <- syn(ods)
#> 
#> Synthesis
#> -----------
#>  sex age edu smoke

### categorical var
multi.compare(s1, ods, var = "smoke", by = c("sex","edu"))
#> 
#> Plots of  smoke  by  sex edu 
#> Numbers in each plot (observed data):
#> 
#>         edu
#> sex      PRIMARY/NO EDUCATION VOCATIONAL/GRAMMAR SECONDARY
#>   MALE                     65                189       122
#>   FEMALE                  100                147       190
#>         edu
#> sex      POST-SECONDARY OR HIGHER
#>   MALE                         69
#>   FEMALE                      115


### numeric var
multi.compare(s1, ods, var = "age", by = c("sex"), y.hist = "density", binwidth = 5)
#> 
#> Plots of  age  by  sex 
#> Numbers in each plot (observed data):
#> 
#> sex
#>   MALE FEMALE 
#>    445    552 

multi.compare(s1, ods, var = "age", by = c("sex", "edu"), cont.type = "boxplot")
#> 
#> Plots of  age  by  sex edu 
#> Numbers in each plot (observed data):
#> 
#>         edu
#> sex      PRIMARY/NO EDUCATION VOCATIONAL/GRAMMAR SECONDARY
#>   MALE                     65                189       122
#>   FEMALE                  100                147       190
#>         edu
#> sex      POST-SECONDARY OR HIGHER
#>   MALE                         69
#>   FEMALE                      115