Multivariate comparison of synthesised and observed data
multi.compare.Rd
Graphical comparisons of a variable (var) in the synthesised data set with the original (observed) data set within subgroups defined by the variables in the vector by. var can be a factor or a continuous variable and the plots produced will depend on the class of var. The variables in by will usually be factors or variables with only a few values.
Usage
multi.compare(object, data, var = NULL, by = NULL, msel = NULL,
barplot.position = "fill", cont.type = "hist", y.hist = "count",
boxplot.point = TRUE, binwidth = NULL, ...)
Arguments
- object
an object of class synds, which stands for 'synthesised data set'. It is typically created by function syn() and it includes object$m synthesised data set(s).
- data
an original (observed) data set.
- var
variable to be compared between observed and synthetic data within subgroups.
- by
variables to be tabulated or cross-tabulated to form groups.
- barplot.position
type of barplot. The default "fill" gives a single bar with the proportions in each group while "dodge" gives side-by-side bars with the numbers in each category.
- cont.type
default "hist" gives histograms and "boxplot" gives boxplots.
- y.hist
defines y scale for histograms: "count" is default; "density" gives proportions.
- boxplot.point
default (TRUE) adds individual points to boxplots.
- msel
numbers of synthetic data sets to be used; must be numbers in the range 1:object$m. If NULL, pooled synthetic data copies are compared with the original data.
- binwidth
sets width of a bin for histograms.
- ...
additional parameters that can be supplied to ggplot.
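As an illustration of msel, the sketch below synthesises several copies and then compares either all of them pooled or only a selection with the observed data. It assumes the synthpop package and its SD2011 data are available; the object names ods and s3, the choice m = 3 and the seed value are arbitrary and only for illustration.

```r
library(synthpop)

# Assumed setup: a small subset of SD2011; m = 3 and the seed are arbitrary choices
vars <- c("sex", "age", "edu", "smoke")
ods <- na.omit(SD2011[1:1000, vars])
s3 <- syn(ods, m = 3, seed = 2024, print.flag = FALSE)

# msel = NULL (the default): the three synthetic copies are pooled
multi.compare(s3, ods, var = "smoke", by = "sex")

# Use only the first two copies; values must lie in 1:s3$m
multi.compare(s3, ods, var = "smoke", by = "sex", msel = 1:2)
```

A value of msel outside 1:object$m is not valid, so it is worth checking object$m before selecting copies.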
Value
Plots as specified above. A table of the numbers in the subgroups is printed to the R console.
Numeric variables with fewer than 6 distinct values are changed to factors in order to make plots more readable.
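The conversion rule for numeric variables can be pictured with a small base R sketch. The helper as_factor_if_few_values() is hypothetical and is not part of synthpop; it only mimics the documented threshold, and ignoring missing values when counting distinct values is an assumption made here.

```r
# Hypothetical helper (not synthpop code): treat a numeric variable as a
# factor when it has fewer than 6 distinct non-missing values.
as_factor_if_few_values <- function(x, max_distinct = 6) {
  n_distinct <- length(unique(x[!is.na(x)]))  # assumption: NAs are not counted
  if (is.numeric(x) && n_distinct < max_distinct) factor(x) else x
}

few  <- c(1, 2, 3, 2, 1, 3)            # 3 distinct values: becomes a factor
many <- c(18, 25, 31, 40, 52, 67, 73)  # 7 distinct values: stays numeric

class(as_factor_if_few_values(few))
class(as_factor_if_few_values(many))
```

Under this rule a variable such as a 1 to 5 rating would be plotted with barplots rather than histograms or boxplots.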
Examples
### default synthesis of selected variables
vars <- c("sex", "age", "edu", "smoke")
ods <- na.omit(SD2011[1:1000, vars])
s1 <- syn(ods)
#>
#> Synthesis
#> -----------
#> sex age edu smoke
### categorical var
multi.compare(s1, ods, var = "smoke", by = c("sex","edu"))
#>
#> Plots of smoke by sex edu
#> Numbers in each plot (observed data):
#>
#>          edu
#> sex      PRIMARY/NO EDUCATION VOCATIONAL/GRAMMAR SECONDARY
#>   MALE                     65                189       122
#>   FEMALE                  100                147       190
#>          edu
#> sex      POST-SECONDARY OR HIGHER
#>   MALE                         69
#>   FEMALE                      115
### numeric var
multi.compare(s1, ods, var = "age", by = c("sex"), y.hist = "density", binwidth = 5)
#>
#> Plots of age by sex
#> Numbers in each plot (observed data):
#>
#> sex
#>   MALE FEMALE
#>    445    552
#> Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
#> ℹ Please use `after_stat(density)` instead.
#> ℹ The deprecated feature was likely used in the synthpop package.
#> Please report the issue to the authors.
multi.compare(s1, ods, var = "age", by = c("sex", "edu"), cont.type = "boxplot")
#>
#> Plots of age by sex edu
#> Numbers in each plot (observed data):
#>
#>          edu
#> sex      PRIMARY/NO EDUCATION VOCATIONAL/GRAMMAR SECONDARY
#>   MALE                     65                189       122
#>   FEMALE                  100                147       190
#>          edu
#> sex      POST-SECONDARY OR HIGHER
#>   MALE                         69
#>   FEMALE                      115
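The calls on this page leave barplot.position and boxplot.point at their defaults. A sketch of the documented alternatives follows; it assumes the synthpop package and its SD2011 data are available, and the seed value and print.flag = FALSE (used only to suppress the synthesis log) are illustrative choices.

```r
library(synthpop)

# Assumed setup, repeated so the block runs on its own
vars <- c("sex", "age", "edu", "smoke")
ods <- na.omit(SD2011[1:1000, vars])
s1 <- syn(ods, seed = 2024, print.flag = FALSE)

### side-by-side bars with the numbers in each category
multi.compare(s1, ods, var = "smoke", by = "sex", barplot.position = "dodge")

### boxplots without the individual points
multi.compare(s1, ods, var = "age", by = "sex", cont.type = "boxplot",
              boxplot.point = FALSE)
```

Dropping the points may be easier to read when subgroups are large, since the boxes are then not obscured by overplotting.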