Tools for statistical disclosure control (sdc)
sdc.Rd
Labeling and removing unique replicates of unique actual (observed) individuals.
Usage
sdc(object, data, label = NULL, rm.replicated.uniques = FALSE,
    uniques.exclude = NULL, recode.vars = NULL, bottom.top.coding = NULL,
    recode.exclude = NULL, smooth.vars = NULL)
Arguments
- object
an object of class synds, which stands for 'synthesised data set'. It is typically created by function syn() and it includes object$m synthesised data set(s).
- data
the original (observed) data set.
- label
a single string with a label to be added to the synthetic data sets as a new variable to make it clear that the data are synthetic/fake.
- rm.replicated.uniques
a logical value indicating whether unique replicates of units that are also unique in the original data set should be removed.
- uniques.exclude
a single string or a vector of strings with name(s) of variable(s) to be excluded from the identification of uniques.
- recode.vars
a single string or a vector of strings with name(s) of variable(s) to be bottom- and/or top-coded.
- bottom.top.coding
a list of two-element vectors specifying bottom and top codes for each variable in recode.vars. If there is no need for bottom or top coding, NA should be used. If only one variable is to be recoded, codes can be given as a two-element vector.
- recode.exclude
a list specifying, for each variable in recode.vars, values to be excluded from recoding, e.g. missing data codes. If all values should be considered for recoding, NA should be used. If only one variable is to be recoded, code(s) can be given as a single number or a vector.
- smooth.vars
a single string or a vector of strings with name(s) of numeric variable(s) to be smoothed (the smooth.spline function is used).
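The bottom.top.coding and recode.exclude arguments described above can be illustrated with a minimal base-R sketch. It assumes that bottom-coding replaces values below the bottom code with that code, that top-coding does the same for values above the top code, and that excluded values are left untouched. The helper bottom_top_code() and the toy vectors are hypothetical and are not part of the package; the package's own implementation may differ in detail.

```r
# Hypothetical helper for illustration only; not a synthpop function.
# `codes` is a two-element vector c(bottom, top); NA means "no coding
# on that side". `exclude` lists values that must never be recoded.
bottom_top_code <- function(x, codes, exclude = NA) {
  # Missing values and excluded codes (e.g. -8) are left as they are
  recodable <- !is.na(x) & !(x %in% exclude)
  bottom <- codes[1]
  top <- codes[2]
  if (!is.na(bottom)) x[recodable & x < bottom] <- bottom
  if (!is.na(top)) x[recodable & x > top] <- top
  x
}

# Made-up values: -8 plays the role of a missing data code
income <- c(-8, 500, 1500, 2500, NA, 4000)
income_coded <- bottom_top_code(income, codes = c(NA, 2000),
                                exclude = c(NA, -8))

age <- c(15, 20, 50, 85)
age_coded <- bottom_top_code(age, codes = c(20, 80))
```

In this sketch the income values 2500 and 4000 are capped at 2000 while -8 and NA are kept, and the ages 15 and 85 are moved to 20 and 80.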
Examples
ods <- SD2011[1:1000,c("sex","age","edu","marital","income")]
s1 <- syn(ods, m = 2)
#>
#> Synthesis number 1
#> --------------------
#> sex age edu marital income
#>
#> Synthesis number 2
#> --------------------
#> sex age edu marital income
s1.sdc <- sdc(s1, ods, label = "false_data", rm.replicated.uniques = TRUE,
              recode.vars = c("age", "income"),
              bottom.top.coding = list(c(20, 80), c(NA, 2000)),
              recode.exclude = list(NA, c(NA, -8)))
#>
#> m = 1
#> age: no. of bottom-coded values - 67, no. of top-coded values - 26
#> income: no. of bottom-coded values - 0, no. of top-coded values - 146
#> m = 2
#> age: no. of bottom-coded values - 66, no. of top-coded values - 31
#> income: no. of bottom-coded values - 0, no. of top-coded values - 148
#> no. of replicated uniques: 99, 113
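The count of replicated uniques reported above can be understood with a small base-R sketch. It assumes that a replicated unique is a synthetic record that is unique within the synthetic data and whose values match a record that is unique within the observed data. The data frames, the helper names row_key() and unique_keys(), and the matching by pasted strings are all illustrative assumptions, not the package's code.

```r
# Toy observed and synthetic data (made-up values, for illustration only)
obs   <- data.frame(sex = c("F", "M", "M", "F"), age = c(30, 45, 45, 62))
synth <- data.frame(sex = c("F", "M", "F", "F"), age = c(30, 45, 62, 62))

# Collapse each row into one string so that rows can be compared directly
row_key <- function(df) do.call(paste, c(df, sep = "\r"))

# Keys of rows that occur exactly once in a data frame
unique_keys <- function(df) {
  tab <- table(row_key(df))
  names(tab)[tab == 1]
}

# Synthetic rows that are unique in both the synthetic and observed data
shared_uniques <- intersect(unique_keys(obs), unique_keys(synth))
replicated <- row_key(synth) %in% shared_uniques

n_replicated <- sum(replicated)       # number of replicated uniques
synth_clean <- synth[!replicated, ]   # synthetic data with them removed
```

Here only the record (F, 30) is unique in both data sets, so one synthetic row is flagged and removed. Dropping a variable from both data frames before building the keys would mimic the effect described for uniques.exclude.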