Tools for statistical disclosure control (sdc)
Labeling and removing unique replicates of unique actual (observed) individuals.
Usage
sdc(object, data, label = NULL, rm.replicated.uniques = FALSE,
uniques.exclude = NULL, recode.vars = NULL, bottom.top.coding = NULL,
recode.exclude = NULL, smooth.vars = NULL)
Arguments
- object
an object of class synds, which stands for 'synthesised data set'. It is typically created by the function syn() and it includes object$m synthesised data set(s).
- data
the original (observed) data set.
- label
a single string with a label to be added to the synthetic data sets as a new variable to make it clear that the data are synthetic/fake.
- rm.replicated.uniques
a logical value indicating whether unique replicates of units that are also unique in the original data set should be removed.
- uniques.exclude
a single string or a vector of strings with name(s) of variable(s) to be excluded from the identification of uniques.
- recode.vars
a single string or a vector of strings with name(s) of variable(s) to be bottom- and/or top-coded.
- bottom.top.coding
a list of two-element vectors specifying bottom and top codes for each variable in recode.vars. If there is no need for bottom or top coding, NA should be used. If only one variable is to be recoded, codes can be given as a two-element vector.
- recode.exclude
a list specifying, for each variable in recode.vars, values to be excluded from recoding, e.g. missing data codes. If all values should be considered for recoding, NA should be used. If only one variable is to be recoded, code(s) can be given as a single number or a vector.
- smooth.vars
a single string or a vector of strings with name(s) of numeric variable(s) to be smoothed (the smooth.spline function is used).
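To make the interaction between bottom.top.coding and recode.exclude concrete, the following is a minimal base R sketch of bottom- and top-coding a single variable. The helper bottom_top_code() is hypothetical and is not part of the package; it only illustrates the idea under the assumption that values below the bottom code are raised to it, values above the top code are lowered to it, and excluded values (such as missing data codes) are left untouched.

```r
# Hypothetical helper (NOT a synthpop function): illustrates bottom/top coding.
# bottom, top : single codes; NA means "no coding on that side"
# exclude     : values that must never be recoded (e.g. missing data codes)
bottom_top_code <- function(x, bottom = NA, top = NA, exclude = NA) {
  # Only non-missing values that are not explicitly excluded may be recoded
  recodable <- !is.na(x) & !(x %in% exclude)
  if (!is.na(bottom)) {
    low <- recodable & x < bottom
    x[low] <- bottom
  }
  if (!is.na(top)) {
    high <- recodable & x > top
    x[high] <- top
  }
  x
}

# Made-up income values; -8 plays the role of a missing data code
income <- c(-8, 500, 1500, 2500, NA, 3000)
coded_income <- bottom_top_code(income, bottom = NA, top = 2000,
                                exclude = c(NA, -8))

# Made-up ages, coded at both ends
age <- c(15, 20, 45, 90)
coded_age <- bottom_top_code(age, bottom = 20, top = 80)

print(coded_income)
print(coded_age)
```

In this sketch the code -8 and the NA pass through unchanged, while 2500 and 3000 are capped at 2000.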
Examples
ods <- SD2011[1:1000,c("sex","age","edu","marital","income")]
s1 <- syn(ods, m = 2)
#>
#> Synthesis number 1
#> --------------------
#> sex age edu marital income
#>
#> Synthesis number 2
#> --------------------
#> sex age edu marital income
s1.sdc <- sdc(s1, ods, label = "false_data", rm.replicated.uniques = TRUE,
              recode.vars = c("age", "income"),
              bottom.top.coding = list(c(20, 80), c(NA, 2000)),
              recode.exclude = list(NA, c(NA, -8)))
#>
#> m = 1
#> age: no. of bottom-coded values - 67, no. of top-coded values - 26
#> income: no. of bottom-coded values - 0, no. of top-coded values - 146
#> m = 2
#> age: no. of bottom-coded values - 66, no. of top-coded values - 31
#> income: no. of bottom-coded values - 0, no. of top-coded values - 148
#> no. of replicated uniques: 99, 113
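The count of replicated uniques reported above can be illustrated with a small base R sketch. It assumes that a "replicated unique" is a synthetic record that is unique within the synthetic data and identical to a record that is unique within the observed data; this definition is an assumption made for illustration, and the helpers row_key() and is_unique() as well as the toy data frames obs and syn_data are hypothetical and not part of the package.

```r
# Toy observed and synthetic data (made up for illustration)
obs <- data.frame(sex = c("F", "M", "M", "F"), age = c(30, 41, 41, 52))
syn_data <- data.frame(sex = c("F", "M", "F", "F"), age = c(30, 41, 52, 52))

# Hypothetical helper: collapse each row into a single string key
row_key <- function(df) do.call(paste, c(df, sep = "\r"))

# Hypothetical helper: TRUE for keys that occur exactly once
is_unique <- function(key) {
  !(duplicated(key) | duplicated(key, fromLast = TRUE))
}

obs_key <- row_key(obs)
syn_key <- row_key(syn_data)

# Keys of records that are unique in the observed data
obs_unique_keys <- obs_key[is_unique(obs_key)]

# Synthetic records that are unique AND match an observed unique
replicated <- is_unique(syn_key) & syn_key %in% obs_unique_keys

n_replicated <- sum(replicated)
syn_clean <- syn_data[!replicated, , drop = FALSE]

print(n_replicated)
print(syn_clean)
```

Dropping a column from both data frames before building the keys would mimic the effect of excluding a variable from the identification of uniques.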