Labels synthetic data sets and removes synthetic records that are unique replicates of unique actual (observed) individuals.

Usage

sdc(object, data, label = NULL, rm.replicated.uniques = FALSE,
 uniques.exclude = NULL, recode.vars = NULL, bottom.top.coding = NULL,
 recode.exclude = NULL, smooth.vars = NULL)

Arguments

object

an object of class synds, which stands for 'synthesised data set'. It is typically created by function syn() and it includes object$m synthesised data set(s).

data

the original (observed) data set.

label

a single string with a label to be added to the synthetic data sets as a new variable to make it clear that the data are synthetic/fake.

rm.replicated.uniques

a logical value indicating whether unique replicates of units that are also unique in the original data set should be removed.

uniques.exclude

a single string or a vector of strings with name(s) of variable(s) to be excluded from the identification of uniques.
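To make the idea of a "replicated unique" concrete, here is a minimal base R sketch. It assumes that a replicated unique is a synthetic row that is unique among the synthetic rows and also matches a row that is unique in the observed data. The helpers `row_key()` and `unique_keys()` and the toy data are hypothetical and are not part of synthpop; the package's internal procedure may differ.

```r
# Toy observed and synthetic data (hypothetical values, for illustration only).
obs <- data.frame(sex = c("F", "M", "M", "F"),
                  age = c(30, 41, 41, 55))
syn_data <- data.frame(sex = c("F", "M", "F", "F"),
                       age = c(30, 41, 55, 55))

# Collapse each row into a single string key so rows can be compared.
row_key <- function(d) do.call(paste, c(d, sep = "\r"))

# Keys that occur exactly once in a data set.
unique_keys <- function(d) {
  k <- row_key(d)
  k[!(duplicated(k) | duplicated(k, fromLast = TRUE))]
}

# Flag synthetic rows that are unique in the synthetic data and
# match a row that is unique in the observed data.
is_rep_unique <- row_key(syn_data) %in%
  intersect(unique_keys(syn_data), unique_keys(obs))

# Drop the flagged rows.
syn_clean <- syn_data[!is_rep_unique, , drop = FALSE]
```

In this sketch, excluding variables from the identification of uniques would amount to dropping those columns from both data frames before the keys are built.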

recode.vars

a single string or a vector of strings with name(s) of variable(s) to be bottom- and/or top-coded.

bottom.top.coding

a list of two-element vectors specifying bottom and top codes for each variable in recode.vars. If there is no need for bottom or top coding, NA should be used. If only one variable is to be recoded, the codes can be given as a two-element vector.

recode.exclude

a list specifying, for each variable in recode.vars, the values to be excluded from recoding, e.g. missing data codes. If all values should be considered for recoding, NA should be used. If only one variable is to be recoded, the code(s) can be given as a single number or a vector.
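The following base R sketch illustrates how a two-element vector of bottom and top codes and a set of excluded values could interact. The helper `bottom_top_code()` is hypothetical and not part of synthpop; it only mirrors the behaviour described above, under the assumption that values below the bottom code are raised to it and values above the top code are lowered to it.

```r
# Hypothetical helper, not part of synthpop.
# `codes` is c(bottom, top); NA means no coding on that side.
# `exclude` lists values (e.g. missing-data codes) that are left untouched.
bottom_top_code <- function(x, codes, exclude = NA) {
  # Only non-missing, non-excluded values are candidates for recoding.
  recodable <- !is.na(x) & !(x %in% exclude)
  if (!is.na(codes[1])) {
    low <- recodable & x < codes[1]
    x[low] <- codes[1]
  }
  if (!is.na(codes[2])) {
    high <- recodable & x > codes[2]
    x[high] <- codes[2]
  }
  x
}

# Toy values (hypothetical); -8 plays the role of a missing-data code.
age <- c(16, 25, 83, NA)
income <- c(-8, 500, 2500, NA)

age_coded <- bottom_top_code(age, c(20, 80))
income_coded <- bottom_top_code(income, c(NA, 2000), exclude = c(NA, -8))
```

Here `age_coded` is `20 25 80 NA` and `income_coded` is `-8 500 2000 NA`: the age values are clamped to the range 20 to 80, while income is only top-coded at 2000 and the code -8 is preserved.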

smooth.vars

a single string or a vector of strings with name(s) of numeric variable(s) to be smoothed (using the smooth.spline function).
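As a rough illustration of spline smoothing of a numeric variable, the sketch below fits `stats::smooth.spline()` to the sorted values against their rank and replaces each value with its fitted counterpart. This is one possible approach under that assumption, not necessarily the procedure `sdc()` applies internally; the simulated `income` vector is hypothetical.

```r
set.seed(1)

# Hypothetical heaped incomes: values rounded to the nearest 100.
income <- round(rlnorm(200, meanlog = 7, sdlog = 0.5), -2)

# Fit a smoothing spline to the sorted values, using the rank as x.
ord <- order(income)
fit <- stats::smooth.spline(x = seq_along(income), y = income[ord])

# Put the fitted values back in the original row order.
income_smooth <- numeric(length(income))
income_smooth[ord] <- fit$y
```

Smoothing of this kind spreads out values that are heaped on round numbers, which can make individual values harder to match back to the observed data.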

Value

The object supplied as the object argument, adjusted according to the values of the other arguments.

Examples

ods <- SD2011[1:1000,c("sex","age","edu","marital","income")]
s1 <- syn(ods, m = 2)
#> 
#> Synthesis number 1
#> --------------------
#>  sex age edu marital income
#> 
#> Synthesis number 2
#> --------------------
#>  sex age edu marital income
s1.sdc <- sdc(s1, ods, label = "false_data", rm.replicated.uniques = TRUE,
              recode.vars = c("age", "income"),
              bottom.top.coding = list(c(20, 80), c(NA, 2000)),
              recode.exclude = list(NA, c(NA, -8)))
#> 
#> m = 1
#> age: no. of bottom-coded values - 67, no. of top-coded values - 26
#> income: no. of bottom-coded values - 0, no. of top-coded values - 146
#> m = 2
#> age: no. of bottom-coded values - 66, no. of top-coded values - 31
#> income: no. of bottom-coded values - 0, no. of top-coded values - 148
#> no. of replicated uniques: 99, 113