
Generates a synthetic categorical variable using unordered polytomous regression (with or without bootstrap).

Usage

syn.polyreg(y, x, xp, proper = FALSE, maxit = 1000, trace = FALSE,
            MaxNWts = 10000, ...)

Arguments

y

an original data vector of length n.

x

a matrix (n x p) of original covariates.

xp

a matrix (k x p) of synthesised covariates.

proper

if TRUE (proper synthesis), the multinomial model is fitted to a bootstrap sample of the original data rather than to the original data itself.

maxit

the maximum number of iterations for nnet.

trace

logical switch for tracing the optimization in nnet.

MaxNWts

the maximum allowable number of weights for nnet.

...

additional parameters passed to nnet.

Details

Generates synthetic categorical variables using a polytomous (multinomial) regression model. The method consists of the following steps:

  1. Fit a multinomial model to the categorical response.

  2. Compute predicted categories.

  3. Add appropriate noise to predictions.
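The three steps above can be sketched with nnet::multinom directly. This is a minimal illustration and not the internal code of syn.polyreg: the iris data, the choice of covariates, the resampled rows standing in for synthesised covariates, and the reading of "noise" as a random draw from the predicted class probabilities are all assumptions made for the example.

```r
library(nnet)
set.seed(42)

# Original data: a categorical response and two numeric covariates
y <- iris$Species
x <- iris[, c("Sepal.Length", "Sepal.Width")]

# Stand-in for synthesised covariates (assumption: resampled original rows)
xp <- x[sample(nrow(x), 20, replace = TRUE), ]

# Step 1: fit the categorical response as a multinomial model
dat <- data.frame(y = y, x)
fit <- multinom(y ~ ., data = dat, maxit = 1000, trace = FALSE)

# Step 2: predicted category probabilities for the synthesised covariates
probs <- predict(fit, newdata = xp, type = "probs")

# Step 3: add noise by drawing one category per row from those probabilities
res <- apply(probs, 1, function(p) sample(colnames(probs), size = 1, prob = p))
res <- factor(res, levels = levels(y))
```

Drawing from the probabilities, rather than taking the most likely category, keeps the variability of the response in the synthetic values.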

syn.polyreg uses the multinom function from the nnet package. Any numerical variables are scaled to cover the range (0,1) before fitting. A warning is printed if the algorithm fails to converge within maxit iterations, and another if the synthesised data have only one category. The latter may occur if the variable being synthesised is sparse, so that the algorithm fails to iterate.
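As an illustration of the scaling step, a min-max transformation maps a numeric variable onto the unit interval. The helper rescale01 and the values below are hypothetical, and applying the range of the original data to the synthesised covariates is an assumption of this sketch, not a documented detail of syn.polyreg.

```r
# Min-max rescaling; lo and hi default to the range of v itself
rescale01 <- function(v, lo = min(v), hi = max(v)) {
  (v - lo) / (hi - lo)
}

age_orig <- c(18, 35, 52, 70)  # made-up original covariate
age_syn  <- c(25, 44, 61)      # made-up synthesised covariate

age_orig_sc <- rescale01(age_orig)

# Assumption: reuse the original range so both sets share one scale
age_syn_sc <- rescale01(age_syn, lo = min(age_orig), hi = max(age_orig))
```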

In order to avoid bias due to perfect prediction, the data are augmented by the method of White, Daniel and Royston (2010).

Note that when the function is called by setting elements of method in syn() to "polyreg", the parameters maxit, trace and MaxNWts can be supplied to syn() with the prefix polyreg., e.g. polyreg.maxit.
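A call of this kind might look as follows. The iris data, the methods chosen for the other variables, the seed and the value 2000 are arbitrary choices for illustration.

```r
library(synthpop)

# Synthesise iris, using polyreg for the categorical variable Species.
# The methods for the four numeric variables are arbitrary here.
s <- syn(iris,
         method = c("sample", "cart", "cart", "cart", "polyreg"),
         polyreg.maxit = 2000,  # handed on to syn.polyreg as maxit
         seed = 123)

table(s$syn$Species)
```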

Value

A list with two components:

res

a vector of length k with synthetic values of y.

fit

a summary of the model fitted to the observed data and used to produce synthetic values.
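The function can also be called directly to inspect the two components. In this sketch the iris data and the ten sampled rows used as xp are stand-ins chosen for illustration; in normal use syn() supplies these arguments.

```r
library(synthpop)
set.seed(2024)

y  <- iris$Species                             # original categorical variable
x  <- as.matrix(iris[, 1:4])                   # original covariates (n x p)
xp <- x[sample(nrow(x), 10), , drop = FALSE]   # stand-in for synthesised covariates (k x p)

out <- syn.polyreg(y, x, xp)

out$res   # synthetic values of y, one per row of xp
out$fit   # summary of the fitted multinomial model
```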

References

White, I.R., Daniel, R. and Royston, P. (2010). Avoiding bias due to perfect prediction in multiple imputation of incomplete categorical variables. Computational Statistics and Data Analysis, 54, 2267–2275.

See also