Generates a synthetic ordered categorical variable using ordered polytomous regression (with or without bootstrap).

Usage

syn.polr(y, x, xp, proper = FALSE, maxit = 1000, trace = FALSE,
  MaxNWts = 10000, ...)

Arguments

y

an original data vector of length n.

x

a matrix (n x p) of original covariates.

xp

a matrix (k x p) of synthesised covariates.

proper

if TRUE (proper synthesis), the model is fitted to a bootstrapped sample of the original data rather than to the original data itself.

maxit

the maximum number of iterations for nnet.

trace

switch for tracing optimization for nnet.

MaxNWts

the maximum allowable number of weights for nnet.

...

additional parameters passed to optim or nnet.

Details

Generates synthetic values of an ordered categorical variable using the proportional odds logistic regression (polr) model, also known as the cumulative link model. In effect, the model applies a logistic regression to each successive split of the ordered categories.
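The "successive splits" idea can be made concrete with a small base R sketch. It assumes the parametrisation documented for polr in MASS, logit P(Y <= j | x) = zeta_j - eta, where the zeta_j are cut-points and eta is the linear predictor. The numeric values below are hypothetical and chosen only for illustration.

```r
# Hypothetical values, for illustration only
zeta <- c(-1, 0.5, 2)  # cut-points separating 4 ordered categories
eta  <- 0.8            # linear predictor for a single observation

# One logistic model per split: P(Y <= j | x) = plogis(zeta_j - eta)
cum_p <- plogis(zeta - eta)

# Category probabilities are the successive differences of the
# cumulative probabilities, padded with 0 and 1 at the ends
cat_p <- diff(c(0, cum_p, 1))

print(round(cat_p, 3))
```

Because every split shares the same eta, only the cut-points differ between the logistic models, which is what the proportional odds assumption amounts to.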

The algorithm of syn.polr uses the function polr from the MASS package.
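As a rough illustration of the general approach, the sketch below fits polr to simulated data and draws one synthetic category per row of xp from the predicted probabilities. It is not the source code of syn.polr: the simulated data, the column names x1 and x2, and the object names are assumptions made for the example, and the augmentation step described in this section is omitted.

```r
library(MASS)

set.seed(1)

# Simulated "original" data (hypothetical): n observations, p = 2 covariates
n <- 300
x <- matrix(rnorm(n * 2), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
latent <- x[, 1] - 0.5 * x[, 2] + rlogis(n)
y <- cut(latent, breaks = c(-Inf, -1, 1, Inf),
         labels = c("low", "mid", "high"), ordered_result = TRUE)

# Stand-in for already synthesised covariates: k = 50 rows, same columns
xp <- matrix(rnorm(50 * 2), ncol = 2, dimnames = list(NULL, c("x1", "x2")))

dat <- data.frame(y = y, x)

# With proper = TRUE the model would be fitted to a bootstrap sample
proper <- FALSE
if (proper) dat <- dat[sample(nrow(dat), replace = TRUE), ]

# Fit the proportional odds model
fit <- polr(y ~ ., data = dat)

# Predicted category probabilities for each row of xp (k x 3 matrix)
probs <- predict(fit, newdata = data.frame(xp), type = "probs")

# Draw one category per row according to those probabilities
syn_y <- apply(probs, 1, function(p) sample(levels(y), size = 1, prob = p))
syn_y <- factor(syn_y, levels = levels(y), ordered = TRUE)

table(syn_y)
```

Drawing from the predicted probabilities, rather than taking the most likely category, keeps the variability of y in the synthetic values.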

In order to avoid bias due to perfect prediction, the data are augmented by the method of White, Daniel and Royston (2010).

If the call to polr fails, usually because the data are very sparse, the multinom function from the nnet package is used instead.
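A fallback of this kind can be sketched with tryCatch. The helper fit_ordered below is hypothetical and is not part of synthpop. To trigger the fallback deterministically, the example uses a two-level response, which polr rejects with an error; in practice the failure would more typically come from sparse data.

```r
library(MASS)
library(nnet)

# Hypothetical helper: try polr first, fall back to multinom on error
fit_ordered <- function(dat, maxit = 1000, MaxNWts = 10000) {
  tryCatch(
    polr(y ~ ., data = dat),
    error = function(e) {
      # maxit, trace and MaxNWts are passed through multinom to nnet
      multinom(y ~ ., data = dat, maxit = maxit, trace = FALSE,
               MaxNWts = MaxNWts)
    }
  )
}

set.seed(2)

# Three ordered levels: polr should fit without error
dat3 <- data.frame(
  y  = factor(sample(c("a", "b", "c"), 100, replace = TRUE), ordered = TRUE),
  x1 = rnorm(100)
)

# Two levels: polr stops with an error, so multinom is used
dat2 <- data.frame(
  y  = factor(sample(c("a", "b"), 100, replace = TRUE)),
  x1 = rnorm(100)
)

fit3 <- fit_ordered(dat3)
fit2 <- fit_ordered(dat2)

class(fit3)
class(fit2)
```

Note that multinom ignores the ordering of the categories, so the fallback trades the proportional odds structure for a model that can still be fitted.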

Value

A list with two components:

res

a vector of length k with synthetic values of y.

fit

a summary of the model fitted to the observed data and used to produce synthetic values.

References

White, I.R., Daniel, R. and Royston, P. (2010). Avoiding bias due to perfect prediction in multiple imputation of incomplete categorical variables. Computational Statistics and Data Analysis, 54, 2267–2275.