Group numeric variables before synthesis
Selected numeric variables are grouped into factors with ranges selected from the data.
Usage
numtocat.syn(data, numtocat = NULL, print.flag = TRUE, cont.na = NULL,
catgroups = 5, style.groups = "quantile")
Arguments
- data
a data frame.
- numtocat
a vector of numbers or variable names of numeric variables to be grouped into factors. If NULL, all the numeric variables in data will be grouped.
- print.flag
if TRUE a list of grouped variables is printed.
- cont.na
a named list that gives the values of the named variables to be treated as separate categories, often missing-value codes such as -8. See the corresponding parameter of syn().
- catgroups
a single integer or a vector of integers indicating the target number of groups for the variables in numtocat, given in the same order as numtocat or as their relative positions in data. The achieved number of groups may differ if, for example, a variable has fewer distinct values than the target number of groups.
- style.groups
parameter of the function classInt() that determines how the breaks used to categorise each variable are chosen. See the help file for classInt() for details. The default setting "quantile" makes groups of approximately equal size. To divide into approximately equal ranges we suggest using "fisher".
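The interaction of catgroups, cont.na and the "quantile" style can be illustrated with a minimal base R sketch. The helper group_numeric() and its arguments are hypothetical names introduced here for illustration; this is not the code used inside numtocat.syn(), only an approximation of the idea under the assumption that breaks are taken at sample quantiles.

```r
# Hypothetical helper for illustration only; it is not part of synthpop.
# Groups a numeric vector into ranges of roughly equal size and keeps
# special codes (for example -8) as categories of their own.
group_numeric <- function(x, ngroups = 5, special = NULL) {
  is_special <- x %in% special
  regular <- x[!is_special & !is.na(x)]

  # Quantile breaks; unique() drops repeated breaks, so fewer groups than
  # requested are produced when there are few distinct values.
  probs <- seq(0, 1, length.out = ngroups + 1)
  breaks <- unique(quantile(regular, probs = probs, names = FALSE))
  if (length(breaks) < 2) {
    return(factor(x))  # nothing to split: fall back to the raw values
  }

  # Left-closed intervals, with the last interval closed on both sides.
  out <- as.character(cut(x, breaks = breaks, right = FALSE,
                          include.lowest = TRUE))
  out[is_special] <- as.character(x[is_special])  # special codes stay separate
  factor(out)
}

income <- c(-8, 100, 200, 300, 400, 500, 600, NA)
income_grouped <- group_numeric(income, ngroups = 3, special = -8)
table(income_grouped, useNA = "ifany")
```

In this sketch the code -8 ends up as its own level, genuine NA values stay missing, and the remaining values are split into three ranges.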
Value
A list with the following components:
- data
a data frame with the numeric variables replaced by factors grouped into ranges.
- breaks
a named list of the breaks used to divide each numeric variable into categories.
- levels
a named list of the levels for the categories of each numeric variable.
- orig
a data frame with the original numeric data.
- cont.na
a named list of the levels for the categorical version of each numeric variable.
- numtocat
names of the variables changed to categories.
- ind
positions in data of the variables changed to categories.
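The components listed above can be inspected directly on the returned list. The following is a sketch, assuming the synthpop package and its SD2011 example data are installed; the object name res and the choice of variables are illustrative only. The variables are listed in the order in which they occur in SD2011, so the catgroups vector reads the same under either ordering convention described for that argument.

```r
library(synthpop)  # provides numtocat.syn() and the SD2011 example data

# Group only age and income, with different target numbers of groups,
# and treat the code -8 in income as a separate category.
res <- numtocat.syn(SD2011,
                    numtocat = c("age", "income"),
                    cont.na = list(income = -8),
                    catgroups = c(5, 3),
                    print.flag = FALSE)

res$breaks$age           # breaks used to categorise age
levels(res$data$income)  # factor levels of the grouped income variable
res$numtocat             # names of the variables that were grouped
res$ind                  # their positions in the data
head(res$orig)           # original numeric data, kept for later use
```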
Examples
SD2011.cat <- numtocat.syn(SD2011, cont.na = list(income = -8, unempdur = -8,
                                                  nofriend = -8))
#> Variable(s) age, unempdur, income, mmarr, ymarr, msepdiv, ysepdiv, depress, nofriend, nociga, height, weight, bmi grouped into categories.
summary(SD2011.cat$data)
#> sex age agegr placesize
#> MALE :2182 [16,28): 949 16-24: 702 URBAN 500,000 AND OVER: 392
#> FEMALE:2818 [28,42):1035 25-34: 726 URBAN 200,000-500,000 : 327
#> [42,54): 960 35-44: 748 URBAN 100,000-200,000 : 843
#> [54,64):1013 45-59:1361 URBAN 20,000-100,000 : 407
#> [64,97]:1043 60-64: 516 URBAN BELOW 20,000 : 642
#> 65+ : 943 RURAL AREAS :2389
#> NA's : 4
#> region edu
#> Mazowieckie : 570 PRIMARY/NO EDUCATION : 962
#> Slaskie : 500 VOCATIONAL/GRAMMAR :1613
#> Wielkopolskie: 413 SECONDARY :1482
#> Malopolskie : 371 POST-SECONDARY OR HIGHER: 936
#> Lodzkie : 358 NA's : 7
#> Dolnoslaskie : 319
#> (Other) :2469
#> eduspec
#> no specialisation :1647
#> technical science : 911
#> services for the population and transport services: 441
#> production and processing : 330
#> agriculture, forestry, fishing : 328
#> (Other) :1323
#> NA's : 20
#> socprof unempdur income
#> RETIRED :1241 -8 :1556 -8 :603
#> EMPLOYED IN PRIVATE SECTOR : 994 [0,24) :2721 [100,860) :742
#> EMPLOYED IN PUBLIC SECTOR : 600 [24,48]: 723 [1200,1500) :586
#> PUPIL OR STUDENT : 548 [1500,2000) :703
#> OTHER ECONOMICALLY INACTIVE: 444 [2000,16000]:996
#> (Other) :1140 [860,1200) :687
#> NA's : 33 NA's :683
#> marital mmarr ymarr msepdiv
#> SINGLE :1253 [1,4) : 487 [1937,1968): 725 [1,3) : 107
#> MARRIED :2979 [10,12]: 878 [1968,1977): 710 [10,12]: 164
#> WIDOWED : 531 [4,6) : 619 [1977,1985): 717 [3,6) : 157
#> DIVORCED : 199 [6,8) : 821 [1985,1997): 778 [6,7) : 104
#> LEGALLY SEPARATED : 7 [8,10) : 845 [1997,2011]: 750 [7,10) : 168
#> DE FACTO SEPARATED: 22 NA's :1350 NA's :1320 NA's :4300
#> NA's : 9
#> ysepdiv ls depress
#> [1944,1990): 135 PLEASED :1947 [0,2) :1538
#> [1990,1998): 138 MOSTLY SATISFIED :1692 [2,5) :1272
#> [1998,2003): 147 MIXED : 827 [5,8) :1012
#> [2003,2007): 138 MOSTLY DISSATISFIED: 274 [8,21]:1089
#> [2007,2011]: 167 DELIGHTED : 191 NA's : 89
#> NA's :4275 (Other) : 61
#> NA's : 8
#> trust trustfam trustneigh
#> MOST PEOPLE CAN BE TRUSTED: 678 YES :4470 YES :2959
#> ONE CAN`T BE TOO CAREFUL :3777 NO : 191 NO : 955
#> IT`S DIFFICULT TO TELL : 508 NO OPINION: 328 NO OPINION:1075
#> NA's : 37 NA's : 11 NA's : 11
#>
#>
#>
#> sport nofriend smoke nociga alcabuse alcsol
#> YES :3236 -8 : 41 YES :1277 [-8,10):3921 YES : 314 YES : 162
#> NO :1723 [0,2) : 490 NO :3713 [10,60]:1079 NO :4679 NO :4756
#> NA's: 41 [10,99]:1420 NA's: 10 NA's: 7 NA's: 82
#> [2,4) :1144
#> [4,6) :1152
#> [6,10) : 753
#>
#> workab wkabdur wkabint
#> YES : 130 Length:5000 YES, TO EU COUNTRY : 293
#> NO :4432 Class :character YES, TO NON-EU COUNTRY: 25
#> NA's: 438 Mode :character NO :4646
#> NA's : 36
#>
#>
#>
#> wkabintdur emcc englang
#> LESS THAN 1 YEAR : 91 GERMANY : 132 ACTIVE : 787
#> LESS THAN 1 TO 2 YEARS: 25 GREAT BRITAIN: 43 PASSIVE: 737
#> MORE THAN 2 YEARS : 21 NETHERLANDS : 28 NONE :3461
#> FOREVER : 29 BELGIUM : 11 NA's : 15
#> IT DEPENDS : 137 FRANCE : 11
#> NA's :4697 (Other) : 61
#> NA's :4714
#> height weight bmi
#> [116,160): 692 [37,60) : 823 [12.962963,21.96712) : 984
#> [160,165):1104 [60,70) :1149 [21.96712,24.382373) : 991
#> [165,170): 837 [70,78) : 980 [24.382373,26.573129): 988
#> [170,176):1106 [78,86) : 977 [26.573129,29.411765): 967
#> [176,202]:1226 [86,150]:1018 [29.411765,449.97973]:1011
#> NA's : 35 NA's : 53 NA's : 59
#>