Makes a codebook from a data frame
codebook.syn.Rd
Describes the features of each variable in a data frame that are relevant for synthesis.
Arguments
- data
a data frame containing the data set to be synthesised.
- maxlevs
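the number of factor levels above which a separate table listing all labels is returned as part of the labs component. Factors with at most maxlevs levels have their labels listed in the details column of tab instead.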
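The maxlevs rule can be sketched in base R as follows. describe_factor() is a hypothetical helper written for illustration, not a synthpop function, and the default of 3 is an assumption chosen to match the example output below, where factors with up to three levels (such as trust) have their labels listed inline and larger ones (such as edu) point to labs.

```r
# Hypothetical helper, NOT part of synthpop: mimics the maxlevs rule
# for a single factor. maxlevs = 3 is an assumed default.
describe_factor <- function(x, maxlevs = 3) {
  levs <- levels(x)
  if (length(levs) <= maxlevs) {
    # Few levels: labels go inline, each wrapped in single quotes
    list(details = paste0("'", levs, "'", collapse = " "), lab_table = NULL)
  } else {
    # Many levels: details points to a separate one-column label table
    list(details = "See table in labs",
         lab_table = data.frame(label = levs))
  }
}

sex <- factor(c("MALE", "FEMALE", "FEMALE"), levels = c("MALE", "FEMALE"))
edu <- factor(c("SECONDARY", "PRIMARY/NO EDUCATION"),
              levels = c("PRIMARY/NO EDUCATION", "VOCATIONAL/GRAMMAR",
                         "SECONDARY", "POST-SECONDARY OR HIGHER"))

describe_factor(sex)$details   # "'MALE' 'FEMALE'"
describe_factor(edu)$details   # "See table in labs"
```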
Value
a list with two components:
- tab: a data frame with the following information about each variable:
  - variable: variable name
  - class: class of variable
  - nmiss: number of missing values (NA)
  - perctmiss: percentage of missing values
  - ndistinct: number of distinct values (excluding missing values)
  - details: range for numeric variables, maximum length for character variables, labels for factors with at most maxlevs levels ("See table in labs" for factors with more levels)
- labs: a list of extra tables, one for each factor with more than maxlevs levels, listing all of its labels.
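The non-factor columns of tab can be approximated in base R as below. summarise_columns() is a hypothetical function for illustration, not synthpop's implementation; it skips factor columns for brevity, assumes every non-factor, non-numeric column is character, and assumes ndistinct counts unique non-missing values. perctmiss is 100 * nmiss / nrow(data), which matches the example output (683 missing incomes out of 5000 rows gives 13.66).

```r
# Hypothetical sketch, NOT synthpop's code: builds tab-style rows for
# the numeric and character columns of a data frame.
summarise_columns <- function(data) {
  n <- nrow(data)
  rows <- lapply(names(data), function(v) {
    x <- data[[v]]
    if (is.factor(x)) return(NULL)          # factors skipped in this sketch
    obs <- x[!is.na(x)]                     # non-missing values only
    nmiss <- sum(is.na(x))
    details <- if (is.numeric(x)) {
      paste0("Range: ", min(obs), " - ", max(obs))
    } else {                                # assumed to be character
      paste0("Max length: ", max(nchar(obs)))
    }
    data.frame(variable = v, class = class(x)[1], nmiss = nmiss,
               perctmiss = round(100 * nmiss / n, 2),
               ndistinct = length(unique(obs)), details = details)
  })
  do.call(rbind, rows)                      # rbind drops the NULL entries
}

df <- data.frame(age = c(16, 45, 97, NA),
                 wkabdur = c("3", "12", NA, NA),
                 sex = factor(c("MALE", "FEMALE", "MALE", "FEMALE")))
res <- summarise_columns(df)
res
```

Here age is reported with 25% missing and "Range: 16 - 97", and wkabdur with 50% missing and "Max length: 2".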
Examples
library(synthpop)
codebook.syn(SD2011)
#> $tab
#> variable class nmiss perctmiss ndistinct
#> 1 sex factor 0 0.00 2
#> 2 age numeric 0 0.00 79
#> 3 agegr factor 4 0.08 6
#> 4 placesize factor 0 0.00 6
#> 5 region factor 0 0.00 16
#> 6 edu factor 7 0.14 4
#> 7 eduspec factor 20 0.40 27
#> 8 socprof factor 33 0.66 9
#> 9 unempdur numeric 0 0.00 30
#> 10 income numeric 683 13.66 406
#> 11 marital factor 9 0.18 6
#> 12 mmarr numeric 1350 27.00 12
#> 13 ymarr numeric 1320 26.40 74
#> 14 msepdiv numeric 4300 86.00 12
#> 15 ysepdiv numeric 4275 85.50 50
#> 16 ls factor 8 0.16 7
#> 17 depress numeric 89 1.78 22
#> 18 trust factor 37 0.74 3
#> 19 trustfam factor 11 0.22 3
#> 20 trustneigh factor 11 0.22 3
#> 21 sport factor 41 0.82 2
#> 22 nofriend numeric 0 0.00 44
#> 23 smoke factor 10 0.20 2
#> 24 nociga numeric 0 0.00 30
#> 25 alcabuse factor 7 0.14 2
#> 26 alcsol factor 82 1.64 2
#> 27 workab factor 438 8.76 2
#> 28 wkabdur character 0 0.00 33
#> 29 wkabint factor 36 0.72 3
#> 30 wkabintdur factor 4697 93.94 5
#> 31 emcc factor 4714 94.28 17
#> 32 englang factor 15 0.30 3
#> 33 height numeric 35 0.70 64
#> 34 weight numeric 53 1.06 90
#> 35 bmi numeric 59 1.18 1387
#> details
#> 1 'MALE' 'FEMALE'
#> 2 Range: 16 - 97
#> 3 See table in labs
#> 4 See table in labs
#> 5 See table in labs
#> 6 See table in labs
#> 7 See table in labs
#> 8 See table in labs
#> 9 Range: -8 - 48
#> 10 Range: -8 - 16000
#> 11 See table in labs
#> 12 Range: 1 - 12
#> 13 Range: 1937 - 2011
#> 14 Range: 1 - 12
#> 15 Range: 1944 - 2011
#> 16 See table in labs
#> 17 Range: 0 - 21
#> 18 'MOST PEOPLE CAN BE TRUSTED' 'ONE CAN`T BE TOO CAREFUL' 'IT`S DIFFICULT TO TELL'
#> 19 'YES' 'NO' 'NO OPINION'
#> 20 'YES' 'NO' 'NO OPINION'
#> 21 'YES' 'NO'
#> 22 Range: -8 - 99
#> 23 'YES' 'NO'
#> 24 Range: -8 - 60
#> 25 'YES' 'NO'
#> 26 'YES' 'NO'
#> 27 'YES' 'NO'
#> 28 Max length: 2
#> 29 'YES, TO EU COUNTRY' 'YES, TO NON-EU COUNTRY' 'NO'
#> 30 See table in labs
#> 31 See table in labs
#> 32 'ACTIVE' 'PASSIVE' 'NONE'
#> 33 Range: 116 - 202
#> 34 Range: 37 - 150
#> 35 Range: 12.962962962963 - 449.979730642764
#>
#> $labs
#> $labs$agegr
#> label
#> 1 16-24
#> 2 25-34
#> 3 35-44
#> 4 45-59
#> 5 60-64
#> 6 65+
#>
#> $labs$placesize
#> label
#> 1 URBAN 500,000 AND OVER
#> 2 URBAN 200,000-500,000
#> 3 URBAN 100,000-200,000
#> 4 URBAN 20,000-100,000
#> 5 URBAN BELOW 20,000
#> 6 RURAL AREAS
#>
#> $labs$region
#> label
#> 1 Dolnoslaskie
#> 2 Kujawsko-pomorskie
#> 3 Lodzkie
#> 4 Lubelskie
#> 5 Lubuskie
#> 6 Malopolskie
#> 7 Mazowieckie
#> 8 Opolskie
#> 9 Podkarpackie
#> 10 Podlaskie
#> 11 Pomorskie
#> 12 Slaskie
#> 13 Swietokrzyskie
#> 14 Warminsko-mazurskie
#> 15 Wielkopolskie
#> 16 Zachodnio-pomorskie
#>
#> $labs$edu
#> label
#> 1 PRIMARY/NO EDUCATION
#> 2 VOCATIONAL/GRAMMAR
#> 3 SECONDARY
#> 4 POST-SECONDARY OR HIGHER
#>
#> $labs$eduspec
#> label
#> 1 agriculture, forestry, fishing
#> 2 architecture and construction
#> 3 armed forces and country protection
#> 4 art
#> 5 biological sciences
#> 6 computer science
#> 7 economy and administration
#> 8 environmental protection
#> 9 healthcare
#> 10 journalism and information
#> 11 law
#> 12 liberal arts
#> 13 mathematics and statistics
#> 14 pedagogics
#> 15 physical sciences
#> 16 production and processing
#> 17 protection and safety
#> 18 public health
#> 19 services for the population and transport services
#> 20 social sciences
#> 21 social welfare
#> 22 technical science
#> 23 veterinary medicine
#> 24 other
#> 25 no specialisation
#> 26 not applicable
#> 27 lack of data
#>
#> $labs$socprof
#> label
#> 1 EMPLOYED IN PRIVATE SECTOR
#> 2 EMPLOYED IN PUBLIC SECTOR
#> 3 SELF-EMPLOYED
#> 4 FARMER
#> 5 LONG-TERM SICK/DISABLED
#> 6 RETIRED
#> 7 PUPIL OR STUDENT
#> 8 UNEMPLOYED
#> 9 OTHER ECONOMICALLY INACTIVE
#>
#> $labs$marital
#> label
#> 1 SINGLE
#> 2 MARRIED
#> 3 WIDOWED
#> 4 DIVORCED
#> 5 LEGALLY SEPARATED
#> 6 DE FACTO SEPARATED
#>
#> $labs$ls
#> label
#> 1 DELIGHTED
#> 2 PLEASED
#> 3 MOSTLY SATISFIED
#> 4 MIXED
#> 5 MOSTLY DISSATISFIED
#> 6 UNHAPPY
#> 7 TERRIBLE
#>
#> $labs$wkabintdur
#> label
#> 1 LESS THAN 1 YEAR
#> 2 LESS THAN 1 TO 2 YEARS
#> 3 MORE THAN 2 YEARS
#> 4 FOREVER
#> 5 IT DEPENDS
#>
#> $labs$emcc
#> label
#> 1 AUSTRIA
#> 2 BELGIUM
#> 3 DENMARK
#> 4 FRANCE
#> 5 GERMANY
#> 6 GREAT BRITAIN
#> 7 IRELAND
#> 8 ITALY
#> 9 NETHERLANDS
#> 10 SPAIN
#> 11 SWEDEN
#> 12 OTHER EU COUNTRIES
#> 13 AUSTRALIA
#> 14 CANADA
#> 15 USA
#> 16 NORWAY
#> 17 OTHER COUNTRIES
#>
#>
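Because tab is an ordinary data frame, it can be filtered with base R. In a real session it would come from cb <- codebook.syn(SD2011) followed by cb$tab; the sketch below instead uses a small stand-in data frame whose rows are copied from the example output above, so that it runs without synthpop installed. The 50% threshold is an arbitrary choice for illustration.

```r
# Stand-in for codebook.syn(SD2011)$tab: a few rows copied from the
# example output above (only the columns used here).
tab <- data.frame(
  variable  = c("sex", "income", "msepdiv", "ysepdiv", "wkabintdur", "emcc"),
  class     = c("factor", "numeric", "numeric", "numeric", "factor", "factor"),
  nmiss     = c(0, 683, 4300, 4275, 4697, 4714),
  perctmiss = c(0.00, 13.66, 86.00, 85.50, 93.94, 94.28)
)

# Variables with more than half of their values missing,
# ranked from most to least missing
mostly_missing <- subset(tab, perctmiss > 50)
ranked <- mostly_missing[order(-mostly_missing$perctmiss),
                         c("variable", "perctmiss")]
ranked
```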