Frequency Distribution Bar Chart
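# Read in the data and inspect the dataset
# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0.0), matching the output below
dat <- read.csv("UGdata.csv", stringsAsFactors = TRUE)
summary(dat)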
        id          name     sex      region          birth           income      
 201205A01: 1   高**   : 8   男:25   东部:16   1993-5-12 : 5   Min.   :  1.10  
 201205A02: 1   孙**   : 5   女:23   西部:17   1992-3-7  : 3   1st Qu.: 13.88  
 201205A03: 1   宋**   : 4           中部:15   1993-5-11 : 3   Median : 17.85  
 201205A04: 1   杨**   : 4                     1994-6-15 : 3   Mean   : 27.86  
 201205A05: 1   唐**   : 3                     1995-8-20 : 3   3rd Qu.: 34.17  
 201205A06: 1   吴**   : 3                     1996-10-24: 3   Max.   :158.00  
 (Other)  :42   (Other):21                     (Other)   :28                   
     height          weight          score      
 Min.   :147.0   Min.   :45.00   Min.   :46.80  
 1st Qu.:162.0   1st Qu.:62.00   1st Qu.:66.05  
 Median :166.5   Median :69.00   Median :74.00  
 Mean   :168.2   Mean   :68.96   Mean   :73.21  
 3rd Qu.:175.0   3rd Qu.:74.25   3rd Qu.:79.62  
 Max.   :191.0   Max.   :91.00   Max.   :96.00  
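The <fct> tags in the outputs in this notebook show that the text columns were read as factors. That was read.csv()'s default before R 4.0.0; since then it returns character columns unless stringsAsFactors = TRUE is given. A minimal sketch, using a few made-up rows passed through the text= argument in place of UGdata.csv (the names and values are hypothetical), shows how to check the column types and convert the birth column to Date:

```r
# Hypothetical rows standing in for UGdata.csv (not the real data)
csv_text <- "id,name,sex,birth,income
201205A01,Zhao,male,1992-04-08,12.5
201205A02,Gao,female,1993-05-12,45.0
201205A03,Zhu,male,1995-07-18,158.0"

# stringsAsFactors = TRUE reproduces the factor columns seen in the notebook output
dat <- read.csv(text = csv_text, stringsAsFactors = TRUE)
str(dat)

# birth arrives as a factor, so convert it before comparing or formatting dates
dat$birth <- as.Date(dat$birth)
summary(dat$income)
```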
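1. Divide the group's income (column income) into three levels and draw a bar chart of the frequency of each level. Low: income <= 30; middle: 30 < income <= 100; high: income > 100.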
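# Bin income into three levels and store the label in a new column
dat$income.type[dat$income <= 30] <- "Low"
dat$income.type[dat$income > 30 & dat$income <= 100] <- "Middle"
dat$income.type[dat$income > 100] <- "High"
# Count each level; fixing the factor levels keeps the bars in Low, Middle, High order
a <- data.frame(table(dat$income.type))
a$Var1 <- factor(a$Var1, levels = c("Low", "Middle", "High"))
barplot(Freq ~ Var1, data = a, main = "Frequency distribution of group income",
        xlab = "Income level", ylab = "Frequency", col = rainbow(3))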
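The three subscript assignments can also be written with cut(), which bins a numeric vector into a factor in one call, with the levels already in low, middle, high order. A minimal sketch on made-up incomes (the breaks mirror the thresholds 30 and 100 used above; the income values themselves are hypothetical):

```r
# Hypothetical incomes, not taken from UGdata.csv
income <- c(1.1, 13.9, 17.9, 30, 34.2, 100, 158)

# right = TRUE (the default) makes the intervals (a, b], matching
# income <= 30, 30 < income <= 100 and income > 100
income.type <- cut(income, breaks = c(-Inf, 30, 100, Inf),
                   labels = c("Low", "Middle", "High"))

tab <- table(income.type)   # counts follow the factor level order
print(tab)
barplot(tab, main = "Frequency distribution of income",
        xlab = "Income level", ylab = "Frequency", col = rainbow(3))
```

On the real data this would read dat$income.type <- cut(dat$income, breaks = c(-Inf, 30, 100, Inf), labels = c("Low", "Middle", "High")), and table(dat$income.type) can go straight into barplot() without the intermediate data frame.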
2. Show a table of the id, name and birth of the students born before 1992-3-7.
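dat$birth <- as.Date(dat$birth)
# Note: ">" keeps students born AFTER 1992-03-07, which is what the output below shows;
# for students born BEFORE that date, as the question asks, change ">" to "<".
subset(dat, birth > as.Date("1992-03-07"), select = c(id, name, birth))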
A data.frame: 40 × 3
   id         name   birth
   <fct>      <fct>  <date>
1  201205A01  赵**   1992-04-08
2  201205A02  高**   1993-05-12
3  201205A03  朱**   1995-07-18
4  201205A04  许**   1995-01-08
6  201205A06  吴**   1995-11-27
7  201205A07  杨**   1997-12-29
8  201205A08  宋**   1995-08-20
.........
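Answering question 2 as worded (born before 1992-3-7) needs the < comparison. A minimal sketch on a hypothetical four-row data frame (not the course data) shows that filter; converting the cut-off with as.Date() makes the comparison explicitly date-to-date rather than relying on R converting the string automatically:

```r
# Hypothetical students, not the rows of UGdata.csv
students <- data.frame(
  id    = c("S01", "S02", "S03", "S04"),
  name  = c("A", "B", "C", "D"),
  birth = as.Date(c("1991-03-04", "1992-03-07", "1992-04-08", "1990-12-01"))
)

cutoff <- as.Date("1992-03-07")
# "<" keeps strictly earlier birth dates; use "<=" to include the cut-off day itself
before <- subset(students, birth < cutoff, select = c(id, name, birth))
print(before)
```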
3. Show a table of the id, name and birth of the students born in March.
Method 1
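# grep() matches against the dates as "YYYY-MM-DD" strings; "-03-" can only be the month field
x <- grep("-03-", dat$birth)
dat[x, c("id", "name", "birth")]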
A data.frame: 6 × 3
    id         name   birth
    <fct>      <fct>  <date>
5   201205A05  陈**   1992-03-07
11  201205A11  赵**   1992-03-07
12  201205A12  张**   1992-03-06
22  201205A22  赵**   1992-03-07
24  201205A24  王**   1991-03-04
34  201205A34  张**   1992-03-06
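Method 1 works because grep() converts the Date values to "YYYY-MM-DD" strings before matching. A more direct alternative, sketched here on hypothetical dates, extracts the month with format() and compares it to "03", so it tests the month field explicitly instead of searching the whole date string:

```r
# Hypothetical birth dates, not from UGdata.csv
birth <- as.Date(c("1992-03-07", "1993-05-12", "1991-03-04", "1995-11-27"))

# format(..., "%m") returns the zero-padded month number as a string
is_march <- format(birth, "%m") == "03"
which(is_march)

# grep() on a Date vector matches against as.character(birth), so it finds the same rows
grep("-03-", birth)
```

On the real data the equivalent would be dat[format(dat$birth, "%m") == "03", c("id", "name", "birth")].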
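Method 2 (does not work): dat$birth == x compares the two vectors element by element. They have different lengths (48 rows vs. 6 March dates), so R recycles the shorter vector x to the length of the longer one, and a row is kept only when its birth date happens to equal the element of x it is lined up with. Because 48 is a multiple of 6, R does not even issue a warning; here only rows 12 and 22 line up with an equal value. The correct membership test is %in%.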
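x <- grep("-03-", dat$birth, value = TRUE)   # the six March dates, as character strings
subset(dat, birth == x, select = c(id, name, birth))   # wrong: x is recycled; birth %in% as.Date(x) is the correct test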
A data.frame: 2 × 3
    id         name   birth
    <fct>      <fct>  <date>
12  201205A12  张**   1992-03-06
22  201205A22  赵**   1992-03-07
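The failure of Method 2 is R's recycling rule at work. A small sketch with hypothetical dates reproduces it: three March dates are recycled against six birth dates, so == only hits where equal values happen to line up, while %in% checks each date against the whole set:

```r
# Hypothetical birth dates, not from UGdata.csv
birth <- as.Date(c("1992-03-07", "1993-05-12", "1991-03-04",
                   "1995-11-27", "1992-03-06", "1994-06-15"))
x <- birth[format(birth, "%m") == "03"]   # the three March dates

# == recycles x (length 3) to length 6: no warning, because 6 is a multiple of 3
which(birth == x)      # only position 1 lines up with an equal value

# %in% asks, for each element of birth, whether it occurs anywhere in x
which(birth %in% x)    # positions 1, 3 and 5
```

With the real data, subset(dat, birth %in% as.Date(x), select = c(id, name, birth)) should return the same six rows as Method 1; converting x back to Date keeps both sides of %in% the same type.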