How to change the row index after sampling an R data frame?
Updated on 2025/4/16 1:22:17
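When we draw a random sample of rows from an R data frame, each sampled row keeps the row name (row number) it had in the original data frame, so the sample's row labels are out of order and non-consecutive. This can cause confusion during analysis, especially when we need to refer to rows by number. To avoid it, we can renumber the rows of the sample from 1 to the number of rows in the sample.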
Example
Consider the following data frame:
> set.seed(111)
> x1 <- rnorm(20, 1.5)
> x2 <- rnorm(20, 2.5)
> x3 <- rnorm(20, 3)
> df1 <- data.frame(x1, x2, x3)
> df1
Output
             x1         x2       x3
1   1.735220712  2.8616625 1.824274
2   1.169264128  2.8469644 1.878784
3   1.188376176  2.6897365 1.638096
4  -0.802345658  2.3404232 3.481125
5   1.329123955  2.8265492 3.741972
6   1.640278225  3.0982542 3.027825
7   0.002573344  0.6584657 3.331380
8   0.489811581  5.2180556 3.644114
9   0.551524395  2.6912444 5.485662
10  1.006037783  1.1987039 4.959982
11  1.326325872 -0.6132173 3.191663
12  1.093401220  1.5586426 4.552544
13  3.345636264  3.9002588 3.914242
14  1.894054110  0.8795300 3.358625
15  2.297528501  0.2340040 3.175096
16 -0.066665360  3.6629936 2.152732
17  1.414148991  2.3838450 3.978232
18  1.140860519  2.8342560 4.805868
19  0.306391033  1.8791419 3.122915
20  1.864186737  1.1901551 2.870228
Create a sample of size 5 from df1:
> df1_sample <- df1[sample(nrow(df1), 5), ]
> df1_sample
Output
         x1       x2       x3
18 1.140861 2.834256 4.805868
6  1.640278 3.098254 3.027825
13 3.345636 3.900259 3.914242
5  1.329124 2.826549 3.741972
15 2.297529 0.234004 3.175096
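To see why the retained row names can mislead, recall that R treats a bare number inside `[ ]` as a row position and a quoted string as a row name. The following is a minimal sketch of that difference. It uses a small hypothetical data frame `d` (not part of the example above) and fixed row positions in place of `sample()`, so the result is reproducible:

```r
# Hypothetical toy data frame; the names d and d_sub are illustrative only.
d <- data.frame(id = 101:106, value = c(2.5, 3.1, 4.7, 1.9, 5.2, 3.8))

# Take rows 3, 1 and 2 in that order, as a random sample might.
d_sub <- d[c(3, 1, 2), ]

rownames(d_sub)      # the original row numbers, kept as character labels
d_sub[1, "id"]       # by position: first row of the subset (original row 3)
d_sub["1", "id"]     # by name: the row labelled "1" (original row 1)
```

Here `d_sub[1, ]` and `d_sub["1", ]` return different rows, which is the kind of mix-up that renumbering the sample avoids.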
Renumber the rows of the sample:
> rownames(df1_sample) <- 1:nrow(df1_sample)
> df1_sample
Output
        x1       x2       x3
1 1.140861 2.834256 4.805868
2 1.640278 3.098254 3.027825
3 3.345636 3.900259 3.914242
4 1.329124 2.826549 3.741972
5 2.297529 0.234004 3.175096
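A common alternative is to assign `NULL` to the row names, which makes R fall back to automatic row names 1, 2, 3, and so on. The sketch below uses a hypothetical toy data frame `d` rather than the data above, with fixed row positions standing in for a random sample:

```r
# Hypothetical toy data frame; the names d and d_sub are illustrative only.
d <- data.frame(id = 101:106, value = c(2.5, 3.1, 4.7, 1.9, 5.2, 3.8))
d_sub <- d[c(3, 1, 2), ]

# Assigning NULL drops the inherited labels and restores automatic row names.
rownames(d_sub) <- NULL

rownames(d_sub)   # consecutive labels starting at "1"
d_sub$id          # the row order is unchanged; only the labels were reset
```

This form does not depend on the number of rows. By contrast, `1:nrow(x)` evaluates to `c(1, 0)` when `x` has no rows, so `NULL` or `seq_len(nrow(x))` may be the safer choice in code that can receive an empty sample.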
Let's look at another example:
Example
> y1 <- runif(20, 2, 5)
> y2 <- runif(20, 3, 5)
> y3 <- runif(20, 5, 10)
> y4 <- runif(20, 5, 12)
> df2 <- data.frame(y1, y2, y3, y4)
> df2
Output
         y1       y2       y3        y4
1  2.881213 4.894022 7.797367  6.487594
2  3.052896 3.223898 7.527572  6.695535
3  2.237543 4.127740 9.864026  8.754048
4  4.475907 4.696651 5.403004  6.239423
5  2.792642 4.023536 7.786222  8.992823
6  2.791539 4.333093 9.480036  6.087904
7  2.271143 3.053019 5.539486  8.320935
8  3.382534 3.212921 7.246406 10.091843
9  4.074728 4.390884 6.544056 10.924127
10 4.546881 3.546689 6.164413 11.710035
11 2.738344 4.489939 9.140333  8.211822
12 3.952763 4.490791 5.564392  7.542578
13 4.040586 3.333465 9.420011 11.554599
14 2.313604 4.959709 8.628101 11.193405
15 2.335957 4.189517 9.601667  9.694433
16 2.646964 4.376438 5.614787 10.929413
17 2.390349 3.343716 9.755718 11.017555
18 3.999001 3.083366 8.348515  8.370818
19 3.463324 3.379700 5.425484  7.219430
20 3.059911 4.522844 7.905784 11.420429
Create a sample of size 7 from df2:
> df2_sample <- df2[sample(nrow(df2), 7), ]
> df2_sample
Output
         y1       y2       y3        y4
20 3.059911 4.522844 7.905784 11.420429
3  2.237543 4.127740 9.864026  8.754048
10 4.546881 3.546689 6.164413 11.710035
12 3.952763 4.490791 5.564392  7.542578
15 2.335957 4.189517 9.601667  9.694433
18 3.999001 3.083366 8.348515  8.370818
5  2.792642 4.023536 7.786222  8.992823
Renumber the rows of the sample:
> rownames(df2_sample) <- 1:nrow(df2_sample)
> df2_sample
Output
        y1       y2       y3        y4
1 3.059911 4.522844 7.905784 11.420429
2 2.237543 4.127740 9.864026  8.754048
3 4.546881 3.546689 6.164413 11.710035
4 3.952763 4.490791 5.564392  7.542578
5 2.335957 4.189517 9.601667  9.694433
6 3.999001 3.083366 8.348515  8.370818
7 2.792642 4.023536 7.786222  8.992823
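Renumbering discards the link between a sampled row and its position in the original data frame. If that link may be needed later, one option is to store the original position in a column before resetting the row names. The sketch below wraps both steps in a function. The function `sample_rows()`, the column `orig_row`, and the data frame `df` are all hypothetical names introduced for illustration; none of them is part of base R or of the examples above.

```r
# sample_rows() is a hypothetical helper, not a base R function.
sample_rows <- function(df, n) {
  idx <- sample(nrow(df), n)       # random row positions, without replacement
  out <- df[idx, , drop = FALSE]   # keep a data frame even with a single column
  out$orig_row <- idx              # record where each row came from
  rownames(out) <- NULL            # renumber the rows 1..n
  out
}

set.seed(111)
df <- data.frame(y1 = runif(20, 2, 5), y2 = runif(20, 3, 5))
s <- sample_rows(df, 7)
s
```

With this layout, `s[3, ]` is the third sampled row, and `df[s$orig_row[3], ]` recovers the same row from the original data frame.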