如何在对 R 数据框进行采样后更改行索引?

r programmingserver side programmingprogramming更新于 2025/4/16 1:22:17

当我们从 R 数据框中随机抽取样本时,样本行的行号与原始数据框中的行号相同,这显然是由于随机化而发生的。但在进行分析时,这可能会造成混淆,尤其是在我们需要使用行的情况下,因此,我们可以将行的索引号转换为从 1 到所选样本中的行数的数字。

示例

考虑下面的数据框 −

> set.seed(111)
> x1<-rnorm(20,1.5)
> x2<-rnorm(20,2.5)
> x3<-rnorm(20,3)
> df1<-data.frame(x1,x2,x3)
> df1

输出

      x1          x2       x3
1 1.735220712 2.8616625 1.824274
2 1.169264128 2.8469644 1.878784
3 1.188376176 2.6897365 1.638096
4 -0.802345658 2.3404232 3.481125
5 1.329123955 2.8265492 3.741972
6 1.640278225 3.0982542 3.027825
7 0.002573344 0.6584657 3.331380
8 0.489811581 5.2180556 3.644114
9 0.551524395 2.6912444 5.485662
10 1.006037783 1.1987039 4.959982
11 1.326325872 -0.6132173 3.191663
12 1.093401220 1.5586426 4.552544
13 3.345636264 3.9002588 3.914242
14 1.894054110 0.8795300 3.358625
15 2.297528501 0.2340040 3.175096
16 -0.066665360 3.6629936 2.152732
17 1.414148991 2.3838450 3.978232
18 1.140860519 2.8342560 4.805868
19 0.306391033 1.8791419 3.122915
20 1.864186737 1.1901551 2.870228

从 df1 创建大小为 5 的样本 −

> df1_sample<-df1[sample(nrow(df1),5),]
> df1_sample

输出

      x1       x2       x3
18 1.140861 2.834256 4.805868
6 1.640278 3.098254 3.027825
13 3.345636 3.900259 3.914242
5 1.329124 2.826549 3.741972
15 2.297529 0.234004 3.175096

重命名样本中的行的索引号 −

> rownames(df1_sample)<-1:nrow(df1_sample)
> df1_sample

输出

      x1       x2       x3
1 1.140861 2.834256 4.805868
2 1.640278 3.098254 3.027825
3 3.345636 3.900259 3.914242
4 1.329124 2.826549 3.741972
5 2.297529 0.234004 3.175096

我们再来看一个例子 −

示例

> y1<-runif(20,2,5)
> y2<-runif(20,3,5)
> y3<-runif(20,5,10)
> y4<-runif(20,5,12)
> df2<-data.frame(y1,y2,y3,y4)
> df2

输出

      y1       y2       y3       y4
1 2.881213 4.894022 7.797367 6.487594
2 3.052896 3.223898 7.527572 6.695535
3 2.237543 4.127740 9.864026 8.754048
4 4.475907 4.696651 5.403004 6.239423
5 2.792642 4.023536 7.786222 8.992823
6 2.791539 4.333093 9.480036 6.087904
7 2.271143 3.053019 5.539486 8.320935
8 3.382534 3.212921 7.246406 10.091843
9 4.074728 4.390884 6.544056 10.924127
10 4.546881 3.546689 6.164413 11.710035
11 2.738344 4.489939 9.140333 8.211822
12 3.952763 4.490791 5.564392 7.542578
13 4.040586 3.333465 9.420011 11.554599
14 2.313604 4.959709 8.628101 11.193405
15 2.335957 4.189517 9.601667 9.694433
16 2.646964 4.376438 5.614787 10.929413
17 2.390349 3.343716 9.755718 11.017555
18 3.999001 3.083366 8.348515 8.370818
19 3.463324 3.379700 5.425484 7.219430
20 3.059911 4.522844 7.905784 11.420429
> df2_sample<-df2[sample(nrow(df2),7),]
> df2_sample

输出

      y1       y2       y3       y4
20 3.059911 4.522844 7.905784 11.420429
3 2.237543 4.127740 9.864026 8.754048
10 4.546881 3.546689 6.164413 11.710035
12 3.952763 4.490791 5.564392 7.542578
15 2.335957 4.189517 9.601667 9.694433
18 3.999001 3.083366 8.348515 8.370818
5 2.792642 4.023536 7.786222 8.992823
> rownames(df2_sample)<-1:nrow(df2_sample)
> df2_sample

输出

      y1       y2       y3       y4
1 3.059911 4.522844 7.905784 11.420429
2 2.237543 4.127740 9.864026 8.754048
3 4.546881 3.546689 6.164413 11.710035
4 3.952763 4.490791 5.564392 7.542578
5 2.335957 4.189517 9.601667 9.694433
6 3.999001 3.083366 8.348515 8.370818
7 2.792642 4.023536 7.786222 8.992823

相关文章