R的資料處理

許sir

本小節主要在教以下三個方法

  • 重新編碼
  • 資料變形
  • 資料合併與分割

[PART 1]. 重新編碼

分析資料前常常需要再次整理資料,方便日後做分析,整理資料第一步往往是將資料的調整值經過一些調整

In [1]:
library(dplyr)
Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

In [2]:
data <- iris # 使用 R 內建的資料。

head(data)
Sepal.LengthSepal.WidthPetal.LengthPetal.WidthSpecies
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
4.6 3.1 1.5 0.2 setosa
5.0 3.6 1.4 0.2 setosa
5.4 3.9 1.7 0.4 setosa
In [3]:
data$Sepal.Length <- ifelse(data$Sepal.Length > 5, 1,2) # Sepal.Length 如果大於 5 會變成 1,不會就會變成 2
data$Species <- ifelse(data$Species %in% "setosa", "IsSetosa","Notsetosa") # %in% 代表有包含到的概念

head(data)
Sepal.LengthSepal.WidthPetal.LengthPetal.WidthSpecies
1 3.5 1.4 0.2 IsSetosa
2 3.0 1.4 0.2 IsSetosa
2 3.2 1.3 0.2 IsSetosa
2 3.1 1.5 0.2 IsSetosa
2 3.6 1.4 0.2 IsSetosa
1 3.9 1.7 0.4 IsSetosa

[PART 2]. 資料篩選 分割

2-1.資料篩選

In [4]:
data <- iris # 使用 R 內建的資料。
In [5]:
iris[1:3,] ##第1~3列Row,所有的行Column
iris[,"Species"] ##所有的列Row,名稱為Species的行Column
Sepal.LengthSepal.WidthPetal.LengthPetal.WidthSpecies
5.1 3.5 1.4 0.2 setosa
4.9 3.0 1.4 0.2 setosa
4.7 3.2 1.3 0.2 setosa
  1. setosa
  2. setosa
  3. setosa
  4. setosa
  5. setosa
  6. setosa
  7. setosa
  8. setosa
  9. setosa
  10. setosa
  11. setosa
  12. setosa
  13. setosa
  14. setosa
  15. setosa
  16. setosa
  17. setosa
  18. setosa
  19. setosa
  20. setosa
  21. setosa
  22. setosa
  23. setosa
  24. setosa
  25. setosa
  26. setosa
  27. setosa
  28. setosa
  29. setosa
  30. setosa
  31. setosa
  32. setosa
  33. setosa
  34. setosa
  35. setosa
  36. setosa
  37. setosa
  38. setosa
  39. setosa
  40. setosa
  41. setosa
  42. setosa
  43. setosa
  44. setosa
  45. setosa
  46. setosa
  47. setosa
  48. setosa
  49. setosa
  50. setosa
  51. versicolor
  52. versicolor
  53. versicolor
  54. versicolor
  55. versicolor
  56. versicolor
  57. versicolor
  58. versicolor
  59. versicolor
  60. versicolor
  61. versicolor
  62. versicolor
  63. versicolor
  64. versicolor
  65. versicolor
  66. versicolor
  67. versicolor
  68. versicolor
  69. versicolor
  70. versicolor
  71. versicolor
  72. versicolor
  73. versicolor
  74. versicolor
  75. versicolor
  76. versicolor
  77. versicolor
  78. versicolor
  79. versicolor
  80. versicolor
  81. versicolor
  82. versicolor
  83. versicolor
  84. versicolor
  85. versicolor
  86. versicolor
  87. versicolor
  88. versicolor
  89. versicolor
  90. versicolor
  91. versicolor
  92. versicolor
  93. versicolor
  94. versicolor
  95. versicolor
  96. versicolor
  97. versicolor
  98. versicolor
  99. versicolor
  100. versicolor
  101. virginica
  102. virginica
  103. virginica
  104. virginica
  105. virginica
  106. virginica
  107. virginica
  108. virginica
  109. virginica
  110. virginica
  111. virginica
  112. virginica
  113. virginica
  114. virginica
  115. virginica
  116. virginica
  117. virginica
  118. virginica
  119. virginica
  120. virginica
  121. virginica
  122. virginica
  123. virginica
  124. virginica
  125. virginica
  126. virginica
  127. virginica
  128. virginica
  129. virginica
  130. virginica
  131. virginica
  132. virginica
  133. virginica
  134. virginica
  135. virginica
  136. virginica
  137. virginica
  138. virginica
  139. virginica
  140. virginica
  141. virginica
  142. virginica
  143. virginica
  144. virginica
  145. virginica
  146. virginica
  147. virginica
  148. virginica
  149. virginica
  150. virginica
Levels:
  1. 'setosa'
  2. 'versicolor'
  3. 'virginica'
In [6]:
iris[1:10,c(T,F,T,F,T)] ##第1~10列Row,第1,3,5行Column (TRUE)
Sepal.LengthPetal.LengthSpecies
5.1 1.4 setosa
4.9 1.4 setosa
4.7 1.3 setosa
4.6 1.5 setosa
5.0 1.4 setosa
5.4 1.7 setosa
4.6 1.4 setosa
5.0 1.5 setosa
4.4 1.4 setosa
4.9 1.5 setosa
In [7]:
# Row的篩選可使用subset()函數
subset(iris,Species=="virginica") ##Species等於"virginica"的列Row,所有的行Column
Sepal.LengthSepal.WidthPetal.LengthPetal.WidthSpecies
1016.3 3.3 6.0 2.5 virginica
1025.8 2.7 5.1 1.9 virginica
1037.1 3.0 5.9 2.1 virginica
1046.3 2.9 5.6 1.8 virginica
1056.5 3.0 5.8 2.2 virginica
1067.6 3.0 6.6 2.1 virginica
1074.9 2.5 4.5 1.7 virginica
1087.3 2.9 6.3 1.8 virginica
1096.7 2.5 5.8 1.8 virginica
1107.2 3.6 6.1 2.5 virginica
1116.5 3.2 5.1 2.0 virginica
1126.4 2.7 5.3 1.9 virginica
1136.8 3.0 5.5 2.1 virginica
1145.7 2.5 5.0 2.0 virginica
1155.8 2.8 5.1 2.4 virginica
1166.4 3.2 5.3 2.3 virginica
1176.5 3.0 5.5 1.8 virginica
1187.7 3.8 6.7 2.2 virginica
1197.7 2.6 6.9 2.3 virginica
1206.0 2.2 5.0 1.5 virginica
1216.9 3.2 5.7 2.3 virginica
1225.6 2.8 4.9 2.0 virginica
1237.7 2.8 6.7 2.0 virginica
1246.3 2.7 4.9 1.8 virginica
1256.7 3.3 5.7 2.1 virginica
1267.2 3.2 6.0 1.8 virginica
1276.2 2.8 4.8 1.8 virginica
1286.1 3.0 4.9 1.8 virginica
1296.4 2.8 5.6 2.1 virginica
1307.2 3.0 5.8 1.6 virginica
1317.4 2.8 6.1 1.9 virginica
1327.9 3.8 6.4 2.0 virginica
1336.4 2.8 5.6 2.2 virginica
1346.3 2.8 5.1 1.5 virginica
1356.1 2.6 5.6 1.4 virginica
1367.7 3.0 6.1 2.3 virginica
1376.3 3.4 5.6 2.4 virginica
1386.4 3.1 5.5 1.8 virginica
1396.0 3.0 4.8 1.8 virginica
1406.9 3.1 5.4 2.1 virginica
1416.7 3.1 5.6 2.4 virginica
1426.9 3.1 5.1 2.3 virginica
1435.8 2.7 5.1 1.9 virginica
1446.8 3.2 5.9 2.3 virginica
1456.7 3.3 5.7 2.5 virginica
1466.7 3.0 5.2 2.3 virginica
1476.3 2.5 5.0 1.9 virginica
1486.5 3.0 5.2 2.0 virginica
1496.2 3.4 5.4 2.3 virginica
1505.9 3.0 5.1 1.8 virginica

Row的篩選也可搭配字串搜尋函數grepl()

In [8]:
knitr::kable(iris[grepl("color",iris$Species),]) ##Species包含"color"的列,所有的行

|    | Sepal.Length| Sepal.Width| Petal.Length| Petal.Width|Species    |
|:---|------------:|-----------:|------------:|-----------:|:----------|
|51  |          7.0|         3.2|          4.7|         1.4|versicolor |
|52  |          6.4|         3.2|          4.5|         1.5|versicolor |
|53  |          6.9|         3.1|          4.9|         1.5|versicolor |
|54  |          5.5|         2.3|          4.0|         1.3|versicolor |
|55  |          6.5|         2.8|          4.6|         1.5|versicolor |
|56  |          5.7|         2.8|          4.5|         1.3|versicolor |
|57  |          6.3|         3.3|          4.7|         1.6|versicolor |
|58  |          4.9|         2.4|          3.3|         1.0|versicolor |
|59  |          6.6|         2.9|          4.6|         1.3|versicolor |
|60  |          5.2|         2.7|          3.9|         1.4|versicolor |
|61  |          5.0|         2.0|          3.5|         1.0|versicolor |
|62  |          5.9|         3.0|          4.2|         1.5|versicolor |
|63  |          6.0|         2.2|          4.0|         1.0|versicolor |
|64  |          6.1|         2.9|          4.7|         1.4|versicolor |
|65  |          5.6|         2.9|          3.6|         1.3|versicolor |
|66  |          6.7|         3.1|          4.4|         1.4|versicolor |
|67  |          5.6|         3.0|          4.5|         1.5|versicolor |
|68  |          5.8|         2.7|          4.1|         1.0|versicolor |
|69  |          6.2|         2.2|          4.5|         1.5|versicolor |
|70  |          5.6|         2.5|          3.9|         1.1|versicolor |
|71  |          5.9|         3.2|          4.8|         1.8|versicolor |
|72  |          6.1|         2.8|          4.0|         1.3|versicolor |
|73  |          6.3|         2.5|          4.9|         1.5|versicolor |
|74  |          6.1|         2.8|          4.7|         1.2|versicolor |
|75  |          6.4|         2.9|          4.3|         1.3|versicolor |
|76  |          6.6|         3.0|          4.4|         1.4|versicolor |
|77  |          6.8|         2.8|          4.8|         1.4|versicolor |
|78  |          6.7|         3.0|          5.0|         1.7|versicolor |
|79  |          6.0|         2.9|          4.5|         1.5|versicolor |
|80  |          5.7|         2.6|          3.5|         1.0|versicolor |
|81  |          5.5|         2.4|          3.8|         1.1|versicolor |
|82  |          5.5|         2.4|          3.7|         1.0|versicolor |
|83  |          5.8|         2.7|          3.9|         1.2|versicolor |
|84  |          6.0|         2.7|          5.1|         1.6|versicolor |
|85  |          5.4|         3.0|          4.5|         1.5|versicolor |
|86  |          6.0|         3.4|          4.5|         1.6|versicolor |
|87  |          6.7|         3.1|          4.7|         1.5|versicolor |
|88  |          6.3|         2.3|          4.4|         1.3|versicolor |
|89  |          5.6|         3.0|          4.1|         1.3|versicolor |
|90  |          5.5|         2.5|          4.0|         1.3|versicolor |
|91  |          5.5|         2.6|          4.4|         1.2|versicolor |
|92  |          6.1|         3.0|          4.6|         1.4|versicolor |
|93  |          5.8|         2.6|          4.0|         1.2|versicolor |
|94  |          5.0|         2.3|          3.3|         1.0|versicolor |
|95  |          5.6|         2.7|          4.2|         1.3|versicolor |
|96  |          5.7|         3.0|          4.2|         1.2|versicolor |
|97  |          5.7|         2.9|          4.2|         1.3|versicolor |
|98  |          6.2|         2.9|          4.3|         1.3|versicolor |
|99  |          5.1|         2.5|          3.0|         1.1|versicolor |
|100 |          5.7|         2.8|          4.1|         1.3|versicolor |

2-2. 資料分割

使用subset()函式進行資料分割

In [9]:
subset(data, Sepal.Length > 5) # 只會出現 Sepal.Length > 5 的資料
Sepal.LengthSepal.WidthPetal.LengthPetal.WidthSpecies
15.1 3.5 1.4 0.2 setosa
65.4 3.9 1.7 0.4 setosa
115.4 3.7 1.5 0.2 setosa
155.8 4.0 1.2 0.2 setosa
165.7 4.4 1.5 0.4 setosa
175.4 3.9 1.3 0.4 setosa
185.1 3.5 1.4 0.3 setosa
195.7 3.8 1.7 0.3 setosa
205.1 3.8 1.5 0.3 setosa
215.4 3.4 1.7 0.2 setosa
225.1 3.7 1.5 0.4 setosa
245.1 3.3 1.7 0.5 setosa
285.2 3.5 1.5 0.2 setosa
295.2 3.4 1.4 0.2 setosa
325.4 3.4 1.5 0.4 setosa
335.2 4.1 1.5 0.1 setosa
345.5 4.2 1.4 0.2 setosa
375.5 3.5 1.3 0.2 setosa
405.1 3.4 1.5 0.2 setosa
455.1 3.8 1.9 0.4 setosa
475.1 3.8 1.6 0.2 setosa
495.3 3.7 1.5 0.2 setosa
517.0 3.2 4.7 1.4 versicolor
526.4 3.2 4.5 1.5 versicolor
536.9 3.1 4.9 1.5 versicolor
545.5 2.3 4.0 1.3 versicolor
556.5 2.8 4.6 1.5 versicolor
565.7 2.8 4.5 1.3 versicolor
576.3 3.3 4.7 1.6 versicolor
596.6 2.9 4.6 1.3 versicolor
1216.9 3.2 5.7 2.3 virginica
1225.6 2.8 4.9 2.0 virginica
1237.7 2.8 6.7 2.0 virginica
1246.3 2.7 4.9 1.8 virginica
1256.7 3.3 5.7 2.1 virginica
1267.2 3.2 6.0 1.8 virginica
1276.2 2.8 4.8 1.8 virginica
1286.1 3.0 4.9 1.8 virginica
1296.4 2.8 5.6 2.1 virginica
1307.2 3.0 5.8 1.6 virginica
1317.4 2.8 6.1 1.9 virginica
1327.9 3.8 6.4 2.0 virginica
1336.4 2.8 5.6 2.2 virginica
1346.3 2.8 5.1 1.5 virginica
1356.1 2.6 5.6 1.4 virginica
1367.7 3.0 6.1 2.3 virginica
1376.3 3.4 5.6 2.4 virginica
1386.4 3.1 5.5 1.8 virginica
1396.0 3.0 4.8 1.8 virginica
1406.9 3.1 5.4 2.1 virginica
1416.7 3.1 5.6 2.4 virginica
1426.9 3.1 5.1 2.3 virginica
1435.8 2.7 5.1 1.9 virginica
1446.8 3.2 5.9 2.3 virginica
1456.7 3.3 5.7 2.5 virginica
1466.7 3.0 5.2 2.3 virginica
1476.3 2.5 5.0 1.9 virginica
1486.5 3.0 5.2 2.0 virginica
1496.2 3.4 5.4 2.3 virginica
1505.9 3.0 5.1 1.8 virginica
In [10]:
subset(data, Sepal.Length == 5,select = c("Sepal.Length","Species")) # 只會出現 Sepal.Length 等於 5 的資料,且欄位只會出現 Sepal.Length 和 Species
Sepal.LengthSpecies
55 setosa
85 setosa
265 setosa
275 setosa
365 setosa
415 setosa
445 setosa
505 setosa
615 versicolor
945 versicolor
In [57]:
subset(data, Sepal.Length > 5,select = - Sepal.Length) # selct = 負的代表不要出現的欄位
Sepal.WidthPetal.LengthPetal.WidthSpecies
13.5 1.4 0.2 setosa
63.9 1.7 0.4 setosa
113.7 1.5 0.2 setosa
154.0 1.2 0.2 setosa
164.4 1.5 0.4 setosa
173.9 1.3 0.4 setosa
183.5 1.4 0.3 setosa
193.8 1.7 0.3 setosa
203.8 1.5 0.3 setosa
213.4 1.7 0.2 setosa
223.7 1.5 0.4 setosa
243.3 1.7 0.5 setosa
283.5 1.5 0.2 setosa
293.4 1.4 0.2 setosa
323.4 1.5 0.4 setosa
334.1 1.5 0.1 setosa
344.2 1.4 0.2 setosa
373.5 1.3 0.2 setosa
403.4 1.5 0.2 setosa
453.8 1.9 0.4 setosa
473.8 1.6 0.2 setosa
493.7 1.5 0.2 setosa
513.2 4.7 1.4 versicolor
523.2 4.5 1.5 versicolor
533.1 4.9 1.5 versicolor
542.3 4.0 1.3 versicolor
552.8 4.6 1.5 versicolor
562.8 4.5 1.3 versicolor
573.3 4.7 1.6 versicolor
592.9 4.6 1.3 versicolor
1213.2 5.7 2.3 virginica
1222.8 4.9 2.0 virginica
1232.8 6.7 2.0 virginica
1242.7 4.9 1.8 virginica
1253.3 5.7 2.1 virginica
1263.2 6.0 1.8 virginica
1272.8 4.8 1.8 virginica
1283.0 4.9 1.8 virginica
1292.8 5.6 2.1 virginica
1303.0 5.8 1.6 virginica
1312.8 6.1 1.9 virginica
1323.8 6.4 2.0 virginica
1332.8 5.6 2.2 virginica
1342.8 5.1 1.5 virginica
1352.6 5.6 1.4 virginica
1363.0 6.1 2.3 virginica
1373.4 5.6 2.4 virginica
1383.1 5.5 1.8 virginica
1393.0 4.8 1.8 virginica
1403.1 5.4 2.1 virginica
1413.1 5.6 2.4 virginica
1423.1 5.1 2.3 virginica
1432.7 5.1 1.9 virginica
1443.2 5.9 2.3 virginica
1453.3 5.7 2.5 virginica
1463.0 5.2 2.3 virginica
1472.5 5.0 1.9 virginica
1483.0 5.2 2.0 virginica
1493.4 5.4 2.3 virginica
1503.0 5.1 1.8 virginica

[PART 3]. 排序: 使用order()函數

如需對資料框做排序,可使用order()函數,order()函數可回傳由小到大之元素位置

3-1. 由小排到大

In [11]:
order(iris$Sepal.Length)

#回傳的第一個位置為14,表示iris$Sepal.Length中,數值最小的元素為第14個元素。
  1. 14
  2. 9
  3. 39
  4. 43
  5. 42
  6. 4
  7. 7
  8. 23
  9. 48
  10. 3
  11. 30
  12. 12
  13. 13
  14. 25
  15. 31
  16. 46
  17. 2
  18. 10
  19. 35
  20. 38
  21. 58
  22. 107
  23. 5
  24. 8
  25. 26
  26. 27
  27. 36
  28. 41
  29. 44
  30. 50
  31. 61
  32. 94
  33. 1
  34. 18
  35. 20
  36. 22
  37. 24
  38. 40
  39. 45
  40. 47
  41. 99
  42. 28
  43. 29
  44. 33
  45. 60
  46. 49
  47. 6
  48. 11
  49. 17
  50. 21
  51. 32
  52. 85
  53. 34
  54. 37
  55. 54
  56. 81
  57. 82
  58. 90
  59. 91
  60. 65
  61. 67
  62. 70
  63. 89
  64. 95
  65. 122
  66. 16
  67. 19
  68. 56
  69. 80
  70. 96
  71. 97
  72. 100
  73. 114
  74. 15
  75. 68
  76. 83
  77. 93
  78. 102
  79. 115
  80. 143
  81. 62
  82. 71
  83. 150
  84. 63
  85. 79
  86. 84
  87. 86
  88. 120
  89. 139
  90. 64
  91. 72
  92. 74
  93. 92
  94. 128
  95. 135
  96. 69
  97. 98
  98. 127
  99. 149
  100. 57
  101. 73
  102. 88
  103. 101
  104. 104
  105. 124
  106. 134
  107. 137
  108. 147
  109. 52
  110. 75
  111. 112
  112. 116
  113. 129
  114. 133
  115. 138
  116. 55
  117. 105
  118. 111
  119. 117
  120. 148
  121. 59
  122. 76
  123. 66
  124. 78
  125. 87
  126. 109
  127. 125
  128. 141
  129. 145
  130. 146
  131. 77
  132. 113
  133. 144
  134. 53
  135. 121
  136. 140
  137. 142
  138. 51
  139. 103
  140. 110
  141. 126
  142. 130
  143. 108
  144. 131
  145. 106
  146. 118
  147. 119
  148. 123
  149. 136
  150. 132
In [12]:
iris$Sepal.Length[14]
4.3
In [13]:
head(iris[order(iris$Sepal.Length),]) ##依照Sepal.Length欄位數值大小排序後的前六筆資料
Sepal.LengthSepal.WidthPetal.LengthPetal.WidthSpecies
144.3 3.0 1.1 0.1 setosa
94.4 2.9 1.4 0.2 setosa
394.4 3.0 1.3 0.2 setosa
434.4 3.2 1.3 0.2 setosa
424.5 2.3 1.3 0.3 setosa
44.6 3.1 1.5 0.2 setosa

3-2. 由大排到小

  • 若將decreasing參數設定為TRUE,則會回傳由大到小的元素位置,
  • 以iris$Sepal.Length為例,回傳的第一個位置為132
In [14]:
order(iris$Sepal.Length,decreasing = T)

#表示iris$Sepal.Length中,數值最大的元素為第132個元素。
  1. 132
  2. 118
  3. 119
  4. 123
  5. 136
  6. 106
  7. 131
  8. 108
  9. 110
  10. 126
  11. 130
  12. 103
  13. 51
  14. 53
  15. 121
  16. 140
  17. 142
  18. 77
  19. 113
  20. 144
  21. 66
  22. 78
  23. 87
  24. 109
  25. 125
  26. 141
  27. 145
  28. 146
  29. 59
  30. 76
  31. 55
  32. 105
  33. 111
  34. 117
  35. 148
  36. 52
  37. 75
  38. 112
  39. 116
  40. 129
  41. 133
  42. 138
  43. 57
  44. 73
  45. 88
  46. 101
  47. 104
  48. 124
  49. 134
  50. 137
  51. 147
  52. 69
  53. 98
  54. 127
  55. 149
  56. 64
  57. 72
  58. 74
  59. 92
  60. 128
  61. 135
  62. 63
  63. 79
  64. 84
  65. 86
  66. 120
  67. 139
  68. 62
  69. 71
  70. 150
  71. 15
  72. 68
  73. 83
  74. 93
  75. 102
  76. 115
  77. 143
  78. 16
  79. 19
  80. 56
  81. 80
  82. 96
  83. 97
  84. 100
  85. 114
  86. 65
  87. 67
  88. 70
  89. 89
  90. 95
  91. 122
  92. 34
  93. 37
  94. 54
  95. 81
  96. 82
  97. 90
  98. 91
  99. 6
  100. 11
  101. 17
  102. 21
  103. 32
  104. 85
  105. 49
  106. 28
  107. 29
  108. 33
  109. 60
  110. 1
  111. 18
  112. 20
  113. 22
  114. 24
  115. 40
  116. 45
  117. 47
  118. 99
  119. 5
  120. 8
  121. 26
  122. 27
  123. 36
  124. 41
  125. 44
  126. 50
  127. 61
  128. 94
  129. 2
  130. 10
  131. 35
  132. 38
  133. 58
  134. 107
  135. 12
  136. 13
  137. 25
  138. 31
  139. 46
  140. 3
  141. 30
  142. 4
  143. 7
  144. 23
  145. 48
  146. 42
  147. 9
  148. 39
  149. 43
  150. 14
In [15]:
iris$Sepal.Length[132]
7.9
In [16]:
head(iris[order(iris$Sepal.Length,decreasing = T),]) ##改為由大到小排序的前六筆資料
Sepal.LengthSepal.WidthPetal.LengthPetal.WidthSpecies
1327.9 3.8 6.4 2.0 virginica
1187.7 3.8 6.7 2.2 virginica
1197.7 2.6 6.9 2.3 virginica
1237.7 2.8 6.7 2.0 virginica
1367.7 3.0 6.1 2.3 virginica
1067.6 3.0 6.6 2.1 virginica

[PART 4].資料組合: cbird 和 rbind的應用

  • Row 列的組合 rbind()
  • Column 行的組合 cbind()

4-1. rbind的應用

In [17]:
irisAdd<-rbind(iris, #資料框
      c(1,1,1,1,"versicolor")  #在最尾巴新增一列
      ) 

tail(irisAdd)
Sepal.LengthSepal.WidthPetal.LengthPetal.WidthSpecies
1466.7 3 5.2 2.3 virginica
1476.3 2.5 5 1.9 virginica
1486.5 3 5.2 2 virginica
1496.2 3.4 5.4 2.3 virginica
1505.9 3 5.1 1.8 virginica
1511 1 1 1 versicolor

4-2. cbind的應用

In [18]:
irisAdd<-cbind(iris, #資料框
      rep("Add",nrow(iris))  #新增一行
      ) 

tail(irisAdd)
Sepal.LengthSepal.WidthPetal.LengthPetal.WidthSpeciesrep("Add", nrow(iris))
1456.7 3.3 5.7 2.5 virginicaAdd
1466.7 3.0 5.2 2.3 virginicaAdd
1476.3 2.5 5.0 1.9 virginicaAdd
1486.5 3.0 5.2 2.0 virginicaAdd
1496.2 3.4 5.4 2.3 virginicaAdd
1505.9 3.0 5.1 1.8 virginicaAdd

[PART 5]. 長表與寬表轉換

In [21]:
library(reshape2)
library("tidyr")
Attaching package: ‘tidyr’

The following object is masked from ‘package:reshape2’:

    smiths

In [34]:
#隨便建一個資料集

mydata<-data.frame(
       Name = c("蘋果","谷歌","臉書","亞馬遜","騰訊"),
       Company = c("Apple","Google","Facebook","Amozon","Tencent"),
       Sale2013 = c(5000,3500,2300,2100,3100),
       Sale2014 = c(5050,3800,2900,2500,3300),
       Sale2015 = c(5050,3800,2900,2500,3300),
       Sale2016 = c(5050,3800,2900,2500,3300)
       )

mydata
NameCompanySale2013Sale2014Sale2015Sale2016
蘋果 Apple 5000 5050 5050 5050
谷歌 Google 3500 3800 3800 3800
臉書 Facebook2300 2900 2900 2900
亞馬遜 Amozon 2100 2500 2500 2500
騰訊 Tencent 3100 3300 3300 3300

寬表轉長表 melt(資料框/寬表,id.vars=需要保留的欄位)

長表轉寬表 dcast(資料框/長表,寬表分列依據~分欄位依據)

In [36]:
mydata1<-melt(
       mydata,                       #待轉換的數據集名稱
       id.vars=c("Company","Name"),  #要保留的主字段
       variable.name="Year",         #轉換後的分類字段名稱(維度)
       value.name="Sale"             #轉換後的度量值名稱
       )
mydata1
CompanyNameYearSale
Apple 蘋果 Sale20135000
Google 谷歌 Sale20133500
Facebook臉書 Sale20132300
Amozon 亞馬遜 Sale20132100
Tencent 騰訊 Sale20133100
Apple 蘋果 Sale20145050
Google 谷歌 Sale20143800
Facebook臉書 Sale20142900
Amozon 亞馬遜 Sale20142500
Tencent 騰訊 Sale20143300
Apple 蘋果 Sale20155050
Google 谷歌 Sale20153800
Facebook臉書 Sale20152900
Amozon 亞馬遜 Sale20152500
Tencent 騰訊 Sale20153300
Apple 蘋果 Sale20165050
Google 谷歌 Sale20163800
Facebook臉書 Sale20162900
Amozon 亞馬遜 Sale20162500
Tencent 騰訊 Sale20163300
In [38]:
spread(
   data=mydata1,   #帶轉換長數據框名稱
   key=Year,     #帶擴寬的類別變量(編程添加列名稱)  
   value=Sale)   #帶擴寬的度量值 (編程添加列度量值)
CompanyNameSale2013Sale2014Sale2015Sale2016
Amozon 亞馬遜 2100 2500 2500 2500
Apple 蘋果 5000 5050 5050 5050
Facebook臉書 2300 2900 2900 2900
Google 谷歌 3500 3800 3800 3800
Tencent 騰訊 3100 3300 3300 3300