Data description
1. This unit covers only market basket association analysis.
2. The data set has 786 records and 15 fields.
[Set up the required libraries and load the data]
setwd("/home/m600/Working Area/Rdata Practice/Customer Course/shopping list")
library(arules)
shopping=read.table("./Shopping.txt",header=TRUE, sep=",")
[Part 1].Data-ETL
1-1. Get an overview of the data set
head(shopping)
## Ready.made Frozen.foods Alcohol Fresh.Vegetables Milk Bakery.goods
## 1 1 0 0 0 0 0
## 2 1 0 0 0 0 0
## 3 1 0 0 0 0 0
## 4 1 0 0 0 1 1
## 5 1 0 0 0 0 0
## 6 1 0 0 0 0 1
## Fresh.meat Toiletries Snacks Tinned.Goods GENDER Age MARITAL
## 1 0 0 1 0 Female 18 to 30 Widowed
## 2 0 1 0 0 Female 18 to 30 Separated
## 3 0 1 1 0 Male 18 to 30 Single
## 4 0 0 0 0 Female 18 to 30 Widowed
## 5 0 0 0 0 Female 18 to 30 Separated
## 6 0 0 1 1 Male 18 to 30 Single
## CHILDREN WORKING
## 1 No Yes
## 2 No Yes
## 3 No Yes
## 4 No Yes
## 5 No Yes
## 6 No No
shopping=shopping[,1:10]
shopping=na.exclude(shopping)
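- There are 786 records in total.
- 85 records bought both Milk and Frozen foods.
- 337 records bought Bakery goods.
- 71 records bought Milk and Frozen foods and also bought Bakery goods.
- 14 records bought Milk and Frozen foods but did not buy Bakery goods.
- Consequent (rhs in R): Bakery goods
- Antecedent (lhs in R): Milk and Frozen foods
- Instances: 85, the number of records that match the antecedent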
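The counts in the list above can be obtained with logical indexing on the 0/1 columns. The sketch below uses a small made-up data frame (`toy` and its six rows are hypothetical, not the Shopping data), so it only illustrates the counting pattern; the column names follow the ones shown by `head(shopping)`.

```r
# Hypothetical toy data: 1 = bought, 0 = not bought (NOT the real Shopping data)
toy = data.frame(
  Milk         = c(1, 1, 1, 0, 1, 0),
  Frozen.foods = c(1, 1, 0, 1, 1, 0),
  Bakery.goods = c(1, 0, 1, 1, 1, 0)
)

# Antecedent (lhs): bought Milk AND Frozen foods
lhs = toy$Milk == 1 & toy$Frozen.foods == 1
# Consequent (rhs): bought Bakery goods
rhs = toy$Bakery.goods == 1

n_total    = nrow(toy)         # all records
n_lhs      = sum(lhs)          # records matching the antecedent ("instances")
n_rhs      = sum(rhs)          # records matching the consequent
n_both     = sum(lhs & rhs)    # antecedent and consequent together
n_lhs_only = sum(lhs & !rhs)   # antecedent without the consequent

print(c(total = n_total, lhs = n_lhs, rhs = n_rhs,
        both = n_both, lhs_only = n_lhs_only))
```

Replacing `toy` with the `shopping` data frame (before it is converted to a matrix) should give the corresponding counts for the real data, assuming the columns are coded 0/1 as in the `head(shopping)` output.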
1-2. Convert to a matrix
shopping=as.matrix(shopping)
[Part 2].Apriori analysis
rule=apriori(shopping,parameter=list(supp=0.2,conf=0.5,maxlen=5),appearance=list(rhs="Alcohol",default="lhs"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport support minlen maxlen
## 0.5 0.1 1 none FALSE TRUE 0.2 1 5
## target ext
## rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 157
##
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[10 item(s), 786 transaction(s)] done [0.00s].
## sorting and recoding items ... [6 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [2 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(head(sort(rule,by="support"),10))
## lhs rhs support confidence lift
## 1 {Frozen.foods} => {Alcohol} 0.2302799 0.5727848 1.452287
## 2 {Bakery.goods} => {Alcohol} 0.2150127 0.5014837 1.271504
inspect(head(sort(rule,by="confidence"),10))
## lhs rhs support confidence lift
## 1 {Frozen.foods} => {Alcohol} 0.2302799 0.5727848 1.452287
## 2 {Bakery.goods} => {Alcohol} 0.2150127 0.5014837 1.271504
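- Support: 10.814% = 85/786, the share of all customers who bought the antecedent items. (The support that R reports is the rule support defined below.)
- Confidence: 83.529% = 71/85, the share of customers who bought the antecedent items and also bought the consequent item.
- Rule support % (support x confidence): 9.033% = 10.814% x 83.529%, or equivalently 71/786, the share of all customers who bought both the antecedent and the consequent items.
- Lift: 1.948 = (71/85) / (337/786), or equivalently 83.529% / 42.875%, the confidence divided by the share of all customers who bought the consequent item.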
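The four measures can be checked with plain arithmetic on the counts given earlier (786, 85, 337 and 71). This is a minimal sketch in base R; the variable names are chosen here for illustration and are not part of arules.

```r
# Counts from the worked example {Milk, Frozen foods} => {Bakery goods}
n_total = 786   # all records
n_lhs   = 85    # bought Milk and Frozen foods (antecedent)
n_rhs   = 337   # bought Bakery goods (consequent)
n_both  = 71    # bought antecedent and consequent together

lhs_support  = n_lhs / n_total               # antecedent support
confidence   = n_both / n_lhs                # P(consequent | antecedent)
rule_support = n_both / n_total              # what arules reports as "support"
lift         = confidence / (n_rhs / n_total)

round(100 * lhs_support, 3)    # 10.814 (%)
round(100 * confidence, 3)     # 83.529 (%)
round(100 * rule_support, 3)   # 9.033 (%)
round(lift, 3)                 # 1.948

# Rule support is also the product of antecedent support and confidence
all.equal(rule_support, lhs_support * confidence)   # TRUE
```

A lift above 1 means the consequent is bought more often among customers who match the antecedent than among customers overall; here it is roughly 1.95 times as often.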