博碩士論文 etd-0629119-131411 詳細資訊
Title page for etd-0629119-131411
論文名稱
Title
利用數據流預測潛在顧客
Prediction of Potential Customers with Data Streams
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
33
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2019-07-09
繳交日期
Date of Submission
2019-07-29
關鍵字
Keywords
查全率(recall)、精確率(precision)、AUC、預測準確率(accuracy)、XGBoost 分類器、梯度提升(gradient boosting)、邏輯斯迴歸(logistic regression)、特徵整合、電商產業、數據流
gradient boosting, logistic regression, XGBoost classification, aggregated features, data streams, e-commerce
中文摘要
對於電商產業來說,預測潛在顧客是重要的議題。本研究對於分析該議題有兩筆資料。第一筆資料來自於Kaggle 競賽『Acquire Valued Shoppers Challenge』,該筆資料含有顧客的購物紀錄及優惠券資訊,大小為22GB,主要目標為預測哪些顧客得到優惠券後仍會回購。第二筆資料來自台灣電商,該筆資料含有顧客的下單交易記錄以及商品相關資訊,主要目標為根據過往購物紀錄預測哪些顧客會再購買新商品。首先,我們會將數據流整合成新的資料矩陣,該矩陣中每列代表資料中每位顧客,特徵來自於整合其他數據流該顧客的過去交易資訊。在預測模型部分,我們使用邏輯斯迴歸(logistic regression),梯度提升(gradient boosting) 以及 XGBoost 分類器。模型準則為AUC、預測準確率(accuracy)、查全率(recall)以及精確率(precision)。預測結果為電商公司提供有效資訊,使其廣告成本預算達目標。
Abstract
Predicting potential customers is an important challenge for e-commerce companies. In this study, we analyze this problem using two datasets. The first dataset, from the Kaggle competition “Acquire Valued Shoppers Challenge”, contains customers' transaction histories and coupon offer information, and is over 22 GB in size. The objective is to predict which customers will remain loyal after a promotional period. The second dataset comes from a local (Taiwanese) e-commerce company and contains customer purchase history and product information. The objective is to predict which customers will buy a new product, given their past purchases. We first transform the data streams into a new data matrix in which rows represent customers and columns represent aggregated features. Feature engineering is based on aggregating each customer's past transactions. The prediction models we consider are logistic regression, gradient boosting, and the XGBoost classifier. To evaluate the models, we use the area under the receiver operating characteristic curve (AUC) and prediction accuracy, together with recall and precision. The results provide useful information for e-commerce companies to keep their advertising budgets in line with promotional and marketing goals.
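The aggregation step and the evaluation criteria named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the record layout `(customer_id, category, amount)`, the three aggregated features, and all function names are hypothetical choices made only for illustration, and the sketch uses plain Python rather than any particular library.

```python
from collections import defaultdict


def aggregate_features(transactions):
    """Collapse a stream of (customer_id, category, amount) records
    into one feature row per customer (hypothetical feature set)."""
    rows = defaultdict(lambda: {"n_purchases": 0, "total_spend": 0.0, "categories": set()})
    for customer_id, category, amount in transactions:
        row = rows[customer_id]
        row["n_purchases"] += 1
        row["total_spend"] += amount
        row["categories"].add(category)
    # Replace the set of categories with its size so every feature is numeric.
    return {
        cid: {
            "n_purchases": r["n_purchases"],
            "total_spend": r["total_spend"],
            "n_categories": len(r["categories"]),
        }
        for cid, r in rows.items()
    }


def confusion_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def auc(y_true, scores):
    """AUC as the probability that a random positive is scored above
    a random negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for this example only.
transactions = [("a", "food", 10.0), ("a", "toys", 5.0), ("b", "food", 2.5)]
features = aggregate_features(transactions)
```

The pairwise AUC shown here is quadratic in the number of customers, so it is only suitable for small examples; on data of the size described above, a rank-based or library implementation would be the more realistic choice.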
目次 Table of Contents
論文審定書 i
論文公開授權書 ii
致謝 iii
摘要 iv
Abstract v
1 Introduction 1
1.1 Background 1
1.2 Structure of Thesis 1
2 Machine Learning Model 1
2.1 Logistic regression 1
2.2 Gradient Boosting 2
2.3 Extreme Gradient Boosting 2
3 Acquire Value Shopper Challenge 4
3.1 Data Description 4
3.2 Feature Engineering 5
3.3 Training set and Test set 7
3.4 Leave-one-offer-out method 8
3.5 Prediction Results 8
4 Local E-commerce Company 10
4.1 Data Description 10
4.2 Feature Engineering 11
4.3 Training set and Test sets 16
4.4 Prediction Results 17
5 Conclusion 23
參考文獻 References
[1] Chen, T., and Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794.
[2] Géron, A. (2017). Hands-on Machine Learning with Scikit-Learn & TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly Media.
[3] James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer.
[4] Breiman, L. (2001). Random forests. Machine Learning 45(1), 5–32. Springer.
[5] Nikulin, V. (2015). On the method for data streams aggregation to predict shoppers loyalty. International Joint Conference on Neural Networks.
[6] Nikulin, V. (2016). Prediction of the shoppers loyalty with aggregated data streams. Journal of Artificial Intelligence and Soft Computing Research 6, 69–79.
[7] Acquire Valued Shoppers Challenge, part 1. Coursera. 2019. https://www.coursera.org/lecture/competitive-data-science/acquire-valued-shoppers-challenge-part-1-nudnt.
[8] Acquire Valued Shoppers Challenge, part 2. Coursera. 2019. https://www.coursera.org/lecture/competitive-data-science/acquire-valued-shoppers-challenge-part-2-b5i7N.
[9] Auduno. Kaggle code for Acquire Valued Shoppers Challenge. https://github.com/auduno/Kaggle-Acquire-Valued-Shoppers-Challenge.
[10] Sanket Doshi. Brief on Recommender Systems. https://towardsdatascience.com/brief-on-recommender-systems-b86a1068a4dd.
[11] XGBoost Documentation - xgboost 0.81 documentation. 2019. https://xgboost.readthedocs.io/en/latest/index.html.
電子全文 Fulltext
論文使用權限 Thesis access permission:自定論文開放時間 user-defined release date
開放時間 Available:
校內 Campus:開放下載的時間 available 2024-07-29
校外 Off-campus:開放下載的時間 available 2024-07-29

紙本論文 Printed copies
開放時間 available 2024-07-29
