Title page for etd-0806101-104235
Title
以資料探勘技術支援資料倉儲設計之研究
Supporting Data Warehouse Design with Data Mining Approach
Department
Year, semester
Language
Degree
Number of pages
75
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2001-07-23
Date of Submission
2001-08-06
Keywords
Hierarchical Agglomerative Clustering Technique, Data Mining, Data Warehouse, Knowledge Discovery, Data Warehouse Design, Star Schema
Statistics
The thesis/dissertation has been browsed 5766 times and downloaded 4988 times.
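Chinese Abstract (English translation)
Traditional relational transaction databases have gradually become inadequate for accessing or querying ever larger volumes of data, which has given rise to the concept of the data warehouse. A data warehouse consolidates, transforms, and integrates transaction data to improve the productivity and decision quality of corporate decision makers, and it supports online analytical processing (OLAP). Data warehouse design, however, is a complex and knowledge-intensive process. To support this process effectively, this research aims to develop a data warehouse design support system based on data mining techniques. More specifically, it focuses on using generalized star schemas as the form of knowledge that supports data warehouse design. The research develops learning and reasoning mechanisms for this form of knowledge and evaluates the proposed techniques empirically. The evaluation results show that the approach effectively improves the quality of data warehouse designs.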
Abstract
The traditional relational database model cannot cope with very large volumes of data within a limited time. To address this limitation, data warehouses and online analytical processing (OLAP) have emerged. Data warehouses improve the productivity of corporate decision makers through the consolidation, conversion, transformation, and integration of operational data, and they support OLAP. Data warehouse design is a complex and knowledge-intensive process. It must consider not only the structure of the underlying operational databases (source-driven) but also the information requirements of decision makers (user-driven). Past research has focused predominantly on supporting the source-driven design process and has paid less attention to the user-driven one. The goal of this research is therefore to propose a user-driven data warehouse design support system based on the knowledge discovery approach. Specifically, a Data Warehouse Design Support System is proposed in which a generalization hierarchy and generalized star schemas serve as the data warehouse design knowledge. Techniques for learning this design knowledge and reasoning over it were developed. An empirical evaluation was conducted to validate the effectiveness of the proposed techniques in supporting the data warehouse design process. The results showed that the techniques were useful in supporting data warehouse design, especially in reducing missing designs and enhancing potentially useful designs.
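The learning and reasoning steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: this record does not give the distance function, the generalization rule, or any parameter values, so `jaccard_distance`, the weight `w_measure`, the union-based `generalize`, the `threshold` value, and the three example schemas are all assumptions made for illustration.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def schema_distance(s, t, w_measure=0.5):
    """Hypothetical distance between two star schemas: a weighted sum of
    the Jaccard distances of their measure sets and dimension sets."""
    return (w_measure * jaccard_distance(s["measures"], t["measures"])
            + (1.0 - w_measure) * jaccard_distance(s["dimensions"], t["dimensions"]))

def generalize(s, t):
    """Assumed generalization rule: the union of both schemas' elements."""
    return {"measures": s["measures"] | t["measures"],
            "dimensions": s["dimensions"] | t["dimensions"]}

def agglomerate(schemas, threshold=0.6):
    """Learning step: repeatedly merge the closest pair of schemas until
    the closest pair is farther apart than `threshold`."""
    clusters = [dict(s) for s in schemas]
    merges = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: schema_distance(clusters[p[0]], clusters[p[1]]))
        d = schema_distance(clusters[i], clusters[j])
        if d > threshold:
            break
        merges.append((clusters[i], clusters[j], d))
        merged = generalize(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges

def suggest(draft, clusters):
    """Reasoning step: find the generalized schema closest to a draft design
    and return the elements the draft does not yet contain."""
    best = min(clusters, key=lambda c: schema_distance(draft, c))
    return {"measures": best["measures"] - draft["measures"],
            "dimensions": best["dimensions"] - draft["dimensions"]}

# Made-up example cases (not taken from the thesis).
sales = {"measures": {"revenue", "quantity"},
         "dimensions": {"time", "product", "store"}}
orders = {"measures": {"revenue", "quantity", "discount"},
          "dimensions": {"time", "product", "customer"}}
hr = {"measures": {"headcount"},
      "dimensions": {"time", "department"}}

clusters, merges = agglomerate([sales, orders, hr])
draft = {"measures": {"revenue"}, "dimensions": {"time", "product"}}
print(suggest(draft, clusters))
```

With these example cases, the sales and orders schemas are merged into one generalized schema while the HR schema stays separate, and the draft design is matched against the merged schema to list candidate measures and dimensions it may be missing. A real system would replace the set-overlap distance with one that reflects the semantics of measures and dimension tables.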
Table of Contents
CHAPTER 1. INTRODUCTION
1.1 Background
1.2 Research Motivation and Objective
1.3 Organization of the Thesis

CHAPTER 2. LITERATURE REVIEW
2.1 Definition of Star Schema
2.1.1 Fact Table
2.1.2 Measure
2.1.3 Dimension Table
2.2 Commonsense Knowledge-based Approach to Database Design

CHAPTER 3. ARCHITECTURE OF DATA WAREHOUSE DESIGN SUPPORT SYSTEM

CHAPTER 4. LEARNING IN DATA WAREHOUSE DESIGN SUPPORT SYSTEM
4.1 Distance Function
4.1.1 Distance of Measures
4.1.2 Distance of Dimension Tables
4.1.3 Distance of Star Schemas
4.2 Generation of Generalized Star Schema
4.3 Learning: Agglomerative Clustering as Generalization Process

CHAPTER 5. REASONING IN DATA WAREHOUSE DESIGN SUPPORT SYSTEM
5.1 Reasoning Process
5.2 Reasoning Algorithm

CHAPTER 6. EMPIRICAL EVALUATION
6.1 Implementation Environment
6.2 Case Collection for Learning
6.3 Parameter Tuning Experiments
6.3.1 Parameter Tuning for Distance Function
6.3.2 Parameter Tuning for Extension Ratio in Learning Algorithm
6.4 Empirical Evaluation
6.4.1 Participant Profile
6.4.2 Cases for Experiment
6.4.3 Grading Criteria
6.4.4 Data Analysis and Results

CHAPTER 7. CONCLUSIONS AND FUTURE RESEARCH DIRECTIONS

REFERENCES

APPENDIX A: DESCRIPTION OF EXPERIMENT TASK

APPENDIX B: DEFINITIONS OF GRADING CRITERIA
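Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as not to violate the law.
Thesis access permission: unrestricted (fully open on and off campus)
Available:
Campus: available
Off-campus: available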


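Printed copies
Public information on printed theses is relatively complete from academic year 102 onward. To look up public information on printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available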
