國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,一個資料儲倉中查詢區域和的遞迴相對前置和方法 ,A Recursive Relative Prefix Sum Approach to Range Queries in Data Warehouses

論文名稱 Title	一個資料儲倉中查詢區域和的遞迴相對前置和方法 A Recursive Relative Prefix Sum Approach to Range Queries in Data Warehouses
系所名稱 Department	資訊工程學系 Department of Computer Science and Engineering
畢業學年期 Year, semester	90 學年度第 2 學期 The spring semester of Academic Year 90	語文別 Language	英文 English
學位類別 Degree	碩士 Master	頁數 Number of pages	101
研究生 Author	吳發榮 Fa-Jung Wu
指導教授 Advisor	張玉盈 Ye-In Chang
召集委員 Convenor	郭大維 Tei-Wei Kuo
口試委員 Advisory Committee	李建億 Chien-I Lee
口試日期 Date of Exam	2002-06-21	繳交日期 Date of Submission	2002-07-07
關鍵字 Keywords	資料方體、資料更新、資料儲倉、加總運算、查詢區域和 data warehouse, aggregation operation, range sum query, update, data cube
統計 Statistics	本論文已被瀏覽 5678 次，被下載 1336 次 The thesis/dissertation has been browsed 5678 times, has been downloaded 1336 times.

中文摘要
資料儲倉(Data Warehouse)提供了整合多個操作型資料庫(Operational Database)的一致性正確歷史資料，因此適合用於資料的分析應用。線上資料分析處理系統(OLAP)進一步提供了分析用的工具，如加總過後的資料，以便從資料儲倉萃取有用的資訊。區域查詢(Range Query)是我們常在線上資料分析處理系統上，對特定的數值維度區域作加總的運算。其中查詢區域和運算對於探索資料庫中不同屬性之間的關聯性，以及在做趨勢分析上非常的有用。藉由預先計算好的輔助資料，前置和方法(Prefix Sum Method)對於在資料方體上作查詢區域和的回應，只需要常數時間。然而，前置和方法對於資料更新的處理效能卻不甚理想。目前一些需要當前資訊的應用，要求的是迅速的回應時間，以及合理的更新處理時間。然而由於資料方體的大小，是和它的維度數量成指數關係，因此重新建立資料方體的成本是很驚人的，而且不切實際。為了處理動態資料方體上的問題，目前有幾種解決方法。這些方法都是利用特殊的資料結構，以額外的空間儲存相關資料，來對查詢區域和運算做快速的回應。例如雙重相對前置和方法(Double Relative Prefix Sum Method)使用三個特殊的資料結構來儲存輔助的資料。雖然雙重相對前置和方法改善了資料更新處理的時間，然而其代價是增加查詢所需的時間。我們在這篇論文裡，提出了一個新的方法，叫做遞迴相對前置和方法(Recursive Relative Prefix Sum Method)；這方法對於資料的查詢與更新處理提供了一個折衷的方式。在遞迴k層的相對前置和方法中，我們使用了一個相對前置陣列，以及k個相對覆蓋值陣列來儲存一些輔助的資料。從效能分析的結果來看，我們所提出的方法在資料更新的處理上，所花的成本小於前置和方法，以及相對前置和方法(Relative Prefix Sum Method)；此外，在大多數的情況下，我們所提出的方法在查詢上所花的成本小於雙重相對前置和方法；而相較於動態資料方體方法(Dynamic Data Cub Method)，我們所提出的方法所需的額外儲存空間較少，並且有較短的查詢回應時間。因此，在我們所提出的遞迴相對前置和方法上，對於查詢區域和運算我們提供了合理的回應時間，並且大幅度的減少了資料更新處理的時間。同時，對於一些應用領域，資料更新的發生頻率在某些區域會小於其他的區域，我們也提供了一個方法─加權相對前置和方法(Weighted Relative Prefix Sum Method)，它是遞迴相對前置和方法的一個延伸。在考慮區域的資料更新發生機率不同的情況下，加權相對前置和方法同樣的對於查詢區域和以及資料更新處理的成本，提供了一個折衷的方案。
Abstract
Data warehouses contain data consolidated from several operational databases and provide the historical, and summarized data which is more appropriate for analysis than detail, individual records. On-Line Analytical Processing (OLAP) provides advanced analysis tools to extract information from data stored in a Data Warehouse. OLAP is designed to provide aggregate information that can be used to analyze the contents of databases and data warehouses. A range query applies an aggregation operation over all selected cells of an OLAP data cube where the selection is specified by providing ranges of values for numeric dimensions. Range sum queries are very useful in finding trends and in discovering relationships between attributes in the database. There is a method, prefix sum method, promises that any range sum query on a data cube can be answered in constant time by precomputing some auxiliary information. However, it is hampered by its update cost. For today's applications, interactive data analysis applications which provide current or "near current" information will require fast response time and have reasonable update time. Since the size of a data cube is exponential in the number of its dimensions, rebuilding the entire data cube can be very costly and is not realistic. To cope with this dynamic data cube problem, several strategies have been proposed. They all use specific data structures, which require extra storage cost, to response range sum query fast. For example, the double relative prefix sum method makes use of three components: a block prefix array, a relative overlay array and a relative prefix array to store auxiliary information. Although the double relative prefix sum method improves the update cost, it increases the query time. In the thesis, we present a method, called the recursive relative prefix sum method, which tries to provide a compromise between query and update cost. In the recursive relative prefix sum method with k levels, we use a relative prefix array and k relative overlay arrays. From our performance study, we show that the update cost of our method is always less than that of the prefix sum method. In most of cases, the update cost of our method is less than that of the relative prefix sum method. Moreover, in most of cases, the query cost of our method is less than that of the double relative prefix sum method. Compared with the dynamic data cube method, our method has lower storage cost and shorter query time. Consequently, our recursive relative prefix sum method has a reasonable response time for ad hoc range queries on the data cube, while at the same time, greatly reduces the update cost. In some applications, however, updating in some regions may happen more frequently than others. We also provide a solution, called the weighted relative prefix sum} method, for this situation. Therefore, this method can also provide a compromise between the range sum query cost and the update cost, when the update probabilities of different regions are considered.

目次 Table of Contents
Page ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . .. . . iv LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . viii 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Data Warehouses . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 OLAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.3 Range Sum Queries on OLAP . . . . . . . . . . . . . . . . . . . ..9 1.4 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.5 The Organization of the Thesis . . . . . . . . . . . . . . . . . 14 2. A Survey of Strategies for Range Sum Queries . . . . . . . . . . . 15 2.1 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . ..15 2.2 Prex Sum . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.3 Relative Prex Sum . . . . . . . . . . . . . . . . . . . . . .. . 19 2.4 The Double Relative Prex Sum Approach . . . . . . . . . . . . . 23 2.5 The Dynamic Data Cube . . . . . . . . . . . . . . . . . . . . . . 29 3. The Recursive Relative Prex Sum Approach . . . . . . . . . . . . 34 3.1 Level-k Relative Overlay Arrays . . . . . . . . . . . . . . . . . 34 3.2 Relative Prex Arrays . . . . . . . . . . . . . . . . . . . . . . 40 3.3 Queries and Updates . . . . . . . . . . . . . . . . . . . . . . . 42 3.3.1 Range Sum Queries . . . . . . . . . . . . . . . . . . . . . . . 43 3.3.2 Updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 3.4 The Recursive Relative Prex Sum Method with 3-Levels . . . . . . 46 3.5 A Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . 48 4. The Weighted Relative Prex Sum Approach . . . . . . . . . . . . 52 4.1 The Weighted Relative Prex Sum Method with 2-Levels . . . . . . 52 4.2 Queries and Updates . . . . . . . . . . . . . . . . . . . . . . . 56 4.2.1 Range Sum Queries . . . . . . . . . . . . . . . . . . . . . . . 56 4.2.2 Updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 4.3 A Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . 59 5. Performance Study . . . . . . . . . . . . . . . . . . . . . . . . 64 5.1 Performance Analysis . . . . . . . . . . . . . . . . . . . . . . 64 5.2 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . 70 5.3 Improving Updates . . . . . . . . . . . . . . . . . . . . . . . . 78 6. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 6.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 A. Details of Calculation . . . . . . . . . . . . . . . . . . . . . . 94 B. Discussion of Simulation . . . . . . . . . . . . . . . . . . . . . 98

參考文獻 References
[1] S. Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R. Ra- makrishnan, and S. Sarawagi, "On the Computation of Multidimensional Aggre- gates," In Proc. of the 22nd VLDB Conf., pp. 506-521, 1996. [2] R. Agrawal. A. Gupta, and S. Sarawagi, "Modeling Multidimensional Databases," In Proc. of the 13th IEEE Int'l. Conf. on Data Engineering, pp. 232-243, 1997. [3] S. Chaudhuri, and U. Dayal, "An Overview of Data Warehouse and OLAP Tech- nology." ACM SIGMOD Record, Vol. 26, No. 1, pp. 65-74, 1997. [4] E. F. Codd, "Providing OLAP (On-Line Analytical Processing) to User-Analysts: An IT Mandate." Technical Report, E. F. Codd and Associates, 1993. [5] S. Ge ner, D. Agrawal, A. El Abbadi, and T. Smith, "Relative Pre x Sums: An Efficient Approach for Querying Dynamic OLAP Data Cubes," In Proc. of the 15th IEEE Int'l. Conf. on Data Engineering, pp. 328-335, 1999. [6] S. Ge ner, D. Agrawal, and A. El Abbadi, "The Dynamic Data Cube," In Proc. of the 7th Int'l Conf. on Extending Database Technology, pp.237-248, 2000. [7] J. Gray, A. Bosworth, A. Layman, and H. Pirahesh, "Data Cube: A Relational Aggregation Operator Generalizing Group-by, Cross-Tabs and Sub-Totals," In Proc. of the 12th IEEE Int'l Conf. on Data Engineering, pp. 152-159, 1996. [8] H. Gupta, V. Harinarayan, A. Rajaraman, and J. D. Ullman, "Index Selection for OLAP," In Proc. of the 13rd IEEE Int'l. Conf. on Data Engineering, pp. 208-219, 1997. [9] V. Harinarayan, A. Rajaraman, and J. D. Ullman, "Implementing Data Cube Efficiently," In Proc. of the ACM SIGMOD Conf. on the Management of Data, pp. 205-216, 1996. [10] C. Ho, R. Agrawal, N. Megiddo, and R. Srikant, "Range Queries in OLAP Data Cubes," In Proc. of the ACM SIGMOD Conf. on the Management of Data, pp. 73-88, 1997. [11] B. Inmon, H. William, and C. Kelley, "The 12 Rules of Data Warehouse For A Client/Server World," Data Management Review, Vol. 4, pp. 6-16, May, 1994. [12] T. Johnson, and D. Shasha, "Hierarchically Split Cube Forests for Decision Support: Description and Tuned Design," Working paper, 1996. [13] W. Liang, H. Wang, and M. E. Orlowska, "Range Queries in Dynamic OLAP Data Cubes," Data and Knowledge Engineering, Vol. No. 34, pp. 21-38, Mar. 2000. [14] I. S. Mumick, D. Quass, and B. S. Mumick, "Maintenance of Data Cubes and Summary Tables in a Warehouse." In Proc. of the ACM SIGMOD Conf. on the Management of Data, pp. 100-111, 1997. [15] The OLAP Council. MD-API the OLAP Application Program Inerface Version 5.0 Speci cation, September 1996. [16] T. Palpanas, "Knowledge Discovery in Data Warehouses," ACM SIGMOD Record, Vol. 29, No. 3, pp. 88-100, 2000. [17] K. A. Ross, and D. Srivastava, "Fast Computation of Sparse Datacubes," In Proc. of 23rd VLDB Conf., pp. 116-125, 1997. [18] B. Salzberg, and A. Reuter, "Indexing for Aggregation." Working paper, 1996. [19] A. Shukla, P. M. Deshpande, J. F. Naughton, and K. Ramasamy, "Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies." In Proc. of 22nd VLDB Conf., pp. 522-531, 1996. [20] Y. Zhao, P. M. Deshpande, and J. F. Naughton, "An Array-Based Algorithm for Simultaneous Multidimensional Aggregates," In Proc. of the ACM SIGMOD Conf. on the Management of Data , pp. 159-170, 1997.

電子全文 Fulltext
本電子全文僅授權使用者為學術研究之目的，進行個人非營利性質之檢索、閱讀、列印。請遵守中華民國著作權法之相關規定，切勿任意重製、散佈、改作、轉貼、播送，以免觸法。論文使用權限 Thesis access permission：校內外都一年後公開 withheld 開放時間 Available：校內 Campus：已公開 available 校外 Off-campus：已公開 available etd-0707102-200510.pdf
紙本論文 Printed copies
紙本論文的公開資訊在102學年度以後相對較為完整。如果需要查詢101學年度以前的紙本論文公開資訊，請聯繫圖資處紙本論文服務櫃台。如有不便之處敬請見諒。開放時間 available 已公開 available

QR Code

國立中山大學圖書與資訊處 │ 諮詢服務：2452 論文審查小組 │ 服務信箱 │ 系統開發維運：圖資處知識創新組

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2452 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS