Responsive image
博碩士論文 etd-0707102-200510 詳細資訊
Title page for etd-0707102-200510
論文名稱
Title
一個資料儲倉中查詢區域和的遞迴相對前置和方法
A Recursive Relative Prefix Sum Approach to Range Queries in Data Warehouses
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
101
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2002-06-21
繳交日期
Date of Submission
2002-07-07
關鍵字
Keywords
資料方體、資料更新、資料儲倉、加總運算、查詢區域和
data warehouse, aggregation operation, range sum query, update, data cube
統計
Statistics
本論文已被瀏覽 5678 次,被下載 1336
The thesis/dissertation has been browsed 5678 times, has been downloaded 1336 times.
中文摘要
資料儲倉(Data Warehouse)提供了整合多個操作型資料庫(Operational Database)的一致性正確歷史資料,因此適合用於資料的分析應用。線上資料分析處理系統(OLAP)進一步提供了分析用的工具,如加總過後的資料,以便從資料儲倉萃取有用的資訊。區域查詢(Range Query)是我們常在線上資料分析處理系統上,對特定的數值維度區域作加總的運算。其中查詢區域和運算對於探索資料庫中不同屬性之間的關聯性,以及在做趨勢分析上非常的有用。藉由預先計算好的輔助資料,前置和方法(Prefix Sum Method)對於在資料方體上作查詢區域和的回應,只需要常數時間。然而,前置和方法對於資料更新的處理效能卻不甚理想。目前一些需要當前資訊的應用,要求的是迅速的回應時間,以及合理的更新處理時間。然而由於資料方體的大小,是和它的維度數量成指數關係,因此重新建立資料方體的成本是很驚人的,而且不切實際。為了處理動態資料方體上的問題,目前有幾種解決方法。這些方法都是利用特殊的資料結構,以額外的空間儲存相關資料,來對查詢區域和運算做快速的回應。例如雙重相對前置和方法(Double Relative Prefix Sum Method)使用三個特殊的資料結構來儲存輔助的資料。雖然雙重相對前置和方法改善了資料更新處理的時間,然而其代價是增加查詢所需的時間。我們在這篇論文裡,提出了一個新的方法,叫做遞迴相對前置和方法(Recursive Relative Prefix Sum Method);這方法對於資料的查詢與更新處理提供了一個折衷的方式。在遞迴k層的相對前置和方法中,我們使用了一個相對前置陣列,以及k個相對覆蓋值陣列來儲存一些輔助的資料。從效能分析的結果來看,我們所提出的方法在資料更新的處理上,所花的成本小於前置和方法,以及相對前置和方法(Relative Prefix Sum Method);此外,在大多數的情況下,我們所提出的方法在查詢上所花的成本小於雙重相對前置和方法;而相較於動態資料方體方法(Dynamic Data Cub Method),我們所提出的方法所需的額外儲存空間較少,並且有較短的查詢回應時間。因此,在我們所提出的遞迴相對前置和方法上,對於查詢區域和運算我們提供了合理的回應時間,並且大幅度的減少了資料更新處理的時間。同時,對於一些應用領域,資料更新的發生頻率在某些區域會小於其他的區域,我們也提供了一個方法─加權相對前置和方法(Weighted Relative Prefix Sum Method),它是遞迴相對前置和方法的一個延伸。在考慮區域的資料更新發生機率不同的情況下,加權相對前置和方法同樣的對於查詢區域和以及資料更新處理的成本,提供了一個折衷的方案。
Abstract
Data warehouses contain data consolidated from several operational databases and provide the historical, and summarized data which is more appropriate for analysis than detail, individual records. On-Line Analytical Processing (OLAP) provides advanced analysis tools to extract information from data stored in a Data Warehouse.
OLAP is designed to provide aggregate information that can be used to analyze the contents of databases and data warehouses. A range query applies an aggregation operation over all selected cells of an OLAP data cube where the selection is specified by providing
ranges of values for numeric dimensions. Range sum queries are very useful in finding trends and in discovering relationships between attributes in the database. There is a method, prefix sum method, promises that any range sum query on a data cube can be answered in constant time by precomputing some auxiliary information. However, it is hampered by its update cost. For
today's applications, interactive data analysis applications which provide current or "near current" information will require fast
response time and have reasonable update time. Since the size of a data cube is exponential in the number of its dimensions, rebuilding the entire data cube can be very costly and is not
realistic. To cope with this dynamic data cube problem, several strategies have been proposed. They all use specific data structures, which require extra storage cost, to response range
sum query fast. For example, the double relative prefix sum method makes use of three components: a block prefix array, a relative overlay array and a relative prefix array to store auxiliary
information. Although the double relative prefix sum method improves the update cost, it increases the query time. In the thesis, we present a method, called the recursive relative
prefix sum method, which tries to provide a compromise between query and update cost. In the recursive relative prefix sum method with k levels, we use a relative prefix array and k
relative overlay arrays. From our performance study, we show that the update cost of our method is always less than that of the prefix sum method. In most of cases, the update cost of our method is less than that of the relative prefix sum method. Moreover, in most of cases, the query cost of our method is less than that of
the double relative prefix sum method. Compared with the dynamic data cube method, our method has lower storage cost and shorter query time. Consequently, our recursive relative prefix sum method has a reasonable response time for ad hoc range queries on the data cube, while at the same time, greatly reduces the update cost. In some applications, however, updating in some regions may happen more frequently than others. We also provide a solution, called the weighted relative prefix sum} method, for this situation. Therefore, this method can also provide a compromise between the range sum query cost and the update cost, when the update probabilities of different regions are considered.
目次 Table of Contents
Page
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . .. . . iv
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Data Warehouses . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 OLAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Range Sum Queries on OLAP . . . . . . . . . . . . . . . . . . . ..9
1.4 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.5 The Organization of the Thesis . . . . . . . . . . . . . . . . . 14
2. A Survey of Strategies for Range Sum Queries . . . . . . . . . . . 15
2.1 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . ..15
2.2 Pre x Sum . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 Relative Pre x Sum . . . . . . . . . . . . . . . . . . . . . .. . 19
2.4 The Double Relative Pre x Sum Approach . . . . . . . . . . . . . 23
2.5 The Dynamic Data Cube . . . . . . . . . . . . . . . . . . . . . . 29
3. The Recursive Relative Pre x Sum Approach . . . . . . . . . . . . 34
3.1 Level-k Relative Overlay Arrays . . . . . . . . . . . . . . . . . 34
3.2 Relative Pre x Arrays . . . . . . . . . . . . . . . . . . . . . . 40
3.3 Queries and Updates . . . . . . . . . . . . . . . . . . . . . . . 42
3.3.1 Range Sum Queries . . . . . . . . . . . . . . . . . . . . . . . 43
3.3.2 Updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.4 The Recursive Relative Pre x Sum Method with 3-Levels . . . . . . 46
3.5 A Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4. The Weighted Relative Pre x Sum Approach . . . . . . . . . . . . 52
4.1 The Weighted Relative Pre x Sum Method with 2-Levels . . . . . . 52
4.2 Queries and Updates . . . . . . . . . . . . . . . . . . . . . . . 56
4.2.1 Range Sum Queries . . . . . . . . . . . . . . . . . . . . . . . 56
4.2.2 Updates . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.3 A Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5. Performance Study . . . . . . . . . . . . . . . . . . . . . . . . 64
5.1 Performance Analysis . . . . . . . . . . . . . . . . . . . . . . 64
5.2 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . 70
5.3 Improving Updates . . . . . . . . . . . . . . . . . . . . . . . . 78
6. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
BIBLIOGRAPHY . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
A. Details of Calculation . . . . . . . . . . . . . . . . . . . . . . 94
B. Discussion of Simulation . . . . . . . . . . . . . . . . . . . . . 98
參考文獻 References
[1] S. Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R. Ra-
makrishnan, and S. Sarawagi, "On the Computation of Multidimensional Aggre-
gates," In Proc. of the 22nd VLDB Conf., pp. 506-521, 1996.
[2] R. Agrawal. A. Gupta, and S. Sarawagi, "Modeling Multidimensional
Databases," In Proc. of the 13th IEEE Int'l. Conf. on Data Engineering, pp.
232-243, 1997.
[3] S. Chaudhuri, and U. Dayal, "An Overview of Data Warehouse and OLAP Tech-
nology." ACM SIGMOD Record, Vol. 26, No. 1, pp. 65-74, 1997.
[4] E. F. Codd, "Providing OLAP (On-Line Analytical Processing) to User-Analysts: An IT Mandate." Technical Report, E. F. Codd and Associates, 1993.
[5] S. Ge ner, D. Agrawal, A. El Abbadi, and T. Smith, "Relative Pre x Sums: An Efficient Approach for Querying Dynamic OLAP Data Cubes," In Proc. of the
15th IEEE Int'l. Conf. on Data Engineering, pp. 328-335, 1999.
[6] S. Ge ner, D. Agrawal, and A. El Abbadi, "The Dynamic Data Cube," In Proc.
of the 7th Int'l Conf. on Extending Database Technology, pp.237-248, 2000.
[7] J. Gray, A. Bosworth, A. Layman, and H. Pirahesh, "Data Cube: A Relational
Aggregation Operator Generalizing Group-by, Cross-Tabs and Sub-Totals," In
Proc. of the 12th IEEE Int'l Conf. on Data Engineering, pp. 152-159, 1996.
[8] H. Gupta, V. Harinarayan, A. Rajaraman, and J. D. Ullman, "Index Selection
for OLAP," In Proc. of the 13rd IEEE Int'l. Conf. on Data Engineering, pp.
208-219, 1997.
[9] V. Harinarayan, A. Rajaraman, and J. D. Ullman, "Implementing Data Cube
Efficiently," In Proc. of the ACM SIGMOD Conf. on the Management of Data,
pp. 205-216, 1996.
[10] C. Ho, R. Agrawal, N. Megiddo, and R. Srikant, "Range Queries in OLAP Data Cubes," In Proc. of the ACM SIGMOD Conf. on the Management of Data, pp.
73-88, 1997.
[11] B. Inmon, H. William, and C. Kelley, "The 12 Rules of Data Warehouse For A Client/Server World," Data Management Review, Vol. 4, pp. 6-16, May, 1994.
[12] T. Johnson, and D. Shasha, "Hierarchically Split Cube Forests for Decision Support: Description and Tuned Design," Working paper, 1996.
[13] W. Liang, H. Wang, and M. E. Orlowska, "Range Queries in Dynamic OLAP
Data Cubes," Data and Knowledge Engineering, Vol. No. 34, pp. 21-38, Mar.
2000.
[14] I. S. Mumick, D. Quass, and B. S. Mumick, "Maintenance of Data Cubes and
Summary Tables in a Warehouse." In Proc. of the ACM SIGMOD Conf. on the
Management of Data, pp. 100-111, 1997.
[15] The OLAP Council. MD-API the OLAP Application Program Inerface Version
5.0 Speci cation, September 1996.
[16] T. Palpanas, "Knowledge Discovery in Data Warehouses," ACM SIGMOD
Record, Vol. 29, No. 3, pp. 88-100, 2000.
[17] K. A. Ross, and D. Srivastava, "Fast Computation of Sparse Datacubes," In
Proc. of 23rd VLDB Conf., pp. 116-125, 1997.
[18] B. Salzberg, and A. Reuter, "Indexing for Aggregation." Working paper, 1996.
[19] A. Shukla, P. M. Deshpande, J. F. Naughton, and K. Ramasamy, "Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies." In Proc. of 22nd VLDB Conf., pp. 522-531, 1996.
[20] Y. Zhao, P. M. Deshpande, and J. F. Naughton, "An Array-Based Algorithm
for Simultaneous Multidimensional Aggregates," In Proc. of the ACM SIGMOD
Conf. on the Management of Data , pp. 159-170, 1997.
電子全文 Fulltext
本電子全文僅授權使用者為學術研究之目的,進行個人非營利性質之檢索、閱讀、列印。請遵守中華民國著作權法之相關規定,切勿任意重製、散佈、改作、轉貼、播送,以免觸法。
論文使用權限 Thesis access permission:校內外都一年後公開 withheld
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
紙本論文的公開資訊在102學年度以後相對較為完整。如果需要查詢101學年度以前的紙本論文公開資訊,請聯繫圖資處紙本論文服務櫃台。如有不便之處敬請見諒。
開放時間 available 已公開 available

QR Code