Abstract
Exploring Spatial Datasets with Histograms
by: Chengyu Sun, Divyakant Agrawal, Amr El Abbadi
Abstract:
As online spatial datasets grow both in number and sophistication,it becomes increasingly difficult for users to decide whether a datasetis suitable for their tasks, especially when they do not have priorknowledge of the dataset. In this paper, wepropose {\\em browsing} as an effective and efficient way to explorethe content of a spatial dataset. Browsing allows users to view thesize of a result set before evaluating the query at the database,thereby avoiding zero-hit/mega-hit queries and saving time and resources.Although the underlying technique supporting browsing issimilar to range query aggregation and selectivity estimation,spatial dataset browsing poses some unique challenges.In this paper, we identify a set of spatial relations that need to be supported in browsing applications, namely, the {\\em contains},{\\em contained} and the {\\em overlap} relations. We prove a lowerbound on the storage required to answer queries about the {\\em contains}relation accurately at a given resolution. We then present threestorage-efficient approximation algorithms which we believe to be thefirst to estimate query results about these spatial relations.We evaluate these algorithms with both synthetic and real worlddatasets and show that they provide highly accurate estimates fordatasets with various characteristics.
Keywords:
spatial databases, browsing, approximation, Euler\'s Formula
Date:
February 2001
Document: 2001-03