Hashed Samples: Selectivity Estimators for Set Similarity Selection Queries
Category: Software and Web Development
Tags: cpu, real-time, applications
Overview: This paper studies selectivity estimation techniques for set similarity queries. Many similarity measures for sets have been proposed; the authors concentrate on the class of weighted similarity measures (e.g., TF/IDF and BM25 cosine similarity and their variants) and design selectivity estimators based on samples constructed a priori. First, they study the pitfalls of applying random sampling in a straightforward way and argue that care must be taken in how the samples are constructed: uniform random sampling yields very low accuracy, while query-sensitive real-time sampling is more expensive than exact solutions in both CPU and I/O cost. They then show how to build robust samples a priori, based on existing synopses for distinct value estimation.
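To make the baseline concrete, here is a minimal sketch of the straightforward approach the overview cautions against: estimating the selectivity of a TF/IDF cosine similarity threshold query from a uniform random sample of the sets. It assumes binary term frequency, IDF defined as log(N / df), and weight 0 for query tokens absent from the collection. All function names are hypothetical, and this is not the authors' hashed-sample estimator.

```python
import math
import random
from collections import Counter


def idf_weights(collection):
    """IDF weight log(N / df) for every token (assumed weighting scheme)."""
    n = len(collection)
    df = Counter(tok for s in collection for tok in set(s))
    return {tok: math.log(n / d) for tok, d in df.items()}


def tfidf_cosine(a, b, idf):
    """Cosine similarity of two token sets under binary-TF, IDF-weighted vectors.

    Tokens unknown to the collection get weight 0 (an assumption of this sketch).
    """
    w = lambda t: idf.get(t, 0.0)
    a, b = set(a), set(b)
    dot = sum(w(t) ** 2 for t in a & b)
    na = math.sqrt(sum(w(t) ** 2 for t in a))
    nb = math.sqrt(sum(w(t) ** 2 for t in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def exact_selectivity(collection, query, tau, idf):
    """Fraction of sets whose similarity to the query is at least tau (full scan)."""
    hits = sum(tfidf_cosine(s, query, idf) >= tau for s in collection)
    return hits / len(collection)


def sampled_selectivity(collection, query, tau, idf, k, seed=0):
    """Uniform-random-sample estimate of the same fraction, using k sets."""
    rng = random.Random(seed)
    sample = rng.sample(collection, k)
    hits = sum(tfidf_cosine(s, query, idf) >= tau for s in sample)
    return hits / k
```

When only a small fraction of sets qualifies, a small uniform sample frequently contains no qualifying set at all and the estimate collapses to 0, which is one way the low accuracy of uniform sampling shows up in practice.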
- Publisher: Association for Computing Machinery
- Date Published: May 29, 2009
- Format: White Papers
- Topics: Software Engineering
Similar White Papers
High Level Best Practices in Software Configuration Management
When deploying new software configuration management (SCM) tools, implementers sometimes focus on perfecting fine-graine
Publisher: Perforce Software | Tags: management, software
Software Configuration Management: The Foundation of Global Distributed Development Today
By distributing development, you can create a collaborative work environment staffed by the best developers you can hire
Publisher: Perforce Software | Tags: developers, it department, network
BMC Best Practice Process Flow for Release Management
The Release Management process consists of four procedures. The first procedure is called "Request for Change Handling".
Publisher: BMC Software
Application Lifecycle Management With ClearQuest 7.1.0.0
This overview of the concepts and design goals behind an out-of-the-box Application Lifecycle Management (ALM) solution
White Paper: Tips for Writing Good Use Cases
Writing good use cases is more of an art than a science. In this IBM Rational white paper "Tips for writing good use cas
Association for Computing Machinery White Papers
Managing ETL Processes
ETL tools allow the definition of sometimes complex processes to extract, transform, and load heterogeneous data into a
Publisher: Association for Computing Machinery | Tags: data, data integration, data warehouse, management
GPS-Free Node Localization in Mobile Wireless Sensor Networks
An important problem in mobile ad-hoc wireless sensor networks is the localization of individual nodes, i.e., each node'
Publisher: Association for Computing Machinery | Tags: gps, infrastructure, network
A Black-Box Approach for Web Application SLA
Web servers nowadays have to cope with unprecedented amounts of workload, due to increasing popularity and complexity; i
Publisher: Association for Computing Machinery | Tags: applications, server
Load Balancing for Multimedia Streaming in Heterogeneous Peer-to-Peer Systems
Multimedia streaming of mostly user generated content is an ongoing trend, not only since the upcoming of Last.fm and Yo
Publisher: Association for Computing Machinery | Tags: user generated, user generated content, youtube
Multiobjective Network Design for Realistic Traffic Models
Network topology design problems find application in several real life scenarios. However, most designs in the past eith
Publisher: Association for Computing Machinery | Tags: network, realistic
Featured white papers
The Value of Location Intelligence in the Communications Industry
Public Services are under pressure, the challenge is to do more with less. How do you improve citizen satisfaction, increase cost efficiencies and improve service delivery? The power of location intelligence is helping many local authorities...
Best Practices for Translating Customer Satisfaction into Revenue
Today's support organisations are focused on two top-level metrics: financial results and customer satisfaction. For most, it's easy to track financial performance, but customer satisfaction is akin to speaking a foreign language...
HP print solutions and 3M
The objective for 3M was to optimize office printing infrastructure at 3M locations worldwide, reduce total cost and environmental footprint. Some of the business benefits achieved by switching to HP print solutions...