Fast Indexes and Algorithms for Set Similarity Selection Queries
Category: Software and Web Development
Overview
Data collections often contain inconsistencies that arise for a variety of reasons, and it is desirable to identify and resolve them efficiently. Set similarity queries are commonly used in data cleaning to match similar data. This paper concentrates on set similarity selection queries: given a query set, retrieve all sets in a collection whose similarity to the query exceeds a given threshold. Various set similarity measures have been proposed for data cleaning. The paper focuses on weighted similarity functions such as TF/IDF and introduces variants that are well suited to set similarity selections in a relational database context. These variants have special semantic properties that can be exploited to design very efficient index structures and query-answering algorithms.
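To make the query definition concrete, here is a minimal Python sketch of a set similarity selection. It is not the paper's method: it is a brute-force scan with no index, and the similarity function is an assumed IDF-weighted cosine with binary term frequency, not one of the variants the paper introduces. The function names (`idf_weights`, `weighted_cosine`, `select_similar`) and the `log(1 + N / df)` weighting are illustrative choices only.

```python
import math
from collections import Counter


def idf_weights(collection):
    """Assumed weighting: idf(t) = log(1 + N / df(t)), df = number of sets containing t."""
    n = len(collection)
    df = Counter(token for s in collection for token in s)
    return {token: math.log(1 + n / count) for token, count in df.items()}


def weighted_cosine(a, b, idf):
    """IDF-weighted cosine similarity of two token sets (binary term frequency).

    Tokens missing from the collection get weight 0 (an assumption of this sketch).
    """
    def norm(s):
        return math.sqrt(sum(idf.get(t, 0.0) ** 2 for t in s))

    dot = sum(idf.get(t, 0.0) ** 2 for t in a & b)
    denom = norm(a) * norm(b)
    return dot / denom if denom else 0.0


def select_similar(query, collection, threshold):
    """Return (index, score) for every set whose similarity to the query exceeds threshold."""
    idf = idf_weights(collection)
    results = []
    for i, s in enumerate(collection):
        score = weighted_cosine(query, s, idf)
        if score > threshold:
            results.append((i, score))
    return results


if __name__ == "__main__":
    # Toy collection of tokenized address strings (hypothetical data).
    collection = [{"main", "street"}, {"main", "st"}, {"park", "avenue"}]
    for index, score in select_similar({"main", "street"}, collection, 0.25):
        print(index, round(score, 3))
```

The scan touches every set in the collection on every query, which is the cost that purpose-built index structures are meant to avoid.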
- Publisher: AT&T Intellectual Property
- File Format
- Date Published: May 29, 2009
- Format: White Papers
- Topics: Software Engineering
Similar White Papers
High Level Best Practices in Software Configuration Management
When deploying new software configuration management (SCM) tools, implementers sometimes focus on perfecting fine-graine
Publisher: Perforce Software | Tags: management, software
Software Configuration Management: The Foundation of Global Distributed Development Today
By distributing development, you can create a collaborative work environment staffed by the best developers you can hire
Publisher: Perforce Software | Tags: developers, it department, network
BMC Best Practice Process Flow for Release Management
The Release Management process consists of four procedures. The first procedure is called "Request for Change Handling".
Publisher: BMC Software
Application Lifecycle Management With ClearQuest 7.1.0.0
This overview of the concepts and design goals behind an out-of-the-box Application Lifecycle Management (ALM) solution
White Paper: Tips for Writing Good Use Cases
Writing good use cases is more of an art than a science. In this IBM Rational white paper "Tips for writing good use cas
AT&T Intellectual Property White Papers
On-Demand Webcast: Lowering the TCO for Business Applications
Critical business applications require continuous care and investment. Now, more than ever, enterprises must ensure thei
Publisher: AT&T Intellectual Property | Tags: applications, business applications, infrastructure, network, real-time, tco
Design, Implementation and Operation of a Large Enterprise Content Distribution Network
Content Distribution Networks (CDNs) are becoming an important resource in enterprise networks. They are being used in a
Publisher: AT&T Intellectual Property | Tags: applications
On-Demand Webcast: When Is VPLS the Right Choice?
Networking is constantly evolving, bringing new and demanding applications for today's enterprises to manage. Making WAN
Publisher: AT&T Intellectual Property | Tags: applications, network, wan
FastRWeb: Fast Interactive Web Framework for Data Mining Using R
R is widely used and accepted as a very versatile tool for statistical computing and data analysis. It provides a pletho
Publisher: AT&T Intellectual Property | Tags: computing, data, infrastructure
On-Demand Webcast: Comparing WAN Technology Choices
As new applications are added to the network, how do you choose the right networking solution? Frame Relay, ATM, Ethernet, MPLS,
Publisher: AT&T Intellectual Property | Tags: applications, atm, ethernet, ip, mpls, network, wan
Featured white papers
- The Value of Location Intelligence in the Communications Industry
Public Services are under pressure, and the challenge is to do more with less. How do you improve citizen satisfaction, increase cost efficiencies and improve service delivery? The power of location intelligence is helping many local authorities...
- Best Practices for Translating Customer Satisfaction into Revenue
Today's support organisations are focused on two top-level metrics: financial results and customer satisfaction. For most, it's easy to track financial performance, but customer satisfaction is akin to speaking a foreign language...
- HP print solutions and 3M
The objective for 3M was to optimize office printing infrastructure at 3M locations worldwide and to reduce total cost and environmental footprint. Some of the business benefits achieved by switching to HP print solutions...