
Evaluating Structural Similarity in XML Documents

Overview

XML documents on the web often lack DTDs, particularly when they have been created from legacy HTML. Yet knowledge of the DTD can be valuable when querying and manipulating such documents. Recent work provides a means to (re)construct a DTD that describes the structure common to a given set of document instances. However, given a collection of documents with unknown DTDs, it may not be appropriate to construct a single DTD to describe every document in the collection. Instead, the authors propose partitioning the collection into smaller sets of "similar" documents and then inducing a separate DTD for each set. This partitioning problem is the subject of the paper.
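The overview does not say how similarity between documents is measured, so the following is only a minimal sketch of the partitioning idea under stated assumptions. It swaps in a simple stand-in measure (Jaccard similarity over sets of root-to-element tag paths) and a greedy grouping step. The function names, the sample documents, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-element tag paths in an XML document."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def partition(docs, threshold=0.5):
    """Greedily group documents by similarity to each group's first member.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    groups = []  # list of (representative path set, member indices)
    for i, doc in enumerate(docs):
        paths = tag_paths(doc)
        for rep, members in groups:
            if jaccard(paths, rep) >= threshold:
                members.append(i)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append((paths, [i]))
    return [members for _, members in groups]


# Hypothetical sample collection with no DTDs attached.
docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<html><body><p/></body></html>",
]
groups = partition(docs)
print(groups)
```

Each resulting group could then be passed to a separate DTD induction step, which is the motivation the overview gives for partitioning the collection first.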

Publisher: University of Michigan
File Format: PDF
Date Published: Dec 4, 2008
Format: White Papers
Topics: HTML, XML

Similar White Papers

Using the SAS Output Delivery System and PROC TEMPLATE to Create XHTML Files

SAS 8.2 introduced the ODS MARKUP statement, allowing users to export to a variety of markup languages, including HTML,

Publisher: SAS Institute

Transforming Word Documents Into the XSL-FO Format

Microsoft made customizing Microsoft Office Word documents much easier and simpler when it introduced a new XML file for

Publisher: Microsoft  |  Tags: data, microsoft office, office

ODS Markup, Tagsets, and Styles!: Taming ODS Styles and Tagsets

This paper shows how ODS styles can be used in new ways when they are combined with tagsets. The paper shows how a simpl

Publisher: SAS Institute

Using an ODS Tagset to Create Distributable, Editable Data Islands

This paper discusses how in a Windows environment HTML, Javascript, XML and some other native Windows functionality can

Publisher: SAS Institute  |  Tags: applications, data, pc, server

The Output Delivery System (ODS) From Scratch

The Output Delivery System (ODS) was created to enable SAS customers to generate more reports with new features than cou

Publisher: SAS Institute  |  Tags: ascii, excel, pdf

University of Michigan White Papers

High-Bandwidth Video Conferencing Systems: When Is the Quality Worth the Cost?

The marketing literature for videoconferencing products often makes recommendations for settings that will yield accepta

Publisher: University of Michigan  |  Tags: marketing

Rethinking Antivirus: Executable Analysis in the Network Cloud

Antivirus software installed on each end host in an organization has become the de-facto security mechanism used to defe

Publisher: University of Michigan  |  Tags: antivirus, network, software

Low-Rate TCP-Targeted DoS Attack Disrupts Internet Routing

Compared to attacks against end hosts, Denial of Service (DoS) attacks against the Internet infrastructure such as those

Publisher: University of Michigan  |  Tags: infrastructure, ip, network, routers

The Zombie Roundup: Understanding, Detecting, and Disrupting Botnets

Global Internet threats are undergoing a profound transformation from attacks designed solely to disable infrastructure

Publisher: University of Michigan  |  Tags: data, infrastructure

ELSC: Scalable Linux Scheduling on a Symmetric Multi-Processor Machine

Concerns about the scalability of multithreaded network servers running on Linux have prompted to investigate possible i

Publisher: University of Michigan  |  Tags: linux, network, os