Towards Combining Web Classification and Web Information Extraction: A Case Study

Overview

Web content analysis often has two sequential, separate steps: Web classification, which identifies the target Web pages, and Web information extraction, which extracts the metadata contained in those pages. This decoupled strategy is highly ineffective because errors in Web classification propagate to Web information extraction and accumulate. This paper studies the mutual dependencies between the two steps and proposes combining them in a single Conditional Random Field (CRF) model that simultaneously recognizes the target Web pages and extracts the corresponding metadata. Systematic experiments in the Of-Course project for online course search show that this model significantly improves the F1 value for both steps.

Publisher
Association for Computing Machinery
File Format
PDF
Date Published
May 1, 2009
Format
White Papers
Topics
Application Development, Web Content Management

Similar White Papers

Best Practices for Building WEB Applications Using IBM Content Manager OnDemand Web Enablement Kit Java API's

The Content Manager OnDemand Web Enablement Kit (ODWEK) Java API's provide a rich development environment for Java devel

Publisher: IBM  |  Tags: developers, java, server

Information Retrieval in Web2.0

This presentation discusses information retrieval in Web 2.0.

Publisher: Hanze University Groningen

Web 2.0: The Potential of RSS and Location Based Services

This presentation covers the potential of RSS and location-based services in Web 2.0.

Publisher: University of Bath  |  Tags: location based services, rss

Ektron CMS400.NET Instant Demo

Ektron CMS400.NET lets you do more than just what you need to do on the Web, it also lets you do everything you want to

Publisher: Ektron  |  Tags: intranet, management

Creating Higher Physician and Patient Satisfaction Using Online Channels -- the Memorial Health System Story

As a healthcare provider, do you want to better recruit and retain physicians and clinicians? Do you want to increase yo

Publisher: IBM  |  Tags: marketing

Association for Computing Machinery White Papers

Managing ETL Processes

ETL tools allow the definition of sometimes complex processes to extract, transform, and load heterogeneous data into a

Publisher: Association for Computing Machinery  |  Tags: data, data integration, data warehouse, management

GPS-Free Node Localization in Mobile Wireless Sensor Networks

An important problem in mobile ad-hoc wireless sensor networks is the localization of individual nodes, i.e., each node'

Publisher: Association for Computing Machinery  |  Tags: gps, infrastructure, network

A Black-Box Approach for Web Application SLA

Web servers nowadays have to cope with unprecedented amounts of workload, due to increasing popularity and complexity; i

Publisher: Association for Computing Machinery  |  Tags: applications, server

Load Balancing for Multimedia Streaming in Heterogeneous Peer-to-Peer Systems

Multimedia streaming of mostly user generated content is an ongoing trend, not only since the upcoming of Last.fm and Yo

Publisher: Association for Computing Machinery  |  Tags: user generated, user generated content, youtube

Multiobjective Network Design for Realistic Traffic Models

Network topology design problems find application in several real life scenarios. However, most designs in the past eith

Publisher: Association for Computing Machinery  |  Tags: network, realistic