Towards Combining Web Classification and Web Information Extraction: A Case Study
Category: Software and Web Development
Tags: metadata
Overview
Web content analysis often proceeds in two sequential, separate steps: Web classification, which identifies the target Web pages, and Web information extraction, which extracts the metadata contained in those pages. This decoupled strategy is highly ineffective because errors in Web classification propagate to Web information extraction and eventually accumulate to a high level. This paper studies the mutual dependencies between the two steps and proposes combining them in a single Conditional Random Fields (CRFs) model, which simultaneously recognizes the target Web pages and extracts the corresponding metadata. Systematic experiments in the Of-Course project for online course search show that this model significantly improves the F1 value for both steps.
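The error-propagation argument in the overview can be illustrated with a little arithmetic. The sketch below is not the paper's method or its data: the function names and all numbers are hypothetical, and it assumes that classification and extraction errors are independent and that extraction precision is unaffected by the classifier. Under those assumptions, metadata on a page the classifier misses can never be extracted, so end-to-end recall is at most the product of the two recalls.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def pipeline_recall(classifier_recall, extractor_recall):
    """End-to-end recall of a decoupled pipeline.

    Assumption (not from the paper): errors of the two steps are
    independent, so a metadata field is recovered only if its page
    is found AND the extractor then finds the field.
    """
    return classifier_recall * extractor_recall


# Hypothetical figures, chosen only for illustration.
clf_recall = 0.90     # share of target pages the classifier finds
ext_recall = 0.90     # share of fields extracted from a found page
ext_precision = 0.90  # assumed unchanged by the classifier

end_to_end_recall = pipeline_recall(clf_recall, ext_recall)
standalone_f1 = f1(ext_precision, ext_recall)
pipeline_f1 = f1(ext_precision, end_to_end_recall)

print(f"extractor alone: F1 = {standalone_f1:.3f}")
print(f"after pipeline:  F1 = {pipeline_f1:.3f}")
```

Even with both steps at 90% recall, the pipeline's extraction F1 falls below what the extractor achieves on correctly identified pages, which is the kind of accumulated loss a joint model aims to avoid.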
- Publisher: Association for Computing Machinery
- Date Published: May 1, 2009
- Format: White Papers
- Topics: Application Development, Web Content Management