White Papers

Web Spam: A Survey With Vision for the Archivist

Category: Data Management, Security

Tags: unite, spam, data

Overview
Web spam, a side effect of the high commercial value of top-ranked search-engine results, endangers Web archive quality, yet Web archivists rarely use Web spam filtering technologies. This paper makes the first attempt to disseminate existing methodology and to envision a solution that lets Web archives share knowledge and unite efforts in Web spam hunting. It surveys the state of the art in Web spam filtering, illustrated by the recent Web spam challenge data sets and techniques, and describes the filtering solution for archives envisioned in the LiWA (Living Web Archives) project.

Publisher
MTA SZTAKI
File Format
PDF
Date Published
Jun 30, 2009
Format
White Papers
Topics
Spam - E-mail Fraud - Phishing, Data Mining - Analysis, Security Management

MTA SZTAKI White Papers

Latent Dirichlet Allocation in Web Spam Filtering

Latent Dirichlet allocation (LDA) (Blei, Ng, Jordan 2003) is a method in information retrieval to model the content and

Publisher: MTA SZTAKI  |  Tags: spam, union

SpamRank - Fully Automatic Link Spam Detection: Work in Progress

Spammers intend to increase the PageRank of certain spam pages by creating a large number of links pointing to them. Thi

Publisher: MTA SZTAKI  |  Tags: spam

Semi-Supervised Learning: A Comparative Study for Web Spam and Telephone User Churn

This paper compares a wide range of semi-supervised learning techniques both for Web spam filtering and for telephone us

Publisher: MTA SZTAKI  |  Tags: network, social network, spam