
Latent Dirichlet Allocation in Web Spam Filtering

Category: Security

Tags: union, spam

Overview

Latent Dirichlet allocation (LDA) (Blei, Ng, Jordan 2003) is a method from information retrieval for modeling the content and topics of a collection of documents. This paper applies a modification of LDA, the novel multi-corpus LDA technique, to supervised Web spam classification. The authors treat the Web corpus at the site level, creating a bag-of-words document for every site, and run LDA separately on the collection of sites labeled as spam and on the collection labeled as non-spam. In this way, spam and non-spam topics are created in the training phase. In the test phase, they take the union of these topics, and an unseen site is deemed spam if the total weight of the spam topics in its topic distribution is above a threshold.
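The test phase described in the overview can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the spam and non-spam topic-word distributions have already been learned by running LDA separately on the two labeled collections; the toy topics, the function names, the 0.5 threshold, and the simple EM-style fold-in used to estimate a site's topic mixture are all assumptions made for illustration, and the paper's own inference procedure may differ.

```python
from collections import Counter

# Hypothetical topic-word distributions, standing in for topics learned by
# running LDA separately on spam sites and on non-spam sites.
SPAM_TOPICS = [
    {"cheap": 0.4, "pills": 0.3, "casino": 0.2, "buy": 0.1},
    {"loan": 0.5, "free": 0.3, "win": 0.2},
]
NONSPAM_TOPICS = [
    {"research": 0.4, "university": 0.3, "paper": 0.2, "data": 0.1},
    {"news": 0.5, "weather": 0.3, "sports": 0.2},
]


def infer_topic_mixture(words, topics, alpha=0.1, iters=50, eps=1e-6):
    """Estimate a document's topic mixture with the topics held fixed.

    Assumption: a simple EM-style fold-in with Dirichlet smoothing `alpha`;
    words missing from a topic get the small probability `eps`.
    """
    counts = Counter(words)
    n = sum(counts.values())
    k = len(topics)
    theta = [1.0 / k] * k  # start from a uniform mixture
    for _ in range(iters):
        expected = [0.0] * k
        for word, c in counts.items():
            # Responsibility of each topic for this word.
            weights = [theta[j] * topics[j].get(word, eps) for j in range(k)]
            total = sum(weights)
            for j in range(k):
                expected[j] += c * weights[j] / total
        # Smoothed re-estimate; the entries always sum to 1.
        theta = [(expected[j] + alpha) / (n + k * alpha) for j in range(k)]
    return theta


def spam_score(words, spam_topics=SPAM_TOPICS, nonspam_topics=NONSPAM_TOPICS):
    """Total weight of the spam topics in the union of both topic sets."""
    union = spam_topics + nonspam_topics  # spam topics come first
    theta = infer_topic_mixture(words, union)
    return sum(theta[: len(spam_topics)])


def is_spam(words, threshold=0.5):
    """Deem a site spam if its total spam-topic weight exceeds the threshold."""
    return spam_score(words) > threshold


if __name__ == "__main__":
    site_a = "cheap pills casino buy cheap free win".split()
    site_b = "university research paper news weather".split()
    print(is_spam(site_a), is_spam(site_b))
```

Here each site is represented as the bag of words of its pages, matching the site-level treatment in the overview. In practice the threshold would be tuned on held-out labeled sites rather than fixed in advance.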


Publisher: MTA SZTAKI
File Format: PDF
Date Published: Jul 1, 2009
Format: White Papers
Topics: Spam - E-mail Fraud - Phishing, Network Security, Security Management

Similar White Papers

Top five strategies for combating modern threats: Is anti-virus dead?

Today's fast, targeted, silent threats take advantage of the open network and new technologies that support an increasin

Publisher: Sophos  |  Tags: email, malware, network

Gain a Competitive Advantage by Aligning Your IT Infrastructure with Business Objectives

This paper looks at what IT Security means to your company and how services can assist in the battle against the threats

Publisher: IBM

Sophos Email Security and Control - Free 30 Day Trial

Proactively block inbound and outbound threats with unrivaled effectiveness and simplicity, delivering high-capacity, hi

Publisher: Sophos

What is the (Real) Threat and How to Deal With It? A Route to Security as a Service

This paper looks at what IT Security means to your company and how services can assist in the battle against the threats

Publisher: IBM

Demystifying Web 2.0: Opportunities, Threats, Defenses

Every new technology introduced into the enterprise brings with it new threats. Web 2.0 is no different, with threats in

Publisher: Clearswift  |  Tags: downtime, social networking, spyware

MTA SZTAKI White Papers

Web Spam: A Survey With Vision for the Archivist

While Web archive quality is endangered by Web spam, a side effect of the high commercial value of top-ranked search-eng

Publisher: MTA SZTAKI  |  Tags: data, spam, unite

SpamRank - Fully Automatic Link Spam Detection: Work in Progress

Spammers intend to increase the PageRank of certain spam pages by creating a large number of links pointing to them. Thi

Publisher: MTA SZTAKI  |  Tags: spam

Semi-Supervised Learning: A Comparative Study for Web Spam and Telephone User Churn

This paper compares a wide range of semi-supervised learning techniques both for Web spam filtering and for telephone us

Publisher: MTA SZTAKI  |  Tags: network, social network, spam