Latent Dirichlet Allocation in Web Spam Filtering
Overview
Latent Dirichlet allocation (LDA) (Blei, Ng, Jordan 2003) is a method in information retrieval for modeling the content and topics of a collection of documents. This paper applies a modification of LDA, the novel multi-corpus LDA technique, to supervised web spam classification. The authors treat the web corpus at the site level, creating a bag-of-words document for every site, and run LDA separately on the collection of sites labeled as spam and on the collection labeled as non-spam. In this way, spam and non-spam topics are created in the training phase. In the test phase they take the union of these topics, and an unseen site is deemed spam if the total probability of the spam topics in its topic distribution is above a threshold.
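The train-then-threshold procedure described in the overview can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the function names (`train_lda`, `infer_theta`, `spam_score`), the toy vocabulary and sites, the hyperparameters, and the 0.5 threshold are all assumptions made for illustration, and the unseen site is folded in with a simple EM update over fixed topics, which may differ from the inference used in the paper.

```python
import random

def train_lda(docs, vocab_size, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA; returns the topic-word distributions."""
    rng = random.Random(seed)
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                                # tokens per topic
    n_dk = [[0] * num_topics for _ in docs]               # document-topic counts
    # Random initial topic for every token.
    z = [[int(rng.random() * num_topics) for _ in doc] for doc in docs]

    def update(d, w, k, delta):
        n_kw[k][w] += delta
        n_k[k] += delta
        n_dk[d][k] += delta

    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            update(d, w, z[d][i], +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # remove the token, then resample it
                weights = [
                    (n_dk[d][k] + alpha) * (n_kw[k][w] + beta)
                    / (n_k[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z[d][i] = rng.choices(range(num_topics), weights=weights)[0]
                update(d, w, z[d][i], +1)

    return [
        [(n_kw[k][w] + beta) / (n_k[k] + vocab_size * beta) for w in range(vocab_size)]
        for k in range(num_topics)
    ]

def infer_theta(doc, phi, iters=50, alpha=0.1):
    """Topic mixture of an unseen document with the topics phi held fixed
    (EM on the mixture weights; an assumed stand-in for the paper's inference)."""
    num_topics = len(phi)
    theta = [1.0 / num_topics] * num_topics
    for _ in range(iters):
        counts = [alpha] * num_topics
        for w in doc:
            post = [theta[k] * phi[k][w] for k in range(num_topics)]
            total = sum(post)
            for k in range(num_topics):
                counts[k] += post[k] / total
        norm = sum(counts)
        theta = [c / norm for c in counts]
    return theta

def spam_score(doc, phi_spam, phi_ham):
    """Total probability mass on spam topics over the union of both topic sets."""
    theta = infer_theta(doc, phi_spam + phi_ham)
    return sum(theta[:len(phi_spam)])

# Hypothetical toy data: each "site" is one bag-of-words document.
vocab = ["cheap", "pills", "casino", "win", "research", "library", "lecture", "campus"]
word_id = {w: i for i, w in enumerate(vocab)}

def encode(text):
    return [word_id[w] for w in text.split()]

spam_sites = [encode("cheap pills cheap casino win"),
              encode("casino win win cheap pills pills")]
ham_sites = [encode("research library lecture campus"),
             encode("lecture campus research research library")]

# Training phase: one LDA run per labeled collection.
phi_spam = train_lda(spam_sites, len(vocab), num_topics=2, seed=0)
phi_ham = train_lda(ham_sites, len(vocab), num_topics=2, seed=1)

# Test phase: threshold the summed spam-topic weight of an unseen site.
THRESHOLD = 0.5  # assumed value; in practice it would be tuned on held-out data
unseen_site = encode("cheap casino win pills")
is_spam = spam_score(unseen_site, phi_spam, phi_ham) > THRESHOLD
```

Training the two collections separately is what makes each topic carry a label: every topic in `phi_spam` was fitted only to spam sites, so the mass an unseen site places on those topics can be read directly as a spam score.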
- Publisher: MTA SZTAKI
- File Format
- Date Published: Jul 1, 2009
- Format: White Papers
- Topics: Spam - E-mail Fraud - Phishing, Network Security, Security Management