Children using Google's SafeSearch feature, designed to filter out links to websites with adult content, may be shielded from far more than their parents ever intended.

A report released this week by the Harvard Law School's Berkman Center for Internet & Society says that SafeSearch excludes many innocuous web pages from search-result listings, including ones created by the White House, IBM, the American Library Association and clothing company Liz Claiborne.

The omissions occur because of the way Google designed the feature, which can be enabled or disabled through a preferences page. The feature uses a proprietary algorithm that automatically analyses the pages and makes an educated guess, without intervention by Google employees. That technique reduces the cost of the SafeSearch service but it can lead to odd results.

It's perhaps unlikely that many humans would have classified a BBC News report on East Timor, Mattel's site about its Scrabble game - the URL includes the word 'adults' - or the Nashville Public Library's teen health issues page as unsuitable for minors. Some articles from silicon.com and other titles in the CNET group of websites are also invisible to SafeSearch users.

"If Google put some of its smart people on this task, they could do a much better job than they have so far," said Ben Edelman, the student fellow at the Berkman Center who performed the research. "They've got a lot of smart people. It would be shocking if their great engineers couldn't do better. The question is whether that's a priority for Google."

Google admits that the thousands of innocuous sites listed by the Berkman Center's report are invisible to SafeSearch users. But the company challenged the methodology of the study, saying that some of the sites are missing because their webmasters employ a device called the 'robots.txt' file, which is designed to limit automated web crawlers in various ways.
Such a file might, for example, ask web crawlers not to visit a certain area of the site because repeated visits would slow down the server considerably. Social etiquette dictates that crawlers should obey a robots.txt file. Google chooses not to include pages that use such files in SafeSearch listings because its crawler can't explore the entire site and thus, the company says, can't be expected to judge the site's content.

Edelman said he was unaware of the robots.txt exclusion when he conducted the study, and revised his report on Thursday to include a discussion of the issue. The report was originally released on Wednesday.

Edelman said only 11.3 per cent of the sites listed in his study are filtered because their webmasters created robots.txt files. Those include sites from IBM, Apple Computer, the City University of New York, Groliers and the Library of Congress.

"It doesn't matter whether SafeSearch omits a site because the site has a robots.txt file or because SafeSearch is imperfect," Edelman said in an interview. "Either way, the site would have been relevant but disappears from results."

Some of the thousands of non-pornographic sites without robots.txt files that are filtered include offerings from the Vermont Republican Party, the Stonewall Democrats of Austin, a UK government site on vocational training and the Pittsburgh Coalition Against Pornography.

News sites take a hit too, with articles from Fox News, Wired News, The Baton Rouge Daily News and some web logs affected.

Declan McCullagh writes for CNET News.com.
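
For readers unfamiliar with the robots.txt mechanism described in the article, the following is a minimal sketch of how a well-behaved crawler might consult such a file before fetching a page. It uses Python's standard `urllib.robotparser` module. The file contents, the `example.com` URLs, the `ExampleBot` user-agent name and the `check_urls` helper are all invented for illustration; nothing here reflects how Google's crawler or SafeSearch actually works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: asks every crawler to stay out of /archive/.
# The path is an assumption made up for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""


def check_urls(robots_txt, agent, urls):
    """Return a dict mapping each URL to whether `agent` may fetch it."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(agent, url) for url in urls}


results = check_urls(
    ROBOTS_TXT,
    "ExampleBot",  # hypothetical crawler name
    [
        "http://example.com/index.html",
        "http://example.com/archive/2003.html",
    ],
)

for url, allowed in results.items():
    print(url, "allowed" if allowed else "disallowed")
```

In this sketch the crawler may fetch the home page but not anything under the disallowed directory. Obeying the file is voluntary on the crawler's part, which is why the article describes it as a matter of etiquette.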
Google's porn filtering technology shabby, study says