Why choose Emerald Shield's domain list

Our category list is updated every 15 minutes, every day.

About our list:

Emerald's categorized lists consist of domain names, not URLs. We chose to use domains rather than URLs because we wanted to rate each domain's overall intent. For example, CNN’s site contains sports, finance, and entertainment information, but the overall intent of the site is to provide news.
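
As a rough illustration of domain-level rating, the sketch below reduces a URL to its host name and looks it up in a category table, falling back to parent domains. The `CATEGORIES` table, its single entry, and the `categorize` helper are hypothetical and are not Emerald's actual data or interface; a real system would also need proper public-suffix handling.

```python
from urllib.parse import urlsplit

# Hypothetical category table keyed by domain name (illustrative entry only).
CATEGORIES = {
    "cnn.com": "news",
}

def categorize(url):
    """Return the category for the URL's domain, or None if it is unrated."""
    host = urlsplit(url).hostname
    if host is None:
        return None
    labels = host.lower().rstrip(".").split(".")
    # Try the full host name first, then each parent domain in turn.
    # Stop before the bare top-level label (e.g. "com").
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in CATEGORIES:
            return CATEGORIES[candidate]
    return None

# Every page on the domain gets the same rating, whatever its path.
print(categorize("http://www.cnn.com/sport/football"))
print(categorize("https://money.cnn.com/markets"))
```

Because the lookup ignores the path, a sports page and a finance page on the same domain receive the one rating assigned to the domain as a whole.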

We have been crawling sites for our spam filter and other internal products since 2001. No other company has our history of crawling these types of sites for the explicit purpose of categorizing them. We started using our Stop And Dig technology in our spam filter before most anti-spam companies even looked at the URLs in the message body.

Publicly available lists claim to have millions of unique sites in them; in fact, most do not. We merged the complete DMOZ database and found that it contains only 1.2 million unique domains, not the 5+ million sites it claims. Much of the difference can be explained by free hosting sites (thousands of “sites” may be on one domain) and blog hosts (which generally have many blog “sites” hosted at one location). We also merged down one category of DMOZ for a client and found that 8% of the domains had expired and been purchased for domain parking schemes.
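
The gap between listed "sites" and unique domains can be seen by collapsing a list of URLs down to domains. The sketch below is illustrative only: the sample URLs and helper names are made up, and taking the last two host labels as the domain is a simplifying assumption that mishandles suffixes such as "co.uk".

```python
from collections import Counter
from urllib.parse import urlsplit

def registered_domain(url):
    """Naively reduce a URL to the last two labels of its host name.

    Assumption: good enough for a demo; a public-suffix list would be
    needed to handle suffixes such as "co.uk" correctly.
    """
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return ".".join(host.split(".")[-2:])

def count_sites_per_domain(urls):
    """Count how many listed URLs fall under each domain."""
    return Counter(registered_domain(u) for u in urls if urlsplit(u).hostname)

# Made-up directory entries: a free host and a blog host.
listing = [
    "http://freehost.example/~alice/",
    "http://freehost.example/~bob/",
    "http://carol.blogs.example/",
    "http://dave.blogs.example/",
]
per_domain = count_sites_per_domain(listing)
print(len(listing), "listed sites,", len(per_domain), "unique domains")
```

With these sample entries, four listed sites collapse to two unique domains.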

Porn sites and other buyers of expired domains often target domains that are listed in DMOZ because they know those domains will get traffic. The most recent trend is “parked domains”: operators purchase expired domains or common misspellings of popular domains and place a parked page with advertising there. They hope that the user will click on an ad or continue the search from their page, and they make money from ad placements and referrals to search engines.

Recrawl every 45 days

We at Emerald aim to expire and re-crawl every domain in our database at least once every 45 days. Some domains are re-crawled sooner (if they returned an error code when we last crawled them). We are also working on new systems to detect when a domain has changed owners or server locations. This will allow us to detect changes in domains more rapidly and get them re-classified. As of January 2008, Emerald has 1,960,620 domains in our whitelisted categories and 1,456,540 domains in our blacklisted categories.
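
The re-crawl policy described above could be sketched roughly as follows. This is not Emerald's scheduler: the function names are hypothetical, and the 3-day retry interval for domains that returned an error is an assumed value, since the text only says such domains are re-crawled sooner.

```python
from datetime import datetime, timedelta

RECRAWL_INTERVAL = timedelta(days=45)     # maximum age stated in the text
ERROR_RETRY_INTERVAL = timedelta(days=3)  # assumed value, not from the source

def next_crawl_due(last_crawled, last_status_ok):
    """Return the time at which a domain should next be crawled."""
    interval = RECRAWL_INTERVAL if last_status_ok else ERROR_RETRY_INTERVAL
    return last_crawled + interval

def is_due(last_crawled, last_status_ok, now):
    """True once the domain's next crawl time has been reached."""
    return now >= next_crawl_due(last_crawled, last_status_ok)

last = datetime(2008, 1, 1)
print(next_crawl_due(last, last_status_ok=True))   # healthy domain
print(next_crawl_due(last, last_status_ok=False))  # domain that returned an error
```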

We currently spend a large part of our time crawling Domain Kiting sites: sites that will live less than 4 days. They are "reserved" at a registrar, then spam or some other mechanism is used to refer users to the domain. The domain is never paid for, and the registrar drops it after 4 days. We have over 8 million domains that are currently inactive but were used this way in the past year alone.

Several partner companies already use our lists as the power behind their products. They trust our list for their mission-critical systems. Over 2 million users today use part of our technology through these partners.

Trapping more malicious sites

In mid-2007 we added another layer to our scanning technology. We started scanning for malicious software by downloading all the JavaScript, EXEs, ZIPs, etc. that we find on a site. We then run those files through two different antivirus engines. If we find that the site is deploying malicious software, it is immediately added to the illegal activities category and pulled from any other categories. This new technique has resulted in us finding over 15,000 websites that distributed viruses, trojans, and other malicious software to users. Many of these sites looked legitimate. Video codec downloads, movie viewers, etc. were all offered to get users to install their bad software.
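
The recategorization step can be sketched as below. This is a minimal illustration under stated assumptions, not Emerald's implementation: the scanners are hypothetical callables standing in for real antivirus engines, the category store is a plain dictionary of sets, and a domain is flagged as soon as either scanner reports a file as malicious.

```python
ILLEGAL = "illegal activities"

def recategorize(domain, files, scanners, categories):
    """Move a domain to the illegal-activities category if any file is flagged.

    files:      iterable of bytes objects downloaded from the domain
    scanners:   callables taking bytes and returning True when malicious
                (hypothetical stand-ins for real antivirus engines)
    categories: dict mapping category name -> set of domains
    Returns True if the domain was flagged.
    """
    flagged = any(scan(data) for data in files for scan in scanners)
    if flagged:
        # Pull the domain from every other category, then add it to ILLEGAL.
        for name, members in categories.items():
            if name != ILLEGAL:
                members.discard(domain)
        categories.setdefault(ILLEGAL, set()).add(domain)
    return flagged

# Toy scanners for demonstration only: they look for a marker byte string.
def scanner_a(data):
    return b"EVIL" in data

def scanner_b(data):
    return data.startswith(b"BAD")

categories = {"software": {"codec.example", "good.example"}}
recategorize("codec.example", [b"harmless", b"xxEVILxx"],
             [scanner_a, scanner_b], categories)
print(categories)
```

Treating a hit from either engine as sufficient errs on the side of blocking; requiring both engines to agree would reduce false positives at the cost of missing some malicious sites.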

 

Uncomplicated solutions for categorized URLs

Technology at work

We believe that making technologies that are easy to deploy and manage is essential to our partners' success.