Information Analysis Technologies
Emerald employs a range of tools for the analysis and classification of data.
Of these tools, two serve as the backbone for many of our needs.
Scan Bot:
Emerald's Scan Bot is used to analyze web sites on multiple
levels. It does this using modular engines that can be activated individually,
so that a conclusion can be reached based on whichever criteria are necessary. We
currently employ a Regular Expression engine, a Script engine (which dynamically
loads C# script files), a Keyword engine, and a Link analysis engine. Each engine
is capable of reaching its own decision using its internal scoring algorithms.
The Scan Bot then uses its own core Result engine to determine the correct
category using all the gathered information. The Scan Bot uses these snap-in modular
engines because they allow us to rapidly develop custom engines for our clients
without ever changing the Scan Bot itself or other modules. This allows us
to be extremely flexible. Engines currently in development include an OCR engine
and an ActiveX control analysis engine. The results of this process can be stored in
flat text files, XML files, a SQL database, or an xBase database.
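
The snap-in engine design described above can be sketched roughly as follows. This is a minimal illustration in Python, not Emerald's implementation: the class names (Engine, KeywordEngine, RegexEngine, ScanBot), the per-category score dictionary, and the "sum the scores and pick the highest" rule used for the Result engine are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class EngineResult:
    engine: str
    scores: Dict[str, float]  # category -> score from this engine (assumed shape)


class Engine(Protocol):
    """Hypothetical interface every snap-in engine implements."""
    name: str

    def analyze(self, page_text: str) -> EngineResult: ...


class KeywordEngine:
    name = "keyword"

    def __init__(self, keywords: Dict[str, List[str]]):
        self.keywords = keywords

    def analyze(self, page_text: str) -> EngineResult:
        # Score each category by how often its keywords occur as whole words.
        words = re.findall(r"[a-z0-9]+", page_text.lower())
        scores = {
            cat: float(sum(words.count(k) for k in kws))
            for cat, kws in self.keywords.items()
        }
        return EngineResult(self.name, scores)


class RegexEngine:
    name = "regex"

    def __init__(self, patterns: Dict[str, str]):
        self.patterns = {
            cat: re.compile(p, re.IGNORECASE) for cat, p in patterns.items()
        }

    def analyze(self, page_text: str) -> EngineResult:
        # Score each category by the number of pattern matches.
        scores = {
            cat: float(len(p.findall(page_text)))
            for cat, p in self.patterns.items()
        }
        return EngineResult(self.name, scores)


class ScanBot:
    def __init__(self) -> None:
        self.engines: List[Engine] = []

    def register(self, engine: Engine) -> None:
        # Engines are added without touching the ScanBot or other engines.
        self.engines.append(engine)

    def scan(self, page_text: str) -> str:
        results = [engine.analyze(page_text) for engine in self.engines]
        return self.decide(results)

    @staticmethod
    def decide(results: List[EngineResult]) -> str:
        # Stand-in for the core Result engine: sum scores, pick the top category.
        totals: Dict[str, float] = {}
        for result in results:
            for cat, score in result.scores.items():
                totals[cat] = totals.get(cat, 0.0) + score
        if not totals or max(totals.values()) == 0:
            return "uncategorized"
        return max(totals, key=totals.get)
```

Under this sketch, a new engine (for example an OCR engine) would only need to provide a name and an analyze method and be passed to register; nothing else would change.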
Site Review Tool:
The Site Review Tool (SRT for short) is used by technicians
to review and classify sites when automated means cannot. The SRT is a
self-contained web browser and voting tool. Users load a site list from disk or from
an XML web service, then view each site one at a time and vote on its category.
The tools in this application automatically display the source of the page
being viewed, the links the page references, and any
information found by the Scan Bot. The tool runs on the individual workstations of
the users classifying sites. Sites categorized by the SRT are transmitted back to a
server, where the results are stored. The server-side component of this software
assigns each technician a level of trust. This trust level is obtained by assigning
the same site to multiple technicians early in their usage history and verifying
that they reach a consensus. Until a technician's trust level is high enough, any site
they categorize must also be categorized by at least 1 and as many as 3 other
technicians, depending on each of their trust levels. If a consensus is not reached,
the site is set for review by a technician in the top tier of the trust system.