Overall system architecture

Typically, an expert system interacts with users in two complementary modes: the data acquisition mode and the query mode. In the first mode, the system interacts with expert users so that they can modify the rules and facts of its knowledge base. In the second, the system answers questions from non-experts. Accordingly, we structured Araucaria into three distributed applications:

    1. A main program that contains the knowledge base and the routines to perform both unsupervised and supervised classifications;
    2. A remote application to allow experts to create and modify the system's knowledge base corresponding to expert-defined vegetation classes;
    3. An end-user client application to submit vegetation relevés to be classified by the system. Two distinct end-user client applications currently exist for the Araucaria system: the relevé editor QUERCUS of the VEGANA package and the web server of the Biodiversity Data Bank of Catalonia.


Data acquisition mode

The knowledge base of Araucaria contains two types of items: database sections, which are simply containers of relevé data, and classification areas, each containing a set of related PCM clusters. These knowledge compartments are accessed and modified remotely by vegetation experts using the remote expert application. Each database section or classification area conceptually matches a high-level vegetation unit (e.g. defined physiognomically or corresponding to a high-level syntaxon). Dividing the knowledge base into compartments allows the management of each compartment to be restricted to a specific scientist or group of scientists.
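The compartment structure described above can be pictured with a small data model. This is only an illustrative sketch: the class names, the field names and the `managers` access list are hypothetical and are not taken from the Araucaria source code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Compartment:
    """A knowledge-base compartment matching one high-level vegetation unit."""
    name: str
    vegetation_unit: str
    # Hypothetical access list: experts allowed to manage this compartment.
    managers: Set[str] = field(default_factory=set)

    def can_manage(self, expert: str) -> bool:
        return expert in self.managers


@dataclass
class DatabaseSection(Compartment):
    """Container of relevé data: relevé id -> {species name: cover value}."""
    releves: Dict[str, Dict[str, float]] = field(default_factory=dict)


@dataclass
class PCMCluster:
    name: str
    accepted: bool = False  # set once the vegetation expert accepts the cluster


@dataclass
class ClassificationArea(Compartment):
    """Set of related PCM clusters built from a training set of relevés."""
    training_set: List[str] = field(default_factory=list)  # relevé ids
    clusters: List[PCMCluster] = field(default_factory=list)


# Illustrative data only.
section = DatabaseSection("forests-db", "forests", managers={"expert_a"})
area = ClassificationArea("forests-area", "forests",
                          managers={"expert_a", "expert_b"})
```

Keeping the access list on the shared base class reflects the idea that both kinds of compartment are managed in the same way; a real system would enforce this on the server side.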

The relevé data import procedure incorporates data quality checking functions (e.g. homogenizing nomenclature or deleting species entries marked as doubtful). Once relevés are imported into the system, they are stored in database sections. The system provides an easy-to-use tool that allows experts to configure new classification areas as described in the main text. The set of relevés taken from database sections and used in a classification area is called its training set. The configuration tool facilitates the creation of PCM clusters by searching for relevés suitable as cluster seeds and by assisting the "growing" of each cluster as described in the main text. Each PCM cluster is finally accepted or rejected by the vegetation expert. The system warns the expert user when two clusters become nested or show a certain amount of overlap.
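The two import checks given as examples above (homogenizing nomenclature and deleting doubtful entries) could look roughly like the following. This is a minimal sketch under stated assumptions: the `(name, cover, doubtful)` entry layout, the synonym table, and the rule of keeping the largest cover value when two names resolve to the same species are all hypothetical, not the system's documented behaviour.

```python
def check_releve(entries, synonyms):
    """Return a cleaned {accepted species name: cover} mapping.

    entries  : iterable of (species_name, cover, doubtful) tuples (assumed layout)
    synonyms : dict mapping known synonyms to accepted names (assumed table)
    """
    cleaned = {}
    for name, cover, doubtful in entries:
        if doubtful:
            # Delete species entries marked as doubtful.
            continue
        name = name.strip()
        # Homogenize nomenclature: replace a synonym by its accepted name.
        accepted = synonyms.get(name, name)
        # Assumption: when several entries resolve to one species, keep the
        # largest cover value.
        cleaned[accepted] = max(cover, cleaned.get(accepted, 0.0))
    return cleaned


# Illustrative data only.
raw_entries = [
    (" Quercus ilex L. ", 60.0, False),
    ("Quercus ilex", 40.0, False),
    ("Pinus sp.", 5.0, True),
]
synonym_table = {"Quercus ilex L.": "Quercus ilex"}
checked = check_releve(raw_entries, synonym_table)
```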

Query mode

The information flow in a relevé classification query is fairly simple. First, a relevé table must be available, either loaded from the user's file system or retrieved from a relevé data bank. The user then submits the relevé table over the internet as a query. The main program receives the table and applies the same checking protocol described above for importing new relevé data. The relevés in the checked table are then classified, and the main program returns a response to the client application.

The system's classification procedure involves the following steps:

    1. For each relevé, compute its distance to each PCM cluster in each classification area, and determine the four closest clusters. Keeping only the closest clusters greatly reduces the number of clusters to be sent back to the user.
    2. Build a common set of PCM clusters by pooling the lists of closest clusters determined for each relevé.
    3. Calculate the relative membership of each relevé to all the clusters in this set. The cluster for which a given relevé shows the highest relative membership (i.e. the lowest distance) is the most probable unit.
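
The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the system's actual routine: the Euclidean distance to a cluster centre and the inverse-squared-distance normalization used for the relative membership are placeholders chosen so that the lowest distance yields the highest membership, and all function and variable names are hypothetical.

```python
import math


def euclidean(a, b):
    # Assumed distance measure between a relevé vector and a cluster centre.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(releves, clusters, n_closest=4):
    """releves, clusters: dicts mapping a name to a species-abundance vector."""
    # Step 1: distance of each relevé to each cluster, and its closest clusters.
    distances = {
        r: {c: euclidean(vec, centre) for c, centre in clusters.items()}
        for r, vec in releves.items()
    }
    closest = {r: sorted(d, key=d.get)[:n_closest] for r, d in distances.items()}

    # Step 2: pool the lists of closest clusters into one common set.
    pooled = sorted({c for names in closest.values() for c in names})

    # Step 3: relative membership of each relevé in every pooled cluster
    # (assumed form: normalized inverse squared distance).
    memberships = {}
    for r, d in distances.items():
        zero = [c for c in pooled if d[c] == 0.0]
        if zero:
            # A relevé lying exactly on a cluster centre belongs fully to it.
            memberships[r] = {c: (1.0 / len(zero) if c in zero else 0.0)
                              for c in pooled}
        else:
            inv = {c: 1.0 / d[c] ** 2 for c in pooled}
            total = sum(inv.values())
            memberships[r] = {c: w / total for c, w in inv.items()}

    # The cluster with the highest relative membership is the most probable unit.
    best = {r: max(m, key=m.get) for r, m in memberships.items()}
    return pooled, memberships, best


# Illustrative data only.
example_clusters = {"A": (0.0, 0.0), "B": (10.0, 10.0), "C": (50.0, 50.0)}
example_releves = {"r1": (0.0, 1.0), "r2": (10.0, 9.0)}
pooled, memberships, best = classify(example_releves, example_clusters, n_closest=1)
```

With `n_closest=1`, cluster `C` is never among the closest clusters of any relevé in this example, so it is left out of the pooled set and out of the membership table.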

The system's response includes the table of relative memberships, matrices containing distances to clusters and cluster typicalities, and a written report. Cited PCM clusters are returned together with a description of the expert vegetation unit they represent. Such a description may include a syntaxon (if any), as well as a vector of species fidelity values. The latter are computed using a fuzzy analogue of the phi coefficient of association and the training relevé set of the classification area.