eggbird
software development and knowledge engineering

ClaM - A Classification Manager

Indexing a classification

The Indexer builds an index on the classes in the classification. You can choose which rubric kinds are used for indexing. In the example below, the words in the rubric kind Exclusion are not considered for indexing.
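Conceptually, such an index can be pictured as an inverted index from word forms to class codes, built only from the selected rubric kinds. The following is a minimal Python sketch, not ClaM's implementation. The data model (a dict from class code to (rubric kind, text) pairs), the rubric kind names such as "exclusion", and the functions `tokenize` and `build_index` are all hypothetical, and the data is invented.

```python
import re
from collections import defaultdict

# Hypothetical data model (not ClaM's): class code -> list of (rubric kind, text).
Classification = dict[str, list[tuple[str, str]]]

# Word forms are taken to be maximal runs of Unicode word characters.
WORD_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Split a rubric text into lower-cased word forms."""
    return [w.lower() for w in WORD_RE.findall(text)]


def build_index(classification: Classification,
                rubric_kinds: set[str]) -> dict[str, set[str]]:
    """Map each word form to the codes of the classes whose rubrics contain it.

    Only rubrics whose kind is listed in `rubric_kinds` are indexed.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for code, rubrics in classification.items():
        for kind, text in rubrics:
            if kind not in rubric_kinds:
                continue  # e.g. words in 'exclusion' rubrics are skipped
            for word in tokenize(text):
                index[word].add(code)
    return dict(index)


# Toy data, invented for illustration.
toy = {
    "C1": [("preferred", "Fracture of the femur"),
           ("exclusion", "Fracture of the skull")],
    "C2": [("preferred", "Fracture of the skull")],
}

index = build_index(toy, {"preferred", "inclusion"})
print(sorted(index["skull"]))  # ['C2']: C1 mentions 'skull' only in an exclusion
```

Because class C1 mentions "skull" only in an exclusion rubric, leaving that rubric kind out of the selection keeps C1 from matching a search for "skull".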

Many words, such as 'the', are not relevant for searching and can therefore be left out of the index. While indexing, ClaM counts the number of occurrences of each word form in the corpus. Based on this count and on the relevance of a word, the user may decide to exclude it by placing it on the list of stopwords. The stopwords can be saved to a file for later re-use.
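Counting word forms and keeping a reusable stopword file could look roughly like the Python sketch below. It is an illustration under assumptions, not ClaM's code. The one-word-per-line UTF-8 file format and the function names are hypothetical. The user's decision is simulated here by a simple frequency threshold, whereas in ClaM the user picks the stopwords.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

WORD_RE = re.compile(r"\w+")


def count_word_forms(texts: list[str]) -> Counter:
    """Count how often each lower-cased word form occurs in the corpus."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(w.lower() for w in WORD_RE.findall(text))
    return counts


def save_stopwords(stopwords: set[str], path: Path) -> None:
    """Write one stopword per line, UTF-8 encoded (hypothetical file format)."""
    path.write_text("\n".join(sorted(stopwords)) + "\n", encoding="utf-8")


def load_stopwords(path: Path) -> set[str]:
    """Read a stopword file written by save_stopwords, ignoring blank lines."""
    lines = path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


corpus = ["Fracture of the femur", "Fracture of the skull", "Disorders of the eye"]
counts = count_word_forms(corpus)

# Stand-in for the user's decision: treat very frequent word forms as stopwords.
stopwords = {word for word, n in counts.items() if n >= 3}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "stopwords.txt"
    save_stopwords(stopwords, path)
    reloaded = load_stopwords(path)  # later re-use of the saved list

# Word forms that would still be indexed after removing the stopwords.
kept = [w for w in counts if w not in reloaded]
print(kept)
```

A frequency threshold alone is a crude criterion. Relevance also matters: a frequent word can still be a useful search term, which is why the decision is left to the user.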

ClaM can index classifications using an external, user-maintained lexicon. The lexical tools are not part of the ClaM tool itself. Among other things, they allow the user to define synonyms and different word forms, which in turn can be assigned to a word in the classification. If 'Search actual search terms' is selected, an external term can be entered in the search field, where it is automatically replaced by the corresponding word actually used in the classification.
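The term replacement described above can be sketched in Python as a lookup applied to the query before the index is searched. This is an assumption-laden illustration, not ClaM's lexical tools. The one-to-one `LEXICON` mapping, the toy `INDEX`, the functions `resolve_terms` and `search`, and the "all terms must match" search semantics are all hypothetical. A real lexicon may map one term to several words.

```python
# Hypothetical external lexicon: term a user may type -> word used in the classification.
LEXICON: dict[str, str] = {
    "broken": "fracture",
    "fractures": "fracture",
    "thighbone": "femur",
}

# Toy index (word form -> class codes), as an indexer might produce it.
INDEX: dict[str, set[str]] = {
    "fracture": {"C1", "C2"},
    "femur": {"C1"},
    "skull": {"C2"},
}


def resolve_terms(query: str, lexicon: dict[str, str], use_lexicon: bool) -> list[str]:
    """Lower-case the query words and, if enabled, replace external terms."""
    words = query.lower().split()
    if not use_lexicon:
        return words
    return [lexicon.get(w, w) for w in words]  # unknown words pass through unchanged


def search(query: str, index: dict[str, set[str]], lexicon: dict[str, str],
           use_lexicon: bool = True) -> set[str]:
    """Return the classes whose index entries contain all (resolved) query terms."""
    terms = resolve_terms(query, lexicon, use_lexicon)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


with_lexicon = search("Broken thighbone", INDEX, LEXICON)
without_lexicon = search("Broken thighbone", INDEX, LEXICON, use_lexicon=False)
print(with_lexicon, without_lexicon)
```

With the lexicon switched on, "Broken thighbone" is rewritten to "fracture femur" and finds C1. Without it, neither external term occurs in the index and nothing is found.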

The Export button lists all individual words in the classification, sorted alphabetically. Please bear in mind that the exported words are encoded as UTF-8. Not all editors (e.g. Notepad on Windows XP) support UTF-8 properly!
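When an exported word list has to be processed further, it helps to be explicit about the encoding. The Python sketch below writes a sorted word list as UTF-8. It optionally prepends a byte order mark via the standard `utf-8-sig` codec, which some editors use to recognize UTF-8. Whether this helps a particular editor is an assumption worth checking. The function `export_words` and the one-word-per-line format are hypothetical. Note also that Python's default sort uses code-point order rather than language-specific alphabetical collation.

```python
import tempfile
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def export_words(words: set[str], path: Path, bom: bool = False) -> list[str]:
    """Write the words one per line, sorted, UTF-8 encoded (hypothetical format).

    With bom=True the 'utf-8-sig' codec prepends a byte order mark.
    """
    ordered = sorted(words, key=str.casefold)  # code-point order, not locale collation
    encoding = "utf-8-sig" if bom else "utf-8"
    path.write_text("\n".join(ordered) + "\n", encoding=encoding)
    return ordered


words = {"fracture", "femur", "épaule", "ödem"}
with tempfile.TemporaryDirectory() as tmp:
    plain = Path(tmp) / "words.txt"
    marked = Path(tmp) / "words_bom.txt"
    exported = export_words(words, plain)
    export_words(words, marked, bom=True)
    plain_bytes = plain.read_bytes()
    marked_bytes = marked.read_bytes()

print(exported)  # ['femur', 'fracture', 'épaule', 'ödem']
```

Accented words sort after all plain ASCII words here. Locale-aware collation (for example via `locale.strxfrm`) would be needed for a dictionary-style order.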