ClaM - A Classification Manager
Importing a classification
The import function is meant for importing well-defined, structured text files, which must be encoded as UTF-8 before import. The present version of ClaM supports the following formats: the CEN standard EN14463:2007 (ClaML format) and 'delimited text format'. The latter must be passed through a converter, which transforms it into the correct internal format.
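If a source file is still in a legacy encoding, it has to be converted to UTF-8 first. The Python sketch below shows one way to do that. The function names are hypothetical, and the default source encoding ("cp1252", Windows Western) is an assumption; replace it with the file's real encoding, for example "cp437" for DOS/OEM text. Whether ClaM expects a byte-order mark is not stated here, so the sketch writes plain UTF-8 without one.

```python
from pathlib import Path


def reencode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from source_encoding and re-encode them as UTF-8.

    errors="strict" makes the conversion fail loudly if the assumed source
    encoding is wrong, instead of silently producing garbage characters.
    """
    return data.decode(source_encoding, errors="strict").encode("utf-8")


def convert_file_to_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Convert a whole file; working on raw bytes leaves line endings unchanged."""
    data = Path(src).read_bytes()
    Path(dst).write_bytes(reencode_to_utf8(data, source_encoding))
```

For example, convert_file_to_utf8("chapter1.txt", "chapter1-utf8.txt", "cp437") would convert a DOS-encoded file; both file names are placeholders.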
- Select Import from the File menu.
- Select the file you want to import.
- Press the Open button.
NOTE: If you want to import a series of files (for example, chapters), give them a common base name followed by consecutive numbers or letters, and use the wildcard characters * and ? in the import file name (see below).
For example, "F:\sources\classifications\icd10-nl-2010-ch*.xml" lets you import chapters 1-22 of ICD-10.
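Assuming ClaM's wildcards follow the usual shell-style meaning (* matches any run of characters, ? exactly one character), the hypothetical helper below shows which chapter files a pattern selects. It also shows that plain name sorting puts ch10 before ch2. The order in which ClaM itself processes the matched files is not documented here, but zero-padded numbers (ch01 to ch22) keep the chapters in sequence under any name-based ordering.

```python
from fnmatch import fnmatchcase


def select_chapter_files(names, pattern):
    """Return the file names that match a wildcard pattern, sorted by name.

    '*' matches any run of characters and '?' exactly one character.
    Matching is case-insensitive here, as Windows file names are.
    """
    wanted = pattern.lower()
    return sorted(name for name in names if fnmatchcase(name.lower(), wanted))


files = [
    "icd10-nl-2010-ch1.xml",
    "icd10-nl-2010-ch2.xml",
    "icd10-nl-2010-ch10.xml",
    "icd10-nl-2010-ch22.xml",
    "icd10-nl-2010-index.xml",
]
# '*' selects all four chapter files (but not the index file);
# '?' selects only the single-character chapters ch1 and ch2.
all_chapters = select_chapter_files(files, "icd10-nl-2010-ch*.xml")
short_chapters = select_chapter_files(files, "icd10-nl-2010-ch?.xml")
```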
When your classification is not yet available in ClaML, you can use this Import function to read your classification into ClaM. It reads text files and has many options that can tune it to the way your classification is represented in the text. There are two basic input formats:
- Two-column multi-line input, where the first column contains either the code or the kind of information (preferred, inclusion, etc.). The second column holds the value of the item in column 1.
- Multicolumn single-line input, where the column number determines the kind of information the column contains (e.g. col1=code, col2=preferred, col3=inclusion, etc.).
In the two-column form, the source file can be read into ClaM in a single run. In the Code specification, fill in Field 1 and leave all the other fields blank or 0. If the code itself is hierarchical, select this option; it instructs ClaM to build the hierarchical relations during import. In the Preferred (main) rubric box, enter 'preferred' as the kind and select Field number 2 for the value. On the lines that follow a code line, the first column holds a rubric kind instead of a code, and 'preferred' is replaced by the kind defined for it in the 'Other rubrics...' box. This is what translates, for example, Innefattar (Swedish for 'includes') in the source file into inclusion in the internal representation.
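ClaM's own rule for deriving parents from hierarchical codes is not described in this section. The sketch below assumes the common convention that a code's parent is the longest other code in the file that is a prefix of it, so A01.1 falls under A01 and A01 under A. The helper derive_parents is hypothetical.

```python
def derive_parents(codes):
    """Map each code to its assumed parent, or None for a top-level code.

    The parent is taken to be the longest other code in the list that is a
    proper prefix of the code (an assumption, not ClaM's documented rule).
    """
    known = set(codes)
    parents = {}
    for code in codes:
        # Try successively shorter prefixes; the first existing one wins.
        parents[code] = next(
            (code[:n] for n in range(len(code) - 1, 0, -1) if code[:n] in known),
            None,
        )
    return parents


hierarchy = derive_parents(["A", "A01", "A01.1", "A01.12", "A02", "B"])
```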
In the single-line multicolumn form, you can import only one code-rubric combination per run, so several runs are needed to build the classification. Both codes and terms can be spread over multiple fields (the columns of your table). In Converter you can select up to three fields to construct the code, and up to three to construct the term. If only one field is needed, leave the others blank or 0. To separate the imported fields with a space, select the check boxes between the three field number selectors. First specify which field(s) of your table contain the code. Next specify the field number(s) that contain the preferred rubrics. When everything is specified, press OK for the first run. Repeat this process until all the information you need has been imported; the Code specification can stay the same in subsequent runs.
NOTE: In each run, take care to fill the 'Preferred (main) Rubric' box with the appropriate rubric kind (for example inclusion or exclusion) and the corresponding field (column) number.
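The per-run logic can be pictured with the sketch below. It is an illustration under assumptions, not ClaM's implementation: field numbers are 1-based with 0 meaning unused, each space flag stands for one check box between the field selectors, and join_fields and import_run are hypothetical names. Each call corresponds to one run and yields (code, kind, text) records for a single rubric kind.

```python
def join_fields(row, fields, spaces):
    """Concatenate up to three 1-based fields of a row.

    fields: field numbers, 0 meaning 'not used'.
    spaces: check-box flags; spaces[i] puts a space before the field
    chosen in selector i + 1.
    """
    result = ""
    for i, number in enumerate(fields):
        if not number:
            continue  # blank or 0: this selector is not used
        part = row[number - 1].strip() if number <= len(row) else ""
        if result and i > 0 and spaces[i - 1]:
            result += " "
        result += part
    return result


def import_run(lines, separator, kind, code_fields, code_spaces, text_fields, text_spaces):
    """One run: read one rubric kind for every code as (code, kind, text)."""
    records = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row = line.rstrip("\r\n").split(separator)
        code = join_fields(row, code_fields, code_spaces)
        text = join_fields(row, text_fields, text_spaces)
        if text:  # rows without a rubric of this kind are skipped
            records.append((code, kind, text))
    return records
```

For a table such as "A;01;the rubric for A01;an inclusion rubric for A01", a first run might use kind "preferred" with the term in field 3 and a second run kind "inclusion" with the term in field 4, keeping the same code fields (1, 2, 0) in both runs.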
How to use Converter
- Specify the file name of the text file
- Give the name of the language for the classification
- If the text is in the OEM (DOS) character set, check OEM Font
- Specify the separator character used in the text file
- Specify which fields (columns) on each line contain the code. The code may be spread over more than one field; in that case, the parts of the code can be concatenated with or without a dot. Use the check boxes if you want a dot between two parts.
- Indicate the field containing the code of the parent, or use the check box Hierarchical coding when the codes indicate the hierarchical order.
- Some coding schemes, such as ICD-10, use block codes (for example A00-A09). For such coding schemes, specify the separator used in the block codes.
- Specify the kind of the main rubric in the text file. This is the rubric that is contained on the same line as the code. When this is the preferred rubric, you should use the kind preferred.
- The text for a rubric may be spread across more than one field. The different parts of the text may be concatenated with or without a space. Use the check boxes if you want a space between two parts.
- Most classifications have more than one rubric per code. If the text file contains several rubrics per code, put the kind of rubric (or its abbreviation) in the code field of the extra lines instead of the code. Then, in the Other rubrics box, specify which rubric kinds occur in the text file and which rubric kind each of them should become in the classification.
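Block codes such as the ICD-10 range A00-A09 consist of a first and a last code joined by the block separator. How ClaM places codes inside blocks is not described here; the sketch below assumes a simple lexical range test on the code, with "-" as the separator. split_block_code and in_block are hypothetical helpers.

```python
def split_block_code(code, separator="-"):
    """Return (first, last) for a block code like 'A00-A09', or None otherwise."""
    if separator in code:
        first, last = code.split(separator, 1)
        return first.strip(), last.strip()
    return None


def in_block(code, block, separator="-"):
    """Assumed rule: code lies in the block if first <= code and its leading
    characters (as many as the last code has) do not exceed the last code."""
    bounds = split_block_code(block, separator)
    if bounds is None:
        return False  # not a block code
    first, last = bounds
    return first <= code and code[: len(last)] <= last
```

Under this rule A09.9 falls in A00-A09, A10 does not, and the block A00-A09 itself falls in the wider range A00-B99.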
The Converter can read rubrics of at most 400 characters. Lines containing rubrics that are too long are reported during the conversion. In that case, you can split such a rubric over several lines: the first line contains the rubric kind in the code field, and each following line contains the character '+' instead of the rubric kind.
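Splitting long rubrics by hand is error-prone, so a small preprocessing script can do it. The sketch below is hypothetical: it breaks the text at word boundaries into parts of at most 400 characters, puts the rubric kind on the first line and '+' on the others, and assumes that ClaM rejoins the parts with a single space (the example file further on is also split at word boundaries).

```python
import textwrap

MAX_RUBRIC_LENGTH = 400  # limit stated for the Converter


def split_long_rubric(kind, text, separator=";", limit=MAX_RUBRIC_LENGTH):
    """Return the input lines for one rubric, split if it exceeds limit.

    The first line carries the rubric kind, continuation lines carry '+'.
    Breaking at spaces assumes ClaM rejoins the parts with one space.
    """
    if len(text) <= limit:
        return [f"{kind}{separator}{text}"]
    parts = textwrap.wrap(text, width=limit, break_on_hyphens=False)
    lines = [f"{kind}{separator}{parts[0]}"]
    lines += [f"+{separator}{part}" for part in parts[1:]]
    return lines
```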
Import of META tags
As of ClaM 7.30.03, META tags can be imported for a class. Select "Import meta information instead of rubric" on the input form. The import otherwise works the same way as for rubrics. Bear in mind that a META tag can have only one value for a given class.
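Because a META tag can hold only one value per class, it can help to check the source data for conflicts before importing. The sketch below assumes the data has already been read into (class code, META name, value) tuples; that layout and the helper find_meta_conflicts are hypothetical.

```python
from collections import defaultdict


def find_meta_conflicts(rows):
    """Return {(class_code, meta_name): [values]} for tags with more than
    one distinct value at the same class."""
    values = defaultdict(set)
    for code, name, value in rows:
        values[(code, name)].add(value)
    return {key: sorted(found) for key, found in values.items() if len(found) > 1}


conflicts = find_meta_conflicts([
    ("A01", "source", "WHO"),
    ("A01", "source", "NL"),   # second, different value for A01: a conflict
    ("A02", "source", "WHO"),
    ("A02", "source", "WHO"),  # exact duplicate: not a conflict
])
```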
Example: Two-column multi-line input
Suppose you want to convert the following file Example.txt:
A01;the rubric for A01
incl;an inclusion rubric for A01
incl;and one more inclusion rubric for A01
excl;here is an exclusion rubric for A01
+;spread across three
+;lines
A02;the rubric for A02
- The file name is Example.txt.
- The language appears to be English.
- The separator character is ;
- The first column contains the code.
- Let's suppose the code indicates the hierarchical order.
- The rubric on the same line as the code is the preferred rubric.
- The text file contains two other kinds of rubrics besides the preferred rubric, i.e. incl and excl. If you want to have these present in the classification as inclusion and exclusion, you should add the rules:
If code is: incl
then add a rubric: inclusion
If code is: excl
then add a rubric: exclusion
- Press the OK button to start the conversion.
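The effect of these settings on Example.txt can be mimicked with the sketch below. It is not ClaM's converter: convert_two_column is a hypothetical function, the rubric_map dictionary plays the role of the Other rubrics rules, and joining '+' continuation parts with a single space is an assumption. Hierarchy building is left out because the Hierarchical coding option handles it.

```python
def convert_two_column(lines, separator, rubric_map, main_kind="preferred"):
    """Turn two-column lines into (code, kind, text) records.

    rubric_map mirrors the Other rubrics rules, e.g. {"incl": "inclusion"}.
    A '+' in the first column continues the previous rubric.
    """
    records = []  # each item is [code, kind, text]
    current_code = None
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        key, _, value = line.partition(separator)
        key, value = key.strip(), value.strip()
        if key == "+":
            if not records:
                raise ValueError(f"line {number}: '+' without a preceding rubric")
            records[-1][2] += " " + value  # assumed: parts joined by one space
        elif key in rubric_map:
            if current_code is None:
                raise ValueError(f"line {number}: rubric before any code")
            records.append([current_code, rubric_map[key], value])
        else:
            # Anything else is taken to be a code; its text is the main rubric.
            current_code = key
            records.append([current_code, main_kind, value])
    return [tuple(record) for record in records]
```

Called with the seven lines of Example.txt, the separator ";" and {"incl": "inclusion", "excl": "exclusion"}, it yields five records: the preferred rubrics of A01 and A02, two inclusions for A01, and one exclusion for A01 whose three parts are joined into a single text.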
Import of XML and HTML tags
By default, characters such as < and > are translated into a so-called web-safe representation. This would make it impossible to import HTML and XML tags embedded in otherwise plain text. As of version 7.30.03, ClaM supports the import of these tags: just place a "\" before the < or >. For example: