How to Prepare an Input Data File (part 5)
N - number of examples .............. max. 250
M - number of input attributes ....... max. 50
W - number of characters in an attribute name or value .... max. 30
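The limits above can be checked before a file is uploaded. The following Python sketch is only an illustration: the function name `check_limits` and the constant names are hypothetical, and it assumes that the header row is not counted among the N examples and that W applies to every attribute name and every value.

```python
# Limits taken from the list above.
MAX_EXAMPLES = 250      # N
MAX_ATTRIBUTES = 50     # M
MAX_TOKEN_LENGTH = 30   # W

def check_limits(attribute_names, examples):
    """Return a list of limit violations (an empty list means none were found).

    attribute_names: list of strings (assumed to be the input attributes only)
    examples: list of rows, each row a list of string values
    """
    problems = []
    if len(examples) > MAX_EXAMPLES:
        problems.append(f"too many examples: {len(examples)} > {MAX_EXAMPLES}")
    if len(attribute_names) > MAX_ATTRIBUTES:
        problems.append(f"too many attributes: {len(attribute_names)} > {MAX_ATTRIBUTES}")
    # Check every attribute name and every value against the W limit.
    tokens = list(attribute_names) + [value for row in examples for value in row]
    for token in tokens:
        if len(token) > MAX_TOKEN_LENGTH:
            problems.append(f"token longer than {MAX_TOKEN_LENGTH} characters: {token!r}")
    return problems
```

An empty result only means that these three limits are respected; it says nothing about the delimiters described below.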
The server supports four delimiter types in input data files. Only one delimiter type may be used within a single input data file, and that type must be selected when the data file is uploaded.
Three of the delimiter types are single-character delimiters:
TAB (ASCII 09 decimal)
comma ',' (ASCII 44 decimal)
semicolon ';' (ASCII 59 decimal)
Their common property is that they allow spaces (ASCII 32 decimal) in attribute names and in values of nominal attributes, although this is not recommended. Also, when two such delimiters occur immediately one after the other in an input data file, DMS assumes a missing value for that attribute.
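As an illustration of the single-character delimiters, the Python sketch below splits one row and marks empty fields as missing. It is not the server's own parser: the names `DELIMITERS` and `split_row` are hypothetical, missing values are represented by `None` as an assumption, and spaces inside a field are kept unchanged.

```python
# The three single-character delimiters described above.
DELIMITERS = {"tab": "\t", "comma": ",", "semicolon": ";"}

def split_row(line, delimiter):
    """Split one row on a single-character delimiter.

    An empty field (two delimiters in a row) is returned as None,
    standing for a missing value. Spaces inside fields are preserved.
    """
    fields = line.rstrip("\r\n").split(delimiter)
    return [field if field != "" else None for field in fields]

print(split_row("red apple,,12\n", DELIMITERS["comma"]))
# prints ['red apple', None, '12']
```

Note that this sketch also reports a missing value when a row ends with a delimiter, which may differ from how the server treats that case.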
The fourth delimiter type, which is the default for this server, is a multi-character delimiter consisting of one or more spaces and/or TABs. Any run of consecutive spaces and TABs counts as a single delimiter. Consequently, no spaces or TABs are allowed in attribute names and values. Also, with this delimiter type all missing values must be specified explicitly with a question mark (?), as explained before.
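The behaviour of the multi-character delimiter can be sketched in Python as follows. This is an illustration under stated assumptions, not the server's implementation: the function name `split_whitespace_row` is hypothetical, `None` stands for a missing value, and leading or trailing spaces and TABs are assumed to be ignored.

```python
import re

def split_whitespace_row(line):
    """Split one row on runs of spaces and TABs.

    Any run of spaces and TABs is treated as one delimiter.
    A field consisting of '?' is returned as None (missing value).
    """
    stripped = line.strip(" \t\r\n")
    if not stripped:
        return []
    tokens = re.split(r"[ \t]+", stripped)
    return [None if token == "?" else token for token in tokens]

print(split_whitespace_row("sunny \t 85  ?\n"))
# prints ['sunny', '85', None]
```

Because a run of delimiters collapses into one, an empty field cannot be detected here, which is why the question mark is required for missing values.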
A delimiter may also be used after the last attribute value in a row (before the Carriage Return or New Line characters). Every row must have at least M delimiters.
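The rule about the minimum number of delimiters can be checked with a short Python sketch. It applies to the single-character delimiters only, and the function name `rows_with_too_few_delimiters` and its parameter names are hypothetical; `m` is the value of M for the file being checked.

```python
def rows_with_too_few_delimiters(lines, delimiter, m):
    """Return the 1-based numbers of rows that contain fewer than m delimiters."""
    bad_rows = []
    for number, line in enumerate(lines, start=1):
        # Line-ending characters are not part of the row content.
        if line.rstrip("\r\n").count(delimiter) < m:
            bad_rows.append(number)
    return bad_rows

rows = ["a,b,c", "1,2,3,", "4,5"]
print(rows_with_too_few_delimiters(rows, ",", 2))
# prints [3]
```

The second row passes because a delimiter after the last value is allowed and simply adds to the count. Blank lines are reported as well, since they contain no delimiters.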
© 2001 LIS - Rudjer Boskovic Institute
Last modified: February 15 2018 22:13:29.