Other information for new users
- Expert knowledge of and experience with the problem domain are important during both the data preparation and the model analysis phases. Data mining is typically an iterative process in which the results of one iteration prompt the user to change either the set of input attributes or the set of cases used for induction. Sometimes even the target attribute may be changed. Induced models can be used directly for prediction, but often it is only their interpretation by domain experts that yields new, relevant knowledge about the domain.
- During model induction, the only source of information about the domain is the set of cases submitted by the user. If the quality of the induced models is not satisfactory, results in the next iteration can be improved by adding cases from the domain subspace for which the previous models did not work well. In our 'smoker' problem this means that if we are not satisfied with the model
SMOKER IF SEX is equal male AND INCOME is less than 15000
because we know that women smoke as well, we can force the server to look for better models by adding examples of female smokers and male non-smokers. Alternatively, if we do not want the input attribute 'sex' to appear as a model descriptor, we can temporarily disable this attribute and so force the server to look for models that do not use it.
- If the target attribute has more than two classes, the positive examples may belong to more than one class, but some negative examples must always remain. Multiclass decision problems can be transformed into a series of two-class decision problems in several ways, and this server can then be used to solve these sub-problems. If the target attribute is a continuous numerical value, this server cannot be used directly. The only possibility is to discretize the values into classes (such as small, medium, large) and then use the server to model these classes.
- Apart from the data file, no registration or other information is needed at the basic level (induction of confirmation rules and noise detection). The service is available to all users on the network; the only restriction is the size of the data file (up to 250 examples with up to 50 attributes).
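
The target preparation and the size limit described in the notes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the server: the function names, the two cut points, and the choice to count every column of a row as an attribute are all hypothetical, and the server's actual input format is not shown here.

```python
# Hypothetical helpers for preparing data before submission.
# None of these names come from the server itself.

MAX_EXAMPLES = 250    # size limit stated in the notes above
MAX_ATTRIBUTES = 50   # size limit stated in the notes above


def fits_size_limit(rows):
    """Return True if the data set respects the stated size limits.

    Assumption: every column in a row counts as one attribute.
    """
    if len(rows) > MAX_EXAMPLES:
        return False
    return all(len(row) <= MAX_ATTRIBUTES for row in rows)


def discretize(value, low, high):
    """Map a continuous value to 'small', 'medium' or 'large'.

    The cut points `low` and `high` are assumed to be chosen by the user.
    """
    if value < low:
        return "small"
    if value < high:
        return "medium"
    return "large"


def one_vs_rest(labels, positive_classes):
    """Turn multiclass labels into two-class (True/False) labels.

    Several classes may be treated as positive, but both positive and
    negative examples must remain after the transformation.
    """
    positive_classes = set(positive_classes)
    binary = [label in positive_classes for label in labels]
    if all(binary) or not any(binary):
        raise ValueError("both positive and negative examples are required")
    return binary


# Usage: discretize a continuous target, then build a two-class problem.
values = [9000.0, 22000.0, 61000.0, 14000.0]
classes = [discretize(v, 15000, 50000) for v in values]
targets = one_vs_rest(classes, {"medium", "large"})
```

Repeating `one_vs_rest` with a different set of positive classes each time gives one possible series of two-class sub-problems; other decompositions are equally valid.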
© 2001 LIS - Rudjer Boskovic Institute
Last modified: December 09 2018 12:50:13.