Data format
To use the information system, you need to prepare data as Microsoft Excel files (xls or xlsx) or CSV files. In a CSV file, the first row contains column names, the field separator is ";", the decimal separator is ".", and the encoding is UTF-8 with BOM. The required format depends on how the file is used:
- for training (mandatory);
- for predictions (optional).
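The CSV conventions above map directly onto Python's standard `csv` module. The following is a minimal sketch of writing and reading such a file, not part of the information system itself. The function names, file name and column names are hypothetical, and only the CSV variant is covered. The `utf-8-sig` codec writes the BOM when saving and strips it when loading.

```python
import csv
import os
import tempfile


def write_dataset(path, header, rows):
    """Write rows with ';' as field separator, '.' as decimal point, UTF-8 with BOM."""
    # newline="" lets the csv module manage line endings itself.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f, delimiter=";")
        writer.writerow(header)
        for row in rows:
            # str() of a Python float always uses '.' as the decimal separator.
            writer.writerow([str(v) for v in row])


def read_dataset(path):
    """Read the file back; 'utf-8-sig' removes the BOM from the first header cell."""
    with open(path, encoding="utf-8-sig", newline="") as f:
        return list(csv.reader(f, delimiter=";"))


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "example_train.csv")  # hypothetical file name
    header = ["name", "target", "f1", "f2"]  # hypothetical column names
    rows = [["Fe2O3", 1.25, 0.5, 3.0], ["SiO2", 0.75, 1.5, 2.0]]
    write_dataset(path, header, rows)
    with open(path, "rb") as f:
        print(f.read(3) == b"\xef\xbb\xbf")  # True: the file starts with the UTF-8 BOM
    print(read_dataset(path)[0])  # ['name', 'target', 'f1', 'f2']
```

Reading with plain `utf-8` instead of `utf-8-sig` would leave the BOM attached to the first column name, which is a common source of mismatched headers.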
File format for training
The first row contains column names (and is not processed). The second and subsequent rows contain data (one object per row).
The first column contains the object name (usually a composition formula).
The second column contains the value of the objective function (a real number).
The third and subsequent columns contain a feature description of the object, interpreted as a vector of real numbers. The length of the vector must be the same for all objects.
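A training file can be checked against this layout before it is loaded. The sketch below is one possible check, assuming the rows have already been read as lists of strings (for example with `csv.reader(f, delimiter=";")`). The function `parse_training_rows` is hypothetical and is not an API of the information system. The requirement of at least one feature column is an assumption drawn from the description of the third and subsequent columns.

```python
def parse_training_rows(rows):
    """Split training rows into names, target values and feature vectors.

    `rows` is a list of string lists; the first row (column names) is skipped.
    Raises ValueError if a row violates the training-file layout.
    """
    names, targets, features = [], [], []
    expected_len = None  # feature-vector length fixed by the first data row
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) < 3:
            # Assumption: a name, a target and at least one feature are required.
            raise ValueError(f"row {line_no}: need a name, a target and at least one feature")
        names.append(row[0])            # object name, e.g. a composition formula
        targets.append(float(row[1]))   # objective function value; '.' decimal point only
        vector = [float(x) for x in row[2:]]  # feature description of the object
        if expected_len is None:
            expected_len = len(vector)
        elif len(vector) != expected_len:
            raise ValueError(
                f"row {line_no}: expected {expected_len} features, got {len(vector)}"
            )
        features.append(vector)
    return names, targets, features
```

For example, `parse_training_rows([["name", "target", "f1"], ["SiO2", "0.75", "1.5"]])` returns `(["SiO2"], [0.75], [[1.5]])`. A value written with a comma, such as `"1,5"`, raises `ValueError`, because `float()` accepts only "." as the decimal separator.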
File format for predictions
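The first row contains column names (and is not processed). The second and subsequent rows contain data (one object per row).
The first column contains the object name.
The second column must be left empty (the values of the objective function are unknown).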
The third and subsequent columns contain the feature description of the object, interpreted as a vector of real numbers. The vector must have the same length for all objects, and this length must match the vector length in the training file. The features must appear in the same order as in the training file.
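The constraints on a prediction file can be checked in the same way. The following is a minimal sketch, assuming the rows are lists of strings and that `n_features` is the vector length of the training file. The function `parse_prediction_rows` and its optional `training_header` parameter are hypothetical. Since the column names are not processed by the system, comparing the feature-column headers of the two files is only an extra safeguard, and it assumes both files use the same names.

```python
def parse_prediction_rows(rows, n_features, training_header=None):
    """Parse prediction rows (header first) into names and feature vectors.

    `n_features` is the feature-vector length of the training sample.
    If `training_header` is given (hypothetical extra check), the feature
    column names must match the training file in the same order.
    """
    if training_header is not None and rows[0][2:] != training_header[2:]:
        raise ValueError("feature columns differ from the training file (names or order)")
    names, features = [], []
    for line_no, row in enumerate(rows[1:], start=2):
        # The objective function is unknown, so the second column must be empty.
        if len(row) < 2 or row[1].strip() != "":
            raise ValueError(f"row {line_no}: the second column must be empty")
        vector = [float(x) for x in row[2:]]  # '.' decimal point only
        if len(vector) != n_features:
            raise ValueError(
                f"row {line_no}: expected {n_features} features, got {len(vector)}"
            )
        names.append(row[0])
        features.append(vector)
    return names, features
```

Value-based checks like these cannot detect columns that were swapped but have the same length, which is why the optional header comparison is included.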