Regression and Binary Classification are very similar and are set up using the same pages.
Regression takes a number of known factors (called ‘features’) and from them predicts a numeric outcome.
Binary Classification takes the same kind of features and from them predicts an outcome of true or false.
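The difference between the two model types can be illustrated with a minimal Python sketch. It is not the product's implementation: the function names, weights and example features are all hypothetical, and a real model learns its weights from the training data rather than having them typed in.

```python
import math

def predict_regression(features, weights, bias):
    """Regression: return a number (weighted sum of the features plus a bias)."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def predict_binary(features, weights, bias, threshold=0.5):
    """Binary classification: turn the same kind of score into a probability,
    then return True or False depending on the threshold."""
    score = predict_regression(features, weights, bias)
    probability = 1.0 / (1.0 + math.exp(-score))
    return probability >= threshold

# Hypothetical features: [floor area in square metres, number of bedrooms]
house = [120.0, 3.0]
price = predict_regression(house, weights=[1500.0, 10000.0], bias=20000.0)

# Hypothetical features: [days overdue, previous late payments]
account = [2.0, 0.0]
will_default = predict_binary(account, weights=[0.3, 0.8], bias=-2.0)

print(price)         # a number
print(will_default)  # True or False
```

The regression call returns a number, while the binary classification call returns only true or false, which matches the two kinds of outcome described above.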
Select a DSD to use for training. Only those with a purpose of “Machine Learning” will be shown.
Select the Data Group within the DSD that contains the training data.
The Label field will show all fields from the Data Group.
The Label is the numeric field that holds the outcome. During training, the model learns to predict the value of this field from the Features selected. Note that the Label cannot also be a Feature.
The Features list in the middle of the page will show the same fields.
Select the appropriate Label and Features.
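As a rough illustration of what the Label and Feature selection means for the training data, the following Python sketch splits rows into feature values and label values and rejects a Label that is also a Feature. The function name, field names and rows are hypothetical and are not taken from the product.

```python
def split_label_and_features(rows, label, features):
    """Split each row (a dict of field name to value) into feature values and a label value."""
    if label in features:
        raise ValueError(f"Label {label!r} cannot also be a Feature")
    feature_values = [[float(row[name]) for name in features] for row in rows]
    label_values = [float(row[label]) for row in rows]
    return feature_values, label_values

# Hypothetical Data Group rows
rows = [
    {"floor_area": 120, "bedrooms": 3, "price": 230000},
    {"floor_area": 80, "bedrooms": 2, "price": 150000},
]

X, y = split_label_and_features(rows, label="price", features=["floor_area", "bedrooms"])
print(X)
print(y)
```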
If the model has been trained before, the right-hand side of the page shows the metrics of that training. See below for more information on metrics.
Save the training settings and then click on “Train”.
The only settings available are the duration for the training to run and whether to include PFI (Permutation Feature Importance).
The longer the training runs, the more accurate the model is likely to be. Note that a training with a duration of, say, 10 seconds will take longer than 10 seconds to complete, as there is more work to do after the training execution itself.
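How the product spends the training duration is not described here. One common pattern, shown in this hedged Python sketch, is a time-budgeted search: candidates are tried until the time runs out and the best one is kept, so a longer budget allows more candidates to be tried. All names and the toy problem are hypothetical.

```python
import random
import time

def search_with_time_budget(evaluate, sample_candidate, budget_seconds, seed=0):
    """Try candidates until the time budget is used up; keep the one with the lowest error."""
    rng = random.Random(seed)
    deadline = time.monotonic() + budget_seconds
    best, best_error, tried = None, float("inf"), 0
    # Always try at least one candidate, then continue until the deadline.
    while tried == 0 or time.monotonic() < deadline:
        candidate = sample_candidate(rng)
        error = evaluate(candidate)
        tried += 1
        if error < best_error:
            best, best_error = candidate, error
    return best, best_error, tried

# Toy problem: find the slope w that best fits y = 3x.
data = [(x, 3.0 * x) for x in range(1, 6)]

def evaluate(w):
    """Mean absolute error of the candidate slope on the toy data."""
    return sum(abs(w * x - y) for x, y in data) / len(data)

def sample_candidate(rng):
    """Pick a random slope to try."""
    return rng.uniform(0.0, 6.0)

best_w, best_error, tried = search_with_time_budget(
    evaluate, sample_candidate, budget_seconds=0.05
)
print(tried, best_w, best_error)
```

In this sketch any work done after the search returns, such as calculating metrics or PFI, would be in addition to the budget, which is consistent with the overall processing time being longer than the duration entered.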
PFI gives the relative contribution each feature makes to a prediction, so it is useful when first trialling a model to see which features are effective and which may be excluded.
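The general idea behind PFI can be sketched in a few lines of Python: shuffle the values of one feature at a time and measure how much worse the model's error becomes. This is a minimal illustration of the technique under simple assumptions (mean absolute error as the metric, a hypothetical hand-written model); the product's exact calculation is not described here and may differ.

```python
import random

def mean_absolute_error(model, X, y):
    """Average absolute difference between predictions and known outcomes."""
    return sum(abs(model(row) - target) for row, target in zip(X, y)) / len(y)

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """For each feature, shuffle its column across rows and record the average increase in error."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(model, X, y)
    importances = []
    for col in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            permuted = [row[:col] + [value] + row[col + 1:] for row, value in zip(X, shuffled)]
            increase += mean_absolute_error(model, permuted, y) - baseline
        importances.append(increase / n_repeats)
    return importances

# Hypothetical trained model: feature 0 matters a lot, feature 1 a little, feature 2 not at all.
def model(row):
    return 10.0 * row[0] + 1.0 * row[1]

data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random(), data_rng.random()] for _ in range(200)]
y = [model(row) for row in X]

scores = permutation_importance(model, X, y)
print(scores)
```

A feature the model ignores gets a score of zero here, because shuffling it does not change any prediction.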
Click Start to begin the training.
Metrics
On completion, metrics for the training will be shown.
There are two sets of metrics: the model metrics and Feature Performance.
PFI values show the relative effect of each of the features used. They are displayed in order of effectiveness. If a feature has a very low value, you may wish to remove it from the training settings and train again.
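Reviewing PFI values in this way can be sketched as follows. The Python below orders hypothetical PFI values from most to least effective and flags features whose share of the total falls below a cut-off. The 2% cut-off, the function name and the values are assumptions for illustration only; what counts as a very low value is a judgement call.

```python
def rank_features(pfi_by_feature, min_share=0.02):
    """Order features by PFI value and list those contributing less than min_share of the total."""
    total = sum(pfi_by_feature.values())
    ranked = sorted(pfi_by_feature.items(), key=lambda item: item[1], reverse=True)
    weak = [name for name, value in ranked if total > 0 and value / total < min_share]
    return ranked, weak

# Hypothetical PFI values for three features
pfi = {"bedrooms": 0.34, "door_colour": 0.01, "floor_area": 3.31}

ranked, weak = rank_features(pfi)
for name, value in ranked:
    print(name, value)
print("Candidates for removal:", weak)
```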
The test page lets you enter values for each feature and then click Start to get the model's prediction for your entries.