
Dynamic Model

Regression or Binary Classification

Regression and Binary Classification are very similar and are set up using the same pages.

Regression takes a number of known factors (known as ‘features’) and from them establishes a predicted numeric outcome.

Binary Classification takes a number of known factors (known as ‘features’) and from them establishes a predicted outcome of true or false.

Label and Features

Select a DSD to use for training. Only those with a purpose of “Machine Learning” will be shown.

Select the Data Group within the DSD that contains the training data.

The Label field will show all fields from the Data Group.

The Label is the numeric field that holds the outcome to be predicted. In training, the model learns how the values of the selected Features relate to the value of this field. Note that the Label cannot also be a Feature.

The Features list in the middle of the page will show the same fields.

Select the appropriate Label and Features.
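
The relationship between the Label and the Features can be pictured as splitting each row of training data into a list of feature values and a single label value. The following is only a minimal sketch of that idea, not the product's own code: the function name and the field names (floor_area, bedrooms, price) are hypothetical and chosen purely for illustration.

```python
def split_label_and_features(rows, label, features):
    """Separate each row into its feature values and its label value."""
    # Mirrors the rule on this page: the Label cannot also be a Feature.
    if label in features:
        raise ValueError("The Label cannot also be a Feature")
    feature_values = [[row[name] for name in features] for row in rows]
    label_values = [row[label] for row in rows]
    return feature_values, label_values


# Hypothetical Data Group rows; the field names are illustrative only.
rows = [
    {"floor_area": 70.0, "bedrooms": 2, "price": 210000.0},
    {"floor_area": 95.0, "bedrooms": 3, "price": 285000.0},
]

X, y = split_label_and_features(
    rows, label="price", features=["floor_area", "bedrooms"]
)
```

Here X holds one list of feature values per row and y holds the matching outcomes that training would try to reproduce.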

If the model has been trained before, the right-hand side of the page shows the metrics of that training. See below for more information on metrics.

Save the training settings and then click on “Train”.

Train the Model

The only settings available are the duration for the training to run and whether to include PFI (Permutation Feature Importance).

The longer it runs, the more accurate the model is likely to be. Note that the overall process takes longer than the duration set (a duration of, say, 10 seconds will take more than 10 seconds to complete) as there is more work to do following the training execution.

PFI gives the relative contribution each feature makes to a prediction and so is useful when first trialling a model to see which features are effective and which may be excluded.

Click Start to begin the training.

Metrics

On completion, the metrics for the training will be shown.

Regression Metrics

There are two sets of metrics: the model metrics and Feature Performance.

  • MeanAbsoluteError – the average absolute difference between predictions and actual values – the closer to zero the better
  • MeanSquaredError – the average of the squared differences between predictions and actual values, which penalises large errors more heavily – the closer to zero the better
  • RootMeanSquaredError – the square root of MeanSquaredError, expressed in the same units as the Label – the closer to zero the better
  • RSquared – the proportion of the variation in the Label that the model explains – the closer to one the better
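
To make the regression metrics concrete, the sketch below computes them by hand using their standard textbook definitions. It is an illustration only and assumes nothing about how the product calculates them internally; the function name and the sample values are made up.

```python
import math


def regression_metrics(actual, predicted):
    """Compute the four regression metrics from paired actual/predicted values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    # R squared compares the model's squared error with that of always
    # predicting the mean of the actual values.
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum(e * e for e in errors)
    r_squared = 1.0 - ss_residual / ss_total if ss_total else float("nan")

    return {
        "MeanAbsoluteError": mae,
        "MeanSquaredError": mse,
        "RootMeanSquaredError": rmse,
        "RSquared": r_squared,
    }


# Made-up values: two predictions are off by 1, two are exact.
regression_result = regression_metrics(
    actual=[3.0, 5.0, 7.0, 9.0],
    predicted=[2.0, 5.0, 8.0, 9.0],
)
```

For these sample values MeanAbsoluteError and MeanSquaredError are both 0.5, and RSquared is 0.9.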

Binary Classification Metrics

  • AreaUnderRocCurve – area under the curve of true positive rate plotted against false positive rate – the closer to one the better
  • Accuracy – proportion of correct predictions to total samples – the closer to one the better
  • PositivePrecision – proportion of positive predictions that are correct – the closer to one the better
  • PositiveRecall – proportion of actual positive instances that are correctly predicted as positive – the closer to one the better
  • NegativePrecision – proportion of negative predictions that are correct – the closer to one the better
  • NegativeRecall – proportion of actual negative instances that are correctly predicted as negative – the closer to one the better
  • F1Score – the harmonic mean of PositivePrecision and PositiveRecall – the closer to one the better
  • AreaUnderPrecisionRecallCurve – measure of success when classes are imbalanced – the closer to one the better
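
Most of the binary classification metrics follow from four counts: true positives, true negatives, false positives and false negatives. The sketch below derives them from a list of actual and predicted true/false values using the standard definitions. It is an illustration with made-up data, not the product's implementation. The two curve-based metrics are left out because they need the model's scores rather than its final true/false answers. Returning 0.0 when a denominator is zero is a choice made for this sketch.

```python
def binary_metrics(actual, predicted):
    """Compute count-based binary classification metrics."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # true positives
    tn = sum(1 for a, p in pairs if not a and not p)  # true negatives
    fp = sum(1 for a, p in pairs if not a and p)      # false positives
    fn = sum(1 for a, p in pairs if a and not p)      # false negatives

    def ratio(numerator, denominator):
        # Sketch-level choice: report 0.0 rather than fail on an empty class.
        return numerator / denominator if denominator else 0.0

    positive_precision = ratio(tp, tp + fp)
    positive_recall = ratio(tp, tp + fn)
    return {
        "Accuracy": ratio(tp + tn, len(pairs)),
        "PositivePrecision": positive_precision,
        "PositiveRecall": positive_recall,
        "NegativePrecision": ratio(tn, tn + fn),
        "NegativeRecall": ratio(tn, tn + fp),
        "F1Score": ratio(
            2 * positive_precision * positive_recall,
            positive_precision + positive_recall,
        ),
    }


# Made-up outcomes: 4 actual positives and 6 actual negatives.
actual_flags = [True, True, True, True, False, False, False, False, False, False]
predicted_flags = [True, True, True, False, True, False, False, False, False, False]
binary_result = binary_metrics(actual_flags, predicted_flags)
```

With this sample there are 3 true positives, 5 true negatives, 1 false positive and 1 false negative, giving an Accuracy of 0.8 and a PositivePrecision of 0.75.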

Multi-class Classification Metrics

  • LogLoss – measures the performance of a classification model where the prediction is a probability value between 0.00 and 1.00 – the closer to zero the better
  • LogLossReduction – how much the model improves on a prediction based only on how frequently each class occurs – the closer to one the better
  • MacroAccuracy – the accuracy for each class is computed and the macro-accuracy is the average of these accuracies – the closer to one the better
  • MicroAccuracy – the fraction of instances predicted correctly – the closer to one the better
  • TopKAccuracy – the proportion of instances where the correct label is among the top k labels predicted (ranked by predicted scores) – the closer to one the better
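
The multi-class metrics can be illustrated with a small hand calculation. The sketch below uses the commonly quoted definitions and made-up data; it is not the product's implementation. It assumes each prediction is a list of per-class probabilities and each actual value is a class index. The log-loss reduction is taken relative to a baseline that always predicts the class frequencies, which is an assumption of this sketch, and it needs at least two classes to be present in the data.

```python
import math


def multiclass_metrics(actual, probabilities, k=2):
    """actual: class indices; probabilities: one per-class probability list per row."""
    n = len(actual)
    n_classes = len(probabilities[0])
    eps = 1e-15  # avoids log(0) when the true class was given zero probability

    # Log loss: average negative log of the probability given to the true class.
    log_loss = -sum(
        math.log(max(row[a], eps)) for a, row in zip(actual, probabilities)
    ) / n

    # Baseline log loss from predicting only the observed class frequencies.
    priors = [actual.count(c) / n for c in range(n_classes)]
    prior_log_loss = -sum(math.log(priors[a]) for a in actual) / n
    log_loss_reduction = 1.0 - log_loss / prior_log_loss

    # The predicted class is the one with the highest probability.
    predicted = [row.index(max(row)) for row in probabilities]
    micro = sum(1 for a, p in zip(actual, predicted) if a == p) / n

    # Macro accuracy: average of the accuracy within each class that occurs.
    per_class = []
    for c in sorted(set(actual)):
        members = [i for i, a in enumerate(actual) if a == c]
        per_class.append(sum(1 for i in members if predicted[i] == c) / len(members))
    macro = sum(per_class) / len(per_class)

    # Top-k accuracy: the true class is among the k highest-ranked classes.
    hits = 0
    for a, row in zip(actual, probabilities):
        top_k = sorted(range(n_classes), key=lambda c: row[c], reverse=True)[:k]
        if a in top_k:
            hits += 1

    return {
        "LogLoss": log_loss,
        "LogLossReduction": log_loss_reduction,
        "MacroAccuracy": macro,
        "MicroAccuracy": micro,
        "TopKAccuracy": hits / n,
    }


# Made-up data: three classes (0, 1, 2) and five rows.
multiclass_result = multiclass_metrics(
    actual=[0, 0, 0, 1, 2],
    probabilities=[
        [0.7, 0.2, 0.1],
        [0.6, 0.3, 0.1],
        [0.3, 0.5, 0.2],
        [0.2, 0.7, 0.1],
        [0.5, 0.3, 0.2],
    ],
)
```

In this sample three of the five rows are predicted correctly, so MicroAccuracy is 0.6, while MacroAccuracy is lower because the single row of class 2 is missed entirely.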

Permutation Feature Importance (PFI)

PFI values show the relative effect of each of the features used. They are displayed in order of effectiveness. If a feature has a very low value you may wish to remove it from the training settings and train again.
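
The general idea behind permutation feature importance can be sketched as follows: shuffle the values of one feature at a time, re-run the predictions, and see how much the error grows. A feature whose shuffling barely changes the error contributes little. This is a simplified illustration of the general technique, not the product's implementation. The predict function, the data and the use of mean absolute error as the measure are all assumptions made for this sketch.

```python
import random


def mean_absolute_error(actual, predicted):
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def permutation_feature_importance(predict, X, y, n_repeats=5, seed=0):
    """Return the average increase in error when each feature is shuffled."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    baseline = mean_absolute_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other feature untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted_error = mean_absolute_error(
                y, [predict(row) for row in X_permuted]
            )
            increases.append(permuted_error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical stand-in for a trained model: it uses only the first feature.
def predict(row):
    return 3.0 * row[0]


X_rows = [[1.0, 9.0], [2.0, 4.0], [3.0, 7.0], [4.0, 1.0], [5.0, 8.0], [6.0, 2.0]]
y_values = [3.0 * row[0] for row in X_rows]
pfi = permutation_feature_importance(predict, X_rows, y_values)
```

Because the stand-in model ignores the second feature, shuffling it changes nothing and its importance is zero, which is the kind of feature you might remove before training again.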

Test the Model

The test page lets you enter values for each feature and then click Start to get the model's prediction for your entries.