Machine Learning

Machine Learning (ML) is about finding the pattern between some input values and their associated outcomes. If you know what has happened before, it should be possible to predict what will happen in similar circumstances.

fData is the fSeries module that deals with gathering data for use in outputs such as documents, spreadsheets and dashboards.

fData is therefore well suited to gathering the data needed to train ML models, and ML in turn can add predicted values to data gathered with fData.

There are five different types of machine learning models available in fSeries ML:

  • Regression – predicting a value based on a number of known factors
  • Binary Classification – predicting one of two possible outcomes for known factors
  • Time Series – forecasting values based on time-based historical values
  • Sentiment – classifying a sentence or paragraph as positive/negative or true/false
  • Multiple Classification – identifying a class based on text in a sentence or paragraph

Model Training

In each case, the process of model training and application to fData is similar:

  1. Define training data (known factors and their associated outcomes)
  2. Train the model
  3. Test/Refine the model
  4. Apply the model to fData DSDs
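The four steps above can be illustrated outside fSeries with a minimal, generic sketch. It is not the fSeries implementation or API: the field names (`floor_area`, `price`, `predicted_price`), the sample values and the one-feature least-squares fit are all assumptions chosen only to show the shape of the workflow.

```python
from statistics import mean

# Step 1: define training data (hypothetical rows: one known factor, one outcome).
training_rows = [
    {"floor_area": 50.0, "price": 150.0},
    {"floor_area": 70.0, "price": 210.0},
    {"floor_area": 90.0, "price": 270.0},
    {"floor_area": 110.0, "price": 330.0},
]


def train(rows, feature, label):
    """Step 2: fit label = slope * feature + intercept by ordinary least squares."""
    xs = [row[feature] for row in rows]
    ys = [row[label] for row in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return {"feature": feature, "slope": slope, "intercept": y_bar - slope * x_bar}


def predict(model, row):
    """Return the predicted outcome for a single row."""
    return model["slope"] * row[model["feature"]] + model["intercept"]


model = train(training_rows, "floor_area", "price")

# Step 3: test the model with a single hand-entered value.
test_prediction = predict(model, {"floor_area": 80.0})

# Step 4: apply the model to new rows, writing the result to a designated field.
new_rows = [{"floor_area": 60.0}, {"floor_area": 100.0}]
for row in new_rows:
    row["predicted_price"] = predict(model, row)
```

In fSeries the equivalent of `training_rows` comes from a DSD, and the equivalent of the final loop is configured through the fData designer rather than written by hand.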

Define Training Data

Training data is gathered using a DSD created for the purpose.

The DSD must require no user input to be executed (the DSD will be run without asking for selection criteria), have a purpose of ‘Machine Learning’ (to make it available to the module), and have a designated data group that contains the training data.
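The three requirements can be read as a simple checklist. The sketch below expresses them as a check in Python; `DsdDefinition` and its attributes are hypothetical stand-ins invented for illustration and do not correspond to any fSeries type or API.

```python
from dataclasses import dataclass, field


@dataclass
class DsdDefinition:
    """Hypothetical summary of a DSD's settings (not an fSeries type)."""
    name: str
    purpose: str
    required_inputs: list = field(default_factory=list)
    training_data_group: str = ""


def training_dsd_problems(dsd):
    """Return a list of reasons why the DSD cannot be used for training."""
    problems = []
    if dsd.required_inputs:
        # The DSD is run without asking for selection criteria.
        problems.append("requires user input: " + ", ".join(dsd.required_inputs))
    if dsd.purpose != "Machine Learning":
        # Only DSDs with this purpose are available to the module.
        problems.append("purpose is not 'Machine Learning'")
    if not dsd.training_data_group:
        problems.append("no designated training data group")
    return problems


ok_dsd = DsdDefinition("SalesTraining", "Machine Learning", [], "TrainingData")
bad_dsd = DsdDefinition("SalesReport", "Report", ["FromDate"], "")

ok_problems = training_dsd_problems(ok_dsd)
bad_problems = training_dsd_problems(bad_dsd)
```

Here `ok_problems` is empty, while `bad_problems` lists one entry for each of the three requirements that `bad_dsd` fails.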

The content of the training data group depends on the type of model. Details are provided in separate sections.

In the Training section of the Machine Learning module, you can specify the model type and all of its associated settings. These depend on the type of model. For example, Regression requires a number of fields to be identified as the “features” (factors) of the model; Time Series uses a value field and the fields that define the time series itself (e.g. date and index fields). More details are given in other sections.

Train the model

For each model type there is a training process. Some allow for additional settings such as the duration to run for or the periods to forecast. Training can take some time, depending on the amount of training data available. Once training is complete, some model types show metrics that indicate how accurate the model is believed to be.
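This page does not say which metrics are shown. As an illustration of what accuracy metrics for a regression model typically measure, the sketch below computes two common ones, root mean squared error (RMSE) and R², from known outcomes and predictions. The sample values are made up, and the choice of these two metrics is an assumption rather than a description of what fSeries reports.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error: typical size of a prediction error, in the units of the outcome."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual, predicted):
    """Share of the variance in the actual values that the predictions explain (1.0 is perfect)."""
    mean_actual = sum(actual) / len(actual)
    residual_sum = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    total_sum = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - residual_sum / total_sum


# Hypothetical known outcomes and the model's predictions for the same rows.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

error = rmse(actual, predicted)
fit = r_squared(actual, predicted)
```

A lower RMSE and an R² closer to 1 both suggest a better fit, although neither guarantees good predictions on data that differs from the training data.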

Test the model

For some model types there is a page where you can enter values and get a single prediction in order to test the accuracy of the model. Fill in the form and click Start to receive the test result.

If no tester page is available, use fData to test run the predictions.

Apply the model to fData DSDs

An fData Data Group may have any number of ‘predictions’ applied. In the fData designer, use the ‘Predictions’ option to specify each model to be used against the data group.

In each case you will need to specify the fields to be used as the “features” (e.g. monthly values, or sentiment text).

The results of applying the model are written to designated fields. Use data items to create fields of the appropriate type. These are then set to the corresponding predicted values and become part of the data group’s records.

If you don’t want a prediction for every row of your data group (e.g. only calculate sentiments on rows that don’t currently have one, or for a time series, only forecast future months, not all rows), use the “To Predict” field. The value of the field is checked: if it is present and its value is “true”, the prediction/forecast is applied; otherwise the row is unaffected.
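The “To Predict” rule can be sketched as a row filter. This is only an illustration of the logic described above, not fSeries code: the row layout, the `toy_sentiment` function and the case-insensitive comparison with “true” are all assumptions.

```python
def should_predict(row, flag_field="To Predict"):
    """Apply the rule: predict only if the flag is present and its value is "true"."""
    value = row.get(flag_field)
    # Assumption: the comparison ignores case and surrounding whitespace.
    return value is not None and str(value).strip().lower() == "true"


def apply_predictions(rows, predict_fn, target_field, flag_field="To Predict"):
    """Write predictions into target_field for flagged rows; leave other rows untouched."""
    for row in rows:
        if should_predict(row, flag_field):
            row[target_field] = predict_fn(row)
    return rows


def toy_sentiment(row):
    """Hypothetical stand-in for a trained sentiment model."""
    return "positive" if "great" in row["comment"].lower() else "negative"


rows = [
    {"comment": "Great service", "sentiment": None, "To Predict": "true"},
    {"comment": "Too slow", "sentiment": "negative", "To Predict": "false"},
    {"comment": "No flag on this row", "sentiment": None},
]

apply_predictions(rows, toy_sentiment, "sentiment")
```

Only the first row receives a new value. The second keeps its existing sentiment because its flag is “false”, and the third is left unaffected because the flag is absent.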