• How to develop a Stacking Ensemble for Deep Learning Neural Networks in Python with Keras
    • Tutorial Overview
      • Stacked Generalization Ensemble
      • Multi-Class Classification Problem
      • Multilayer Perceptron Model
      • Train and Save Sub-Models
      • Separate Stacking Model
      • Integrated Stacking Model
    • Extensions
    • Further Reading

How to develop a Stacking Ensemble for Deep Learning Neural Networks in Python with Keras

Model averaging is an ensemble technique where multiple sub-models contribute equally to a combined prediction.

Model averaging can be improved by weighting the contributions of each sub-model to the combined prediction by the expected performance of the submodel. This can be extended further by training an entirely new model to learn how to best combine the contributions from each submodel. This approach is called stacked generalization, or stacking for short, and can result in better predictive performance than any single contributing model.

In this tutorial, you will discover how to develop a stacked generalization ensemble for deep learning neural networks.

After completing this tutorial, you will know:

  • Stacked generalization is an ensemble method where a new model learns how to best combine the predictions from multiple existing models.

  • How to develop a stacking model using neural networks as sub-models and a scikit-learn classifier as the meta-learner.

  • How to develop a stacking model where neural network sub-models are embedded in a larger stacking ensemble model for training and prediction.

Tutorial Overview

This tutorial is divided into six parts; they are:

  1. Stacked Generalization Ensemble
  2. Multi-Class Classification Problem
  3. Multilayer Perceptron Model
  4. Train and Save Sub-Models
  5. Separate Stacking Model
  6. Integrated Stacking Model

Stacked Generalization Ensemble

A model averaging ensemble combines the predictions from multiple trained models.

A limitation of this approach is that each model contributes the same amount to the ensemble prediction, regardless of how well the model performed. A variation of this approach, called a weighted average ensemble, weighs the contribution of each ensemble member by the trust or expected performance of the model on a holdout dataset. This allows well-performing models to contribute more and less-well-performing models to contribute less. The weighted average ensemble provides an improvement over the model averaging ensemble.

A further generalization of this approach is replacing the linear weighted sum (e.g. linear regression) model used to combine the predictions of the sub-models with any learning algorithm. This approach is called stacked generalization, or stacking for short.

In stacking, an algorithm takes the outputs of sub-models as input and attempts to learn how to best combine the input predictions to make a better output prediction.

It may be helpful to think of the stacking procedure as having two levels: level 0 and level 1.

  • Level 0: The level 0 data is the training dataset inputs, and level 0 models learn to make predictions from this data.
  • Level 1: The level 1 data takes the output of the level 0 models as input, and the single level 1 model, or meta-learner, learns to make predictions from this data.

Unlike a weighted average ensemble, a stacked generalization ensemble can use the set of predictions as a context and conditionally decide to weigh the input predictions differently, potentially resulting in better performance.

Interestingly, although stacking is described as an ensemble learning method with two or more level 0 models, it can be used in the case where there is only a single level 0 model. In this case, the level 1, or meta-learner, model learns to correct the predictions from the level 0 model.

It is important that the meta-learner is trained on a dataset separate from the examples used to train the level 0 models, to avoid overfitting.

A simple way that this can be achieved is by splitting the training dataset into a train and validation set. The level 0 models are then trained on the train set. The level 1 model is then trained using the validation set, where the raw inputs are first fed through the level 0 models to get predictions that are used as inputs to the level 1 model.
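
To make this hold-out scheme concrete, here is a minimal sketch using scikit-learn classifiers as stand-ins for the level 0 models (the dataset, sub-models, and split sizes are illustrative assumptions, not part of this tutorial's own example):

    from sklearn.datasets import make_blobs
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.linear_model import LogisticRegression
    import numpy as np

    X, y = make_blobs(n_samples=500, centers=3, n_features=2, random_state=2)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=1)

    # level 0: fit each sub-model on the train set
    level0 = [DecisionTreeClassifier().fit(X_train, y_train),
              KNeighborsClassifier().fit(X_train, y_train)]

    # level 1: sub-model predictions on the validation set become meta-learner inputs
    meta_inputs = np.hstack([m.predict_proba(X_val) for m in level0])
    meta_learner = LogisticRegression().fit(meta_inputs, y_val)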

A limitation of the hold-out validation set approach to training a stacking model is that the level 0 and level 1 models are not trained on the full dataset.

A more sophisticated approach to training a stacked model involves using k-fold cross-validation to develop the training dataset for the meta-learner model. Each level 0 model is trained using k-fold cross-validation (or even leave-one-out cross-validation for maximum effect); the models are then discarded, but the predictions are retained. This means that for each model, there are predictions made by a version of the model that was not trained on those examples, e.g. like having holdout examples, but in this case for the entire training dataset.

The predictions are then used as inputs to train the meta-learner. Level 0 models are then trained on the entire training dataset and, together with the meta-learner, the stacked model can be used to make predictions on new data.
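
A minimal sketch of building these out-of-fold meta-learner inputs for a single level 0 model might look as follows (the sub-model and data are illustrative assumptions):

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.model_selection import KFold
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_blobs(n_samples=500, centers=3, n_features=2, random_state=2)
    # out-of-fold class probabilities for one sub-model
    oof = np.zeros((len(X), 3))
    for train_ix, test_ix in KFold(n_splits=5, shuffle=True, random_state=1).split(X):
        fold_model = DecisionTreeClassifier().fit(X[train_ix], y[train_ix])
        # predictions come from a model version that never saw these rows
        oof[test_ix] = fold_model.predict_proba(X[test_ix])
    # oof can now serve as meta-learner input; the sub-model is refit on all of X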

In practice, it is common to use different algorithms to prepare each of the level 0 models, to provide a diverse set of predictions.

It is also common to use a simple linear model to combine the predictions. Because use of a linear model is common, stacking is more recently referred to as "model blending" or simply "blending," particularly in machine learning competitions.

A stacked generalization ensemble can be developed for regression and classification problems. In the case of classification problems, better results have been seen when using the prediction of class probabilities as input to the meta-learner instead of class labels.
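
To make the distinction concrete, here is a small sketch contrasting crisp class labels with class probabilities as candidate meta-learner inputs (the classifier and data are illustrative assumptions):

    from sklearn.datasets import make_blobs
    from sklearn.linear_model import LogisticRegression

    # illustrative classifier standing in for a level 0 model
    X, y = make_blobs(n_samples=200, centers=3, n_features=2, random_state=2)
    clf = LogisticRegression().fit(X, y)
    labels = clf.predict(X)       # shape [200] - crisp labels discard confidence
    probs = clf.predict_proba(X)  # shape [200, 3] - richer input for a meta-learner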

Now that we are familiar with stacked generalization, we can work through a case study of developing a stacked deep learning model.

Multi-Class Classification Problem

We will use a small multi-class classification problem as the basis to demonstrate the stacking ensemble.

The scikit-learn library provides the make_blobs() function.

The problem has two input variables (to represent the x and y coordinates of the points) and a standard deviation of 2.0 for points within each group. We will use the same random state (seed for the pseudorandom number generator) to ensure that we always get the same data points.

    from sklearn.datasets import make_blobs
    from matplotlib import pyplot as plt
    import pandas as pd

    # generate 2d classification dataset
    X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=2)
    # scatter plot, dots colored by class value
    df = pd.DataFrame(dict(x=X[:, 0], y=X[:, 1], label=y))

The results are the input and output elements of a dataset that we can model.

In order to get a feeling for the complexity of the problem, we can graph each point on a two-dimensional scatter plot and color each point by class value.

    colors = {0: 'red', 1: 'blue', 2: 'green'}
    fig, ax = plt.subplots()
    grouped = df.groupby('label')
    for key, group in grouped:
        group.plot(ax=ax, kind='scatter', x='x', y='y', label=key, color=colors[key])
    plt.show()

[plot: scatter plot of the blobs dataset with points colored by class value]

Running the example creates a scatter plot of the entire dataset. We can see that the standard deviation of 2.0 means that the classes are not linearly separable (separable by a line), causing many ambiguous points.

This is desirable as it means that the problem is non-trivial and will allow a neural network model to find many different "good enough" candidate solutions, resulting in a high variance.

Multilayer Perceptron Model

Before we define a model, we need to contrive a problem that is appropriate for the stacking ensemble.

In our problem, the training dataset is relatively small. Specifically, there is a 1:10 ratio of examples in the training dataset to the holdout dataset. This mimics a situation where we may have a vast number of unlabeled examples and a small number of labeled examples with which to train a model.

We will create 1,100 data points from the blobs problem. The model will be trained on the first 100 points and the remaining 1,000 will be held back in a test dataset, unavailable to the model.

The problem is a multi-class classification problem, and we will model it using a softmax activation function on the output layer. This means that the model will predict a vector with three elements with the probability that the sample belongs to each of the three classes. Therefore, we must one-hot encode the class values before we split the rows into the train and test datasets. We can do this using the Keras to_categorical() function.

    # use PlaidML as backend instead of the default TensorFlow,
    # so that we can use the power of the MacBook Pro's AMD GPU
    import os
    os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"
    # import the modules
    from keras.utils import to_categorical
    from keras.models import Sequential
    from keras.layers import Dense
                Using plaidml.keras.backend backend.                              
    # generate 2d classification dataset
    X, y = make_blobs(n_samples=1100, centers=3, n_features=2, cluster_std=2, random_state=2)
    # use one-hot encoding
    y = to_categorical(y)
    # split train/test set
    n_train = 100
    X_train, X_test = X[:n_train, :], X[n_train:, :]
    y_train, y_test = y[:n_train], y[n_train:]
    print(f"The shape of X train set is {X_train.shape}, the shape of X test set is {X_test.shape}.")
    The shape of X train set is (100, 2), the shape of X test set is (1000, 2).

Next, we can define and compile the model.

The model will expect samples with two input variables. The model then has a single hidden layer with 25 nodes and a rectified linear activation function, then an output layer with three nodes to predict the probability of each of the three classes, and a softmax activation function.

Because the problem is multi-class, we will use the categorical cross-entropy loss function to optimize the model and the efficient Adam flavor of stochastic gradient descent.

    # define model
    model = Sequential()
    model.add(Dense(25, input_dim=2, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
                INFO:plaidml:Opening device "metal_amd_radeon_pro_560x.0"                              

The model is fit for 150 training epochs and we will evaluate the model each epoch on the test set, using the test set as a validation set.

    # fit the model
    history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=150, verbose=1)
    Epoch 150/150
    100/100 [==============================] - 0s 1ms/step - loss: 0.4755 - acc: 0.7900 - val_loss: 0.5328 - val_acc: 0.7800

At the end of the run, we will evaluate the performance of the model on the train and test sets.

    # evaluate the model on the train set
    _, train_acc = model.evaluate(X_train, y_train, verbose=1)
    100/100 [==============================] - 0s 450us/step
    # evaluate the model on the test set
    _, test_acc = model.evaluate(X_test, y_test, verbose=1)
    1000/1000 [==============================] - 0s 112us/step
    print(f"The train accuracy is {train_acc}, and the test accuracy is {test_acc}.")
    The train accuracy is 0.79, and the test accuracy is 0.78.

And finally, we will plot learning curves of the model accuracy over each training epoch on both the training and validation datasets.

    # learning curves of model accuracy
    plt.plot(history.history['acc'], label='train')
    plt.plot(history.history['val_acc'], label='test')
    plt.legend()
    plt.show()

[plot: learning curves of model accuracy on the train and test sets over training epochs]

Running the example first prints the shape of each dataset for confirmation, then the performance of the final model on the train and test datasets.

Your specific results will vary (by design!) given the high variance nature of the model.

In this case, we can see that the model achieved about 79% accuracy on the training dataset, which we know is optimistic, and about 78% on the test dataset, which we would expect to be more realistic.

We can now look at using instances of this model as part of a stacking ensemble.

Train and Save Sub-Models

To keep this example simple, we will use multiple instances of the same model as level-0 or sub-models in the stacking ensemble.

We will also use a holdout validation dataset to train the level-1 or meta-learner in the ensemble.

A more advanced example may use different types of MLP models (deeper, wider, etc.) as sub-models and train the meta-learner using k-fold cross-validation.

In this section, we will train multiple sub-models and save them to file for later use in our stacking ensembles.

The first step is to create a function that will define and fit an MLP model on the training dataset.

    # fit model on dataset
    def fit_model(X_train, y_train):
        # define the model
        model = Sequential()
        model.add(Dense(25, input_dim=2, activation='relu'))
        model.add(Dense(3, activation='softmax'))
        model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        # fit the model
        model.fit(X_train, y_train, epochs=150, verbose=0)
        return model

Next, we can create a sub-directory to store the models.

Note, if the directory already exists, you may have to delete it when re-running this code.

    from os import makedirs
    makedirs('models')
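
As an aside (not part of the original listing), the need to delete the directory between runs can be avoided by allowing it to already exist:

    import os
    os.makedirs('models', exist_ok=True)  # does not raise if 'models' already exists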

Finally, we can create multiple instances of the MLP and save each to the "models/" subdirectory with a unique filename.

In this case, we will create five sub-models, but you can experiment with a different number of models and see how it impacts model performance.

    # fit and save models
    n_members = 5
    for i in range(n_members):
        # fit model
        model = fit_model(X_train, y_train)
        # save model
        filename = 'models/model_' + str(i + 1) + '.h5'
        model.save(filename)
        print(f"[INFO]>>Save {filename}.")
    INFO:plaidml:Opening device "metal_amd_radeon_pro_560x.0"
    [INFO]>>Save models/model_1.h5.
    [INFO]>>Save models/model_2.h5.
    [INFO]>>Save models/model_3.h5.
    [INFO]>>Save models/model_4.h5.
    [INFO]>>Save models/model_5.h5.
    total 280
    -rw-r--r--  1 johnnylu  staff  27936 Jan 16 12:40 model_1.h5
    -rw-r--r--  1 johnnylu  staff  27936 Jan 16 12:40 model_2.h5
    -rw-r--r--  1 johnnylu  staff  27936 Jan 16 12:40 model_3.h5
    -rw-r--r--  1 johnnylu  staff  27936 Jan 16 12:41 model_4.h5
    -rw-r--r--  1 johnnylu  staff  27936 Jan 16 12:41 model_5.h5

Separate Stacking Model

We can now train a meta-learner that will best combine the predictions from the sub-models and ideally perform better than any single sub-model.

The first step is to load the saved models.

We can use the load_model() Keras function and create a Python list of loaded models.

    from keras.models import load_model

    n_members = 5

    # load models from file
    def load_all_model(n_models):
        all_models = []
        for i in range(n_models):
            # define filename for this ensemble member
            filename = 'models/model_' + str(i + 1) + '.h5'
            model = load_model(filename)
            # add to list of members
            all_models.append(model)
            print(f"[INFO]>>loaded {filename}.")
        return all_models
    # load all models
    members = load_all_model(n_members)
    print(f"Loaded {len(members)} models.")
    [INFO]>>loaded models/model_1.h5.
    [INFO]>>loaded models/model_2.h5.
    [INFO]>>loaded models/model_3.h5.
    [INFO]>>loaded models/model_4.h5.
    [INFO]>>loaded models/model_5.h5.
    Loaded 5 models.

It would be useful to know how well the single models perform on the test dataset, as we would expect a stacking model to perform better.

We can easily evaluate each single model on the test dataset and establish a baseline of performance.

    # evaluate standalone models on test dataset
    for model in members:
        _, acc = model.evaluate(X_test, y_test, verbose=1)
        print(f"Model Test Set Accuracy: {acc}.")
    1000/1000 [==============================] - 0s 402us/step
    Model Test Set Accuracy: 0.78.
    1000/1000 [==============================] - 0s 98us/step
    Model Test Set Accuracy: 0.724.
    1000/1000 [==============================] - 0s 100us/step
    Model Test Set Accuracy: 0.755.
    1000/1000 [==============================] - 0s 103us/step
    Model Test Set Accuracy: 0.778.
    1000/1000 [==============================] - 0s 101us/step
    Model Test Set Accuracy: 0.752.

Next, we can train our meta-learner. This requires two steps:

  • Prepare a training dataset for the meta-learner.
  • Use the prepared training set to fit a meta-learner model.

We will prepare a training dataset for the meta-learner by providing examples from the test set to each of the submodels and collecting the predictions. In this example, each model will output three predictions for each example, for the probabilities that a given example belongs to each of the three classes. Therefore, the 1,000 examples in the test set will result in five arrays with the shape [1000, 3].

We can combine these arrays into a three-dimensional array with the shape [1000, 5, 3] by using the dstack() NumPy function that will stack each new set of predictions.

As input for a new model, we will require 1,000 examples with some number of features. Given that we have 5 models and each model makes three predictions per example, we would have 15 (3 x 5) features for each example provided to the submodels. We can transform the [1000, 5, 3] shaped predictions from the sub-models into a [1000, 15] shaped array to be used to train a meta-learner, using the reshape() NumPy function and flattening the last two dimensions. The stacked_dataset() function implements these steps.

    # check the sub-model prediction output shape
    test = model.predict(X_test)
    print(test)
    print("\n")
    print(test.shape)
    print("\n")
    print(y_test)
    [[0.87745166 0.00568948 0.11685885]
     [0.00420502 0.08186033 0.91393465]
     [0.0560521  0.27464437 0.66930354]
     ...
     [0.22192474 0.524442   0.25363323]
     [0.8633721  0.05750959 0.0791183 ]
     [0.66364163 0.11609959 0.2202588 ]]

    (1000, 3)

    [[1. 0. 0.]
     [0. 0. 1.]
     [0. 0. 1.]
     ...
     [0. 1. 0.]
     [1. 0. 0.]
     [0. 0. 1.]]
    # numpy dstack example
    import numpy as np
    from numpy import dstack

    a = np.array((1, 2, 3))
    print(f"The array a is {a}")
    print("\n")
    b = np.array((4, 5, 6))
    print(f"The array b is {b}")
    print("\n")
    c = dstack((a, b))
    print(f"dstack: {c}")
    print(f"Shape of dstack: {c.shape}")
    The array a is [1 2 3]

    The array b is [4 5 6]

    dstack: [[[1 4]
      [2 5]
      [3 6]]]
    Shape of dstack: (1, 3, 2)
    # create stacked model input dataset as outputs from the ensemble
    def stacked_dataset(members, inputX):
        stackX = None
        for model in members:
            # make prediction
            yhat = model.predict(inputX, verbose=0)
            # stack predictions into [rows, members, probabilities]
            if stackX is None:
                stackX = yhat
            else:
                stackX = dstack((stackX, yhat))
        # flatten predictions to [rows, members x probabilities]
        stackX = stackX.reshape((stackX.shape[0], stackX.shape[1] * stackX.shape[2]))
        return stackX
    stackX = stacked_dataset(members, X_test)

Once prepared, we can use this input dataset along with the output, or y part, of the test set to train a new meta-learner.

In this case, we will train a simple logistic regression algorithm from the scikit-learn library.

Logistic regression only supports binary classification, although the implementation of logistic regression in scikit-learn in the LogisticRegression class supports multi-class classification (more than two classes) using a one-vs-rest scheme. The fit_stacked_model() function below will prepare the training dataset for the meta-learner by calling the stacked_dataset() function, and will then fit a logistic regression model that is then returned.

    # import Logistic Regression class
    from sklearn.linear_model import LogisticRegression

    # fit a model based on the outputs from the ensemble members
    def fit_stacked_model(members, inputX, inputy):
        # create dataset using ensemble
        stackedX = stacked_dataset(members, inputX)
        # fit standalone model
        model = LogisticRegression()
        model.fit(stackedX, inputy)
        return model
    # make a prediction with the stacked model
    def stacked_prediction(members, model, inputX):
        # create dataset using ensemble
        stackedX = stacked_dataset(members, inputX)
        # make a prediction
        yhat = model.predict(stackedX)
        return yhat

We can call this function and pass in the list of loaded models and the training dataset.

    from sklearn.datasets import make_blobs

    # reset the X, y and X_test, y_test variables
    X, y = make_blobs(n_samples=1100, centers=3, n_features=2, cluster_std=2, random_state=2)
    # split into train and test
    n_train = 100
    X_train, X_test = X[:n_train, :], X[n_train:, :]
    y_train, y_test = y[:n_train], y[n_train:]
    print(X_train.shape, X_test.shape)
    from sklearn.metrics import accuracy_score

    # load all models
    n_members = 5
    members = load_all_model(n_members)
    print('Loaded %d models' % len(members))
    # evaluate standalone models on test dataset
    for model in members:
        testy_enc = to_categorical(y_test)
        _, acc = model.evaluate(X_test, testy_enc, verbose=0)
        print('Model Accuracy: %.3f' % acc)
    # fit stacked model using the ensemble
    model = fit_stacked_model(members, X_test, y_test)
    # evaluate model on test set
    yhat = stacked_prediction(members, model, X_test)
    acc = accuracy_score(y_test, yhat)
    print('Stacked Test Accuracy: %.3f' % acc)
    [INFO]>>loaded models/model_1.h5.
    [INFO]>>loaded models/model_2.h5.
    [INFO]>>loaded models/model_3.h5.
    [INFO]>>loaded models/model_4.h5.
    [INFO]>>loaded models/model_5.h5.
    Loaded 5 models
    Model Accuracy: 0.780
    Model Accuracy: 0.724
    Model Accuracy: 0.755
    Model Accuracy: 0.778
    Model Accuracy: 0.752
    Stacked Test Accuracy: 0.826

Integrated Stacking Model

When using neural networks as sub-models, it may be desirable to use a neural network as a meta-learner.

Specifically, the sub-networks can be embedded in a larger multi-headed neural network that then learns how to best combine the predictions from each input sub-model. It allows the stacking ensemble to be treated as a single large model.

The benefit of this approach is that the outputs of the submodels are provided directly to the meta-learner. Further, it is also possible to update the weights of the submodels in conjunction with the meta-learner model, if this is desirable.

This can be achieved using the Keras functional interface for developing models.

After the models are loaded as a list, a larger stacking ensemble model can be defined where each of the loaded models is used as a separate input-head to the model. This requires that all of the layers in each of the loaded models be marked as not trainable so the weights cannot be updated when the new, larger model is being trained. Keras also requires that each layer has a unique name; therefore, the names of each layer in each of the loaded models will have to be updated to indicate to which ensemble member they belong.

    # import modules
    from sklearn.datasets import make_blobs
    from sklearn.metrics import accuracy_score
    from keras.models import load_model
    from keras.utils import to_categorical
    from keras.utils import plot_model
    from keras.models import Model
    from keras.layers import Input
    from keras.layers import Dense
    from keras.layers.merge import concatenate
    from numpy import argmax

Once the sub-models have been prepared, we can define the stacking ensemble model.

The input layer for each of the sub-models will be used as a separate input head to this new model. This means that k copies of any input data will have to be provided to the model, where k is the number of input models; in this case, k = 5.

The outputs of each of the models can then be merged. In this case, we will use a simple concatenation merge, where a single 15-element vector will be created from the three class probabilities predicted by each of the 5 models.

We will then define a hidden layer to interpret this input to the meta-learner and an output layer that will make its own probabilistic prediction. The define_stacked_model() function below implements this and will return a stacked generalization neural network model given a list of trained sub-models.

    # define stacked model from multiple member input models
    def define_stacked_model(members):
        # update all layers in all models to not be trainable
        for i in range(len(members)):
            model = members[i]
            for layer in model.layers:
                # make not trainable
                layer.trainable = False
                # rename to avoid 'unique layer name' issue
                layer.name = 'ensemble_' + str(i + 1) + '_' + layer.name
        # define multi-headed input
        ensemble_visible = [model.input for model in members]
        # concatenate merge output from each model
        ensemble_outputs = [model.output for model in members]
        merge = concatenate(ensemble_outputs)
        hidden = Dense(10, activation='relu')(merge)
        output = Dense(3, activation='softmax')(hidden)
        model = Model(inputs=ensemble_visible, outputs=output)
        # plot graph of ensemble
        plot_model(model, show_shapes=True, to_file='model_graph.png')
        # compile
        model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
        return model
    stacked_model = define_stacked_model(members)

A plot of the network graph is created when this function is called to give an idea of how the ensemble model fits together.

Once the model is defined, it can be fit. We can fit it directly on the holdout test dataset.

Because the sub-models are not trainable, their weights will not be updated during training and only the weights of the new hidden and output layers will be updated. The fit_stacked_model() function below will fit the stacking neural network model for 300 epochs.

    # fit a stacked model
    def fit_stacked_model(model, inputX, inputy):
        # prepare input data
        X = [inputX for _ in range(len(model.input))]
        # encode y variable
        inputy_enc = to_categorical(inputy)
        # fit the model
        model.fit(X, inputy_enc, epochs=300, verbose=0)

Once fit, we can use the new stacked model to make a prediction on new data.

This is as simple as calling the predict() function on the model. One small change is that we require k copies of the input data in a list to be provided to the model, one for each of the k sub-models. The predict_stacked_model() function below simplifies this process of making a prediction with the stacking model.

    # make a prediction with a stacked model
    def predict_stacked_model(model, inputX):
        # prepare input data
        X = [inputX for _ in range(len(model.input))]
        # make prediction
        return model.predict(X, verbose=1)

We can call this function to make a prediction for the test dataset and report the accuracy.

We would expect the performance of the neural network learner to be better than any individual submodel and perhaps competitive with the linear meta-learner used in the previous section.

    # generate 2d classification dataset
    from sklearn.model_selection import train_test_split

    X, y = make_blobs(n_samples=1100, centers=3, n_features=2, cluster_std=2, random_state=2)
    # split into train and test
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=3)
    # load all models
    n_members = 5
    members = load_all_model(n_members)
    print('Loaded %d models' % len(members))
    # define ensemble model
    stacked_model = define_stacked_model(members)
    # fit stacked model on test dataset
    fit_stacked_model(stacked_model, X_test, y_test)
    # make predictions and evaluate
    yhat = predict_stacked_model(stacked_model, X_test)
    yhat = argmax(yhat, axis=1)
    acc = accuracy_score(y_test, yhat)
    print('Stacked Test Accuracy: %.3f' % acc)
    [INFO]>>loaded models/model_1.h5.
    [INFO]>>loaded models/model_2.h5.
    [INFO]>>loaded models/model_3.h5.
    [INFO]>>loaded models/model_4.h5.
    [INFO]>>loaded models/model_5.h5.
    Loaded 5 models
    550/550 [==============================] - 0s 448us/step
    Stacked Test Accuracy: 0.820

Running the example first loads the five sub-models.

A larger stacking ensemble neural network is defined and fit on the test dataset, then the new model is used to make a prediction on the test dataset. We can see that, in this case, the model achieved an accuracy of about 82.0%, out-performing the individual sub-models and performing comparably to the linear meta-learner from the previous section.

Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

  • Alternate Meta-Learner. Update the example to use an alternate meta-learner classifier model in place of the logistic regression model (see the sketch after this list).
  • Single Level-0 Models. Update the example to use a single level-0 model and compare the results.
  • Vary Level-0 Models. Develop a study that demonstrates the relationship between test classification accuracy and the number of sub-models used in the stacked ensemble.
  • Cross-Validation Stacking Ensemble. Update the example to use k-fold cross-validation to prepare the training dataset for the meta-learner model.
  • Use Raw Input in Meta-Learner. Update the example so that the meta-learner algorithms take the raw input data for the sample as well as the output from the sub-models, and compare performance.
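
As a starting point for the first extension, here is a minimal sketch that swaps the logistic regression for a random forest meta-learner (my assumption, not part of the original tutorial); it reuses the stacked_dataset() function defined above:

    # fit an alternate meta-learner on the outputs from the ensemble members
    from sklearn.ensemble import RandomForestClassifier

    def fit_stacked_model_rf(members, inputX, inputy):
        # create the meta-learner training set from sub-model predictions
        stackedX = stacked_dataset(members, inputX)
        # fit a random forest instead of a logistic regression
        meta = RandomForestClassifier(n_estimators=100, random_state=1)
        meta.fit(stackedX, inputy)
        return meta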

Further Reading

Books

  • Section 8.8 Model Averaging and Stacking, The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd Edition, 2016
  • Section 7.5 Combining multiple models, Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, 2005
  • Section 9.8.2 Stacked Generalization, Neural Networks for Pattern Recognition, 1995

Papers

  • Stacked Generalization, 1992
  • Issues in Stacked Generalization, 1999

API

  • Getting started with the Keras Sequential model
  • Keras Core Layers API
  • numpy.argmax API
  • sklearn.datasets.make_blobs() API
  • numpy.dstack API
  • sklearn.linear_model.LogisticRegression API