Getting Started with PLS

This is a tutorial for PEAXACT - Software for Quantitative Spectroscopy from S-PACT. Its main objective is to get you familiar with multivariate calibration using Projection to Latent Structures, also known as Partial Least Squares. The tutorial is intended for PEAXACT users and persons interested in PEAXACT.

In this tutorial you will learn how to:

  1. Modify the Pretreatment Model and create Data Filters
  2. Perform PLS calibration
  3. Improve the Calibration Model
  4. Evaluate calibration alternatives

If you have PEAXACT installed on your computer you may try this tutorial right away. If you don't have PEAXACT yet, get a free trial now.

Preparations

You can find data of this tutorial in %ProgramFiles%\S-PACT\PEAXACT 5\Data\NIR - Gasoline. The directory will be referred to as DATA from this point forward.

  • Start PEAXACT.
  • Choose File > New Session > NIR from the menu. This opens a new modelling session with default settings for near-infrared data.

Pretreatment Model and Data Filters

If you have never worked with PLS before, this tutorial is probably not the best starting point, unless you are willing to accept that PLS is harder to interpret than... well, almost all other methods PEAXACT has to offer. So we won't try to explain it here. On the other hand, PLS is very easy to use and produces good results if (and that's a big if) you provide good training samples, and plenty of them.

  • Choose Data > Load Table... from the menu, browse to DATA\References and select DataTableCalibration.xls to load 60 near-infrared spectra of gasoline with associated octane numbers.
  • Select the first sample in the Samples Panel.
  • Choose File > New to create a new model. Expand the model tree and select the Pretreatment item. The Pretreatment Model is displayed in the Model Properties Panel.
  • For PLS it is highly recommended to enable some sort of Resampling and to reduce the Global Range a bit, so that all spectra always get identical x-axes (a PLS requirement). Change the Pretreatment Model as follows:
    • Resampling: Equidistant Points
    • Number of Points: 400
    • Global Range: 5880 - 11110
  • PLS correlates variance in the spectral signal with variance in feature values. So let's also increase the spectral variance. Make the following modifications to the Pretreatment Model:
    • Derivative/Smoothing: 1st order derivatives
    • Filter Length: 5
    • Standardization: SNV normalization
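
The pretreatment steps above can be pictured with a small Python sketch. It is not PEAXACT's implementation: the function names, the linear interpolation, the 5-point Savitzky-Golay style derivative (assumed here to correspond to a Filter Length of 5), the order of the steps, and the synthetic spectrum are all assumptions made for illustration.

```python
import math

def resample_equidistant(x, y, x_min, x_max, n_points):
    """Linearly interpolate a spectrum (x ascending) onto an equidistant grid."""
    step = (x_max - x_min) / (n_points - 1)
    grid = [x_min + i * step for i in range(n_points)]
    out, j = [], 0
    for g in grid:
        # advance to the raw interval [x[j], x[j+1]] that contains g
        while j < len(x) - 2 and x[j + 1] < g:
            j += 1
        frac = (g - x[j]) / (x[j + 1] - x[j])
        out.append(y[j] + frac * (y[j + 1] - y[j]))
    return grid, out

def first_derivative_5pt(y, step):
    """5-point first derivative from a local quadratic fit.
    Only interior points are returned (2 points are lost at each edge)."""
    coeffs = (-2, -1, 0, 1, 2)
    return [sum(c * y[i + k - 2] for k, c in enumerate(coeffs)) / (10 * step)
            for i in range(2, len(y) - 2)]

def snv(y):
    """Standard normal variate: subtract the mean, divide by the standard deviation."""
    mean = sum(y) / len(y)
    std = math.sqrt(sum((v - mean) ** 2 for v in y) / (len(y) - 1))
    return [(v - mean) / std for v in y]

# Hypothetical raw spectrum: one Gaussian band on a sloped baseline
raw_x = [4000 + 7.3 * i for i in range(1100)]
raw_y = [0.0001 * v + math.exp(-((v - 8300) / 150) ** 2) for v in raw_x]

# Settings from the tutorial: 400 equidistant points in the range 5880 - 11110
grid, resampled = resample_equidistant(raw_x, raw_y, 5880, 11110, 400)
treated = snv(first_derivative_5pt(resampled, grid[1] - grid[0]))
print(len(grid), len(treated))
```

After resampling, every spectrum shares the same 400 x-values, which is what allows the spectra to be stacked into one matrix for PLS.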

You probably agree that some spectral regions have less variance than others, so we should exclude those irrelevant regions. The problem is that we cannot know which regions PLS considers irrelevant until after the calibration. The solution to this problem is to use Data Filters. A Data Filter defines a spectral region that can be modified during calibration.

  • Enable the Data Filter Tool in the toolbar. Click into the Plot Panel and use the mouse to draw a filtered region.
  • Select the whole spectral region for now. We are going to adjust the Data Filter later.

Calibration

  • Select all samples in the Samples Panel and choose Edit Model > Calibration Model > New.... This displays the Calibration Setup Dialog.
  • Tick feature Octane Number and assign the Data Filter.
  • Choose a Maximum rank of 10. PEAXACT performs separate regressions for each rank from 1 up to the maximum and calculates performance numbers which you can then use to decide on a specific rank. Speaking of performance numbers: also enable cross-validation by setting Partitioning to K-fold with k=10.
  • Click OK to run the calibration and cross-validation.
  • Calibration results are displayed in a Report Window. The RMSE vs. Rank plot displays calibration and validation errors over PLS rank. A good rank is as small as possible while still giving a small RMSE. For example, a rank of 3 with RMSECV = 0.32 would be a reasonable choice here.
  • Mark your choice in the bottom-left table. Verify your choice by looking at other reports, e.g. the Predicted vs. True plot which displays the deviation from an ideal calibration, or the Predicted vs. ... plot which shows error bars representing the prediction interval for a 95% level of confidence.
  • You could finish calibration now and nobody would blame you. Except you can do better.
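
To make the idea of "one regression per rank, judged by cross-validation" more concrete, here is a minimal Python sketch. It uses the textbook NIPALS algorithm for a single feature (PLS1) and interleaved folds. Whether PEAXACT uses the same algorithm or partitioning scheme is not stated in this tutorial, and all function names and the synthetic data are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def pls1_fit(X, y, rank):
    """Fit a PLS1 model with NIPALS; X is a list of rows, y a list of values."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]   # centered X
    f = [v - y_mean for v in y]                                 # centered y
    W, P, q = [], [], []
    for _ in range(rank):
        w = [dot([E[i][j] for i in range(n)], f) for j in range(p)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:          # nothing left in X that co-varies with y
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                          # scores
        tt = dot(t, t)
        load = [dot([E[i][j] for i in range(n)], t) / tt for j in range(p)]
        q_a = dot(f, t) / tt
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q_a * t[i] for i in range(n)]
        W.append(w); P.append(load); q.append(q_a)
    return x_mean, y_mean, W, P, q

def pls1_predict(model, x):
    x_mean, y_mean, W, P, q = model
    e = [xi - m for xi, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, q_a in zip(W, P, q):
        t = dot(e, w)
        e = [ei - t * lj for ei, lj in zip(e, load)]
        y_hat += q_a * t
    return y_hat

def rmsecv(X, y, rank, k):
    """Root mean squared error of k-fold cross-validation (interleaved folds)."""
    n, sq_err = len(X), 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = pls1_fit([X[i] for i in train], [y[i] for i in train], rank)
        sq_err += sum((pls1_predict(model, X[i]) - y[i]) ** 2
                      for i in range(fold, n, k))
    return math.sqrt(sq_err / n)

# Synthetic data with two latent factors; only the second one drives y
rng = random.Random(0)
X, y = [], []
for _ in range(40):
    t1, t2 = rng.gauss(0, 1), rng.gauss(0, 1)
    X.append([t1 + t2, t1 + t2, t1, t1])
    y.append(90 + t2 + rng.gauss(0, 0.01))

errors = {rank: rmsecv(X, y, rank, k=10) for rank in (1, 2, 3)}
for rank, err in errors.items():
    print(f"rank {rank}: RMSECV = {err:.3f}")
```

In this constructed example the error drops sharply from rank 1 to rank 2 and does not improve afterwards, which is the kind of pattern to look for in the RMSE vs. Rank plot.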

Improving the Calibration Model

  • First, let's modify the Data Filter. Select the Variable Importance in Projection (VIP) report from the top-right drop-down list and enable the Data Filter Tool in the toolbar.
  • Use the mouse to adjust the filter to include spectral regions of high importance. Ignore the region at the very right edge of the spectrum though (trust me on this).
  • Right-click and select Apply Changes from the context menu. This triggers a re-calculation of the Calibration Model using the changed filter.
  • Next, we want to check for and remove outliers. Select the Mahalanobis Distance vs. RMS Spectral Residuals plot from the top-right drop-down list and enable the Selection Tool in the toolbar.
  • Change the selected Rank from 1 to 2 to 3. You will notice that one sample has unusually large values for Mahalanobis distance and RMS residuals. Select the outlier sample (use the mouse to draw a rectangle around it), then right-click and choose Usage > Ignore from the context menu. Again, this triggers a re-calculation of the Calibration Model.
  • Select the RMSE vs. Rank plot again. A rank of 3 still looks like a good choice and has a much better cross-validation error close to 0.2 now.
  • You could click OK to accept the calibration and all would be good. Except you can do better.
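
The outlier check can be sketched in Python as well. The example computes the Mahalanobis distance of each sample from the center of a score matrix and flags samples above a threshold. The score values, the threshold of 2.5, and the function names are made up for illustration. How PEAXACT computes and scales its distances is not described in this tutorial.

```python
import math

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def mahalanobis_distances(scores):
    """Distance of each row from the mean, scaled by the sample covariance."""
    n, rank = len(scores), len(scores[0])
    means = [sum(row[j] for row in scores) / n for j in range(rank)]
    dev = [[row[j] - means[j] for j in range(rank)] for row in scores]
    cov = [[sum(d[j] * d[k] for d in dev) / (n - 1) for k in range(rank)]
           for j in range(rank)]
    # d^2 = dev^T * cov^-1 * dev, computed via a linear solve per sample
    return [math.sqrt(dot_dz) for dot_dz in
            (sum(di * zi for di, zi in zip(d, solve(cov, d))) for d in dev)]

# Hypothetical rank-2 scores: ten ordinary samples plus one unusual sample
scores = [[0.1 * i - 0.45, 0.1 * (-1) ** i] for i in range(10)]
scores.append([3.0, 2.0])

distances = mahalanobis_distances(scores)
threshold = 2.5                      # hypothetical cut-off, not a PEAXACT default
outliers = [i for i, d in enumerate(distances) if d > threshold]
print(outliers)                      # the last sample (index 10) stands out
```

Note that a single extreme sample inflates the covariance it is judged against, which is one reason to re-calculate the model after ignoring an outlier, as the tutorial does.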

Evaluating Calibration Alternatives

One problem with PLS is that you can easily get tricked into believing that all is good. Granted, the cross-validation error is small and we didn't even have to use a high rank to achieve it. In the end, however, you are not interested in how PLS performs on the data it was trained with, but in how robust the Calibration Model is when predicting features of new, unknown samples. Ideally you would validate the model on an independent set of test samples, but in this case we only have training samples. In that case, the next best thing is a 2-fold cross-validation.

  • Click the + button next to the drop-down list saying Calibration #1. This displays the Calibration Setup Dialog again.
  • Change the cross-validation settings to K-fold with k=2 and click OK. A second Calibration Model, Calibration #2, is added to the drop-down list.
  • Inspect the RMSE vs. Rank plot one last time. The RMSECV for rank 3 is still good given the strict cross-validation conditions, which gives us some confidence in the model's robustness.
  • Pick one Calibration Model and don't forget to set the correct rank in the bottom-right table before clicking OK.
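
Why is 2-fold cross-validation the stricter test? A short Python sketch of the partition sizes for the 60 samples of this tutorial makes it visible. The interleaved assignment of samples to folds and the function name are assumptions. PEAXACT may partition differently.

```python
def kfold_partitions(n_samples, k):
    """Yield (train_indices, test_indices) for k interleaved folds."""
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test

for k in (10, 2):
    train, test = next(kfold_partitions(60, k))
    print(f"k={k}: each model is trained on {len(train)} samples "
          f"and predicts {len(test)} left-out samples")
```

With k=10 every model still sees 54 of the 60 samples. With k=2 each model sees only 30 and has to predict the other 30, so a small RMSECV under these conditions says more about robustness.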

This concludes the tutorial on PLS. Congratulations, you made it to the end despite the lack of details here and there. If you want to learn more about PLS please contact us.


Have you seen our other tutorials yet? Check the overview on the PEAXACT Quick Start page!

 

SPACT GmbH

Burtscheider Str. 1
52064 Aachen
Tel.: +49 241 - 9569 9812
Fax: +49 241 - 4354 4308
Internet: www.s-pact.de