c# - PLS regression coefficients in MATLAB and C# (Accord.NET)

Question

I am trying to run a partial least squares regression analysis in C#. MATLAB's PLS routine uses the SIMPLS algorithm, which returns beta (the matrix of regression coefficients).

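I don't understand why the two matrices differ. Am I passing the inputs to the C# version incorrectly?
The inputs are the same in both cases and are taken from the paper referenced here.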

Minimal working example:

MATLAB: this follows the small example by Hervé Abdi (Hervé Abdi, Partial Least Square Regression). Reference: PDF

clear all;
clc;
inputs = [7, 7, 13, 7; 4, 3, 14, 7; 10, 5, 12, 5; 16, 7, 11, 3; 13, 3, 10, 3];
outputs = [14, 7, 8; 10, 7, 6; 8, 5, 5; 2, 4,7; 6, 2, 4];
[XL,yl,XS,YS,beta,PCTVAR] = plsregress(inputs,outputs, 1);
disp 'beta'
beta
disp 'beta size'
size(beta)
yfit = [ones(size(inputs,1),1) inputs]*beta;
residuals = outputs - yfit;

% stem(residuals)
% xlabel('Observation');
% ylabel('Residual');

beta =

   1.0484e+01   6.1899e+00   6.2841e+00
  -6.3488e-01  -3.0405e-01  -7.2608e-02
   2.1949e-02   1.0512e-02   2.5102e-03
   1.9226e-01   9.2078e-02   2.1988e-02
   2.8948e-01   1.3864e-01   3.3107e-02

Accord.NET:

double[][] inputs = new double[][]
    {
        //      Wine | Price | Sugar | Alcohol | Acidity
        new double[] {   7,     7,      13,        7 },
        new double[] {   4,     3,      14,        7 },
        new double[] {  10,     5,      12,        5 },
        new double[] {  16,     7,      11,        3 },
        new double[] {  13,     3,      10,        3 },
    };

double[][] outputs = new double[][]
    {
        //             Wine | Hedonic | Goes with meat | Goes with dessert
        new double[] {           14,          7,                 8 },
        new double[] {           10,          7,                 6 },
        new double[] {            8,          5,                 5 },
        new double[] {            2,          4,                 7 },
        new double[] {            6,          2,                 4 },
    };

var pls = new PartialLeastSquaresAnalysis()
        {
            Method = AnalysisMethod.Center,
            Algorithm = PartialLeastSquaresAlgorithm.NIPALS
        };

var regression = pls.Learn(inputs, outputs);

double[][] coeffs = regression.Weights;
>>
-1.69811320754717 -0.0566037735849056   0.0707547169811322
1.27358490566038   0.29245283018868     0.571933962264151
-4                 1                    0.5
1.17924528301887   0.122641509433962    0.159198113207547

score 2 · Accepted Answer

I think there are at least three differences between how the MATLAB and Accord.NET versions of PLS are being called:

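As you noted, MATLAB uses SIMPLS, whereas Accord.NET is being told to use NIPALS.
The MATLAB version is called as plsregress(inputs, outputs, 1), so the regression is computed with only one latent component of the PLS, but Accord.NET is not being told to do the same.
Accord.NET returns a MultivariateLinearRegression object that holds both a matrix of weights and a vector of intercepts, whereas MATLAB returns the intercepts as the first row of the coefficient matrix.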
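
The third difference is purely one of layout, which a small sketch can make concrete. It uses no Accord.NET calls: the class and method names (BetaLayoutSketch, PredictStacked, PredictSeparate) are hypothetical, the coefficients are the rounded one-component values printed by MATLAB in the question, and the weights are assumed to be stored as inputs x outputs, the shape in which the question prints regression.Weights.

```csharp
using System;

public static class BetaLayoutSketch
{
    // MATLAB-style layout: row 0 of beta holds the intercepts and
    // rows 1..p hold the weights, with one column per output.
    public static double[] PredictStacked(double[][] beta, double[] x)
    {
        int outputs = beta[0].Length;
        var y = new double[outputs];
        for (int j = 0; j < outputs; j++)
        {
            y[j] = beta[0][j];                  // constant term
            for (int i = 0; i < x.Length; i++)
                y[j] += x[i] * beta[i + 1][j];  // weight rows start at 1
        }
        return y;
    }

    // Separate layout: weights is p x m (inputs x outputs) and
    // intercepts has one entry per output.
    public static double[] PredictSeparate(double[][] weights, double[] intercepts, double[] x)
    {
        var y = new double[intercepts.Length];
        for (int j = 0; j < y.Length; j++)
        {
            y[j] = intercepts[j];
            for (int i = 0; i < x.Length; i++)
                y[j] += x[i] * weights[i][j];
        }
        return y;
    }

    public static void Main()
    {
        // One-component beta as printed by MATLAB in the question.
        double[][] beta =
        {
            new[] {  1.0484e+01,  6.1899e+00,  6.2841e+00 },
            new[] { -6.3488e-01, -3.0405e-01, -7.2608e-02 },
            new[] {  2.1949e-02,  1.0512e-02,  2.5102e-03 },
            new[] {  1.9226e-01,  9.2078e-02,  2.1988e-02 },
            new[] {  2.8948e-01,  1.3864e-01,  3.3107e-02 },
        };

        // Split the stacked matrix into intercepts (row 0) and weights (the rest).
        double[] intercepts = beta[0];
        var weights = new double[beta.Length - 1][];
        Array.Copy(beta, 1, weights, 0, weights.Length);

        double[] x = { 7, 7, 13, 7 }; // first wine in the inputs
        double[] stacked = PredictStacked(beta, x);
        double[] separate = PredictSeparate(weights, intercepts, x);

        for (int j = 0; j < stacked.Length; j++)
            Console.WriteLine($"{stacked[j]:F4}  {separate[j]:F4}");
    }
}
```

Both columns of the printed output should agree, since the two layouts encode the same affine map; only the place where the intercepts are stored changes.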

Taking all of this into account, you can reproduce the MATLAB results exactly:

double[][] inputs = new double[][]
{
    //      Wine | Price | Sugar | Alcohol | Acidity
    new double[] {   7,     7,      13,        7 },
    new double[] {   4,     3,      14,        7 },
    new double[] {  10,     5,      12,        5 },
    new double[] {  16,     7,      11,        3 },
    new double[] {  13,     3,      10,        3 },
};

double[][] outputs = new double[][]
{
    //             Wine | Hedonic | Goes with meat | Goes with dessert
    new double[] {           14,          7,                 8 },
    new double[] {           10,          7,                 6 },
    new double[] {            8,          5,                 5 },
    new double[] {            2,          4,                 7 },
    new double[] {            6,          2,                 4 },
};

// Create the Partial Least Squares Analysis
var pls = new PartialLeastSquaresAnalysis()
{
    Method = AnalysisMethod.Center,
    Algorithm = PartialLeastSquaresAlgorithm.SIMPLS, // First change: use SIMPLS
};

// Learn the analysis
pls.Learn(inputs, outputs);

// Second change: Use just 1 latent factor/component
var regression = pls.CreateRegression(factors: 1);

// Third change: present results as in MATLAB
double[][] w = regression.Weights.Transpose();
double[] b = regression.Intercepts;

// Add the intercepts as the first column of the matrix of
// weights and transpose it as in the way MATLAB presents it
double[][] coeffs = (w.InsertColumn(b, index: 0)).Transpose();

// Show results in MATLAB format
string str = coeffs.ToOctave();

With these changes, the coefficient matrix above becomes:

[ 10.4844779770616    6.18986077674717    6.28413863347486    ;
  -0.634878923091644 -0.304054829845448  -0.0726082626993539  ;
   0.0219492754418065 0.0105118991463605  0.00251024045589416 ;
   0.192261724966225  0.0920775662006966  0.0219881135215502  ; 
   0.289484835410222  0.13863944631343    0.033107085796122   ]
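
MATLAB prints beta with five significant digits while the matrix above is shown at full precision, so an exact equality check between the two would fail. One way to confirm that they agree is an element-wise comparison with a relative tolerance. The sketch below is illustrative only: the class name CoefficientComparisonSketch, the helper AllClose, and the tolerance of 1e-4 are assumptions, not part of Accord.NET or MATLAB.

```csharp
using System;

public static class CoefficientComparisonSketch
{
    // True when a and b have the same shape and every pair of entries
    // agrees to within the given relative tolerance.
    public static bool AllClose(double[][] a, double[][] b, double relTol)
    {
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i].Length != b[i].Length) return false;
            for (int j = 0; j < a[i].Length; j++)
            {
                double scale = Math.Max(Math.Abs(a[i][j]), Math.Abs(b[i][j]));
                if (Math.Abs(a[i][j] - b[i][j]) > relTol * scale) return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        // beta as printed by MATLAB in the question (five significant digits).
        double[][] matlab =
        {
            new[] {  1.0484e+01,  6.1899e+00,  6.2841e+00 },
            new[] { -6.3488e-01, -3.0405e-01, -7.2608e-02 },
            new[] {  2.1949e-02,  1.0512e-02,  2.5102e-03 },
            new[] {  1.9226e-01,  9.2078e-02,  2.1988e-02 },
            new[] {  2.8948e-01,  1.3864e-01,  3.3107e-02 },
        };

        // Coefficients produced by the Accord.NET code in this answer.
        double[][] accord =
        {
            new[] { 10.4844779770616,    6.18986077674717,   6.28413863347486    },
            new[] { -0.634878923091644, -0.304054829845448, -0.0726082626993539  },
            new[] {  0.0219492754418065, 0.0105118991463605, 0.00251024045589416 },
            new[] {  0.192261724966225,  0.0920775662006966, 0.0219881135215502  },
            new[] {  0.289484835410222,  0.13863944631343,   0.033107085796122   },
        };

        Console.WriteLine(AllClose(matlab, accord, 1e-4));
    }
}
```

Rounding to five significant digits introduces a relative error of at most 5e-5, so with a tolerance of 1e-4 this check should print True for the two matrices shown in this thread.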
