In my last post I briefly mentioned Random Vector Functional Link (RVFL) networks, and this post looks at them in more detail, prompted by the fact that the preliminary results of that post pointed towards a shallow rather than a deep network structure.
The idea of RVFL networks dates back a couple of decades and has perhaps been overshadowed by the more recent notion of the Extreme Learning Machine (ELM), although, as mentioned in my last post, there is some controversy about plagiarism with regard to ELMs. An RVFL network is essentially an ELM with additional, direct connections from the input layer to the output layer. The connections from the input layer to the single hidden layer are randomly generated and then fixed, the hidden layer is concatenated with the original input layer to form a new layer, H, and the connections from H to the output layer are solved in one step, either by using the Moore-Penrose pseudo-inverse to obtain the linear least squares solution, or alternatively by using regularised least squares. The advantage of this closed-form approach is fast training times compared with iterative optimisation routines such as gradient descent.
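To make the structure concrete, below is a minimal, self-contained Octave sketch of an RVFL network on synthetic data. It is not the authors' code: every name, dimension and parameter value is an illustrative assumption, and the output weights are solved here via the regularised (ridge) route.
## minimal RVFL sketch on synthetic data (illustrative values only)
X = randn( 200 , 5 ) ;                                     ## synthetic inputs, 200 samples x 5 features
y = double( sum( X , 2 ) + 0.1 * randn( 200 , 1 ) > 0 ) ; ## synthetic 0/1 targets
n_hidden = 50 ; lambda = 0.01 ;                            ## illustrative hyperparameters
W = rand( 5 , n_hidden ) * 2 - 1 ;                         ## random input-to-hidden weights, generated once and then fixed
b = rand( 1 , n_hidden ) ;                                 ## random hidden biases, also fixed
A = exp( -abs( X * W + repmat( b , 200 , 1 ) ) ) ;         ## radial basis style activation of the hidden layer
H = [ A , X ] ;                                            ## direct links: concatenate hidden layer with the raw inputs
beta = ( H' * H + lambda * eye( columns( H ) ) ) \ ( H' * y ) ; ## closed-form regularised least squares output weights
y_hat = 1.0 ./ ( 1.0 + exp( -H * beta ) ) ;                ## logistic squashing of the linear output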
The paper linked above details a set of comparative tests run over various configurations of RVFL networks on a number of different data sets. Some of the main conclusions drawn are:
- the direct links from the inputs to the outputs improve network performance
- whether or not to include an output bias is a tunable factor that depends on the data
- the radial basis function for the hidden units always leads to better performance
- regularised least squares (ridge regression) performs better than the Moore-Penrose pseudo-inverse (a short sketch contrasting the two follows this list)
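To make that last point concrete, here is a short, self-contained comparison of the two closed-form solutions on synthetic data; H, y and lambda below are stand-ins, not values from the paper.
## Moore-Penrose pseudo-inverse vs ridge regression, both closed-form (synthetic data)
H = randn( 100 , 20 ) ;         ## stand-in for the concatenated hidden/input layer
y = randn( 100 , 1 ) ;          ## stand-in targets
lambda = 0.1 ;                  ## illustrative regularisation coefficient
beta_pinv = pinv( H ) * y ;                                            ## plain least squares via the pseudo-inverse
beta_ridge = ( H' * H + lambda * eye( columns( H ) ) ) \ ( H' * y ) ;  ## regularised (ridge) least squares
## as lambda -> 0 the two solutions coincide for a full column rank H;
## increasing lambda shrinks the weights, which is what tends to generalise better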
Based on code provided by the authors of the study, I have written the following two objective functions for use with the BayesOpt library.
## Copyright (C) 2019 dekalog
##
## This program is free software: you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program. If not, see
## <https://www.gnu.org/licenses/>.
## -*- texinfo -*-
## @deftypefn {} {@var{J} =} rvfl_training_of_cyclic_embedding_with_cv (@var{x})
##
## Function for Bayesian training of RVFL networks with fixed parameters of:
##
## direct links,
## radial basis function activation,
## ridge regression for regularized least squares,
##
## and optimisable parameters of:
##
## number of neurons in hidden layer,
## lambda for the least squares regression,
## scaling of hidden layer inputs,
## with or without an output bias.
##
## The input X is a vector of 6 values to be optimised by the BayesOpt library
## function 'bayesoptcont.'
## The output J is the Brier Score for the test fold cross validated data.
## @seealso{}
## @end deftypefn
## Author: dekalog
## Created: 2019-11-04
function J = rvfl_training_of_cyclic_embedding_with_cv ( x )
global sample_features ; global sample_targets ;
epsilon = 1e-15 ; ## to ensure log() does not give out a nan
Nfea = size( sample_features , 2 ) ;
## check input x
if ( numel( x ) != 6 )
error( 'The input vector x must be of length 6.' ) ;
endif
## get the parameters from input x
hidden_layer_size = floor( x( 1 ) ) ; ## number of neurons in hidden layer
randomisation_type = floor( x( 2 ) ) ; ## 1 == uniform, 2 == Gaussian
scale_mode = floor( x( 3 ) ) ; ## 1 will scale the features for all neurons, 2 will scale the features for each hidden
## neuron separately, 3 will scale the range of the randomization for uniform distribution
scale = x( 4 ) ; ## Linearly scale the random features before feeding into the nonlinear activation function.
## In this implementation, we consider the threshold which leads to 0.99 of the maximum/minimum
## value of the activation function as the saturating threshold.
## scale = 0.9 means all the random features will be linearly scaled
## into 0.9 * [ lower_saturating_threshold , upper_saturating_threshold ].
if_output_bias = floor( x( 5 ) + 0.5 ) ; ## Use output bias, or not? 1 == yes , 0 == no.
lambda = x( 6 ) ; ## the regularization coefficient lambda
length_jj_loop = 25 ;
all_brier_values = zeros( length_jj_loop , 1 ) ;
##rand( 'seed' , 0 ) ;
##randn( 'seed' , 0 ) ;
##U_sample_targets = unique( sample_targets ) ;
##nclass = numel( U_sample_targets ) ;
##sample_targets_temp = zeros( numel( sample_targets ) , nclass ) ;
##
#### get the 0 - 1 one hot coding for the target,
##for i = 1 : nclass
## idx = sample_targets == U_sample_targets( i ) ;
## sample_targets_temp( idx , i ) = 1 ;
##endfor
###### information for splitting into training and test sets ###############
ix_positive_targets = find( sample_targets == 1 ) ;
ix_negative_targets = ( 1 : numel( sample_targets ) )' ;
ix_negative_targets( ix_positive_targets ) = [] ;
## split 20/80
split_no1 = round( 0.2 * numel( ix_positive_targets ) ) ;
split_no2 = round( 0.2 * numel( ix_negative_targets ) ) ;
######### get type of randomisation from input x #################
if ( randomisation_type == 1 ) ## uniform randomisation
if ( scale_mode == 3 ) ## range scaled for uniform randomisation
Weight = scale * ( rand( Nfea , hidden_layer_size ) * 2 - 1 ) ; ## scaled uniform random input weights to hidden layer
Bias = scale * rand( 1 , hidden_layer_size ) ; ## scaled random bias weights to hidden layer
else
Weight = rand( Nfea , hidden_layer_size ) * 2 - 1 ; ## unscaled random input weights to hidden layer
Bias = rand( 1 , hidden_layer_size ) ; ## unscaled random bias weights to hidden layer
endif
elseif ( randomisation_type == 2 ) ## gaussian randomisation
Weight = randn( Nfea , hidden_layer_size ) ; ## gaussian random input weights to hidden layer
Bias = randn( 1 , hidden_layer_size ) ; ## gaussian random bias weights to hidden layer
else
error( 'only Gaussian and Uniform are supported' )
endif
############################################################################
## Activation Function
Saturating_threshold = [ -2.1 , 2.1 ] ;
Saturating_threshold_activate = [ 0 , 1 ] ;
for jj = 1 : length_jj_loop
## shuffle
randperm1 = randperm( numel( ix_positive_targets) ) ;
randperm2 = randperm( numel( ix_negative_targets) ) ;
test_ix1 = ix_positive_targets( randperm1( 1 : split_no1 ) ) ;
test_ix2 = ix_negative_targets( randperm2( 1 : split_no2 ) ) ;
test_ix = [ test_ix1 ; test_ix2 ] ;
train_ix1 = ix_positive_targets( randperm1( split_no1 + 1 : end ) ) ;
train_ix2 = ix_negative_targets( randperm2( split_no2 + 1 : end ) ) ;
train_ix = [ train_ix1 ; train_ix2 ] ;
sample_targets_train = sample_targets( train_ix ) ;
sample_features_train = sample_features( train_ix , : ) ;
Nsample = size( sample_features_train , 1 ) ;
Bias_train = repmat( Bias , Nsample , 1 ) ;
H = sample_features_train * Weight + Bias_train ;
if ( scale_mode == 1 )
## scale the features for all neurons
[ H , k , b ] = Scale_feature( H , Saturating_threshold , scale ) ;
elseif ( scale_mode == 2 )
## else scale the features for each hidden neuron separately
[ H , k , b ] = Scale_feature_separately( H , Saturating_threshold , scale ) ;
endif
## actual activation, the radial basis function
H = exp( -abs( H ) ) ;
if ( if_output_bias == 1 )
## we will use an output bias
H = [ H , ones( Nsample , 1 ) ] ;
endif
## the direct link scaling options, concatenate hidden layer and sample_features_train
if ( scale_mode == 1 )
## scale the features for all neurons
sample_features_train = sample_features_train .* k + b ;
H = [ H , sample_features_train ] ;
elseif ( scale_mode == 2 )
## else scale the features for each hidden neuron separately
[ sample_features_train , ktr , btr ] = Scale_feature_separately( sample_features_train , Saturating_threshold_activate , scale ) ;
H = [ H , sample_features_train ] ;
else
H = [ H , sample_features_train ] ;
endif
H( isnan( H ) ) = 0 ; ## avoids any 'blowups' due to nans in H
## do the regularized least squares for concatenated hidden layer output
## and the original, possibly scaled, input sample_features
if ( hidden_layer_size < Nsample )
beta = ( eye( size( H , 2 ) ) / lambda + H' * H ) \ ( H' * sample_targets_train ) ;
else
beta = H' * ( ( eye( size( H , 1 ) ) / lambda + H * H' ) \ sample_targets_train ) ;
endif
############# now the test on test data ####################################
Bias_test = repmat( Bias , numel( sample_targets( test_ix ) ) , 1 ) ;
H_test = sample_features( test_ix , : ) * Weight + Bias_test ;
if ( scale_mode == 1 )
## scale the features for all neurons
H_test = H_test .* k + b ;
elseif ( scale_mode == 2 )
## else scale the features for each hidden neuron separately
nSamtest = size( H_test , 1 ) ;
kt = repmat( k , nSamtest , 1 ) ;
bt = repmat( b , nSamtest , 1 ) ;
H_test = H_test .* kt + bt ;
endif
## actual activation, the radial basis function
H_test = exp( -abs( H_test ) ) ;
if ( if_output_bias == 1 )
## we will use an output bias
H_test = [ H_test , ones( numel( sample_targets( test_ix ) ) , 1 ) ] ;
endif
## the direct link scaling options, concatenate hidden layer and sample_features_train
if ( scale_mode == 1 )
## scale the features for all neurons
testX_temp = sample_features( test_ix , : ) .* k + b ;
H_test = [ H_test , testX_temp ] ;
elseif ( scale_mode == 2 )
## else scale the features for each hidden neuron separately
nSamtest = size( H_test , 1 ) ;
kt = repmat( ktr , nSamtest , 1 ) ;
bt = repmat( btr , nSamtest , 1 ) ;
testX_temp = sample_features( test_ix , : ) .* kt + bt ;
H_test = [ H_test , testX_temp ] ;
else
H_test = [ H_test , sample_features( test_ix , : ) ] ;
endif
H_test( isnan( H_test ) ) = 0 ; ## avoids any 'blowups' due to nans in H_test
## get the test predicted target output
test_targets = H_test * beta ;
##Y_temp = zeros( Nsample , 1 ) ;
##% decode the target output
##for i = 1 : Nsample
## [ maxvalue , idx ] = max( sample_targets_temp( i , : ) ) ;
## Y_temp( i ) = U_sample_targets( idx ) ;
##endfor
############################################################################
## the final logistic output
final_output = 1.0 ./ ( 1.0 .+ exp( -test_targets ) ) ;
## get the Brier_score
## https://en.wikipedia.org/wiki/Brier_score
all_brier_values( jj ) = mean( ( final_output .- sample_targets( test_ix ) ) .^ 2 ) ;
rand( 'state' ) ; randn( 'state' ) ; ## reset rng
endfor ## end of jj loop
J = mean( all_brier_values ) ;
endfunction
## Various measures of goodness
## https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models
## https://www.fharrell.com/post/classification/
## https://stats.stackexchange.com/questions/433628/what-is-a-reliable-measure-of-accuracy-for-logistic-regression
## https://www.jstatsoft.org/article/view/v090i12
## https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-what-are-pseudo-r-squareds/
## https://stats.stackexchange.com/questions/319666/aic-with-test-data-is-it-possible
## https://www.learningmachines101.com/lm101-076-how-to-choose-the-best-model-using-aic-or-gaic/
## https://stackoverflow.com/questions/48185090/how-to-get-the-log-likelihood-for-a-logistic-regression-model-in-sklearn
## https://stats.stackexchange.com/questions/67903/does-down-sampling-change-logistic-regression-coefficients
## https://stats.stackexchange.com/questions/163221/whats-the-measure-to-assess-the-binary-classification-accuracy-for-imbalanced-d
## https://stats.stackexchange.com/questions/168929/logistic-regression-is-predicting-all-1-and-no-0
## https://stats.stackexchange.com/questions/435307/multiple-linear-regression-lse-when-one-of-parameter-is-known
These functions stand on the shoulders of the conclusions listed above and hard-code the direct links, radial basis function activation and ridge regression, with the number of neurons in the hidden layer, lambda for the ridge regression, the various scaling options and the inclusion or not of an output bias as the optimisable parameters. The minimisation objective of the function is the Brier score.
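As a usage illustration, the following sketch evaluates this first objective function by hand once the global variables have been set; the features and targets are synthetic stand-ins and the candidate vector x is an arbitrary guess rather than anything proposed by the BayesOpt library, so treat it purely as a smoke test of the function call.
## minimal smoke test of the objective function (synthetic data, illustrative x)
global sample_features sample_targets ;
sample_features = randn( 500 , 4 ) ;                 ## synthetic features
sample_targets = double( rand( 500 , 1 ) > 0.5 ) ;   ## synthetic 0/1 targets
## x = [ hidden layer size , randomisation type , scale mode , scale , output bias , lambda ]
x = [ 100 , 1 , 3 , 0.9 , 1 , 0.01 ] ;               ## illustrative values only
J = rvfl_training_of_cyclic_embedding_with_cv( x )   ## mean cross-validated Brier score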
This second function is slightly different in that the Akaike Information Criterion is the minimisation objective, and there is the option of using the Netlab generalised linear model (glm) functions to solve for the hidden-to-output weights (comment/uncomment the relevant code as required.)
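For clarity, the AIC that comes out of this second function is just 2k minus twice the log-likelihood of the logistic outputs; the toy numbers below are made up purely to show the arithmetic, not taken from any real run.
## toy illustration of the AIC calculation (made-up values)
epsilon = 1e-15 ;                      ## guards log() against exact zeros
p = [ 0.9 ; 0.2 ; 0.7 ; 0.4 ] ;        ## predicted probabilities from the model
y = [ 1 ; 0 ; 1 ; 1 ] ;                ## actual 0/1 targets
k = 10 ;                               ## number of fitted parameters in the model
log_likelihood = sum( log( p + epsilon ) .* y + log( 1 - p + epsilon ) .* ( 1 - y ) ) ;
AIC = 2 * k - 2 * log_likelihood       ## the value the objective function returns as J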
## Copyright (C) 2019 dekalog
##
## This program is free software: you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 3 of the License, or
## (at your option) any later version.
##
## This program is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with this program. If not, see
## <https://www.gnu.org/licenses/>.
## -*- texinfo -*-
## @deftypefn {} {@var{J} =} rvfl_training_of_cyclic_embedding (@var{x})
##
## Function for Bayesian training of RVFL networks with fixed parameters of:
##
## direct links,
## radial basis function activation,
## ridge regression for regularized least squares,
##
## and optimisable parameters of:
##
## number of neurons in hidden layer,
## lambda for the least squares regression,
## scaling of hidden layer inputs,
## with or without an output bias.
##
## The input X is a vector of 4 values to be optimised by the BayesOpt library
## function 'bayesoptcont.'
## The output J is the AIC value for the tested model.
## @seealso{}
## @end deftypefn
## Author: dekalog
## Created: 2019-11-04
function J = rvfl_training_of_cyclic_embedding ( x )
global sample_features ; global sample_targets ;
epsilon = 1e-15 ; ## to ensure log() does not give out a nan
## check input x
if ( numel( x ) != 4 )
error( 'The input vector x must be of length 4.' ) ;
endif
## get the parameters from input x
hidden_layer_size = floor( x( 1 ) ) ; ## number of neurons in hidden layer
randomisation_type = floor( x( 2 ) ) ; ## 1 == uniform, 2 == Gaussian
scale_mode = floor( x( 3 ) ) ; ## 1 will scale the features for all neurons, 2 will scale the features for each hidden
## neuron separately, 3 will scale the range of the randomization for uniform distribution
scale = x( 4 ) ; ## Linearly scale the random features before feeding into the nonlinear activation function.
## In this implementation, we consider the threshold which leads to 0.99 of the maximum/minimum
## value of the activation function as the saturating threshold.
## scale = 0.9 means all the random features will be linearly scaled
## into 0.9 * [ lower_saturating_threshold , upper_saturating_threshold ].
##if_output_bias = floor( x( 5 ) + 0.5 ) ; ## Use output bias, or not? 1 == yes , 0 == no.
##lambda = x( 6 ) ; ## the regularization coefficient lambda
##length_jj_loop = 25 ;
##all_aic_values = zeros( length_jj_loop , 1 ) ;
rand( 'seed' , 0 ) ;
randn( 'seed' , 0 ) ;
##U_sample_targets = unique( sample_targets ) ;
##nclass = numel( U_sample_targets ) ;
##sample_targets_temp = zeros( numel( sample_targets ) , nclass ) ;
##
#### get the 0 - 1 one hot coding for the target,
##for i = 1 : nclass
## idx = sample_targets == U_sample_targets( i ) ;
## sample_targets_temp( idx , i ) = 1 ;
##endfor
sample_targets_temp = sample_targets ;
[ Nsample , Nfea ] = size( sample_features ) ;
######### get type of randomisation from input x #################
if ( randomisation_type == 1 ) ## uniform randomisation
if ( scale_mode == 3 ) ## range scaled for uniform randomisation
Weight = scale * ( rand( Nfea , hidden_layer_size ) * 2 - 1 ) ; ## scaled uniform random input weights to hidden layer
Bias = scale * rand( 1 , hidden_layer_size ) ; ## scaled random bias weights to hidden layer
else
Weight = rand( Nfea , hidden_layer_size ) * 2 - 1 ; ## unscaled random input weights to hidden layer
Bias = rand( 1 , hidden_layer_size ) ; ## unscaled random bias weights to hidden layer
endif
elseif ( randomisation_type == 2 ) ## gaussian randomisation
Weight = randn( Nfea , hidden_layer_size ) ; ## gaussian random input weights to hidden layer
Bias = randn( 1 , hidden_layer_size ) ; ## gaussian random bias weights to hidden layer
else
error( 'only Gaussian and Uniform are supported' )
endif
############################################################################
Bias_train = repmat( Bias , Nsample , 1 ) ;
H = sample_features * Weight + Bias_train ;
k_parameters = numel( Weight ) + numel( Bias_train ) ;
## Activation Function
Saturating_threshold = [ -2.1 , 2.1 ] ;
Saturating_threshold_activate = [ 0 , 1 ] ;
if ( scale_mode == 1 )
## scale the features for all neurons
[ H , k , b ] = Scale_feature( H , Saturating_threshold , scale ) ;
elseif ( scale_mode == 2 )
## else scale the features for each hidden neuron separately
[ H , k , b ] = Scale_feature_separately( H , Saturating_threshold , scale ) ;
endif
## actual activation, the radial basis function
H = exp( -abs( H ) ) ;
## glm training always applies a bias, so comment out if training with netlab glm
##if ( if_output_bias == 1 )
## ## we will use an output bias
## H = [ H , ones( Nsample , 1 ) ] ;
##endif
## the direct link scaling options, concatenate hidden layer and sample_features
if ( scale_mode == 1 )
## scale the features for all neurons
sample_features_temp = sample_features .* k + b ;
H = [ H , sample_features_temp ] ;
elseif ( scale_mode == 2 )
## else scale the features for each hidden neuron separately
[ sample_features_temp , ktr , btr ] = Scale_feature_separately( sample_features , Saturating_threshold_activate , scale ) ;
H = [ H , sample_features_temp ] ;
else
H = [ H , sample_features ] ;
endif
H( isnan( H ) ) = 0 ; ## avoids any 'blowups' due to nans in H
############ THE ORIGINAL REGULARISED LEAST SQUARES CODE ###################
## do the regularized least squares for concatenated hidden layer output
## and the original, possibly scaled, input sample_features
##if ( hidden_layer_size < Nsample )
## beta = ( eye( size( H , 2 ) ) / lambda + H' * H ) \ ( H' * sample_targets_temp ) ;
##else
## beta = H' * ( ( eye( size( H , 1 ) ) / lambda + H * H' ) \ sample_targets_temp ) ;
##endif
############################################################################
##k_parameters = k_parameters + numel( beta ) ;
## get the model predicted target output
##sample_targets_temp = H * beta ;
## the final logistic output
##final_output = 1.0 ./ ( 1.0 .+ exp( -sample_targets_temp ) ) ;
############ REPLACED BY GLM TRAINING USING NETLAB #########################
net = glm( size( H , 2 ) , 1 , 'logistic' ) ; ## Create a generalized linear model structure.
options = foptions ; ## Set default parameters for optimisation routines, for compatibility with MATLAB's foptions()
options( 1 ) = -1 ; ## change default value
## OPTIONS(1) is set to 1 to display error values during training. If
## OPTIONS(1) is set to 0, then only warning messages are displayed. If
## OPTIONS(1) is -1, then nothing is displayed.
options( 14 ) = 5 ; ## change default value
## OPTIONS(14) is the maximum number of iterations for the IRLS
## algorithm; default 100.
net = glmtrain( net , options , H , sample_targets ) ;
k_parameters = k_parameters + net.nwts ;
## get output of trained glm model
final_output = glmfwd( net , H ) ;
############################################################################
##Y_temp = zeros( Nsample , 1 ) ;
##% decode the target output
##for i = 1 : Nsample
## [ maxvalue , idx ] = max( sample_targets_temp( i , : ) ) ;
## Y_temp( i ) = U_sample_targets( idx ) ;
##endfor
############################################################################
## https://machinelearningmastery.com/logistic-regression-with-maximum-likelihood-estimation/
##
## likelihood = yhat * y + (1 - yhat) * (1 - y)
##
## We can update the likelihood function using the log to transform it into a log-likelihood function:
##
## log-likelihood = log(yhat) * y + log(1 - yhat) * (1 - y)
## Finally, we can sum the likelihood function across all examples in the dataset to maximize the likelihood:
##
## maximize sum i to n log(yhat_i) * y_i + log(1 - yhat_i) * (1 - y_i)
log_likelihood = sum( log( final_output .+ epsilon ) .* sample_targets + log( 1 .- final_output .+ epsilon ) .* ( 1 .- sample_targets ) ) ;
## get Akaike Information criteria
J = 2 * k_parameters - 2 * log_likelihood ;
## get the Brier_score
## https://en.wikipedia.org/wiki/Brier_score
##J = mean( ( final_output .- sample_targets_temp ) .^ 2 ) ;
rand( 'state' ) ; randn( 'state' ) ; ## reset rng
endfunction
## Various measures of goodness
## https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models
## https://www.fharrell.com/post/classification/
## https://stats.stackexchange.com/questions/433628/what-is-a-reliable-measure-of-accuracy-for-logistic-regression
## https://www.jstatsoft.org/article/view/v090i12
## https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-what-are-pseudo-r-squareds/
## https://stats.stackexchange.com/questions/319666/aic-with-test-data-is-it-possible
## https://www.learningmachines101.com/lm101-076-how-to-choose-the-best-model-using-aic-or-gaic/
## https://stackoverflow.com/questions/48185090/how-to-get-the-log-likelihood-for-a-logistic-regression-model-in-sklearn
## https://stats.stackexchange.com/questions/67903/does-down-sampling-change-logistic-regression-coefficients
## https://stats.stackexchange.com/questions/163221/whats-the-measure-to-assess-the-binary-classification-accuracy-for-imbalanced-d
## https://stats.stackexchange.com/questions/168929/logistic-regression-is-predicting-all-1-and-no-0
## https://stats.stackexchange.com/questions/435307/multiple-linear-regression-lse-when-one-of-parameter-is-known
Both of these functions are working, heavily commented code, although perhaps not as polished as they could be.
As I write this post I have various tests running in the background and will report on the results in due course.