tea asked . 2021-07-26

Function approximation: Neural network great 'on paper' but when simulated results are very bad?

I need some help with a NN because I don't understand what happened. One hidden layer, I=4, H=1:20, O=1. I run each net architecture 10 times with different initial weights (left at the default initnw). I have 34 datasets in total, which were divided 60/20/20 when using the Levenberg-Marquardt algorithm. Mse_goal = 0.01*mean(var(t',1)). I calculate NMSE and R^2, choose the net with the best R^2, and for that net check the performance on each subsample, the regression plots, and the RMSE. R^2 is usually around 0.95; R for each subset is about 0.98... But when I simulate the network with a completely new set of data, the estimates deviate quite a lot. It is not because of extrapolation. Data are normalized with mapminmax; the transfer functions are tansig and purelin.

Trainbr was actually my first choice, since I have a small dataset and trainbr doesn't need a validation set (MATLAB R2015a), but it is awfully slow. I ran a net with trainbr, and we are talking hours versus minutes with trainlm.

I've read a ton of Greg Heath's posts and tutorials and found very valuable information there, however, still nothing. I see no way out.

% Solve an Input-Output Fitting problem with a Neural Network
% Script generated by Neural Fitting app
% Created 09-Aug-2016 18:33:13

% This script assumes these variables are defined:
%
%   MP_UA_K - input data.
%   UA_K - target data.

close all, clear all

load varUA_K

x = MP_UA_K;
t = UA_K;

var_t=mean(var(t',1)); %t variance

[inputs,obs]=size(x); % number of input variables and number of observations

hiddenLayerSize = 20; %max number of neurons
numNN = 10; % number of training runs

neurons = [1:hiddenLayerSize]';
training_no = 1:numNN;
obs_no = 1:obs;

nets = cell(hiddenLayerSize,numNN);
trainOutputs = cell(hiddenLayerSize,numNN);
valOutputs = cell(hiddenLayerSize,numNN);
testOutputs = cell(hiddenLayerSize,numNN);
Y_all = cell(hiddenLayerSize,numNN);
performance = zeros(hiddenLayerSize,numNN);
trainPerformance = zeros(hiddenLayerSize,numNN);
valPerformance = zeros(hiddenLayerSize,numNN);
testPerformance = zeros(hiddenLayerSize,numNN);
e = zeros(numNN,obs);
e_all = cell(hiddenLayerSize,numNN);
NMSE = zeros(hiddenLayerSize,numNN);
r_train = zeros(hiddenLayerSize,numNN);
r_val = zeros(hiddenLayerSize,numNN);
r_test = zeros(hiddenLayerSize,numNN);
r = zeros(hiddenLayerSize,numNN);
Rsq = zeros(hiddenLayerSize,numNN);

for j=1:hiddenLayerSize

      % Choose a Training Function
      % For a list of all training functions type: help nntrain
      % 'trainlm' is usually fastest.
      % 'trainbr' takes longer but may be better for challenging problems.
      % 'trainscg' uses less memory. Suitable in low memory situations.
      trainFcn = 'trainbr';  % Bayesian Regularization backpropagation.

      % Create a Fitting Network
      net = fitnet(j,trainFcn);

      % Choose Input and Output Pre/Post-Processing Functions
      % For a list of all processing functions type: help nnprocess
      net.input.processFcns = {'removeconstantrows','mapminmax'};
      net.output.processFcns = {'removeconstantrows','mapminmax'};

      % Setup Division of Data for Training, Validation, Testing
      % For a list of all data division functions type: help nndivide
      % the data are sorted by the dependent variable; roughly every third
      % dataset is used for testing
      net.divideFcn = 'divideind';  % Divide data by index
      net.divideMode = 'sample';  % Divide up every sample
      net.divideParam.trainInd = [1:3:34,2:3:34];
  %     net.divideParam.valInd = [5:5:30];
      net.divideParam.testInd = [3:3:34];

      mse_goal = 0.01*var_t;

      % Choose a Performance Function
      % For a list of all performance functions type: help nnperformance
      net.performFcn = 'mse';  % Mean Squared Error
      net.trainParam.goal = mse_goal;

      % Choose Plot Functions
      % For a list of all plot functions type: help nnplot
      net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
          'plotregression', 'plotfit'};

      for i=1:numNN
        % Train the Network
        net = configure(net,x,t);
        disp(['No. of hidden nodes '  num2str(j)  ', Training ' num2str(i) '/' num2str(numNN)])
        [nets{j,i}, tr{j,i}] = train(net,x,t);

        y = nets{j,i}(x);

        e (i,:) = gsubtract(t,y);
        e_all{j,i}= e(i,:);

        trainTargets = t .* tr{j,i}.trainMask{1};
        %valTargets = t .* tr{j,i}.valMask{1};
        testTargets = t .* tr{j,i}.testMask{1};

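        % Evaluate with the trained network nets{j,i}, not the untrained template net
        trainPerformance(j,i) = perform(nets{j,i},trainTargets,y);
        %valPerformance(j,i) = perform(nets{j,i},valTargets,y);
        testPerformance(j,i) = perform(nets{j,i},testTargets,y);

        performance(j,i)= perform(nets{j,i},t,y);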

        rmse_train(j,i)=sqrt(trainPerformance(j,i));
        %rmse_val(j,i)=sqrt(valPerformance(j,i));
        rmse_test(j,i)=sqrt(testPerformance(j,i));
        rmse(j,i)=sqrt(performance(j,i));

        % outputs of all networks
        Y_all{j,i}= y;

        trainOutputs {j,i} = y .* tr{j,i}.trainMask{1};
        %valOutputs {j,i} = y .* tr{j,i}.valMask{1};
        testOutputs {j,i} = y .* tr{j,i}.testMask{1};

        [r(j,i)] = regression(t,y);
        [r_train(j,i)] = regression(trainTargets,trainOutputs{j,i});
        %[r_val(j,i)] = regression(valTargets,valOutputs{j,i});
        [r_test(j,i)] = regression(testTargets,testOutputs{j,i});

        NMSE(j,i) = mse(e_all{j,i})/mean(var(t',1)); % normalized mse

        % coefficient of determination
        Rsq(j,i) = 1-NMSE(j,i);

      end

      [minperf_train,I_train] = min(trainPerformance',[],1);
      minperf_train = minperf_train';
      I_train = I_train';

%     [minperf_val,I_valid] = min(valPerformance',[],1);
%     minperf_val = minperf_val';
%     I_valid = I_valid';

      [minperf_test,I_test] = min(testPerformance',[],1);
      minperf_test = minperf_test';
      I_test = I_test';

      [minperf,I_perf] = min(performance',[],1);
      minperf = minperf';
      I_perf = I_perf';

      [maxRsq,I_Rsq] = max(Rsq',[],1);
      maxRsq = maxRsq';
      I_Rsq = I_Rsq';

      [train_min,train_min_I] = min(minperf_train,[],1);

%     [val_min,val_min_I] = min(minperf_val,[],1);

      [test_min,test_min_I] = min(minperf_test,[],1);

      [perf_min,perf_min_I] = min(minperf,[],1);

      [Rsq_max,Rsq_max_I] = max(maxRsq,[],1);

end

figure(4)
hold on
xlabel('observation no.')
ylabel('targets')
scatter(obs_no,trainTargets,'b')
% scatter(obs_no,valTargets,'g')
scatter(obs_no,testTargets,'r')
hold off

figure(5)
hold on
xlabel('neurons')
ylabel('min. performance')
plot(neurons,minperf_train,'b',neurons,minperf_test,'r',neurons,minperf,'k')
hold off

figure(6)
hold on
xlabel('neurons')
ylabel('max Rsq')
scatter(neurons,maxRsq,'k')
hold off

% View the Network
%view(net)

% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, ploterrhist(e)
%figure, plotregression(t,y)
%figure, plotfit(net,x,t)

% Deployment
% Change the (false) values to (true) to enable the following code blocks.
% See the help for each generation function for more information.

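% Save the figures (savefig writes .fig files; a plain "save" would dump
% the whole workspace into a MAT-file with that name instead)
savefig(figure(4),'figure4.fig')
savefig(figure(5),'figure5.fig')
savefig(figure(6),'figure6.fig')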

if (false)
    % Generate MATLAB function for neural network for application
    % deployment in MATLAB scripts or with MATLAB Compiler and Builder
    % tools, or simply to examine the calculations your trained neural
    % network performs.
    genFunction(net,'nn_UA_K_BR');
    y = nn_UA_K_BR(x);
end

% save all workspace variables to a separate file for further analysis
save ws_UA_K_BR

neural networks , generalization , trainbr , trainlm , simulation , matlab

Expert Answer

Prashant Kumar answered . 2025-03-27 23:14:26

% I need some help with NN because I don't understand what happened. One hidden layer, I=4, H=1:20, O=1. I run each net architecture 10 times with different initial weights (left default initnw). I have in total 34 datasets

 Do you mean data points N = 34?

It typically takes ~10 to 30 data points per dimension to adequately characterize a distribution. For a 4-D distribution I'd recommend

 40 <~ Ntrn <~ 120

% which were divided 60/20/20 when using Levenberg-Marquadt

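 Ntrn = 34-2*round(0.2*34) = 20     (training points left after the 20% validation and 20% test subsets)

 Hub = (Ntrn*O-O)/(I+O+1) = (20-1)/(4+1+1) = 3.2     (upper bound on H for which the number of unknown weights does not exceed the number of training equations)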

indicating you really don't have enough data to adequately characterize a 4-D distribution.
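
The arithmetic above can be checked with a few lines of base MATLAB. This is only a sketch: the names `Ntrneq`, `Nw` and `overfitH` are introduced here for illustration, and the weight count assumes the single-hidden-layer I-H-O topology described in the question.

```matlab
% Sizes taken from the question: 34 observations, 4 inputs, 1 output.
N = 34;  I = 4;  O = 1;

% 60/20/20 split as computed in the answer.
Nval = round(0.2*N);
Ntst = round(0.2*N);
Ntrn = N - Nval - Ntst;              % training points

Ntrneq = Ntrn*O;                     % number of training equations
Hub    = (Ntrneq - O)/(I + O + 1);   % largest H with Nw <= Ntrneq

% Number of unknown weights for each candidate hidden-layer size H = 1:20.
H  = 1:20;
Nw = (I + 1)*H + (H + 1)*O;

% Hidden-layer sizes with more unknowns than training equations.
overfitH = H(Nw > Ntrneq);

fprintf('Ntrn = %d, Hub = %.2f, first H with Nw > Ntrneq: %d\n', ...
        Ntrn, Hub, overfitH(1));
```

With these sizes, every H above 3 has more weights than training equations, which covers most of the 1:20 range searched in the question.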

You should consider

 1. Dimensionality reduction
 2. k-fold crossvalidation
 3. Adding new data with the same mean and covariance (stdv + correlations) matrix
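
As a rough illustration of option 3, the sketch below draws extra points with the same sample mean and covariance as the existing data, using a Cholesky factor of the covariance matrix. It makes assumptions that are not in the original post: the data are treated as jointly Gaussian, inputs and target are stacked into one 5-row matrix `z`, and `z` itself is random placeholder data standing in for `[x; t]`. The count `nNew` is arbitrary.

```matlab
% Placeholder standing in for the real [x; t] (4 inputs + 1 target, 34 points).
rng(1);
z = rand(5, 34);

mu = mean(z, 2);                 % 5x1 sample mean
C  = cov(z');                    % 5x5 sample covariance (cov expects rows = observations)
L  = chol(C, 'lower');           % Cholesky factor, C = L*L'

% Draw new points with population mean mu and covariance C (Gaussian assumption).
nNew = 66;                       % arbitrary illustrative count
zNew = bsxfun(@plus, mu, L*randn(5, nNew));

xNew = zNew(1:4, :);             % synthetic inputs
tNew = zNew(5, :);               % synthetic targets

% Sanity check on the factorization.
factorErr = norm(L*L' - C);
```

Points generated this way only reproduce the first two moments of the existing data, so they cannot add information about nonlinear structure that the 34 original points do not already contain.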

% algorithm. Mse_goal = 0.01*mean(var(t',1)), i calculate NMSE and R^2, choose best R^2, for that check performance of each subsample, check regression plots, check rmse. R^2 is usually around 0,95; R for each subset 0,98... But when I simulate network with completely new set of data, estimations deviate quite a lot. It is not because of extrapolation.

 No. It probably is. Your training data subset is insufficiently large for 4 dimensions.

 I would begin with minimizing H with dividetrain. Then consider k-fold crossvalidation.
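
One way to set up the k-fold split is sketched below in base MATLAB. The value `k = 5`, the random seed, and the variable names are illustrative choices, not something from the original post. The resulting index lists are meant to be fed to the `divideind` settings already used in the question's script.

```matlab
% k-fold index sketch. N = 34 is from the question; k and the seed are assumptions.
N = 34;
k = 5;
rng(0);                              % reproducible shuffle
perm   = randperm(N);                % shuffled observation indices
foldId = mod(0:N-1, k) + 1;          % fold label 1..k for each shuffled position

allTest = [];
for f = 1:k
    testInd  = sort(perm(foldId == f));   % held-out fold
    trainInd = sort(perm(foldId ~= f));   % remaining k-1 folds
    allTest  = [allTest, testInd];        %#ok<AGROW>

    % In the question's script these would replace the fixed index lists:
    %   net.divideFcn            = 'divideind';
    %   net.divideParam.trainInd = trainInd;
    %   net.divideParam.testInd  = testInd;
    fprintf('fold %d: %d train, %d test\n', f, numel(trainInd), numel(testInd));
end

% Every observation should be held out exactly once across the k folds.
coversAll = isequal(sort(allTest), 1:N);
```

Averaging the test error over the k folds for each candidate H gives an estimate that uses all 34 points for testing, rather than the single fixed test subset in the question.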

% Data are normalized with mapminmax, transfer functions tansig, purelin. Trainbr was my first choice actually, since I have small dataset and trainbr doesn't need validation set (Matlab2015a), but it is awfully slow. I ran a net with trainbr and we are talking hours versus minutes with trainlm.

This may be a BUG. Let MathWorks know. What version are you using?

>> ver

% I've read a ton of Greg Heath's posts and tutorials and found very valuable information there, however, still nothing. I see no way out.

 It typically takes ~10 to 30 data points per dimension to adequately characterize a distribution. I suggest calculating the means and stdv for each data set to see how much your training data is representative of the total 4-D distribution that includes the new datasets. 2-D or 3-D color coded projections may be helpful.
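
A minimal sketch of that comparison follows, in base MATLAB. The matrices `xTrn` and `xNew` are random placeholders standing in for the real training inputs and the new inputs (the new set is deliberately shifted so the check has something to find). All variable names here are illustrative.

```matlab
% Placeholder inputs: 4 variables in rows, observations in columns.
rng(2);
xTrn = rand(4, 20);              % stands in for the training inputs
xNew = 0.3 + rand(4, 10);        % stands in for the new inputs (shifted on purpose)

% Per-variable means and standard deviations of each set.
muTrn = mean(xTrn, 2);   sdTrn = std(xTrn, 0, 2);
muNew = mean(xNew, 2);   sdNew = std(xNew, 0, 2);

% Shift of the new mean, measured in training standard deviations.
zShift = (muNew - muTrn) ./ sdTrn;

% Fraction of new points outside the training range, per variable.
lo = min(xTrn, [], 2);
hi = max(xTrn, [], 2);
outside     = bsxfun(@lt, xNew, lo) | bsxfun(@gt, xNew, hi);
fracOutside = mean(outside, 2);

for v = 1:4
    fprintf('var %d: mean shift %.2f sd, sd ratio %.2f, %.0f%% outside range\n', ...
            v, zShift(v), sdNew(v)/sdTrn(v), 100*fracOutside(v));
end
```

A per-variable range check is only a necessary condition. New points can lie inside every individual range and still fall in a region of the 4-D space that contains no training data, which is where the 2-D or 3-D projections help.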