"Unable to evaluate the loss function. Check the loss function and ensure it runs successfully": `gradient` can't access the custom loss function
clear
close all

%% Model parameters
T_init = 0;
T_final = 100;
dt = 1;

rng("shuffle")
baseEnv = baseEnvironment();
p1_pos = randi(baseEnv.L,1);
p2_pos = randi(baseEnv.L,1);
while p1_pos == p2_pos
    p2_pos = randi(baseEnv.L,1);
end

rng("shuffle")
baseEnv = baseEnvironment();
% validateEnvironment(baseEnv)
p1_pos = randi(baseEnv.L,1);
p2_pos = randi(baseEnv.L,1);
while p1_pos == p2_pos
    p2_pos = randi(baseEnv.L,1);
end

agent1 = IMAgent(baseEnv, p1_pos, 1, 'o');
agent2 = IMAgent(baseEnv, p2_pos, 2, 'x');
listOfAgents = [agent1; agent2];
multiAgentEnv = multiAgentEnvironment(listOfAgents);

actInfo = getActionInfo(baseEnv);
obsInfo = getObservationInfo(baseEnv);

%% Build agent 1
actorNetwork = [imageInputLayer([obsInfo.Dimension(1) 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(24,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(24,'Name','fc2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numel(actInfo.Elements),'Name','output')
    softmaxLayer('Name','actionProb')];

actorOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlStochasticActorRepresentation(actorNetwork,...
    obsInfo,actInfo,'Observation','state',actorOpts);
actor = setLoss(actor, @actorLossFunction);

% obj.brain = rlPGAgent(actor,baseline,agentOpts);
agentOpts = rlPGAgentOptions('UseBaseline',false, 'DiscountFactor', 0.99);
agent1.brain = rlPGAgent(actor,agentOpts);

%% Build agent 2
actorNetwork = [imageInputLayer([obsInfo.Dimension(1) 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(24,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(24,'Name','fc2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numel(actInfo.Elements),'Name','output')
    softmaxLayer('Name','actionProb')];

actorOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlStochasticActorRepresentation(actorNetwork,...
    obsInfo,actInfo,'Observation','state',actorOpts);
actor = setLoss(actor, @actorLossFunction);

% obj.brain = rlPGAgent(actor,baseline,agentOpts);
agentOpts = rlPGAgentOptions('UseBaseline',false, 'DiscountFactor', 0.99);
agent2.brain = rlPGAgent(actor,agentOpts);

%% Custom training loop
averageGrad = [];
averageSqGrad = [];
learnRate = 0.05;
gradDecay = 0.75;
sqGradDecay = 0.95;

numOfEpochs = 1;
numEpisodes = 5000;
maxStepsPerEpisode = 250;
discountFactor = 0.995;
aveWindowSize = 100;
trainingTerminationValue = 220;
loss_history = [];

for i = 1:numOfEpochs
    % Collect one episode of experience.
    action_hist = [];
    reward_hist = [];
    observation_hist = [multiAgentEnv.baseEnv.state];

    for t = T_init:1:T_final
        actionList = multiAgentEnv.act();
        [observation, reward, multiAgentEnv.isDone, ~] = multiAgentEnv.step(actionList);
        if t == T_final
            multiAgentEnv.isDone = true;
        end

        action_hist = cat(3, action_hist, actionList);
        reward_hist = cat(3, reward_hist, reward);
        if multiAgentEnv.isDone == true
            break
        else
            observation_hist = cat(3, observation_hist, observation);
        end
    end

    if size(observation_hist,3) ~= size(action_hist,3)
        disp("gi")
    end
    clear observation reward

    actor = getActor(agent1.brain);

    % Build the training batch for agent 1.
    batchSize = min(t,maxStepsPerEpisode);
    observations = observation_hist;
    actions = action_hist(1,:,:);
    rewards = reward_hist(1,:,:);

    observationBatch = permute(observations(:,:,1:batchSize), [2,1,3]);
    actionBatch = actions(:,:,1:batchSize);
    rewardBatch = rewards(:,1:batchSize);

    % Compute the discounted return for each step of the episode.
    discountedReturn = zeros(1,int32(batchSize));
    for t = 1:batchSize
        G = 0;
        for k = t:batchSize
            G = G + discountFactor ^ (k-t) * rewardBatch(k);
        end
        discountedReturn(t) = G;
    end

    lossData.batchSize = batchSize;
    lossData.actInfo = actInfo;
    lossData.actionBatch = actionBatch;
    lossData.discountedReturn = discountedReturn;

    % 6. Compute the gradient of the loss with respect to the policy
    % parameters.
    actorGradient = gradient(actor,'loss-parameters', {observationBatch},lossData);

    % Reset the environment with new random, non-overlapping positions.
    p1_pos = randi(baseEnv.L,1);
    p2_pos = randi(baseEnv.L,1);
    while p1_pos == p2_pos
        p2_pos = randi(baseEnv.L,1);
    end
    multiAgentEnv.reset([p1_pos; p2_pos]);
end

function loss = actorLossFunction(policy, lossData)
    % Create the action indication matrix.
    batchSize = lossData.batchSize;
    Z = repmat(lossData.actInfo.Elements',1,batchSize);
    actionIndicationMatrix = lossData.actionBatch(:,:) == Z;

    % Resize the discounted return to the size of policy.
    G = actionIndicationMatrix .* lossData.discountedReturn;
    G = reshape(G,size(policy));

    % Round any policy values less than eps to eps.
    policy(policy < eps) = eps;

    % Compute the loss.
    loss = -sum(G .* log(policy),'all');
end
When I run the code, I am getting the following error:
Error using rl.representation.rlAbstractRepresentation/gradient (line 181)
Unable to compute gradient from representation.

Error in main1 (line 154)
    actorGradient = gradient(actor,'loss-parameters', {observationBatch},lossData);

Caused by:
    Unable to evaluate the loss function. Check the loss function and ensure it runs successfully.
    Reference to non-existent field 'Advantage'.
I also tried running the example from the linked MathWorks documentation; it works, but my code does not. I put a breakpoint inside the custom loss function, and it is never hit during the gradient calculation, so from the error message I suspect that `gradient` is not using my custom loss at all. What confuses me is that the same pattern works when I run the example code from the MathWorks website.
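To narrow it down, this is the kind of minimal check I have in mind (a sketch only, reusing the variables already defined in the script above; the names standaloneActor, retrievedActor, g1 and g2 are just for illustration). It compares the actor I built and called setLoss on directly against the actor pulled back out of the PG agent with getActor:

% Diagnostic sketch: assumes actorNetwork, obsInfo, actInfo, actorOpts,
% agent1, observationBatch and lossData exist in the workspace as above.

% Actor with the custom loss attached directly.
standaloneActor = rlStochasticActorRepresentation(actorNetwork, ...
    obsInfo, actInfo, 'Observation', 'state', actorOpts);
standaloneActor = setLoss(standaloneActor, @actorLossFunction);

% If the custom loss is used, the breakpoint in actorLossFunction
% should be hit during this call.
g1 = gradient(standaloneActor, 'loss-parameters', {observationBatch}, lossData);

% Actor retrieved from the PG agent. If this is the call that fails with
% "Reference to non-existent field 'Advantage'", the retrieved actor is
% apparently using the agent's built-in loss rather than my custom one.
retrievedActor = getActor(agent1.brain);
g2 = gradient(retrievedActor, 'loss-parameters', {observationBatch}, lossData);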