"Unable to evaluate the loss function. Check the loss function and ensure it runs successfully": `gradient` can't access the custom loss function
clear
close all

%% Model parameters
T_init = 0;
T_final = 100;
dt = 1;

rng("shuffle")
baseEnv = baseEnvironment();
p1_pos = randi(baseEnv.L,1);
p2_pos = randi(baseEnv.L,1);
while p1_pos == p2_pos
    p2_pos = randi(baseEnv.L,1);
end

rng("shuffle")
baseEnv = baseEnvironment();
% validateEnvironment(baseEnv)
p1_pos = randi(baseEnv.L,1);
p2_pos = randi(baseEnv.L,1);
while p1_pos == p2_pos
    p2_pos = randi(baseEnv.L,1);
end

agent1 = IMAgent(baseEnv, p1_pos, 1, 'o');
agent2 = IMAgent(baseEnv, p2_pos, 2, 'x');
listOfAgents = [agent1; agent2];
multiAgentEnv = multiAgentEnvironment(listOfAgents);

actInfo = getActionInfo(baseEnv);
obsInfo = getObservationInfo(baseEnv);

%% Build agent 1
actorNetwork = [imageInputLayer([obsInfo.Dimension(1) 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(24,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(24,'Name','fc2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numel(actInfo.Elements),'Name','output')
    softmaxLayer('Name','actionProb')];

actorOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlStochasticActorRepresentation(actorNetwork,...
    obsInfo,actInfo,'Observation','state',actorOpts);
actor = setLoss(actor, @actorLossFunction);

% obj.brain = rlPGAgent(actor,baseline,agentOpts);
agentOpts = rlPGAgentOptions('UseBaseline',false, 'DiscountFactor', 0.99);
agent1.brain = rlPGAgent(actor,agentOpts);

%% Build agent 2
actorNetwork = [imageInputLayer([obsInfo.Dimension(1) 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(24,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(24,'Name','fc2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numel(actInfo.Elements),'Name','output')
    softmaxLayer('Name','actionProb')];

actorOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
actor = rlStochasticActorRepresentation(actorNetwork,...
    obsInfo,actInfo,'Observation','state',actorOpts);
actor = setLoss(actor, @actorLossFunction);

% obj.brain = rlPGAgent(actor,baseline,agentOpts);
agentOpts = rlPGAgentOptions('UseBaseline',false, 'DiscountFactor', 0.99);
agent2.brain = rlPGAgent(actor,agentOpts);

%% Custom training loop
averageGrad = [];
averageSqGrad = [];
learnRate = 0.05;
gradDecay = 0.75;
sqGradDecay = 0.95;

numOfEpochs = 1;
numEpisodes = 5000;
maxStepsPerEpisode = 250;
discountFactor = 0.995;
aveWindowSize = 100;
trainingTerminationValue = 220;
loss_history = [];

for i = 1:numOfEpochs
    % Collect one episode of experience.
    action_hist = [];
    reward_hist = [];
    observation_hist = [multiAgentEnv.baseEnv.state];

    for t = T_init:1:T_final
        actionList = multiAgentEnv.act();
        [observation, reward, multiAgentEnv.isDone, ~] = multiAgentEnv.step(actionList);
        if t == T_final
            multiAgentEnv.isDone = true;
        end

        action_hist = cat(3, action_hist, actionList);
        reward_hist = cat(3, reward_hist, reward);
        if multiAgentEnv.isDone == true
            break
        else
            observation_hist = cat(3, observation_hist, observation);
        end
    end

    if size(observation_hist,3) ~= size(action_hist,3)
        disp("gi")
    end
    clear observation reward

    actor = getActor(agent1.brain);

    % Build the training batch for agent 1.
    batchSize = min(t,maxStepsPerEpisode);
    observations = observation_hist;
    actions = action_hist(1,:,:);
    rewards = reward_hist(1,:,:);

    observationBatch = permute(observations(:,:,1:batchSize), [2,1,3]);
    actionBatch = actions(:,:,1:batchSize);
    rewardBatch = rewards(:,1:batchSize);

    % Compute the discounted return for each step of the episode.
    discountedReturn = zeros(1,int32(batchSize));
    for t = 1:batchSize
        G = 0;
        for k = t:batchSize
            G = G + discountFactor ^ (k-t) * rewardBatch(k);
        end
        discountedReturn(t) = G;
    end

    lossData.batchSize = batchSize;
    lossData.actInfo = actInfo;
    lossData.actionBatch = actionBatch;
    lossData.discountedReturn = discountedReturn;

    % 6. Compute the gradient of the loss with respect to the policy
    % parameters.
    actorGradient = gradient(actor,'loss-parameters', {observationBatch},lossData);

    % Reset the environment with new random, non-overlapping positions.
    p1_pos = randi(baseEnv.L,1);
    p2_pos = randi(baseEnv.L,1);
    while p1_pos == p2_pos
        p2_pos = randi(baseEnv.L,1);
    end
    multiAgentEnv.reset([p1_pos; p2_pos]);
end

function loss = actorLossFunction(policy, lossData)
    % Create the action indication matrix.
    batchSize = lossData.batchSize;
    Z = repmat(lossData.actInfo.Elements',1,batchSize);
    actionIndicationMatrix = lossData.actionBatch(:,:) == Z;

    % Resize the discounted return to the size of policy.
    G = actionIndicationMatrix .* lossData.discountedReturn;
    G = reshape(G,size(policy));

    % Round any policy values less than eps to eps.
    policy(policy < eps) = eps;

    % Compute the loss.
    loss = -sum(G .* log(policy),'all');
end
When I run the code, I am getting the following error:
Error using rl.representation.rlAbstractRepresentation/gradient (line 181)
Unable to compute gradient from representation.

Error in main1 (line 154)
    actorGradient = gradient(actor,'loss-parameters', {observationBatch},lossData);

Caused by:
    Unable to evaluate the loss function. Check the loss function and ensure it runs successfully.
    Reference to non-existent field 'Advantage'.
I also tried running the example from the linked MathWorks documentation; it works, but my code does not. I put a breakpoint inside the custom loss function, and it is never hit during the gradient calculation, so from the error message I suspect that `gradient` is not using my custom loss at all. What confuses me is that the same pattern works when I run the example code from the MathWorks website.
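To narrow it down, this is the kind of minimal check I have in mind (a sketch only, reusing the variables already defined in the script above; the names standaloneActor, retrievedActor, g1 and g2 are just for illustration). It compares the actor I built and called setLoss on directly against the actor pulled back out of the PG agent with getActor:

% Diagnostic sketch: assumes actorNetwork, obsInfo, actInfo, actorOpts,
% agent1, observationBatch and lossData exist in the workspace as above.

% Actor with the custom loss attached directly.
standaloneActor = rlStochasticActorRepresentation(actorNetwork, ...
    obsInfo, actInfo, 'Observation', 'state', actorOpts);
standaloneActor = setLoss(standaloneActor, @actorLossFunction);

% If the custom loss is used, the breakpoint in actorLossFunction
% should be hit during this call.
g1 = gradient(standaloneActor, 'loss-parameters', {observationBatch}, lossData);

% Actor retrieved from the PG agent. If this is the call that fails with
% "Reference to non-existent field 'Advantage'", the retrieved actor is
% apparently using the agent's built-in loss rather than my custom one.
retrievedActor = getActor(agent1.brain);
g2 = gradient(retrievedActor, 'loss-parameters', {observationBatch}, lossData);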