# Analysis of MATLAB Reinforcement Learning Code

Entry point

Execution starts in main.m, which begins by calling initSimulation, defined in initSimulation.m.

main.m

initSimulation;

%% ----- Main loop

while(true)

When a button is pressed, the mode changes, and the loop then performs the work that corresponds to the new mode.
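The dispatch pattern described above can be sketched without a GUI. This is a minimal illustration, not the original main.m: the button presses are simulated with a hypothetical `pending` queue, `history` is a hypothetical log, and `'Quit'` is an invented command used only to end the demo. How the real callbacks write to `mode` is not shown in the excerpt.

```matlab
% Minimal sketch of a mode-driven main loop (no GUI).
% "Button presses" are simulated by a hypothetical queue of mode strings.
pending = {'Start2', 'Pause2', 'Reset2', 'Quit'};  % hypothetical events
mode = '';
history = {};
while true
    % In the real program a button callback would set `mode`;
    % here the next simulated event is popped instead.
    if isempty(mode) && ~isempty(pending)
        mode = pending{1};
        pending(1) = [];
    end
    if strcmp(mode, 'Start2')
        mode = '';                      % consume the command, as main.m does
        history{end+1} = 'training';
    elseif strcmp(mode, 'Pause2')
        mode = '';
        history{end+1} = 'paused';
    elseif strcmp(mode, 'Reset2')
        mode = '';
        history{end+1} = 'reset';
    elseif strcmp(mode, 'Quit')         % hypothetical exit command
        break;
    end
end
disp(history);
```

Clearing `mode` at the top of each branch means a command runs once per press rather than on every pass through the loop.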

elseif( strcmp(mode,'Start2') )

mode = '';
info_s1 = sprintf('%s%d%21s%d%22s%.2f', 'Trials:', j-1, 'Steps:', steps2, 'Total reward:', reward2);

for j = j:max_trials

subplot(h_plot2(1));

[nn_params, Q_nn, Q_est, reward2, steps2, n_crash2, cost, info_s2] = ...
nnTrials(nn_params, input_layer_size, hidden_layer_size,...
output_layer_size, alpha2, gamma, T2, actions, max_steps,...

subplot(h_plot2(1));
delete(h_car2);
drawnow;

if( strcmp(mode,'Pause2') ), break; end   % leave the trial loop when Pause is pressed
if(T2 > 1.0e-3), T2 = 0.95*T2; end        % shrink T2 by 5% per trial while it is above 1e-3

info_s1 = sprintf('%s%d%21s%d%22s%.2f', 'Trials:', j, 'Steps:', steps2, 'Total reward:', reward2);
info_s = sprintf('%s\n%s', info_s1, info_s2);
set(h_text2, 'String', info_s);

n_steps2 = n_steps2 + steps2;
tot_steps2 = tot_steps2 + steps2;
total_reward2 = total_reward2 + reward2;
cost_fun(j) = cost;

if( mod(j,10) == 0)   % every 10 trials: record the average steps and the accumulated reward

n2 = n2 + 1;
x2(n2) = n2-1;
y3(n2) = n_steps2/10;
y4(n2) = total_reward2;

subplot(h_plot2(2));
plot(x2,y3);
xlabel('Trials x10');
ylabel('Average steps each 10th trials');

subplot(h_plot2(3));
plot(x2,y4);
title('Total reward');
xlabel('Trials x10');
ylabel('Accumulated reward');

n_steps2 = 0;
subplot(h_plot2(1));
drawnow;
end

if(j == 300 && ~figure2)

set(h_button2(1), 'BackGroundColor', [0.4,0.4,0.4], 'Enable', 'off');
set(h_button2(2), 'BackGroundColor', 'b', 'Enable', 'on');
figure2 = true;
end

if(mod(j,50) == 0), clc; end
Tekst = sprintf('Trials: %d', j);
fprintf([Tekst,'\n']);
end
end
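The bookkeeping in the Start2 branch (the `T2` decay and the averages taken every 10 trials) can be tried in isolation. The sketch below assumes `T2` is the temperature used by nnSoftMaxSelection, which the excerpt suggests but does not state. `max_trials`, the initial temperature, and the per-trial `steps` and `reward` values are placeholders, not output of nnTrials.

```matlab
% Sketch: anneal a temperature and log block averages, mirroring the
% bookkeeping in the Start2 branch. Episode outcomes are placeholders.
max_trials = 40;          % hypothetical; the real value is set elsewhere
T = 1.0;                  % hypothetical initial temperature
n_steps = 0;  total_reward = 0;  n = 0;
x = [];  avg_steps = [];  acc_reward = [];
for j = 1:max_trials
    steps  = 100 - j;     % placeholder for the steps returned by a trial
    reward = 1;           % placeholder for the reward returned by a trial
    if T > 1.0e-3, T = 0.95*T; end     % same decay rule as main.m
    n_steps = n_steps + steps;
    total_reward = total_reward + reward;
    if mod(j, 10) == 0
        n = n + 1;
        x(n) = n - 1;
        avg_steps(n) = n_steps/10;     % mean steps over the last 10 trials
        acc_reward(n) = total_reward;  % running total, never reset
        n_steps = 0;
    end
end
fprintf('T after %d trials: %.4f\n', max_trials, T);
```

Under the assumed start value of 1, the rule stops shrinking the temperature after 135 trials, since 0.95^135 is the first power of 0.95 below 1e-3. Note that `n_steps` is reset after each block while `total_reward` is not, so the second plot shows a cumulative curve.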

initSimulation.m

% GUI
[h_plot1, h_plot2, h_text1, h_text2, h_button1, h_button2] = createSimulation(xlimit, ylimit, max_trials, max_steps,...
alpha1, alpha2, gamma, epsilon, T1);
% Simulation Environments
subplot(h_plot1(1));
h_car1 = createCarCircle(radius, sensor_lengde, 'b');   % -> createCarCircle

subplot(h_plot2(1));

initSimulation.m calls the createSimulation function, which is defined in createSimulation.m.

createSimulation.m

There appear to be two approaches, and the active tab determines which one is used:
%% ----- Tab 1: Q-Table
%% ----- Tab 2: Neural Network

Buttons and callback functions

Tab 2: Neural Network

start_button2 = uicontrol(NN_panel, 'Style', 'pushbutton', 'BackGroundColor', 'g', 'String', 'Start', 'Position', [800, 220, 90, 45]);

pause_button2 = uicontrol(NN_panel, 'Style', 'pushbutton', 'BackGroundColor', [0.4,0.4,0.4],...
'Enable', 'off', 'String', 'Pause', 'Position', [920, 220, 90, 45]);

upload_button2 = uicontrol(NN_panel, 'Style', 'pushbutton', 'BackGroundColor', 'y',...
'Enable', 'on', 'String', 'Upload', 'Position', [800, 140, 90, 45]);

save_button2 = uicontrol(NN_panel, 'Style', 'pushbutton', 'BackGroundColor', [0.4,0.4,0.4],...
'Enable', 'off', 'String', 'Save', 'Position', [920, 140, 90, 45]);

reset_button2 = uicontrol(NN_panel, 'Style', 'pushbutton', 'BackGroundColor', [0.4,0.4,0.4],...
'Enable', 'off', 'String', 'Reset', 'Position', [855, 60, 100, 50]);

%% ----- Callback functions

set(start_button1, 'callback', {@funStart1, pause_button1, reset_button1, start_button2, pause_button2,...

set(pause_button1, 'callback', {@funPause1, start_button1, reset_button1, start_button2, pause_button2,...

set(reset_button1, 'callback', {@funReset1, start_button1, pause_button1, save_button1});
set(save_button1, 'callback', @funSave1);

set(start_button2, 'callback', {@funStart2, start_button1, pause_button1, reset_button1, pause_button2,...

set(pause_button2, 'callback', {@funPause2, start_button1, pause_button1, reset_button1, start_button2,...

set(reset_button2, 'callback', {@funReset2, start_button2, pause_button2, save_button2});
set(save_button2, 'callback', @funSave2);
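The callbacks above are registered as cell arrays of the form `{@fun, a, b, ...}`. When such a callback fires, MATLAB calls `fun(src, event, a, b, ...)`, so each handler receives the source object and event data first, followed by the extra button handles. The sketch below makes that call by hand so it runs without a GUI. `describe`, the strings, and the empty `event` are hypothetical stand-ins, not part of the original code.

```matlab
% Sketch: how a cell-array callback {@fun, a, b} is invoked.
% MATLAB calls fun(src, event, a, b); here the call is made manually.
describe = @(src, event, partner, label) ...
    sprintf('%s pressed (%s); would toggle %s', src, label, partner);
cb = {describe, 'pause_button', 'tab 2'};   % handle plus two extra arguments
src = 'start_button';                       % stands in for the button handle
event = [];                                 % stands in for the event data
msg = feval(cb{1}, src, event, cb{2:end});  % same argument order MATLAB uses
disp(msg);
```

This explains why, for example, funStart2 is given the other buttons' handles: it can enable or disable them without relying on global variables.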

Commands

The command to run is selected by the value stored in the mode variable:

Reset1

Save1

Reset2

Save2

Start1

Pause1

Start2

Pause2

Static

Dynamic

Start2 -> for j = j:max_trials
       -> nnTrials

circularMotion -> rotate

nnFeedForward -> reshape
              -> tanhActivation

nnSoftMaxSelection -> getBestAction

doActionCircle -> Move forward / Turn left / Turn right (computes the angular velocity)
               -> moveCarCircle -> rotate
               -> moveSensor

checkCrash -> obstacle_polygon    xxxxx
           -> obstacle_circle     xxxxx
           -> obstacleCrash
           -> sensorValues -> sensor_vertices    xxxx
                           -> sensor_temp        xxxxxx

nnGetReward

computeQEstimate -> nnFeedForward

nnBackPropagation -> reshape
                  -> zeros
                  -> length
                  -> tanhActivation
                  -> tanhDerivative
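The nnFeedForward, nnSoftMaxSelection, and computeQEstimate steps in the call tree can be sketched together. Everything below is an assumption made for illustration rather than the original implementation: one hidden tanh layer with linear outputs and bias units, weights stored in a single unrolled vector (suggested by the reshape calls), Boltzmann action selection, and a one-step Q-learning target. The layer sizes, weights, states, `T`, `gamma`, and `reward` are hypothetical values.

```matlab
% Sketch under stated assumptions: one hidden tanh layer, linear outputs,
% bias units, and weights stored in one unrolled vector.
input_layer_size = 3;  hidden_layer_size = 4;  output_layer_size = 3;
n1 = hidden_layer_size*(input_layer_size + 1);
n2 = output_layer_size*(hidden_layer_size + 1);
nn_params = 0.1*sin(1:(n1 + n2))';          % deterministic stand-in weights

% Forward pass: unroll the vector, then propagate a state through the net.
Theta1 = reshape(nn_params(1:n1), hidden_layer_size, input_layer_size + 1);
Theta2 = reshape(nn_params(n1+1:end), output_layer_size, hidden_layer_size + 1);
forward = @(s) Theta2*[1; tanh(Theta1*[1; s])];

state      = [0.2; 0.5; 0.9];               % e.g. three sensor readings
next_state = [0.1; 0.4; 0.8];
Q = forward(state);                         % one Q value per action

% Softmax (Boltzmann) selection with temperature T.
T = 0.5;
p = exp((Q - max(Q))/T);                    % subtract the max for stability
p = p/sum(p);
action = find(rand() <= cumsum(p), 1);      % sample an action index
if isempty(action), action = numel(p); end  % guard against round-off

% One-step Q-learning target for the chosen action only.
gamma = 0.9;  reward = -0.1;                % hypothetical values
Q_target = Q;
Q_target(action) = reward + gamma*max(forward(next_state));
```

A high `T` makes the action probabilities nearly uniform (more exploration), while a low `T` concentrates them on the largest Q value. That would be consistent with main.m shrinking `T2` as training progresses.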
