Analysis of Reinforcement Learning MATLAB Code

Entry point

Execution starts in main.m, which first calls initSimulation, defined in initSimulation.m.

main.m

initSimulation;

%% ----- Main loop

while(true)

When a button is pressed, the mode changes, and the main loop then performs the work that corresponds to the new mode.

elseif( strcmp(mode,'Start2') )

mode = '';

info_s1 = sprintf('%s %d%21s %d%22s %.2f', 'Trials:', j-1, 'Steps:', steps2, 'Total reward:', reward2);

for j = j:max_trials

subplot(h_plot2(1));

[nn_params, Q_nn, Q_est, reward2, steps2, n_crash2, cost, info_s2] = ...

nnTrials(nn_params, input_layer_size, hidden_layer_size,...

output_layer_size, alpha2, gamma, T2, actions, max_steps,...

n_crash2, stadium_option2, h_car2, h_poly2, h_circ2, radius, wheel_radius, info_s1, h_text2);

subplot(h_plot2(1));

delete(h_car2);

h_car2 = createCarCircle(radius, sensor_lengde, 'b');

drawnow;

if( strcmp(mode,'Pause2') ), break; end

if(T2 > 1.0e-3), T2 = 0.95*T2; end

info_s1 = sprintf('%s %d%21s %d%22s %.2f', 'Trials:', j, 'Steps:', steps2, 'Total reward:', reward2);

info_s = sprintf('%s\n%s', info_s1, info_s2);

set(h_text2, 'String', info_s);

n_steps2 = n_steps2 + steps2;

tot_steps2 = tot_steps2 + steps2;

total_reward2 = total_reward2 + reward2;

cost_fun(j) = cost;

if( mod(j,10) == 0)

n2 = n2 + 1;

x2(n2) = n2-1;

y3(n2) = n_steps2/10;

y4(n2) = total_reward2;

subplot(h_plot2(2));

plot(x2,y3);

xlabel('Trials x10');

ylabel('Average steps each 10th trials');

subplot(h_plot2(3));

plot(x2,y4);

title('Total reward');

xlabel('Trials x10');

ylabel('Accumulated reward');

n_steps2 = 0;

subplot(h_plot2(1));

drawnow;

end

if(j == 300 && ~figure2)

set(h_button2(1), 'BackGroundColor', [0.4,0.4,0.4], 'Enable', 'off');

set(h_button2(2), 'BackGroundColor', 'b', 'Enable', 'on');

figure2 = true;

end

if(mod(j,50) == 0), clc; end

Tekst = sprintf('Trials: %d', j);

fprintf([Tekst,'\n']);

end
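The loop above multiplies T2 by 0.95 after each trial for as long as T2 stays above 1.0e-3. The following is a minimal sketch of that schedule in isolation. The starting temperature and the trial count are hypothetical values chosen for illustration; the real ones are set elsewhere in the project.

```matlab
% Sketch of the temperature schedule used in the Start2 loop.
% ASSUMPTIONS: the initial T2 and max_trials below are hypothetical.
T2 = 1.0;            % hypothetical starting temperature
T_min = 1.0e-3;      % floor taken from the loop in main.m
decay = 0.95;        % per-trial decay factor taken from main.m
max_trials = 300;    % hypothetical number of trials

T_history = zeros(1, max_trials);
for j = 1:max_trials
    T_history(j) = T2;                       % temperature in effect during trial j
    if (T2 > T_min), T2 = decay * T2; end    % same rule as in main.m
end

% First trial whose temperature is at or below the floor; from here on
% the value no longer changes.
first_frozen = find(T_history <= T_min, 1);
fprintf('T2 stops decaying at trial %d (T2 = %.3g)\n', ...
        first_frozen, T_history(first_frozen));
```

A lower temperature makes softmax action selection greedier, so a schedule of this kind moves the agent from exploration toward exploitation as the trials accumulate.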

initSimulation.m

% GUI

[h_plot1, h_plot2, h_text1, h_text2, h_button1, h_button2] = createSimulation(xlimit, ylimit, max_trials, max_steps,...

alpha1, alpha2, gamma, epsilon, T1);

% Simulation Environments

subplot(h_plot1(1));

[h_poly1, h_circ1] = createStadium(stadium_option1); % ---> createStadium.m

h_car1 = createCarCircle(radius, sensor_lengde, 'b'); % ---> createCarCircle

subplot(h_plot2(1));

[h_poly2, h_circ2] = createStadium(stadium_option2);

h_car2 = createCarCircle(radius, sensor_lengde, 'b');

initSimulation.m calls the createSimulation function, which is defined in createSimulation.m.

createSimulation.m

There appear to be two learning methods, and the tab determines which one is used.

%% ----- Tab 1: Q-Table

%% ----- Tab 2: Neural Network

Buttons and callback functions

Tab 2: Neural Network

start_button2 = uicontrol(NN_panel, 'Style', 'pushbutton', 'BackGroundColor', 'g', 'String', 'Start', 'Position', [800, 220, 90, 45]);

pause_button2 = uicontrol(NN_panel, 'Style', 'pushbutton', 'BackGroundColor', [0.4,0.4,0.4],...

'Enable', 'off', 'String', 'Pause', 'Position', [920, 220, 90, 45]);

upload_button2 = uicontrol(NN_panel, 'Style', 'pushbutton', 'BackGroundColor', 'y',...

'Enable', 'on', 'String', 'Upload', 'Position', [800, 140, 90, 45]);

save_button2 = uicontrol(NN_panel, 'Style', 'pushbutton', 'BackGroundColor', [0.4,0.4,0.4],...

'Enable', 'off', 'String', 'Save', 'Position', [920, 140, 90, 45]);

reset_button2 = uicontrol(NN_panel, 'Style', 'pushbutton', 'BackGroundColor', [0.4,0.4,0.4],...

'Enable', 'off', 'String', 'Reset', 'Position', [855, 60, 100, 50]);

%% ----- Callback functions

set(start_button1, 'callback', {@funStart1, pause_button1, reset_button1, start_button2, pause_button2,...

reset_button2, upload_button1, save_button1, upload_button2, save_button2});

set(pause_button1, 'callback', {@funPause1, start_button1, reset_button1, start_button2, pause_button2,...

reset_button2, upload_button1, save_button1, upload_button2, save_button2});

set(reset_button1, 'callback', {@funReset1, start_button1, pause_button1, save_button1});

set(upload_button1, 'callback', @funUpload1);

set(save_button1, 'callback', @funSave1);

set(start_button2, 'callback', {@funStart2, start_button1, pause_button1, reset_button1, pause_button2,...

reset_button2, upload_button1, save_button1, upload_button2, save_button2});

set(pause_button2, 'callback', {@funPause2, start_button1, pause_button1, reset_button1, start_button2,...

reset_button2, upload_button1, save_button1, upload_button2, save_button2});

set(reset_button2, 'callback', {@funReset2, start_button2, pause_button2, save_button2});

set(upload_button2, 'callback', @funUpload2);

set(save_button2, 'callback', @funSave2);
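The callbacks above are registered as cell arrays of the form {@fun, arg1, arg2, ...}. When MATLAB invokes such a callback, it passes the source object and the event data first, followed by the extra arguments from the cell array. The sketch below emulates that call by hand without opening a GUI. The function funDemo and the strings standing in for button handles are hypothetical; the bodies of funStart2 and the other callbacks are not shown in this post.

```matlab
% Sketch of how a cell-array callback receives its arguments.
% ASSUMPTIONS: funDemo is a hypothetical stand-in for a callback such as
% funStart2, and plain strings stand in for the uicontrol handles.
funDemo = @(src, event, varargin) ...
    sprintf('%s pressed, %d extra handles', src, numel(varargin));

% Same shape as: set(start_button2, 'callback', {@funStart2, h1, h2, h3})
cb = {funDemo, 'pause_button2', 'reset_button2', 'save_button2'};

% What the GUI does on a click, emulated manually:
% fun(source, eventdata, extra arguments...)
msg = feval(cb{1}, 'start_button2', [], cb{2:end});
disp(msg);
```

Passing the other buttons' handles this way lets a callback enable or disable them, which is consistent with the 'Enable' and 'BackGroundColor' changes seen in main.m.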

Commands

The command that runs depends on the value of the mode variable:

Reset1

Upload1

Save1

Reset2

Upload2

Save2

Start1

Pause1

Start2

Pause2

Static

Dynamic
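The mode-driven dispatch can be sketched as follows. This is a simplified illustration, not the author's loop: the button presses are scripted in a cell array, only three of the modes listed above are handled, and the history variable is hypothetical. How the real callbacks share mode with main.m (for example through a global variable) is not shown in this post.

```matlab
% Sketch of a mode-dispatch loop in the style of main.m.
% ASSUMPTIONS: presses and history are hypothetical; in the real GUI the
% button callbacks set mode instead of a scripted list.
presses = {'Start2', '', 'Pause2', 'Reset2'};
mode = '';
history = {};

for k = 1:numel(presses)
    % Emulate a callback firing between loop iterations.
    if ~isempty(presses{k}), mode = presses{k}; end

    if strcmp(mode, 'Start2')
        mode = '';                      % consume the command, as main.m does
        history{end+1} = 'training';
    elseif strcmp(mode, 'Pause2')
        mode = '';
        history{end+1} = 'paused';
    elseif strcmp(mode, 'Reset2')
        mode = '';
        history{end+1} = 'reset';
    end
    % An empty mode falls through, so the loop simply idles.
end

disp(strjoin(history, ' -> '));
```

Clearing mode at the top of each branch means every button press triggers its action once, after which the loop idles until the next press.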

Start2 -> for j = j:max_trials

-> nnTrials

circularMotion -> rotate

nnFeedForward -> reshape

-> tanhActivation

nnSoftMaxSelection -> getBestAction

doActionCircle -> compute the angular velocity for Move forward/Turn left/Turn right

-> moveCarCircle -> rotate

-> moveSensor

checkCrash -> obstacle_polygon xxxxx

-> obstacle_circle xxxxx

-> obstacleCrash

-> sensorValues -> sensor_vertices xxxx

-> sensor_temp xxxxxx

nnGetReward

computeQEstimate -> nnFeedForward

nnBackPropagation

-> reshape

-> zeros

-> length

-> tanhActivation

-> tanhDerivative

gradientDescent
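Two steps of the call tree above, nnFeedForward (reshape, tanhActivation) and nnSoftMaxSelection, can be sketched in a few lines. This is a guess at the general technique, not the project's implementation: the layer sizes, the weight layout with the bias in the first column, the linear output layer, and the sample inputs are all assumptions made for illustration.

```matlab
% Sketch: feed-forward pass with a tanh hidden layer, then softmax
% (Boltzmann) action selection at temperature T.
% ASSUMPTIONS: sizes, weight layout, linear output and inputs are hypothetical.
input_layer_size  = 3;
hidden_layer_size = 4;
output_layer_size = 3;   % e.g. move forward / turn left / turn right
T = 0.5;                 % softmax temperature

% Flat parameter vector filled with fixed small values for repeatability.
n1 = hidden_layer_size * (input_layer_size + 1);
n2 = output_layer_size * (hidden_layer_size + 1);
nn_params = 0.1 * sin(1:(n1 + n2))';

% Unroll the flat vector into two weight matrices (bias in column 1).
Theta1 = reshape(nn_params(1:n1), hidden_layer_size, input_layer_size + 1);
Theta2 = reshape(nn_params(n1+1:end), output_layer_size, hidden_layer_size + 1);

% Forward pass; the outputs are read as one Q-value per action.
s  = [0.2; 0.8; 0.5];            % hypothetical sensor readings
a1 = [1; s];
a2 = [1; tanh(Theta1 * a1)];
Q  = Theta2 * a2;

% Softmax selection: a higher Q-value gives a higher probability,
% and a lower T makes the choice greedier.
z = (Q - max(Q)) / T;            % subtract the max for numerical stability
p = exp(z) / sum(exp(z));
action = find(rand() <= cumsum(p), 1);        % sample one action index
if isempty(action), action = numel(p); end    % guard against round-off
```

Because the selection is stochastic, repeated runs can return different actions for the same Q-values. As T approaches zero, the choice approaches the greedy action, which is presumably what getBestAction returns.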
