Deep reinforcement learning has been successfully applied to several visual-input tasks using model-free methods. In this paper, we propose a model-based approach that combines learning a DNN-based transition model with Monte Carlo tree search to solve a block-placing task in Minecraft. Our learned transition model predicts the next frame and the reward one step ahead given the last four frames of the agent's first-person-view image and the current action. A Monte Carlo tree search algorithm then uses this model to plan the best sequence of actions for the agent to perform. On the proposed task in Minecraft, our model-based approach reaches performance comparable to that of the Deep Q-Network, but learns faster and is therefore more sample efficient during training.




Keywords: Reinforcement Learning, Model-Based Reinforcement Learning, Deep Learning, Model Learning, Monte Carlo Tree Search


I would like to express my sincere gratitude to my supervisor Dr. Stefan Uhlich for his continuous support, patience, and immense knowledge that helped me a lot during this study. My thanks and appreciation also go to my colleague Anna Konobelkina for insightful comments on the paper as well as to Sony Europe Limited for providing the resources for this project.


1 Introduction


In deep reinforcement learning, visual-input tasks (i.e., tasks where the observation from the environment comes in the form of video or images) are often used to evaluate algorithms: Minecraft or various Atari games are quite challenging for agents to solve [Oh+16], [Mni+15]. When applied to these tasks, model-free reinforcement learning shows notably good results: e.g., a Deep Q-Network (DQN) agent approaches human-level game-playing performance [Mni+15], and the Asynchronous Advantage Actor-Critic algorithm outperforms known methods in half the training time [Mni+16]. These achievements, however, do not change the fact that model-free methods are generally considered "statistically less efficient" than model-based ones: model-free approaches do not employ information about the environment directly, whereas model-based solutions do [DN08].


Working with a known environment model has its benefits: changes of the environment state can be foreseen, and therefore planning becomes less complicated. At the same time, developing algorithms when no environment model is available at the start is more demanding yet promising: less training data is required than for model-free approaches, and agents can utilize planning algorithms. Research has been progressing in this direction: e.g., a model-based agent surpassed the DQN's results by using the Atari games' true state for modelling [Guo+14], and constructing transition models via video-frame prediction has been proposed [Oh+15].


These ideas paved the way for the following question: is it feasible to apply planning algorithms to a learned model of an environment that is only partially observable, such as in a Minecraft building task? To investigate this question, we developed a method that not only predicts future frames of a visual task but also estimates the rewards for the agent's actions. Our model-based approach merges model learning through deep-neural-network training with Monte Carlo tree search, and demonstrates results competitive with those of DQN when tested on a block-placing task in Minecraft.
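As an illustration of how planning over a learned model can work, the following is a minimal UCT-style Monte Carlo tree search sketch. It is not the paper's implementation: the model.predict(state, action) interface, the Node class, and all hyperparameters (number of simulations, search depth, discount factor, exploration constant) are assumptions made only for this example.

    import math
    import random

    # Minimal UCT-style planning sketch over a learned transition model.
    # `model.predict(state, action) -> (next_state, reward)` is an assumed
    # interface standing in for the learned DNN; it is not the paper's code.

    class Node:
        def __init__(self, state):
            self.state = state
            self.children = {}        # action -> Node
            self.visits = 0
            self.value = 0.0          # sum of returns backed up through this node

    def uct_select(node, c=1.0):
        """Pick the child action maximizing the UCB1 score."""
        def score(a):
            child = node.children[a]
            mean = child.value / (child.visits + 1e-8)
            bonus = c * math.sqrt(math.log(node.visits + 1) / (child.visits + 1e-8))
            return mean + bonus
        return max(node.children, key=score)

    def simulate(model, node, actions, depth, gamma=0.99):
        """One simulation: descend/expand the tree, back up the discounted return."""
        if depth == 0:
            return 0.0
        if len(node.children) < len(actions):           # expansion: try an untried action
            a = random.choice([a for a in actions if a not in node.children])
            next_state, reward = model.predict(node.state, a)
            node.children[a] = Node(next_state)
        else:                                           # selection: follow UCB1
            a = uct_select(node)
            next_state, reward = model.predict(node.state, a)
        ret = reward + gamma * simulate(model, node.children[a], actions, depth - 1, gamma)
        node.visits += 1
        node.value += ret
        return ret

    def plan(model, state, actions, n_sims=100, depth=10):
        """Run n_sims simulations from `state` and return the most-visited root action."""
        root = Node(state)
        for _ in range(n_sims):
            simulate(model, root, actions, depth)
        return max(root.children, key=lambda a: root.children[a].visits)

In a receding-horizon fashion, the agent would execute the action returned by plan(), observe the next frame, and re-plan from the resulting state.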


2 Block-Placing Task in Minecraft


To evaluate the performance of the suggested approach and to compare it with a model-free method, namely DQN, a block-placing task was designed: it uses the Malmo framework and is built inside the Minecraft game world [Joh+16].


At the beginning of the game, the agent is positioned next to the wall of the playing "room". There is a 5×5 playing field in the center of this room. The field is white, with each tile having a 0.1 probability of being colored at the start. Colored tiles indicate the locations where the agent should place a block. The goal of the game is to cover all the colored tiles with blocks in at most 30 actions. Five position-changing actions are allowed: moving forward by one tile, turning left or right by 90°, and moving sideways to the left or right by one tile. When the agent focuses on a tile, it can place a block with the sixth action. For each action, the agent receives feedback: every correctly placed block brings a +1 reward, an erroneously placed block causes a -1 punishment, and any action costs the agent a -0.04 penalty (this reward signal is introduced to stimulate the agent to solve the task in the minimum time required). To observe the environment, the agent is provided with a pre-processed (grayscaled and downsampled to 64×64) first-person-view image of the current state. The task is deterministic and discrete in its action and state space. The challenge lies in the partial observability of the environment, with already placed blocks further obscuring the agent's view. It is equally important to place blocks systematically so as not to obstruct the agent's pathway. An example of the task is depicted in Fig. 1 (left) and a short demonstration is available at https://youtu.be/AQlBaq34DpA.
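To make the reward signal concrete, the following is a small sketch of the task's scoring rules. The constants mirror the numbers above, but the action names and the reward_for helper are illustrative only and do not correspond to the Malmo API.

    # Illustrative sketch of the task's reward signal (not the Malmo API).
    # Action names and the reward_for helper are assumptions for this example.

    STEP_PENALTY  = -0.04   # every action costs the agent a small penalty
    CORRECT_BLOCK = +1.0    # block placed on a colored tile
    WRONG_BLOCK   = -1.0    # block placed on a white tile
    MAX_ACTIONS   = 30      # episode ends after at most 30 actions

    ACTIONS = ["forward", "turn_left", "turn_right",
               "strafe_left", "strafe_right", "place_block"]

    def reward_for(action, placed_on_colored_tile):
        """Return the reward for one action under the rules described above."""
        r = STEP_PENALTY
        if action == "place_block":
            r += CORRECT_BLOCK if placed_on_colored_tile else WRONG_BLOCK
        return r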


3 Model Learning


To learn the transition model, a deep convolutional neural network is used. The network takes the last four frames $s_{t-3}, \dots, s_t$ and the current action $a_t$ as input and predicts the next frame $s_{t+1}$ as well as the one-step reward.
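For illustration, a transition model of this kind could be sketched in PyTorch as below. The layer sizes, the way the action is injected, and the output heads are assumptions for this example rather than the architecture used in the paper; only the inputs (four stacked 64×64 grayscale frames plus the current action) and the outputs (predicted next frame and one-step reward) follow the description above.

    import torch
    import torch.nn as nn

    class TransitionModel(nn.Module):
        """Sketch of a DNN transition model (illustrative, not the paper's architecture).
        Input: last four 64x64 grayscale frames and a one-hot encoded action.
        Output: predicted next frame and predicted one-step reward."""

        def __init__(self, num_actions: int = 6):
            super().__init__()
            self.encoder = nn.Sequential(                     # 4 stacked frames -> features
                nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Flatten(),
            )
            feat = 64 * 6 * 6                                 # spatial size for 64x64 input
            self.fuse = nn.Linear(feat + num_actions, 1024)   # merge image features and action
            self.frame_head = nn.Linear(1024, 64 * 64)        # predicted next frame (flattened)
            self.reward_head = nn.Linear(1024, 1)             # predicted one-step reward

        def forward(self, frames, action_onehot):
            # frames: (B, 4, 64, 64), action_onehot: (B, num_actions)
            h = self.encoder(frames)
            h = torch.relu(self.fuse(torch.cat([h, action_onehot], dim=1)))
            next_frame = torch.sigmoid(self.frame_head(h)).view(-1, 1, 64, 64)
            reward = self.reward_head(h).squeeze(1)
            return next_frame, reward

Such a model could, for instance, be trained with a pixel-wise loss on the predicted frame plus a regression loss on the predicted reward; the exact training objective is not specified here and is left as an assumption.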
