
The DragonGame AI Environment

“Untitled Dragon Game”, or simply DragonGame, is a 2.5D platformer game in which the player must collect all of the gems in each level and reach the exit portal, making use of a jump-and-glide movement mechanic, and avoiding landing on lava tiles. DragonGame is inspired by the “Spyro the Dragon” game series from the original PlayStation. For this assignment, there are some changes to the game environment, indicated in pink font. In Assignment 2, actions may have non-deterministic outcomes!

To optimally solve a level, your AI agent must find a policy (a mapping from states to actions) which collects all gems and reaches the exit while incurring the minimum possible expected action cost.

Levels in DragonGame are composed of a 2D grid of tiles, where each tile contains a character representing the tile type. An example game level is shown in Figure 1.


Figure 1: Example game level of DragonGame, showing character-based and GUI visualiser representations


Assignment 2: DragonGame MDP

Key information:

• Due: 3pm, Wednesday 20 September
• This assignment assesses your skills in developing algorithms for solving MDPs.
• Assignment 2 contributes 20% to your final grade.
• This assignment consists of two parts: (1) programming and (2) a report.
• This is an individual assignment.


Game state representation

Each game state is represented as a character array giving the tiles and their positions on the board. In the visualiser and interactive sessions, tiles are drawn as graphical assets, whereas in the input file each tile is a single character.

Levels can contain the tile types described in Table 1.

Table 1: Table of tiles in DragonGame, their corresponding symbol and effect (visualiser symbols are graphical assets and are not reproduced here)

Solid (‘X’): The player cannot move into a Solid tile. Walk and jump actions are valid when the player is directly above a Solid tile.

Ladder (‘=’): The player can move through Ladder tiles. Walk, jump, glide and drop actions are all valid when the player is directly above a Ladder tile. When performing a walk action on top of a ladder tile, there is a small chance to fall downwards by 2 tiles (when the position 2 tiles below is non-solid).

Air (‘ ’): The player can move through Air tiles. Glide and drop actions are valid when the player is directly above an Air tile.

Lava (‘*’): Moving into a Lava tile results in Game Over, and moving directly above a Lava tile results in Game Over on the next turn (as there are no possible moves that can avoid falling into the Lava on the next step).

Super Jump (‘J’): When the player performs a ‘Jump’ action while on top of a Super Jump tile, the player will move upwards by between 2 and 5 tiles (random outcome; probabilities given in the input file). If blocked by a solid tile, the probabilities for moving to each of the blocked tiles roll over to the furthest non-blocked tile. For all other actions, Super Jump tiles behave as ‘Solid’ tiles.

Super Charge (‘C’): When the player performs a ‘Walk Left’ or ‘Walk Right’ action while on top of a Super Charge tile, the player will move (left or right) until on top of the last adjoining Super Charge tile (see example in Table 2), then move in the same direction by between 2 and 5 tiles (random outcome; probabilities given in the input file). If blocked by a solid tile, the probabilities for moving to each of the blocked tiles roll over to the furthest non-blocked tile. For all other actions, Super Charge tiles behave as ‘Solid’ tiles.

Gem (‘G’): Gems are collected (disappearing from view) when the player moves onto the tile containing the gem. The player must collect all gems in order to complete the level. Gem tiles behave as ‘Air’ tiles, and become ‘Air’ tiles after the gem is collected.

Exit (‘E’): Moving to the Exit tile after collecting all gems completes the level. Exit tiles behave as ‘Air’ tiles.

Player (‘P’): The player starts at the position in the input file where this tile occurs. The player always starts on an ‘Air’ tile. In the visualiser, the type of tile behind the player is shown in place of the space.
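As an illustration of the “roll over” rule for Super Jump (and Super Charge) tiles, the redistribution of blocked probabilities can be sketched as follows. This is a hedged sketch: the probability table and the helper name rollover_probs are hypothetical, and in the actual game the probabilities come from the input file and the logic lives in the support code.

```python
def rollover_probs(jump_probs, max_clear_height):
    """Redistribute probability mass for jump heights blocked by a
    solid tile onto the furthest non-blocked height.

    jump_probs: hypothetical dict mapping jump height -> probability.
    max_clear_height: furthest height reachable before hitting a solid tile.
    """
    result = {}
    for height, prob in jump_probs.items():
        # A blocked height "rolls over" to the furthest reachable height.
        reached = min(height, max_clear_height)
        result[reached] = result.get(reached, 0.0) + prob
    return result

# Example: a solid tile 4 tiles above the player caps the jump at 3 tiles,
# so the mass for heights 4 and 5 rolls over onto height 3.
capped = rollover_probs({2: 0.25, 3: 0.25, 4: 0.25, 5: 0.25}, 3)
```

With no obstruction the distribution is returned unchanged, since every height is already reachable.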


An example of performing the Walk Right action on a Super Charge tile is shown in Table 2:

Table 2: Example Walk Right action on a Super Charge tile

[P] >>> >>> >>> (P) (P) (P) (P)

‘[P]’ marks the current player position, ‘>>>’ marks tiles which are skipped over, and each ‘(P)’ marks a possible new position of the player.

Actions

At each time step, the player is prompted to select an action. Each action has an associated cost, representing the amount of energy used by performing that action. Each action also has requirements which must be satisfied by the current state in order for the action to be valid. The set of available actions, costs and requirements for each action are shown in Table 3.

Table 3: Table of available actions, costs and requirements

Walk Left (wl, cost 1.0): Move left by 1 position; if above a Ladder tile, random chance to move down by 2; if above a Super Charge tile, move a variable amount (see example in Table 2).

Walk Right (wr, cost 1.0): Move right by 1 position; if above a Ladder tile, random chance to move down by 2; if above a Super Charge tile, move a variable amount (see example in Table 2).

Jump (j, cost 2.0): Move up by 1 position; if above a Super Jump tile, move up by between 2 and 5 tiles (random outcome). Validity: the current player must be above a Solid or Ladder tile, and the new player position must not be a Solid tile.

Glide Left 1 (gl1, cost 0.7): Move left by between 0 and 2 tiles (random outcome) and down by 1.

Glide Left 2 (gl2, cost 1.0): Move left by between 1 and 3 tiles (random outcome) and down by 1.

Glide Left 3 (gl3, cost 1.2): Move left by between 2 and 4 tiles (random outcome) and down by 1.

Glide Right 1 (gr1, cost 0.7): Move right by between 0 and 2 tiles (random outcome) and down by 1.

Glide Right 2 (gr2, cost 1.0): Move right by between 1 and 3 tiles (random outcome) and down by 1.

Glide Right 3 (gr3, cost 1.2): Move right by between 2 and 4 tiles (random outcome) and down by 1.

Validity for all glide actions: the current player must be above a Ladder or Air tile, and all tiles in the axis-aligned rectangle enclosing both the current position and the new position must be non-solid (i.e. Air or Ladder tiles). See the example below.

Drop 1 (d1, cost 0.3): Move down by 1.

Drop 2 (d2, cost 0.4): Move down by 2.

Drop 3 (d3, cost 0.5): Move down by 3.

Validity for all drop actions: the current player must be above a Ladder or Air tile, and all cells in the line between the current position and the new position must be non-solid (i.e. Air or Ladder tiles).


Example of glide action validity requirements for GLIDE RIGHT 2 (‘gr2’):

Current Position  | Must be Non-Solid | Must be Non-Solid
Must be Non-Solid | Must be Non-Solid | New Position
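The rectangle check above can be sketched in Python as follows. This is a hedged sketch, not the support code: it assumes a hypothetical grid of single-character tiles indexed as grid[row][col] with row 0 at the top, treats ‘X’, ‘J’ and ‘C’ as solid per Table 1, and the exact handling of other tile types and of the random glide distances follows the support code.

```python
SOLID = {"X", "J", "C"}  # Super Jump and Super Charge behave as solid tiles

def glide_valid(grid, row, col, move_x):
    """Return True if the axis-aligned rectangle enclosing the current
    position (row, col) and the landing position (row + 1, col + move_x)
    contains only non-solid tiles and stays inside the grid."""
    new_row, new_col = row + 1, col + move_x
    if not (0 <= new_row < len(grid) and 0 <= new_col < len(grid[0])):
        return False
    lo, hi = min(col, new_col), max(col, new_col)
    # Check every cell in the two rows spanned by the glide.
    return all(grid[r][c] not in SOLID
               for r in (row, new_row)
               for c in range(lo, hi + 1))
```

For gr2 as pictured, move_x would be 2: the two rows and three columns between the current and new positions form exactly the 2x3 rectangle shown above.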


Interactive mode

A good way to gain an understanding of the game is to play it. You can play the game to get a feel for how it works by launching an interactive game session from the terminal with the following command:

$ python play_game.py <input_file>.txt

where <input_file>.txt is a valid testcase file from the support code, with its path given relative to the current directory, e.g. testcases/L1.txt.

In interactive mode, type the symbol for your chosen action and press enter to perform the action. Type ‘q’ and press enter to quit the game.

DragonGame as an MDP

In this assignment, you will write the components of a program to play DragonGame, with the objective of finding a high-quality solution to the problem using various sequential decision-making algorithms based on the Markov decision process (MDP) framework. This assignment will test your skills in defining an MDP for a practical problem and developing effective algorithms for large MDPs.

What is provided to you

We provide supporting code in Python, in the form of:

1. A class representing the DragonGame environment and a number of helper functions (in game_env.py); the constructor takes an input file (testcase) and converts it into a DragonGame map
2. A class representing the DragonGame game state (in game_state.py)
3. A graphical user interface for visualising the game state (in gui.py)
4. A solution file template (solution.py)
5. A tester (in tester.py)
6. Testcases to test and evaluate your solution (in ./testcases)

In solution.py, you are required to fill in the constructor and the following method stubs of the Solver class:

• vi_initialise()
• vi_is_converged()
• vi_iteration()
• vi_get_state_value()
• vi_select_action()
• pi_initialise()
• pi_is_converged()
• pi_iteration()
• pi_select_action()

You can add additional helper methods and classes (either in solution.py or in files you create) if you wish. To ensure your code is handled correctly by the autograder, you should avoid using any try-except blocks in your implementation of the above methods (as this can interfere with our time-out handling). Refer to the documentation in solution.py for more details.
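As a rough illustration of what a single value-iteration update involves, one synchronous Bellman backup over a toy MDP is sketched below. Everything here is hypothetical (the toy state space, the dictionary-based transition model and the name vi_sweep are invented for illustration); your actual implementation must follow the fixed solution.py interface and operate on DragonGame states.

```python
# Hypothetical transition model for a toy cost-minimisation MDP:
# TRANSITIONS[(state, action)] -> list of (probability, next_state, cost).
TRANSITIONS = {
    ("s0", "a"): [(0.8, "s1", 1.0), (0.2, "s0", 1.0)],
    ("s0", "b"): [(1.0, "s0", 0.5)],
    ("s1", "a"): [(1.0, "goal", 2.0)],
}
GAMMA = 0.9  # discount factor

def vi_sweep(values):
    """One synchronous Bellman backup over all non-terminal states."""
    new_values = dict(values)
    for state in ("s0", "s1"):
        q_values = [
            sum(p * (c + GAMMA * values[s2]) for p, s2, c in outcomes)
            for (s, a), outcomes in TRANSITIONS.items()
            if s == state
        ]
        new_values[state] = min(q_values)  # costs, so take the minimum
    return new_values

values = {"s0": 0.0, "s1": 0.0, "goal": 0.0}  # "goal" is terminal, value 0
for _ in range(100):
    values = vi_sweep(values)
```

A convergence check (the role of vi_is_converged) would typically compare successive value dictionaries against a small tolerance rather than running a fixed number of sweeps.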

If you are unable to solve certain testcases, you may specify which testcases to attempt in testcases_to_attempt.


Your assignment task

Your task is to develop planning algorithms for determining the optimal policy (mapping from states to actions) for the agent (i.e. the Dragon), and to write a report on your algorithms’ performance. You will be graded on both your submitted program (Part 1, 50%) and the report (Part 2, 50%). These percentages will be scaled to the 20% course weighting for this assessment item.

The provided support code formulates DragonGame as an MDP, and your task is to submit code implementing the following MDP algorithms:

1. Value Iteration (VI)
2. Policy Iteration (PI)
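For orientation, the overall shape of policy iteration (alternating policy evaluation and policy improvement until the policy is stable) can be sketched on a toy MDP. This is a hedged sketch under invented assumptions: the two-state cost-minimising MDP, the names policy_iteration, evaluate and q_value, and the use of iterative policy evaluation are all illustrative, not the required solution.py interface.

```python
GAMMA = 0.9
STATES = ["s0", "s1"]
ACTIONS = ["a", "b"]
# Hypothetical model: OUTCOMES[(state, action)] -> [(prob, next_state, cost)];
# "goal" is terminal with value 0.
OUTCOMES = {
    ("s0", "a"): [(1.0, "s1", 2.0)],
    ("s0", "b"): [(1.0, "s0", 0.5)],
    ("s1", "a"): [(1.0, "goal", 1.0)],
    ("s1", "b"): [(1.0, "s0", 0.5)],
}

def q_value(state, action, values):
    return sum(p * (c + GAMMA * values.get(s2, 0.0))
               for p, s2, c in OUTCOMES[(state, action)])

def evaluate(policy, sweeps=200):
    """Iterative policy evaluation (a linear solve would also work)."""
    values = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        values = {s: q_value(s, policy[s], values) for s in STATES}
    return values

def policy_iteration():
    policy = {s: "b" for s in STATES}  # arbitrary initial policy
    while True:
        values = evaluate(policy)
        improved = {s: min(ACTIONS, key=lambda a: q_value(s, a, values))
                    for s in STATES}
        if improved == policy:  # policy stable: it is optimal
            return policy, values
        policy = improved
```

Because each improvement step is greedy with respect to an exact evaluation, the policy improves monotonically and the loop terminates after finitely many iterations.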

Once you have implemented and tested the algorithms above, you are to complete the questions listed in the section “Part 2 — The report” and submit the report to Gradescope. More detail on what is required for the programming and report parts is given below.

Part 1 — The programming task

Your program will be graded using the Gradescope autograder, using testcases similar to those in the support code.

Interaction with the testcases and autograder

We now provide some details explaining how your code will interact with the testcases and the autograder (with special thanks to Nick Collins for his efforts making this work seamlessly).

Implement your solution using the supplied solution.py template file. You are required to fill in the constructor and the method stubs of the Solver class listed in the section above.


Grading rubric for the programming component (total marks: 50/100)

For marking, we will use 5 different testcases to evaluate your solution. Each testcase is scored out of 10.0 marks, in the following categories:

a) Completion/reaching goal (0.5 pts, pass/fail)
b) Number of iterations vs target (up to 1 pt, with partial marks)
c) Average time per iteration vs target (up to 1 pt, with partial marks) [disqualified if the max time per iteration is much greater than the mean]
d) Convergence of values/policy (1 pt, pass/fail)
e) Values: distance from reference solution values (up to 1 pt, with partial marks) [assessed for VI only]
f) Reward vs target (up to 1 pt, with partial marks)

This totals 5.5 marks per testcase for VI and 4.5 marks per testcase for PI, giving 10 marks per testcase and 50 marks in total over 5 testcases.

Part 2 — The report

The report tests your understanding of MDP algorithms and the methods you have used in your code, and contributes 50/100 of your assignment mark.

Question 1. MDP problem definition (18 marks)

a) Define the State space, Action space, Transition function, and Reward function components of the DragonGame MDP planning agent, as well as where these are represented in your code. (10 marks)

b) Describe the purpose of a discount factor in MDPs. (3 marks)

c) State and briefly justify the following dimensions of complexity of your agent in the DragonGame MDP (see https://artint.info/3e/html/ArtInt3e.Ch1.S5.html for definitions). (5 marks)

• Planning Horizon
• Sensing Uncertainty
• Effect Uncertainty
• Computational Limits
• Learning

Question 2. Comparison of algorithms (17 marks)

This question requires a comparison of your implementations of value iteration (VI) and policy iteration (PI). If you did not implement PI, you may receive partial marks for this question by providing insightful, relevant comments on your implementation of VI. For example, if you tried both standard VI and asynchronous VI, you may compare these two approaches for partial marks.

a) Describe your implementations of Value Iteration and Policy Iteration in one sentence each. Include details such as whether you used asynchronous updates, and how you handled policy evaluation in PI. (2 marks)

b) Pick three representative testcases to compare the performance of VI and PI, reporting the numerical values for the following performance measures: (6 marks)


• Time to converge to the solution.
• Number of iterations to converge to the solution.

c) Discuss the difference between the numbers you found for VI and PI. Explain and provide reasons for why the differences either make sense or do not make sense. (9 marks)

Question 3. Investigating optimal policy variation (15 marks)

One consideration in the solution of a Markov decision process (i.e. the optimal policy) is the trade-off between a risky higher reward and a lower-risk, lower reward, which depends on the probabilities of the non-deterministic dynamics of the environment and the rewards associated with certain states and actions.

Consider testcase L4.txt, and explore how the policy of the agent changes with ladder_fall_prob and game_over_penalty. The agent must cross from the bottom left (to collect a gem) to the bottom right (to reach the exit). There are 3 possible paths from left to right (bottom, middle and top). Because of the chance to fall when performing a walk action while on top of a ladder, there is a risk of falling into the lava in the centre.

Figure 2: Testcase L4.txt

• The bottom path has the lowest cost (fewest jumps needed) and highest risk, as the lava tiles above prevent using jump+glide (which is more expensive than walking but has no fall risk).
• The middle path also has lava tiles above preventing jump+glide, but if the agent falls once, it can glide and land in a safe part of the bottom path for a second chance at making it across.
• The top path has enough headroom to allow jump+glide (which eliminates the risk of falling), but requires a large number of jumps to reach, so is expensive.

The level of risk presented by the lower paths can be tuned by adjusting ladder_fall_prob and game_over_penalty.

If you did not implement PI, you may change the solver type to VI in order to answer this question.

a) Describe how you expect the optimal path to change as the ladder_fall_prob and game_over_penalty values change. Use facts about the algorithms or the Bellman optimality equation to justify why you expect these changes to have such effects. (7.5 marks)
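For reference, one standard statement of the Bellman optimality equation, written here in cost-minimisation form to match DragonGame's action costs (the symbols are generic and not tied to the support code), is:

```latex
V^*(s) \;=\; \min_{a \in A(s)} \Big[\, c(s,a) \;+\; \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \,\Big]
```

Increasing game_over_penalty raises the cost contribution of transitions into game-over states, while increasing ladder_fall_prob shifts probability mass P(s' | s, a) toward those transitions; either change can flip which action attains the minimum, and hence which path is optimal.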

b) Picking three suitable values for ladder_fall_prob and three suitable values for game_over_penalty, explore how the optimal policy changes over the 9 combinations of these factors. You should present the results in a table, indicating whether the agent's optimal policy is to traverse the top, middle or bottom path, or something else, using colours to denote the optimal behaviour for each combination. Do the experimental results align with what you thought should happen? If not, why not? (7.5 marks)


Academic Misconduct

It is the responsibility of the student to ensure that you understand what constitutes academic misconduct and to ensure that you do not break the rules. If you are unclear about what is required, please ask.

It is also the responsibility of the student to take reasonable precautions to guard against unauthorised access by others to your work, however it is stored and in whatever format, both before and after assessment.

In the coding part of assignments, you are allowed to draw on publicly accessible resources and provided tutorial solutions, but you must make reference or attribution to their source, by doing the following:

• All blocks of code that you take from public sources must be referenced in adjacent comments in your code or at the top of your code.
• Please also include a list of references indicating the code you have drawn on in your solution.py docstring.

If you have utilised generative AI tools such as ChatGPT, you must cite the tool and version in your code, as well as describe its use in the report. A failure to reference generative AI use may constitute student misconduct under the Student Code of Conduct.

However, you must not show your code to, or share your code with, any other student under any circumstances. You must not post your code to public discussion forums (including Ed Discussion) or save your code in publicly accessible repositories (check your security settings). You must not look at or copy code from any other student.

All submitted files (code and report) will be subject to electronic plagiarism detection, and misconduct proceedings will be instituted against students where plagiarism or collusion is suspected. The electronic plagiarism detection can detect similarities in code structure even if comments, variable names, formatting etc. are modified. If you collude to develop your code or answer your report questions, you will be caught.

Late submission

Students should not leave assignment preparation until the last minute and must plan their workloads to meet advertised or notified deadlines. It is your responsibility to manage your time effectively.

It may take the autograder up to an hour to grade your submission. It is your responsibility to ensure you are uploading your code early enough, and often enough, that you are able to resolve any issues that may be revealed by the autograder before the deadline. Submitting non-functional code just before the deadline, and not allowing enough time to update your code in response to autograder feedback, is not considered a valid reason to submit late without penalty.

Assessment submissions received after the due time (or any approved extended deadline) will be subject to a 100% late penalty. A one-hour grace period will be applied to the due time, after which the 100% late penalty will be imposed. This grace period is designed to deal with issues that might arise during submission (e.g. delays with Blackboard or Turnitin) and should not be considered a shift of the due time. Please keep a record of your submission time.
