Computer Architecture

2024 Spring

Final Project Part 2

Overview

Tutorial

● Gem5 Introduction

● Environment Setup

Projects

● Part 1 (5%)

○ Write a C++ program to analyze the specification of the L1 data cache.

● Part 2 (5%)

○ Given the hardware specifications, try to get the best performance for a more complicated program.

Project 2

Description

In this project, we will use a computer system with a two-level cache. Your task is to write a ViT (Vision Transformer) in C++ and optimize it. You can find more details of the system specification on the next page.

System Specifications

● ISA: X86

● CPU: TimingSimpleCPU (no pipeline, CPU stalls on every memory request)

● Caches

○ L1 I cache and L1 D cache connect to the same L2 cache

● Memory size: 8192MB

|          | I cache size | I cache associativity | D cache size | D cache associativity | Policy | Block size |
| L1 cache | 16KB         | 8                     | 16KB         | 4                     | LRU    | 32B        |
| L2 cache | –            | –                     | 1MB          | 16                    | LRU    | 32B        |
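The handout does not prescribe any particular optimization technique, but the cache parameters above (32B blocks, 16KB 4-way L1 D cache) suggest why data layout and loop structure matter for the matrix multiplications that dominate the ViT introduced below. The following is only an illustrative sketch of loop tiling; the function name, signature, and tile sizes are assumptions, not part of the project interface.

```cpp
// Illustrative loop-tiling sketch (not prescribed by the handout).
// out = X * W^T + b with X:[T*C], W:[OC*C], b:[OC], out:[T*OC], all row-major.
// Tiling the o and c loops keeps an 8x64-float tile of W (about 2KB) hot in L1.
void linear_tiled(const float* X, const float* W, const float* b,
                  float* out, int T, int C, int OC) {
    const int TO = 8;    // output-feature tile (assumed; tune against the 16KB L1D)
    const int TC = 64;   // reduction tile (assumed; 64 floats = 8 cache blocks)

    for (int t = 0; t < T; ++t)                  // initialize with the bias
        for (int o = 0; o < OC; ++o)
            out[t * OC + o] = b[o];

    for (int oo = 0; oo < OC; oo += TO)
        for (int cc = 0; cc < C; cc += TC)
            for (int t = 0; t < T; ++t)
                for (int o = oo; o < oo + TO && o < OC; ++o) {
                    float acc = 0.0f;
                    for (int c = cc; c < cc + TC && c < C; ++c)
                        acc += X[t * C + c] * W[o * C + c];
                    out[t * OC + o] += acc;      // accumulate this tile's partial sum
                }
}
```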

ViT(Vision Transformer) – Transformer Overview

● A basic transformer block consists of
○ Layer Normalization
○ MultiHead Self-Attention (MHSA)
○ Feed Forward Network (FFN)
○ Residual connection (Add)
● You only need to focus on how to implement the function in the red box
● If you only want to complete the project instead of understanding the full ViT algorithm, you can skip the sections marked in red

ViT(Vision Transformer) – Image Pre-processing

● Normalize, resize to (300,300,3) and center crop to (224,224,3)

ViT(Vision Transformer) – Patch Encoder

● In this project, we use Conv2D as the Patch Encoder with kernel_size = (16,16), stride = (16,16) and output_channel = 768
● (224,224,3) -> (14,14,16*16*3) -> (196,768)

ViT(Vision Transformer) – Class Token

● Now we have 196 tokens and each token has 768 features
● In order to record global information, we need to concatenate one learnable class token with the 196 tokens
● (196,768) -> (197,768)

ViT(Vision Transformer) – Position Embedding

● Add the learnable position information to the patch embedding
● (197,768) + position_embedding(197,768) -> (197,768)

ViT(Vision Transformer) – Layer Normalization

T: # of tokens, C: embedded dimension
● Normalize each token
● You need to normalize with the formula (a sketch of the assumed formula is given below)
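The formula itself was shown as an image on the slide and is not reproduced above, so the sketch below assumes the standard LayerNorm: per-token mean and variance over the C features, followed by a learnable per-feature weight and bias. The epsilon value and the function signature are assumptions; the flat [T*C] layout follows the Shape-of-array slide later in this handout.

```cpp
#include <cmath>

// Minimal LayerNorm sketch (assumed formula): for each token t,
//   y[t][c] = (x[t][c] - mean_t) / sqrt(var_t + eps) * weight[c] + bias[c]
// x, y are flat [T*C] arrays (token-major), weight/bias are [C].
void layernorm(const float* x, const float* weight, const float* bias,
               float* y, int T, int C, float eps = 1e-5f) {
    for (int t = 0; t < T; ++t) {
        const float* xt = x + t * C;
        float* yt = y + t * C;

        float mean = 0.0f;                       // mean over the C features of this token
        for (int c = 0; c < C; ++c) mean += xt[c];
        mean /= C;

        float var = 0.0f;                        // variance over the C features
        for (int c = 0; c < C; ++c) {
            float d = xt[c] - mean;
            var += d * d;
        }
        var /= C;

        float inv_std = 1.0f / std::sqrt(var + eps);
        for (int c = 0; c < C; ++c)
            yt[c] = (xt[c] - mean) * inv_std * weight[c] + bias[c];
    }
}
```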

ViT(Vision Transformer) – MultiHead Self Attention (1)

● Wk, Wq, Wv ∈ R^(C×C)
● bq, bk, bv ∈ R^C
● Wo ∈ R^(C×C)
● bo ∈ R^C

[Figure: X -> Input Linear Projection (Wk, Wq, Wv, bq, bk, bv) -> split into heads -> Attention -> merge heads -> Output Linear Projection (Wo, bo) -> Y]

ViT(Vision Transformer) – MultiHead Self Attention (2)

T: # of tokens, C: embedded dimension, H: hidden dimension, NH: # of heads, C = H * NH

Linear Projection and split into heads
● Get Q, K, V ∈ R^(T×(NH*H)) after the input linear projection (a C++ sketch of one projection is given below):
○ Q = X·Wq^T + bq
○ K = X·Wk^T + bk
○ V = X·Wv^T + bv
● Split Q, K, V into Q1, Q2, Q3, ..., QNH, K1, K2, K3, ..., KNH, V1, V2, V3, ..., VNH ∈ R^(T×H)
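As a concrete reading of the projection equations above, here is a minimal sketch of one linear projection (for example Q = X·Wq^T + bq) over flat row-major arrays. The function name and the assumption that W is stored as [OC*C] (one row per output feature) are illustrative, not the project's required interface.

```cpp
// Illustrative linear-projection sketch: out = X * W^T + b.
// X is [T*C] (token-major), W is [OC*C] (row = one output feature),
// b is [OC], out is [T*OC].
void linear(const float* X, const float* W, const float* b,
            float* out, int T, int C, int OC) {
    for (int t = 0; t < T; ++t) {
        for (int o = 0; o < OC; ++o) {
            float acc = b[o];                    // start from the bias
            for (int c = 0; c < C; ++c)
                acc += X[t * C + c] * W[o * C + c];
            out[t * OC + o] = acc;
        }
    }
}
```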

ViT(Vision Transformer) – MultiHead Self Attention (2)

● For each head i, compute Si = Qi·Ki^T / sqrt(H) ∈ R^(T×T)
● Pi = Softmax(Si) ∈ R^(T×T), Softmax is a row-wise function
● Oi = Pi·Vi ∈ R^(T×H) (a per-head C++ sketch follows below)

[Figure: Qi, Ki -> matrix multiplication and scale -> Softmax -> matrix multiplication with Vi -> Oi]
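A minimal per-head sketch of the three steps above: scaled Q·K^T, a row-wise softmax, then multiplication by V. The buffer names and the temporary T×T score matrix are illustrative choices, not requirements.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// One attention head: S = Q·K^T / sqrt(H), P = row-wise Softmax(S), O = P·V.
// Qh, Kh, Vh, Oh are the [T*H] slices of a single head.
void attention_head(const float* Qh, const float* Kh, const float* Vh,
                    float* Oh, int T, int H) {
    std::vector<float> S(static_cast<size_t>(T) * T);
    const float scale = 1.0f / std::sqrt(static_cast<float>(H));

    // S = Q·K^T / sqrt(H)
    for (int i = 0; i < T; ++i)
        for (int j = 0; j < T; ++j) {
            float acc = 0.0f;
            for (int h = 0; h < H; ++h)
                acc += Qh[i * H + h] * Kh[j * H + h];
            S[i * T + j] = acc * scale;
        }

    // P = Softmax(S), row by row (subtract the row max for numerical stability)
    for (int i = 0; i < T; ++i) {
        float row_max = S[i * T];
        for (int j = 1; j < T; ++j) row_max = std::max(row_max, S[i * T + j]);
        float sum = 0.0f;
        for (int j = 0; j < T; ++j) {
            S[i * T + j] = std::exp(S[i * T + j] - row_max);
            sum += S[i * T + j];
        }
        for (int j = 0; j < T; ++j) S[i * T + j] /= sum;
    }

    // O = P·V
    for (int i = 0; i < T; ++i)
        for (int h = 0; h < H; ++h) {
            float acc = 0.0f;
            for (int j = 0; j < T; ++j)
                acc += S[i * T + j] * Vh[j * H + h];
            Oh[i * H + h] = acc;
        }
}
```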

ViT(Vision Transformer) – MultiHead Self Attention (3)

T: # of tokens, C: embedded dimension, H: hidden dimension, NH: # of heads

Merge heads and Linear Projection
● Oi ∈ R^(T×H); merge the heads: O = [O1, O2, ..., ONH]
● Output Linear Projection: output = O·Wo^T + bo

ViT(Vision Transformer) – Feed Forward Network

T: # of tokens, C: embedded dimension, OC: hidden dimension
● The FFN is an input Linear Projection, a GeLU activation, and an output Linear Projection (a C++ sketch is given below)

[Figure: input (T,C) -> Input Linear Projection -> (T,OC) -> GeLU -> Output Linear Projection -> (T,C)]
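A compositional sketch of the FFN described above, reusing the earlier illustrative linear() sketch and the gelu() sketch given under the GeLU slide below. The helper names and signatures are assumptions, not the project's required interface.

```cpp
#include <vector>

// Declarations of the illustrative helpers sketched elsewhere in this handout.
void linear(const float* X, const float* W, const float* b,
            float* out, int T, int C, int OC);
void gelu(const float* x, float* y, int n);

// FFN sketch: out = Linear2(GeLU(Linear1(x))), with hidden width OC.
void feedforward(const float* x,
                 const float* W1, const float* b1,   // [OC*C], [OC]
                 const float* W2, const float* b2,   // [C*OC], [C]
                 float* out, int T, int C, int OC) {
    std::vector<float> hidden(static_cast<size_t>(T) * OC);
    linear(x, W1, b1, hidden.data(), T, C, OC);   // (T,C) -> (T,OC)
    gelu(hidden.data(), hidden.data(), T * OC);   // elementwise GeLU
    linear(hidden.data(), W2, b2, out, T, OC, C); // (T,OC) -> (T,C)
}
```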

ViT(Vision Transformer) – GeLU
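The GeLU slide's content is an image and is not reproduced here, so this sketch assumes the exact erf-based definition; some ViT implementations use the tanh approximation instead, so verify against gelu_tb.

```cpp
#include <cmath>

// GeLU sketch assuming the exact definition: gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2))).
// Applied elementwise over a flat buffer of n values.
void gelu(const float* x, float* y, int n) {
    const float inv_sqrt2 = 0.70710678f;   // 1 / sqrt(2)
    for (int i = 0; i < n; ++i)
        y[i] = 0.5f * x[i] * (1.0f + std::erf(x[i] * inv_sqrt2));
}
```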

ViT(Vision Transformer) – Classifier

● Contains a Linear layer to transform the 768 features to 200 classes
○ (197,768) -> (197,200)
● Only the first token (class token) is used
○ (197,200) -> (1,200)

ViT(Vision Transformer) – Work Flow

[Figure: work flow: Pre-processing -> Embedder -> Transformer x12 -> Classifier, together with Load_weight, m5_dump_init, m5_dump_stat, and Argmax steps; each transformer block is layernorm -> MHSA -> residual (+) -> layernorm -> FFN -> residual (+), where MHSA is built from matmul and attention and the FFN from matmul, gelu, and matmul; the example image is classified as "Black footed Albatross"]

● Testbenches:
○ $ make gelu_tb
○ $ make matmul_tb
○ $ make layernorm_tb
○ $ make MHSA_tb
○ $ make feedforward_tb
○ $ make transformer_tb
○ $ run_all.sh
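The residual ("+") steps in the block diagram above are simple elementwise additions over the flat [T*C] buffers; a minimal sketch with an illustrative signature:

```cpp
// Residual connection sketch: out[i] = x[i] + fx[i] over a flat [T*C] buffer,
// where fx is the output of the preceding MHSA or FFN stage.
void residual(const float* x, const float* fx, float* out, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = x[i] + fx[i];
}
```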

ViT(Vision Transformer) – Shape of array

● layernorm input/output: [T*C] (token 1 | token 2 | ... | token T, each token holds C values)
● MHSA input/output/o: [T*C]
● MHSA qkv: [T*3*C] (q token 1 | k token 1 | v token 1 | ... | q token T | k token T | v token T, each slice holds C values; an indexing sketch is given below)
● feedforward input/output: [T*C]
● feedforward gelu: [T*OC] (token 1 | token 2 | ... | token T, each token holds OC values)
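Given the qkv layout above ([T*3*C], with q, k, v stored back-to-back for each token), a small indexing sketch may help. The helper names are illustrative only, and the per-head offset assumes the split C = NH * H from the MHSA slides.

```cpp
// Index helpers for the flat qkv buffer: q, k, v of token t each occupy C floats,
// stored back-to-back, so token t starts at offset t * 3 * C.
inline float* q_of(float* qkv, int t, int C) { return qkv + (t * 3 + 0) * C; }
inline float* k_of(float* qkv, int t, int C) { return qkv + (t * 3 + 1) * C; }
inline float* v_of(float* qkv, int t, int C) { return qkv + (t * 3 + 2) * C; }

// Within one token's q (or k, v) slice, head h owns features [h * H, (h + 1) * H),
// since the C features are split into NH heads of H features each (C = NH * H).
inline float* q_head(float* qkv, int t, int h, int C, int H) {
    return q_of(qkv, t, C) + h * H;
}
```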

Common problem

● Segmentation fault
○ Ensure that you are not accessing a nonexistent memory address
○ Enter the command $ ulimit -s unlimited

All you have to do is

● Download TA's Gem5 image
○ docker pull yenzu/ca_final_part2:2024
● Understand the algorithms and write the C++ code in the ./layer folder
○ make clean
○ make <layer>_tb
○ ./<layer>_tb

All you have to do is

● Ensure the ViT successfully classifies the bird
○ python3 embedder.py --image_path images/Black_Footed_Albatross_0001_796111.jpg --embedder_path weights/embedder.pth --output_path embedded_image.bin
○ g++ -static main.cpp layer/*.cpp -o process
○ ./process
○ python3 run_model.py --input_path result.bin --output_path torch_pred.bin --model_path weights/model.pth
○ python3 classifier.py --prediction_path torch_pred.bin --classifier_path weights/classifier.pth
○ After running the above commands, you will get the top-5 prediction.
● Evaluate the performance of part of the ViT, namely layernorm + MHSA + residual
○ The simulation takes about 3.5 hours to finish
○ Check stat.txt

Grading Policy

● (50%) Verification
○ (10%) matmul_tb
○ (10%) layernorm_tb
○ (10%) gelu_tb
○ (10%) MHSA_tb
○ (10%) transformer_tb
● (50%) Performance
○ max(sigmoid((27.74 - student latency) / student latency) * 70, 50)
● You will get 0 performance points if your design is not verified.

Submission

● Please submit your code on E3 before 23:59 on June 20, 2024.
● Late submission is not allowed.
● Plagiarism is forbidden; otherwise you will get 0 points!!!
● Format
○ Code: please put your code in a folder named FP2_team<ID>_code and compress it into a zip file.

FP2_team<ID>_code folder

● You should include the following files
○ matmul.cpp
○ layernorm.cpp
○ gelu.cpp
○ attention.cpp
○ residual.cpp

