
Program Writing Service: C++ Design and Programming Help

Date: 2024-06-07 08:30

Computer Architecture

2024 Spring

Final Project Part 2

Overview

Tutorial

● Gem5 Introduction

● Environment Setup

Projects

● Part 1 (5%)

○ Write a C++ program to analyze the specification of the L1 data cache.

● Part 2 (5%)

○ Given the hardware specifications, try to get the best performance for a more complicated program.

Project 2

Description

In this project, we use a computer system with a two-level cache. Your task is to write a ViT (Vision Transformer) in C++ and optimize it. More details of the system specification are given in the next section.

System Specifications

● ISA: X86

● CPU: TimingSimpleCPU (no pipeline; the CPU stalls on every memory request)

● Caches

○ The L1 I cache and L1 D cache connect to the same L2 cache

● Memory size: 8192MB
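For the optimization work it helps to know each cache's geometry. The system uses a 16KB 8-way L1 I cache, a 16KB 4-way L1 D cache and a 1MB 16-way L2, all with 32B blocks. The sketch below computes the number of sets as size / (associativity × block size). It also computes the set an address maps to, assuming the conventional mapping set = (address / block size) mod sets. The struct and function names are hypothetical and are not part of the project code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical description of one cache level (not part of the project code).
struct CacheConfig {
    const char* name;
    std::uint64_t size_bytes;     // total capacity
    std::uint64_t associativity;  // ways per set
    std::uint64_t block_bytes;    // cache line size
};

// Number of sets = capacity / (ways * block size).
std::uint64_t num_sets(const CacheConfig& c) {
    assert(c.size_bytes % (c.associativity * c.block_bytes) == 0);
    return c.size_bytes / (c.associativity * c.block_bytes);
}

// Conventional mapping: drop the block-offset bits, then take the index modulo #sets.
std::uint64_t set_index(const CacheConfig& c, std::uint64_t addr) {
    return (addr / c.block_bytes) % num_sets(c);
}

// Print the geometry of the three caches in the specification.
void print_cache_geometry() {
    const CacheConfig caches[] = {
        {"L1 I", 16 * 1024, 8, 32},
        {"L1 D", 16 * 1024, 4, 32},
        {"L2", 1024 * 1024, 16, 32},
    };
    for (const CacheConfig& c : caches) {
        std::printf("%-4s: %llu sets, %llu floats per block\n", c.name,
                    (unsigned long long)num_sets(c),
                    (unsigned long long)(c.block_bytes / sizeof(float)));
    }
}
```

With these values, the L1 D cache has 128 sets, the L1 I cache has 64 and the L2 has 2048. Each 32B block holds 8 floats, and addresses 4096 bytes apart fall into the same L1 D set.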

| Level | I cache size | I cache associativity | D cache size | D cache associativity | Policy | Block size |
|---|---|---|---|---|---|---|
| L1 cache | 16KB | 8 | 16KB | 4 | LRU | 32B |
| L2 cache | – | – | 1MB | 16 | LRU | 32B |

ViT (Vision Transformer) – Transformer Overview

● A basic transformer block consists of

○ Layer Normalization

○ MultiHead Self-Attention (MHSA)

○ Feed Forward Network (FFN)

○ Residual connection (Add)

● You only need to focus on how to implement the functions in the red box

● If you only want to complete the project without understanding the full ViT algorithm, you can skip the sections marked in red

ViT (Vision Transformer) – Image Pre-processing

● Normalize, resize to (300,300,3), and center crop to (224,224,3)

ViT (Vision Transformer) – Patch Encoder

● In this project, we use a Conv2D layer as the patch encoder, with kernel_size = (16,16), stride = (16,16), and output_channel = 768

● (224,224,3) -> (14,14,16*16*3) -> (196,768)

ViT (Vision Transformer) – Class Token

● Now we have 196 tokens, and each token has 768 features

● To record global information, we need to concatenate one learnable class token with the 196 tokens

● (196,768) -> (197,768)

ViT (Vision Transformer) – Position Embedding

● Add the learnable position information to the patch embedding

● (197,768) + position_embedding(197,768) -> (197,768)

ViT (Vision Transformer) – Layer Normalization

Figure: token array of shape (# of tokens) × (embedded dimension)

● Normalize each token

● You need to normalize with the layer normalization formula
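The slide's normalization formula did not survive extraction. The standard LayerNorm, as in PyTorch's nn.LayerNorm, normalizes each token over its C features: y = (x − μ) / sqrt(σ² + ε) · γ + β, where μ and σ² are the per-token mean and biased variance. The sketch below applies it in place to a token-major [T*C] array. The signature, the gamma/beta argument names and the default ε = 1e-5 (nn.LayerNorm's default) are assumptions. Match them to the project's layernorm header and testbench, because the model may use a different ε.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical signature: normalizes each of the T tokens of a token-major
// [T*C] array in place, then applies the per-feature scale (gamma) and shift (beta).
void layernorm(std::vector<float>& x, int T, int C,
               const std::vector<float>& gamma, const std::vector<float>& beta,
               float eps = 1e-5f) {
    assert((int)x.size() == T * C);
    assert((int)gamma.size() == C && (int)beta.size() == C);
    for (int t = 0; t < T; ++t) {
        float* row = &x[std::size_t(t) * C];  // the C contiguous features of token t
        // Mean over the feature dimension.
        float mean = 0.0f;
        for (int c = 0; c < C; ++c) mean += row[c];
        mean /= C;
        // Biased variance (divide by C, not C - 1), as in nn.LayerNorm.
        float var = 0.0f;
        for (int c = 0; c < C; ++c) {
            const float d = row[c] - mean;
            var += d * d;
        }
        var /= C;
        const float inv_std = 1.0f / std::sqrt(var + eps);
        for (int c = 0; c < C; ++c)
            row[c] = (row[c] - mean) * inv_std * gamma[c] + beta[c];
    }
}
```

Each token is one contiguous row of 768 floats (3072 bytes), so all three passes over a token stay within a footprint that fits in the 16KB L1 D cache.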

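ViT (Vision Transformer) – MultiHead Self Attention (1)

● $W_q, W_k, W_v \in \mathbb{R}^{C \times C}$

● $b_q, b_k, b_v \in \mathbb{R}^{C}$

● $W_o \in \mathbb{R}^{C \times C}$

● $b_o \in \mathbb{R}^{C}$

Figure: Input → Linear Projection ($W_q, W_k, W_v$; $b_q, b_k, b_v$) → split into heads → Attention → merge heads → Output Linear Projection ($W_o$, $b_o$)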
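Both the input projection (Q, K, V) and the output projection ($W_o$, $b_o$) are linear layers of the form Y = XW + b, which is what matmul has to compute. The minimal sketch below assumes the following layouts:

- X is token-major [T*IC].
- W is stored row-major as [IC*OC].
- b has OC entries.

The actual weight layout in the project's files may differ; PyTorch's nn.Linear, for example, stores weights as [OC][IC]. The loop order t → i → o keeps the inner loop streaming through contiguous rows of W and Y, which matters on a CPU that stalls on every memory request.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Y[T x OC] = X[T x IC] * W[IC x OC] + b[OC], all row-major (assumed layout).
// Loop order (t, i, o): for each input element X[t][i], the inner loop walks
// one row of W and one row of Y sequentially, which is cache friendly.
void linear(const std::vector<float>& X, const std::vector<float>& W,
            const std::vector<float>& b, std::vector<float>& Y,
            int T, int IC, int OC) {
    assert((int)X.size() == T * IC && (int)W.size() == IC * OC);
    assert((int)b.size() == OC);
    Y.assign(std::size_t(T) * OC, 0.0f);
    for (int t = 0; t < T; ++t) {
        float* y = &Y[std::size_t(t) * OC];
        for (int o = 0; o < OC; ++o) y[o] = b[o];  // start from the bias
        for (int i = 0; i < IC; ++i) {
            const float xi = X[std::size_t(t) * IC + i];
            const float* w = &W[std::size_t(i) * OC];  // row i of W
            for (int o = 0; o < OC; ++o) y[o] += xi * w[o];
        }
    }
}
```

Suppose $W_q$, $W_k$ and $W_v$ were concatenated column-wise into one C × 3C matrix. A single call with OC = 3C would then give each token's row as q, k, v back to back, which matches the [T*3*C] qkv shape described in the Shape of array slide.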

ViT (Vision Transformer) – MultiHead Self Attention (2)

Figure: linear projection and split into heads. T = # of tokens, C = embedded dimension, H = hidden dimension, NH = # of heads, and $C = H \times NH$

● Linear projection: $Q = XW_q + b_q$, $K = XW_k + b_k$, $V = XW_v + b_v$

● Get $Q, K, V \in \mathbb{R}^{T \times (NH \cdot H)}$ after the input linear projection

● Split Q, K, V into $Q_1, Q_2, \ldots, Q_{NH}$; $K_1, K_2, \ldots, K_{NH}$; $V_1, V_2, \ldots, V_{NH} \in \mathbb{R}^{T \times H}$

ViT (Vision Transformer) – MultiHead Self Attention (2)

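● For each head i, compute $S_i = Q_i K_i^{\top} / \sqrt{H} \in \mathbb{R}^{T \times T}$

● $P_i = \mathrm{Softmax}(S_i) \in \mathbb{R}^{T \times T}$, where Softmax is applied row-wise

● $O_i = P_i V_i \in \mathbb{R}^{T \times H}$

Figure: $Q_i$, $K_i$ → matrix multiplication and scale → Softmax → matrix multiplication with $V_i$ → $O_i$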
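Putting the three steps together, the sketch below computes $O_i = \mathrm{Softmax}(Q_i K_i^{\top} / \sqrt{H})\, V_i$ for every head. It assumes the [T*3*C] qkv layout described in the Shape of array slide, with the q, k and v of each token stored back to back. It also assumes that head i uses feature columns [i*H, (i+1)*H) of q, k and v. The function name and signature are hypothetical. Subtracting the row maximum before exp is a standard numerical-stability step and does not change the softmax result.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Multi-head scaled dot-product attention (hypothetical signature).
// qkv: [T*3*C], token-major, each token stored as q(C) k(C) v(C).
// out: [T*C]; head h writes columns [h*H, (h+1)*H) of each token (merged heads).
void attention(const std::vector<float>& qkv, std::vector<float>& out,
               int T, int C, int NH) {
    assert(C % NH == 0 && (int)qkv.size() == T * 3 * C);
    const int H = C / NH;
    const float scale = 1.0f / std::sqrt((float)H);
    out.assign(std::size_t(T) * C, 0.0f);
    std::vector<float> p(T);  // one row of P_i
    for (int h = 0; h < NH; ++h) {
        for (int t = 0; t < T; ++t) {
            const float* q = &qkv[std::size_t(t) * 3 * C + h * H];
            // S_i[t][s] = q_t . k_s / sqrt(H), tracking the row maximum.
            float row_max = -INFINITY;
            for (int s = 0; s < T; ++s) {
                const float* k = &qkv[std::size_t(s) * 3 * C + C + h * H];
                float dot = 0.0f;
                for (int d = 0; d < H; ++d) dot += q[d] * k[d];
                p[s] = dot * scale;
                if (p[s] > row_max) row_max = p[s];
            }
            // Row-wise softmax: exp(S - max), normalized by the row sum below.
            float sum = 0.0f;
            for (int s = 0; s < T; ++s) {
                p[s] = std::exp(p[s] - row_max);
                sum += p[s];
            }
            // O_i[t] = sum_s P_i[t][s] * v_s, written into this head's slice.
            float* o = &out[std::size_t(t) * C + h * H];
            for (int s = 0; s < T; ++s) {
                const float w = p[s] / sum;
                const float* v = &qkv[std::size_t(s) * 3 * C + 2 * C + h * H];
                for (int d = 0; d < H; ++d) o[d] += w * v[d];
            }
        }
    }
}
```

This straightforward loop order rereads every key and value slice of a head once per query token. That makes it a natural starting point for blocking or tiling toward the 16KB L1 D cache.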

ViT (Vision Transformer) – MultiHead Self Attention (3)

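Figure: merge heads and linear projection, from per-head arrays of shape (# of tokens) × (hidden dimension) to an array of shape (# of tokens) × (embedded dimension)

● $O_i \in \mathbb{R}^{T \times H}$, $O = [O_1, O_2, \ldots, O_{NH}]$

● Linear projection: $\text{output} = O W_o + b_o$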

ViT (Vision Transformer) – Feed Forward Network

● Input linear projection: (# of tokens) × (embedded dimension) -> (# of tokens) × (hidden dimension), i.e. [T*C] -> [T*OC]

● Apply GeLU to every element of the hidden array

● Output linear projection: (# of tokens) × (hidden dimension) -> (# of tokens) × (embedded dimension), i.e. [T*OC] -> [T*C]

Figure: Input → Linear Projection → GeLU → output Linear Projection
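A minimal sketch of the feed-forward network is shown below. It assumes token-major arrays, row-major weights W1 [C × OC] and W2 [OC × C] stored input-feature major, and the exact erf-based GeLU. These names and layouts are assumptions. The project's feedforward code may use a different signature or the tanh approximation of GeLU.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Exact GeLU: x * Phi(x), where Phi is the standard normal CDF.
static float gelu_exact(float x) {
    return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}

// Row-major Y[T x OC] = X[T x IC] * W[IC x OC] + b.
static void dense(const std::vector<float>& X, const std::vector<float>& W,
                  const std::vector<float>& b, std::vector<float>& Y,
                  int T, int IC, int OC) {
    Y.assign(std::size_t(T) * OC, 0.0f);
    for (int t = 0; t < T; ++t) {
        float* y = &Y[std::size_t(t) * OC];
        for (int o = 0; o < OC; ++o) y[o] = b[o];
        for (int i = 0; i < IC; ++i) {
            const float xi = X[std::size_t(t) * IC + i];
            for (int o = 0; o < OC; ++o) y[o] += xi * W[std::size_t(i) * OC + o];
        }
    }
}

// FFN: [T*C] -> linear -> [T*OC] -> GeLU -> linear -> [T*C].
void feedforward(const std::vector<float>& in, std::vector<float>& out,
                 const std::vector<float>& W1, const std::vector<float>& b1,
                 const std::vector<float>& W2, const std::vector<float>& b2,
                 int T, int C, int OC) {
    assert((int)in.size() == T * C);
    assert((int)W1.size() == C * OC && (int)W2.size() == OC * C);
    std::vector<float> hidden;  // the [T*OC] "feedforward gelu" buffer
    dense(in, W1, b1, hidden, T, C, OC);
    for (float& v : hidden) v = gelu_exact(v);
    dense(hidden, W2, b2, out, T, OC, C);
}
```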

ViT (Vision Transformer) – GeLU
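The GeLU formula on this slide was an image and did not survive extraction. For reference, the standard exact definition and its widely used tanh approximation are given below. PyTorch's nn.GELU uses the exact form by default (approximate='none'). Which form the gelu_tb testbench expects is not stated here, so it is worth comparing both against the reference output.

```latex
\mathrm{GeLU}(x) = x\,\Phi(x) = \frac{x}{2}\left(1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right)

\mathrm{GeLU}(x) \approx \frac{x}{2}\left(1 + \tanh\!\left(\sqrt{\frac{2}{\pi}}\left(x + 0.044715\,x^{3}\right)\right)\right)
```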

ViT (Vision Transformer) – Classifier

● Contains a linear layer that transforms the 768 features into 200 classes

○ (197, 768) -> (197, 200)

● Only the first token (the class token) is used

○ (197, 200) -> (1, 200)

ViT (Vision Transformer) – Work Flow

Figure: work flow

● Overall: Pre-processing → Embedder → Transformer ×12 → Classifier → Argmax → prediction (e.g. Black footed Albatross); the figure also marks the steps m5_dump_init, Load_weight, and m5_dump_stat

● Transformer block: layernorm → MHSA → residual → layernorm → FFN → residual

● MHSA: matmul → attention → matmul

● FFN: matmul → gelu → matmul

● Testbenches

○ $ make gelu_tb

○ $ make matmul_tb

○ $ make layernorm_tb

○ $ make MHSA_tb

○ $ make feedforward_tb

○ $ make transformer_tb

○ $ run_all.sh
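The transformer block adds each sub-layer's output back to its input through the residual connection. The sketch below shows element-wise residual addition. It also shows how a block could chain it around layernorm, MHSA and FFN, assuming the standard pre-norm ViT ordering x = x + MHSA(LN(x)) followed by x = x + FFN(LN(x)). To keep the sketch self-contained, the sub-layers are passed in as callables. All names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Tensor = std::vector<float>;  // token-major [T*C]
using SubLayer = std::function<Tensor(const Tensor&)>;

// Residual connection: x[i] += y[i] for every element (in place).
void residual(Tensor& x, const Tensor& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) x[i] += y[i];
}

// One pre-norm transformer block (standard ViT ordering, assumed here):
//   x = x + MHSA(LN1(x));  x = x + FFN(LN2(x))
void transformer_block(Tensor& x, const SubLayer& ln1, const SubLayer& mhsa,
                       const SubLayer& ln2, const SubLayer& ffn) {
    residual(x, mhsa(ln1(x)));
    residual(x, ffn(ln2(x)));
}
```

Because each sub-layer's input is fully read before residual writes into x, the in-place addition is safe in this ordering.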

ViT (Vision Transformer) – Shape of array

● layernorm input/output: [T*C], stored as token 1, token 2, …, token T

● MHSA input/output/o: [T*C]

● MHSA qkv: [T*3*C], stored as q token 1, k token 1, v token 1, …, q token T, k token T, v token T

● feedforward input/output: [T*C]

● feedforward gelu: [T*OC], stored as token 1, token 2, …, token T
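The layouts above fix where each value lives in the flat arrays. For token t and feature c, the offset is t*C + c in a [T*C] array and t*OC + c in the [T*OC] GeLU buffer. In the [T*3*C] qkv array, token t's q, k and v start at t*3*C, t*3*C + C and t*3*C + 2*C. The helpers below are hypothetical. The head split, where head h uses features h*H to (h+1)*H − 1, is an assumption consistent with C = H × NH; the slide does not state it.

```cpp
#include <cassert>
#include <cstddef>

// Offset into a token-major [T*C] array (pass OC as C for the [T*OC] buffer).
inline std::size_t idx_tc(int t, int c, int C) {
    return std::size_t(t) * C + c;
}

enum class QKV { Q = 0, K = 1, V = 2 };

// [T*3*C]: each token stores its q, k and v vectors back to back.
inline std::size_t idx_qkv(int t, QKV which, int c, int C) {
    return std::size_t(t) * 3 * C + static_cast<std::size_t>(which) * C + c;
}

// Feature d (0 <= d < H) of head h, assuming head h owns features [h*H, (h+1)*H).
inline std::size_t idx_qkv_head(int t, QKV which, int h, int d, int C, int NH) {
    assert(C % NH == 0);
    const int H = C / NH;
    assert(0 <= d && d < H && 0 <= h && h < NH);
    return idx_qkv(t, which, h * H + d, C);
}
```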

Common problem

● Segmentation fault

○ Ensure that you are not accessing a nonexistent memory address

○ Enter the command $ ulimit -s unlimited (this removes the stack size limit, which large local arrays can exceed)

All you have to do is

● Download the TA's Gem5 image

○ docker pull yenzu/ca_final_part2:2024

● Write the C++ code in the ./layer folder, based on your understanding of the algorithm

○ make clean

○ make <layer>_tb

○ ./<layer>_tb

All you have to do is

● Ensure that the ViT successfully classifies the bird

○ python3 embedder.py --image_path images/Black_Footed_Albatross_0001_796111.jpg --embedder_path weights/embedder.pth --output_path embedded_image.bin

○ g++ -static main.cpp layer/*.cpp -o process

○ ./process

○ python3 run_model.py --input_path result.bin --output_path torch_pred.bin --model_path weights/model.pth

○ python3 classifier.py --prediction_path torch_pred.bin --classifier_path weights/classifier.pth

○ After running the above commands, you will get the top-5 prediction.

● Evaluate the performance of part of the ViT, namely layernorm + MHSA + residual

○ The simulation takes about 3.5 hours to finish

○ Check stat.txt

Grading Policy

● (50%) Verification

○ (10%) matmul_tb

○ (10%) layernorm_tb

○ (10%) gelu_tb

○ (10%) MHSA_tb

○ (10%) transformer_tb

● (50%) Performance

○ min(sigmoid((27.74 - student latency) / student latency) * 70, 50); for example, a latency of exactly 27.74 gives sigmoid(0) * 70 = 35 points

● You will get 0 performance points if your design is not verified.

Submission

● Please submit your code on E3 before 23:59 on June 20, 2024.

● Late submissions are not allowed.

● Plagiarism is forbidden; otherwise you will get 0 points!

● Format

○ Code: put your code in a folder named FP2_team<ID>_code and compress it into a zip file.

FP2_team<ID>_code folder

● The folder should contain the following files

○ matmul.cpp

○ layernorm.cpp

○ gelu.cpp

○ attention.cpp

○ residual.cpp
