
Program Assignment #2

Due date: Nov. 16, 2021

Problem 1: Matrix-Matrix Multiplication

This first hands-on lab introduces a famous and widely used example application from the parallel programming field: matrix-matrix multiplication. You will complete key portions of a CUDA program that computes this widely applicable kernel.

In this lab you will learn:

‧ How to allocate and free memory on GPU.

‧ How to copy data from CPU to GPU.

‧ How to copy data from GPU to CPU.

‧ How to measure the execution times of memory access and computation separately.

‧ How to invoke GPU kernels.
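The minimal sketch below shows one way these steps fit together: device allocation, timed host-to-device copies, a timed kernel launch, the device-to-host copy, and cleanup. It assumes square WIDTH x WIDTH matrices stored in row-major order; identifiers such as matMulNaive, WIDTH, and BLOCK_SIZE are illustrative placeholders, not the assignment's provided skeleton.

    // Hedged sketch: naive matrix-matrix multiplication P = M * N for square
    // WIDTH x WIDTH matrices in row-major order. Names are placeholders.
    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    #define WIDTH 128          // assumed matrix dimension
    #define BLOCK_SIZE 16      // assumed threads per block dimension

    __global__ void matMulNaive(const float *M, const float *N, float *P, int width)
    {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < width && col < width) {
            float sum = 0.0f;
            for (int k = 0; k < width; ++k)
                sum += M[row * width + k] * N[k * width + col];
            P[row * width + col] = sum;
        }
    }

    int main()
    {
        size_t bytes = WIDTH * WIDTH * sizeof(float);

        // Allocate and initialize host matrices.
        float *hM = (float *)malloc(bytes);
        float *hN = (float *)malloc(bytes);
        float *hP = (float *)malloc(bytes);
        for (int i = 0; i < WIDTH * WIDTH; ++i) { hM[i] = 1.0f; hN[i] = 2.0f; }

        // Allocate device memory (learning point: allocate/free GPU memory).
        float *dM, *dN, *dP;
        cudaMalloc(&dM, bytes);
        cudaMalloc(&dN, bytes);
        cudaMalloc(&dP, bytes);

        // Events used to time memory transfers and computation separately.
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        float memMs = 0.0f, compMs = 0.0f, ms;

        // Copy input data from CPU to GPU, timed.
        cudaEventRecord(start);
        cudaMemcpy(dM, hM, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dN, hN, bytes, cudaMemcpyHostToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        cudaEventElapsedTime(&ms, start, stop);
        memMs += ms;

        // Kernel execution parameters and timed launch.
        dim3 block(BLOCK_SIZE, BLOCK_SIZE);
        dim3 grid((WIDTH + BLOCK_SIZE - 1) / BLOCK_SIZE,
                  (WIDTH + BLOCK_SIZE - 1) / BLOCK_SIZE);
        cudaEventRecord(start);
        matMulNaive<<<grid, block>>>(dM, dN, dP, WIDTH);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        cudaEventElapsedTime(&compMs, start, stop);

        // Copy the result from GPU back to CPU, timed.
        cudaEventRecord(start);
        cudaMemcpy(hP, dP, bytes, cudaMemcpyDeviceToHost);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        cudaEventElapsedTime(&ms, start, stop);
        memMs += ms;

        printf("GPU memory access time: %.3f ms\n", memMs);
        printf("GPU computation time  : %.3f ms\n", compMs);
        printf("GPU processing time   : %.3f ms\n", memMs + compMs);

        // Free device and host resources.
        cudaFree(dM); cudaFree(dN); cudaFree(dP);
        free(hM); free(hN); free(hP);
        cudaEventDestroy(start); cudaEventDestroy(stop);
        return 0;
    }

In this sketch the reported processing time is simply taken as the sum of the memory-access and computation times; your skeleton code may define these timings differently.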

Your output should look like this:

Input matrix file name:

Setup host side environment and launch kernel:

Allocate host memory for matrices M and N.

M:

N:

Allocate memory for the result on host side.

Initialize the input matrices.

Allocate device memory.

Copy host memory data to device.

Allocate device memory for results.

Setup kernel execution parameters.

# of threads in a block:

# of blocks in a grid :

Executing the kernel...

Copy result from device to host.

GPU memory access time:

GPU computation time :

GPU processing time :

Check results with those computed by CPU.

Computing reference solution.

CPU Processing time :

CPU checksum:

GPU checksum:
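As a hedged illustration of the CPU check shown in the output above (the names computeGold and checksum are assumptions for this sketch, not names required by the assignment), the reference result can be recomputed on the host and compared through a simple sum:

    // Hedged sketch of the CPU reference solution and checksum comparison.
    void computeGold(const float *M, const float *N, float *P, int width)
    {
        for (int row = 0; row < width; ++row)
            for (int col = 0; col < width; ++col) {
                float sum = 0.0f;
                for (int k = 0; k < width; ++k)
                    sum += M[row * width + k] * N[k * width + col];
                P[row * width + col] = sum;
            }
    }

    double checksum(const float *P, int width)
    {
        double s = 0.0;
        for (int i = 0; i < width * width; ++i)
            s += P[i];
        return s;
    }

    // Usage: compare checksum(hP_gpu, WIDTH) against checksum(hP_cpu, WIDTH);
    // the two should agree up to floating-point rounding.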

Record your runtime with respect to different input matrix sizes as follows:

Matrix Size    GPU Memory Access    GPU Computation    GPU Processing    Ratio of Computation Time
               Time (ms)            Time (ms)          Time (ms)         vs. matrix 128x128
8 x 8
128 x 128                                                                1
512 x 512
3072 x 3072
4096 x 4096

What do you see from these numbers?

Problem 2: Matrix-Matrix Multiplication with Tiling and Shared Memory

This lab is an enhanced matrix-matrix multiplication that uses shared memory and synchronization between the threads of a block. Device shared memory is allocated to hold the sub-matrix (tile) data used in the calculation, so the threads of a block share data through fast on-chip memory instead of the global memory bandwidth that was overtaxed in the previous matrix-matrix multiplication lab.

In this lab you will learn:

‧ How to apply tiling on matrix-matrix multiplication.

‧ How to use shared memory on the GPU.

‧ How to apply thread synchronization in a block.
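A minimal sketch of such a tiled kernel follows, assuming square width x width matrices in row-major order; TILE_WIDTH and matMulTiled are illustrative names rather than the assignment's provided code. Each block cooperatively stages one tile of M and one tile of N into shared memory, synchronizes, accumulates the partial products, and synchronizes again before loading the next tile.

    // Hedged sketch: tiled matrix-matrix multiplication using shared memory.
    #define TILE_WIDTH 16   // assumed tile (and block) dimension

    __global__ void matMulTiled(const float *M, const float *N, float *P, int width)
    {
        __shared__ float Ms[TILE_WIDTH][TILE_WIDTH];
        __shared__ float Ns[TILE_WIDTH][TILE_WIDTH];

        int row = blockIdx.y * TILE_WIDTH + threadIdx.y;
        int col = blockIdx.x * TILE_WIDTH + threadIdx.x;
        float sum = 0.0f;

        for (int t = 0; t < (width + TILE_WIDTH - 1) / TILE_WIDTH; ++t) {
            // Cooperative load of one tile of M and one tile of N into shared memory.
            int mCol = t * TILE_WIDTH + threadIdx.x;
            int nRow = t * TILE_WIDTH + threadIdx.y;
            Ms[threadIdx.y][threadIdx.x] =
                (row < width && mCol < width) ? M[row * width + mCol] : 0.0f;
            Ns[threadIdx.y][threadIdx.x] =
                (nRow < width && col < width) ? N[nRow * width + col] : 0.0f;
            __syncthreads();   // wait until the whole tile is loaded

            for (int k = 0; k < TILE_WIDTH; ++k)
                sum += Ms[threadIdx.y][k] * Ns[k][threadIdx.x];
            __syncthreads();   // wait before overwriting the tile in the next iteration
        }

        if (row < width && col < width)
            P[row * width + col] = sum;
    }

Launched with dim3 block(TILE_WIDTH, TILE_WIDTH) and a grid of ceil(width/TILE_WIDTH) x ceil(width/TILE_WIDTH) blocks, each element of M and N is read from global memory once per tile rather than once per thread, which is what relieves the bandwidth pressure mentioned above.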

Your output should look like this:

Input matrix file name:

Setup host side environment and launch kernel:

Allocate host memory for matrices M and N.

M:

N:

Allocate memory for the result on host side.

Initialize the input matrices.

Allocate device memory.

Copy host memory data to device.

Allocate device memory for results.

Setup kernel execution parameters.

# of threads in a block:

# of blocks in a grid :

Executing the kernel...

Copy result from device to host.

GPU memory access time:

GPU computation time :

GPU processing time :

Check results with those computed by CPU.

Computing reference solution.

CPU Processing time :

CPU checksum:

GPU checksum:

Record your runtime with respect to different input matrix sizes as follows:

Matrix Size    GPU Memory Access    GPU Computation    GPU Processing    Ratio of Computation Time
               Time (ms)            Time (ms)          Time (ms)         vs. matrix 128x128
8 x 8
128 x 128                                                                1
512 x 512
3072 x 3072
4096 x 4096

What do you see from these numbers? Have they improved significantly compared with the previous matrix-matrix multiplication implementation?

Problem 3: Matrix-Matrix Multiplication with Tiling and Constant Memory

This lab is an enhanced matrix-matrix multiplication that uses constant memory and synchronization between the threads of a block. Allocate constant memory for matrices M and N.
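One possible arrangement is sketched below (an assumption for illustration, not the assignment's skeleton): M and N are declared as __constant__ arrays and filled with cudaMemcpyToSymbol before the launch. Note that CUDA constant memory totals 64 KB, so holding both full matrices this way only fits small sizes; the larger sizes in the table would need the data staged in pieces or kept partly in global or shared memory.

    // Hedged sketch: inputs M and N placed in constant memory.
    #include <cuda_runtime.h>

    #define CWIDTH 64   // assumed dimension small enough to fit in constant memory

    __constant__ float cM[CWIDTH * CWIDTH];
    __constant__ float cN[CWIDTH * CWIDTH];

    __global__ void matMulConst(float *P, int width)
    {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < width && col < width) {
            float sum = 0.0f;
            for (int k = 0; k < width; ++k)
                sum += cM[row * width + k] * cN[k * width + col];
            P[row * width + col] = sum;
        }
    }

    // Host side: copy the input matrices into constant memory before launching.
    //   cudaMemcpyToSymbol(cM, hM, CWIDTH * CWIDTH * sizeof(float));
    //   cudaMemcpyToSymbol(cN, hN, CWIDTH * CWIDTH * sizeof(float));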

Record your runtime with respect to different input matrix sizes as follows:

Matrix Size    GPU Memory Access    GPU Computation    GPU Processing    Ratio of Computation Time
               Time (ms)            Time (ms)          Time (ms)         vs. matrix 128x128
8 x 8
128 x 128                                                                1
512 x 512
3072 x 3072
4096 x 4096

What do you see from these numbers? Have they improved significantly compared with the previous matrix-matrix multiplication implementation?

