Date: 2021-05-17 10:34

CISC372-Parallel

Project 6

Overview:

In the last project, you implemented an image filter program using pthreads. A box blur is a special case of that filter: every value in the filter matrix is set to 1, and the resulting sum is divided by the number of values in the matrix.

Box blur kernel:

$$K = \frac{1}{9}\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$$

Each output value is the sum of the 3x3 neighborhood divided by 9.

The problem we ran into in the last project is that on a high-resolution image a 3x3 filter does very little to change the appearance of the image. We would like a bigger filter (the radius here is 1; we might want a radius of 20 or 40), but a direct filter with radius r touches (2r+1)^2 values per pixel, 6,561 of them at a radius of 40, which makes the computation impractically slow.

A fast way to do this is to keep, for each row, a running sum of the last 2*radius+1 elements, then take the resulting image and do the same for each column. If we divide each of these sums by the width of the kernel (2*radius+1), we end up computing exactly what the filter computes (the average within a radius), with exactly one pass through the columns and one pass through the rows. Because each row and each column is now independent, we have a hope of parallelizing this algorithm.
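
The running-sum idea for a single row can be sketched in plain C as follows. This is a minimal illustration, not the code from the repository: the function name blur_row and its parameters are hypothetical, the samples are assumed to be floats, and samples that fall outside the row are assumed to count as 0 (the provided fastblur code may treat the edges differently).

```c
/* Blur one row of n floats (n >= 1) with a sliding-window running sum.
 * Hypothetical helper: samples outside the row are treated as 0. */
void blur_row(const float *src, float *dest, int n, int radius)
{
    float width = 2.0f * radius + 1.0f;   /* kernel width, 2*radius+1 */
    float sum = 0.0f;

    /* Prime the window for output index 0: it covers src[0..radius]. */
    for (int i = 0; i <= radius && i < n; i++)
        sum += src[i];
    dest[0] = sum / width;

    /* Slide the window one element at a time: add the sample that
     * enters on the right and subtract the one that leaves on the left. */
    for (int i = 1; i < n; i++) {
        int entering = i + radius;
        int leaving  = i - radius - 1;
        if (entering < n)
            sum += src[entering];
        if (leaving >= 0)
            sum -= src[leaving];
        dest[i] = sum / width;
    }
}
```

Each output element costs at most one addition, one subtraction, and one division regardless of the radius, which is why the approach stays cheap at a radius of 20 or 40. Applying the same routine down each column of the row-blurred image completes the two-pass blur.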

Project Details:

For this project, you may work alone or in pairs. You have until the final Friday (5/14) to complete this assignment. If you work in pairs, make sure the header of every file you generate contains the names of both people who worked on the project so that you both get credit. Both people should hand in the final project via Canvas. You may run this code anywhere you like (on PSC, on cisc372 using srun, or on your own machine configured for CUDA). You should hand in your final .cu file and any other files you produce.

Part 1: Fast Blur

You can retrieve my fast blur code, along with a sample image (gauss.jpg), from GitHub at:

gsilber/CISC372_HW6 (github.com)

Use the included makefile to build the program. You can run it as is by executing ./fastblur gauss.jpg 40, where 40 is the desired radius (this is a big image). You can play with different values of radius to see how it behaves. The effect of a given radius depends on the image resolution: at different resolutions the same radius covers a different percentage of the image, and thus has a different blurring effect.

Part 2: Simple CUDA

In this part of the project, you should modify the fastblur.c file to create cudablur.cu (CUDA code must have a .cu extension to work). You will need to change the makefile to use nvcc instead of gcc to compile for CUDA.

Rewrite the program so that each column runs in its own thread. I suggest a thread block size of 256. This means turning the computeColumn function into a kernel and computing the col parameter from the threadIdx, blockIdx, and blockDim variables.

Then you must sync up the threads with a call to cudaDeviceSynchronize and repeat the process for each row. Finally, convert the result back to a uint8_t array and save the image.
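
The final conversion step might look like the plain C sketch below. It assumes the intermediate buffer holds floats nominally in the range 0 to 255; the helper name to_bytes is hypothetical, and the provided code may convert differently (for example, without rounding or clamping).

```c
#include <stdint.h>

/* Convert n blurred float samples back to 8-bit channel values.
 * Hypothetical helper: assumes floats nominally in [0, 255]. */
void to_bytes(const float *src, uint8_t *dest, int n)
{
    for (int i = 0; i < n; i++) {
        float v = src[i] + 0.5f;      /* shift so truncation rounds to nearest */
        if (v < 0.0f)
            v = 0.0f;                 /* clamp values below the range */
        if (v > 255.0f)
            v = 255.0f;               /* clamp values above the range */
        dest[i] = (uint8_t)v;         /* truncate to an 8-bit value */
    }
}
```

The clamp guards against small floating-point overshoot so that a value slightly above 255 does not wrap around when it is narrowed to 8 bits.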

I suggest for this part you use cudaMallocManaged and cudaFree for all the arrays to simplify the code.

If you have a block size of 256, then you need (width+255)/256 blocks to cover the columns. Make sure your kernel function checks for unused threads, where the computed column >= pWidth. Do the same for the rows: use (height+255)/256 blocks and check the computed row against the height (row >= height). If the height or width is not divisible by the block size, there will be some extra threads that need to just return immediately.
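
The block-count arithmetic and the bounds check can be tried out on the host in plain C before writing any kernel. The sketch below only simulates the indexing: the function names are hypothetical, and global_index mirrors the value a CUDA thread would compute as blockIdx.x * blockDim.x + threadIdx.x.

```c
/* Number of blocks so that blocks * block_size >= n (ceiling division). */
int block_count(int n, int block_size)
{
    return (n + block_size - 1) / block_size;
}

/* Host-side stand-in for blockIdx.x * blockDim.x + threadIdx.x. */
int global_index(int block_idx, int block_dim, int thread_idx)
{
    return block_idx * block_dim + thread_idx;
}

/* Count the launched threads whose index falls outside [0, n)
 * and which would therefore have to return immediately. */
int idle_threads(int n, int block_size)
{
    int idle = 0;
    int blocks = block_count(n, block_size);
    for (int b = 0; b < blocks; b++)
        for (int t = 0; t < block_size; t++)
            if (global_index(b, block_size, t) >= n)
                idle++;
    return idle;
}
```

For a width of 1000 and a block size of 256, block_count gives 4 blocks (1024 threads), so 24 threads land past the last column. The comparison uses >= because valid indices run from 0 to n-1; a check with > would let the thread with index n through and read or write one element out of bounds.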

Part 3: More advanced CUDA

Part 2 is kind of slow. This is because of the managed memory. To speed it up, we want to allocate the memory we need on the device with cudaMalloc where possible, and copy the input data up to the device with cudaMemcpy for calculation. When the computation is complete, copy the result back to the host in order to save it to the output file. Play with the values for the block size to try to maximize performance. See how fast you can get the computation to run.

What to hand in:

Hand in your cudablur.cu file from Part 2 and from Part 3, along with makefiles for each and any other files you added that are required to build your program. Make sure your program compiles and runs, and note the system where you ran it in the comments to avoid any confusion.

Grading

This is a hard project. My intent is that most people will be able to do part 2, so part 2 is worth 75% of

the grade on this project. Part 3 is worth the remaining 25%.

