DSC 20 Mid-Quarter Project
Total Points: 100 (10% of the Course Grade)
Submission due (SD time):
● Checkpoint: Tuesday, November 24th, 11:59pm
● Final: Thursday, December 3rd, 11:59pm
Starter Files
Download midqtr_project.zip. Inside this archive, you will find the starter files for
this project.
Checkpoint Submission
You can earn 5 points extra credit by submitting the checkpoint by the due date
above. In the checkpoint submission, you should complete Part 1 and the first
two methods of Part 2 (negate, grayscale) and submit ONLY the
midqtr_project.py file to gradescope.
Checkpoint submission is graded by completion, which means you can get full
points if your code passes some simple sanity checks (no tests against edge
cases). Note that in your final submission, you should still submit these questions,
and you may modify your implementation if you notice any errors.
Final Submission
Submit ONLY the midqtr_project.py file to gradescope, under the Mid-Quarter
Project portal. You can submit multiple times before the due date, but only the final
submission will be graded.
If you are working with a partner, please have only one person submit your
group’s work. You can add your partner to your submission once you submit it.
Requirements
1. Do NOT import any packages in midqtr_project.py.
2. Follow the style guide on the course website.
3. You are not required to add new doctests this time. However, you should
still add docstrings for modules, classes and functions/methods.
4. You should add assert statements when the question explicitly requires them.
If assertions are not explicitly required, you can assume that the
arguments are valid.
Overview
In this project, we will discover the basics of image processing, a key area in the
computer science field. In part 1, we will define an abstraction of images in RGB
colorspace with a Python class. In part 2, we will implement multiple image
processing methods that you might have seen in your daily life. In part 3, we will
implement a K-nearest neighbour classifier to classify and predict the labels of
images.
In the digital world, images are defined as 3-dimensional matrices: height (row),
width (column), and channel (color). Each (row, col) entry is called a pixel. Height
and width dimensions are intuitive: they define the size of the image. The channel
dimension defines the color of an image.
The most commonly used color model is the RGB color model. In this model, every
color can be defined as a mixture of three primary color channels: Red, Green and
Blue. Thus, the color of a pixel is digitally defined as a triplet (R, G, B). Each
element in this triplet is an integer (called intensity) with value between 0 and 255
(both inclusive), where 0 represents no R/G/B is present and 255 means R/G/B is
fully present. Thus, (0, 0, 0) represents black since no R/G/B are present, and
(255, 255, 255) represents white since all these colors are fully present and mixed.
To better understand how the RGB color model works, you can play around with RGB
values using this online color wheel.
In our project, we will use a 3-dimensional list of integers to structure the pixels.
This picture (source) shows how a pixels list is structured.
The first dimension is the color channel. In other words, len(pixels) == 3, and each
element represents a color channel. Each color channel is a row by column matrix
that represents the intensities (0 - 255) of this color. Therefore, to index a specific
intensity value at channel c, row i and column j of the pixels list, you should use
pixels[c][i][j].
Note that the width of an image is the length of the column dimension (number of
columns), and the height of an image is the length of the row dimension (number of
rows). Since in Python we conventionally consider (row, column) as the order of
dimensions for 2-dimensional lists, make sure to distinguish these notions clearly.
We have also provided an example pixels list in midqtr_project_runner.py to
help you understand its structure.
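As a rough illustration of this layout, here is a tiny 2-row by 3-column image written as a pixels list. The intensity values and variable names are made up for this sketch and are not taken from the runner file.

```python
# A hypothetical 2 x 3 image; all intensity values are made up.
pixels = [
    [[255, 0, 10], [20, 30, 40]],   # channel 0 (red): 2 rows x 3 columns
    [[0, 255, 11], [21, 31, 41]],   # channel 1 (green)
    [[0, 0, 12], [22, 32, 42]],     # channel 2 (blue)
]

num_channels = len(pixels)    # always 3 for RGB
height = len(pixels[0])       # number of rows
width = len(pixels[0][0])     # number of columns

# Blue intensity at row 1, column 2: index as pixels[c][i][j].
blue_value = pixels[2][1][2]

# Full (R, G, B) color of the pixel at row 0, column 1.
color = tuple(pixels[c][0][1] for c in range(3))
```

Here height is 2 and width is 3, so the size in (rows, columns) order is (2, 3).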
Testing
For this project, we will not require you to write new doctests for any functions.
However, it is still for your benefit to test your implementation thoroughly.
Since this project aims to process images, it makes more sense for you to check the
output against real images instead of 3-dimensional lists. Therefore, we have
provided some helper functions for you to read and write actual images, so you can
manually evaluate if your implementation works correctly. The helper functions
(with examples of usage) are provided in the midqtr_project_runner.py. This file
serves as an entry point to your implementation. Feel free to add more functions
and/or test cases in this file. You do not need to submit this runner file to
gradescope.
As a refresher, here are the commands to test your code (with the runner):
● No options: >>> python3 midqtr_project_runner.py
● Interactive mode: >>> python3 -i midqtr_project_runner.py
● Doctest: >>> python3 -m doctest midqtr_project_runner.py
(For Windows users, please use py or python instead of python3.)
You might have noticed that this runner file used two packages: NumPy (np) and
OpenCV (cv2). They are the most common packages to use for image processing.
Although you cannot use these packages in your own implementation, you can use
these packages to help your testing. If you have not installed these packages
before, run the following command in your terminal (at any directory):
python3 -m pip install numpy opencv-python
(For Windows users, please use py or python instead of python3.)
Please let us know if you are having problems installing these packages, or check
out this post.
Part 1: RGB Image
In this part, you will implement the RGBImage class, a template for image objects
in RGB color spaces.
You need to implement the following methods:
__init__ (self, pixels)
A constructor that initializes an RGBImage instance and necessary instance
variables.
The argument pixels is a 3-dimensional matrix, where pixels[c][i][j]
indicates the intensity value in channel c at position (i, j). You must assign this
matrix to self.pixels.
You can assume that the index of the channel c is guaranteed to be 0 (red
channel), 1 (green channel) or 2 (blue channel). You can also assume that each
channel contains a valid (row × col) matrix.
size (self)
A getter method that returns the size of the image, where size is defined as a
tuple of (number of rows, number of columns).
get_pixels (self)
A getter method that returns a COPY of the pixels matrix of the image (as a
3-dimensional list). This matrix of pixels is exactly the same pixels passed to
the constructor.
copy (self)
A method that returns a COPY of the RGBImage instance. You should create a new
RGBImage instance with a copy of the pixels matrix and return this new
instance.
get_pixel (self, row, col)
A getter method that returns the color of the pixel at position (row, col). The
color should be returned as a 3-element tuple: (red intensity, green intensity,
blue intensity) at that position.
Requirement:
Assert that row and col are valid indices.
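Since get_pixels and copy must return a COPY, it may help to see why copying only the outer list is not enough for a nested list. The sketch below uses a hypothetical helper name (deep_copy_pixels) and made-up values; it is one possible way to rebuild every inner list without importing anything.

```python
# Hypothetical helper for illustration; not part of the starter files.
def deep_copy_pixels(pixels):
    """Return a new 3-dimensional list with the same values as pixels."""
    return [[row[:] for row in channel] for channel in pixels]


original = [[[1, 2]], [[3, 4]], [[5, 6]]]   # a made-up 1 x 2 image
shallow = original[:]                        # copies only the outer list
deep = deep_copy_pixels(original)            # rebuilds every inner list

original[0][0][0] = 99
# shallow still shares its inner lists with original, so it sees the change.
# deep has its own inner lists, so it keeps the old value.
```

After the assignment, shallow[0][0][0] is 99 while deep[0][0][0] is still 1.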
Part 2: Image Processing Methods
In this part, you will implement several image processing methods in the
ImageProcessing class.
Notes:
(1) All methods in this class have a decorator @staticmethod. This decorator
indicates that these functions do not belong to any instance, and should be
called on the class itself. For example, to use the function negate(), you
should call it like ImageProcessing.negate(image), instead of initializing an
ImageProcessing instance first.
(2) All methods in this class must return a new RGBImage instance with the
processed pixels matrix. After calling any of these methods, the original
image should not be modified.
Hint:
If you find processing 3-dimensional pixels difficult, try to approach these problems
in a 2-dimensional perspective (pick any color channel to work with first) and
broadcast your solution to all three channels. If you are stuck, try to write down the
(row, column) matrix before and after applying the function and derive a pattern.
You need to implement the following methods:
set_pixel (self, row, col, new_color)
A setter method that updates the color of the pixel at position (row, col) to the
new_color inplace. The argument new_color is a 3-element tuple (red intensity,
green intensity, blue intensity). However, if any color intensity in this tuple is
provided as -1, you should not update the intensity at the corresponding
channel.
Requirement:
Assert that row and col are valid indices.
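To make the -1 rule concrete, the sketch below merges an existing color with a new_color tuple. The function name merge_color is hypothetical, and it works on plain tuples rather than on an RGBImage instance.

```python
# Hypothetical helper for illustration; it only shows the -1 rule.
def merge_color(current, new_color):
    """Keep the current intensity wherever new_color holds -1."""
    return tuple(
        old if new == -1 else new
        for old, new in zip(current, new_color)
    )


merged = merge_color((10, 20, 30), (255, -1, 0))
```

In this made-up case the green intensity stays at 20, so merged is (255, 20, 0).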
@staticmethod negate (image)
A method that returns the negative image of the given image. To produce a
negative image, all pixel values must be inverted. Specifically, for each pixel with
current intensity value val, this method should update it with (255 - val).
Requirement:
No explicit for/while loops. You can use list comprehensions or map() instead.
Example (figures omitted): the original image next to the result of negate (image).
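Following the 2-dimensional hint, the inversion can be tried on a single channel first. This is a minimal sketch with a hypothetical helper name (negate_channel) and made-up values, not the required ImageProcessing method.

```python
# Hypothetical helper working on one (row x column) channel only.
def negate_channel(channel):
    """Return a new matrix where every value val becomes 255 - val."""
    return [[255 - val for val in row] for row in channel]


red = [[0, 100], [200, 255]]
negated = negate_channel(red)
```

The comprehension builds new lists, so red itself is left unchanged.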
Note: Checkpoint submission ends here.
@staticmethod grayscale (image)
A method that converts the given image to grayscale. For each pixel (R, G, B) in
the pixels matrix, calculate the average (R + G + B) / 3 and update all channels
with this average, i.e. (R, G, B) -> ((R + G + B) / 3, (R + G + B) / 3,
(R + G + B) / 3). Note that since intensity values must be integers, you should
use integer division (//).
Requirement:
No explicit for/while loops. You can use list comprehensions or map() instead.
Example (figures omitted): the original image next to the result of grayscale (image).
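One way to line up the three intensities of each pixel is to zip the channels row by row, then value by value. The sketch below assumes three separate channel matrices and uses a hypothetical helper name (average_channels); the values are made up.

```python
# Hypothetical helper: takes three (row x column) matrices of equal shape.
def average_channels(red, green, blue):
    """Return one matrix holding (R + G + B) // 3 for every position."""
    return [
        [(r + g + b) // 3 for r, g, b in zip(r_row, g_row, b_row)]
        for r_row, g_row, b_row in zip(red, green, blue)
    ]


red = [[255, 10]]
green = [[0, 20]]
blue = [[0, 31]]
gray = average_channels(red, green, blue)
```

For the second pixel, (10 + 20 + 31) // 3 is 20 because integer division drops the remainder.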
@staticmethod scale_channel (image, channel, scale)
A method that scales the given channel of the image by the given scale. The
channel argument will be one of 0 (R), 1 (G) or 2 (B). The scale argument is a
non-negative numeric value. For each intensity value val in the specified channel,
update it with int(val * scale). However, if the scaled value exceeds the
maximum pixel value 255, you need to cap the value to 255.
Requirement:
No explicit for/while loops. You can use list comprehensions or map() instead.
Example (figures omitted): the original image next to the results of
scale_channel (image, 2, 0.63) and scale_channel (image, 2, 2.25).
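The scale-then-cap rule can be checked on one channel in isolation. This sketch uses a hypothetical helper name (scale_values) and made-up values; the scale factors are chosen so that the products are easy to verify by hand.

```python
# Hypothetical helper working on one (row x column) channel only.
def scale_values(channel, scale):
    """Scale every value, truncate with int(), and cap the result at 255."""
    return [[min(255, int(val * scale)) for val in row] for row in channel]


blue = [[100, 200], [0, 255]]
halved = scale_values(blue, 0.5)     # 255 * 0.5 = 127.5, truncated to 127
boosted = scale_values(blue, 2.25)   # 200 * 2.25 = 450, capped at 255
```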
@staticmethod clear_channel (image, channel)
A method that clears the given channel of the image. By clearing a channel, you
need to update every intensity value in the specified channel to 0.
Requirement:
No explicit for/while loops. You can use list comprehensions or map() instead.
Example (figures omitted): the original image next to the results of
clear_channel (image, 0) and clear_channel (image, 2).
@staticmethod rotate_90 (image, clockwise)
A method that rotates the image by 90 degrees.
The argument clockwise is a boolean value: when it’s true, rotate the image
clockwise (right); otherwise (false), rotate the image counterclockwise (left).
Tip:
The built-in function zip() could be helpful. Try to approach this problem in a
matrix perspective: which matrix operations will help to achieve this purpose?
Requirement:
No explicit for/while loops. You can use list comprehensions or map() instead.
Example (figures omitted): the original image next to the results of
rotate_90 (image, True) and rotate_90 (image, False).
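To see how zip() relates to rotation, the sketch below rotates a single 2-dimensional matrix by combining a transpose with a reversal of the row order. The helper names are hypothetical, and this is only one of several ways to combine the two operations.

```python
# Hypothetical helpers working on one (row x column) matrix only.
def rotate_matrix_clockwise(matrix):
    """Reverse the row order, then transpose with zip()."""
    return [list(row) for row in zip(*matrix[::-1])]


def rotate_matrix_counterclockwise(matrix):
    """Transpose with zip(), then reverse the row order."""
    return [list(row) for row in zip(*matrix)][::-1]


matrix = [[1, 2, 3],
          [4, 5, 6]]
right = rotate_matrix_clockwise(matrix)
left = rotate_matrix_counterclockwise(matrix)
```

A 2 x 3 matrix becomes 3 x 2 in both directions: right is [[4, 1], [5, 2], [6, 3]] and left is [[3, 6], [2, 5], [1, 4]].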
@staticmethod crop (image, tl_row, tl_col, target_size)
A method that crops the image.
Arguments tl_row and tl_col specify the position of the top-left corner of the
cropped image. In other words, position (tl_row, tl_col) before cropping
becomes position (0, 0) after cropping.
The argument target_size specifies the size of the image after cropping. It is a
tuple of (number of rows, number of columns). However, when the specified
target_size is too large, the actual size of the cropped image might be smaller,
since the original image has no content in the overflowed rows and columns.
Tip:
When a target_size (n_rows, n_cols) is possible to achieve given the original size
of the image, tl_row, and tl_col, (tl_row + n_rows) and (tl_col + n_cols)
give you br_row and br_col, which is the position of the bottom-right corner of
the cropped image.
Requirement:
No explicit for/while loops. You can use list comprehensions or map() instead.
Example (figures omitted), for an original image with size = (190, 190):
● crop (image, 50, 75, (75, 50)): actual size = (75, 50); target_size does not
overflow.
● crop (image, 100, 50, (100, 150)): actual size = (90, 140); target_size
overflows both row and column, thus the actual size is smaller in both
dimensions.
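Python list slicing silently stops at the end of a list, which matches the overflow behavior described for target_size. The sketch below crops one 2-dimensional matrix; the helper name crop_matrix and the values are made up for illustration.

```python
# Hypothetical helper working on one (row x column) matrix only.
def crop_matrix(matrix, tl_row, tl_col, target_size):
    """Slice rows and columns; slices are clipped at the matrix edges."""
    n_rows, n_cols = target_size
    return [
        row[tl_col:tl_col + n_cols]
        for row in matrix[tl_row:tl_row + n_rows]
    ]


matrix = [[0, 1, 2, 3],
          [4, 5, 6, 7],
          [8, 9, 10, 11]]
fits = crop_matrix(matrix, 1, 1, (2, 2))        # target size is achievable
overflows = crop_matrix(matrix, 2, 2, (5, 5))   # target size is too large
```

The second call asks for 5 x 5 but only 1 row and 2 columns remain, so the result is [[10, 11]].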
@staticmethod chroma_key (chroma_image, background_image, color)
A method that performs the chroma key algorithm on the chroma_image by
replacing all pixels of the specified color in the chroma_image with the pixels at
the same places in the background_image. If the color is not present in the
chroma_image, this function won’t replace any pixel, but it will still return a copy.
You can assume that color is a valid (R, G, B) tuple.
Tip:
When testing this function, you can find pictures with a green or blue screen in
the background online, find your favorite pictures as background images, and use
this function to replace the background color with the background image you
choose. Make sure you crop them to the same size before applying this
function.
Requirement:
Assert that chroma_image and background_image are RGBImage instances and
have the same size.
Example (figures omitted), given a chroma image and a background image:
● color = (255, 255, 255): white background replaced.
● color = (255, 205, 210): pink font color replaced.
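The replacement rule itself can be tried on a simplified layout. The sketch below assumes a single row stored as a list of (R, G, B) tuples, which is NOT the channel-first pixels layout used in this project; the helper name replace_color_in_row and the values are made up.

```python
# Hypothetical helper; assumes one row given as a list of (R, G, B) tuples.
def replace_color_in_row(chroma_row, background_row, color):
    """Use the background pixel wherever the chroma pixel equals color."""
    return [
        bg if px == color else px
        for px, bg in zip(chroma_row, background_row)
    ]


green = (0, 255, 0)
chroma_row = [(0, 255, 0), (12, 34, 56), (0, 255, 0)]
background_row = [(1, 1, 1), (2, 2, 2), (3, 3, 3)]
result = replace_color_in_row(chroma_row, background_row, green)
```

Only the first and last pixels match green, so only those two are taken from the background row.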
Part 3: Image KNN Classifier
Classification is one of the major tasks in machine learning, which aims to predict
the label of a piece of unknown data by learning a pattern from a visible collection
of data. To train a classifier, you need to fit the classifier with training data, which is
a collection of (data, label) pairs. The classifier will apply the training data to its
own algorithm. After training, you can provide a piece of known or unknown data,
and the classifier will try to predict a label. By training a classification algorithm, we
can extract essential information from pieces of complicated data.
In this part, you will implement a K-nearest Neighbors (KNN) classifier for the RGB
images. Given an image, this algorithm will predict the label by finding the most
popular label in a collection (with size k) of nearest training data.
But how could we evaluate how near two images are? We need a definition of
distance between images. With this definition, we can find the nearest training
data by finding the shortest distances. Here, we use the Euclidean distance as our
definition of distance. Since images are represented as 3-dimensional matrices, we
can first flatten them to 1-dimensional vectors, then apply the Euclidean distance
formula. For image a and b, we define their Euclidean distance as:
$$d(a, b) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \dots + (a_n - b_n)^2}$$
where a_i and b_i (1 <= i <= n based on the above equation) are the intensity
values at the same position of the two image matrices, and n is the count of
individual intensity values in each image matrix, which equals (number of channels
(which is 3 for RGB) × number of rows × number of columns). Note that to
calculate the distance, two images must have the same size. Can you figure
out why?
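As a small worked example of the flatten-then-measure idea, the sketch below computes the distance between two made-up 1 x 2 pixels lists. The helper names (flatten, euclidean_distance) are hypothetical and operate on plain nested lists rather than RGBImage instances.

```python
# Hypothetical helpers for illustration; they take plain pixels lists.
def flatten(pixels):
    """Turn a 3-dimensional pixels list into a 1-dimensional list."""
    return [val for channel in pixels for row in channel for val in row]


def euclidean_distance(pixels_a, pixels_b):
    """Square root of the summed squared differences, position by position."""
    squared = [
        (x - y) ** 2
        for x, y in zip(flatten(pixels_a), flatten(pixels_b))
    ]
    return sum(squared) ** 0.5


a = [[[0, 0]], [[0, 0]], [[0, 0]]]
b = [[[3, 0]], [[4, 0]], [[0, 0]]]
d = euclidean_distance(a, b)
```

Here n = 3 × 1 × 2 = 6, the only nonzero squared differences are 9 and 16, and the distance is the square root of 25, which is 5.0.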
Once we have a notion of distance, we can start implementing a KNN classifier. In
the fitting (training) stage, all you need to do is to store the training data. Then,
in the prediction stage, the algorithm will find the distance between the provided
image and all images in the training data, find k nearest training data, and predict
the label by the most popular label among the k-nearest training data.
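The prediction stage can be sketched on short made-up vectors instead of real images. Everything below (the function name nearest_labels, the vectors, and the labels) is hypothetical and only illustrates ranking the training data by distance and keeping the k closest labels.

```python
# Hypothetical sketch on flat vectors; real images would be flattened first.
def nearest_labels(training, query, k):
    """Return the labels of the k training vectors closest to query."""
    def dist(vec):
        return sum((x - y) ** 2 for x, y in zip(vec, query)) ** 0.5

    ranked = sorted(training, key=lambda pair: dist(pair[0]))
    return [label for _, label in ranked[:k]]


training = [
    ([0, 0], "dark"),
    ([10, 10], "dark"),
    ([250, 250], "bright"),
    ([240, 255], "bright"),
    ([20, 5], "dark"),
]
labels = nearest_labels(training, [15, 15], 3)
```

The three closest vectors to [15, 15] are [10, 10], [20, 5] and [0, 0], so all three collected labels are "dark".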
Our blueprint of the classifier will be defined in the ImageKNNClassifier class. In this
class, you need to implement the following methods:
__init__ (self, n_neighbors)
A constructor that initializes an ImageKNNClassifier instance and necessary
instance variables.
The argument n_neighbors defines the size of the nearest neighborhood (i.e.
how many neighbors your model will find to make the prediction). When
predicting the labels, this classifier will look for the majority among the
n_neighbors closest images.
fit (self, data)
Fit the classifier by storing all training data in the classifier instance. You can
assume data is a list of (image, label) tuples, where image is an RGBImage
instance and label is a string.
Requirements:
(1) Assert that the length of data is greater than self.n_neighbors.
(2) Assert that this classifier instance does not already have training data
stored.
@staticmethod distance (image1, image2)
A method to calculate the Euclidean distance between RGB image image1 and
image2.
To calculate the Euclidean distance, for the value at each position (channel, row,
column) in the pixels of both images, calculate the squared difference between
two values. Then, add all these values of squared difference together. The
Euclidean distance between two images is the square root of this sum. You can
refer to $d(a, b) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2 + \dots + (a_n - b_n)^2}$
if you prefer a more formal definition.
Requirements:
(1) No explicit for/while loops. You can use list comprehensions or map()
instead.
(2) Assert that both arguments are RGBImage instances with the same size.
@staticmethod vote (candidates)
Find the most popular label from a list of candidate (nearest neighbor)
labels. If there is a tie when determining the majority label, you can return any of
the tied labels.
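Counting labels does not require any imports. The sketch below shows one possible approach using max() with list.count as the key; the function name most_common_label is hypothetical, and when there is a tie it returns one of the tied labels without promising which one.

```python
# Hypothetical helper for illustration; uses only built-ins.
def most_common_label(labels):
    """Return a label with the highest count in labels."""
    return max(set(labels), key=labels.count)


winner = most_common_label(["cat", "dog", "cat"])
```

In this made-up list "cat" appears twice and "dog" once, so winner is "cat".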
After implementing this classifier, try to run the example test in the runner, and
find more images online (again, crop them to the same size in order to calculate
distance) to make your own tests. While testing your implementation, think about
the following questions. You don’t need to submit your answers to these questions
to gradescope.
(1) How long does it take to run a single prediction? If it runs longer than you
expect, why? Think about how many arithmetic calculations this algorithm
needs to predict an image.
(2) Will this algorithm work for all kinds of image data? Why does the example
test in the runner work as expected, while your own tests with online
images and labels might not work very well?
(3) Are there any other ways to define this distance for the nearest neighbor?
Think about it, but don’t change the distance algorithm in your submission.
(4) What are the advantages and disadvantages of using Euclidean distance to
find the nearest neighbors?
predict (self, image)
Predict the label of the given image using the KNN classification algorithm. You
should use the vote() method to make the prediction from the nearest
neighbors.
Requirements:
(1) No explicit for/while loops. You can use list comprehensions or map()
instead.
(2) Assert that the training data is present in the classifier instance. In other
words, assert that the fit() method has been called before calling this
method.