Homework 2
Due date: 11:59 pm 3/2/2024
Submission: please submit a report including the answer for each question and corresponding screenshots.
A statistical attack on a polyalphabetic cipher, such as the Vigenère cipher, is a
method used to break the encryption without prior knowledge of the key. The
Vigenère cipher, for instance, uses a series of different Caesar ciphers based on
the letters of a keyword. While more secure, polyalphabetic ciphers are still
vulnerable to statistical attacks if the message is long enough. This attack can be achieved by using the frequency analysis of English letter in English language. The following table is the frequency of each letter in English language.
Taks 1. Please follow the steps to achieve the statistical attack for poly-alphabetic cipher with known key length.
Here is the ciphertext:
usojlbxpkieoiqmwucdyiaxamvknplbeaxcfjmvsjmvsfzblxstgrbbmfxgwyvhjgkamvknp axacjbachvqnvkmggrqztwfnmxgljglrwquuelgclaqlbaypgqcnkivikfagrtbaaswmrkmm jwvbrbfyqvgmltmgfypulbamjmbyuqgdgvoukqhlvmuwcwlsjivbvnmmxstxrutewxqbr zwusvgmfnmusvgiiltrseuqvteykjtidbacvmullxmggrqldqlbavgwkqhlgjvbvaxpnmeyjb acqttimqwc
Assume that the keyword has length 8, we need to partition the ciphertexts into different blocks and there are 8 letters in each block. Then the first letter of each block is encrypted with the same letter – they are encrypted with the same shifting number. Similarly, the second letter of each block is encrypted with the same shifting number. The same goes for the third to the seventh letters of each block. Because shifting the same number is easily broken by frequency analysis, we can discover the letters of the keyword. Here is how we can proceed.
Step 1: partition the ciphertext into different blocks.
Usojlbxp kieoiqmw ucdyiaxa mvknplbe axcfjmvs jmvsfzbl xstgrbbm
fxgwyvhj gkamvknp axacjbac hvqnvkmg grqztwfn mxgljglr wquuelgc
laqlbayp gqcnkivi kfagrtba aswmrkmm jwvbrbfy qvgmltmg fypulbam
jmbyuqgd gvoukqhl vmuwcwls jivbvnmm xstxrute wxqbrzwu svgmfnmu
svgiiltr seuqvtey kjtidbac vmullxmg grqldqlb avgwkqhl gjvbvaxp nmeyjbac qttimqwc
Step 2: calculate the frequency of each letter (a-z) among the first letter of each block using the code HW2_task1.py. You will find the most frequent letter is .
Step 3: since in English language the most frequent letter is “e”, then we can guess the first letter of key is .
Step 4: let us repeat step 2 and step 3 for the second, third, fourth, fifth, sixth and seventh letter of each block respectively. We can get the full key is .
Step 5: you decrypt the ciphertext with the above full key. Now you can find your decryption since . (For this task, you should analyze your decryption failure or success and the corresponding reason. )
Task 2. Compared to the task 1 which only uses the most frequent letter “e” in English for frequency analysis during the statistical attack, we will calculate the IC information of ciphertext to achieve the statistical attack. Based on the content we learnt in the class, for single letter, the IC in English should be close to 0.065 . After partitioning the ciphertext into different blocks (length of each block should be the same with the key), the i-th letter of each block (i in the range of [1, key length]) should shift the same number. Thus we calculate the IC of i-th letter among all blocks.
Process to compute the IC. Let the length of the text be N and let the size of the alphabet be n. Consider the i-th letter a_i in the alphabet. Suppose a_i appears in the given text Fi times. Since the number of a_i's in the text is Fi, picking the first a_i has Fi different choices and picking the second ai has only Fi-1 different choices because one a_i has been selected. Since there are N(N-1) different ways of picking two characters from the text, the probability of having two a_i's is
Therefore, the indexof coincidence is:
Note that English has n = 26 letters.
Please follow the steps below using the code HW2_task2.py:
Step 1: you need to guess the key length with IC. Please fill out the missing code (yellow part) based on the above information and comments.
******************************************************************
from collections import Counter
import numpy as np
# Function to calculate the Indexof Coincidence (IC)
def index_of_coincidence(text):
Write the code to calculate the IC
return ic
# Function to calculate the average IC for different key lengths
def average_ic(ciphertext, max_key_length):
avg_ics = []
for key_length in range(1, max_key_length + 1):
ics = [] # This list is to store the ic value for i-th position of blocks
for i in range(key_length):
nth_letters = write the code to create a substring of the ciphertext starting
from index i and selecting every key_length-th character since these letters shift
the same number
ic = index_of_coincidence(nth_letters)
ics.append(ic)
avg_ic = np.mean(ics)
avg_ics.append((key_length, avg_ic))
return avg_ics
# Sample ciphertext (replace this with your actual ciphertext)
ciphertext=
"usojlbxpkieoiqmwucdyiaxamvknplbeaxcfjmvsjmvsfzblxstgrbbmfxgwyvhjgkamvkn paxacjbachvqnvkmggrqztwfnmxgljglrwquuelgclaqlbaypgqcnkivikfagrtbaaswmrkm mjwvbrbfyqvgmltmgfypulbamjmbyuqgdgvoukqhlvmuwcwlsjivbvnmmxstxrutewxq brzwusvgmfnmusvgiiltrseuqvteykjtidbacvmullxmggrqldqlbavgwkqhlgjvbvaxpnmey jbacqttimqwc "
# Calculate and display the average IC for key lengths up to 10 (you can adjust this value)
max_key_length = 10
avg_ics = average_ic(ciphertext, max_key_length)
# Print the average ICs for each key length
for key_length, ic in avg_ics:
print(f"Key Length: {key_length}, Average IC: {ic:.4f}")
# Identify the key length with the IC closest to the expected IC for the language (0.065 for English)
expected_ic = 0.065
closest_key_length = min(avg_ics, key=lambda x: abs(x[1] - expected_ic))[0]
print(f"\nClosest key length to expected IC is: {closest_key_length}")
******************************************************************
After filling this, please run it and put the screenshot here. From this program, you will get the key length that is the closest to expected IC= 0.065.
Step 2: once you find the key length, you should find the specific key. Please read the following code and answer the questions.
******************************************************************
def key_finder(list, keylength, i):
smallest_x2 = float('inf')
best_c = ' '
N = 0
expected_freq = [
0.08167, 0.01492, 0.02782, 0.04253, 0.12702, 0.02228, 0.02015, 0.06094, 0.06966, 0.00153, 0.00772, 0.04025, 0.02406, 0.06749, 0.07507, 0.01929, 0.00095, 0.05987, 0.06327, 0.09056, 0.02758,
0.00978, 0.0236, 0.0015, 0.01974, 0.00074
]
observed_freq = [0] * 26
freq = [0] * 26
for j in range(i, len(list),keylength):
freq[ord(list[j]) - ord('a')] += 1
N = sum(freq)
for j in range(26):
observed_freq[j] = freq[j] / N
for j in range(26):
x2 = 0
for k in range(26):
x2 += ((observed_freq[(j + k) % 26] - expected_freq[k]) ** 2) / expected_freq[k]
if x2 < smallest_x2:
smallest_x2 = x2
best_c = chr(ord('A') + j)
return best_c
******************************************************************
Q1: In this code, instead of using the frequent letter “e” in English, the keypoint to guess the correct key is to compute “x2 += ((observed_freq[(j + k) % 26] - expected_freq[k]) ** 2) / expected_freq[k]” . Based on your understanding, what does this line mean? Compared to use the frequent letter frequent letter “e” in English for guessing password, what is the advantage of this computation?
Q2: What is the key? What is the plaintext? Put your screenshot here.
版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。