Parallel Computing: Homework 6
Introduction
During this last lab, you will work on parallelising and optimising a password guessing parallel
program. We will provide you with a basic implementation of such a program that you are expected
to parallelise using MPI. In addition, you are expected to develop heuristics to create an effective
password guessing strategy.
Background
On Unix systems, user account information is stored in two separate files: /etc/passwd and
/etc/shadow. The passwd file contains information such as the user name, home directory, and
preferred shell, but counterintuitively, it does not contain any information about the password. The
password hash is stored in /etc/shadow, together with information about which hashing algorithm
and which salt1 were used.
The format of /etc/passwd is as follows:
jde100:x:1001:1002:John Doe,,,:/home/jde100:/bin/bash
jde100 is the username.
x indicates that the password is stored in /etc/shadow (it used to be stored
here).
1001 is the user ID (UID).
1002 is the group ID (GID).
John Doe,,, is the comment field (GECOS), typically a comma-separated list with as first
item the user’s full name, and after that other information like phone numbers.
/home/jde100 is the user’s home directory.
/bin/bash is the user’s preferred shell.
The format of /etc/shadow is as follows:
jde100:$1$M9$80ZJhwG6Ngh1lLoKg5MdC1:14486:1:2:3:4
jde100 is again the username.
$1$M9$...)) is the password hash. There are three sections to it, separated by $. The first
section (1) is the algorithm used (MD5); the second section is the salt used
(M9), and the last section is the hashed password in base64 format. In this case,
it is the hashed version of iloveyou.
14486 is day the password was last changed; in days since Jan 1st, 1970 (the Unix
epoch).
1 is the minimum number of days required between password changes.
2 is the maximum number of days a password is valid.
3 is the number of days before the expiry date of the password that a warning
message should be shown about changing the password.
4 is the expiration date of the account.
1https://www.hypr.com/salt/
1
Objective
The objective of this assignment is to guess passwords as efficiently and effectively as possible. This
has two aspects: on the one hand your algorithm needs to be efficient and parallellised, but on the
other hand your password guessing strategy will greatly influence your results: simply starting
with a, b, ... probably won’t get you very far.
Your final submitted code will get exactly 1 minute of runtime on Hábròk, on 128 cores (2
nodes with 64 cores each) and 32GB of memory (256MB per core), to produce as many correct
passwords as possible. The performance of your implementation in this test will influence your
grade. Additionally, there is a small competitive factor: the relative ranking of your implementation
as compared to that of your classmates might award you a bonus point (so the maximum grade for
this assignment is 11).
Execution
On Brightspace, we have already provided a fully-functional password guesser that handles
everything from reading the shadow and passwd files, to computing hashes and printing the results.
However, this implementation is (of course) single-threaded and is not very intelligent about its
guessing strategy; you can definitely do better.
Please make sure to carefully read the comments in the provided header files and in guessword.c,
outlining all data structures and functionality provided. In particular, the function parseInput()
will read and parse the shadow and passwd files, and put all data in appropriate structs. The
function tryPasswords() will hash all the given passwords in a list and try to match them with
the ‘target’ list of hashes - if a result is found, the username and password combination will be
printed.
The code comes with an accompanying Makefile, which makes it easy to start going with the code:
after installing OpenMPI or MPICH, simply run make to (re-)compile your code, after which
you should have an executable ./guessword. This executable expects two arguments: the path to
a password file and the path to a shadow file, in that order. Contrary to previous assignments, the
number of threads/cores is not passed to the program directly, as this is handled by MPI.
Parallelisation
You are expected to parallellise your code using the most suitable MPI functions. Before getting
to work, think thoroughly about the parallellisation approach you will be using. How will you split
the work? How will you get that work to the different processes? How can you ensure all processes
will continue working as much as possible? How will you ensure each process is doing the most
effective work at any given moment? How will you deal with the potentially large amounts of data?
How to limit the initialisation overhead? These are just some of the questions you should ask
yourself before starting to work on a particular implementation, to prevent yourself from getting
stuck after many hours of hard work.
The provided Makefile already compiles your code using mpicc, so running your code in an
MPI environment is as simple as running mpiexec -n [threads] ./guessword [passwd] [shadow] . As
always, the number of processes running is available to you through MPI_Comm_size(). The
function parseInput() has a parameter that allows you to mute the debugging output. Note that
the framework code does not require changes for a parallel implementation - in particular the
printing to stdout by tryPasswords() should continue to function without issues using MPI. 2
2One might expect issues where different ‘threads’ are printing at the same time, causing the output to be garbled,
but from our testing this seems to be prevented by MPI.
2
Guessing Strategy
The second big part of this assignment is to develop and implement an effective guessing strategy.
The search space of all possible passwords is gigantic, but user’s passwords are usually not fully
random; they will have patterns that you can exploit to try strings with a higher probability of
being the correct password. Developing a guessing strategy then means: determining what strings
to test and in what order.
For the purposes of this assignment, all input files will be generated using the same patterns. The
actual passwords will be different, but the underlying patterns are not. In addition to the guesser
implementation, we provide you with a set of sample files (passwd, shadow, and accompanying
plaintext passwords) for you to analyse and test your code on.
Given the set of sample files, you should be able to observe patterns in the plaintext passwords
that you can exploit in your implementation. For example: some users might use their username,
appended with 00 as their password. If you spot that as a pattern, you should write code to test
passwords of that form.
To simplify your coding work, the provided implementation comes with a set of functions that
implement a range of manipulation operations that might be useful to you. One of them is a
function to substitute arbitrary characters with strings - you can use it to implement this ‘appending
00’ by substituting '\0' (end-of-string marker) by "00". Other examples are functions to capitalise
one or all letters - these are already used in the the provided code as an example.
In addition to sample data files, we also provide the top 250 most popular passwords, and a set of
ebooks in .txt form. You can use these as ‘dictionaries’ for passwords to try. You are allowed to
preprocess these or any other files into some other form that saves computation time. The provided
implementation comes with a function to read a file with simple strings on each line - which is how
this list of most popular passwords is formatted.
Reminder: Your code can (and should!) run as long as possible using as many cores as possible -
you are allowed to fill a grand total of 128 minutes of runtime on Hábròk after all (split in 1-minute
chunks over 128 cores). We will just stop your program when the time limit is reached; you do not
have to worry about stopping in time.
Hint: Think about how to minimise MPI communication overhead, given the short amount of
time available per core.
Hint: Precomputing a bunch of passwords and storing the hashes in files won’t work, since the
grading input sets will use a different salt - that will influence the final hash.
Hint: We recommend you not to use dictionary files found online, as these are not used in our
generated password list.
Hint: You can count the number of unique password guesses when testing your own code as
follows: sort [file] | uniq | wc -l .
Warning: Since the course is still called Parallel Computing, it is required to implement a parallel
component (a single-threaded implementation won’t be accepted), and it is required to use actual
MPI techniques to implement your solution.
Deliverables
We ask you to submit your code to Themis, just like previous weeks, which will only perform some
basic sanity checks. It will test some trivial inputs, like just the password password, or 123456
(both are in the top 250), and allow your program to run for 6 seconds on 2 or 4 cores (with 256MB
memory per core available, just like we allow on Hábròk). So if you manage to guess these trivial
inputs within 6 seconds, you will pass Themis - in fact, the provided demo code will already pass it.
The test on Themis mainly serves to check whether your code compiles properly and doesn’t crash.
Your code does not need to do anything specific to deal with these timing limitations - Themis
3
and Hábròk will just kill your program after the specified timeout. To prevent any losses due to
buffering of the output, the provided implementation already calls fflush(stdout) after every
found password to ensure the password is registered immediately.
Furthermore, generally you do not need to worry about printing the same password twice - we will
filter out all unique results and grade only those. However, if you do print a password multiple
times, you have probably tried and hashed it twice, which might not be the most efficient strategy.
Printing excessive amounts of output will cause Themis to stop your code.
In addition to your program code, you can submit any other input (text) files that will be used by
your code. By default, Themis will put all uploaded files in the same folder; if your code expects
these auxiliary files to be in a different folder (like the provided code does), please put all the files
you want to submit in a tar or zip archive with the proper directories - Themis will then transfer
the folder structure to the execution environment as-is. The limit on file uploads is 10MB. Note
once again that the salt used for our grading inputs will be different, so precomputing the hashes
will not work!
Themis will compile your code through your Makefile by running make , and will run your
code through mpiexec -n N ./guessword ...)) . You are allowed to change any parts of the provided
implementation (or make a completely new one), as long as it compiles and runs this way.
Lastly, we ask you to submit (on Themis) a short and concise report (PDF, no more than 1 page)
outlining your approach to parallellisation and your guessing strategy. You do not have to perform
a performance analysis this time - the efficiency is fully determined by the performance in our test
on Hábròk.
Grading
Similar to the previous labs, we will grade on the criteria correctness, efficiency, and code style:
Correctness corresponds to passing test cases on Themis. If you did not pass all test cases, you
might still get partial points. A manual code check might override the results from Themis,
e.g. if you hardcoded answers, did not implement a parallel solution, or circumvented other
explicit assignment instructions.
Parallel Algorithm corresponds to a manual judgement of your parallelisation approach for this
assignment. Elaborate on your strategy in your report. Optimal use of MPI capabilities will
positively influence your grade. Particularly sophisticated implementations might be awarded
some bonus points.
Efficiency corresponds to the number of passwords you manage to guess in the 1-minute run on
Hábròk relative to the performance of the provided sample code. This score will be purely
individual. In addition, up to 1 bonus point can be earned through your ranking among
your classmates.
Code style includes, but is not limited to: sufficient documentation, clear naming of variables,
proper code structuring, proper code formatting, etc. This is not an exhaustive list!
Our grading will always be based on the most recent submission on Themis. Even though you
might have submitted a ‘better’(-performing) version before, we will only evaluate the most recent
one.
4
版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。