The output from this command is a path. Read from left to right, a path pinpoints a location in the filesystem by specifying the route to that location from a source reference point. In this case, the root directory / is used, which is the top-most location in the filesystem’s tree.
You have a special directory called your home directory, which is the default location of many things you do, and the usual starting point for your own subtree of directories containing your own files. It is also, by default, your initial working directory. So, if you entered pwd as above, just after opening the terminal, Linux should respond with the path to your home directory. On Ed, this will be /home. (If you use Linux natively or in a virtual machine, it will likely be /home/username where username is the user name you use to log in; Linux is a multi-user system where each user has their own home directory.)
Whereas your working directory changes as you move around in Linux, your home directory is always the same.
You can find out what’s in your current directory using the list directory command:
$ ls
You will see a list of names of all the files and subdirectories that belong to the current directory. This list is usually in multicolumn format (i.e., several names per line). If you prefer one name per line, you can use the -1 option (digit 1):
$ ls -1
There is also a long form of the directory listing which you can get with the option “-l” (lowercase letter l):
$ ls -l
We won’t go into the details here, but the first character on each line tells you what kind of object the name refers to. If it’s a “d”, it is a directory. If it’s a “-”, it is a normal file. Any other character indicates another kind of object (for example, “l” marks a symbolic link).
You can move from one directory to another using the change directory command. For example, to go to the lab0 directory, enter
$ cd lab0
You can go to the parent directory using
$ cd ..
You can go to the home directory using
$ cd ~
In fact, “~” is a standard Linux shorthand for the home directory. So, if you are elsewhere in the filesystem and want to go to your home directory, you can do cd ~ which is easier than typing out the whole path to the home directory, as in cd /home/. Similarly, “.” (a single full stop) is an abbreviation for the path to the current directory, and “..” is an abbreviation for the path to the parent directory of the current directory.
You can find out what’s in the root directory by going there. Try
$ cd /
$ pwd
$ ls -l
You will see a bunch of directories, but no files. Try changing into a subdirectory to see what is there:
$ cd usr
$ ls -l
There are more subdirectories. “Walk around” Linux’s directory hierarchy using these commands and try to build a mental image of the tree you are exploring. Maybe draw it on a piece of paper as you go. Remember you can use cd .. to go up one level and pwd to check where you are.
Now go back to your lab0 directory. (How can you do this with as few keystrokes as possible?) Enter the command pwd to make sure you are in the right place. Enter the following commands:
$ mkdir testing
$ cd testing
You have just created a new directory named testing inside your lab0 directory and changed into it. Both commands, mkdir and cd, accept a path as their argument, but because we haven’t specified a point of reference, the current directory is taken as the default. This is often much easier than specifying a path in terms of the root directory!
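The difference between relative and absolute paths can be sketched as follows. This is only an illustration: it works in a throwaway directory created with mktemp (an assumption made here so that nothing in your lab0 directory is touched), and the variable names are made up for the example.

```shell
# Work in a throwaway directory so that lab0 is left alone (assumption).
scratch=$(mktemp -d)
cd "$scratch"

mkdir testing          # relative path: created inside the current directory
cd testing             # relative path again
inside=$(pwd)          # absolute path of the new directory

cd ..                  # ".." is the parent directory
parent=$(pwd)

cd "$inside"           # an absolute path works from anywhere
cd .                   # "." is the current directory, so nothing changes
still_inside=$(pwd)

echo "$parent"
echo "$inside"
```

The second path printed is the first one with /testing appended, which is exactly what the relative path testing stood for.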
Some handy tips
With the system awaiting your next command, try pressing the up-arrow key, ↑. This will bring back your previous command, and it’s now sitting at the prompt. It won’t execute until you press Return. Try pressing ↑ a few times, to run through your recent actions. Then try using ↓, and see if the effect is what you expect. If you find a command you want to execute, hit Return.
The command history lists, in order, the commands you have entered recently.
Another shortcut is to press Tab before you have finished typing a command. If there is only one way to complete the command, then the system provides it for you without you having to type it all. Try this with history. How can you do this command with the fewest keystrokes?
Sometimes, a command generates so much output that it won’t fit within the terminal window, and you miss it all except the last part. You can gain control over viewing the output by following the command with “| less”. For example, the command ls -lR / generates an excessively long directory listing. Press Ctrl+C to stop it. Now try ls -lR / | less. (The vertical bar here is a pipe; we look at pipes in more detail later.) This shows you one window-full of output at a time. Press Space to go forwards, b to go backwards, and q when you have finished viewing and want to quit.
If you want to find out further information on any Linux command, you can consult the manual pages (known as manpages), from the terminal, by simply using the man command. For example, to know more about listing directories, enter
$ man ls
You can navigate the document with the arrow keys; hit “q” to escape back to the shell.
1.2 Editing files
For assignments in this unit you will have to edit text files. To do this, you need a text editor. There is a plethora of editors for Linux, and the question of which is best can almost cause religious wars (don’t leave an Emacs person and a vim person in the same room). If you are at the command prompt of a Linux system, you can use the command nano filename to edit a file. A short help is displayed at the bottom: ^X stands for pressing Ctrl+X to leave nano; ^O stands for pressing Ctrl+O to write out to a file (press Return to save the modified file under the same name).
Use the cd command to navigate to your lab0 directory and use nano to edit the file student-id. Put your student ID and last name in the locations indicated; you will be required to do this in the assignments.
You should do this using nano at least once – it is a useful skill to have when you are using any Linux system where no other editors are available. There is, however, a much more convenient way to edit files in Ed.
Using the file manager on the left side of your Ed Workspace (toggle it on if you have turned it off; see Figure 1), locate a file you’d like to edit and click on it. It will open in a tab inside the Workspace and all changes are saved immediately. You can use this convenient method to do your editing in Ed, but be sure to also know how to do it the “old-fashioned” way using nano in case you have to use a different Linux system.
Right-clicking on a file in the file manager gives you more options. While these are convenient, you should not normally use them. Instead, the purpose of this Lab is to teach you how to perform file management tasks on the command line.
1.3 Uploading files to include in assignment submissions
The assignments in this unit will mostly have tasks to perform under Linux, but there will also be theoretical questions that require you to submit a PDF. This can be a document created by scanning hand-written pages or by using some office software and converting the document to PDF. When we are asking for PDF, no other file formats are allowed and there will be penalties for submitting the wrong file format. Be sure you know how to create a PDF well before assignment submission time. Hint: Any printer on campus, e.g. in the library, has a scanner. Just log in with your student ID card and “scan to self”; you will receive a PDF via email.
Practise this now: Create a PDF by any means you like. Use your Workspace’s file manager to navigate to the problem1 subdirectory. It already contains a file placeholder.pdf. Right-click on that, select “Upload Here. . . ” and upload your PDF. It should appear in the file manager and should be displayed correctly when you click on it. In assignments, there will be a placeholder document like this whenever you are required to upload a PDF. Please replace the placeholder PDF with a document with the same name in this case.
1.4 Downloading a Workspace for submission
After you have finished working on an assignment, you will be required to upload your submission to Moodle. To do this, right-click anywhere in your Workspace’s file manager and select “Download All”. Ed will create a zip file called home.zip that contains all files and directories inside your home directory. Save this file on your computer – this is what you will upload to Moodle.
Please try this now and make sure it works – it will be crucial for submitting your assignments.
2 Some useful commands and tools
Try out, and play with, each of the following commands.
The echo command just repeats any strings you give it, followed by a newline. For example:
$ echo Panini
Panini
$ echo Alan Turing
Alan Turing
$ echo "Alan Turing"
Alan Turing
Although this command doesn’t do much, it can be useful when playing with other commands. You can copy a file using the cp command:
$ cp filename1 filename2
creates a new file which is an exact copy of the first file. Try copying the README file you have in your home directory, and then use ls to see that the new file does indeed now exist. Note that unlike Windows, Linux treats file names as case sensitive: README is different from Readme or ReAdMe and these three names can coexist in the same directory. Pressing the Tab key completes a partially typed filename, so if you start with the scaffold as supplied, typing R and Tab will be enough to get README. If you are doing this in Ed, the newly created file will also automatically appear in the file manager bar on the left, but we are ignoring that for now.
If you don’t want the copy after all, you can remove it with rm. If you want to rename a file, you can move it:
$ mv filename1 filename2
You can cause the entire contents of a file to be output for the user to see:
$ cat filename
This is useful for displaying small files. But for longer files, the contents may pass in front of you too quickly. If the file is too large to fit in one terminal window, try more filename, which displays one screenful at a time (press the space bar to move forwards to the next screenful), or less filename, which is like more except that it also allows you to move backwards by pressing “b”.
You can also use cat to concatenate several files and display them all:
$ cat filename1 filename2 filename3 ...
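To see cp, mv, rm and cat working together, you could try something like the sketch below. The scratch directory and the file names notes, copy and renamed are invented for this example, and the file is created with echo and “>” (output redirection, covered in Section 3).

```shell
# Scratch directory and file names are placeholders for this sketch.
scratch=$(mktemp -d)
cd "$scratch"

echo "first line" > notes      # create a small file to work with
cp notes copy                  # copy: notes and copy now both exist
mv copy renamed                # rename: copy is gone, renamed takes its place
both=$(cat notes renamed)      # concatenate the two files, one after the other
rm renamed                     # remove the extra file again
remaining=$(ls)                # only notes is left

echo "$both"
echo "$remaining"
```

Running ls after each step is a good way to convince yourself of what each command did.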
The word count command wc gives a one-line output containing the numbers of lines, words, and characters (including newline characters) in the file. For the purposes of this command, words are delimited by spaces, tabs, or newlines.
$ wc filename
Now, try wc without any filename:
$ wc
It seems nothing happens, and you don’t get a $ prompt. In fact, the system is waiting for your input, from the keyboard. Such input is known as standard input. Type a few words, then Return, then a few more words, and so on. The system accepts your input but does nothing with it; you see no output. After a few lines, type Ctrl+D. This terminates the standard input to this program. Now, wc gives output to the screen, giving counts of the lines, words and characters you typed in standard input.
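If you want to check your understanding of the three counts, a small experiment along these lines may help. The file name sample and its contents are made up; the set -- line is just one convenient way to split the output of wc into separate fields.

```shell
# Scratch directory and file name are placeholders for this sketch.
scratch=$(mktemp -d)
cd "$scratch"

# Two lines, three words; "one two\n" is 8 characters, "three\n" is 6.
printf 'one two\nthree\n' > sample

wc sample                      # prints lines, words, characters, file name

# Split the output of wc into its fields (spacing varies between systems).
set -- $(wc sample)
lines=$1
words=$2
chars=$3
echo "lines=$lines words=$words chars=$chars"
```

The last line should report 2 lines, 3 words and 14 characters; note that each newline counts as a character.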
You can sort the lines of a file:
$ sort filename
Note that this will not change the original file. A sorted version is shown to the user, but then disappears (in the sense that it is not stored in any file). Fortunately, we will shortly see how to retain it.
The command uniq removes repeated lines. So, if you have two or more identical consecutive lines, all but one are removed. Lines that are identical but have other, different lines in between them are kept; neither is removed, as they are not repeats. Again, the output (with repeated lines removed) goes to the screen; the original file is unchanged. The option -c causes each line of the output to have, at its start, the number of consecutive occurrences of that line in the input file.
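The point about consecutive lines is easy to miss, so here is a small sketch of it. The file fruit and its contents are invented for the example, and the sorted version is kept in a file using “>” (output redirection, covered in Section 3).

```shell
# Scratch directory and file names are placeholders for this sketch.
scratch=$(mktemp -d)
cd "$scratch"

printf 'pear\napple\npear\npear\napple\n' > fruit

adjacent=$(uniq fruit)       # only the two consecutive pears collapse
sort fruit > sorted          # apple, apple, pear, pear, pear
grouped=$(uniq sorted)       # after sorting, equal lines are adjacent
counted=$(uniq -c sorted)    # each remaining line is prefixed by its count

echo "$adjacent"
echo "$grouped"
echo "$counted"
```

On the unsorted file, uniq still prints four lines (pear, apple, pear, apple); on the sorted file it prints just apple and pear, and with -c the counts 2 and 3 appear in front (the amount of leading space varies between systems).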
3 Input, output, redirection, pipes
By default, the output from a Linux command is to the screen where it is displayed in the terminal window (standard output). But you can change this.
Some commands will have a specific way of nominating an output file. Examples:
• sort can take the name of an output file given as part of the -o option. So
$ sort inputFile -o outputFile
places the sorted output into file outputFile.
• uniq can take the name of an output file as a second argument. So
$ uniq inputFile outputFile
places the output in outputFile. Here, there is no -o option.
In each case, any existing file of the same name is overwritten — so take care.
To determine how output files are specified for a particular command, you need to look up its documentation (e.g., its manpage). It is instructive to try to use sort as above but without the -o, and explain what happens.
If a command or program in Linux gives output to the screen (standard output), then you can redirect this output to a file, say outputFile, by using “> outputFile”.
Whenever output goes to a file, the output is not seen by the user at that time. To see the output, the user must inspect the contents of that output file (for example, by cat outputFile, or by opening it with an editor).
For example,
$ ls -l > directoryList
places a directory listing in the file directoryList, rather than showing it on the screen.
You can append output to a file using “>> outputFile”. This means that the original contents of outputFile are still there, but the file is now enlarged by having the new output at the end as well.
For example,
$ echo "The End" >> outputFile
puts the string “The End” (without quotation marks) at the end of outputFile.
Exercise: Suggest two ways to concatenate two files, placing the combination into a single new file.
You can also do input redirection. Some commands may wait for you to type some input to them (standard input). We have not seen much of this yet; all our commands so far either take no user input at all, or take input from a nominated file (although we did briefly try wc with standard input). If a command uses standard input, you can append “< inputFile” at the end to specify that the input is taken from the file inputFile. The command will then execute without expecting anything further to be typed by the user.
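The three kinds of redirection can be tried out together as in the following sketch. The file names animals and sorted_animals are made up, and the work is done in a scratch directory so that nothing of yours is overwritten.

```shell
# Scratch directory and file names are placeholders for this sketch.
scratch=$(mktemp -d)
cd "$scratch"

echo "lion" > animals          # ">" creates the file
echo "zebra" > animals         # ">" again: the old contents are overwritten
echo "aardvark" >> animals     # ">>" appends, so the file now has two lines

# Input comes from animals, output goes to sorted_animals.
sort < animals > sorted_animals

contents=$(cat animals)
sorted=$(cat sorted_animals)
echo "$contents"
echo "$sorted"
```

Notice that lion has disappeared: the second “>” replaced the file rather than adding to it.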
Sometimes, you want the output from one program to be used as the input to the next. You can make this happen using pipes, represented by a vertical bar, |, as follows.
$ command1 | command2
Here, command1 is executed first, and the standard output from it (which would normally be displayed to the user) instead becomes the standard input for command2, which is executed second (actually, both run at the same time with the pipe “pumping” data between them). For example,
$ ls | wc
can be used to give information on the number of files. You can have a whole series of commands, linked by pipes in this way.
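Here is a small sketch of pipes in action, again in a scratch directory. The directory names a, b and c are invented; the options -l (count lines only) for wc and -r (reverse order) for sort are standard, but check the manpages if you want the details.

```shell
# Scratch directory and directory names are placeholders for this sketch.
scratch=$(mktemp -d)
cd "$scratch"
mkdir a b c

# ls writes one name per line when its output goes into a pipe,
# so counting lines counts the entries in the directory.
count=$(ls | wc -l)

# A longer chain: list, then sort in reverse order.
reversed=$(ls | sort -r)

echo "$count"
echo "$reversed"
```

The count is 3 (possibly with leading spaces on some systems), and the reversed listing shows c, b, a on separate lines.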
4 The character translator tr
The command tr is useful when you want to translate characters according to a mapping you specify.
$ tr string1 string2
The meaning is that, for each position in string1, the character there is mapped to the character at the corresponding position in string2. It’s like specifying a function, in mathematics, with string1 listing the characters in the domain, and string2 specifying what the function does to each of those characters.
If string2 is omitted and the -d option is used, then tr simply removes all characters appearing in string1, without replacing them by anything.
The command does not specify an input or output file. Input is from standard input, and output is to standard output.
$ tr abc 123
abracadabra
12r131d12r1
open sesame
open ses1me
Ctrl+D
$
One typical application is to convert all letters to lower case.
A handy option is -s, for squeeze, which removes consecutive repeats of nominated characters.
There are also special ways of defining ranges of characters and special names for commonly used character sets. Check out the manpage for details of these and other features.
You can use redirection if you want to use files for your input and/or output.
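The features just described can be sketched as follows. The input strings and the file names spell and translated are made up for the example; the ranges A-Z and a-z are one common way to write the sets of upper-case and lower-case letters.

```shell
# Scratch directory and file names are placeholders for this sketch.
scratch=$(mktemp -d)
cd "$scratch"

# Map each upper-case letter to the corresponding lower-case letter.
lower=$(echo "Alan Turing" | tr 'A-Z' 'a-z')

# -s squeezes runs of the nominated character (here, a space) down to one.
squeezed=$(echo "too    many   spaces" | tr -s ' ')

# -d deletes every character that appears in string1.
deleted=$(echo "abracadabra" | tr -d 'abc')

# Redirection lets tr read from one file and write to another.
echo "open sesame" > spell
tr 'abc' '123' < spell > translated
translated=$(cat translated)

echo "$lower"
echo "$squeezed"
echo "$deleted"
echo "$translated"
```

The four lines printed are alan turing, too many spaces, rdr, and open ses1me.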
Exercise: How would you use tr to concatenate all the lines of a file so that the file just contains one single long line (with all line breaks removed)?
5 The stream editor sed
The stream editor sed enables you to transform a file, line by line, according to rules you specify. A rule specifies a pattern to be replaced, and what the pattern is to be replaced with. sed is very powerful; for a full description, see its manpage, or one of the many introductions to it such as https://www.gnu.org/software/sed/manual/sed.html and http://www.grymoire.com/Unix/Sed.html.
The most basic way to use sed is as follows.
$ sed 's/pattern/replacement/' filename
The text between single-quotes is called the script, and this gives the rule to be used on the input file, filename. A pattern can be a simple string of characters, and the replacement could be the string that you want it replaced by. In that case, running sed as above causes the first occurrence of the pattern string in each line (if such exists) to be replaced by the replacement string. Lines without the pattern string are not changed. The output goes to the screen, as usual. If you want all occurrences of the pattern in every line to be replaced, you can append “g” to the script:
$ sed 's/pattern/replacement/g' filename
For example, the following command replaces every occurrence of A Lady by Jane Austen [6]:
$ sed 's/A Lady/Jane Austen/g' filename
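To see the difference that “g” makes, you could try a line in which the pattern occurs twice. The file name title and its contents are invented for this sketch.

```shell
# Scratch directory and file name are placeholders for this sketch.
scratch=$(mktemp -d)
cd "$scratch"

echo "By A Lady, for A Lady" > title

# Without g: only the first occurrence on each line is replaced.
first_only=$(sed 's/A Lady/Jane Austen/' title)

# With g: every occurrence on each line is replaced.
every=$(sed 's/A Lady/Jane Austen/g' title)

echo "$first_only"
echo "$every"
```

The first result is By Jane Austen, for A Lady; the second is By Jane Austen, for Jane Austen. The file title itself is unchanged in both cases.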
You can include special characters in these pattern strings. For example, the tab, newline and forward-slash character are represented by \t, \n and \/, respectively. (The latter is necessary if you want to include a forward-slash character in your pattern, since that character is also used in the script to delimit the pattern.) [7]
Patterns can be more general than just one specific string. To match any one of a set of characters, use a list within square brackets (with no commas, or other separators, between the items in the list). For example, [aeiou] matches any lower-case vowel. So the pattern b[aeiou]t matches any of the words bat, bet, bit, bot, but. To match a range of characters, use [α1-α2], where α1 and α2 are the first and last characters in the range. The order of characters in the range is alphabetic or, more precisely, in order of ASCII values. So the range [a-z] matches any lower-case letter; [N-Z] matches any upper-case letter in the second half of the alphabet; and [#-&] matches any of #, $, %, &. If you want to match any character that is not in some list, or range, of characters, you can put ^ just after [. For example, [^aeiou] matches any character that is not a lower-case vowel, and [^a-zA-Z&] matches any character that is not a letter or an ampersand.
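A few quick experiments with bracket expressions are sketched below. The input strings and the replacement characters are made up; they are only there to make the matches visible.

```shell
# A list: b, then any one lower-case vowel, then t.
# "boat" has two vowels between b and t, so it does not match.
masked=$(echo "bat bet bit bot but boat" | sed 's/b[aeiou]t/___/g')

# A range: any single digit.
digits=$(echo "Room 101" | sed 's/[0-9]/#/g')

# A negated list: any character that is not a lower-case vowel.
non_vowels=$(echo "queue" | sed 's/[^aeiou]/-/g')

echo "$masked"
echo "$digits"
echo "$non_vowels"
```

The three lines printed are ___ ___ ___ ___ ___ boat, Room ###, and -ueue.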
Exercise: How would you use sed to remove all characters that are not letters?
Replacement strings can also be used to specify a variety of ways of replacing a string that matches the pattern. If you put \(...\) around some part of the pattern, then the corresponding portion of the matching string can be used, using \1, in the replacement string. For example, suppose you have a file of postcodes (which are four-digit numbers in Australia), one per line, whose first digits have been erroneously recorded as 2 (for NSW) when they should be 3 (for Victoria), but are otherwise correct. Then you could correct your file by
$ sed 's/2\([0-9][0-9][0-9]\)/3\1/' filename
Here, the pattern matches every erroneous postcode, and the subpattern in \(...\) matches the last three digits of the postcode. The matched string (the entire erroneous postcode) will be replaced by 3 followed by the portion of the original matched string that matches the subpattern, i.e., by the last three digits of the original postcode.
[6] Jane Austen (1775–1817) is one of the most popular authors in English. Her first novel was published under the pseudonym “A Lady”.
[7] An apostrophe in the pattern is trickier. To match it, instead of just using a single apostrophe ', use '\''.
Exercise: Suppose you have a file of prices in dollars and cents, all under $100. Use sed to remove all cents from the prices.
Extension exercise: Consider how, instead, to round all prices to the nearest dollar. (This is considerably harder.)
The “1” in \1 indicates the string that matches the first subpattern (counting from left to right in the pattern). You can have up to nine subpatterns, referred to in the replacement string by \1, \2, ..., \9. For example,
$ sed 's/ \([0-9]\)\/\([0-9]\) / \2\/\1 /' filename
takes a fraction, with single-digit numerator and denominator and a space on each side, and swaps the numerator and denominator.
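Both of the subpattern examples can be tried on a single line of input, fed to sed through a pipe. The input strings below (one postcode and one short phrase containing a fraction) are made up for this sketch.

```shell
# One subpattern: keep the last three digits, replace the leading 2 by 3.
fixed=$(echo "2168" | sed 's/2\([0-9][0-9][0-9]\)/3\1/')

# Two subpatterns: \1 is the numerator, \2 the denominator; swap them.
swapped=$(echo "about 3/4 full" | sed 's/ \([0-9]\)\/\([0-9]\) / \2\/\1 /')

echo "$fixed"
echo "$swapped"
```

The output is 3168 followed by about 4/3 full.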
Exercise: Use sed to insert a space between every pair of adjacent letters in a file.
6 Searching with grep
If you don’t want to edit a file but just search it using patterns similar to those in sed, you can use the command grep:
$ grep pattern filename
Like many other Linux commands, grep will read from the standard input if you omit the filename, so you can use it in a pipe. The simplest use is to search for a word in a file. If you are in the directory lab0/example, the command
$ grep it Gadsby-para1
will output the five lines that contain the character sequence “it”.
As in sed, a pattern for use with grep can contain special characters like \t and \n and bracket expressions that describe character classes. In this case, you should put single quotes around the pattern as in
$ grep '[aeiou][aeiou]' Gadsby-para1
which searches the file Gadsby-para1 for lines containing two consecutive vowels.
If you look up the manpage for grep and scroll down you will see that the patterns it accepts are called regular expressions, and there’s a lot more to them than we are covering right now. We will study regular expressions in detail later in this unit.
For now, take note of two more special characters that may be useful in a pattern. If you begin your pattern with a caret “^”, it will only match lines that begin with the indicated pattern. Similarly, if you end it with a dollar sign “$”, lines have to end with the specified pattern. If you do both, the whole line has to be matched by the pattern. This isn’t very exciting when your pattern just consists of letters, but it can be when you use bracket expressions and other special features of regular expressions which we will encounter later.
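The effect of the caret and the dollar sign can be seen on a tiny word list. The file wordlist and the five words in it are invented for this sketch.

```shell
# Scratch directory and file name are placeholders for this sketch.
scratch=$(mktemp -d)
cd "$scratch"

printf 'item\nsplit\nit\nlimit\ntime\n' > wordlist

anywhere=$(grep 'it' wordlist)     # "it" anywhere in the line
starts=$(grep '^it' wordlist)      # line begins with "it"
ends=$(grep 'it$' wordlist)        # line ends with "it"
whole=$(grep '^it$' wordlist)      # the whole line is exactly "it"
s_or_t=$(grep '^[st]' wordlist)    # line begins with s or t

echo "$anywhere"
echo "$starts"
echo "$ends"
echo "$whole"
echo "$s_or_t"
```

The word time contains t and i but never the sequence it, so it only shows up in the last search.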
Figure 2: The Wordle puzzle of 15/7/2022, partially solved
Exercise: Use grep to find possible solutions to the Wordle [8] puzzle shown in Figure 2. (You are looking for a five-letter word that has the letter E in the two green positions indicated and does not use any of the grey letters.)
Hint: the file /usr/share/dict/words that comes with Linux has a list of English words. Use less to browse it and grep to cheat at Wordle.
[8] Wordle is a popular online word game developed by Josh Wardle and hosted by the New York Times. Go to https://www.nytimes.com/games/wordle/ to check out the rules and play it.
7 Frequency count
Let us now combine some of the tools and skills we have learned, in order to determine frequencies of words and letters in English text. This is a common task in studying the statistical properties of human language, and is useful in data compression, machine learning and cryptography.
First, find a good source of reasonably long English text files that have little or no formatting or mark-up. For example, look at one of Project Gutenberg’s lists, https://www.gutenberg.org/browse/scores/top, choose a book, then choose Plain Text, and download it. Upload the file to your Ed Workspace to work with it there.
Exercises:
1. From your input file, derive one with the same words, in the same order, but with each word on a separate line.
2. From this file, find one in which each word appears only once, and is accompanied by its frequency (i.e., the number of times it occurs in the file).
3. Sort the file of word frequencies in order of decreasing frequency.
4. Now do a frequency count of letters in the file. You should give the frequency count in two separate files: one with the letters in alphabetical order, the other with the letters ranked by frequency.
5. A digraph is a pair of consecutive letters. Do a frequency count of all digraphs in a file. (Overlapping digraphs are still counted separately. For example, if the file just consists of the single word dodo, then we have three digraphs, namely do, od, and do, so the frequency count should show that do has frequency 2 and od has frequency 1.)
To test your solutions, you can use small input files you write yourself, and/or the file provided in the lab0/examples directory. Once your method works on these small examples, try it on the large text file you downloaded earlier.
8 Challenges
1. Suppose we do ls | wc as above, followed by a sequence of further applications of | wc.
$ ls | wc
...
$ ls | wc | wc
...
$ ls | wc | wc | wc
...
Before trying it out, can you predict how many pipes are required before the output ceases to change, and what that output will be? Having made your prediction, try it out.
What if we had started out with some other command, instead of ls? How would you prove that, for every Linux command, continued application of | wc eventually produces this same fixed output?
2. Suppose you represent each word of a file by its position in the list of words ranked by frequency. So the most frequent word (which might be “the”, say) is represented by the number 1, and for all i, the i-th most frequent word is represented by the number i. If each word in your file is replaced by its corresponding number (with spaces, newlines and punctuation unchanged), how much compression of the file is achieved? It is possible to do a good estimate of this without actually implementing this compression.
3. How could you convert your frequency count files to ones where the frequencies are given as percentages rather than raw counts? Do not use a spreadsheet or write a program to do this. Read more about the capabilities of some of the Linux tools we have met, and other tools such as awk.
4. Read about how Simple Substitution cyphers work, and show how to implement Simple Substitution using tr.