联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp

您当前位置:首页 >> CS作业CS作业

日期:2018-07-30 03:39


CS 335 2018 Page 1 of 5 A#1R v1

Word Frequency Table

Finding the most common words in a text file is a famous “programming pearl”,

originally posed by Jon Bentley (1984). Several interesting solutions have been

proposed by Knuth (an exquisite model of “literate programming”, 1986),

McIlroy (an engineering example of combining a timeless set of “UNIX tools”,

1986), Hanson (an alternate efficient solution, 1987).

This assignment requires a simple solution, comparable in style and performance

to McIlroy’s pipelined solution (though not as performant as the much more

complex solutions of Knuth and Hanson). For this, we will use a simple functional

programming style based on .NET LINQ framework.

Problem statement: Given a text file and an integer k, you are to print the

words (and their frequencies of occurrence) whose frequencies of occurrence are

the k largest (upper limit) in order of decreasing frequency; breaking any ties in

increasing words’ alphabetical order. Words are consecutive sequences of ASCII

alphabetical letters (A-Za-z), where letter case is ignored, and all other

characters are ignored. A short minimalistic example will clarify these specs.

Sample text file, called test1.txt:

五 π?ντε пять

dd00 DD?!

ccc.

AA aa, Aa.

CCC bb aA?

!

Sample result, for k=3: The output must be a text file with two columns. Extra

white spaces are permitted but discarded. Please use this format, because your

output will be assessed by a script, using whitespace tolerant textual

comparisons.

4 AA

2 CCC

2 DD

CS 335 2018 Page 2 of 5 A#1R v1

To understand this, let us consider the following intermediate steps.

Word list extracted from this text file:

dd DD ccc AA aa Aa CCC bb aA

Canonical word list – upper case versions to ignore the case:

DD DD CCC AA AA AA CCC BB AA

An unsorted frequency table:

4 AA

1 BB

2 CCC

2 DD

A sorted frequency table: 1st by decreasing frequencies, 2nd by increasing

alphabetical word order:

4 AA

2 CCC

2 DD

1 BB

Taking the first k=3 rows only, we obtain the above shown result.

Programs: (1) a C# solution (required); (2) a Node JS solution (required); and

(3) an F# solution (optionally, bonus). Essentially, each program must be totally

contained in one single file and use only the standard libraries extant in the labs.

The C# solution must use the .NET LINQ framework and System.IO.

The Node JS solution must only use linq-es2015 (external) and fs (internal).

The F# solution must only use the F# core modules and System.IO.

These programs must be named after your upi, e.g. jbon007.cs, jbon007.js,

jbon007.fs. The dropbox will reject programs that do not follow this convention.

Compilation: Assuming the C# and F# compilers are on the current PATH, the

required C# and the optional F# programs must be compilable in the labs with

these simple commands (no external libraries are permitted):

csc jbon007.cs

fsc jbon007.fs

The required Node JS program will not be compiled, but executed directly by

Node.

CS 335 2018 Page 3 of 5 A#1R v1

Execution – C# and F#

After the above compilation, the C# and F# programs will have a name like

jbon007.exe. This program will be given (1) a mandatory parameter, the text

file name (path); and (2) an optional parameter, the integer k, with the default

value k=3. If errors occur, an error line must appear. Sample runs and outputs:

>jbon007.exe test1.txt 2 all arguments given

4 AA

2 CCC

>jbon007.exe test1.txt 10 all arguments given

4 AA

2 CCC

2 DD

1 BB

>jbon007.exe test1.txt default value k=3

4 AA

2 CCC

2 DD

>jbon007.exe test1.txt x incorrect number k

*** Error: ...

>jbon007.exe testx.txt assuming that testx.txt does not exist

*** Error: ...

>jbon007.exe missing file name

*** Error: ...

In the error texts, in place of ellipses (...), display either system created

messages, or your custom messages (presumably more adequate). However, we

do not prescribe these texts. The important part is that the programs must catch

and display something meaningful for all possible errors.

CS 335 2018 Page 4 of 5 A#1R v1

Execution – Node JS

The JS program will be executed under Node, so the command lines will be

longer and errors may look differently - but still all errors must be caught.

>node jbon007.js test1.txt 2

4 AA

2 CCC

>node jbon007.js test1.txt 10

4 AA

2 CCC

2 DD

1 BB

>node jbon007.js test1.txt

4 AA

2 CCC

2 DD

>node jbon007.js test1.txt x

*** Error: ...

>node jbon007.js testx.txt

*** Error: ...

>node jbon007.js

*** Error: ...

CS 335 2018 Page 5 of 5 A#1R v1

Functional style: Besides the results’ accuracy, your submission will also be

assessed on its high-level functional style. For this assignment, we will be

specifically looking at a consistent formatting, a reasonable size, and at the

number of low-level explicit loops used, i.e. for, foreach, while statements.

Generally, a functional style naturally results in crisper and shorter programs

(without artificially squeezing many expressions on one single line). The model

solutions have about 50-60 lines, including a few blank lines inserted for

readability, and no explicit loops.

As we will see in the lectures, functional programming offers high-level functions

that encapsulate and hide most required loops.

Submission

Please submit to the Assignment Dropbox (https://adb.auckland.ac.nz/) the

following items:

(1) Your C# source file, upi.cs - required

(2) Your Node JS source file, upi.js - required

(3) Your F# source file, upi.fs – allowed (optional bonus)

References (just in case you are interested to read more about this problem

and its very efficient solutions)

Bentley, J., Knuth, D., McIlroy, D.: Programming pearls: A literate program.

Commun. ACM 29(6), 471-483 (Jun 1986),

http://doi.acm.org/10.1145/5948.315654

Appendix

A sample bash script solution, based on McIlroy’s pipeline of UNIX tools. Again,

no explicit loops. Just in case, as you do not have to study this if are not familiar

with UNIX.

cat test1.txt |

tr -cs A-Za-z '\n' |

awk 'NF' |

tr a-z A-Z |

gsort |

uniq -c |

gsort -k1rn -k2 |

head -3

#sed 3q


版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp