
Date: 2024-06-09 05:39

Lab 05 - Wordcount program in Hadoop

Full name:

Student ID:

Tasks:

1. Open Eclipse inside the Cloudera platform

Note: If Eclipse is not installed in your Cloudera VM, install it first.

2. Create a Java project and a Java class file:

a. File > New > Java Project

b. Enter a project name, e.g., “WordCount”.

c. Click “Next”, and on the next page, click the “Libraries” tab.

d. Click the “Add External JARs” button.

e. Navigate to “File System > usr > lib > hadoop”.

f. Select all *.jar files and click “OK”.

g. Click the “Add External JARs” button again.

h. Navigate to “File System > usr > lib > hadoop > client”.

i. Select all *.jar files (Ctrl+A) and click “OK”.

j. Wait a moment while the JAR files load. Once they have been added to the list of libraries, click the “Finish” button.

k. In the “Package Explorer”, under the project you created above (WordCount), right-click src and select “New” > “Class”.

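l. In the Java Class window, enter a meaningful class name (e.g., WordCount) and click the “Finish” button. This creates a Java file with the given name (e.g., WordCount.java). If you use the tutorial source code unchanged, the class must be named WordCount, because Java requires a public class to be declared in a file with the same name.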

m. Delete the content of the newly created file, then copy the source code from the link below (or from the Resources section at the bottom of this document) and paste it into the file.

i. Source code link:

https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Source_Code

n. Check that Eclipse reports no errors in the code (errors are underlined in red) before you save it.

o. Right-click the project name “WordCount” in the “Package Explorer” panel and select “Export”.

p. Expand “Java” in the list, select “JAR file”, then click “Next”.

q. In the next window, click the “Browse…” button and navigate to a folder, e.g., /home/cloudera.

r. Enter a file name for the exported JAR file (e.g., WordCount.jar) and click “OK”.

s. Click “Finish” button

t. Check that the JAR file is now in the /home/cloudera folder.

3. Open a terminal.

4. Create a simple text file (you can use any of the methods you learned in previous labs):

a. Use the command: cat > /home/cloudera/Processfile.txt

b. Type a few lines, deliberately repeating some words.

c. To finish writing and close the file, press Ctrl+D at the start of a new line. (Ctrl+Z only suspends cat instead of ending it, so use Ctrl+D.)

d. Check that the file was created and that its content was saved: cat /home/cloudera/Processfile.txt

5. List the files and directories in the HDFS root: hdfs dfs -ls /

6. Create a folder in the HDFS root: hdfs dfs -mkdir /inputfolder

7. Check that the folder was created (for example, run hdfs dfs -ls / again).

8. Copy Processfile.txt from local storage into /inputfolder in HDFS:

a. Command: hdfs dfs -put /home/cloudera/Processfile.txt /inputfolder/

b. Check the content of file: hdfs dfs -cat /inputfolder/Processfile.txt

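9. Run the WordCount job on Hadoop:

a. Command: hadoop jar /home/cloudera/WordCount.jar WordCount /inputfolder/Processfile.txt /outputfolder

The arguments after the JAR file are the main class (WordCount), the input path and the output path.

b. Press Enter and wait for the job to finish. The output folder (/outputfolder) must not exist beforehand; Hadoop stops with an error rather than overwrite an existing output directory.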

10. Read the messages Hadoop prints during execution (the job progress, and the counters reported at the end), try to understand them, and write your findings in your report.

11. After the program finishes, check whether the /outputfolder directory in HDFS contains any output files: hdfs dfs -ls /outputfolder

12. Open the output file named like “part-r-00000” (or any other output file whose size is greater than zero), for example: hdfs dfs -cat /outputfolder/part-r-00000

13. Check the word counts and cross-check them against the content of Processfile.txt to confirm that the counting is correct.

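14. Run the same WordCount program on another text file of your choice. Avoid very large files, which can take a long time to process inside your VM. Upload the new file to HDFS as in step 8, and use a new output folder (for example /outputfolder2) or first delete the old one with hdfs dfs -rm -r /outputfolder, because the job fails if the output folder already exists.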

15. Review the source code used in task 2.m and explain your understanding of it in your report.

16. Save your lab document as a PDF named in the format below, and submit it via the submission link on MyTIMeS.

Filename format: Name_ID_Lab05
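Steps 10 and 13 ask you to interpret Hadoop's messages and to validate the counts. For a small input file, both can be predicted on a single machine. The sketch below is a hypothetical helper (the class name ExpectedOutput is an assumption, not part of the lab or of Hadoop). Assuming the default TextInputFormat (one map input record per line) and the same whitespace-only, case-sensitive StringTokenizer as TokenizerMapper, it prints the values you would expect for the “Map input records”, “Map output records” and “Reduce output records” counters, followed by the expected word counts in the word-TAB-count layout of part-r-00000.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/**
 * Hypothetical local cross-check for the WordCount lab (not part of Hadoop).
 * Assumes one map input record per line and the tutorial mapper's
 * whitespace-only, case-sensitive tokenization.
 */
public class ExpectedOutput {

    /** Word counts, sorted by key like the reducer output (same order for ASCII words). */
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer itr = new StringTokenizer(text); // splits on space, \t, \n, \r, \f
        while (itr.hasMoreTokens()) {
            String word = itr.nextToken();
            Integer old = counts.get(word);
            counts.put(word, old == null ? 1 : old + 1);
        }
        return counts;
    }

    /** Number of lines, i.e. the expected "Map input records" value. */
    static int countLines(String text) {
        int lines = 0;
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            while (reader.readLine() != null) {
                lines++;
            }
        } catch (IOException e) {
            throw new IllegalStateException(e); // cannot happen with an in-memory StringReader
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "/home/cloudera/Processfile.txt";
        String text = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        Map<String, Integer> counts = countWords(text);

        int tokens = 0;
        for (int c : counts.values()) {
            tokens += c; // every token produced one (word, 1) map output record
        }
        // Counter estimates go to stderr so stdout holds only the word counts.
        System.err.println("Map input records=" + countLines(text));
        System.err.println("Map output records=" + tokens);
        System.err.println("Reduce output records=" + counts.size());

        // Same "word<TAB>count" layout that the job writes to part-r-00000.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For example, run java ExpectedOutput /home/cloudera/Processfile.txt > expected.txt, copy the job output with hdfs dfs -cat /outputfolder/part-r-00000 > actual.txt, and compare the two files with diff; with the default single reducer and plain ASCII text, no output from diff means the counts agree. Because tokens are split on whitespace only, words such as “Hadoop”, “hadoop” and “Hadoop,” are counted separately, both here and in the Hadoop job.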

Resources:

Source code for WordCount from the hadoop.apache.org website:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: for every input line, emits (word, 1) for each whitespace-separated token.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sums all the counts received for one word.
  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: args[0] is the input path, args[1] the output path (which must not exist yet).
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
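To help with step 15, the following plain-Java sketch imitates the data flow that this job configures, without needing Hadoop: each input split is mapped and combined locally (TokenizerMapper followed by IntSumReducer as the combiner), the partial counts are grouped and sorted by word (the shuffle), and the reducer sums them. The class WordCountSimulation and the two sample splits are illustrative assumptions; real Hadoop runs these phases in separate tasks, writes intermediate data to disk, and may run the combiner zero or more times.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Hypothetical in-memory model of map -> combine -> shuffle/sort -> reduce for WordCount. */
public class WordCountSimulation {

    /** Map + combine for one split: emit (word, 1) per token, then sum the 1s per word. */
    static Map<String, Integer> mapAndCombine(List<String> splitLines) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String line : splitLines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                String word = itr.nextToken();      // mapper emits (word, 1)
                Integer old = partial.get(word);    // combiner adds it to this split's total
                partial.put(word, old == null ? 1 : old + 1);
            }
        }
        return partial;
    }

    /** Shuffle/sort: gather every partial count under its word, with keys in sorted order. */
    static Map<String, List<Integer>> shuffle(List<Map<String, Integer>> mapOutputs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map<String, Integer> output : mapOutputs) {
            for (Map.Entry<String, Integer> e : output.entrySet()) {
                List<Integer> values = grouped.get(e.getKey());
                if (values == null) {
                    values = new ArrayList<>();
                    grouped.put(e.getKey(), values);
                }
                values.add(e.getValue());
            }
        }
        return grouped;
    }

    /** Reduce: sum the values for each key, as IntSumReducer.reduce does. */
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two hypothetical input splits, as if the file were read by two map tasks.
        List<String> split1 = Arrays.asList("to be or", "not to be");
        List<String> split2 = Arrays.asList("to do is to be");
        List<Map<String, Integer>> mapOutputs =
                Arrays.asList(mapAndCombine(split1), mapAndCombine(split2));
        Map<String, List<Integer>> grouped = shuffle(mapOutputs);
        System.out.println("after shuffle: " + grouped);
        System.out.println("after reduce:  " + reduce(grouped));
    }
}
```

Running it shows “be” and “to” reaching the reducer as two partial counts each ([2, 1] and [2, 2]) before they are summed to 3 and 4. This is also why IntSumReducer can double as the combiner: addition is associative and commutative, so summing partial sums gives the same total, whereas a reducer that computed, say, an average could not be reused this way.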
