# 08 Distributed Computing with MapReduce: Word Frequency Statistics


WordCount program tasks:

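1. Use the programming environment you are most familiar with to write a non-distributed word frequency program.

• Read the file
• Tokenize (text.split, producing a list)
• Count by word (a dictionary: the key is the word, the value is its number of occurrences)
• Sort (list.sort on the list)
• Output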

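```python
import re
import sys

# Read the whole input file (assumption: its path is given as the first command-line argument)
with open(sys.argv[1], encoding="utf-8") as f:
    text = f.read()
print(text)

# Use a regular expression to replace every run of non-letter characters with a space
text = re.sub(r"[^a-zA-Z]+", " ", text)
print("Filtered string:", text)

# Split into a list of words; split() with no argument also drops the empty
# strings left by leading/trailing spaces (list.remove("") would drop only one)
words = text.split()
print("Split into list:", words)

# Count every word in a single pass; dicts keep insertion order (Python 3.7+),
# so the keys stay in order of first appearance
words_dict = {}
for word in words:
    words_dict[word] = words_dict.get(word, 0) + 1

# The distinct words, in order of first appearance
dict_keys = list(words_dict)
print("Key list:", dict_keys)
print("Dictionary:", words_dict)

# Default sort of the distinct words: uppercase letters come before lowercase
dict_keys.sort()
print("Sorted alphabetically:", dict_keys)
```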

Running the program prints:

There are several ways to create and maintain a harmonious dormitory life. Firstly, you have to evaluate your life style and try to get rid of your dirty habits, if there are any. In conclusion, we should try our best to build a harmonious dormitory life for the sake of good study and good life.

Filtered string: There are several ways to create and maintain a harmonious dormitory life Firstly you have to evaluate your life style and try to get rid of your dirty habits if there are any In conclusion we should try our best to build a harmonious dormitory life for the sake of good study and good life

Split into list: ['There', 'are', 'several', 'ways', 'to', 'create', 'and', 'maintain', 'a', 'harmonious', 'dormitory', 'life', 'Firstly', 'you', 'have', 'to', 'evaluate', 'your', 'life', 'style', 'and', 'try', 'to', 'get', 'rid', 'of', 'your', 'dirty', 'habits', 'if', 'there', 'are', 'any', 'In', 'conclusion', 'we', 'should', 'try', 'our', 'best', 'to', 'build', 'a', 'harmonious', 'dormitory', 'life', 'for', 'the', 'sake', 'of', 'good', 'study', 'and', 'good', 'life']

Key list: ['There', 'are', 'several', 'ways', 'to', 'create', 'and', 'maintain', 'a', 'harmonious', 'dormitory', 'life', 'Firstly', 'you', 'have', 'evaluate', 'your', 'style', 'try', 'get', 'rid', 'of', 'dirty', 'habits', 'if', 'there', 'any', 'In', 'conclusion', 'we', 'should', 'our', 'best', 'build', 'for', 'the', 'sake', 'good', 'study']

Dictionary: {'There': 1, 'are': 2, 'several': 1, 'ways': 1, 'to': 4, 'create': 1, 'and': 3, 'maintain': 1, 'a': 2, 'harmonious': 2, 'dormitory': 2, 'life': 4, 'Firstly': 1, 'you': 1, 'have': 1, 'evaluate': 1, 'your': 2, 'style': 1, 'try': 2, 'get': 1, 'rid': 1, 'of': 2, 'dirty': 1, 'habits': 1, 'if': 1, 'there': 1, 'any': 1, 'In': 1, 'conclusion': 1, 'we': 1, 'should': 1, 'our': 1, 'best': 1, 'build': 1, 'for': 1, 'the': 1, 'sake': 1, 'good': 2, 'study': 1}

Sorted alphabetically: ['Firstly', 'In', 'There', 'a', 'and', 'any', 'are', 'best', 'build', 'conclusion', 'create', 'dirty', 'dormitory', 'evaluate', 'for', 'get', 'good', 'habits', 'harmonious', 'have', 'if', 'life', 'maintain', 'of', 'our', 'rid', 'sake', 'several', 'should', 'study', 'style', 'the', 'there', 'to', 'try', 'ways', 'we', 'you', 'your']
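The program above stops after sorting the distinct words, so it never prints each word next to its count, which is what the final "Output" step calls for. A minimal sketch that maps each of the five listed steps onto its own function, assuming UTF-8 input and a tab as the separator between word and count (the function names are illustrative, not taken from the original):

```python
import re


def read_file(path):
    """Step 1: read the whole file into a single string."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def tokenize(text):
    """Step 2: turn every run of non-letters into a space, then split into words."""
    return re.sub(r"[^a-zA-Z]+", " ", text).split()


def count_words(words):
    """Step 3: build a dict mapping each word to its number of occurrences."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def sort_words(counts):
    """Step 4: (word, count) pairs ordered by word with list.sort (case-sensitive)."""
    pairs = list(counts.items())
    pairs.sort()
    return pairs


def format_output(pairs):
    """Step 5: one line per word, with word and count separated by a tab."""
    return "\n".join(f"{word}\t{count}" for word, count in pairs)
```

A full run would be `print(format_output(sort_words(count_words(tokenize(read_file("words.txt"))))))`, where `words.txt` is a placeholder file name. Keeping each step in its own function makes it easy to see which parts (tokenizing and counting) become the map and reduce phases in task 2.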

2. Use MapReduce to implement word frequency statistics.
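In MapReduce the same count is split into a map phase that emits a `(word, 1)` pair for every word, a shuffle that sorts and groups the pairs by key, and a reduce phase that sums each group. Hadoop Streaming lets the mapper and reducer be ordinary scripts that read lines from standard input and write tab-separated key/value lines to standard output, with the framework sorting the map output by key before the reducer sees it. The sketch below keeps that line format but runs all three phases in one process; the function names are illustrative, and on a real cluster `mapper` and `reducer` would be separate scripts reading `sys.stdin`:

```python
import re
from itertools import groupby


def mapper(lines):
    """Map phase: emit a 'word<TAB>1' record for every word in the input lines."""
    for line in lines:
        for word in re.findall(r"[a-zA-Z]+", line):
            yield f"{word}\t1"


def shuffle(records):
    """Local stand-in for the framework's shuffle/sort: order records by key."""
    return sorted(records, key=lambda record: record.split("\t", 1)[0])


def reducer(sorted_records):
    """Reduce phase: sum the values of consecutive records that share a key."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(value) for _, value in group)}"


def run_local(lines):
    """Chain map -> shuffle -> reduce in one process, like `cat file | map | sort | reduce`."""
    return list(reducer(shuffle(mapper(lines))))
```

The reducer only compares each record with the previous one, so it is correct only because the shuffle delivers all records for a word consecutively; that sorted-by-key guarantee is what lets the reduce phase run on many machines, each handling its own range of keys.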


Specification of the WordCount program:

• Program: WordCount
• Input: a text file containing a large number of words
• Output: every word in the file with its number of occurrences (frequency), in alphabetical order; each word and its frequency occupy one line, with a separator between the word and the frequency
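The specification asks for alphabetical order, whereas the default `list.sort` used in task 1 places every capitalized word before all lowercase ones and counts `There` and `there` as different words. A minimal sketch, assuming words should be compared case-insensitively (folded to lowercase), that a tab separates word and frequency, and that the input is consumed line by line so memory grows with the number of distinct words rather than with the file size (the function names are illustrative):

```python
import re
from collections import Counter

# Same letters-only notion of a "word" as in task 1
WORD = re.compile(r"[a-zA-Z]+")


def word_frequencies(lines):
    """Count words case-insensitively, consuming the input one line at a time."""
    counts = Counter()
    for line in lines:
        counts.update(word.lower() for word in WORD.findall(line))
    return counts


def report_lines(counts):
    """Format 'word<TAB>frequency' lines in alphabetical order."""
    return [f"{word}\t{counts[word]}" for word in sorted(counts)]
```

Passing an open file works directly, e.g. `with open("words.txt", encoding="utf-8") as f: print("\n".join(report_lines(word_frequencies(f))))`, where `words.txt` is a placeholder; iterating over a text file yields one line at a time, so the whole file is never loaded at once.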


Task 1 was completed in Python, with reference to the cnblogs.com post "Python counts the number of English words and generates the results into a dictionary" by the author listed as "the past years of water".


