Using Python to compute the proportion of words in a list that occur exactly once (a clean_up method is called to strip the trailing \n)

def hapax_legomena_ratio(text):
    """ (list of str) -> float
    Precondition: text is non-empty. Each str in text ends with \n and
    text contains at least one word.
    Return the hapax legomena ratio for text. This ratio is the number of
    words that occur exactly once divided by the total number of words.
    >>> text = ['James Fennimore Cooper\n', 'Peter, Paul, and Mary\n',
    ...         'James Gosling\n']
    >>> hapax_legomena_ratio(text)
    0.7777777777777778
    """
    t = []
    for string in text:
        t.append(clean_up(string))
    at_least_once = []   # only two lists may be created for the computation; this one holds words seen at least once
    at_least_twice = []  # this one holds words seen at least twice
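After that, I suppose the ratio is computed from the lengths of the two lists, but I don't know how to write the body in between.
I haven't learned regular expressions.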

Asked by zhangjiafu, 1 year ago (1 answer received)

Answer from 水里的一个tt人 (25 questions answered, acceptance rate: 92%)

I'm not sure what the clean_up function does, so the whole function below is written from scratch. Use it as a reference.

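def hapax_legomena_ratio(text):
    at_least_once = []   # words seen one or more times
    at_least_twice = []  # words seen two or more times
    total = 0            # total number of words
    for s in text:
        # Drop the trailing newline, then split the line on whitespace.
        for word in s.strip().split():
            word = word.strip('.,;')  # remove surrounding punctuation
            if not word:
                continue  # the token was punctuation only, so it is not a word
            total += 1
            if word not in at_least_once:
                at_least_once.append(word)
            elif word not in at_least_twice:
                at_least_twice.append(word)
    # 1.0 * forces float division under Python 2 as well.
    return (1.0 * (len(at_least_once) - len(at_least_twice))) / total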

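The code is simple. The first part splits each string into words and counts the total number of words (total += 1).
The second part is the core: it records whether each word has occurred at least once or at least twice.

    if word not in at_least_once:
        at_least_once.append(word)
    elif word not in at_least_twice:
        at_least_twice.append(word)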

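The final return statement computes the ratio of the number of words that occur exactly once (count of words seen at least once - count of words seen at least twice = count of words seen exactly once) to the total number of words.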
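To make the subtraction concrete, here is a small worked check on the words from the docstring example. It is only a sketch: the helper name `count_lists` is hypothetical and not part of the answer above, and the words are assumed to be already split and stripped of punctuation.

```python
def count_lists(words):
    """Return (at_least_once, at_least_twice) for a list of words."""
    at_least_once = []
    at_least_twice = []
    for word in words:
        if word not in at_least_once:
            at_least_once.append(word)      # first sighting
        elif word not in at_least_twice:
            at_least_twice.append(word)     # second sighting
    return at_least_once, at_least_twice


# The nine words of the docstring example, already cleaned (assumption).
words = ['James', 'Fennimore', 'Cooper', 'Peter', 'Paul', 'and', 'Mary',
         'James', 'Gosling']
once, twice = count_lists(words)

# 8 distinct words, of which only 'James' repeats, so 8 - 1 = 7 hapaxes.
hapax = len(once) - len(twice)
ratio = float(hapax) / len(words)   # 7 / 9
```

Here `once` holds 8 distinct words and `twice` holds only 'James', so 7 of the 9 words occur exactly once, which matches the 0.7777777777777778 in the docstring.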

Follow-up question from zhangjiafu, 1 year ago:

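Thank you, I really appreciate it! I understand the core part at the end, and I also understand the word-splitting part at the beginning. However, we haven't learned the split method yet. If I'm not allowed to use it, how can I split a string into words?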
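One possible approach without `split`, offered as a sketch rather than as the answerer's reply: walk the string one character at a time, collect letters and digits into the current word, and close the word whenever any other character appears. The function name `split_words` is hypothetical, and treating every non-alphanumeric character as a separator is an assumption (for example, it would break "don't" into "don" and "t").

```python
def split_words(s):
    """Split s into words without using str.split (illustrative helper)."""
    words = []
    current = ''                    # the word being built so far
    for ch in s:
        if ch.isalnum():
            current += ch           # still inside a word
        elif current:
            words.append(current)   # a separator ends the current word
            current = ''
    if current:
        words.append(current)       # last word if s has no trailing separator
    return words


example = split_words('Peter, Paul, and Mary\n')
```

Because spaces, commas, and the trailing newline all count as separators here, this single loop also covers the punctuation stripping that the answer does with `strip('.,;')`.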
