A folder contains about 1000 files. After running the following Python statements (repost) - Programming board

m**********r
Posts: 122

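[The following text is reposted from the DataSciences board]
From: milkrootbeer (milkbeer), Board: DataSciences
Subject: A folder contains about 1000 files. Calling the Python statements below produces the following errors. It seems to involve special characters; I tried other approaches, but none of them solved the problem.
Posted from: BBS 未名空间站 (Sat May 2 20:09:17 2015, US Eastern)
I have a folder with about 1000 files. When I run the Python statements below, I get the errors that follow. The cause seems to be special characters in the files; I have tried other approaches, but none of them fixed it.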
import os
DIR = r'C:\Users\Desktop\data\rec.sport.hockey'
posts = [open(os.path.join(DIR, f)).read() for f in os.listdir(DIR)]
x_train = vectorizer.fit_transform(posts)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 240: invalid start byte
Traceback (most recent call last):
  File "C:/Users/PycharmProjects/Project3/demo10.py", line 16, in <module>
    x_train = vectorizer.fit_transform(posts)
  File "C:\Users\AppData\Roaming\Python\Python27\site-packages\sklearn\feature_extraction\text.py", line 804, in fit_transform
    self.fixed_vocabulary_)
  File "C:\Users\AppData\Roaming\Python\Python27\site-packages\sklearn\feature_extraction\text.py", line 739, in _count_vocab
    for feature in analyze(doc):
  File "C:\Users\AppData\Roaming\Python\Python27\site-packages\sklearn\feature_extraction\text.py", line 236, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "C:\Users\AppData\Roaming\Python\Python27\site-packages\sklearn\feature_extraction\text.py", line 113, in decode
    doc = doc.decode(self.encoding, self.decode_error)
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 240: invalid start byte
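A possible workaround for the decode error, as a minimal sketch rather than a confirmed fix: byte 0x80 can never start a UTF-8 sequence, so at least one file in the folder is presumably not UTF-8. The sketch reads each file as raw bytes inside a with block and decodes it leniently. The helper name read_posts and its defaults are hypothetical, and it is written for Python 3, while the traceback above comes from Python 2.7.

```python
import os


def read_posts(directory, encoding="utf-8", errors="replace"):
    """Return the decoded text of every regular file in `directory`.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    posts = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        # Read raw bytes; the with-block closes the handle immediately.
        with open(path, "rb") as handle:
            raw = handle.read()
        # With errors="replace", undecodable bytes such as 0x80 become
        # U+FFFD instead of raising UnicodeDecodeError.
        posts.append(raw.decode(encoding, errors))
    return posts
```

With this, posts = read_posts(DIR) would take the place of the list comprehension. If the files are really in a single-byte encoding, read_posts(DIR, encoding="latin-1") keeps every byte instead of replacing it. The traceback also shows scikit-learn decoding with self.encoding and self.decode_error; assuming vectorizer is one of scikit-learn's text vectorizers, passing decode_error='replace' or a different encoding to its constructor may be enough on its own.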
Another attempt, this time with codecs.open:
import codecs
import os
DIR = r'C:\Users\Desktop\data\rec.sport.hockey'
posts = [codecs.open(os.path.join(DIR, f), 'r', 'utf-8') for f in os.listdir(DIR)]
x_train = vectorizer.fit_transform(posts)
Traceback (most recent call last):
  File "C:/Users/PycharmProjects/Project3/demo10.py", line 15, in <module>
    posts = [codecs.open(os.path.join(DIR,f),'r','utf-8') for f in os.listdir(DIR)]
  File "C:\Python27\lib\codecs.py", line 878, in open
    file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 24] Too many open files: 'C:\Users\Desktop\data\rec.sport.hockey\53909'
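This second error likely has a different cause from the first: the list comprehension stores the file objects returned by codecs.open, not their contents, so every file in the folder stays open at once until the process runs out of file handles. A minimal sketch of one way to avoid that, assuming this is the cause: a generator that opens, reads, and closes one file at a time. The name iter_posts is hypothetical, and errors="replace" is an assumption carried over to sidestep the earlier decode error.

```python
import io
import os


def iter_posts(directory, encoding="utf-8", errors="replace"):
    """Yield the text of each file, keeping at most one file open at a time.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        # io.open decodes while reading and exists on Python 2.7 and 3;
        # the with-block closes the handle before the next file is opened.
        with io.open(path, "r", encoding=encoding, errors=errors) as handle:
            yield handle.read()
```

scikit-learn's fit_transform takes an iterable of documents, so vectorizer.fit_transform(iter_posts(DIR)) should work without first building a list. Its text vectorizers also accept input='filename', in which case a list of paths can be passed and the vectorizer reads each file itself.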
