l***r Posts: 459 | 1 [ The following text is reposted from the Programming board; the original follows ]
From: laoer (You know what!), Board: Programming
Subject: a question on XML parser
Posted from: Unknown Space - 未名空间 (Tue Jun 15 22:29:14 2004) WWW-POST
Greetings,
I have several "<.." in one file. Right now, I first split this
file into many strings, where each string is one XML record. Then I use the
Java SAX parser to parse each one. Both the splitting and the parsing turn out
to be very slow. Is there a better way, such as parsing all the records in this
file in one pass?
Thanks | x***n Posts: 39 | 2 My guess is that your file operations are what take so long.
Post the snippet where you chop the file into pieces.
| z****g Posts: 2497 | 3 Why are you dividing the file into strings?
A SAX parser is a streaming parser: it works through the input sequentially
and reports each element as it goes.
If you use JDOM, it will read the whole file into memory.
For large files, a SAX parser performs better than a DOM
parser.
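To make the streaming point concrete, here is a minimal sketch of a single-pass SAX parse, assuming the input is one well-formed document. The class name `SaxCountSketch`, the method `countElements`, and the `record` element are made up for illustration; only the JAXP/SAX calls are standard.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not from the thread.
class SaxCountSketch {
    /** Streams through the input once and counts start tags named `name`. */
    static int countElements(InputStream in, String name) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Called once per start tag, in document order.
                if (qName.equals(name)) {
                    count[0]++;
                }
            }
        };
        // The parser pulls from the stream; no string splitting is needed.
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records><record/><record/></records>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(countElements(in, "record")); // prints 2
    }
}
```

For a real file, the same call would take a `FileInputStream` instead of the in-memory bytes used here.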
| l***r Posts: 459 | 4 Sorry, I guess I didn't describe the problem clearly. The XML file looks like
this:
"NCBI_BlastOutput.dtd">
...
"NCBI_BlastOutput.dtd">
...
"NCBI_BlastOutput.dtd">
...
| w*r Posts: 2421 | 5 OK, your XML file is not well-formed as it stands, so it can't be parsed as one document.
My suggestion is to write a class that first strips out all the document headers
and puts all the records, well-formed, into one file (or stream).
Then all you need to do is write an XSLT to transform the
XML into whatever format you want and parse it into your application.
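A rough sketch of such a header-stripping class, under some loud assumptions: every `<?xml ...?>` declaration and every `<!DOCTYPE ...>` starts on its own line, the DOCTYPE has no internal subset, and the records can live under one synthetic root element. The names `MergeRecordsSketch`, `merge`, and `AllRecords` are hypothetical. The wrapper root is needed because a well-formed document has exactly one root element.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper, not from the thread.
class MergeRecordsSketch {
    /**
     * Copies `in` to `out`, dropping every XML declaration and DOCTYPE,
     * and wrapping what remains in a single <rootName> element.
     */
    static void merge(BufferedReader in, Writer out, String rootName) throws IOException {
        out.write("<" + rootName + ">\n");
        boolean inDoctype = false;
        String line;
        while ((line = in.readLine()) != null) {
            String t = line.trim();
            if (inDoctype) {
                // Assumption: a multi-line DOCTYPE ends on a line ending with '>'.
                if (t.endsWith(">")) {
                    inDoctype = false;
                }
                continue;
            }
            if (t.startsWith("<?xml")) {
                continue; // repeated XML declaration
            }
            if (t.startsWith("<!DOCTYPE")) {
                inDoctype = !t.endsWith(">");
                continue;
            }
            out.write(line);
            out.write('\n');
        }
        out.write("</" + rootName + ">\n");
    }

    public static void main(String[] args) throws IOException {
        String src = "<?xml version=\"1.0\"?>\n<r>1</r>\n<?xml version=\"1.0\"?>\n<r>2</r>\n";
        StringWriter out = new StringWriter();
        merge(new BufferedReader(new StringReader(src)), out, "AllRecords");
        System.out.print(out);
    }
}
```

With a `FileReader` and `FileWriter` in place of the string-backed streams, the merged output can then go through a single SAX parse or an XSLT step.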
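| z****g Posts: 2497 | 6 Reread the SAX parser sample code.
Your understanding is wrong.
A SAX parser reads each element in sequence.
Also, your XML doc seems to have a problem:
why are there so many XML declarations and DTDs? They are like an HTML header;
there should be only one. | l***r Posts: 459 | 7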
Really? What's my mistake?
It should be no problem, because this file was created by a commercial program. And my
SAX parser works for this format.
| z****g Posts: 2497 | 8 It doesn't read chunk by chunk;
it reads in order, or line by line, so to speak.
DOM is the one where the whole file gets fed in.
Got it?
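For contrast with SAX, a minimal DOM sketch: the whole document is parsed into an in-memory tree before any of it can be queried. The class name `DomLoadSketch` and the `record` element are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not from the thread.
class DomLoadSketch {
    /** Builds the full tree first, then counts elements named `name`. */
    static int countElements(String xml, String name) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // parse() returns only after the entire input has been read into a tree.
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName(name).getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records><record/><record/></records>";
        System.out.println(countElements(xml, "record")); // prints 2
    }
}
```

Memory use here grows with the size of the document, which is why the streaming approach is usually preferred for large files.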
| x***n Posts: 39 | 9 1. Chop your monolithic(?) file (a collection of XML documents) into a
collection of XML files, and parse them one by one.
2. Find a fast way to feed one XML document (part of the file) to a parser,
then do a second parse for the second XML DOCUMENT (unfortunately
it's the second part of your physical file), and so on.
1 or 2.
| w*r Posts: 2421 | 10 Man, you've got no other choice. Your XML doc has multiple XML declaration
headers;
what can you expect from the parser? Magic? No! All you can do is
design your own 'feeder' for the parser: skip the declaration part and feed each
record to the parser. Both 1 and 2 will work; it just depends on how big your
file is. If it's millions and millions of records, I suggest 2; if it's a small
number of records, 1 is OK.
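One way such a 'feeder' (option 2) might look, as a sketch rather than a definitive implementation. Assumptions: each embedded document begins with an `<?xml` declaration at the start of a line, a single record fits comfortably in memory, and the DTD content is not needed, so the entity resolver answers every external-entity request with an empty source instead of fetching `NCBI_BlastOutput.dtd` once per record. The names `RecordFeederSketch` and `feed` are hypothetical, and the `<r>` and `<Hit>` elements in `main` are placeholders.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not from the thread.
class RecordFeederSketch {
    /** Parses each embedded document in turn; returns how many were parsed. */
    static int feed(BufferedReader in, DefaultHandler handler) throws Exception {
        // One parser instance is created up front and reused for every record.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // Assumption: the DTD is not needed, so never load it.
        reader.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));

        StringBuilder record = new StringBuilder();
        int parsed = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().startsWith("<?xml")) {
                parsed += flush(reader, record); // previous document is complete
            }
            record.append(line).append('\n');
        }
        return parsed + flush(reader, record);   // last document in the file
    }

    /** Parses the buffered text if it is non-blank, then clears the buffer. */
    private static int flush(XMLReader reader, StringBuilder record) throws Exception {
        String text = record.toString().trim();
        record.setLength(0);
        if (text.isEmpty()) {
            return 0;
        }
        reader.parse(new InputSource(new StringReader(text)));
        return 1;
    }

    public static void main(String[] args) throws Exception {
        String data = "<?xml version=\"1.0\"?>\n<r><Hit/></r>\n"
                    + "<?xml version=\"1.0\"?>\n<r><Hit/><Hit/></r>\n";
        final int[] hits = {0};
        DefaultHandler counter = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (qName.equals("Hit")) hits[0]++;
            }
        };
        int docs = feed(new BufferedReader(new StringReader(data)), counter);
        System.out.println(docs + " documents, " + hits[0] + " hits");
    }
}
```

The file is read once, front to back, and only one record is buffered at a time, so nothing is written to intermediate files.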
| x***n Posts: 39 | 11 Do you expect him to do a big project?
// BTW, I'm not the one doing this project.