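h**********c Posts: 4120 | 1 I've been tinkering with JAXB for two days.
When marshalling, I don't understand what difference it makes to set UTF-8 versus UTF-16.
I tried UTF-16 and the generated file came out as complete garbage.
Also, how does Java handle Chinese text?
If you know, please give me a quick primer. I don't want to go too deep, just enough to have some grounding when I need it later. |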
h**********c Posts: 4120 | 2 re
[In reply to h**********c]
|
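g*****g Posts: 34805 | 3 UTF-8 is variable length, using 1 to 4 bytes per character. UTF-16 uses 2-byte code units: most characters take one unit, and the rest take a surrogate pair (4 bytes).
Internally, Java strings are always sequences of UTF-16 code units.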
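To make the size difference concrete, here is a minimal sketch that encodes a few sample characters both ways and counts the bytes. The class name `EncodedLengths` and its helper methods are made up for illustration; only `String.getBytes` and `StandardCharsets` come from the JDK. `UTF_16BE` is used so that no byte-order mark is counted.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class EncodedLengths {
    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes when encoded as big-endian UTF-16 (no byte-order mark).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file pure ASCII.
        String[] samples = {
            "A",            // U+0041, Latin letter
            "\u00E9",       // U+00E9, e with acute accent
            "\u4E2D",       // U+4E2D, a common Chinese character
            "\uD840\uDC00"  // U+20000, a rare CJK character outside the BMP
        };
        for (String s : samples) {
            System.out.printf("U+%04X: UTF-8 = %d bytes, UTF-16 = %d bytes%n",
                    s.codePointAt(0), utf8Length(s), utf16Length(s));
        }
    }
}
```

For ordinary Chinese text this means roughly 3 bytes per character in UTF-8 against 2 in UTF-16, while ASCII-heavy XML markup is half the size in UTF-8.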
[In reply to h**********c]
|
F****n Posts: 3271 | 4 Use UTF-8, not UTF-16. UTF-16 is based on the naive assumption that 2 bytes
are enough to encode all characters in the world. This has proved to be
wrong, and it has now become the source of a big mess in Java's internal
character representation, which is based on UTF-16.
[In reply to h**********c]
|
c*****t Posts: 1879 | 5 Without that naive assumption, most string computations would be too complex
and inefficient.
It is merely a trade-off people make.
[In reply to F****n]
|
F****n Posts: 3271 | 6 Why?
[In reply to c*****t]
|
p*a Posts: 592 | 7 Doesn't UTF-16 have surrogate pairs? With those it can cover characters that need 4 bytes.
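A small sketch of what a surrogate pair looks like from Java, assuming U+20000 (a rare CJK character outside the Basic Multilingual Plane) as the sample. The class `SurrogateDemo` and its helpers are hypothetical names; `length`, `codePointCount`, `codePointAt`, and `Character.isHighSurrogate`/`isLowSurrogate` are standard JDK methods.

```java
// Hypothetical demo class, for illustration only.
class SurrogateDemo {
    // U+20000 written as its two UTF-16 code units (high and low surrogate).
    static final String RARE_HAN = "\uD840\uDC00";

    // String.length() counts UTF-16 code units, not characters.
    static int codeUnits(String s) {
        return s.length();
    }

    // codePointCount() counts whole Unicode code points.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println("code units : " + codeUnits(RARE_HAN));
        System.out.println("code points: " + codePoints(RARE_HAN));
        System.out.printf("code point : U+%X%n", RARE_HAN.codePointAt(0));
        System.out.println("high surrogate first? "
                + Character.isHighSurrogate(RARE_HAN.charAt(0)));
        System.out.println("low surrogate second? "
                + Character.isLowSurrogate(RARE_HAN.charAt(1)));
    }
}
```

This is also where the mess comes from in practice: `charAt` and `length` work on code units, so code that assumes one `char` per character silently mishandles such text.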
[In reply to F****n]
|
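m****r Posts: 6639 | 8 I suspect he saw garbage because he opened the file with Notepad or a similar editor that did not read it as UTF-16.
There is no reason writing would work with UTF-8 and come out wrong with UTF-16.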
[In reply to g*****g]
|
F****n Posts: 3271 | 9 Then why UTF-16 in the first place? The fact is that UTF-16 is an older, broken
system that people have had to patch.
[In reply to p*a]
|
F****n Posts: 3271 | 10 UTF-16 has a byte-ordering problem: many programs assume a specific byte
order without marking it in the output file. That's why UTF-16 files
frequently have cross-program issues.
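A rough sketch of the byte-order issue, using the letter "A" as the sample. The class `Utf16ByteOrder` and its `hex` helper are invented for illustration; the three charsets are the standard JDK ones. In Java, the plain `UTF_16` charset writes a big-endian byte-order mark (FE FF) when encoding, while `UTF_16LE` and `UTF_16BE` write none.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class Utf16ByteOrder {
    // Render bytes as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "A";
        System.out.println("UTF-16  : " + hex(s.getBytes(StandardCharsets.UTF_16)));
        System.out.println("UTF-16LE: " + hex(s.getBytes(StandardCharsets.UTF_16LE)));
        System.out.println("UTF-16BE: " + hex(s.getBytes(StandardCharsets.UTF_16BE)));

        // Little-endian bytes with no byte-order mark, decoded by a reader
        // that falls back to big-endian: the two bytes get swapped.
        byte[] littleEndian = s.getBytes(StandardCharsets.UTF_16LE);
        String misread = new String(littleEndian, StandardCharsets.UTF_16);
        System.out.printf("misread as: U+%04X%n", (int) misread.charAt(0));
    }
}
```

The last line shows the failure mode: the same two bytes come back as an unrelated character when writer and reader disagree on byte order and there is no mark to settle it.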
[In reply to m****r]
|
h**********c Posts: 4120 | 11 Thanks for the replies.
This question is no longer my focus,
but as I remember (not very sure), I ran the following tests:
Marshal to UTF-16, then print with PrintWriter -> unreadable mess.
Marshal to UTF-8, where the Java String has Chinese characters, then write to a file -> every
Chinese character becomes '?'.
Environment: Win 7, Eclipse, English platform.
Using VC++ with MSXML on the same PC, there was no such problem, but MSXML has no option to
set UTF-16 or UTF-8.
Thanks again, |
g*****g Posts: 34805 | 12 You should both encode and decode with UTF-8 for Chinese characters.
Once the file is encoded as UTF-8, you need an editor that reads UTF-8 to open it;
MS Word should work.
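A minimal sketch of the "encode and decode with the same charset" advice, under the assumption that the '?' characters came from writing through a charset that cannot represent Chinese (for example the platform default on an English Windows install). `US_ASCII` stands in for such a charset here, and the class `Utf8RoundTrip` and its methods are made-up names. With JAXB, the corresponding setting is the `Marshaller.JAXB_ENCODING` property, and marshalling to an `OutputStream` rather than a `Writer` should let the marshaller do the byte encoding itself; that part is not shown because it needs the JAXB library on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, for illustration only.
class Utf8RoundTrip {
    // Write the text to a temp file as UTF-8, then read it back as UTF-8.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("utf8-demo", ".xml");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Push the text through a charset with no Chinese characters.
    // getBytes() substitutes '?' for every character it cannot map.
    static String throughAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String chinese = "\u4E2D\u6587"; // the two characters of "Chinese (language)"
        System.out.println("UTF-8 round trip intact: " + roundTrip(chinese).equals(chinese));
        System.out.println("via ASCII: " + throughAscii(chinese));
    }
}
```

The point of the design is that no call relies on the platform default charset: every conversion between `String` and bytes names its charset explicitly.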
[In reply to h**********c]
|