This page is an excerpt and archive of the corresponding thread on MITBBS (未名空间).
Java board - A question about XML
h**********c
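Posts: 4120
1
I've been tinkering with JAXB for two days.
When marshalling, I don't understand what difference it makes to set UTF-8 versus UTF-16.
I tried UTF-16 and the generated file was all garbled.
Also, how does Java handle Chinese text?
If you know, please give me the basics. I don't want to go too deep; I just want some grounding for when I need it later.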
h**********c
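Posts: 4120
2
re

[Quoting h**********c's post]
: I've been tinkering with JAXB for two days.
: When marshalling, I don't understand what difference it makes to set UTF-8 versus UTF-16.
: I tried UTF-16 and the generated file was all garbled.
: Also, how does Java handle Chinese text?
: If you know, please give me the basics. I don't want to go too deep; I just want some grounding for when I need it later.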

g*****g
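Posts: 34805
3
UTF-8 is variable-length, with encodings of 1 to 3 bytes. UTF-16 is a fixed two-byte encoding.
Java always uses UTF-16 internally.

[Quoting h**********c's post]
: I've been tinkering with JAXB for two days.
: When marshalling, I don't understand what difference it makes to set UTF-8 versus UTF-16.
: I tried UTF-16 and the generated file was all garbled.
: Also, how does Java handle Chinese text?
: If you know, please give me the basics. I don't want to go too deep; I just want some grounding for when I need it later.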
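
As a quick check on the byte counts discussed above, here is a minimal Java sketch that measures how many bytes a few sample characters take in each encoding. The class and method names are illustrative and not from the thread. It suggests that UTF-8 ranges from 1 to 4 bytes, and that UTF-16 uses 2 bytes for most characters but 4 for characters outside the Basic Multilingual Plane, so neither encoding is strictly fixed-width.

```java
import java.nio.charset.StandardCharsets;

// Illustrative class name; not from the thread.
class EncodedLengths {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes when encoded as UTF-16 (big-endian, no byte-order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",                                      // U+0041, ASCII
            "\u00e9",                                 // U+00E9, Latin e with acute accent
            "\u4e2d",                                 // U+4E2D, a common Chinese character
            new String(Character.toChars(0x20000))    // U+20000, outside the BMP
        };
        for (String s : samples) {
            System.out.printf("U+%04X  UTF-8: %d bytes  UTF-16: %d bytes%n",
                    s.codePointAt(0), utf8Bytes(s), utf16Bytes(s));
        }
        // UTF-8 sizes printed:  1, 2, 3, 4
        // UTF-16 sizes printed: 2, 2, 2, 4
    }
}
```

For everyday Chinese text, nearly all of which lives in the Basic Multilingual Plane, this works out to 3 bytes per character in UTF-8 and 2 in UTF-16.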

F****n
Posts: 3271
4
Use UTF-8, not UTF-16. UTF-16 is based on the naive assumption that 2 bytes
are enough to encode all characters in the world. This has proved to be
wrong, and it has now become the source of a big mess in Java's internal
character representation, which is based on UTF-16.

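[Quoting h**********c's post]
: I've been tinkering with JAXB for two days.
: When marshalling, I don't understand what difference it makes to set UTF-8 versus UTF-16.
: I tried UTF-16 and the generated file was all garbled.
: Also, how does Java handle Chinese text?
: If you know, please give me the basics. I don't want to go too deep; I just want some grounding for when I need it later.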

c*****t
Posts: 1879
5
Without the naive assumption, most string computations will be too complex
and inefficient.
It is merely a trade-off people make.

[Quoting F****n's post]
: Use UTF-8, not UTF-16. UTF-16 is based on the naive assumption that 2 bytes
: are enough to encode all characters in the world. This has proved to be
: wrong, and it has now become the source of a big mess in Java's internal
: character representation, which is based on UTF-16.

F****n
Posts: 3271
6
Why?

[Quoting c*****t's post]
: Without the naive assumption, most string computations will be too complex
: and inefficient.
: It is merely a trade-off people make.

p*a
Posts: 592
7
Doesn't UTF-16 have surrogate pairs? Those let it support 4-byte characters.

[Quoting F****n's post]
: Use UTF-8, not UTF-16. UTF-16 is based on the naive assumption that 2 bytes
: are enough to encode all characters in the world. This has proved to be
: wrong, and it has now become the source of a big mess in Java's internal
: character representation, which is based on UTF-16.
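
To make the surrogate-pair point concrete, here is a small Java sketch (class and method names are illustrative, not from the thread). It assumes a string holding one ordinary Chinese character followed by one character outside the Basic Multilingual Plane, and compares the count of UTF-16 code units, which is what `String.length()` reports, with the count of Unicode code points.

```java
// Illustrative class name; not from the thread.
class SurrogateDemo {
    // Number of UTF-16 code units, i.e. Java chars.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (a surrogate pair counts once).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+4E2D fits in one char; U+20000 needs a surrogate pair.
        String s = "\u4e2d" + new String(Character.toChars(0x20000));

        System.out.println(codeUnits(s));                            // 3
        System.out.println(codePoints(s));                           // 2
        System.out.println(Character.isHighSurrogate(s.charAt(1)));  // true
        System.out.println(Character.isLowSurrogate(s.charAt(2)));   // true
        System.out.printf("U+%X%n", s.codePointAt(1));               // U+20000
    }
}
```

This is the trade-off debated in the thread: `charAt` stays a constant-time index into code units, but code that treats one `char` as one character can split a pair, so code-point-aware methods such as `codePointAt` and `codePointCount` are needed for text that may leave the BMP.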

m****r
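Posts: 6639
8
I suspect he saw garbage because he opened the file with something like Notepad, and since Notepad doesn't support UTF-16,
it showed up garbled. There's no reason UTF-8 would write correctly while UTF-16 comes out wrong.

[Quoting g*****g's post]
: UTF-8 is variable-length, with encodings of 1 to 3 bytes. UTF-16 is a fixed two-byte encoding.
: Java always uses UTF-16 internally.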

F****n
Posts: 3271
9
Then why use UTF-16 in the first place? The fact is that UTF-16 is an older,
broken system that people have to fix.

[Quoting p*a's post]
: Doesn't UTF-16 have surrogate pairs? Those let it support 4-byte characters.

F****n
Posts: 3271
10
UTF-16 has a byte-ordering problem: many programs assume a specific byte
order without marking it in their output files. That's why UTF-16 files
frequently run into this cross-program issue.

[Quoting m****r's post]
: I suspect he saw garbage because he opened the file with something like Notepad, and since Notepad doesn't support UTF-16,
: it showed up garbled. There's no reason UTF-8 would write correctly while UTF-16 comes out wrong.
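
The byte-ordering issue can be seen directly in Java. The sketch below (illustrative names, not from the thread) encodes the character U+4E2D with three standard charsets and prints the bytes in hex. It then decodes little-endian bytes as if they were big-endian, which is roughly what happens when a reader guesses the wrong byte order for a file that carries no byte-order mark.

```java
import java.nio.charset.StandardCharsets;

// Illustrative class name; not from the thread.
class ByteOrderDemo {
    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "\u4e2d"; // U+4E2D

        // "UTF-16" in Java writes a big-endian byte-order mark (FE FF) first.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16)));   // FE FF 4E 2D
        // The explicit-endian variants write no byte-order mark.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 4E 2D
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE))); // 2D 4E

        // Reading little-endian bytes as big-endian yields a different character.
        String wrong = new String(s.getBytes(StandardCharsets.UTF_16LE),
                                  StandardCharsets.UTF_16BE);
        System.out.printf("U+%04X%n", (int) wrong.charAt(0));           // U+2D4E
    }
}
```

UTF-8 has no such ambiguity because its code units are single bytes, which is one practical reason it tends to travel between programs more reliably.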

h**********c
Posts: 4120
11
Thanks for the replies.
This question is no longer my focus,
but as far as I remember (I'm not entirely sure), I ran the following tests:
Marshal to UTF-16, then print with a PrintWriter -> unreadable mess.
Marshal to UTF-8, where the Java String has Chinese characters, then write to a file -> every
Chinese character becomes '?'.
Environment: Win 7, Eclipse, English platform.
Using VC++ with MSXML on the same PC, there is no such problem. But MSXML has no option to
set UTF-16 or UTF-8.
Thanks again,
g*****g
Posts: 34805
12
You should decode and encode using UTF-8 for Chinese characters.
Once the text is encoded as UTF-8, you need the right editor to open it;
something like M$ Word should do.

[Quoting h**********c's post]
: thanks for the replies
: This question is no longer my focus.
: But I remember, not very sure, I did the following tests.
: Marshal to UTF-16, then print with printwriter -> unreadable mess.
: Marshal to UTF-8, java String has Chinese characters, writing to file, every
: Chinese character becomes '?'
: Win 7, Eclipse, English platform.
: if using VC++ msxml, same PC, no such problem. But msxml has no option to
: set UTF-16 or 8.
: Thanks again,
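
The '?' symptom and the suggested fix can both be reproduced without JAXB. The sketch below is an assumption-laden illustration rather than the original poster's code: the names are made up, an in-memory stream stands in for the output file, and ISO-8859-1 stands in for whatever default charset an English-language Windows setup would have used. It shows that a `Writer` bound to a charset with no Chinese characters silently substitutes '?', while naming UTF-8 explicitly on both the write and read side preserves the text. With JAXB, the analogous step would presumably be to set the `Marshaller.JAXB_ENCODING` property and marshal to an `OutputStream`, since a `Writer` applies its own charset regardless of what the XML declaration says; that part is not shown here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative names; the in-memory stream stands in for a FileOutputStream.
class ExplicitCharsetDemo {
    // Encodes text through a Writer bound to the given charset
    // and returns the bytes that were written.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, charset)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String xml = "<name>\u4e2d\u6587</name>";

        // A charset that cannot represent Chinese: each character becomes '?'.
        byte[] latin1 = write(xml, StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // <name>??</name>

        // UTF-8 on both the write and the read side: the text round-trips.
        byte[] utf8 = write(xml, StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(xml)); // true
    }
}
```

The design point is simply to never rely on the platform default charset: pass the charset explicitly wherever bytes and characters meet, and make sure the viewer or editor is told the same encoding.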
