由买买提看人间百态

boards

本页内容为未名空间相应帖子的节选和存档,一周内的贴子最多显示50字,超过一周显示500字 访问原贴
JobHunting版 - 一道google面经题
相关主题
一道google的面试题.g家 如何判断一个字符串里的都是合法的UTF-8编码in java?
到底什么样的二叉树才是平衡二叉树?请问一道G phone interview题
弱问题,连反转链表都看不懂了问两个题,一直不知道如何做
请教电面试题怎么把 integer 转为 multi-byte integer format? (转载)
Google Intern判断 bst 疑问
有道面试题是 UTF-8 string, 大家知道原题是什么吗?记得有人报过,但是不知道具体是干嘛?remove a node in O(1) from link list
Google电面题 写dominoChecker class看到一个c的面试题,求教。
刚结束大G面试新鲜Amazon面经(附参考答案) 顺便求各种大公司refer
相关话题的讨论汇总
话题: byte话题: type话题: rune话题: return话题: int
进入JobHunting版参与讨论
1 (共1页)
s*******3
发帖数: 6
1
UTF-8 encoding scheme is described below:
0XXXXXXX = this is the entire rune
10XXXXXX = this is a continuation of the rune from the previous byte
110XXXXX = this is the start of a 2-byte rune.
1110XXXX = this is the start of a 3-byte rune.
11110XXX = this is the start of a 4-byte rune.
111110XX = this is the start of a 5-byte rune.
1111110X = this is the start of a 6-byte rune.
11111110 = this is the start of a 7-byte rune.
11111111 = this is the start of a 8-byte rune.
For example, a 3-byte rune would be 1110XXXX 10XXXXXX 10XXXXXX.
Write a function that decides whether a given byte array (or string) is
valid UTF-8 encoded text.
求讨论求指导
x***j
发帖数: 75
2
没太看懂
l*****a
发帖数: 14598
3
那就去学习一下UTF-8的定义

【在 x***j 的大作中提到】
: 没太看懂
l*****a
发帖数: 14598
4
通过这个识别每个byte的type,再check是否满足规则?
public int getType(byte input) {
int target=1<<7;
int type=0;
while(target>0) {
if(intput & target==0) return type;
type++;
target>=1;
}
}

【在 s*******3 的大作中提到】
: UTF-8 encoding scheme is described below:
: 0XXXXXXX = this is the entire rune
: 10XXXXXX = this is a continuation of the rune from the previous byte
: 110XXXXX = this is the start of a 2-byte rune.
: 1110XXXX = this is the start of a 3-byte rune.
: 11110XXX = this is the start of a 4-byte rune.
: 111110XX = this is the start of a 5-byte rune.
: 1111110X = this is the start of a 6-byte rune.
: 11111110 = this is the start of a 7-byte rune.
: 11111111 = this is the start of a 8-byte rune.

l*******a
发帖数: 16
5
Nice~
public boolean validUTF8(byte [] array){
if (array == null || array.length == 0){
return false;
}
int leftByte = 0;
for (int i = 0; i < array.length; i++){
int val = getType(array[i]);
if (leftByte == 0){ // new start
if (val == 1){ // continue byte
return false;
}
else if (val == 0){
continue;
}
else {
leftByte = val-1;
}
}
else{ // shall continue
if (val != 1){
return false;
}
else{
leftByte--;
}
}
}
if (leftByte == 0){
return true;
}
else{
return false;
}
}

public int getType(byte input) {
int target=1<<7;
int type=0;
while(target>0) {
if((input & target)==0) return type;
type++;
target = target>>1;
}
return type;
}

【在 l*****a 的大作中提到】
: 通过这个识别每个byte的type,再check是否满足规则?
: public int getType(byte input) {
: int target=1<<7;
: int type=0;
: while(target>0) {
: if(intput & target==0) return type;
: type++;
: target>=1;
: }
: }

p**p
发帖数: 742
6
这样做不是最优吧。没必要把8位都循环一遍。
遇到第一个0就可以返回了:
private int getType(byte b) {
int mask = 1 << 7;
int type = 0;
while((mask & b) != 0) {
type++;
mask >>= 1;
}
return type;
}

【在 l*****a 的大作中提到】
: 通过这个识别每个byte的type,再check是否满足规则?
: public int getType(byte input) {
: int target=1<<7;
: int type=0;
: while(target>0) {
: if(intput & target==0) return type;
: type++;
: target>=1;
: }
: }

p**p
发帖数: 742
7
看错了。你的方法也是第一个0就返回。

【在 p**p 的大作中提到】
: 这样做不是最优吧。没必要把8位都循环一遍。
: 遇到第一个0就可以返回了:
: private int getType(byte b) {
: int mask = 1 << 7;
: int type = 0;
: while((mask & b) != 0) {
: type++;
: mask >>= 1;
: }
: return type;

p*****2
发帖数: 21240
8
我回家写一下

【在 l*****a 的大作中提到】
: 通过这个识别每个byte的type,再check是否满足规则?
: public int getType(byte input) {
: int target=1<<7;
: int type=0;
: while(target>0) {
: if(intput & target==0) return type;
: type++;
: target>=1;
: }
: }

p**p
发帖数: 742
9
private int getType(byte b) {
int mask = 1 << 7;
int type = 0;
while((mask & b) != 0) {
type++;
mask >>= 1;
}
return type;
}

public boolean isValidUTF8(byte[] bytes) {
if(bytes == null || bytes.length == 0) {
return false;
}
int bytesLeft = 0;
for(byte b : bytes) {
int type = getType(b);
if(type == 0) {
if(bytesLeft != 0) {
return false;
}
continue;
} else if (type == 1) {
if(bytesLeft == 0) {
return false;
}
bytesLeft--;
} else {
if(bytesLeft != 0) {
return false;
}
bytesLeft = type-1;
}
}
return bytesLeft == 0;
}

【在 s*******3 的大作中提到】
: UTF-8 encoding scheme is described below:
: 0XXXXXXX = this is the entire rune
: 10XXXXXX = this is a continuation of the rune from the previous byte
: 110XXXXX = this is the start of a 2-byte rune.
: 1110XXXX = this is the start of a 3-byte rune.
: 11110XXX = this is the start of a 4-byte rune.
: 111110XX = this is the start of a 5-byte rune.
: 1111110X = this is the start of a 6-byte rune.
: 11111110 = this is the start of a 7-byte rune.
: 11111111 = this is the start of a 8-byte rune.

n**p
发帖数: 1150
10
while(s != NULL && *s != NULL)
{
if (*s & 0x80 == 0) {s++; continue;}
if (*s & 0xC0 == 0x80) return false;
char* t = s+1;
for(int bitmask = 0x40;
bitmask >0 && (*s & bitmask != 0);
bitmask /= 2, t++)
if (*t == NULL || (*t & 0xC0 != 0x80)) return false;
s = t + 1;
}
return true;

【在 s*******3 的大作中提到】
: UTF-8 encoding scheme is described below:
: 0XXXXXXX = this is the entire rune
: 10XXXXXX = this is a continuation of the rune from the previous byte
: 110XXXXX = this is the start of a 2-byte rune.
: 1110XXXX = this is the start of a 3-byte rune.
: 11110XXX = this is the start of a 4-byte rune.
: 111110XX = this is the start of a 5-byte rune.
: 1111110X = this is the start of a 6-byte rune.
: 11111110 = this is the start of a 7-byte rune.
: 11111111 = this is the start of a 8-byte rune.

p*****2
发帖数: 21240
11
def isUTF(xs: List[String]):Boolean = {
xs match {
case Nil => true
case head::tail => head.toList match {
case '0'::_ => isUTF(xs.tail)
case '1'::'0'::_ => false
case _ =>
val len = (x: String) => x.prefixLength(_ == '1')
val l = len(head)
val rest = tail.take(l-1)
rest.size == l-1 && rest.forall(len(_)==1) && isUTF(xs.drop(l))
}
}
}
l*********8
发帖数: 4642
12
bool isUTF8(const string & str) {
int ct = 0;
for (char ch : str) {
int type = 8;
for (ch = ~ch; ch; ch >>= 1)
--type;
if (ct > 0) {
if (type != 1) return false;
--ct;
} else {
if (type == 1) return false;
ct = max(0, type - 1);
}
}
return ct == 0;
}
1 (共1页)
进入JobHunting版参与讨论
相关主题
新鲜Amazon面经(附参考答案) 顺便求各种大公司referGoogle Intern
google 电面面经有道面试题是 UTF-8 string, 大家知道原题是什么吗?记得有人报过,但是不知道具体是干嘛?
crack code interview 4.7 给的答案是对的么Google电面题 写dominoChecker class
Programming interview exposed 上面的那道NULL or Cycle的linked list题刚结束大G面试
一道google的面试题.g家 如何判断一个字符串里的都是合法的UTF-8编码in java?
到底什么样的二叉树才是平衡二叉树?请问一道G phone interview题
弱问题,连反转链表都看不懂了问两个题,一直不知道如何做
请教电面试题怎么把 integer 转为 multi-byte integer format? (转载)
相关话题的讨论汇总
话题: byte话题: type话题: rune话题: return话题: int