那个快？ - Java版 - 未名存档

本页内容为未名空间相应帖子的节选和存档，一周内的贴子最多显示50字，超过一周显示500字访问原贴

Java版 - 那个快？

相关主题
● Java Map 存 1 million 记录	● treemap和hashma p的问题
● 出个简单题，看你Java APi熟悉到什么程度	● JDK Source?
● Interview的人问我最新java 版本是多少	● 问一个很基本很基本很基本的API问题
● Oracle的jvm收费版本	● java知道一个reference怎么删掉它指向的内存空间？ (转载)
● JDK8要出来了？	● 一个简单的关于java Map的问题
● java8的default出来之后	● JDK HashMap 的 Entry class
● 现在做JAVA开发的用什么系统可靠点啊?	● 一个Java程序员的话(3)
● 几个Java面试题	● 有没有 replaceall for String?

相关话题的讨论汇总
话题: hash话题: 40话题: string话题: contains话题: compare

进入Java版参与讨论

(共1页)

m****r
发帖数: 6639

我有一个40个entry的string array。现在需要implement contains。不知道直接
scan快还是放在map里面快。谁来说说？感觉还是放在map里面快点。但是这个
threshold到底在哪里？

b**l
发帖数: 25

折中一下，用treemap

S**********C
发帖数: 161

40 entries? It doesn't make any difference, don't bother.

【在 m****r 的大作中提到】

: 我有一个40个entry的string array。现在需要implement contains。不知道直接
: scan快还是放在map里面快。谁来说说？感觉还是放在map里面快点。但是这个
: threshold到底在哪里？

m****r
发帖数: 6639

at the end of the day, it probably doesn't matter. but i was just wondering
since the issue came up.

【在 S**********C 的大作中提到】

: 40 entries? It doesn't make any difference, don't bother.

g*****g
发帖数: 34805

Suffix tree最快，但你这数量级，是吃饱了撑的。到4M级别，差别就很大。

【在 m****r 的大作中提到】

m****r
发帖数: 6639

我日。
我的问题就是，在这个数量级，那个快？
如果只有三个，肯定直接scan快。对吧。如果4m， suffix tree快？hashmap快？
反正比linear scan快。
但是中间，有个时候， linear scan 变成不快了。对吧？这个时候在那里？

【在 g*****g 的大作中提到】

: Suffix tree最快，但你这数量级，是吃饱了撑的。到4M级别，差别就很大。

g*****g
发帖数: 34805

40个都是suffix tree快，只不过差不了太多。

【在 m****r 的大作中提到】

: 我日。
: 我的问题就是，在这个数量级，那个快？
: 如果只有三个，肯定直接scan快。对吧。如果4m， suffix tree快？hashmap快？
: 反正比linear scan快。
: 但是中间，有个时候， linear scan 变成不快了。对吧？这个时候在那里？

p****i
发帖数: 6135

答案是： depends
姐给你分析一下：
先说linear scan，这个容易
空间： one table with 40 entries. 40*sizeof （yourobject）
时间：
如果查找不成功，你需要40 个 index＋＋操作外加40个比较操作： 40 ＋＋
operations and 40 compare operations
如果成功， averagely 你需要 40/2 ＝20 个index ＋＋ operations and 20 compare
operations

【在 m****r 的大作中提到】

g*****g
发帖数: 34805

This is wrong. Both the hotspot JDK and OpenJDK implementation uses a native
O(nm) algorithm for contains operation. Not a Boyer–Moore string which is
O(n).
So worst case can be 40 * sizeof(search key) * average_sizeof(keys)

compare

【在 p****i 的大作中提到】

: 答案是： depends
: 姐给你分析一下：
: 先说linear scan，这个容易
: 空间： one table with 40 entries. 40*sizeof （yourobject）
: 时间：
: 如果查找不成功，你需要40 个 index＋＋操作外加40个比较操作： 40 ＋＋
: operations and 40 compare operations
: 如果成功， averagely 你需要 40/2 ＝20 个index ＋＋ operations and 20 compare
: operations

p****i
发帖数: 6135

然后来看hashmap
hashmap的特点就是用空间换时间
重点就是你准备用多大的空间来换你的时间呢？
而且这个也跟你的key密切相关
举个简单的例子，如果你的key是integer， range 1-100
那最简单的hash就是建立一个空间为100个entry 的hash table
直接用key作index
这样你的查找时间无论是miss还是hit，都只有一个compare operation而已
稍微变化一下这个例子，如果你key的range是1-10000呢？
你是否还会愿意用一个空间为10000个entry的hashtable来储存你仅仅40个的数据？
如果说，你不愿意，那你需要在hash function上下功夫，
generally speaking， hash function通常的形势是： Key part 1 * prime number A
+ Key part 2 * prime number B +...
通过认真设计hash function来达到节约一定空间的目的
但是，不是没有代价的，代价就是这个hash function的复杂度。
不考虑collision
这时候查找的cost，无论miss还是hit都是：一次hash function的运算 cost ＋一
次compare 的cost
在这个简单的例子里：通常一个hashfunction的运算cost已经超过了你的20次＋＋
and 20 次compare的cost

compare

【在 p****i 的大作中提到】

相关主题
● java8的default出来之后	● treemap和hashma p的问题
● 现在做JAVA开发的用什么系统可靠点啊?	● JDK Source?
● 几个Java面试题	● 问一个很基本很基本很基本的API问题
进入Java版参与讨论

p****i
发帖数: 6135

我Java不熟，就是high level， theoretical的给他分析一下
而且我不太理解：你的这个说的是空间还是时间？

native
is

【在 g*****g 的大作中提到】

: This is wrong. Both the hotspot JDK and OpenJDK implementation uses a native
: O(nm) algorithm for contains operation. Not a Boyer–Moore string which is
: O(n).
: So worst case can be 40 * sizeof(search key) * average_sizeof(keys)
:
: compare

p****i
发帖数: 6135

上面是general speaking，
回到你的问题本身，如果object本身是string的话，
重点就是看hash string －－》 index 的cost
和compare two strings的cost了
这个对java熟的人可以具体分析下
好了，我也要问个问题了，大家帮忙看下

【在 p****i 的大作中提到】

: 然后来看hashmap
: hashmap的特点就是用空间换时间
: 重点就是你准备用多大的空间来换你的时间呢？
: 而且这个也跟你的key密切相关
: 举个简单的例子，如果你的key是integer， range 1-100
: 那最简单的hash就是建立一个空间为100个entry 的hash table
: 直接用key作index
: 这样你的查找时间无论是miss还是hit，都只有一个compare operation而已
: 稍微变化一下这个例子，如果你key的range是1-10000呢？
: 你是否还会愿意用一个空间为10000个entry的hashtable来储存你仅仅40个的数据？

r*****s
发帖数: 985

要考虑节约空间的话
用bloom filter
还是要看具体要求。

【在 p****i 的大作中提到】

w**z
发帖数: 8232

试一下，不就知道了？用啥hash ,存啥value 都有关系。

【在 m****r 的大作中提到】

m****r
发帖数: 6639

contains() for what?

native
is

【在 g*****g 的大作中提到】

g*****g
发帖数: 34805

说的是JDK里 String.contains是一个O(nm)时间复杂度的实现，不是最好的算法。

【在 m****r 的大作中提到】

: contains() for what?
:
: native
: is

m****r
发帖数: 6639

yes. i understand all this. but again, the question i am asking is, at how
many strings does the linear scan becomes slower.

【在 p****i 的大作中提到】

m****r
发帖数: 6639

why do i need contains()? i would used equals.

【在 g*****g 的大作中提到】

: 说的是JDK里 String.contains是一个O(nm)时间复杂度的实现，不是最好的算法。

g*****g
发帖数: 34805

你自己说 "现在需要implement contains"。
完全匹配简单，hashmap性能就不错。

【在 m****r 的大作中提到】

: why do i need contains()? i would used equals.

m****r
发帖数: 6639

我再日。 contains 是针对这个40个string的list说的。不是per string。

【在 g*****g 的大作中提到】

: 你自己说 "现在需要implement contains"。
: 完全匹配简单，hashmap性能就不错。

相关主题
● java知道一个reference怎么删掉它指向的内存空间？ (转载)	● 一个Java程序员的话(3)
● 一个简单的关于java Map的问题	● 有没有 replaceall for String?
● JDK HashMap 的 Entry class	● [转载] Java 1.5 Generic 问题
进入Java版参与讨论

m****r
发帖数: 6639

看来我的语言障碍很大。不适合xiecode。

【在 m****r 的大作中提到】

: 我再日。 contains 是针对这个40个string的list说的。不是per string。

p****i
发帖数: 6135

你去研究一下java default string hash function
然后回来update我们一下吧

how

【在 m****r 的大作中提到】

: yes. i understand all this. but again, the question i am asking is, at how
: many strings does the linear scan becomes slower.

c*********e
发帖数: 16335

恩，我也觉得hashmap可以解决楼主的问题，既然楼主要快的那个。hashmap其实是
hashtable,就是o(1),這個速度绝对快。

【在 p****i 的大作中提到】

f**r
发帖数: 865

想要知道的话写个benchmark test试试呗。:-)

F****n
发帖数: 3271

Depends. Usually a Map uses hash table and thus depends on how you calculate
the hashcode for a string.
Therefore, your map's performance will vary for different values in your
array because the default String.hashCode depends on the char sequence.

【在 m****r 的大作中提到】

F****n
发帖数: 3271

Again, the answer is "depend on what specific strings"
{"1", "2", "3"....} is different from {"one", "two", "three"...}

how

【在 m****r 的大作中提到】

: yes. i understand all this. but again, the question i am asking is, at how
: many strings does the linear scan becomes slower.

m****r
发帖数: 6639

ok. what i have is a list of locale strings, such as "en_US", "nl_BE", etc.

【在 F****n 的大作中提到】

: Again, the answer is "depend on what specific strings"
: {"1", "2", "3"....} is different from {"one", "two", "three"...}
:
: how

(共1页)

进入Java版参与讨论

相关主题
● 有没有 replaceall for String?	● JDK8要出来了？
● [转载] Java 1.5 Generic 问题	● java8的default出来之后
● Question on JSP EL	● 现在做JAVA开发的用什么系统可靠点啊?
● web service returns HashMap that contains multiple ArrayList	● 几个Java面试题
● Java Map 存 1 million 记录	● treemap和hashma p的问题
● 出个简单题，看你Java APi熟悉到什么程度	● JDK Source?
● Interview的人问我最新java 版本是多少	● 问一个很基本很基本很基本的API问题
● Oracle的jvm收费版本	● java知道一个reference怎么删掉它指向的内存空间？ (转载)

相关话题的讨论汇总
话题: hash话题: 40话题: string话题: contains话题: compare

#	版面	帖数(主题数)
-	全站	4871 (796)
1	Military	3777 (569)
2	Stock	341 (51)
3	Joke	117 (17)
4	History	116 (3)
5	Automobile	100 (9)
6	USANews	55 (9)
7	Midlife	45 (1)
8	Headline	41 (41)
9	Dreamer	33 (13)
10	FleaMarket	32 (20)
11	Living	30 (7)

boards

未名新帖统计// 7月16日

历史上的今天