mySQL select 的速度与table的长度有关吗？ - Database版 - 未名存档

本页内容为未名空间相应帖子的节选和存档，一周内的贴子最多显示50字，超过一周显示500字访问原贴

Database版 - mySQL select 的速度与table的长度有关吗？

相关主题
● advices please on learning Oracle	● 想考个oracle的证书,哪个好点?入门级别的.
● 新手求助：学那个DATABASE软件好呢？	● 印度老板有些什么不好？
● 今天，光荣的成为了一个DBA	● 诚恳请求帮忙--DBA training
● mysql	● 湾曲哪个数据库好找工作啊？
● MySQL DBA 的前途	● 做数据库内核开发怎么样？
● 一个SUN公司certificate的deal兼问MySQL DBA certificate有用吗？	● mysql position
● How popular is Sybase these days?	● Non (MS SQL and Oracle) 快过来报个到
● 请问Facebook是用Mysql and php做的吗？ (转载)	● 洛杉矶DBa 什么价位？

相关话题的讨论汇总
话题: index话题: big话题: dba话题: log话题: turn

进入Database版参与讨论

1

(共1页)

s********1 发帖数: 581	1 请教哪位高人： mySQL中"select by primary/unique key的速度" 与 "number of rows in a table= N " 有关吗？是O(LOG(N))的关系吗？谢谢。
v*****r 发帖数: 1119	2 Assuming index is maintained properly,it is O(LOG(N)) N 【在 s********1 的大作中提到】 : 请教哪位高人： : mySQL中"select by primary/unique key的速度" 与 "number of rows in a table= N : " 有关吗？是O(LOG(N))的关系吗？ : 谢谢。
g***l 发帖数: 18555	3 哇，很专业啊
s********1 发帖数: 581	4 请教哪位高人： mySQL中"select by primary/unique key的速度" 与 "number of rows in a table= N " 有关吗？是O(LOG(N))的关系吗？谢谢。
v*****r 发帖数: 1119	5 Assuming index is maintained properly,it is O(LOG(N)) N 【在 s********1 的大作中提到】 : 请教哪位高人： : mySQL中"select by primary/unique key的速度" 与 "number of rows in a table= N : " 有关吗？是O(LOG(N))的关系吗？ : 谢谢。
g***l 发帖数: 18555	6 哇，很专业啊
s*****o 发帖数: 303	7 MSSQL，ORACLE 的也有同样关系么？
y****9 发帖数: 144	8 for binary search a sorted list with lengh N it is O(log2(N)) . For index access a key, it should not related to N in this way. Based on common sense, it is related to index depth. index depth is related to N and how many entries can a block holds. it probably is O(logb(N)) where b is the number of entries in the block. see: http://en.wikipedia.org/wiki/B-tree#Search N 【在 s********1 的大作中提到】 : 请教哪位高人： : mySQL中"select by primary/unique key的速度" 与 "number of rows in a table= N : " 有关吗？是O(LOG(N))的关系吗？ : 谢谢。
v*****r 发帖数: 1119	9 Don't get confused. Log2(N) is the best case of height value for a N node self-balancing b-tree. Logb(N) is the best case of height value for a N node regular b-tree with each node containing b number of elements. For search/insert/delete efficiency of any b-tree in O notation, it is O(log N). For algorithm with logarithm efficiency, logarithm base doesn't matter. index sense, 【在 y****9 的大作中提到】 : for binary search a sorted list with lengh N it is O(log2(N)) . For index : access a key, it should not related to N in this way. Based on common sense, : it is related to index depth. index depth is related to N and how many : entries can a block holds. it probably is O(logb(N)) where b is the : number of entries in the block. : see: http://en.wikipedia.org/wiki/B-tree#Search : : N
y****9 发帖数: 144	10 tree. log YOu are probably right. this maybe the first time I need to think in big O notation after school. Even in school I was not goot at algorithm anaysis. If we put therotical algorithm complexity anaysis aside, from a practical point of view,Is it safe to say the speed of select based on PK index is almost same regardless of table lengh? I just checked the bigest table in one of my Oracle db, it has 2 billion row. the consistent gets is 4. So it should be index root block, branch block, leaf block and data block. As long as we do 4 gets for a typical index, the speed should be same. I know one can argue the differenes among 1,2,3,4 gets.But really practically does it really matter when talking about the speed of select based on index wrt table length ? 【在 v*****r 的大作中提到】 : Don't get confused. : Log2(N) is the best case of height value for a N node self-balancing b-tree. : Logb(N) is the best case of height value for a N node regular b-tree with : each node containing b number of elements. : For search/insert/delete efficiency of any b-tree in O notation, it is O(log : N). For algorithm with logarithm efficiency, logarithm base doesn't matter. : : index : sense,
v*****r 发帖数: 1119	11 It is not right to assume speed is the same for the same number of consistent gets, which is determined by the height of index. You just ignored the cpu time even though it is only a fraction of disk IO. (that is why we never need to worry about big o in db world as we have other more practical things to concern) Also if we have to consider a real world case, there are usually more than one algorithms involved. Like in your case, the first three consistent gets are related to index lookup, which is O(logn) in theory. The last consistent get is to getting table row using rowid obtained from index. It is using a separate algorithm, which is O(1) if thinking about big O again. . 【在 y****9 的大作中提到】 : : tree. : log : YOu are probably right. this maybe the first time I need to think in big O : notation after school. Even in school I was not goot at algorithm anaysis. : If we put therotical algorithm complexity anaysis aside, from a practical : point of view,Is it safe to say the speed of select based on PK index is : almost same regardless of table lengh? I just checked the bigest table in : one of my Oracle db, it has 2 billion row. the consistent gets is 4. : So it should be index root block, branch block, leaf block and data block.
y****9 发帖数: 144	12 Original question: mySQL中"select by primary/unique key的速度" 与 "number of rows in a table= N " 有关吗？是O(LOG(N))的关系吗？ If we understand the original question as the relationship between the cost of index access and the table length, my opinion is that they do not have direct relashionship. Big-O notation is used to analyze algorithm's efficiency and complexity, it seems not appropriate to be used to describe such a relathiship based on Jonathan Lewis's CBO book, the cost of a simple B-tree access is cost = blevel + ceiling(leaf_blocks * effective index selectivity) + ceiling(clustering_factor * effecitive table selectivity). Imagining we have many books with different pages but same size, each book has exactly one page refer to the word "DBA", we want to know the speed of find this page for different books. In this case the speed really means the cost - the cost can expressed by the total pages you turn. 1. Assuming every one's firt turn can find the exact the index page which contains D - this is one turn 2. Assuming we need to turn to next page to get the "DBA" entry under D section 3. Go to the book page - the final turn Above cost is 3. As how faster your eye can locate the "D" "DBA" within a page should be not relevant. No matter how thick or thin your book is, as long as the number of pages turned are same, we should say we have the same speed(cost) to find the page contains "DBA" Return to the real case, we do one consiste get for a block, then the CPU time to examin a block should be not in the equation of index access cost. My arugument is just trying to clarify the concept for every one's benefit, not necessarily limited to original question ( first for myself of cours). I do agree that the algorimethic efficiency of index access is O(logN) - it is good for me to take the chance to revisit the big-O notation. is gets consistent a 【在 v*****r 的大作中提到】 : It is not right to assume speed is the same for the same number of : consistent gets, which is determined by the height of index. You just : ignored the cpu time even though it is only a fraction of disk IO. (that is : why we never need to worry about big o in db world as we have other more : practical things to concern) : Also if we have to consider a real world case, there are usually more than : one algorithms involved. Like in your case, the first three consistent gets : are related to index lookup, which is O(logn) in theory. The last consistent : get is to getting table row using rowid obtained from index. It is using a : separate algorithm, which is O(1) if thinking about big O again.
v*****r 发帖数: 1119	13 Use your example, we need to locate "DBA" word in a book, "DBA" is indexed and occurs once in the book. 1. Assuming every one's first turn can find the exact the index page which contains D - this is one turn Comment: In big O, this turn's efficiency is O (log n). 2. Assuming we need to turn to next page to get the "DBA" entry under D section Comment: In big O, this turn's efficiency is O (n), unless you have a fancy structure implemented here within each index directory (for "D"). 3. Go to the book page - the final turn Comment: In big O, this final turn's efficiency is O (1). As you can see, there are three different algorithms being involved in your example. What's the point of discussing big O here other than going through basics of algorithms. Practically speaking there is no point. Why? One reason is the physical device speed limitation, like logcial/physical IOs. Those factors in Big O should be ignored. But in practice, how can you ignore speed factors that accounts for 99% percent of time? That is why Optimizer calculate Costs using index height/clustering factor/selectivity etc and DBAs looks at real factors that impacts performance. Even in programming world, other than algorithm analysis, I don't think there are real practical usages of Big O on performance area. But programmers needs to worry about Big O simply on interviews there are Big O questions. N cost 【在 y****9 的大作中提到】 : Original question: : : mySQL中"select by primary/unique key的速度" 与 "number of rows in a table= N : " 有关吗？是O(LOG(N))的关系吗？ : : If we understand the original question as the relationship between the cost : of index access and the table length, my opinion is that they do not have : direct relashionship. Big-O notation is used to analyze algorithm's : efficiency and complexity, it seems not appropriate to be used to describe : such a relathiship

1

(共1页)

进入Database版参与讨论

相关主题
● 洛杉矶DBa 什么价位？	● MySQL DBA 的前途
● CINAOUG/CINASSUG MySQL 2013讲座	● 一个SUN公司certificate的deal兼问MySQL DBA certificate有用吗？
● I am hiring Lead DBA (MySQL) (转载)	● How popular is Sybase these days?
● 带娃大妈读个州大database certificate如何？ (转载)	● 请问Facebook是用Mysql and php做的吗？ (转载)
● advices please on learning Oracle	● 想考个oracle的证书,哪个好点?入门级别的.
● 新手求助：学那个DATABASE软件好呢？	● 印度老板有些什么不好？
● 今天，光荣的成为了一个DBA	● 诚恳请求帮忙--DBA training
● mysql	● 湾曲哪个数据库好找工作啊？

相关话题的讨论汇总
话题: index话题: big话题: dba话题: log话题: turn

未名新帖统计// 7月16日

#	版面	帖数(主题数)
-	全站	4871 (796)
1	Military	3777 (569)
2	Stock	341 (51)
3	Joke	117 (17)
4	History	116 (3)
5	Automobile	100 (9)
6	USANews	55 (9)
7	Midlife	45 (1)
8	Headline	41 (41)
9	Dreamer	33 (13)
10	FleaMarket	32 (20)
11	Living	30 (7)

* 这里只显示发帖超过25的版面，努力灌水吧:-)