由买买提看人间百态

boards

本页内容为未名空间相应帖子的节选和存档,一周内的贴子最多显示50字,超过一周显示500字 访问原贴
Database版 - mySQL select 的速度与table的长度有关吗?
相关主题
advices please on learning Oracle想考个oracle的证书,哪个好点?入门级别的.
新手求助: 学那个DATABASE软件好呢?印度老板有些什么不好?
今天,光荣的成为了一个DBA诚恳请求帮忙--DBA training
mysql湾曲哪个数据库好找工作啊?
MySQL DBA 的前途做数据库内核开发怎么样?
一个SUN公司certificate的deal兼问MySQL DBA certificate有用吗?mysql position
How popular is Sybase these days?Non (MS SQL and Oracle) 快过来报个到
请问Facebook是用Mysql and php做的吗? (转载)洛杉矶DBa 什么价位?
相关话题的讨论汇总
话题: index话题: big话题: dba话题: log话题: turn
进入Database版参与讨论
1 (共1页)
s********1
发帖数: 581
1
请教哪位高人:
mySQL中"select by primary/unique key的速度" 与 "number of rows in a table= N
" 有关吗?是O(LOG(N))的关系吗?
谢谢。
v*****r
发帖数: 1119
2
Assuming index is maintained properly,it is O(LOG(N))

N

【在 s********1 的大作中提到】
: 请教哪位高人:
: mySQL中"select by primary/unique key的速度" 与 "number of rows in a table= N
: " 有关吗?是O(LOG(N))的关系吗?
: 谢谢。

g***l
发帖数: 18555
3
哇,很专业啊
s********1
发帖数: 581
4
请教哪位高人:
mySQL中"select by primary/unique key的速度" 与 "number of rows in a table= N
" 有关吗?是O(LOG(N))的关系吗?
谢谢。
v*****r
发帖数: 1119
5
Assuming index is maintained properly,it is O(LOG(N))

N

【在 s********1 的大作中提到】
: 请教哪位高人:
: mySQL中"select by primary/unique key的速度" 与 "number of rows in a table= N
: " 有关吗?是O(LOG(N))的关系吗?
: 谢谢。

g***l
发帖数: 18555
6
哇,很专业啊
s*****o
发帖数: 303
7
MSSQL,ORACLE 的也有同样关系么?
y****9
发帖数: 144
8
for binary search a sorted list with lengh N it is O(log2(N)) . For index
access a key, it should not related to N in this way. Based on common sense,
it is related to index depth. index depth is related to N and how many
entries can a block holds. it probably is O(logb(N)) where b is the
number of entries in the block.
see: http://en.wikipedia.org/wiki/B-tree#Search

N

【在 s********1 的大作中提到】
: 请教哪位高人:
: mySQL中"select by primary/unique key的速度" 与 "number of rows in a table= N
: " 有关吗?是O(LOG(N))的关系吗?
: 谢谢。

v*****r
发帖数: 1119
9
Don't get confused.
Log2(N) is the best case of height value for a N node self-balancing b-tree.
Logb(N) is the best case of height value for a N node regular b-tree with
each node containing b number of elements.
For search/insert/delete efficiency of any b-tree in O notation, it is O(log
N). For algorithm with logarithm efficiency, logarithm base doesn't matter.

index
sense,

【在 y****9 的大作中提到】
: for binary search a sorted list with lengh N it is O(log2(N)) . For index
: access a key, it should not related to N in this way. Based on common sense,
: it is related to index depth. index depth is related to N and how many
: entries can a block holds. it probably is O(logb(N)) where b is the
: number of entries in the block.
: see: http://en.wikipedia.org/wiki/B-tree#Search
:
: N

y****9
发帖数: 144
10

tree.
log
YOu are probably right. this maybe the first time I need to think in big O
notation after school. Even in school I was not goot at algorithm anaysis.
If we put therotical algorithm complexity anaysis aside, from a practical
point of view,Is it safe to say the speed of select based on PK index is
almost same regardless of table lengh? I just checked the bigest table in
one of my Oracle db, it has 2 billion row. the consistent gets is 4.
So it should be index root block, branch block, leaf block and data block.
As long as we do 4 gets for a typical index, the speed should be same.
I know one can argue the differenes among 1,2,3,4 gets.But really
practically does it really matter when talking about the speed of select
based on index wrt table length ?

【在 v*****r 的大作中提到】
: Don't get confused.
: Log2(N) is the best case of height value for a N node self-balancing b-tree.
: Logb(N) is the best case of height value for a N node regular b-tree with
: each node containing b number of elements.
: For search/insert/delete efficiency of any b-tree in O notation, it is O(log
: N). For algorithm with logarithm efficiency, logarithm base doesn't matter.
:
: index
: sense,

v*****r
发帖数: 1119
11
It is not right to assume speed is the same for the same number of
consistent gets, which is determined by the height of index. You just
ignored the cpu time even though it is only a fraction of disk IO. (that is
why we never need to worry about big o in db world as we have other more
practical things to concern)
Also if we have to consider a real world case, there are usually more than
one algorithms involved. Like in your case, the first three consistent gets
are related to index lookup, which is O(logn) in theory. The last consistent
get is to getting table row using rowid obtained from index. It is using a
separate algorithm, which is O(1) if thinking about big O again.

.

【在 y****9 的大作中提到】
:
: tree.
: log
: YOu are probably right. this maybe the first time I need to think in big O
: notation after school. Even in school I was not goot at algorithm anaysis.
: If we put therotical algorithm complexity anaysis aside, from a practical
: point of view,Is it safe to say the speed of select based on PK index is
: almost same regardless of table lengh? I just checked the bigest table in
: one of my Oracle db, it has 2 billion row. the consistent gets is 4.
: So it should be index root block, branch block, leaf block and data block.

y****9
发帖数: 144
12
Original question:

mySQL中"select by primary/unique key的速度" 与 "number of rows in a table= N
" 有关吗?是O(LOG(N))的关系吗?

If we understand the original question as the relationship between the cost
of index access and the table length, my opinion is that they do not have
direct relashionship. Big-O notation is used to analyze algorithm's
efficiency and complexity, it seems not appropriate to be used to describe
such a relathiship
based on Jonathan Lewis's CBO book, the cost of a simple B-tree access is
cost =
blevel +
ceiling(leaf_blocks * effective index selectivity) +
ceiling(clustering_factor * effecitive table selectivity).
Imagining we have many books with different pages but same size, each book
has exactly one page refer to the word "DBA", we want to know the speed of
find this page for different books. In this case the speed really means the
cost - the cost can expressed by the total pages you turn.
1. Assuming every one's firt turn can find the exact the index page which
contains D - this is one turn
2. Assuming we need to turn to next page to get the "DBA" entry under D
section
3. Go to the book page - the final turn
Above cost is 3. As how faster your eye can locate the "D" "DBA" within a
page should be not relevant. No matter how thick or thin your book is, as
long as the number of pages turned are same, we should say we have the same
speed(cost) to find the page contains "DBA"
Return to the real case, we do one consiste get for a block, then the CPU
time to examin a block should be not in the equation of index access cost.
My arugument is just trying to clarify the concept for every one's benefit,
not necessarily limited to original question ( first for myself of cours). I
do agree that the algorimethic efficiency of index access is O(logN) - it
is good for me to take the chance to revisit the big-O notation.

is
gets
consistent
a

【在 v*****r 的大作中提到】
: It is not right to assume speed is the same for the same number of
: consistent gets, which is determined by the height of index. You just
: ignored the cpu time even though it is only a fraction of disk IO. (that is
: why we never need to worry about big o in db world as we have other more
: practical things to concern)
: Also if we have to consider a real world case, there are usually more than
: one algorithms involved. Like in your case, the first three consistent gets
: are related to index lookup, which is O(logn) in theory. The last consistent
: get is to getting table row using rowid obtained from index. It is using a
: separate algorithm, which is O(1) if thinking about big O again.

v*****r
发帖数: 1119
13
Use your example, we need to locate "DBA" word in a book, "DBA" is indexed
and occurs once in the book.
1. Assuming every one's first turn can find the exact the index page which
contains D - this is one turn
Comment: In big O, this turn's efficiency is O (log n).
2. Assuming we need to turn to next page to get the "DBA" entry under D
section
Comment: In big O, this turn's efficiency is O (n), unless you have a fancy
structure implemented here within each index directory (for "D").
3. Go to the book page - the final turn
Comment: In big O, this final turn's efficiency is O (1).
As you can see, there are three different algorithms being involved in your
example. What's the point of discussing big O here other than going through
basics of algorithms. Practically speaking there is no point. Why? One
reason is the physical device speed limitation, like logcial/physical IOs.
Those factors in Big O should be ignored. But in practice, how can you
ignore speed factors that accounts for 99% percent of time? That is why
Optimizer calculate Costs using index height/clustering factor/selectivity
etc and DBAs looks at real factors that impacts performance.
Even in programming world, other than algorithm analysis, I don't think
there are real practical usages of Big O on performance area. But
programmers needs to worry about Big O simply on interviews there are Big O
questions.

N
cost

【在 y****9 的大作中提到】
: Original question:
:
: mySQL中"select by primary/unique key的速度" 与 "number of rows in a table= N
: " 有关吗?是O(LOG(N))的关系吗?
:

: If we understand the original question as the relationship between the cost
: of index access and the table length, my opinion is that they do not have
: direct relashionship. Big-O notation is used to analyze algorithm's
: efficiency and complexity, it seems not appropriate to be used to describe
: such a relathiship

1 (共1页)
进入Database版参与讨论
相关主题
洛杉矶DBa 什么价位?MySQL DBA 的前途
CINAOUG/CINASSUG MySQL 2013讲座一个SUN公司certificate的deal兼问MySQL DBA certificate有用吗?
I am hiring Lead DBA (MySQL) (转载)How popular is Sybase these days?
带娃大妈读个州大database certificate如何? (转载)请问Facebook是用Mysql and php做的吗? (转载)
advices please on learning Oracle想考个oracle的证书,哪个好点?入门级别的.
新手求助: 学那个DATABASE软件好呢?印度老板有些什么不好?
今天,光荣的成为了一个DBA诚恳请求帮忙--DBA training
mysql湾曲哪个数据库好找工作啊?
相关话题的讨论汇总
话题: index话题: big话题: dba话题: log话题: turn