y****9 发帖数: 144 | 1 I have read "As a rule of thumb, every table should have a clustered index"
(see http://www.sql-server-performance.com/2007/clustered-indexes/ )
Isn't this a really general rule? Do your production database tables follow
this?
If I understand correctly, this feature corresponds to "Index organized
table" (IOT) in Oracle, generally IOT could be good for look up talbe in
Oracle.
But I have never seen similiar recommendation in Oracle. Interested in
knowing why or whether it is really good to have clustered index in every
table
Denis
Oracle DBA | c*****d 发帖数: 6045 | 2 不要被名字迷惑了
oracle cluster-index table和ms sql server中的cluster index不是一个概念 | y****9 发帖数: 144 | 3
As far as I know, there are following types of tables in Oracle
- Heap organized tables ( default one, > 99% tables are this type in oracle
applications, in fact I did not see any of my production databases use any
other type of tables yet except for temporary tables - I use it, developers
have no idea about temporary tables in my work place)
- Index organized tables (IOT)
- Index clustered tables
- Hash clustered tables
- Sorted hash clustered tables
- Nested tables
- temporary tables
- Object tables
- External tables
I am pretty sure that if a SQL server table has a clustered index, such a
table corresponds to IOT in Oracle. I am not sure what do you mean by "
oracle cluster-index table". Forget who says it wrt Oracle application- it
says if your application only use Heap organized table, it means you have a
poor design. But my doubts is in SQL sever if it is recommended every table
should have a clustered index, this sounds too much to me from Oracle
perspective.
【在 c*****d 的大作中提到】 : 不要被名字迷惑了 : oracle cluster-index table和ms sql server中的cluster index不是一个概念
| v*****r 发帖数: 1119 | 4 You understanding is right, Oracle IOT serves the same purpose as clustered
index in SQLServer (only difference is you can view IOT is PK only clusted
index). Whatever advantage and disadvantage to use IOT applies to clustered
index in SQLServer, so obviously, it is NOT true "As a rule of thumb, every
table should have a clustered index".
"
follow
【在 y****9 的大作中提到】 : I have read "As a rule of thumb, every table should have a clustered index" : (see http://www.sql-server-performance.com/2007/clustered-indexes/ ) : Isn't this a really general rule? Do your production database tables follow : this? : If I understand correctly, this feature corresponds to "Index organized : table" (IOT) in Oracle, generally IOT could be good for look up talbe in : Oracle. : But I have never seen similiar recommendation in Oracle. Interested in : knowing why or whether it is really good to have clustered index in every : table
| y****9 发帖数: 144 | 5 If I have a one-million row table without any index, let's say it has a col
called SSN and its values are unique. now i create a clustered index on SSN,
what will happen to the existing table rows?
For example, let's say each page can only hold two rows, at the begining,
rows with ssn=1 and ssn=4 are in page no 1; rows with ssn=2 and ssn=3 are in
page no 2. now i create a clustered index on ssn col, at the end will
page no1 contains ssn=1,2 and page no 2 contains ssn= 3 and 4 ? | y****9 发帖数: 144 | 6 @vbitter,
Further reading ...
In SQL server, if a table does not contain a clustered index, when we create
a primary key constraint, SQL server will use the primary key column for
the clustered index key.
So I think we all agree, in most cases, a table should have a primary key,
no matter in oracle or in sql server.
I checked the AdventureWorks sample db in sql server, I saw > 99% table has
clustered index and most is also PK.
So I guess the rule of thumb is likely true in sql server.
clustered
clustered
every
【在 v*****r 的大作中提到】 : You understanding is right, Oracle IOT serves the same purpose as clustered : index in SQLServer (only difference is you can view IOT is PK only clusted : index). Whatever advantage and disadvantage to use IOT applies to clustered : index in SQLServer, so obviously, it is NOT true "As a rule of thumb, every : table should have a clustered index". : : " : follow
| v*****r 发帖数: 1119 | 7 PK column is not necessarily the most frequently used query criteria when
querying a table, so I guess it is common to use non-pk index as the
clustered index in sqlserver.
Not sure why clustered index is heavily used in SQLServer, it might be bad
application/table design, less power features in SQLServer, or maybe both.
In Oracle, you have more flexible way to group row data logically together (
Partition) instead of having to use physical method like IOT, which is un-
relational in nature and only applied to certain scenarios.
create
has
【在 y****9 的大作中提到】 : @vbitter, : Further reading ... : In SQL server, if a table does not contain a clustered index, when we create : a primary key constraint, SQL server will use the primary key column for : the clustered index key. : So I think we all agree, in most cases, a table should have a primary key, : no matter in oracle or in sql server. : I checked the AdventureWorks sample db in sql server, I saw > 99% table has : clustered index and most is also PK. : So I guess the rule of thumb is likely true in sql server.
| y****9 发帖数: 144 | 8 Just find an insteresting blog that seems addressing exactly same doubts:
"So how is it you can have good perf in Oracle, w/o IOTs, but in SQL Server,
everyone says need CIs for good perf?"
http://blog.sqlpositive.com/2011/06/diff-oracle-sql-server-inde
Not sure if i have time to study it today, but sure I will try to read it
thoroughly. If anyone insterested, please read and give your comments.
Denis | v*****r 发帖数: 1119 | 9 Very good blog, but the blogger didn't actually answer the question he
raised:
"So how is it you can have good perf in Oracle, w/o IOTs, but in SQL Server,
everyone says need CIs for good perf?"
My answer is Oracle simply has more options to achieve the same that
SQLServer clustered-index could achieve.
Let's first think about what kind of table will benefit most from using
clustered-index/IOT. One typical example is table storing time series data
will benefit a lot. Using clustered-index/IOT on a table with time series
data, you can retrieve rows very fast by doing a range-scan on the indexed
columns, since rows are stored in the same order as index, there is no need
to sort and in the case of IOT, rows are retrieved at the same time while
reading the indexes.
But it will make inserts/updates more expensive. Even worse, you lost the
query flexibility, enforcing physical row order to table usually will only
favor the query using clustered-index column as the criteria at the expense
of slower performance when query criteria are change to other columns.
Oracle provides IOT when you are sure your table/application design will not
suffer too much by sacrificing the relational flexibility. But Oracle also
has one feature that provides performance benefit on querying time series
data without suffering relational flexibility, that is Partition, which
basically a logical grouping of data based on data value instead of
physically grouping data together based on one index ordering.
Server,
【在 y****9 的大作中提到】 : Just find an insteresting blog that seems addressing exactly same doubts: : "So how is it you can have good perf in Oracle, w/o IOTs, but in SQL Server, : everyone says need CIs for good perf?" : http://blog.sqlpositive.com/2011/06/diff-oracle-sql-server-inde : Not sure if i have time to study it today, but sure I will try to read it : thoroughly. If anyone insterested, please read and give your comments. : Denis
| y****9 发帖数: 144 | 10
col
SSN,
in
Answer to myself: YES.
http://oracle-study-notes.blogspot.com/2011/09/sql-server-test-
It can imagine if we create a CI on a large table, we need free space and
temp db maybe to do the sorting.
【在 y****9 的大作中提到】 : If I have a one-million row table without any index, let's say it has a col : called SSN and its values are unique. now i create a clustered index on SSN, : what will happen to the existing table rows? : For example, let's say each page can only hold two rows, at the begining, : rows with ssn=1 and ssn=4 are in page no 1; rows with ssn=2 and ssn=3 are in : page no 2. now i create a clustered index on ssn col, at the end will : page no1 contains ssn=1,2 and page no 2 contains ssn= 3 and 4 ?
| | | w*******e 发帖数: 1622 | 11 对Oracle不熟, 就说说SQL Server吧.
1) 每个table最好有个CI, 是从两个主要方面来讲的:
a) Performance, b) storage (虽然storage现在不值俩钱, 但是我前段时间见到个
建了CI, storage减少了50%的--table level...)
2) PK 不一定就是CI. (但CI最好建在PK上)
至于怎么选择建CI, 咱们可以开另一个话题了. | y****9 发帖数: 144 | 12 Jonathan Lewis gives some good explainations why SQL Server prefers CI while
Oracle does not have to adopt it so widely.
http://www.simple-talk.com/sql/learn-sql-server/oracle-to-sql-s
Although I knew this series artical by JL for long time and have kept them
in my to read list, I only read the part 3 thoroughly till today. Thought it
is interesting for Oracle guy to know this difference.
Based on JL's test case, I fail to understand why CI will reduce storage as
compared to heap table, he shows the opposite.
【在 w*******e 的大作中提到】 : 对Oracle不熟, 就说说SQL Server吧. : 1) 每个table最好有个CI, 是从两个主要方面来讲的: : a) Performance, b) storage (虽然storage现在不值俩钱, 但是我前段时间见到个 : 建了CI, storage减少了50%的--table level...) : 2) PK 不一定就是CI. (但CI最好建在PK上) : 至于怎么选择建CI, 咱们可以开另一个话题了.
| v*****r 发帖数: 1119 | 13 如果 SQLServer 预设一个proper initial fill factor (like 70% or 80%) , I
doubt it is going to make any difference.
像 abusing CI 这种明显 violate relational theory 的东东,能成为 SQLServer 的
well accepted best practice, there is something seriously wrong with the
implementation of heap table in SQLServer.
while
it
as
【在 y****9 的大作中提到】 : Jonathan Lewis gives some good explainations why SQL Server prefers CI while : Oracle does not have to adopt it so widely. : http://www.simple-talk.com/sql/learn-sql-server/oracle-to-sql-s : Although I knew this series artical by JL for long time and have kept them : in my to read list, I only read the part 3 thoroughly till today. Thought it : is interesting for Oracle guy to know this difference. : Based on JL's test case, I fail to understand why CI will reduce storage as : compared to heap table, he shows the opposite.
| y****9 发帖数: 144 | 14 Can you elaborate a little bit about why CI violate relational theory?
CI seems dealing with how rows are organized in a table, while relational
theory seems dealing with how columns (attributes) of a table should be
releated. This is my understanding, could be wrong. Not good at theory and
never did database design :-(
【在 v*****r 的大作中提到】 : 如果 SQLServer 预设一个proper initial fill factor (like 70% or 80%) , I : doubt it is going to make any difference. : 像 abusing CI 这种明显 violate relational theory 的东东,能成为 SQLServer 的 : well accepted best practice, there is something seriously wrong with the : implementation of heap table in SQLServer. : : while : it : as
| v*****r 发帖数: 1119 | 15 Here are my understanding for discussion ;-)
Being relational basically means manipulating data in sets. A set is
orderless, which is typically implemented as heap table in rdbms.
In relational theory, the moment you enforcing ordering to data sets, you
are out of relational domain.
Oracle IOT or SQLServer CI are all non-relational features in rdbms to deal
with performance for specific scenarios. Given a well implemented heap table
, non-relational features should only applies to specific scenario instead
of being a general best practice for all scenarios, which is the case in
Oracle.
When a non-relational feature like CI is recommended for every table in
SQLServer, there is something seriously wrong with its heap table
implementation. Jonathan's study case shows it could be due to SQLServer's
default 100% page/block filling. I doubt it is the only reason, otherwise it
would be easy to address instead of still promoting this BS best practice.
【在 y****9 的大作中提到】 : Can you elaborate a little bit about why CI violate relational theory? : CI seems dealing with how rows are organized in a table, while relational : theory seems dealing with how columns (attributes) of a table should be : releated. This is my understanding, could be wrong. Not good at theory and : never did database design :-(
| y****9 发帖数: 144 | 16 I got your point. But my argument is as follows:
Relational theory does not require the enforcement of order in a set;
however, it does not mean the moment I enforce the order, I will lose all
the benefits/feature of relational data model. I still believe relational
theory or data model is about the relationship among attributes of an entity
. Being CI or IOT won't change the relationship of attributes, thus they are
still .... So I am OK with CI in SQL server, it is just a design choice,
maybe not smart, but OK.
Just my 2 cents.
deal
table
【在 v*****r 的大作中提到】 : Here are my understanding for discussion ;-) : Being relational basically means manipulating data in sets. A set is : orderless, which is typically implemented as heap table in rdbms. : In relational theory, the moment you enforcing ordering to data sets, you : are out of relational domain. : Oracle IOT or SQLServer CI are all non-relational features in rdbms to deal : with performance for specific scenarios. Given a well implemented heap table : , non-relational features should only applies to specific scenario instead : of being a general best practice for all scenarios, which is the case in : Oracle.
| v*****r 发帖数: 1119 | 17 The moment you enforce order, you lose the thing called "relation" and the
flexibility of being relational. One benefit of such flexibility is the
ability for SQL engine optimizer to transform/rewrite your query based on
the grows/updates of your tables especially in complicated SQL query to try
to adapt the plan to achieve best performance. Those transform/rewrite are
based on relational mathematical model which deals with relation (orderless
datasets).
Same in SQL, the moment the order by is applied, the query is out of
relational layer of rdbms engine (where the relational math works), but in
SQL, if you find the order by is applied in wrong place (like too earlier in
query), it is easy to re-write it.
But when you make IOT/CI a best practice in the design of all your tables,
how are you going to address it without tearing down the table structure
when you realize the data access path changes as time goes on.
Besides, IOT/CI implies high cost for update/delete by nature. The question
is why SQLServer still recommends CI for all tables even it implies high
cost in many cases and suffer losing the relational nature of dataset. The
only explanation is their heap table implementation is so bad in performance.
I guess that is why Jonathan is trying to looking into this in his study but
he is polite enough to not saying abusing CI in any rdbms database is BS,
which is obviously is in my opinion.
entity
are
【在 y****9 的大作中提到】 : I got your point. But my argument is as follows: : Relational theory does not require the enforcement of order in a set; : however, it does not mean the moment I enforce the order, I will lose all : the benefits/feature of relational data model. I still believe relational : theory or data model is about the relationship among attributes of an entity : . Being CI or IOT won't change the relationship of attributes, thus they are : still .... So I am OK with CI in SQL server, it is just a design choice, : maybe not smart, but OK. : Just my 2 cents. :
|
|