i********r 发帖数: 12113 | 1 SQL Server一个表要存50 million记录,field (A,B,C)是primary key,表里还有
D,E,F等field,初始化为NULL。A,B,C分别来自不同的三个表。
我用insert into...select创建这个表,很慢,要20分钟。如果用select into再建
primary key要快一些,还是慢。
有什么方法能快速建这个表?建Partitioned table有用么? | p********l 发帖数: 279 | 2 据我的经验,大多数情况下,SELECT INTO貌似比INSERT要快一点,还有就是预先分配
足够的空间给这个表所在的文件组。
【在 i********r 的大作中提到】 : SQL Server一个表要存50 million记录,field (A,B,C)是primary key,表里还有 : D,E,F等field,初始化为NULL。A,B,C分别来自不同的三个表。 : 我用insert into...select创建这个表,很慢,要20分钟。如果用select into再建 : primary key要快一些,还是慢。 : 有什么方法能快速建这个表?建Partitioned table有用么?
| y****9 发帖数: 144 | 3 1. Why 20 min is not good enough?If this is a one time job, I would put
more emphasis on correctness of implementation than speed
2. If speed is a concern, I am thinking of in SQL server, clustered index is
preferred to heap table. So PK is usually your clusterd index, so if you
can inset into the table with data ordered by (A, B, C), may be it can be
faster. i.e create the table structure with clustered index , then insert
with orderd records.
just my 2 cents.
【在 i********r 的大作中提到】 : SQL Server一个表要存50 million记录,field (A,B,C)是primary key,表里还有 : D,E,F等field,初始化为NULL。A,B,C分别来自不同的三个表。 : 我用insert into...select创建这个表,很慢,要20分钟。如果用select into再建 : primary key要快一些,还是慢。 : 有什么方法能快速建这个表?建Partitioned table有用么?
| y*****g 发帖数: 677 | 4 如果是你的数据文件已经分配好了,我就不知道还能如何提高速度,
除了先插数据,then build INDEX。
通常是很快了。 50 M 行在20分钟完成,不算赖! | s**********o 发帖数: 14359 | 5 SELECT INTO是最快的了,因为没有LOG,
如果速度还慢说明你的内存和IO有问题,考虑
上更多的内存和速度快的STORAGE | i********r 发帖数: 12113 | 6 SELECT INTO快多了,现在只要6,7分钟
【在 p********l 的大作中提到】 : 据我的经验,大多数情况下,SELECT INTO貌似比INSERT要快一点,还有就是预先分配 : 足够的空间给这个表所在的文件组。
| y****9 发帖数: 144 | 7 when you use 'select into', how do you populate null to the D, E, F etc
column?
when I use, for example, select ...,xxx as C, null as D into xxx from ...,
I find that D will have int data type. If I want D to be other data type,
how to do it?
【在 i********r 的大作中提到】 : SELECT INTO快多了,现在只要6,7分钟
| B*****g 发帖数: 34098 | 8 不能把表建好后再加column吗?
.,
,
【在 y****9 的大作中提到】 : when you use 'select into', how do you populate null to the D, E, F etc : column? : when I use, for example, select ...,xxx as C, null as D into xxx from ..., : I find that D will have int data type. If I want D to be other data type, : how to do it?
| y****9 发帖数: 144 | 9 I checked the BOL:
-- abount SELECT ... INTO clause
The amount of logging for SELECT...INTO depends on the recovery model in
effect for the database. Under the simple recovery model or bulk-logged
recovery model, bulk operations are minimally logged. With minimal logging,
using the SELECT… INTO statement can be more efficient than creating a
table and then populating the table with an INSERT statement.
---
But it seems that we can use minimal logging for isnert into select as well
as long as the following requirement is met:
--- from BOL http://technet.microsoft.com/en-us/library/ms174335(v=SQL.100).aspx
Using INSERT INTO…SELECT to Bulk Load Data with Minimal Logging
You can use INSERT INTO SELECT FROM
to efficiently transfer a large number of rows from one table, such as a
staging table, to another table with minimal logging. Minimal logging can
improve the performance of the statement and reduce the possibility of the
operation filling the available transaction log space during the transaction
.
Minimal logging for this statement has the following requirements:
The recovery model of the database is set to simple or bulk-logged.
The target table is an empty or nonempty heap.
The target table is not used in replication.
The TABLOCK hint is specified for the target table.
-------------------
So if we ensure minimal logging condition is met for both case, I don't
understand why SELECT ... INTO will be faster than INSERT INTO ... SELECT
【在 s**********o 的大作中提到】 : SELECT INTO是最快的了,因为没有LOG, : 如果速度还慢说明你的内存和IO有问题,考虑 : 上更多的内存和速度快的STORAGE
| y****9 发帖数: 144 | 10 valid option, but I am not sure to initailize 50 million row with NULL don't
take siginifcant time. If I remeber correctly, in Oracle it will take time.
I will test if get chance
【在 B*****g 的大作中提到】 : 不能把表建好后再加column吗? : : ., : ,
| | | y****9 发帖数: 144 | 11 I think I got wrong impression. Both in Oracle and SQL Server, when adding a
column to an existing table, if not specifying default value, it will be
NULL and it won't need to set the value row by row, so almost no time to
take.
So for OP, using 'select into' to create table with A, B, C column, then add
E, D, F is a valid option.
't
time.
【在 y****9 的大作中提到】 : valid option, but I am not sure to initailize 50 million row with NULL don't : take siginifcant time. If I remeber correctly, in Oracle it will take time. : I will test if get chance
| y****9 发帖数: 144 | 12 In terms of the transaction log generated, I did some test. My observation/
conclusion:
----
in terms of the amount of transaction log used, using "INSERT INTO ...
SELECT" with "TABLOCK" hint is essentially same as using "SELECT ... INTO"
regardless of recovery mode. Without the "TABLOCK" hint, regardless of the
recovery mode, using "INSERT INTO ... SELECT" will generate same amount of
transaction log and 4 times more than that in the case of FULL recovery mode
with the hint ( i.e. 125406 KB vs 32958 KB in my test case). People think "
INSERT INTO ... SELECT" is slow may be because they don't always remember to
add this hint when appropriate.
---
Detailed test see : http://oracle-study-notes.blogspot.com/2012/11/sql-server-insert-into-select-vs-select.html
,
well
【在 y****9 的大作中提到】 : I checked the BOL: : -- abount SELECT ... INTO clause : The amount of logging for SELECT...INTO depends on the recovery model in : effect for the database. Under the simple recovery model or bulk-logged : recovery model, bulk operations are minimally logged. With minimal logging, : using the SELECT… INTO statement can be more efficient than creating a : table and then populating the table with an INSERT statement. : --- : But it seems that we can use minimal logging for isnert into select as well : as long as the following requirement is met:
| i********r 发帖数: 12113 | 13 SQL Server一个表要存50 million记录,field (A,B,C)是primary key,表里还有
D,E,F等field,初始化为NULL。A,B,C分别来自不同的三个表。
我用insert into...select创建这个表,很慢,要20分钟。如果用select into再建
primary key要快一些,还是慢。
有什么方法能快速建这个表?建Partitioned table有用么? | p********l 发帖数: 279 | 14 据我的经验,大多数情况下,SELECT INTO貌似比INSERT要快一点,还有就是预先分配
足够的空间给这个表所在的文件组。
【在 i********r 的大作中提到】 : SQL Server一个表要存50 million记录,field (A,B,C)是primary key,表里还有 : D,E,F等field,初始化为NULL。A,B,C分别来自不同的三个表。 : 我用insert into...select创建这个表,很慢,要20分钟。如果用select into再建 : primary key要快一些,还是慢。 : 有什么方法能快速建这个表?建Partitioned table有用么?
| y****9 发帖数: 144 | 15 1. Why 20 min is not good enough?If this is a one time job, I would put
more emphasis on correctness of implementation than speed
2. If speed is a concern, I am thinking of in SQL server, clustered index is
preferred to heap table. So PK is usually your clusterd index, so if you
can inset into the table with data ordered by (A, B, C), may be it can be
faster. i.e create the table structure with clustered index , then insert
with orderd records.
just my 2 cents.
【在 i********r 的大作中提到】 : SQL Server一个表要存50 million记录,field (A,B,C)是primary key,表里还有 : D,E,F等field,初始化为NULL。A,B,C分别来自不同的三个表。 : 我用insert into...select创建这个表,很慢,要20分钟。如果用select into再建 : primary key要快一些,还是慢。 : 有什么方法能快速建这个表?建Partitioned table有用么?
| y*****g 发帖数: 677 | 16 如果是你的数据文件已经分配好了,我就不知道还能如何提高速度,
除了先插数据,then build INDEX。
通常是很快了。 50 M 行在20分钟完成,不算赖! | s**********o 发帖数: 14359 | 17 SELECT INTO是最快的了,因为没有LOG,
如果速度还慢说明你的内存和IO有问题,考虑
上更多的内存和速度快的STORAGE | i********r 发帖数: 12113 | 18 SELECT INTO快多了,现在只要6,7分钟
【在 p********l 的大作中提到】 : 据我的经验,大多数情况下,SELECT INTO貌似比INSERT要快一点,还有就是预先分配 : 足够的空间给这个表所在的文件组。
| y****9 发帖数: 144 | 19 when you use 'select into', how do you populate null to the D, E, F etc
column?
when I use, for example, select ...,xxx as C, null as D into xxx from ...,
I find that D will have int data type. If I want D to be other data type,
how to do it?
【在 i********r 的大作中提到】 : SELECT INTO快多了,现在只要6,7分钟
| B*****g 发帖数: 34098 | 20 不能把表建好后再加column吗?
.,
,
【在 y****9 的大作中提到】 : when you use 'select into', how do you populate null to the D, E, F etc : column? : when I use, for example, select ...,xxx as C, null as D into xxx from ..., : I find that D will have int data type. If I want D to be other data type, : how to do it?
| | | y****9 发帖数: 144 | 21 I checked the BOL:
-- abount SELECT ... INTO clause
The amount of logging for SELECT...INTO depends on the recovery model in
effect for the database. Under the simple recovery model or bulk-logged
recovery model, bulk operations are minimally logged. With minimal logging,
using the SELECT… INTO statement can be more efficient than creating a
table and then populating the table with an INSERT statement.
---
But it seems that we can use minimal logging for isnert into select as well
as long as the following requirement is met:
--- from BOL http://technet.microsoft.com/en-us/library/ms174335(v=SQL.100).aspx
Using INSERT INTO…SELECT to Bulk Load Data with Minimal Logging
You can use INSERT INTO SELECT FROM
to efficiently transfer a large number of rows from one table, such as a
staging table, to another table with minimal logging. Minimal logging can
improve the performance of the statement and reduce the possibility of the
operation filling the available transaction log space during the transaction
.
Minimal logging for this statement has the following requirements:
The recovery model of the database is set to simple or bulk-logged.
The target table is an empty or nonempty heap.
The target table is not used in replication.
The TABLOCK hint is specified for the target table.
-------------------
So if we ensure minimal logging condition is met for both case, I don't
understand why SELECT ... INTO will be faster than INSERT INTO ... SELECT
【在 s**********o 的大作中提到】 : SELECT INTO是最快的了,因为没有LOG, : 如果速度还慢说明你的内存和IO有问题,考虑 : 上更多的内存和速度快的STORAGE
| y****9 发帖数: 144 | 22 valid option, but I am not sure to initailize 50 million row with NULL don't
take siginifcant time. If I remeber correctly, in Oracle it will take time.
I will test if get chance
【在 B*****g 的大作中提到】 : 不能把表建好后再加column吗? : : ., : ,
| y****9 发帖数: 144 | 23 I think I got wrong impression. Both in Oracle and SQL Server, when adding a
column to an existing table, if not specifying default value, it will be
NULL and it won't need to set the value row by row, so almost no time to
take.
So for OP, using 'select into' to create table with A, B, C column, then add
E, D, F is a valid option.
't
time.
【在 y****9 的大作中提到】 : valid option, but I am not sure to initailize 50 million row with NULL don't : take siginifcant time. If I remeber correctly, in Oracle it will take time. : I will test if get chance
| y****9 发帖数: 144 | 24 In terms of the transaction log generated, I did some test. My observation/
conclusion:
----
in terms of the amount of transaction log used, using "INSERT INTO ...
SELECT" with "TABLOCK" hint is essentially same as using "SELECT ... INTO"
regardless of recovery mode. Without the "TABLOCK" hint, regardless of the
recovery mode, using "INSERT INTO ... SELECT" will generate same amount of
transaction log and 4 times more than that in the case of FULL recovery mode
with the hint ( i.e. 125406 KB vs 32958 KB in my test case). People think "
INSERT INTO ... SELECT" is slow may be because they don't always remember to
add this hint when appropriate.
---
Detailed test see : http://oracle-study-notes.blogspot.com/2012/11/sql-server-insert-into-select-vs-select.html
,
well
【在 y****9 的大作中提到】 : I checked the BOL: : -- abount SELECT ... INTO clause : The amount of logging for SELECT...INTO depends on the recovery model in : effect for the database. Under the simple recovery model or bulk-logged : recovery model, bulk operations are minimally logged. With minimal logging, : using the SELECT… INTO statement can be more efficient than creating a : table and then populating the table with an INSERT statement. : --- : But it seems that we can use minimal logging for isnert into select as well : as long as the following requirement is met:
| i********r 发帖数: 12113 | 25 select CAST(NULL AS yourtype) as D
.,
,
【在 y****9 的大作中提到】 : when you use 'select into', how do you populate null to the D, E, F etc : column? : when I use, for example, select ...,xxx as C, null as D into xxx from ..., : I find that D will have int data type. If I want D to be other data type, : how to do it?
|
|