怎么初始化大表？ - Database版

本页内容为未名空间相应帖子的节选和存档，一周内的贴子最多显示50字，超过一周显示500字访问原贴

Database版 - 怎么初始化大表？

相关主题
● [转载] Can anyone interpret this simple SQL?	● 请教高手，包子谢
● How to get other columns after UNION?	● 也問 Common Table Expression 问题
● correlated subquery	● urgent help! insert value into table
● About INSERT IGNORE	● A sql question
● 问一个SQL Server的问题	● MS T-SQL 问题
● sql的2个问题 (转载)	● How to replace 0 with empty?
● sql server 面试题（6）	● Import 50GB data from multiple .txt file into MS SQL database
● question about MS SQL mirroring...	● 请问如何实现这样一个mysql的query, 谢谢

相关话题的讨论汇总
话题: select话题: table话题: insert话题: logging话题: minimal

进入Database版参与讨论

(共1页)

i********r
发帖数: 12113

SQL Server一个表要存50 million记录，field （A，B，C）是primary key，表里还有
D，E，F等field，初始化为NULL。A，B，C分别来自不同的三个表。
我用insert into...select创建这个表，很慢，要20分钟。如果用select into再建
primary key要快一些，还是慢。
有什么方法能快速建这个表？建Partitioned table有用么？

p********l
发帖数: 279

据我的经验，大多数情况下，SELECT INTO貌似比INSERT要快一点，还有就是预先分配
足够的空间给这个表所在的文件组。

【在 i********r 的大作中提到】

: SQL Server一个表要存50 million记录，field （A，B，C）是primary key，表里还有
: D，E，F等field，初始化为NULL。A，B，C分别来自不同的三个表。
: 我用insert into...select创建这个表，很慢，要20分钟。如果用select into再建
: primary key要快一些，还是慢。
: 有什么方法能快速建这个表？建Partitioned table有用么？

y****9
发帖数: 144

1. Why 20 min is not good enough？If this is a one time job, I would put
more emphasis on correctness of implementation than speed
2. If speed is a concern, I am thinking of in SQL server, clustered index is
preferred to heap table. So PK is usually your clusterd index, so if you
can inset into the table with data ordered by (A, B, C), may be it can be
faster. i.e create the table structure with clustered index , then insert
with orderd records.
just my 2 cents.

【在 i********r 的大作中提到】

y*****g
发帖数: 677

如果是你的数据文件已经分配好了，我就不知道还能如何提高速度，
除了先插数据，then build INDEX。
通常是很快了。　50 M　行在20分钟完成，不算赖！

s**********o
发帖数: 14359

SELECT INTO是最快的了，因为没有LOG，
如果速度还慢说明你的内存和IO有问题，考虑
上更多的内存和速度快的STORAGE

i********r
发帖数: 12113

SELECT INTO快多了，现在只要6，7分钟

【在 p********l 的大作中提到】

: 据我的经验，大多数情况下，SELECT INTO貌似比INSERT要快一点，还有就是预先分配
: 足够的空间给这个表所在的文件组。

y****9
发帖数: 144

when you use 'select into', how do you populate null to the D, E, F etc
column?
when I use, for example, select ...,xxx as C, null as D into xxx from ...,
I find that D will have int data type. If I want D to be other data type,
how to do it?

【在 i********r 的大作中提到】

: SELECT INTO快多了，现在只要6，7分钟

B*****g
发帖数: 34098

不能把表建好后再加column吗？

.,
,

【在 y****9 的大作中提到】

: when you use 'select into', how do you populate null to the D, E, F etc
: column?
: when I use, for example, select ...,xxx as C, null as D into xxx from ...,
: I find that D will have int data type. If I want D to be other data type,
: how to do it?

y****9
发帖数: 144

I checked the BOL:
-- abount SELECT ... INTO clause
The amount of logging for SELECT...INTO depends on the recovery model in
effect for the database. Under the simple recovery model or bulk-logged
recovery model, bulk operations are minimally logged. With minimal logging,
using the SELECT… INTO statement can be more efficient than creating a
table and then populating the table with an INSERT statement.
---
But it seems that we can use minimal logging for isnert into select as well
as long as the following requirement is met:
--- from BOL http://technet.microsoft.com/en-us/library/ms174335(v=SQL.100).aspx
Using INSERT INTO…SELECT to Bulk Load Data with Minimal Logging
You can use INSERT INTO SELECT FROM
to efficiently transfer a large number of rows from one table, such as a
staging table, to another table with minimal logging. Minimal logging can
improve the performance of the statement and reduce the possibility of the
operation filling the available transaction log space during the transaction
.
Minimal logging for this statement has the following requirements:
The recovery model of the database is set to simple or bulk-logged.
The target table is an empty or nonempty heap.
The target table is not used in replication.
The TABLOCK hint is specified for the target table.
-------------------
So if we ensure minimal logging condition is met for both case, I don't
understand why SELECT ... INTO will be faster than INSERT INTO ... SELECT

【在 s**********o 的大作中提到】

: SELECT INTO是最快的了，因为没有LOG，
: 如果速度还慢说明你的内存和IO有问题，考虑
: 上更多的内存和速度快的STORAGE

y****9
发帖数: 144

valid option, but I am not sure to initailize 50 million row with NULL don't
take siginifcant time. If I remeber correctly, in Oracle it will take time.
I will test if get chance

【在 B*****g 的大作中提到】

: 不能把表建好后再加column吗？
:
: .,
: ,

相关主题
● sql的2个问题 (转载)	● 请教高手，包子谢
● sql server 面试题（6）	● 也問 Common Table Expression 问题
● question about MS SQL mirroring...	● urgent help! insert value into table
进入Database版参与讨论

y****9
发帖数: 144

I think I got wrong impression. Both in Oracle and SQL Server, when adding a
column to an existing table, if not specifying default value, it will be
NULL and it won't need to set the value row by row, so almost no time to
take.
So for OP, using 'select into' to create table with A, B, C column, then add
E, D, F is a valid option.

't
time.

【在 y****9 的大作中提到】

: valid option, but I am not sure to initailize 50 million row with NULL don't
: take siginifcant time. If I remeber correctly, in Oracle it will take time.
: I will test if get chance

y****9
发帖数: 144

In terms of the transaction log generated, I did some test. My observation/
conclusion:
----
in terms of the amount of transaction log used, using "INSERT INTO ...
SELECT" with "TABLOCK" hint is essentially same as using "SELECT ... INTO"
regardless of recovery mode. Without the "TABLOCK" hint, regardless of the
recovery mode, using "INSERT INTO ... SELECT" will generate same amount of
transaction log and 4 times more than that in the case of FULL recovery mode
with the hint ( i.e. 125406 KB vs 32958 KB in my test case). People think "
INSERT INTO ... SELECT" is slow may be because they don't always remember to
add this hint when appropriate.
---
Detailed test see : http://oracle-study-notes.blogspot.com/2012/11/sql-server-insert-into-select-vs-select.html

,
well

【在 y****9 的大作中提到】

: I checked the BOL:
: -- abount SELECT ... INTO clause
: The amount of logging for SELECT...INTO depends on the recovery model in
: effect for the database. Under the simple recovery model or bulk-logged
: recovery model, bulk operations are minimally logged. With minimal logging,
: using the SELECT… INTO statement can be more efficient than creating a
: table and then populating the table with an INSERT statement.
: ---
: But it seems that we can use minimal logging for isnert into select as well
: as long as the following requirement is met:

i********r
发帖数: 12113

p********l
发帖数: 279

据我的经验，大多数情况下，SELECT INTO貌似比INSERT要快一点，还有就是预先分配
足够的空间给这个表所在的文件组。

【在 i********r 的大作中提到】

y****9
发帖数: 144

y*****g
发帖数: 677

如果是你的数据文件已经分配好了，我就不知道还能如何提高速度，
除了先插数据，then build INDEX。
通常是很快了。　50 M　行在20分钟完成，不算赖！

s**********o
发帖数: 14359

SELECT INTO是最快的了，因为没有LOG，
如果速度还慢说明你的内存和IO有问题，考虑
上更多的内存和速度快的STORAGE

i********r
发帖数: 12113

SELECT INTO快多了，现在只要6，7分钟

【在 p********l 的大作中提到】

: 据我的经验，大多数情况下，SELECT INTO貌似比INSERT要快一点，还有就是预先分配
: 足够的空间给这个表所在的文件组。

y****9
发帖数: 144

: SELECT INTO快多了，现在只要6，7分钟

B*****g
发帖数: 34098

不能把表建好后再加column吗？

.,
,

【在 y****9 的大作中提到】

相关主题
● A sql question	● Import 50GB data from multiple .txt file into MS SQL database
● MS T-SQL 问题	● 请问如何实现这样一个mysql的query, 谢谢
● How to replace 0 with empty?	● user_objects 与CAT view
进入Database版参与讨论

y****9
发帖数: 144

: SELECT INTO是最快的了，因为没有LOG，
: 如果速度还慢说明你的内存和IO有问题，考虑
: 上更多的内存和速度快的STORAGE

y****9
发帖数: 144

: 不能把表建好后再加column吗？
:
: .,
: ,

y****9
发帖数: 144

: valid option, but I am not sure to initailize 50 million row with NULL don't
: take siginifcant time. If I remeber correctly, in Oracle it will take time.
: I will test if get chance

y****9
发帖数: 144

i********r
发帖数: 12113

select CAST(NULL AS yourtype) as D

.,
,

【在 y****9 的大作中提到】

(共1页)

进入Database版参与讨论

相关主题
● 请问如何实现这样一个mysql的query, 谢谢	● 问一个SQL Server的问题
● user_objects 与CAT view	● sql的2个问题 (转载)
● Need help on a strange SQL server problem	● sql server 面试题（6）
● 请教Load table的问题	● question about MS SQL mirroring...
● [转载] Can anyone interpret this simple SQL?	● 请教高手，包子谢
● How to get other columns after UNION?	● 也問 Common Table Expression 问题
● correlated subquery	● urgent help! insert value into table
● About INSERT IGNORE	● A sql question

相关话题的讨论汇总
话题: select话题: table话题: insert话题: logging话题: minimal

#	版面	帖数(主题数)
-	全站	4871 (796)
1	Military	3777 (569)
2	Stock	341 (51)
3	Joke	117 (17)
4	History	116 (3)
5	Automobile	100 (9)
6	USANews	55 (9)
7	Midlife	45 (1)
8	Headline	41 (41)
9	Dreamer	33 (13)
10	FleaMarket	32 (20)
11	Living	30 (7)

boards

未名新帖统计// 7月16日

历史上的今天