由买买提看人间百态

boards

本页内容为未名空间相应帖子的节选和存档,一周内的贴子最多显示50字,超过一周显示500字 访问原贴
Database版 - 问个database design的问题
相关主题
SQL问题求救!!Error of SQL query on IBM netezza SQL database from Aginity (转载)
问一个excel的弱问题primary key?
请教一个SQL问题Special character in insert values
mySQL 问题 (转载)如何在oracle8i中得到视图和表的定义?
这个 query 为什么可以 update multiple rows请教一个问题
求助SQL高手,这个join怎么做比较好SQL问题
keep group of values of SQL procedure in one table纪录查找问题
generate unique integer ID from columns in SQL table (转载column in table point to another table
相关话题的讨论汇总
话题: stock话题: user话题: last话题: table话题: database
进入Database版参与讨论
1 (共1页)
F**e
发帖数: 593
1
用的是MySQL. 要用来记录很多user的timestamp based的数据。做query的时候都是针
对单独的一个user. 每个记录象这样:
user_i timestamp1 value1
user_j timestamp2 value2
user_i timestamp3 value3
user_i timestamp3 value4
有三种选择,第一种就是用一个table来放所有数据,象上面一样。这样的坏处就是每
次要scan很多不相关user的record.
选择二:每个user用一个单独的table, 当然相同的schema.
选择三:每个user用一个单独的database.
二,三的好处就是query会快一点。但挺怪的。DB的高手们指教一下。先谢了。
n****c
发帖数: 13
2
首先,你这数据库主要是用来干嘛的?估计得多大?是输入数据的时候多还是做报告的
时候多?server的configurations怎么样?
对于你描述的情况,比较常见的做法是,如果你是用来记录数据为主要目的,也就是所
谓的transactional database, 你首先要normalize, 也就是说,把user data放在一个
单独的table中,然后把加一个transactional table就够了。不管怎么样,你上面列的
3种方法都不专业,不可取。
举个简单的例子,你可以有两个table: Employee, Attandence
Employee (Emp_ID, Last_Name, First_Name, Emp_Status, Last_Updated)
Attandence(Emp_ID,Time_Stamp, Value, Date,...)
因为不了解你的具体要求,只能提这么点简单的建议。希望有点用。
F**e
发帖数: 593
3
非常感谢你的回复.
不专业是肯定的啦.呵呵.
举个例子吧:用DB来存股票价格历史. 每个记录包括股票代号,时间,价格:
SYMB1 timestamp1 price1
SYMB2 timestamp2 price2
SYMB3 timestamp3 price3
SYMB1 timestamp3 price4
...
imagine 1000 symbols, each symbol(stock) has 1million records.
做报告的时候比输入数据多.但每次输入tends to be in bulk volume.
这样,如果把所有记录放在一个table中,要严究一个股票(symbol),就要scan
1000million records.如果每个股票一个table, 那就只要scan 1million records.
谢谢谢谢

【在 n****c 的大作中提到】
: 首先,你这数据库主要是用来干嘛的?估计得多大?是输入数据的时候多还是做报告的
: 时候多?server的configurations怎么样?
: 对于你描述的情况,比较常见的做法是,如果你是用来记录数据为主要目的,也就是所
: 谓的transactional database, 你首先要normalize, 也就是说,把user data放在一个
: 单独的table中,然后把加一个transactional table就够了。不管怎么样,你上面列的
: 3种方法都不专业,不可取。
: 举个简单的例子,你可以有两个table: Employee, Attandence
: Employee (Emp_ID, Last_Name, First_Name, Emp_Status, Last_Updated)
: Attandence(Emp_ID,Time_Stamp, Value, Date,...)
: 因为不了解你的具体要求,只能提这么点简单的建议。希望有点用。

n****c
发帖数: 13
4
You still didn't answer my question in the first paragraph. So I assume that
your database is a transactional database and would make suggestions based
on this assumption.
Based on your additional info, you may want to add at least another table,
let's call it STOCK.
STOCK(STOCK_ID, STOCK_SYMB, STOCK_DESC, LAST_UPDATED)
With another two tables shown below:
TRADER(TRADER_ID, LAST_NAME, FIRST_NAME, STATUS, LAST_UPDATED)
STOCK_TRADE(TRADER_ID, STOCK_ID, TRADE_TIME, TRADE_VOLUME, TRADE_PRICE, LAST
_
w*******e
发帖数: 1622
5
zan!说得挺清楚的

that
based
LAST

【在 n****c 的大作中提到】
: You still didn't answer my question in the first paragraph. So I assume that
: your database is a transactional database and would make suggestions based
: on this assumption.
: Based on your additional info, you may want to add at least another table,
: let's call it STOCK.
: STOCK(STOCK_ID, STOCK_SYMB, STOCK_DESC, LAST_UPDATED)
: With another two tables shown below:
: TRADER(TRADER_ID, LAST_NAME, FIRST_NAME, STATUS, LAST_UPDATED)
: STOCK_TRADE(TRADER_ID, STOCK_ID, TRADE_TIME, TRADE_VOLUME, TRADE_PRICE, LAST
: _

F**e
发帖数: 593
6
Really appreciate it!
我没有糖,但你也不用用牙膏皮来换了,送你5个包子略表谢意.呵呵.
OK. back to the question.
Here, assume I'm doing research on stock price history. I have tick data for
1000 stocks, each with 1 million ticks, in separate text(.csv) files, one
file per stock.
I would like to import the data (csv files) into database and run queries on
the DB, and I only run query on one stock at a time. This is not really a
transactional database, if I understand what it is -- the import part
happens very infrequently, whil

【在 n****c 的大作中提到】
: You still didn't answer my question in the first paragraph. So I assume that
: your database is a transactional database and would make suggestions based
: on this assumption.
: Based on your additional info, you may want to add at least another table,
: let's call it STOCK.
: STOCK(STOCK_ID, STOCK_SYMB, STOCK_DESC, LAST_UPDATED)
: With another two tables shown below:
: TRADER(TRADER_ID, LAST_NAME, FIRST_NAME, STATUS, LAST_UPDATED)
: STOCK_TRADE(TRADER_ID, STOCK_ID, TRADE_TIME, TRADE_VOLUME, TRADE_PRICE, LAST
: _

x***e
发帖数: 2449
7
one additional recommendation.
if you have 1 billion records for short period, you might have to do table
partitioning, depends on your DB.

that
based
LAST

【在 n****c 的大作中提到】
: You still didn't answer my question in the first paragraph. So I assume that
: your database is a transactional database and would make suggestions based
: on this assumption.
: Based on your additional info, you may want to add at least another table,
: let's call it STOCK.
: STOCK(STOCK_ID, STOCK_SYMB, STOCK_DESC, LAST_UPDATED)
: With another two tables shown below:
: TRADER(TRADER_ID, LAST_NAME, FIRST_NAME, STATUS, LAST_UPDATED)
: STOCK_TRADE(TRADER_ID, STOCK_ID, TRADE_TIME, TRADE_VOLUME, TRADE_PRICE, LAST
: _

F**e
发帖数: 593
8
thanks. I don't have 1 billion records, just several millions or even less.

【在 x***e 的大作中提到】
: one additional recommendation.
: if you have 1 billion records for short period, you might have to do table
: partitioning, depends on your DB.
:
: that
: based
: LAST

q**1
发帖数: 193
9

我看你应该先看看这些table columns有没有normalization的空间再决定
怎么分表,我估计你这情况用 star schema database model 最简单实用,
或者根本就一个表就ok。。
后两个一看就不行,如果你有很多用户,怎么解决scability的问题。。

【在 F**e 的大作中提到】
: 用的是MySQL. 要用来记录很多user的timestamp based的数据。做query的时候都是针
: 对单独的一个user. 每个记录象这样:
: user_i timestamp1 value1
: user_j timestamp2 value2
: user_i timestamp3 value3
: user_i timestamp3 value4
: 有三种选择,第一种就是用一个table来放所有数据,象上面一样。这样的坏处就是每
: 次要scan很多不相关user的record.
: 选择二:每个user用一个单独的table, 当然相同的schema.
: 选择三:每个user用一个单独的database.

f********t
发帖数: 74
10
How about define index on userid?
n****c
发帖数: 13
11
I believe table partition will be a good option for your data. Also, create
an index on stock_id AFTER table is partitioned.1000 stocks for a multi-
million row table is still considered low cardinality and an index will
improve the performance.

for
on

【在 F**e 的大作中提到】
: Really appreciate it!
: 我没有糖,但你也不用用牙膏皮来换了,送你5个包子略表谢意.呵呵.
: OK. back to the question.
: Here, assume I'm doing research on stock price history. I have tick data for
: 1000 stocks, each with 1 million ticks, in separate text(.csv) files, one
: file per stock.
: I would like to import the data (csv files) into database and run queries on
: the DB, and I only run query on one stock at a time. This is not really a
: transactional database, if I understand what it is -- the import part
: happens very infrequently, whil

1 (共1页)
进入Database版参与讨论
相关主题
column in table point to another table这个 query 为什么可以 update multiple rows
oracle中如何查询已建立的表结构求助SQL高手,这个join怎么做比较好
Help about a SQL statementkeep group of values of SQL procedure in one table
Oracle下如何能得到所有table的名字?generate unique integer ID from columns in SQL table (转载
SQL问题求救!!Error of SQL query on IBM netezza SQL database from Aginity (转载)
问一个excel的弱问题primary key?
请教一个SQL问题Special character in insert values
mySQL 问题 (转载)如何在oracle8i中得到视图和表的定义?
相关话题的讨论汇总
话题: stock话题: user话题: last话题: table话题: database