由买买提看人间百态

boards

本页内容为未名空间相应帖子的节选和存档,一周内的贴子最多显示50字,超过一周显示500字 访问原贴
Statistics版 - 请问:query about checking consistency (转载)
相关主题
学习R的过程还是挺痛苦的!请教一下R高手们
关于rank sum test(Mann-Whitney test)Clustering analysis with categorical variables
Please help me with SAS! Thank you![合集] k-mean clustering
昨天考了SAS ADVANCED,不难请教一个R:K-means的问题
请问如何把SAS dataset转到R里?请教一个频率优化问题(相关性?)
求问一道SAS adv 题这种情况应该用什么hypothesis test。
如何把access query转成sas dataset?从大data 产生多个小data 的方法
请教一个数据分类的问题问个弱弱的sas问题
相关话题的讨论汇总
话题: date话题: prev话题: attr话题: query
进入Statistics版参与讨论
1 (共1页)
z**k
发帖数: 378
1
【 以下文字转载自 Database 讨论区 】
发信人: zerk (阿呆), 信区: Database
标 题: 请问:query about checking consistency
发信站: BBS 未名空间站 (Fri Jul 16 13:10:46 2010, 美东)
sorry i cannot type chinese at work.
I have a collection of records, ordered by date, the record has two other
attributes, say attr and prev, here's a simple example:
date, attr, prev
z**k
发帖数: 378
2
btw, i don't wanna use the any sas-only sql function
D******n
发帖数: 2836
3
at least u should give a sample desired output, dude.
are u to just check 1 obs ahead? why did u mention date? i don't see
anything to do with the problem unless the date is not sorted.
z**k
发帖数: 378
4
well, let me make it concrete. (and u r right, the date attr is redundent, it's just part of the dataset)
I took record of stock price every trading day, for each tuple in the db, I
have 3 attributes:
date, previous_day_close_price, close_price
and I want to check if the system is consistent, if I see something like:
... ...
2010-07-14, 5.17, 5.21
2010-07-15, 5.21, *5.14
2010-07-16, *5.10, 5.06
... ...
(* - inconsist prices)
I know something's wrong here, since 2010-07-15's previous close pr

【在 D******n 的大作中提到】
: at least u should give a sample desired output, dude.
: are u to just check 1 obs ahead? why did u mention date? i don't see
: anything to do with the problem unless the date is not sorted.

D******n
发帖数: 2836
5
not a sql guru, i can only see this can be quite easily done in sas.
A*******s
发帖数: 3942
6
you can do it in data step with retain statement or lag function.
With proc sql, i think you need subqueries. Not sure whether the code work
but you can give it a shot.
SELECT t1.date
FROM table as t1, (select *, date-1 as pre_date from table) as t2
WHERE t1.prevclose <> t2.close
AND t1.date = t2.pre_date;

it's just part of the dataset)
I

【在 z**k 的大作中提到】
: well, let me make it concrete. (and u r right, the date attr is redundent, it's just part of the dataset)
: I took record of stock price every trading day, for each tuple in the db, I
: have 3 attributes:
: date, previous_day_close_price, close_price
: and I want to check if the system is consistent, if I see something like:
: ... ...
: 2010-07-14, 5.17, 5.21
: 2010-07-15, 5.21, *5.14
: 2010-07-16, *5.10, 5.06
: ... ...

z**k
发帖数: 378
7
my point is the trading day is not consecutive, only the row index
matters here, for example, 2010-07-19 is a Monday, so the previous
trading day is the previous Friday, in the data set, i only have:
--Date-- + -Prev- + -close-
2010-07-16, xxxxx, 35.17 # Friday
2010-07-19, 35.17, yyyyy # Monday
all I wanna do is create a table that shift original table by one, you
can view the Date column here as just a string or row name, not
something appears in the WHERE clause

work

【在 A*******s 的大作中提到】
: you can do it in data step with retain statement or lag function.
: With proc sql, i think you need subqueries. Not sure whether the code work
: but you can give it a shot.
: SELECT t1.date
: FROM table as t1, (select *, date-1 as pre_date from table) as t2
: WHERE t1.prevclose <> t2.close
: AND t1.date = t2.pre_date;
:
: it's just part of the dataset)
: I

z**k
发帖数: 378
l*********s
发帖数: 5409
9
although daily stock price dataset is pretty small, use table join to find
the last record is highly inefficient.

【在 z**k 的大作中提到】
: ok, Problem solved.
: http://www.mitbbs.com/article_t0/Database/31146665.html

z**k
发帖数: 378
10
I hv hash index on ticker and date, as long as the data fits in ram, the
query could be done in linear time. I guess for large data set the
clustered tree index could greatly reduce disk io.
what's ur suggestion?

find

【在 l*********s 的大作中提到】
: although daily stock price dataset is pretty small, use table join to find
: the last record is highly inefficient.

l*********s
发帖数: 5409
11
I think you shall follow Acturies's advice, use lag function in SAS or last
function in SQL. A full join is not good, as most entries are going to be
waste and the excessive "if" conditioning.

【在 z**k 的大作中提到】
: I hv hash index on ticker and date, as long as the data fits in ram, the
: query could be done in linear time. I guess for large data set the
: clustered tree index could greatly reduce disk io.
: what's ur suggestion?
:
: find

z**k
发帖数: 378
12
... well, i m not suppose to use any sas function.

last
be

【在 l*********s 的大作中提到】
: I think you shall follow Acturies's advice, use lag function in SAS or last
: function in SQL. A full join is not good, as most entries are going to be
: waste and the excessive "if" conditioning.

1 (共1页)
进入Statistics版参与讨论
相关主题
问个弱弱的sas问题请问如何把SAS dataset转到R里?
发包子求大牛解SAS问题,急求问一道SAS adv 题
SAS Question如何把access query转成sas dataset?
用什么可以画这个clustering 图? R?请教一个数据分类的问题
学习R的过程还是挺痛苦的!请教一下R高手们
关于rank sum test(Mann-Whitney test)Clustering analysis with categorical variables
Please help me with SAS! Thank you![合集] k-mean clustering
昨天考了SAS ADVANCED,不难请教一个R:K-means的问题
相关话题的讨论汇总
话题: date话题: prev话题: attr话题: query