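i***r Posts: 1035 | 1 I have two columns of random numbers (no duplicates within a column), each very large, say 2 GB. After sorting them, I want to find all the numbers they have in common.
For example, A might start with: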
1
2
3
7
10
74
100
349
...
B is:
2
3
6
78
89
90
120
...
The part they have in common is:
2
3
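So I need to read down both columns at the same time and keep tracking which one is larger. In my actual data, B always grows faster than A,
so I wrote two for loops: for each element of A, check whether B contains the same value, and when a match is found, record
its position as the starting point for the next loop.
I have no formal CS training and think my algorithm is pretty bad (so far it works fine, with no problems). I'm sure there is a better
algorithm, so I'm asking for advice.
Thanks!
Here is the structure of my code; try not to die laughing: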
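ind3 = 1;                            % index in B where the next scan starts
for ind1 = 1:length(A)
    for ind2 = ind3:length(B)
        ind3 = ind2;                 % everything before ind2 is smaller than A(ind1)
        if B(ind2) >= A(ind1)
            if B(ind2) == A(ind1)
                blahblah             % placeholder: record the match
            end
            break                    % B has caught up with A(ind1); move to the next A
        end
    end
end |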
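The "read both sides at once and track which is larger" idea described above is usually written as a two-pointer merge. Below is a minimal sketch in Python rather than the poster's MATLAB; the function name `sorted_intersection` is hypothetical, and it assumes both inputs are already sorted ascending with no duplicates, as the post states.

```python
def sorted_intersection(a, b):
    """Return the values present in both a and b.

    Assumes a and b are sorted ascending and contain no duplicates.
    """
    i, j = 0, 0
    common = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            common.append(a[i])   # match: record it and advance both sides
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1                # a is behind, so advance a
        else:
            j += 1                # b is behind, so advance b
    return common


A = [1, 2, 3, 7, 10, 74, 100, 349]
B = [2, 3, 6, 78, 89, 90, 120]
print(sorted_intersection(A, B))  # [2, 3]
```

Each element is visited at most once, so the work grows with length(A) + length(B) instead of their product. For inputs too large for memory, the same loop should work on two sorted streams read one value at a time.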
|
a***r Posts: 420 | 2 Say I have a dataset:
group($) id($)
1 IND1
1 IND2
2 IND3
2 IND4
3 IND5
3 IND6
I want to turn it into:
group($) id_1($) id_2($)
1 IND1 IND2
2 IND3 IND4
3 IND5 IND6
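How should I do this?
Sorry, this has surely been asked on this board before, but I'm short on time and can't dig through old posts.
Many thanks!! |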
|
m*****n Posts: 3575 | 3 Many believers in "modern" medicine think that if you don't have something,
you will never have it. That way of thinking is like assembling a car. But think
about yourself: were you assembled, or did you rather evolve?
What were you when you were an infant? You had little, yet you developed
everything. And this is natural.
As for a specific prescription: no one here is a trained doctor, but that doesn't
mean there is no treatment.
I just googled one site for you:
http://www.jiazhuangxian.com.cn/jinghua/ind3.htm |
|
h********o Posts: 103 | 4 Try this:
============================
DATA TEST;
INPUT GROUP $ ID $;
CARDS;
1 IND1
1 IND2
2 IND3
2 IND4
3 IND5
3 IND6
;
/* Sort the input dataset (TEST, not ONE, which does not exist yet) */
PROC SORT DATA = TEST;
BY GROUP;
RUN;
/* Assumes exactly two rows per group: the first goes to ONE, the last to TWO */
DATA ONE(RENAME = (ID = ID1))
TWO(RENAME = (ID = ID2)) ;
SET TEST;
BY GROUP;
IF FIRST.GROUP THEN OUTPUT ONE;
IF LAST.GROUP THEN OUTPUT TWO;
RUN;
DATA FINAL;
MERGE ONE TWO;
BY GROUP;
RUN; |
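For readers without SAS, the same long-to-wide reshape can be sketched in plain Python. This is only an illustration, not the poster's method: the function name `long_to_wide` is hypothetical, and unlike the FIRST./LAST. approach above it does not assume exactly two ids per group.

```python
from collections import defaultdict


def long_to_wide(rows):
    """Turn (group, id) pairs into one row per group: [group, id_1, id_2, ...].

    Ids keep their input order within each group; groups are sorted by key.
    """
    ids_by_group = defaultdict(list)
    for group, ident in rows:
        ids_by_group[group].append(ident)
    return [[group] + ids for group, ids in sorted(ids_by_group.items())]


rows = [
    ("1", "IND1"), ("1", "IND2"),
    ("2", "IND3"), ("2", "IND4"),
    ("3", "IND5"), ("3", "IND6"),
]
for row in long_to_wide(rows):
    print(*row)
# 1 IND1 IND2
# 2 IND3 IND4
# 3 IND5 IND6
```

Groups with more than two ids simply yield longer rows, so a real table would need padding to a fixed number of id columns.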
|
c*****1 Posts: 131 | 5 I think oloolo gave the best answer for Q2.
However, in actual business this can happen from time to time, and we cannot always go back and consult the business and data teams. There should be a way to handle this kind of situation.
My 2 cents:
Check the distribution of variable x in data1 and data2. If the frequencies of x in ('TX', 'NV') are small, and there are other predictors in the model besides x, you can still fit the model if you create indicators like this (it will take 'TX' and 'NV' into acco... |
|