Roundup of discussions about Rcpp - 话题女王

e*********6
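Posts: 3453

I've recently been helping a friend with some Rcpp problems, and I really can't see what Rcpp is good for. The downsides are plenty:
1) With Rcpp in the mix, changing a single compiler flag, e.g. from -O2 to -O3, is a hassle;
2) I haven't found a way to debug it with gdb either;
3) Setting up the various libraries is a hassle.
As for upsides, the only one I can name is that Rcpp makes reading data a bit easier, but even that is no better than cleaning the data first with something like Python and then reading the file directly in C++.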

w**********y
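Posts: 1691

From thread: Statistics board - I really can't see what Rcpp is good for

Debugging is indeed a hassle, but if the C++ part needs very complicated debugging, R is probably no longer a good choice.
Rcpp is very useful for simple tasks whose logic is trivial but which cannot avoid a for loop.
For example, suppose you have 100 stocks with per-minute data over the past 10 years for each, so you have a matrix of roughly 100 x 1 million, and you need the hourly rolling mean/variance/correlation/alpha/beta for every stock. Each computation takes no more than 20 lines of code. Rcpp is a godsend for this kind of task.
runMean and runSD are already in the TTR package. But runBeta, runAlpha, runSkew, runCor and the like I define myself, and I use them pretty much every day.
btw, a question: what are the compiler flags, o2 and o3, that you mentioned?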
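
A minimal sketch of the kind of rolling computation described above, written in plain C++ so that it compiles without R. The names `rolling_mean` and `rolling_beta` are hypothetical; they are not TTR's `runMean` or the poster's own code. Under Rcpp one would typically take `NumericVector` arguments and mark the functions with `// [[Rcpp::export]]`.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: rolling mean over a fixed window, O(n) via a running sum.
// Positions before the first full window are NaN.
std::vector<double> rolling_mean(const std::vector<double>& x, std::size_t window) {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    std::vector<double> out(x.size(), nan);
    if (window == 0 || window > x.size()) return out;
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x[i];
        if (i >= window) sum -= x[i - window];  // drop the value leaving the window
        if (i + 1 >= window) out[i] = sum / static_cast<double>(window);
    }
    return out;
}

// Hypothetical helper: rolling beta of y on x, i.e. cov(x, y) / var(x) per window.
// Uses running sums; windows where x has zero variance are left as NaN.
std::vector<double> rolling_beta(const std::vector<double>& x,
                                 const std::vector<double>& y,
                                 std::size_t window) {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    const std::size_t n = std::min(x.size(), y.size());
    std::vector<double> out(n, nan);
    if (window < 2 || window > n) return out;
    const double w = static_cast<double>(window);
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sx += x[i]; sy += y[i]; sxx += x[i] * x[i]; sxy += x[i] * y[i];
        if (i >= window) {
            const std::size_t k = i - window;
            sx -= x[k]; sy -= y[k]; sxx -= x[k] * x[k]; sxy -= x[k] * y[k];
        }
        if (i + 1 >= window) {
            const double cov = sxy - sx * sy / w;  // window covariance, unscaled
            const double var = sxx - sx * sx / w;  // window variance, unscaled
            if (var > 0.0) out[i] = cov / var;     // the common scale factor cancels
        }
    }
    return out;
}
```

Running sums keep each function linear in the series length, which matters at a million rows per stock; for long or badly scaled series a numerically more careful update may be preferable.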

n*****3
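Posts: 1584

From thread: DataSciences board - A (big) data algorithm question (repost)

You're right, this is not big data.
It is because in the modeling/training stage I use R to run the prototype system, and R itself is not convenient to parallelize. The next stage will move to Spark/Hadoop.
Let me throw out another rough attempt to get the discussion going. The CPP file below runs OK. Just call it with Rcpp::sourceCpp. See what can be improved.
Thanks again, all.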
#include <Rcpp.h>
using namespace Rcpp;
using namespace std;
// Below is the balancing C++ function exported to R. You can
// source this function into an R session using the Rcpp::sourceCpp
// function (or via the Source button on the editor toolbar)
// For more on using Rcpp click the Help button on the editor toolbar
// [... read full post

k*******a
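Posts: 772

From thread: Statistics board - In R, is apply faster than a for loop?

apply doesn't help.
First, see whether some of the code can be vectorized to speed up the computation. If you have already vectorized as far as possible, consider doing it in C++.
I recommend using Rcpp to write the most computation-heavy part (e.g. the posterior function) in C++. Rcpp is easy to use, and writing it feels much like writing R code.
Or write all of the Bayesian computation in C++, which would be even faster.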
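
As a rough illustration of moving a posterior function into C++, here is a sketch in plain C++ (no R headers, so it compiles on its own). The model is an assumption made for the example, since the thread does not say which posterior is involved: data distributed Normal(mu, sigma) with a Normal prior on mu. The names `normal_logpdf` and `log_posterior` are hypothetical.

```cpp
#include <cmath>
#include <vector>

// Log density of Normal(mean, sd) evaluated at x.
double normal_logpdf(double x, double mean, double sd) {
    const double kHalfLog2Pi = 0.9189385332046727;  // 0.5 * log(2 * pi)
    const double z = (x - mean) / sd;
    return -kHalfLog2Pi - std::log(sd) - 0.5 * z * z;
}

// Assumed model (illustration only): data[i] ~ Normal(mu, sigma),
// prior mu ~ Normal(prior_mean, prior_sd).
// Returns log prior + log likelihood, i.e. the log posterior of mu
// up to an additive constant.
double log_posterior(double mu, const std::vector<double>& data, double sigma,
                     double prior_mean, double prior_sd) {
    double lp = normal_logpdf(mu, prior_mean, prior_sd);
    for (double x : data) {
        lp += normal_logpdf(x, mu, sigma);  // this loop is the hot path in a sampler
    }
    return lp;
}
```

A sampler written in R would call such a function once per iteration, so the loop over the data runs in compiled code while the rest of the script stays in R.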

n*****3
Posts: 1584

From thread: Military board - Python's future looks worrying

R is not designed for such a task. If you really need to do something like this, do it in C++, then use Rcpp.
Use ggplot/shiny or other packages; it is very easy, and that is actually what R is good at.

n*****3
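Posts: 1584

From thread: JobHunting board - A (big) data algorithm question

A question for everyone:
There are two groups of people, POSITIVE and Negative, say
POSITIVE: 100K ppl,
Negative: 900K ppl.
The basic data structure is a person's ID and length of stay (number of days stayed).
ID            length of stay (days)
ppl-0000001   8
ppl-0000002   10
...
The goal is to sample 100K people from the Negative group whose length of stay matches the Positive group one-to-one, so that after matching, the 100K length-of-stay values of the two groups are exactly the same.
Of course, if several people in the Negative group match one person in the POSITIVE group, picking any one of them is fine.
I'd like to write it in C++, using an STL map/hash.
Is there a good algorithm, or a better STL data structure/algorithm to use?
Because I plan to write it as RC... read full post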
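
One way the matching described above could be sketched with an STL hash map, in plain C++ so it compiles without R. This is an illustration under assumptions, not the poster's code: people are identified by their index in each input vector, and the function name `match_by_stay` is hypothetical.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// pos_stay[i] / neg_stay[j] hold the length of stay (days) of person i / j.
// Returns (positive index, negative index) pairs with equal length of stay.
// Each negative is used at most once; positives with no remaining match are skipped.
std::vector<std::pair<std::size_t, std::size_t>> match_by_stay(
        const std::vector<int>& pos_stay, const std::vector<int>& neg_stay) {
    // Bucket the negative indices by length of stay.
    std::unordered_map<int, std::vector<std::size_t>> buckets;
    for (std::size_t j = 0; j < neg_stay.size(); ++j) {
        buckets[neg_stay[j]].push_back(j);
    }

    std::vector<std::pair<std::size_t, std::size_t>> matches;
    matches.reserve(pos_stay.size());
    for (std::size_t i = 0; i < pos_stay.size(); ++i) {
        auto it = buckets.find(pos_stay[i]);
        if (it == buckets.end() || it->second.empty()) continue;  // no candidate left
        matches.emplace_back(i, it->second.back());  // "any one of them is fine"
        it->second.pop_back();                       // consume that negative
    }
    return matches;
}
```

Building the buckets is linear in the Negative group and each lookup is expected constant time, so the whole pass is roughly O(n + m). If the pick within a bucket should be random rather than arbitrary, shuffling each bucket before matching would be one option.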

n*****3
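Posts: 1584

From thread: CS board - A (big) data algorithm question (repost)

[The following text is reposted from the JobHunting board]
Author: nacst23 (cnc), Board: JobHunting
Title: A (big) data algorithm question
Site: BBS 未名空间站 (Sun Mar 22 00:11:01 2015, US Eastern)
A question for everyone:
There are two groups of people, POSITIVE and Negative, say
POSITIVE: 100K ppl,
Negative: 900K ppl.
The basic data structure is a person's ID and length of stay (number of days stayed).
ID            length of stay (days)
ppl-0000001   8
ppl-0000002   10
...
The goal is to sample 100K people from the Negative group whose length of stay matches the Positive group one-to-one, so that after matching, the 100K length-of-stay values of the two groups are exactly... read full post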

c****t
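Posts: 19049

From thread: Programming board - Scala or clojure

Those are all ideals. The basic building blocks of scientific computing have never needed anything beyond what even FORTRAN77 can do. For data analysis, either use SAS (at least SAS has IML now, so it is not so crude anymore), or, if you insist on writing your own algorithms, learn C and use Rcpp and Cython.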

G**Y
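Posts: 33224

From thread: Programming board - My programming skills seem to have improved these past couple of days

Look at the link upthread:
http://adv-r.had.co.nz/
I'd call it an intermediate-level book. You will probably get more out of it once you have written some code in R.
But if you have a programming background in other languages, it also works as an introduction.
Every language comes down to a few things:
data structure
functions
oop
various packages
R's strengths are its many statistics packages and its vector operations.
After reading this book and comparing it with the terms everyone talks about, I realized that many concepts in R are actually quite advanced.
As for user interfaces for software developed in R, it traditionally comes without a GUI. The current trend seems to be to make up for that with JS+Web, but I have only just started looking into this.
The author of the book above has also done a lot of work on interactive graphics for R.
In the last couple of days I came across another great thing called Rcpp. It makes embedding C++ code an order of magnitude easier.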

n******g
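Posts: 2201

From thread: Programming board - Has anyone used data.table? Is it really that amazing?

python pandas is not as fast as data.table.
data.table is an R package, but under the hood it is written in C (against R's C API rather than with Rcpp).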

w**********y
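Posts: 1691

From thread: Quant board - Python as a replacement for R, does it work well?

Speed has nothing to do with OOP.
Scripting languages are all slow. R's answer is Rcpp; Python's answer is Cython.
For big projects and production code, the structural weaknesses of R's syntax become very obvious.
For ad hoc work, R is more flexible.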

a*******1
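Posts: 1554

From thread: Quant board - Python as a replacement for R, does it work well?

Move the complicated, time-consuming computations over to Rcpp/inline, e.g. compile them with sourceCpp and then call them directly from R; in normal cases this can be about 100 times faster.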

L*******t
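Posts: 2385

From thread: Quant board - Learn Matlab or R?

Bro, do your homework before using a function, haha.
RStudio can also edit LaTeX, and there are packages such as Rcpp, rJava, rJython and so on; it is simply unbeatable.
There are more packages than you can count. The downside is that it is slow. I don't know how fast Python is; I hope that language can end up with as many packages as R.
Python also supports symbolic math; if it is fast, I'll consider uninstalling my Mathematica.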

a*******1
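Posts: 1554

From thread: Quant board - Learn Matlab or R?

R has packages in every field. If you want speed you can use Rcpp/inline/parallel, or combine them, and it can be as fast as you like. By comparison, python only has a handful of the most basic packages, and a lot of things have to be written from scratch. As for running speed, both are interpreted languages, and both need to be combined with C++ and parallelism to be fast.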

f***a
Posts: 329

From thread: Statistics board - [Come in and discuss] for loop in R

It seems the actual looping of "lapply" is done internally in C code and "apply" isn't really faster than writing a loop. The main advantage of "apply" is that it simplifies code writing?
colMeans/rowSums() and vectorization of a function are faster than a loop though.
Anyway, I think that for algorithms with heavy computation involved, C/C++ should be employed to handle the computing part. And I strongly recommend {Rcpp}, which provides a much, much better API than the original one in R.
(My previous questions... read full post

o**m
Posts: 828

From thread: Statistics board - R is a bit disappointing

rcpp

t****a
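Posts: 1212

From thread: Statistics board - R is a bit disappointing

As the poster above said, you can use Rcpp to write a dedicated piece of code for a specific task to solve a specific problem.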

B******5
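Posts: 4676

From thread: Statistics board - R is a bit disappointing

You can debug it with gdb; debugging STL code is indeed a bit nasty. That is not Rcpp's problem, it is C++'s problem. This is throwing the baby out with the bathwater.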

n*****3
Posts: 1584

From thread: Statistics board - R is a bit disappointing

But Rcpp extensively uses the STL and templates, both of which are bad when debugging.
When using C with R, there is no such issue.
As I said, for some small chunks of computation-intensive code, Rcpp is just fine.
Just my 2 cents.

a******r
Posts: 706

From thread: Statistics board - Some tips for speeding up R

Rcpp is another way to go, especially when programming the bottleneck function.

n*****3
Posts: 1584

From thread: Statistics board - Learning R is still quite a painful process!

Agree, it is more of a Lisp/functional programming style.
BTW, I do not think R is easy; it is easy for some ad hoc analysis, quick and dirty and done;
but for serious/real-life development, it is at least as hard as Python or other scripting languages. If you want performance, link it with Rcpp or just use the C source library; that is surely not an easy task, and it is very hard to debug.

d******e
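Posts: 7844

From thread: Statistics board - I really can't see what Rcpp is good for

Apart from writing packages, I almost never call C or C++ from R.
There is no convenience in it, only hassle.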

l******n
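Posts: 9344

From thread: Statistics board - I really can't see what Rcpp is good for

Many commonly used packages rely on a lot of existing C/C++ code, so this is still very necessary.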

e*********6
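Posts: 3453

From thread: Statistics board - I really can't see what Rcpp is good for

For compiler flags, see http://www.zhihu.com/question/27090458. There are many other options too, and sometimes it is more convenient to add a newly added library directly on the g++ command line.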

f***8
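Posts: 571

From thread: DataSciences board - Asking about the choice of programming language, a stat student working toward DS

Would it work to mix R and C++, e.g. with Rcpp?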

w**********y
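Posts: 1691

From thread: DataSciences board - Asking about the choice of programming language, a stat student working toward DS

rcpp. foreach. Parallel computing also solves the problem of loops wasting time, and if you pick the right packages, the underlying code is basically all Java, C or FORTRAN.
Speed-wise it blows Python away...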

n*****3
Posts: 1584

From thread: DataSciences board - A (big) data algorithm question (repost)

first line
/*** assume f[] g[] are all sorted!!!!! ***/
I sort them in the R part, before passing them to Rcpp.
In general, I think R's vectorized operations are fast enough.

n*****3
Posts: 1584

From thread: DataSciences board - A (big) data algorithm question (repost)

/*** assume f[] g[] are all sorted!!!!! ***/
I sort them in the pure R section first; vectorized operations like data.table/setkey are very fast already. Try to keep the Rcpp part as minimal as possible.
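
Since the full source is truncated here, the following is only a guess at the shape of a loop that relies on the "already sorted" assumption: a two-pointer walk over two ascending arrays. It is plain C++ so that it compiles without R. Treating `f` as the Positive group's lengths of stay and `g` as the Negative group's is an assumption, and `match_sorted` is a hypothetical name.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Precondition: f and g are both sorted in ascending order (e.g. sorted in R first).
// Returns (index in f, index in g) pairs with equal values; each element of g is
// used at most once, and elements of f without a partner are skipped.
std::vector<std::pair<std::size_t, std::size_t>> match_sorted(
        const std::vector<int>& f, const std::vector<int>& g) {
    std::vector<std::pair<std::size_t, std::size_t>> matches;
    std::size_t i = 0, j = 0;
    while (i < f.size() && j < g.size()) {
        if (f[i] == g[j]) {
            matches.emplace_back(i, j);  // pair them up and advance both sides
            ++i;
            ++j;
        } else if (f[i] < g[j]) {
            ++i;                         // nothing in g can match f[i] any more
        } else {
            ++j;                         // g[j] is too small for every remaining f
        }
    }
    return matches;
}
```

After the sort, the walk itself is a single linear pass with no extra memory beyond the output, which fits the advice to keep the compiled part as small as possible.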

n*****3
Posts: 1584

From thread: DataSciences board - A (big) data algorithm question (repost)

Good point;
if everything were in C/C++, I bet there would be a better solution;
but here I just want to speed up a small but very time-consuming (big loop) part in R.
Sure, I can use std::sort to sort them in the Rcpp code instead of using R/data.table/setkey.
BTW, do you have a better algorithm for this? My solution is only meant to get the discussion started.
