z**********i Posts: 12276 | 1 Not familiar with it; first time I've heard of it.
Current Release: HDF5-1.8.10
HDF5 is a data model, library, and file format for storing and managing data. It supports an unlimited variety of datatypes, and is designed for flexible and efficient I/O and for high-volume and complex data. HDF5 is portable and extensible, allowing applications to evolve in their use of HDF5. The HDF5 technology suite includes tools and applications for managing, manipulating, viewing, and analyzing data in the HDF5 format. |
|
p**z Posts: 65 | 2 I won't translate my notes into Chinese; just pasting them directly.
Example Python code below for creating the HDF5 file. Note that both uncompressed and 'gzip' compression can be read by Matlab and HDFView.
from __future__ import division, print_function
import h5py
import numpy as np
data = np.array([('John', 35, 160.5), ('Mary', 20, 150)],
                dtype=[('Name', 'a10'), ('Age', 'i'), ('Weight', 'f')])
##alternative:
#data = np.array([('John', 35, 160.5), ('Mary', 20, 150)],
#                dtype={'names': ['Name','Age','Weight'], 'formats':['a10','i','f'... |
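The note above is cut off before the file is actually written. A minimal sketch of the remaining step (the file name `people.h5` and dataset name `people` are my own choices, not from the original notes):

```python
import h5py
import numpy as np

# Compound dtype matching the example above: fixed-width string, int, float.
data = np.array([('John', 35, 160.5), ('Mary', 20, 150)],
                dtype=[('Name', 'a10'), ('Age', 'i'), ('Weight', 'f')])

# Write the record array as one dataset; per the note above, 'gzip'
# compression keeps the file readable by Matlab and HDFView.
with h5py.File('people.h5', 'w') as f:
    f.create_dataset('people', data=data, compression='gzip')

# Read it back to verify the round trip.
with h5py.File('people.h5', 'r') as f:
    readback = f['people'][:]
```

On Python 3, the 'a10' strings come back as bytes (e.g. b'John'), which is worth knowing before comparing them.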
|
gw Posts: 2175 | 3 The message says it needs to rename a library file, but the result is permission denied. In fact the file to be renamed does not exist at all, for example:
mv hdf5.so.8.0.0.1 hdf5.so.8.0.0.1U
Looking through some of the debug output, it seems to mention "relink before install", which I don't understand. |
|
r****t Posts: 10904 | 4 .mat files are now in hdf5 format; read them directly with the C++ API that hdf5 provides. |
|
B********e Posts: 1062 | 5 JSON and hdf5 are both our friends. They are both very flexible and can be used to represent complex structures.
Normally, we use JSON to define high-level concepts and hdf5 to store the intermediate results. |
|
D***h Posts: 183 | 6 They are hdf5 files written out by Python pandas. I want to access these pandas-written hdf5 files directly from Java; is that feasible? |
|
p*******e Posts: 125 | 7 Hdf5 on Hadoop? Except for high-frequency data, most datasets aren't actually that big. Wouldn't storing hdf5 files split by time period (one file per year) be good enough? Hadoop HDFS probably adds fault tolerance, but a corrupted file can usually just be reloaded from source. What other benefits does a distributed file system offer for time series data? Discussion welcome. I'm asking because I heard some fintech companies use Hadoop/Spark to process this kind of data. |
|
w********w Posts: 4 | 8 During simulation it keeps saying the hdf5 library cannot find files like h5f.c.
Could there be a problem with the hdf5-serial package? |
|
w********w 发帖数: 4 | 9 仿真时老是说hdf5的库找不到h5f.c之类的文件
是不是那个hdf5 serial的包有问题? |
|
r*****s Posts: 590 | 10 Is hdf5 installed on your machine? I remember it being required; after installing meep you should get hdf5 automatically. |
|
o**n Posts: 1249 | 11 btw, the reason I'm using hdf5 instead of .mat is that hdf5 can handle multiple levels of data, such as my data structure: /Data001/X,Y,Z,... /Data002/X,Y,Z... (X, Y, Z are matrices). You'd have to use a struct for that in a .mat file, but I think structs are inefficient in terms of both time and space. |
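The multi-level layout described above maps directly onto HDF5 groups. A small h5py sketch (group and dataset names follow the post; the matrix contents are made-up toy data):

```python
import h5py
import numpy as np

rng = np.random.default_rng(0)

# Each DataNNN group holds its own X, Y, Z matrices.
with h5py.File('multi.h5', 'w') as f:
    for name in ('Data001', 'Data002'):
        g = f.create_group(name)
        for ds in ('X', 'Y', 'Z'):
            g.create_dataset(ds, data=rng.standard_normal((4, 4)))

# Datasets are addressed by path, like files in a directory tree.
with h5py.File('multi.h5', 'r') as f:
    x1 = f['/Data001/X'][:]
    names = sorted(f.keys())
```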
|
a*****9 Posts: 153 | 12 Since cvmastersonline went live I have edited around 40 CS resumes, and I can't resist venting here. I'll skip the various inaccuracies and unprofessional parts today and just point out the low-level mistakes I've seen:
There is no Ubuntu version 11.02.
It's HDFS, not HDF5.
It's Demo, not demon; a demon is an evil spirit.
Everyone knows grinding interview problems matters, but a reminder anyway: it only helps if your resume gets you the interview in the first place. If I were the interviewer, my take on a resume would be simple: if you are this careless writing your resume, I don't need to look at the code you wrote. |
|
c*******y Posts: 1630 | 13 I will show you some sample data I collected. I spent hours on programming tricks/usages and potentially outdated packages, asking around Stack Overflow to get something working.
Originally I thought IB disconnects frequently, but I will give it a second thought.
Here's some test data.
Time range:
2014-03-03 23:30:30 to 2014-03-05 19:41:41, almost 2 days.
In [40]: e.head(1)
Out[40]:
                               Bid  n
Time
2014-03-03 23:30:30.224323  0.8925  0
In [42]: e.tail(2)
Ou... |
|
c*******y 发帖数: 1630 | 14 I will show you some sample I collected. I spent hours on programming
tricks/usages, potential outdated packages. asking around stackoverflow
to get something working.
Originally I thought IB disconnects frequently, but I will give a second
thought.
Here's some test.
Time range:
2014-03-03 23:30:30 to 2014-03-05 19:41:41 almost 2 days.
In [40]: e.head(1)
Out[40]:
Bid n
Time
2014-03-03 23:30:30.224323 0.8925 0
In [42]: e.tail(2)
Ou... 阅读全帖 |
|
v*********w Posts: 7 | 15 Academic Division:
Engineering, Mathematics, Natural and Physical Sciences
Academic Department/Research Unit:
Computer Science and Engineering, Electrical and Computer Engineering,
Mathematics, San Diego Supercomputer Center
Disciplinary Specialty of Research:
Computational Sciences and Scalable IO Library Development
Description:
High Performance GeoComputing Laboratory at San Diego Supercomputer Center,
the University of California at San Diego (UCSD), invites applications for a
postdoct... |
|
gw Posts: 2175 | 16 Not make install; CentOS 6.9. It happens when installing other software, e.g. hdf5, graphviz. |
|
r****t Posts: 10904 | 17 I heard Matlab .mat files are now hdf5; is that right? |
|
y**b Posts: 10166 | 18 I recently found that using raw arrays in an MPI parallel program that handles large HDF5 files is a nightmare.
First, the same block of memory may be allocated and freed by different functions, which puts a heavy burden on the programmer.
Second, every process (e.g. sender and receiver) must know exactly whether it allocated or freed a given block; when point-to-point and collective communication are mixed, it is very easy to make mistakes and hard to debug.
Switching to vector made things much easier. |
|
y**b Posts: 10166 | 19 Thanks pptwo and goodbug! I followed this approach and it feels good. A few more questions:
1. The hashmap maintained by this singleton is like a global variable: no function parameters to pass around, and any object or function can use it, which is very convenient, but it still feels unusual. Is this common practice? A lab I know developed a large object-oriented package that, after reading in the data, splits and passes it around endlessly until every object that uses (a different part of) the data maintains what it needs in purely local data structures. The upside is that each object has high cohesion; the downside is that it is very tedious and highly redundant. Which design do you think is better?
2. pptwo: "You got great flexibility by not hard-coding all the parameters in that singleton class." How should I understand this? I want to read all the data into that singleton class in one pass; does that lose the flexibility?
3. Having many processes each read a small file once (e.g. the contents stored in the singleton class) is cheap, but reading the really large data files can be expensive. For example, in that singleton cla... |
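For what it's worth, the singleton-as-shared-store idea in question 1 can be sketched in a few lines of Python (class and method names here are hypothetical, not from the package being discussed):

```python
class DataStore:
    """Process-wide store for input data. Any object can look values up
    without the data being threaded through every constructor."""
    _instance = None

    def __new__(cls):
        # First call creates the instance; every later call returns it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._table = {}   # the shared hashmap
        return cls._instance

    def put(self, key, value):
        self._table[key] = value

    def get(self, key, default=None):
        return self._table.get(key, default)

# Any two call sites see the same underlying table.
DataStore().put('mesh_size', 128)
value = DataStore().get('mesh_size')
```

The trade-off raised above is real: this is convenient, but it is still a global, so it hides data flow the way the "split and pass everything" design makes explicit.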
|
B********e Posts: 1062 | 20 hdf5: a lot of computational work uses this format.
bson is simpler and easy to convert to. |
|
g*******u Posts: 3948 | 24 One file per record? Binary? No compression needed? Reading them one at a time, with all that seeking back and forth, won't that be slow? Isn't that ridiculous?
I was also thinking of putting many records into one table and storing it as hdf5; pointless too, right?
The main question: is reading one record at a time efficient?
thx |
|
w***g Posts: 5958 | 25 Describe your use case and maybe I can suggest something better.
HDF5 is uninteresting. If you really want a database, consider leveldb. |
|
g*******u Posts: 3948 | 26 I have two time-series use cases:
1. Fixed-length records that I organize for training, e.g. one record per minute. Possibly tens of millions of records, each one small.
2. Very long time series, e.g. one file holding a month of data, possibly around 60GB, where I want convenient querying and slicing by time range, e.g. the data from 10:00 to 11:00 today.
Let's focus on 1 for now.
I assume 1 and 2 call for different approaches?
I also know hdf5 is quite old, but I don't know what else to use.
Thanks. |
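For case 2, one common pattern (assuming the time axis is stored sorted alongside the values, which is my assumption, not stated above) is to bisect the timestamps and read only the requested slice, so a 10:00-to-11:00 query never touches the rest of the file:

```python
import h5py
import numpy as np

# Toy day of per-second data; in practice the value dataset is the huge one.
t = np.arange(0, 3600 * 24, dtype='i8')        # seconds since midnight
v = np.sin(t / 1000.0)
with h5py.File('series.h5', 'w') as f:
    f.create_dataset('time', data=t)
    f.create_dataset('value', data=v)

# Query "10:00 to 11:00" by bisecting the sorted time axis, then
# reading only that slice of the value dataset from disk.
with h5py.File('series.h5', 'r') as f:
    times = f['time'][:]                       # small relative to values
    lo = np.searchsorted(times, 10 * 3600)
    hi = np.searchsorted(times, 11 * 3600)
    window = f['value'][lo:hi]                 # only this range is read
```

Loading the whole time axis into memory is fine while it stays small relative to the values; for a truly huge axis one would bisect against the dataset itself instead.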
|
f*******a Posts: 80 | 27 I installed MEEP on Cygwin with HDF5 1.8. No problem. |
|
r***6 Posts: 401 | 30 In memory, keep the last n ticks in a circular buffer. For a whole day's data use hdf5, binary, or an R dataset.
Same question here. |
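The in-memory part above is exactly what `collections.deque` with `maxlen` gives you in Python; a minimal sketch with made-up tick prices:

```python
from collections import deque

N = 5
last_ticks = deque(maxlen=N)   # circular buffer: old ticks fall off the front

for price in [100.0, 100.5, 99.8, 100.1, 100.3, 100.7, 100.2]:
    last_ticks.append(price)   # O(1) append; evicts the oldest once full

snapshot = list(last_ticks)    # the last N ticks, oldest first
```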
|
mw Posts: 525 | 31 If you have money, kdb.
If you don't, hdf5.
hiahia |
|
k*******d Posts: 1340 | 33 If every entry has the same length it can work: just seek directly.
HDF5 probably can't do that. |
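The direct-seek idea above, sketched with Python's `struct` module (the record layout here is made up for illustration):

```python
import struct

# Fixed-length record: 8-byte timestamp + 8-byte price = 16 bytes.
REC = struct.Struct('<qd')

with open('ticks.bin', 'wb') as f:
    for i, price in enumerate([1.10, 1.11, 1.12, 1.13]):
        f.write(REC.pack(i, price))

# Because every record is the same size, random access to record k
# is a single seek; no index structure is needed.
k = 2
with open('ticks.bin', 'rb') as f:
    f.seek(k * REC.size)
    ts, price = REC.unpack(f.read(REC.size))
```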
|
c*******g Posts: 695 | 35 On the package page
http://cran.r-project.org/web/packages/xgobi/index.html
Windows binary: not available, see ReadMe
The ReadMe says:
ADaCGH, GDD, PermuteNGS, RDieHarder, RScaLAPACK, Rcplex, Rmpi, SV,
cudaBayesreg, doMPI, gputools, hdf5, magma, ncdf4, rpud, rpvm, xgobi
or their dependencies also require additional libraries / software to
build on Windows I do not have (and may not even exist in versions
for Windows).
The manual says:
SystemRequirements xgobi must be installed additionally, see file README, or
INST... |
|
s*********e Posts: 1051 | 36 How is the performance? |
|
s*********e Posts: 1051 | 37 It's exactly that first sentence. Two of its properties are interesting:
- stores high-volume data
- very efficient reads |
|
D******n Posts: 2836 | 38 Seems like an old concept; if I remember correctly, Matlab used to be quite fond of this format. |
|
D**u Posts: 288 | 42 OK, I am going to try the hdf5 + data.table combination and compare it to rsqlite + sqldf. That's the best approach I can think of right now. |
|
s*********e Posts: 1051 | 43 It depends. SQLite is more portable, but hdf5 reads faster. |
|