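j*******e Posts: 674 | 1 I'd like to build something like a web crawler that automatically logs in to a website, shops, pays, and submits the order.
Ideally it would work across different sites, such as Amazon, eBay, ...
Which language is a good fit? Are there any general-purpose interfaces, tools, or frameworks I can use?
I'm a C++ developer and know nothing about web programming. I'd appreciate it if the experts here could point me in the right direction. |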
x****d Posts: 1766 | 2 A Chrome macro should be good enough for this.
You could use Java, but it is overkill, and the learning curve is steep. |
z****e Posts: 54598 | 3 Python.
I teach web crawlers here using Python.
Java would work too, but given that you don't know Java,
Python is still the better choice. |
l**********n Posts: 8443 | 4 Why is Java overkill? Java is the easiest to pick up, especially for someone who
has OOP experience.
[In reply to x****d, post 2]
|
l**********n Posts: 8443 | 5 Node.js would be one of the more interesting options. Python is OK too.
[In reply to l**********n, post 4]
|
d****i Posts: 4809 | 6 Agreed. If the OP has C++ experience, then Java is more suitable, since C++ is
the father of Java and the two have very similar syntax (both are derived
from C).
[In reply to l**********n, post 4]
|
x****d Posts: 1766 | 7 Maybe I got it wrong? Is he trying to build a real crawler, or just something
that manages a login script like a macro, to make daily life easier?
If he is trying to do something like monitoring eBay bids and placing bids, that
is different, and I agree Java is a good option. But how many of us use a robot
to shop on Amazon? I think that is crazy.
I don't know how much existing Java code is out there, but as far as I know PHP
has tons, which should be good for things like eBay crawling, if he means
real crawling.
Most users don't have a use case for programming a crawler. They get excited on
impulse, then find they can't use it much.
Java has Bixo. It comes with mining code too, so you can crawl mitbbs to
harvest all the pics that plmm "ben"ed, and find out who "ben"s the most. LOL. |
j*******e Posts: 674 | 8 Thanks for all the replies.
What I want to build is very close to "like monitoring ebay bids, place bids".
My main programming language right now is C++, plus some Perl. I've had Java training and used it a little, but never J2EE.
I'm interested in Python, Java, and JavaScript, and hope to use a real project to learn one of them properly. I'm not familiar with other
emerging technologies and would like to broaden my horizons as well.
[In reply to x****d, post 7]
|
|
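The "monitoring ebay bids, place bids" idea in post 8 boils down to a polling loop: read the current price and, if someone has outbid you, raise your bid up to a ceiling. Below is a minimal Python sketch of that loop. The names `next_bid`, `monitor`, `fetch_price`, and `place_bid` are hypothetical, and the two callbacks stand in for whatever site-specific code actually reads the price and submits the bid; nothing here talks to eBay.

```python
import time
from typing import Callable, List, Optional


def next_bid(current_price: float, max_price: float, increment: float) -> Optional[float]:
    """Return the next bid (current price plus one increment), or None if it would exceed max_price."""
    candidate = round(current_price + increment, 2)
    return candidate if candidate <= max_price else None


def monitor(fetch_price: Callable[[], float],
            place_bid: Callable[[float], None],
            max_price: float,
            increment: float,
            polls: int,
            delay: float = 0.0) -> List[float]:
    """Poll the price up to `polls` times and return the list of bids placed.

    fetch_price and place_bid are assumptions: placeholders for site-specific code.
    """
    our_bid: Optional[float] = None
    placed: List[float] = []
    for _ in range(polls):
        price = fetch_price()
        # Only act when we have no bid yet or the price has moved past our bid.
        if our_bid is None or price > our_bid:
            bid = next_bid(price, max_price, increment)
            if bid is None:
                break  # the ceiling is reached, so stop bidding
            place_bid(bid)
            our_bid = bid
            placed.append(bid)
        time.sleep(delay)
    return placed
```

Keeping the decision (`next_bid`) separate from the I/O callbacks means the bidding logic can be tested with fake prices before it is wired to a real site.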
e*******o Posts: 4654 | 9 https://metacpan.org/pod/WWW::Mechanize
Perl is more than enough for this.
[In reply to j*******e, post 8]
|
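For readers who would rather stay in Python, the core of what a form-driving tool does at login can be sketched with the standard library alone: read the form, keep its hidden fields, fill in the credentials, and encode the POST body. This is only an illustration, not WWW::Mechanize's API. The class `FormFieldParser`, the function `build_login_post`, and the field names in the usage note are all made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldParser(HTMLParser):
    """Collect the action URL and named <input> values of the first <form> on a page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and attrs.get("name"):
            # Hidden inputs (for example anti-forgery tokens) are kept with their preset value.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True  # ignore any later forms


def build_login_post(html, credentials):
    """Return (action, urlencoded_body) for submitting the first form with the given credentials."""
    parser = FormFieldParser()
    parser.feed(html)
    parser.close()
    data = dict(parser.fields)
    data.update(credentials)  # overwrite the empty user/password inputs
    return parser.action, urlencode(data)
```

Given a page whose form has a hidden `token` input plus `user` and `password` inputs, `build_login_post(page, {"user": "alice", "password": "s3cret"})` returns the form's action and a body that still carries the token. Sending that body with a cookie-aware HTTP client is a separate step and is not shown.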
m******t Posts: 635 | 10 Every site is presumably different, so would you have to write code for each one individually? That looks like a lot of work. I wonder whether
there is a better way, such as recording a macro.
[In reply to j*******e, post 1]
|
|
|
x****d Posts: 1766 | 11 Crawling is the easiest part. Parsing HTML yourself takes tons
of hours and is not worth it. You can get text out of HTML with tools like Tika or
boilerpipe, which makes life easier, but after that it is still
very hard to interact with websites.
Million-dollar testing software, like QTP and HPQC, uses screen positions
to click and interact. Basically what they use is a browser macro.
That is why I suggested a Chrome macro/browser macro in the first place. |
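To give a feel for the "get text from html" step mentioned in post 11, here is a rough Python sketch using only the standard library's `html.parser`. It is a stand-in for illustration and does not reproduce what Tika or boilerpipe do (both are Java libraries with far more capable extraction). The names `TextExtractor` and `html_to_text` are invented for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string as a single space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

This only strips markup. It makes no attempt to separate the main content from navigation or ads, which is the harder problem the dedicated tools address.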
x****d Posts: 1766 | 12 There are plenty of ready-made macros.
[In reply to m******t, post 10]
|
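l*******s Posts: 1258 | 13 This doesn't really count as a crawler, because the focus is on the login, purchase, and submit steps.
If you want to log in to sites like Amazon, I think the biggest headache is the security verification mechanism, that is, whether the site will even let you submit
orders automatically.
As for fetching pages or jumping from page to page, there are several packages you can use. The programming language isn't the point here; the security verification
mechanism is. |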
z****e Posts: 54598 | 14 C++ is not the father of Java.
C++ is a failed experiment that has been phased out.
In the web domain, pretty much any later language beats C++.
C++ is a product of the end of the last century.
Nowadays even apps aren't written in C++ anymore.
C++'s biggest legacy at this point is that pile of game code.
[In reply to d****i, post 6]
|
z****e Posts: 54598 | 15 If it's monitoring,
many of the cloud platforms for that kind of thing are written in Ruby,
such as DigitalOcean and rhcloud.
[In reply to j*******e, post 8]
|
d****i Posts: 4809 | 16 http://en.wikipedia.org/wiki/Java_%28programming_language%29#Sy
http://en.wikipedia.org/wiki/Java_syntax
Everyone here knows you are a C++ hater, but why deny the fact that Java was derived from C++ by simplification? Written as an
inheritance hierarchy, it would be:
C
|___C++
    |____Java
    |____C#
[In reply to z****e, post 14]
|
z****e Posts: 54598 | 17 Do you know what "influenced by" means on Wikipedia?
Influenced by Ada 83, C++, C#,[2] Eiffel,[3] Generic Java, Mesa,[4]
Modula-3,[5] Oberon,[6] Objective-C,[7] UCSD Pascal,[8][9] Smalltalk
Influenced Ada 2005, BeanShell, C#, Clojure, D, ECMAScript, Groovy, J#,
JavaScript, PHP, Python, Scala, Seed7, Vala
Java's original name was c++++--,
meaning the intent was to rework C++,
not simply to inherit from it.
[In reply to d****i, post 16]
|
d****i Posts: 4809 | 18 The rework removed pointers, multiple inheritance, operator overloading, virtual functions, and so on, but C++'s main basic
features (inheritance, overloading, polymorphism, encapsulation, templates) and its syntax were all carried over.
[In reply to z****e, post 17]
|
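z****e Posts: 54598 | 19 Those are mainly features of OOP, and when it comes to OOP the real pioneer is Smalltalk, not C++.
The syntax itself also comes straight from C, so what does it have to do with C++?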
The Smalltalk language, which was developed at Xerox PARC (by Alan Kay and
others) in the 1970s, introduced the term object-oriented programming to
represent the pervasive use of objects and messages as the basis for
computation. Smalltalk creators were influenced by the ideas introduced in
Simula 67, but Smalltalk was designed to be a fully dynamic system in which
classes could be created and modified dynamically rather than statically as
in Simula 67.[11] Smalltalk and with it OOP were introduced to a wider
audience by the August 1981 issue of Byte Magazine.
[In reply to d****i, post 18]
|
d****i Posts: 4809 | 20 Anyway, you are a C++ hater through and through. People should really be a bit more open-minded. I have actually seen several people at my company
who used to write Java and later wrote C++ just as well. Languages all have a lot in common, especially these two C-family OO languages.
[In reply to z****e, post 19]
|
z****e Posts: 54598 | 21 I am very open-minded.
The reality is that there are fewer and fewer opportunities for C++.
Some C++ programmers I know here can't even find a job.
It's doing even worse than Ruby, and that's the truth.
[In reply to d****i, post 20]
|