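x**********g (posts: 82) | 1 I'd like to do some simple processing of the PERM data at http://www.flcdatacenter.com/CasePerm.aspx
to get a rough estimate of how many Chinese and Indian EB-2/EB-3 cases there are.
The simple idea is to estimate by salary: count cases under $50K as EB-3, cases over $80K as EB-2, and split the ones in between half and half.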
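The rule above can be written as a small function. This is a minimal sketch that assumes the yearly salaries are already plain numbers. The name `estimate_eb2_eb3` is hypothetical, and cases of exactly $50K or $80K are counted as "in between" because the rule does not say which side they fall on.

```python
def estimate_eb2_eb3(salaries):
    """Split yearly salaries into estimated (EB-2, EB-3) counts.

    Heuristic from the post: under $50K counts as EB-3, over $80K as EB-2,
    and every case in between contributes half to each category.
    """
    eb2 = 0.0
    eb3 = 0.0
    for salary in salaries:
        if salary < 50_000:
            eb3 += 1
        elif salary > 80_000:
            eb2 += 1
        else:
            # $50K to $80K inclusive: split the case evenly.
            eb2 += 0.5
            eb3 += 0.5
    return eb2, eb3


print(estimate_eb2_eb3([30_000, 60_000, 70_000, 90_000]))  # (2.0, 2.0)
```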
However, I don't fully understand the data, so I'd appreciate it if someone familiar with it could explain. For example, the format is:
"CASE_NUMBER","APPLICATION TYPE","DECISION DATE","CASE STATUS","EMPLOYER NAME",
"EMPLOYER ADDRESS_1","EMPLOYER ADDRESS_2","EMPLOYER CITY","EMPLOYER STATE",
"EMPLOYER POSTAL CODE","2007 NAICS US CODE","2007 NAICS US TITLE","US ECONOMIC SECTOR",
"PW SOC CODE","PW JOB TITLE 9089","PW LEVEL 9089","PW AMOUNT 9089","PW UNIT OF PAY 9089",
"WAGE OFFER FROM 9089","WAGE OFFER TO 9089","WAGE OFFER UNIT OF PAY 9089",
"JOB INFO WORK CITY","JOB INFO WORK STATE","COUNTRY OF CITZENSHIP","CLASS OF ADMISSION"
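Because the header is long and easy to misread, one option is to split it with Python's `csv` module and look columns up by name instead of counting positions by hand. This is a minimal sketch that assumes the header line is exactly the one listed above, including the file's own spelling `COUNTRY OF CITZENSHIP`. `HEADER_LINE` and `column` are hypothetical names, and other fiscal years' files may use different column names.

```python
import csv
import io

# Header line as listed in the post (the spelling "CITZENSHIP" is the file's own).
HEADER_LINE = (
    '"CASE_NUMBER","APPLICATION TYPE","DECISION DATE","CASE STATUS",'
    '"EMPLOYER NAME","EMPLOYER ADDRESS_1","EMPLOYER ADDRESS_2","EMPLOYER CITY",'
    '"EMPLOYER STATE","EMPLOYER POSTAL CODE","2007 NAICS US CODE",'
    '"2007 NAICS US TITLE","US ECONOMIC SECTOR","PW SOC CODE",'
    '"PW JOB TITLE 9089","PW LEVEL 9089","PW AMOUNT 9089","PW UNIT OF PAY 9089",'
    '"WAGE OFFER FROM 9089","WAGE OFFER TO 9089","WAGE OFFER UNIT OF PAY 9089",'
    '"JOB INFO WORK CITY","JOB INFO WORK STATE","COUNTRY OF CITZENSHIP",'
    '"CLASS OF ADMISSION"'
)

# csv.reader strips the quotes; next() takes the single parsed row.
header = next(csv.reader(io.StringIO(HEADER_LINE)))
# Map each column name to its 0-based position.
column = {name: i for i, name in enumerate(header)}

print(len(header))                      # 25
print(column["CASE STATUS"])            # 3
print(column["COUNTRY OF CITZENSHIP"])  # 23
```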
1. For "CASE STATUS", do both Certified and Certified-Expired count as approved, or only Certified?
2. Which field should be used as the salary? I don't understand the difference between "PW LEVEL 9089", "PW AMOUNT 9089", "PW UNIT OF PAY 9089", "WAGE OFFER FROM 9089", "WAGE OFFER TO 9089", and "WAGE OFFER UNIT OF PAY 9089".
Take the following record as an example:
"C-08301-99618","PERM",8/28/2009 8:51:12,"Certified","NAVISTAR, INC.","4201 WINFIELD RD",,
"WARRENVILLE","IL","60555","33612","Heavy Duty Truck Manufacturing","Automotive","17-2141.00",
"Mechanical Engineer","Level I",43347.00,"yr",43347.00,79651.00,"yr","Melrose Park","IL","CHINA","H-1B"
Is his salary 43347.00 or 79651.00?
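One way to read the example record is sketched below. `csv.reader` keeps "NAVISTAR, INC." as a single field, and the salary is taken from "WAGE OFFER TO 9089" and annualized with "WAGE OFFER UNIT OF PAY 9089". That is the column the posted program ends up using for a complete record; it is not an official definition. This is a sketch under stated assumptions: the column positions assume the header listed above, the weekly and hourly multipliers are the program's own approximations (48 weeks and 1,920 hours per year), and falling back to "WAGE OFFER FROM 9089" when the top of the range is blank is an assumption of this sketch. `is_approved` and `yearly_salary` are hypothetical helper names.

```python
import csv
import io

# The example record from the post, joined back onto one line.
RECORD = (
    '"C-08301-99618","PERM",8/28/2009 8:51:12,"Certified","NAVISTAR, INC.",'
    '"4201 WINFIELD RD",,"WARRENVILLE","IL","60555","33612",'
    '"Heavy Duty Truck Manufacturing","Automotive","17-2141.00",'
    '"Mechanical Engineer","Level I",43347.00,"yr",43347.00,79651.00,"yr",'
    '"Melrose Park","IL","CHINA","H-1B"'
)

# Column positions, assumed to follow the header listed above.
CASE_STATUS = 3
WAGE_OFFER_FROM = 18
WAGE_OFFER_TO = 19
WAGE_OFFER_UNIT = 20

# Yearly multipliers used by the posted program (4 weeks/month, 40 hours/week).
PER_YEAR = {"yr": 1, "mth": 12, "wk": 4 * 12, "hr": 40 * 4 * 12}


def is_approved(status):
    # Both Certified and Certified-Expired count as approved.
    return status.strip().lower() in ("certified", "certified-expired")


def yearly_salary(row):
    # Prefer the top of the offered range; fall back to its bottom if blank.
    amount = row[WAGE_OFFER_TO].strip() or row[WAGE_OFFER_FROM].strip()
    unit = row[WAGE_OFFER_UNIT].strip().lower()
    return float(amount.replace(",", "")) * PER_YEAR[unit]


row = next(csv.reader(io.StringIO(RECORD)))
print(len(row), row[4])               # 25 NAVISTAR, INC.
print(is_approved(row[CASE_STATUS]))  # True
print(yearly_salary(row))             # 79651.0
```

With this convention, the answer for the example would be 79651.00 per year.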
Once I'm done, I'll post the program too.
| y******a (posts: 510) | 2 Bump.
| B*******t (posts: 403) | 3 Both Certified and Certified-Expired count as approved.
| c******4 (posts: 933) |
| x**********g (posts: 82) | 5 ******************************************************
Perm 2009
all cases (all countries) = 38248
China_EB23 (all applicants) = 2516
China_EB23 (approved) = 2111
not processed due to non-recognized format =1
The cases seem abnormal =11
0~40K :398; 40K~60K: 355; 60K~80K: 606; >80K: 752
India_EB23 (all applicants) = 13532
India_EB23 (approved) = 11381
not processed due to non-recognized format =1
The cases seem abnormal =135
0~40K :354; 40K~60K: 1856; 60K~80K: 4014; >80K: 5157
*******************************************************
Perm 2008
all cases (all countries) = 61999
China_EB23 (all applicants) = 3787
China_EB23 (approved) = 3322
not processed due to non-recognized format =6
The cases seem abnormal =26
0~40K :627; 40K~60K: 645; 60K~80K: 940; >80K: 1110
India_EB23 (all applicants) = 18837
India_EB23 (approved) = 16555
not processed due to non-recognized format =6
The cases seem abnormal =210
0~40K :633; 40K~60K: 3543; 60K~80K: 5920; >80K: 6459
*******************************************************
Perm 2007
all cases (all countries) = 98755
China_EB23 (all applicants) = 7391
China_EB23 (approved) = 6817
not processed due to non-recognized format =10
The cases seem abnormal =41
0~40K :1424; 40K~60K: 1351; 60K~80K: 1957; >80K: 2085
India_EB23 (all applicants) = 26613
India_EB23 (approved) = 24518
not processed due to non-recognized format =10
The cases seem abnormal =264
0~40K :989; 40K~60K: 4514; 60K~80K: 9200; >80K: 9815
| x**********g (posts: 82) | 6 Simple explanation:
1. Both Certified and Certified-Expired count as approved.
2. The salary is "WAGE OFFER TO 9089", converted to a yearly amount using the unit in "WAGE OFFER UNIT OF PAY 9089". If "WAGE OFFER TO 9089" is empty, "WAGE OFFER FROM 9089" is used instead.
3. Some of the computations may be wrong. For example, I assumed all teachers are paid for 12 months a year, which may not hold (they are paid for 10 months). There are also some weird cases; for example, Freescale employees appear to earn too much (> $150,000/year). But in general I think about 99% of the cases are correct. In any case, this is only an estimate.
4. Conclusion: bright future for EB-2/EB-3 China (EBC2/3) once the priority date (PD) moves past 2007. The outlook for India is much worse.
| x**********g (posts: 82) | 7 Python program
# Authored by xiaofeixiang
# Copyright reserved to EBC23
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or (at
# your option) any later version.
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# General Public License for more details.
import csv
import sys

# Input file: one fiscal year of PERM disclosure data.
#DATA_FILE = "PERM_FY2008.txt"
#DATA_FILE = "PERM.txt"
DATA_FILE = "PERM_FY2007_DATA.txt"

# Multipliers that turn a wage in each unit into a yearly amount
# (approximations: 12 months, 4 weeks per month, 40 hours per week).
SALARY_UNIT = {"yr": 1, "mth": 12, "wk": 4 * 12, "hr": 40 * 4 * 12}


def new_stats():
    # Counters for one country.
    return {"total": 0, "approved": 0, "bad_format": 0, "abnormal": 0,
            "buckets": [0, 0, 0, 0]}  # <=40K, 40K-60K, 60K-80K, >80K


def process_case(row, country, stats):
    """Count one PERM case whose country of citizenship is `country`."""
    index = row.index(country)
    stats["total"] += 1
    # Both Certified and Certified-Expired count as approved.
    status = row[3].strip().lower()
    if status not in ("certified", "certified-expired"):
        return
    # Columns just before the country: WAGE OFFER FROM 9089 (-5),
    # WAGE OFFER TO 9089 (-4), WAGE OFFER UNIT OF PAY 9089 (-3),
    # JOB INFO WORK CITY (-2), JOB INFO WORK STATE (-1).
    unit = row[index - 3].strip().lower()
    if unit not in SALARY_UNIT:
        print("ERROR", unit)
        print(row)
        stats["bad_format"] += 1
        return
    # Use WAGE OFFER TO; fall back to WAGE OFFER FROM when it is empty.
    amount = (row[index - 4].strip() or row[index - 5].strip()).replace(",", "")
    if not amount:
        print(row)
        return
    try:
        amount_y = float(amount) * SALARY_UNIT[unit]
    except ValueError:
        print("ERROR", amount)
        print(row)
        stats["bad_format"] += 1
        return
    stats["approved"] += 1
    if amount_y < 10000 or amount_y > 150000:
        print("unreasonable amount, please check")
        print(amount, SALARY_UNIT[unit], amount_y)
        print(row)
        stats["abnormal"] += 1
    if amount_y <= 40000:
        stats["buckets"][0] += 1
    elif amount_y <= 60000:
        stats["buckets"][1] += 1
    elif amount_y <= 80000:
        stats["buckets"][2] += 1
    else:
        stats["buckets"][3] += 1


def report(name, stats):
    print(name + "_EB23 (all applicants) = " + str(stats["total"]))
    print(name + "_EB23 (approved) = " + str(stats["approved"]))
    print("not processed due to non-recognized format =" + str(stats["bad_format"]))
    print("The cases seem abnormal =" + str(stats["abnormal"]))
    print("0~40K :%d; 40K~60K: %d; 60K~80K: %d; >80K: %d" % tuple(stats["buckets"]))


def main(path):
    stats = {"CHINA": new_stats(), "INDIA": new_stats()}
    line_number = 0
    # csv.reader handles quoted fields that contain commas, e.g. "NAVISTAR, INC."
    with open(path, newline="", errors="replace") as f:
        for row in csv.reader(f):
            if not row:
                continue
            line_number += 1
            if len(row) <= 10:
                continue
            for country in ("CHINA", "INDIA"):
                if country in row:
                    process_case(row, country, stats[country])
                    break
    print("all cases (all countries) = " + str(line_number))
    report("China", stats["CHINA"])
    print("\n\n")
    report("India", stats["INDIA"])


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else DATA_FILE)
| x**********g (posts: 82) | 8 one thought:
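Lawyers and MDs make lots and lots of $$$!!!!
| S*p (posts: 1483) | 9 It looks like there are only about 5,500 Chinese cases from 2008 and 2009 combined. Even counting another 4,500 for those filed after July 2007, that is only about 10K in total, so even someone with a PD at the end of 2009 should get a green card in roughly three more years.
| B********i (posts: 371) | 10 Thanks, that is an excellent job.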
| B********i (posts: 371) | 11 You forgot dependents and NIW, although there were fewer NIW filings after the 2007 surge than in the surge month itself.
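The tally in post 9 can be checked against the approved China counts from the posted results. This is a rough sketch. The 2008 and 2009 figures come from the results, and the 4,500 for cases filed after July 2007 is post 9's own guess. As post 11 points out, dependents and NIW cases are not included.

```python
# Approved China EB-2/EB-3 PERM cases, from the results posted above.
china_approved = {2008: 3322, 2009: 2111}
# Post 9's rough guess for cases filed after July 2007.
after_july_2007 = 4500

fy2008_2009 = sum(china_approved.values())
print(fy2008_2009)                    # 5433, i.e. "about 5,500"
print(fy2008_2009 + after_july_2007)  # 9933, i.e. "about 10K"
```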
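| S*p (posts: 1483) |
| L********n (posts: 930) | 13 "Perm 2009" means cases approved in fiscal year 2009 (roughly 9/2008 to 8/2009), so these cases' PDs are in 2007, right? It takes about a year from PERM filing to approval,
so the 2009 PERM cases have PDs of about 9/2007 to 8/2008, i.e., after the 2007 PD surge.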
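Post 13's reasoning can be sketched as a simple date shift: move the approval window back by the typical processing time to get the priority-date window. The roughly one-year processing time and the September 2008 to August 2009 approval window are the poster's approximations. `shift_months` is a hypothetical helper name.

```python
def shift_months(year, month, delta):
    """Move a (year, month) pair by delta months (negative goes back)."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


# Approval window for "Perm 2009" as described in post 13 (about 9/2008 to 8/2009).
approved_from, approved_to = (2008, 9), (2009, 8)
# Assumed PERM processing time of about one year.
processing_months = 12

pd_from = shift_months(*approved_from, -processing_months)
pd_to = shift_months(*approved_to, -processing_months)
print(pd_from, pd_to)  # (2007, 9) (2008, 8)
```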