I am running an SSIS package in which 16 data flow tasks reading from an Oracle DB complete in about an hour. One of those tasks was returning 54M rows, which we were reprocessing every day. This is the original query, running on SQL 2005 using the Native OLE DB\Microsoft OLE DB Provider for Oracle:
SELECT COMPANY ,
EMPLOYEE ,
CHECK_ID ,
OBJ_ID ,
DEPARTMENT ,
RECORD_TYPE ,
CHECK_TYPE ,
DST_ACCT_UNIT ,
DST_ACCOUNT ,
DIST_COMPANY ,
DED_CODE ,
DIST_AMT ,
GL_DATE ,
JOB_CODE ,
POSITION
FROM LAWSON.PRDISTRIB
WHERE COMPANY = 30
AND DED_CODE <> ' '
AND DED_CODE IS NOT NULL
We are moving to a new set of servers, and I decided we did not need to reprocess history every time, so I added date criteria to narrow the results. It now looks like this, running on SQL 2008 R2 with the Native OLE DB\Oracle Provider for OLE DB on a box with a much faster processor:
SELECT COMPANY ,
EMPLOYEE ,
CHECK_ID ,
OBJ_ID ,
DEPARTMENT ,
RECORD_TYPE ,
CHECK_TYPE ,
DST_ACCT_UNIT ,
DST_ACCOUNT ,
DIST_COMPANY ,
DED_CODE ,
DIST_AMT ,
GL_DATE ,
JOB_CODE ,
POSITION
FROM LAWSON.PRDISTRIB
WHERE COMPANY = 30
AND GL_DATE > '2012-09-30 00:00:00'
AND DED_CODE <> ' '
AND DED_CODE IS NOT NULL
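For reference, I also considered writing the date filter with an explicit conversion instead of a bare string literal, so nothing is left to implicit conversion on the Oracle side (just a sketch; this assumes GL_DATE is an Oracle DATE column):

```sql
-- Same filter as above, but the literal is converted explicitly with
-- TO_DATE and a fixed format mask, rather than relying on the provider
-- or session NLS settings to interpret the string.
WHERE COMPANY = 30
  AND GL_DATE > TO_DATE('2012-09-30', 'YYYY-MM-DD')
  AND DED_CODE <> ' '
  AND DED_CODE IS NOT NULL
```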
The package now takes over 3 hours to pull the data. Both packages run in 32-bit mode. Would adding the date criteria really make that much of a difference? I thought reducing the number of rows coming into the pipeline would make it faster. Thoughts? Suggestions?
Thanks!
Paws