I was cc’d on an email yesterday morning that was sent out to the development group. It was a pretty bluntly worded question asking why a monthly job was taking well over an hour to run. Time to investigate.
The issue was reproducible in our staging environment and generating a SQL trace was an no-brainer to take a look at what the problem was.
After running the trace file through tkprof (all defaults) it was immediately apparent that the issue was with recursive statements. Recursive statements include all the SQL Oracle has to do in the background to parse incoming SQL. That overhead will vary depending on the type of parsing necessary but is unavoidable for newly created cursors. In addition, user defined PL/SQL functions called within SQL statements are also accounted for under the recursive statements summary.
Embedding user defined PL/SQL in a SQL statement or view definition has a number of disadvantages.
Below is the summary information from tkprof.
OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTScall count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 2 0.01 0.01 0 7 0 0 Execute 2 0.88 5.98 0 12145 74414 2 Fetch 0 0.00 0.00 0 0 0 0 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 4 0.89 5.99 0 12152 74414 2 OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 2141 17.89 17.89 0 0 0 0 Execute 13589173 492.64 489.04 5 180 0 0 Fetch 13589083 4007.37 4035.89 778773 293428221 462 13589118 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 27180397 4517.91 4542.83 778778 293428401 462 13589118
13.5 million executions of recursive SQL statements. Returning 13.5 million rows. I always have mixed reactions when I see stuff like this. On the one hand it is frustrating that easily avoidable inefficiencies are so common. On the other hand these sorts of things are typically quite easy to fix and the improvement in response time often seems like a miracle to the end user which is always rewarding.
Its my understanding that the application that produced this trace file was written under heavy deadlines and constantly changing requirements. I know, nothing new there. However, the developers seemed very fond of writing views on top of views on top of views. That wasn’t much of a problem when the data volume was relatively low but now that the application has been running for quite a few years it is not scaling very gracefully.
Anyways, after seeing the recursive statement summary it was only a matter of identifying the user defined function and figuring out how to merge the logic of that function into a purely SQL solution. I went backup up the trace file and found the following SQL was executed nearly 13.5 million times.
SELECT PERIODID
FROM PERIOD
WHERE TRUNC(STARTDATE) <= :B1 AND
TRUNC(ENDDATE) >= :B1;
It took me a little while to figure out where this was coming from but I eventually tracked it down to a view with a user defined function used to filter results for the current period. A few emails between the developers and myself and a simple view re-write resulted in a several orders of magnitude in improvement in response time. It was fun creating a new trace file after the rewrite and emailing out a difference report. It was a good day.