Friday, June 6, 2008

Light at the End of the Tunnel, Part I

This post should actually be titled "Upgrading Part IV", but my Roman math is not so good - I admit checking if I should write IV or VI - so I decided to stick to the simple numbers.

After bitching about it a lot, it's time I told you how my intermittent problem got solved.
This one was too much for me to solve on my own, so I had an Oracle expert come over. After I described the issue I was facing, he said we should set up a trace on all client connections from the 8.0.6 home. We had to do this for all connections, since we had no idea which concurrent would get stuck. It took some time before we realized that our pool of trace files was too small: by the time we looked, the trace files no longer belonged to a stuck concurrent. After reconfiguring the trace settings, we finally found a trace file that belonged to a stuck concurrent.
Boy! The guy has patience. As someone on my team said, a good expert can be measured by the patience he demonstrates going over log and trace files.
Going over the entire trace file, we discovered that at some point during connection establishment the process just stops. We could see that the client was waiting for some input and... nothing. Although it felt like we were closing in on the problem, it seemed that a long and tedious process through a TAR would be needed. However good the expert I had on site is, this is just not his specialty, and it definitely looked like some internal bug.
Before leaving, the expert noticed that I had NTS defined for SQLNET.AUTHENTICATION_SERVICES in my sqlnet.ora file. I explained that I was using it to log on as sysdba without entering a password. He suggested I try removing it, so I did.
At this point we were towards the end of the day, so I didn't expect any concurrents to get stuck anyway. But the next day the problem had disappeared. I was thrilled: it seems the NTS setting was the problem all along! Thinking about it, there's no chance I would have suspected it of being the root of all evil. I think the only reason the expert noticed it is that he's not used to dealing with installations on Windows. It seems to me that even he didn't believe it would solve anything and suggested it just as a shot in the dark.
Full disclosure: the expert suggested some other changes as well. Those were changes to DB parameters, as specified in an Oracle document regarding DB parameters for EBS installations.
Those changes also took effect only on the next day (the day everything worked), so I might be wrong, and what really fixed the problem may have been altering those parameters. Since I'm curious and somewhat suicidal, I'll probably put NTS back and check if the problem reoccurs. I promise to let you know if my intuition here is wrong.
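
For anyone who wants to check their own client home, here is a minimal Python sketch that takes the text of a sqlnet.ora file and reports whether NTS is listed under SQLNET.AUTHENTICATION_SERVICES. The function names are hypothetical, and the parsing assumes the parameter and its value sit on a single line, so treat it as a rough check rather than a full sqlnet.ora parser.

```python
import re

# Matches e.g. "SQLNET.AUTHENTICATION_SERVICES = (NTS)" or "... = NTS".
# Assumption: the whole parameter fits on one line.
_AUTH_PATTERN = re.compile(
    r"SQLNET\.AUTHENTICATION_SERVICES\s*=\s*\(?([^)]*)\)?",
    re.IGNORECASE,
)


def authentication_services(sqlnet_text):
    """Return the values listed for SQLNET.AUTHENTICATION_SERVICES, upper-cased."""
    for line in sqlnet_text.splitlines():
        # Drop anything after '#', which starts a comment in sqlnet.ora.
        line = line.split("#", 1)[0].strip()
        match = _AUTH_PATTERN.match(line)
        if match:
            values = match.group(1).split(",")
            return [v.strip().upper() for v in values if v.strip()]
    return []


def uses_nts(sqlnet_text):
    """True if NTS is among the configured authentication services."""
    return "NTS" in authentication_services(sqlnet_text)


if __name__ == "__main__":
    sample = "SQLNET.AUTHENTICATION_SERVICES = (NTS)\n"
    print(uses_nts(sample))
```

To run it against a real file, read the file's contents into a string and pass that to `uses_nts`; where sqlnet.ora lives depends on your Oracle home.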

The funny thing is that just after discovering this issue with NTS, I found another one while working on an entirely different thing. Try googling for "NTS sqlnet.ora".

Well, one problem down, one more to go. I'll probably write about it in my next post, although it's still not solved.
