Saturday, May 10, 2008

Upgrading Part III: Consistency

Well, I'm still battling problems after the database upgrade.
I think that the most annoying thing about the problems I'm facing is that they're not really consistent.

Of course, we'd all like a problem-free world, but once we encounter a problem, the first thing to do is reproduce it, thus assuring ourselves we know what we're dealing with.
First you try to reproduce it on a development environment; if that doesn't succeed, you at least expect some consistency on the production environment.
When you don't have this either, you're in trouble. You might change something one day and hope to see results the next, and indeed the next day the problem still occurs but at a much lower rate, and you think to yourself "I must be heading in the right direction". But then the day after, it's business as usual - the system is a total mess. Then you understand that you don't understand anything. The next step is to try to reproduce the problem on your own terms.
In my case the issue is with concurrents randomly getting stuck. So I first tried to see whether the problem affects all executables or is EBS related, starting with simple cmd scripts running non-stop and moving on to submitting multiple concurrents using CONCSUB. At last, one of my manually submitted concurrents hung. But I got no real insights from that either. No consistency whatsoever.
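For anyone who wants to try the same kind of "run it non-stop until it hangs" test, here is a minimal sketch in Python rather than cmd. It is not the script I actually used: the function name `run_until_hang`, the timeout, and the placeholder command are all assumptions made for illustration, and the real command (a simple executable or a CONCSUB submission) would have to be filled in for your own environment.

```python
import subprocess
import sys
import time


def run_until_hang(cmd, timeout_s, max_runs):
    """Run cmd repeatedly; return the 1-based number of the first run
    that exceeds timeout_s, or None if every run finishes in time."""
    for run in range(1, max_runs + 1):
        started = time.monotonic()
        try:
            # subprocess.run kills the child process if the timeout expires.
            subprocess.run(
                cmd,
                timeout=timeout_s,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
        except subprocess.TimeoutExpired:
            elapsed = time.monotonic() - started
            print(f"run {run} still running after {elapsed:.1f}s, treating it as hung")
            return run
    return None


if __name__ == "__main__":
    # Placeholder command (an assumption): replace it with the executable
    # or submission command you suspect of hanging.
    placeholder_cmd = [sys.executable, "-c", "pass"]
    hung_at = run_until_hang(placeholder_cmd, timeout_s=30.0, max_runs=5)
    print("no hang observed" if hung_at is None else f"first hang at run {hung_at}")
```

Recording which run hung, and when, at least gives you something concrete to compare between days, even if the hang itself stays random.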
One of the main issues with inconsistent problems is that they're hard to describe, in a TAR for example. Most directions for detecting the problem assume some major configuration problem, one that would prevent the system from working altogether, but remember, I don't have this kind of blessed consistency.
Another thing with inconsistent issues is that you come to work hoping things will go wrong, so you can poke at the issue a bit more and get rid of that annoying voodoo feeling. Usually a sysadmin's hope is that nothing will go wrong...

Actually, I made some progress this week, so hopefully the next post will contain a bit of advice that could save a serious headache for anyone who stumbles upon this blog while trying to resolve a similar issue.
