Saturday, May 24, 2008

Widening My Horizons

Well, I've decided to take a break from my series of upgrade-related posts and write about something more fun.

This week I got a new 22'' (wide) monitor as a replacement for my old 17'' monitor. No doubt this is a great improvement, and so many things look different now:
Most emails fit the screen without scrolling; I can accommodate more tabs in my Total Commander; I can comfortably rearrange my OneNote pages (it actually feels like each page is twice as wide); I can open two Terminal Services sessions side by side and work with them in parallel; and documents are easier to read, with less scrolling. Those are just the things I could think of on the spot after one week's work. I'm sure I'll discover many more advantages. Oh, the joy of having a new toy!

Of course, there's a downside as well (not that I really care, but...): I'm going to get spoiled.
I think this applies to many aspects of life: once you get a taste of something better, you can't really go back to what you previously did just fine with. And yes, I was pretty satisfied with my 17'', despite getting an occasional "man, how can you work with it?!" from guys with bigger monitors. And now I'm going to become such a guy.
It reminds me of when I was a kid and played Warcraft (the 1st one, of course). I played it on my computer and was just fine with it, but then I was at a friend's house and he had a faster PC. After playing a few levels on his PC, mine seemed so sloooow that it was impossible to play.

My conclusion is that such improvements are good only if you know you won't have to go back to the previous situation; otherwise, getting used to the old setup again is going to take a long time and might even be frustrating.
I guess I'm going to demand a 22'' monitor to be included in every contract I sign from now on...

Saturday, May 10, 2008

Upgrading Part III: Consistency

Well, I'm still battling problems after the database upgrade.
I think that the most annoying thing about the problems I'm facing is that they're not really consistent.

Of course, we'd all like a problem-free world, but once we encounter a problem the first thing to do is to reproduce it, thus assuring ourselves we know what we're dealing with.
First you try to reproduce it in a development environment. If that doesn't succeed, you at least expect some consistency in the production environment.
When you don't have this either, you're in trouble. You might change something one day and hope to see results the next day, and indeed the next day the problem still occurs but at a much lower rate, and you think to yourself "I must be heading in the right direction". But then the day after, it's business as usual - the system is a total mess. Then you understand that you don't understand anything. The next step is to try to reproduce the problem on your own terms.
In my case the issue is with concurrents randomly getting stuck. So I first tried to see whether the problem affects all executables or is EBS related, starting with simple cmd scripts running non-stop and moving on to submitting multiple concurrents using CONCSUB - at last one of my manually submitted concurrents hung. But I got no real insights from that either. No consistency whatsoever.
One of the main issues with inconsistent problems is that they're hard to describe - in a TAR, for example. Most suggestions for detecting the problem point to some major configuration problem, one that would prevent the system from working altogether, but remember, I don't have this kind of blessed consistency.
Another thing with inconsistent issues is that you come to work hoping things will go wrong, so you can poke at the issue a bit more and get rid of that annoying voodoo feeling. Usually a sysadmin's hope is that nothing will go wrong...

Actually, this week I made some progress, so hopefully the next post will contain a bit of advice that could save someone a serious headache if they stumble upon this blog while trying to resolve a similar issue.

Saturday, May 3, 2008

Upgrading Part II: Bad Job

As with any upgrade/major installation I've ever made (at least as far as I can remember), after upgrading the database to 10g the application has demonstrated some interesting errors. One of the problems is related to concurrents and the executables they spawn crashing in mid-air - I promise to tell about this in more detail when I myself have any idea.
At some point I suspected that the problem was with custom code running cmd scripts. Since some of those scripts are called from within PL/SQL code using a Java stored procedure, I thought that maybe I should try to use a "less custom" way to do that.

Luckily (or unluckily), I had read just the previous week about a new feature in 10g - the dbms_scheduler package, which is supposed to replace dbms_job and to be much more powerful; for instance, it enables you to run cmd scripts. So I thought I'd try this out, since it sounded exactly like the built-in method I was looking for.
Well, I have only one thing I can say about this: it's better to leave a feature out of the release than to keep it in when it sucks, totally.

Really, my keyboard is still soaked with sweat from my efforts to run a single script that echoes some text.
I've already written about this but I guess the message didn't get through: if I'm supposed to start the Oracle Scheduler service to run jobs, then that's exactly what I expect to be written in the error message I get when not doing so - certainly not a "file not found" error. I also expect this to be written in any (well, at least some) documentation describing the new feature; that's not the kind of thing one is supposed to dig up only in forums. By the way, while searching for a solution to my own issue I read about a hilarious problem with similar symptoms, and I actually tried the fix since at first I thought this was the problem I was experiencing. It appears that for some users (maybe in earlier 10g versions) just supplying a cmd script didn't work; they had to run cmd.exe with parameters. I can't even begin to understand how you manage to create this bug and release it.
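The cmd.exe workaround mentioned above is usually written along the following lines. This is a minimal sketch, not a tested recipe: the job name, the cmd.exe location and the script path are hypothetical placeholders, and whether the wrapper is needed at all depends on the 10g version and platform.

```sql
BEGIN
  -- Use cmd.exe itself as the executable and hand it the script as an argument.
  -- The job name and both paths are hypothetical placeholders.
  dbms_scheduler.create_job(
    job_name            => 'ECHO_TEST',
    job_type            => 'EXECUTABLE',
    job_action          => 'C:\WINDOWS\system32\cmd.exe',
    number_of_arguments => 2);

  -- /c tells cmd.exe to run the command that follows and then exit.
  dbms_scheduler.set_job_argument_value(
    job_name => 'ECHO_TEST', argument_position => 1, argument_value => '/c');
  dbms_scheduler.set_job_argument_value(
    job_name => 'ECHO_TEST', argument_position => 2,
    argument_value => 'C:\scripts\echo_test.cmd');

  -- Run the job once, on demand.
  dbms_scheduler.run_job(job_name => 'ECHO_TEST');
END;
/
```

On Windows the scheduler service for the instance still has to be running for external jobs, as described above.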
Well, after completing the POC (if at the cost of my health) I got to the real thing: running a script with parameters. In some document I saw something like the following example:

dbms_scheduler.create_job(job_name => 'JOB', job_type => 'EXECUTABLE',
    job_action => 'script.cmd', number_of_arguments => n);
dbms_scheduler.set_job_argument_value(job_name => 'JOB',
    argument_position => 1, argument_value => '...');
...
dbms_scheduler.set_job_argument_value(job_name => 'JOB',
    argument_position => n, argument_value => '...');
dbms_scheduler.enable(name => 'JOB');
dbms_scheduler.run_job(job_name => 'JOB');

Well, maybe it's just my system that is a freak, but apparently the enable procedure erases the job. Exactly: I create a job, I run dbms_scheduler.enable, and there's no job in the table - no error message, no nothing; I might as well have run drop_job instead. Well, apparently I don't need that line anyway. After some more struggle with cryptic error messages I got the package to do what I wanted it to do, but that's really not good enough.
I can't even start to imagine what kind of effort is needed to bootstrap all the advanced scheduling features - windows, chains, etc.
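One possible explanation for the job vanishing after enable, not verified on this system: create_job defaults to auto_drop => TRUE, and a job created without a repeat_interval runs once as soon as it is enabled and is then dropped when it completes. Under that assumption, a sketch like the following would keep the job definition around. The script path and the argument value are hypothetical placeholders.

```sql
BEGIN
  dbms_scheduler.create_job(
    job_name            => 'JOB',
    job_type            => 'EXECUTABLE',
    job_action          => 'C:\scripts\script.cmd',  -- hypothetical path
    number_of_arguments => 1,
    auto_drop           => FALSE);  -- keep the job definition after it has run

  dbms_scheduler.set_job_argument_value(
    job_name => 'JOB', argument_position => 1,
    argument_value => 'some value');  -- placeholder argument

  -- With no start_date or repeat_interval, enabling schedules a single immediate run.
  dbms_scheduler.enable(name => 'JOB');
END;
/

-- If the assumption holds, the job is still listed after the run.
SELECT job_name, enabled, state, run_count, failure_count
  FROM user_scheduler_jobs
 WHERE job_name = 'JOB';
```

If the job still disappears with auto_drop set to FALSE, the cause lies elsewhere and this sketch does not apply.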

I'm willing to bet money that most developers would have given up much earlier than I did, saying this stuff just doesn't work. You can develop useful and cool new features all you like, but if people can't cut/paste an example and see it just working, no one will use it. And if all users get when something goes wrong is misleading error messages, they'll just get frustrated (and you can see I am one such user).
I first read about this feature in a "10g Top 20 New Features" document, and indeed it sounded great, but if it's impossible to use, it's not really a new feature at all.
I'm sometimes not sure whether I should attribute all those funny errors I deal with to the fact that my system is on MS Windows, a less common platform. Maybe I should. But that's not a good enough reason: if I'm in possession of a disk labeled 10.2.0.3 for Windows, I expect it to work. I don't really mind having it released half a year later than the corresponding Linux version, I just want it to function.

As for me, I know about this feature, I can even make it work, but there's no chance I'll suggest it as a solution to any need unless as a last resort. Too bad.