Learn with Halim

Halim, a Georgia Tech graduate Senior Database Engineer/Data Architect based in Atlanta, USA, is an Oracle OCP DBA and Developer, Certified Cloud Architect Professional, and OCI Autonomous Database Specialist. With extensive expertise in database design, configuration, tuning, capacity planning, RAC, DG, scripting, Python, APEX, and PL/SQL, he combines technical mastery with a passion for innovation. Notably, Halim secured 16th place worldwide in PL/SQL Challenge Cup Playoff on the year 2010.

Tuesday, December 7, 2010

kkjcre1p: unable to spawn jobq slave process

why in alert log file showing this message

kkjcre1p: unable to spawn jobq slave process

Process J000 died, kkjcre1p: unable to spawn jobq slave process
================================================================

PROCESS J000 And M000 Die,

this might be related to a lack of OS-resources.

Using OSWatcher is a good tool to get more evidence of this.
or
use top/vmstat/prstat to check it.

Normally OSWatcher is taking snapshots every 30 seconds and in the above output it shows that several snapshots are missing. OSWatcher is using OS-commands like 'top' and 'vmstat'. So if they are having problems, than there is a serious problem on OS-level.

Another indication of the OS problems are the very high number of processes in the CPU Run and Blocked Q.

Alert log shows like belows-
------------------------------------------------
Process PN82 died
Tue Feb 22 16:42:29 2011
Process J000 died
Tue Feb 22 16:42:29 2011
kkjcre1p: unable to spawn jobq slave process
Tue Feb 22 16:42:29 2011
Errors in file /d04/admin/stlbas/bdump/stlbas_cjq0_1880.trc:

Tue Feb 22 16:42:30 2011
Process PN83 died
Tue Feb 22 16:42:32 2011
Process PN84 died
Tue Feb 22 16:42:33 2011
Process PN85 died
Tue Feb 22 16:42:34 2011
Process PN86 died
Tue Feb 22 16:42:35 2011
Process PN87 died
Tue Feb 22 16:42:36 2011
Process J000 died
Tue Feb 22 16:42:36 2011
kkjcre1p: unable to spawn jobq slave process
Tue Feb 22 16:42:36 2011
Errors in file /d04/admin/stlbas/bdump/stlbas_cjq0_1880.trc:

Tue Feb 22 16:42:37 2011
Process PN88 died
Tue Feb 22 16:42:40 2011

-----------------------------------------------

=======================
or
===================

"unable to spawn jobq slave process" this message can be show for
not correct setting of JOB_QUEUE_PROCESSES .

How to determine the correct setting for JOB_QUEUE_PROCESSES
============================================================

Solution
==============

The ideal setting for JOB_QUEUE_PROCESSES should be set to the maximum number of jobs that would ever be run concurrently on a system PLUS a few more.

1) Initially set JOB_QUEUE_PROCESSES higher than the most number of jobs that could ever run concurrently

Be careful not to set this too high as it can effect database performance or even exhaust OS resources

A fair estimate can be done by looking at DBA_JOBS ... NEXT_DATE and NEXT_SEC columns

The reason that this is a fair estimate is the JOB SCHEDULER submits one time execute jobs to the queue
for execution when it chooses to run a job ... so it is possible that the above estimate would be a good bit
off ... Example ... Running or UTLRP.sql ... on a system with multiple CPU's ... will cause he SCHEDULER
to submit multiple jobs ... at the same time ... to the job queue

NOTE: Add one more to your estimate for the job that will be created below

2) Use the job queue to monitor itself

a) Login as SYS

b) Create the table to do the monitoring

CREATE TABLE MAX_CONCURRENT_JOBS (
SAMPLE_DATE DATE,
JOBS_RUNNING NUMBER);

c) Create the job to do the monitoring

CREATE OR REPLACE PROCEDURE MAX_CONCURRENT_JOBS_MONITOR IS
BEGIN
INSERT INTO MAX_CONCURRENT_JOBS
SELECT SYSDATE, COUNT(*) FROM DBA_JOBS_RUNNING;
COMMIT;
END;
/

d) Create the job queue job to do the monitoring

DECLARE
v_jobno number;
BEGIN
DBMS_JOB.SUBMIT(
JOB => v_jobno,
WHAT => 'MAX_CONCURRENT_JOBS_MONITOR;',
NEXT_DATE => SYSDATE,
INTERVAL => 'SYSDATE + (5/1440)');
COMMIT;
END;
/

NOTE : The example above uses a 5 minute interval (5/1440) ... this can be adjusted as needed

e) Monitor MAX_CONCURRENT_JOBS

SELECT MAX(JOBS_RUNNING) FROM MAX_CONCURRENT_JOBS;

The length of time for monitoring depends on the INTERVAL in the job queue for ALL jobs ... It is recommended that at least two (2) executions of every job in the job queue be monitored with this process

3) Set JOB_QUEUE_PROCESSES to the appropriate setting

ALTER SYSTEM SET JOB_QUEUE_PROCESSES = SCOPE = ...

4) Remove the monitoring system

SELECT JOB FROM DBA_JOBS WHERE WHAT='MAX_CONCURRENT_JOBS_MONITOR;';

EXEC DBMS_JOB.REMOVE();

DROP PROCEDURE MAX_CONCURRENT_JOBS_MONITOR;

DROP TABLE MAX_CONCURRENT_JOBS;

NOTE : this process needs to be done periodically ... especially if new jobs are being added to the job queue

and also see here:
http://halimdba.blogspot.com/2010/12/solaris-error-32-broken-pipe-or-linux.html

Solaris Error: 32: Broken pipe or Linux Error: 32: broken pipe

Yesterday I found in alert log file bellows warning.
and In that time some clients informed us they are not able to log in
to the application and after some times they able to log in to
database automatically. so we tried to find out the cause and solution. see below...

It is a dedicated environment .

In parameter file
----------------------
processes = 4000

===========================================
Content of alert log file.
===========================================

Mon Dec 6 16:03:00 2010
Thread 1 advanced to log sequence 67690 (LGWR switch)
Current log# 3 seq# 67690 mem# 0: /d01/oracle/oradata/stlbas/redo03.log
Mon Dec 6 16:20:39 2010
Process J000 died, see its trace file
Mon Dec 6 16:20:39 2010
kkjcre1p: unable to spawn jobq slave process
Mon Dec 6 16:20:39 2010
Errors in file /d04/admin/stlbas/bdump/stlbas_cjq0_1885.trc:

Process J000 died, see its trace file
Mon Dec 6 16:20:45 2010
kkjcre1p: unable to spawn jobq slave process
Mon Dec 6 16:20:45 2010
Errors in file /d04/admin/stlbas/bdump/stlbas_cjq0_1885.trc:

Mon Dec 6 16:21:01 2010
Process J000 died, see its trace file
Mon Dec 6 16:21:01 2010
kkjcre1p: unable to spawn jobq slave process
Mon Dec 6 16:21:01 2010
Errors in file /d04/admin/stlbas/bdump/stlbas_cjq0_1885.trc:

Mon Dec 6 16:21:17 2010
Process J000 died, see its trace file
Mon Dec 6 16:21:17 2010
kkjcre1p: unable to spawn jobq slave process
Mon Dec 6 16:21:17 2010
Errors in file /d04/admin/stlbas/bdump/stlbas_cjq0_1885.trc:

Mon Dec 6 16:36:24 2010
Process J000 died, see its trace file
Mon Dec 6 16:36:24 2010
kkjcre1p: unable to spawn jobq slave process
Mon Dec 6 16:36:24 2010
Errors in file /d04/admin/stlbas/bdump/stlbas_cjq0_1885.trc:

Process J000 died, see its trace file
Mon Dec 6 16:36:30 2010
kkjcre1p: unable to spawn jobq slave process
Mon Dec 6 16:36:30 2010
Errors in file /d04/admin/stlbas/bdump/stlbas_cjq0_1885.trc:

Mon Dec 6 16:36:41 2010
Process J000 died, see its trace file
Mon Dec 6 16:36:41 2010
kkjcre1p: unable to spawn jobq slave process
Mon Dec 6 16:36:41 2010
Errors in file /d04/admin/stlbas/bdump/stlbas_cjq0_1885.trc:

Mon Dec 6 16:38:18 2010
Thread 1 advanced to log sequence 67691 (LGWR switch)
Current log# 4 seq# 67691 mem# 0: /d01/oracle/oradata/stlbas/redo04.log

====================================
contents in the trace file are -
====================================

/d04/admin/stlbas/bdump/XXXX_cjq0_1885.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining Scoring Engine
and Real Application Testing options
ORACLE_HOME = /d04/oracle/ora102
System name: SunOS
Node name:XXXX
Release: 5.10
Version: Generic_142900-07
Machine: sun4v
Instance name: XXXX
Redo thread mounted by this instance: 1
Oracle process number: 26
Unix process pid: 1885, image: oracle@XXXX (CJQ0)

*** SERVICE NAME:(SYS$BACKGROUND) 2010-12-01 13:24:30.248
*** SESSION ID:(2191.1) 2010-12-01 13:24:30.248
*** 2010-12-01 13:24:30.248
Process J000 is dead (pid=25006, state=3):
*** 2010-12-01 13:24:37.277
Process J000 is dead (pid=25012, state=3):
*** 2010-12-01 13:59:56.397
Process J000 is dead (pid=1185, state=3):
*** 2010-12-01 15:55:42.277
Process J000 is dead (pid=15686, state=3):
*** 2010-12-06 16:20:39.526
Process J000 is dead (pid=9458, state=3):
*** 2010-12-06 16:20:45.565
Process J000 is dead (pid=9480, state=3):
*** 2010-12-06 16:21:01.641
Process J000 is dead (pid=9518, state=3):
*** 2010-12-06 16:21:17.712
Process J000 is dead (pid=9550, state=3):
*** 2010-12-06 16:36:24.213
Process J000 is dead (pid=11942, state=3):
*** 2010-12-06 16:36:30.238
Process J000 is dead (pid=11970, state=3):
*** 2010-12-06 16:36:41.289
Process J000 is dead (pid=11998, state=3):

===============================================================
content of listener log file like bellows (in that time only)
================================================================

06-DEC-2010 16:16:21 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=STLBAS)(CID=(PROGRAM=D:\OraNT\BIN\ifrun60.EXE)(HOST=APPLICATION-07)(USER=001zohur))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.1.36)(PORT=1771)) * establish * STLBAS * 0
06-DEC-2010 16:16:21 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=STLBAS)(CID=(PROGRAM=D:\OraNT\BIN\RWRBE60.exe)(HOST=APPLICATION-02)(USER=154shahadat))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.1.32)(PORT=1528)) * establish * STLBAS * 0
06-DEC-2010 16:16:21 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=STLBAS)(CID=(PROGRAM=D:\OraNT\BIN\RWRBE60.exe)(HOST=APPLICATION-07)(USER=043sathekur))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.1.36)(PORT=1758)) * establish * STLBAS * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12547: TNS:lost contact
TNS-12560: TNS:protocol adapter error
TNS-00517: Lost contact
Solaris Error: 32: Broken pipe
06-DEC-2010 16:16:21 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=STLBAS)(CID=(PROGRAM=D:\OraNT\BIN\ifrun60.EXE)(HOST=APPLICATION-07)(USER=038alfee))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.1.36)(PORT=1772)) * establish * STLBAS * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12547: TNS:lost contact
TNS-12560: TNS:protocol adapter error
TNS-00517: Lost contact
Solaris Error: 32: Broken pipe
06-DEC-2010 16:16:22 * service_update * stlbas * 0
06-DEC-2010 16:16:24 * service_update * stlbas * 0
06-DEC-2010 16:16:25 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=STLBAS)(CID=(PROGRAM=D:\OraNT\BIN\ifrun60.EXE)(HOST=APPLICATION-07)(USER=003aftab))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.1.36)(PORT=1794)) * establish * STLBAS * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12547: TNS:lost contact
TNS-12560: TNS:protocol adapter error
TNS-00517: Lost contact
Solaris Error: 32: Broken pipe
06-DEC-2010 16:16:25 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=STLBAS)(CID=(PROGRAM=D:\OraNT\BIN\RWRBE60.exe)(HOST=NEW-SUN-APP)(USER=023sohel))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.1.37)(PORT=4291)) * establish * STLBAS * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12547: TNS:lost contact
TNS-12560: TNS:protocol adapter error
TNS-00517: Lost contact
Solaris Error: 32: Broken pipe
06-DEC-2010 16:16:27 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=STLBAS)(CID=(PROGRAM=D:\OraNT\BIN\ifrun60.EXE)(HOST=APPLICATION-07)(USER=038alfee))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.1.36)(PORT=1796)) * establish * STLBAS * 0
06-DEC-2010 16:16:27 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=STLBAS)(CID=(PROGRAM=D:\OraNT\BIN\ifrun60.EXE)(HOST=APPLICATION-07)(USER=038alfee))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.1.36)(PORT=1797)) * establish * STLBAS * 0
06-DEC-2010 16:16:28 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=STLBAS)(CID=(PROGRAM=D:\OraNT\BIN\RWRBE60.exe)(HOST=APPLICATION-11)(USER=030salah))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.1.33)(PORT=3982)) * establish * STLBAS * 0
06-DEC-2010 16:16:29 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=stlbas)(CID=(PROGRAM=D:\OraNT\BIN\ifrun60.EXE)(HOST=REPORTS_APP)(USER=504refayet))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.1.233)(PORT=4995)) * establish * stlbas * 0
06-DEC-2010 16:16:29 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=STLBAS)(CID=(PROGRAM=D:\OraNT\BIN\ifrun60.EXE)(HOST=APPLICATION-07)(USER=019aporna))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.1.36)(PORT=1800)) * establish * STLBAS * 0
06-DEC-2010 16:16:29 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=stlbas)(CID=(PROGRAM=D:\OraNT\BIN\ifrun60.EXE)(HOST=REPORTS_APP)(USER=504refayet))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.1.233)(PORT=4996)) * establish * stlbas * 0
06-DEC-2010 16:16:30 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=STLBAS)(CID=(PROGRAM=D:\OraNT\BIN\ifrun60.EXE)(HOST=APPLICATION-07)(USER=003aftab))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.1.36)(PORT=1801)) * establish * STLBAS * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12547: TNS:lost contact
TNS-12560: TNS:protocol adapter error
TNS-00517: Lost contact
Solaris Error: 32: Broken pipe
06-DEC-2010 16:16:30 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=STLBAS)(CID=(PROGRAM=D:\OraNT\BIN\ifrun60.EXE)(HOST=APPLICATION-07)(USER=019aporna))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.1.36)(PORT=1802)) * establish * STLBAS * 12518
TNS-12518: TNS:listener could not hand off client connection
TNS-12547: TNS:lost contact
TNS-12560: TNS:protocol adapter error
TNS-00517: Lost contact
Solaris Error: 32: Broken pipe
06-DEC-2010 16:16:30 * service_update * stlbas * 0
06-DEC-2010 16:16:30 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=stlbas)(CID=(PROGRAM=D:\OraNT\BIN\ifrun60.EXE)(HOST=REPORTS_APP)(USER=501azad))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.1.233)(PORT=4997)) * establish * stlbas * 0
06-DEC-2010 16:16:31 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=stlbas)(CID=(PROGRAM=D:\OraNT\BIN\ifrun60.EXE)(HOST=REPORTS_APP)(USER=501azad))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.11.1.233)(PORT=4999)) * establish * stlbas * 0
06-DEC-2010 16:16:32 * service_update * stlbas * 0

What is the cause ?
and
why it shown "Solaris Error: 32: Broken pipe" ?

===================================================
===================================================
solutions:
===================================================
===================================================

1. listener log file is so big (greate than 2GB in linux)

2.

You have reached your max utilization of process parameter.

So increase processes and sessions parameter.

[ Each dedicated session creates process onto the server, consequently, you may overload your server itself. These parameters should not be set too large. you can think about shared server

Check CPU/Memory utilization of server before increasing the values .

Also you might have to increase other parameter also like SGA and PGA accordingly. So check their utilization also.

You will have to analyze database load during peak time and also find out the which processes are creating so many sessions. ]

SQL> select * from v$resource_limit ;

RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_AL LIMIT_VALU
------------------------------ ------------------- --------------- ---------- ----------
processes 1607 4000 4000 4000
sessions 1596 4005 4405 4405
enqueue_locks 1380 6280 57390 57390
enqueue_resources 563 2901 19600 UNLIMITED
ges_procs 0 0 0 0
ges_ress 0 0 0 UNLIMITED
ges_locks 0 0 0 UNLIMITED
ges_cache_ress 0 0 0 UNLIMITED
ges_reg_msgs 0 0 0 UNLIMITED
ges_big_msgs 0 0 0 UNLIMITED
ges_rsv_msgs 0 0 0 0
gcs_resources 0 0 0 0
gcs_shadows 0 0 0 0
dml_locks 43 718 19380 UNLIMITED
temporary_table_locks 0 3 UNLIMITED UNLIMITED
transactions 306 722 4845 UNLIMITED
branches 1 13 4845 UNLIMITED
cmtcallbk 2 4 4845 UNLIMITED
sort_segment_locks 1594 4451 UNLIMITED UNLIMITED
max_rollback_segments 136 361 4845 65535
max_shared_servers 0 0 UNLIMITED UNLIMITED
parallel_max_servers 1324 3600 2560 3600

22 rows selected.

or

SQL> select * FROM v$license;

========================
caution:
===========================

PROCESSES
=============================================
Property Description
Parameter type Integer
Default value 40 to operating system-dependent
Modifiable No
Range of values 6 to operating system dependent
Basic Yes

PROCESSES specifies the maximum number of operating system user processes that can simultaneously connect to Oracle. Its value should allow for all background processes such as locks, job queue processes, and parallel execution processes.

The default values of the SESSIONS and TRANSACTIONS parameters are derived from this parameter. Therefore, if you change the value of PROCESSES, you should evaluate whether to adjust the values of those derived parameters.

SESSIONS
================================================
Property Description
Parameter type Integer
Default value Derived: (1.1 * PROCESSES) + 5
Modifiable No
Range of values 1 to 231
Basic Yes
Real Application Clusters Multiple instances can have different values.

SESSIONS specifies the maximum number of sessions that can be created in the system. Because every login requires a session, this parameter effectively determines the maximum number of concurrent users in the system. You should always set this parameter explicitly to a value equivalent to your estimate of the maximum number of concurrent users, plus the number of background processes, plus approximately 10% for recursive sessions.

Oracle uses the default value of this parameter as its minimum. Values between 1 and the default do not trigger errors, but Oracle ignores them and uses the default instead.

The default values of the ENQUEUE_RESOURCES and TRANSACTIONS parameters are derived from SESSIONS. Therefore, if you increase the value of SESSIONS, you should consider whether to adjust the values of ENQUEUE_RESOURCES and TRANSACTIONS as well. (Note that ENQUEUE_RESOURCES is obsolete as of Oracle Database 10g release 2 (10.2).)

In a shared server environment, the value of PROCESSES can be quite small. Therefore, Oracle recommends that you adjust the value of SESSIONS to approximately 1.1 * total number of connections.

Sunday, December 5, 2010

viewing concurrent connections in oracle Database

viewing concurrent connections in oracle Database
------------------------------------------------

(Viewing Licensing Limits and Current Values)

select * FROM v$license;

SQL> desc v$license;
Name Null? Type
----------------------------------------- -------- ----------------
SESSIONS_MAX NUMBER
SESSIONS_WARNING NUMBER
SESSIONS_CURRENT NUMBER
SESSIONS_HIGHWATER NUMBER
USERS_MAX NUMBER
CPU_COUNT_CURRENT NUMBER
CPU_CORE_COUNT_CURRENT NUMBER
CPU_SOCKET_COUNT_CURRENT NUMBER
CPU_COUNT_HIGHWATER NUMBER
CPU_CORE_COUNT_HIGHWATER NUMBER
CPU_SOCKET_COUNT_HIGHWATER NUMBER

SQL>

SQL> SELECT RPAD (c.NAME || ':', 11)
|| RPAD (' current logons=' || (TO_NUMBER (b.sessions_current) - 1),
20)
|| 'cumulative logons='
|| RPAD (SUBSTR (a.VALUE, 1, 10), 10)
|| 'highwater mark='
|| b.sessions_highwater information
FROM v$sysstat a, v$license b, v$database c
WHERE a.NAME = 'logons cumulative' ;

---about number of users

SELECT COUNT(*) FROM dba_users;

---also session can see from here

select count(sid) from v$session

Tuesday, December 7, 2010

kkjcre1p: unable to spawn jobq slave process

Solaris Error: 32: Broken pipe or Linux Error: 32: broken pipe

Sunday, December 5, 2010

viewing concurrent connections in oracle Database

My Blog List