In order to run version B.03.32.00 of the ACC software, the
following patches or their superseding patches must be installed.

PHNE_25642 - s700_800 11.11 cumulative ARPA Transport Patch
PHNE_25084 - s700_800 11.11 Cumulative STREAMS Patch
PHNE_25596 - s700_800 11.11 J2793B X.25 SX25-HPerf/SYNC-WAN Patch

The patches, along with the additional information, can be
downloaded from http://itrc.hp.com/.
The patches are also available from ftp://hpatlse.atl.hp.com/hp-ux_patches/s700_800/11.X/.

Defect Fixes in ACC Version B.03.32.00

This release fixes the following defects. JAGad28181 Symptoms: Frames larger
than 8220 bytes are discarded when received by the HDLC.FRAME layer
(level-1). Defect/Fix: The ACC
HDLC/LAP-B (ABM) User's Guide states that the
maximum frame size is 10066 bytes. The limit has been raised to
that value.
JAGad54596 Symptoms: When x25init is run against an X.25 link repeatedly, a DMA timeout
or firmware error may result. This error could also happen when
using other protocols such as HDLC-LAPB or HDLC-LAPD. Defect/Fix: During the
disable processing, a NULL pointer is incorrectly de-referenced
by the code in unstdt_build(). A check has been added to avoid this de-reference.
JAGab67173 Symptoms: zconfig() portsc request hangs if a card is not available. Defect/Fix: During a
card startup, if the startup fails, zmon leaves the card in whatever state it was in previously
(e.g. being Reset). This allows API requests to be accepted by LDM
and DAM to operate on a non-functioning mux. If the API request
is a zconfig/zport request, it is never completed and therefore it hangs
the calling program. The zmon restart mechanism has been modified to fix this defect.
JAGac77692 Symptoms: On systems
with unstable firmware/hardware (one that can startup, but then fails),
the card crashes and the ZMON restart mechanism can get confused.
This may result in an un-stoppable ZCOM subsystem, or a system panic. Manually
stopping the ZCOM subsystem when ZMON is busy in card restarts can
also cause this situation. Defect/Fix: The problem
is caused by the driver making card restart requests to ZMON, while
ZMON is still retrying card restart or card shutdown. These multiple
requests can confuse ZMON and hence leave the system in an unstable
state. Two changes have been made to stop the various problems:
1. At the end of a successful card "stop" or "restart", all
   pending card restart requests are ignored. This prevents ZMON from being
   swamped by the card requests.
2. When a card "shutdown" request competes and loses to a card "restart" request,
   the "shutdown" request retries immediately. Normally,
   it should win the IFT back and the other card restart requests will
   give up immediately.
JAGad03287 Symptoms: Z7340A: zmntr mx display shows incorrect values for Txusd. Defect/Fix: During the
testing cycle of 8-Port PCI software and hardware, the zmntr mx command shows incorrect values for the TXusd field of each port. The value is in bytes, but does
not correctly reflect the configured buffer size. It uses a fixed buffer
size of 256 bytes, rather than the configured zbufsize. The number is most noticeably wrong when the buffer size is
set to a large value (e.g. zbufsize = 4096). You may need to set Unack-limit to a larger value (e.g. 50000) to see larger values
of TXusd.
JAGad05873 Symptoms: zmlog logs a null char in hyphen lines. Defect/Fix: This problem
does not affect the log visually. However, when the log file is being
filtered by the 'head' program, the extra null confuses 'head' and
the correct line can't be extracted properly. The code has been
changed to remove the extra NULL character from hyphen lines.
JAGad12116 Symptoms: On-line addition
of card fails when ZCOM is already started. Defect/Fix: When the
ZCOM subsystem is started, zmon builds a table of all installed ACC cards which includes
their type and hardware path. This table is used to verify that
the configured ACC card is actually present in the system prior
to a download. The download is not allowed to proceed if the configuration does
not match the physical hardware. With an on-line card addition after ZCOM is started, zmon always rejects the card download because the new card
does not exist in its table. To fix this problem, zmon has been modified to clear the flag indicating the hardware table has
been initialized whenever zmon receives an Interface Request Record (IRR) denoting an on-line card addition.
This causes zmon to rebuild its hardware table, therefore allowing the
download to proceed properly.
JAGad13610 Symptoms: zqmve() gives an error when moving messages between program ZLUs. Defect/Fix: When the
'MV' command in ZTERM is used, it gives the error: "Error on ZQMVE: Different nodes not allowed" when moving a message from one program
ZLU to another program ZLU. Both are local ZLUs.
JAGad13821 Symptoms: ZTERM's TX
behaves inconsistently when out of system buffers. Defect/Fix: QIP test
LDM043 uses ZTERM TX to send a large number of messages to exhaust all ZCOM
buffers (e.g. TX <ZLU> 1000 030000). On some systems,
ZTERM's TX was suspended, which is the expected behavior. On other systems,
ZTERM's TX returned with a ZCOM error: "Error on ZSEND: Not enough system free buffers".
JAGad15652 Symptoms: Zmasterd takes up to 7 minutes to startup the ZCOM subsystem with
large I/O systems. Defect/Fix: During ZCOM
subsystem startup, ioscan is run to verify that the cards installed in the system
match those configured through the TTGEN configuration (.answ) file. Most of the time taken during startup is spent
waiting for ioscan to scan the backplane for hardware. The fix is not to
scan the physical backplane, but simply to retrieve the hardware
information from the kernel that was gathered at system bootup. The
code now schedules ioscan with the parameters "-FkC acc" instead of just "-F".
JAGad15656 Symptoms: The 'x25init' command fails with the following message in the nettl log
when 200 SVCs are configured on all 8 ports:

    N2Z: The zconfig() call to create a new L3 ZLU has unexpectedly failed with error -15.
    Unable to startup the X.25 line (Mux = 0, Port 7, Subc 0)!

Defect/Fix: The problem
is caused by not having enough ZCOM Terminal entries configured
in the .answ file. The fix is to tune these parameters in the .answ file and then start up the ZCOM subsystem.
JAGad21198 Symptoms: System panics
with a Data Page Fault. Defect/Fix: The problem happened because
a NULL pointer was dereferenced. To fix this problem, a NULL check
on the pointer has been added.
JAGad21596 Symptoms: ZTERM mistakenly
takes msg request code 11 (ZCOM_MRQCODE_DEL) as "port configuration" to
report zport return status (by x25stat API). The correct request code
should be 14 (ZCOM_MRQCODE_PORT). Defect/Fix: While doing
QIP API047 test, in an erroneous ZTERM PT command (ZPORT configuration),
the returned status was incorrectly reported. E.g., ZPORT status
11 is reported as "Invalid subchannel number" (which is the status for terminal configuration
request). ZPORT status 11 should be "Mode/baud rate incompatibility". zterm uses an incorrect request code in calling x25stat().
JAGad21615 Symptoms: System panics
with Spinlock Deadlock when running reliability tests. Defect/Fix: When the
system was running out of memory, it would cause nli2zcom to drop a data packet and issue a reset request on the
VC. This particular test caused many resets and reset confirms to
be issued during heavy data transfer loads. The nli2zcom driver showed that a reset request (or response) was
being sent to the DAM at the same time that an inbound event was
arriving from the DAM on the same card. The DAM held the IFT lock
and passed the event to the nli2zcom driver which found the N2Z lock already held. The N2Z
driver locked its spinlock and passed a request to the DAM which
found the IFT lock already held. Thus a classic spinlock deadlock occurred. The fix is to have the nli2zcom driver drop its lock when issuing a request to the DAM and
then reacquire its lock upon return from the DAM. This was done
for both the outbound Reset Request and Reset Response functions.
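
The lock-ordering problem and its fix can be illustrated with a small single-threaded model. This is only a sketch, not the nli2zcom or DAM source: the names `n2z_lock`, `ift_lock`, `dam_request`, and the `inbound_event_could_run` probe are hypothetical, and C11 `atomic_flag` stands in for the kernel spinlock primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the N2Z driver lock and the DAM's IFT lock. */
static atomic_flag n2z_lock = ATOMIC_FLAG_INIT;
static atomic_flag ift_lock = ATOMIC_FLAG_INIT;

/* Try once to take a lock; a real spinlock would loop until this succeeds. */
static bool try_lock(atomic_flag *l) { return !atomic_flag_test_and_set(l); }
static void unlock(atomic_flag *l)   { atomic_flag_clear(l); }

/* Set by dam_request(): could an inbound event have taken the N2Z lock
 * while the DAM was holding the IFT lock? If not, the event path would
 * spin forever, which is the deadlock described above. */
static bool inbound_event_could_run;

/* DAM side: takes the IFT lock, then probes the N2Z lock the way an
 * inbound event handler would need to. */
static void dam_request(void)
{
    while (!try_lock(&ift_lock)) { /* spin */ }
    if (try_lock(&n2z_lock)) {
        inbound_event_could_run = true;
        unlock(&n2z_lock);
    } else {
        inbound_event_could_run = false;
    }
    unlock(&ift_lock);
}

/* Old behaviour: N2Z calls into the DAM while still holding its own lock. */
static bool send_reset_holding_lock(void)
{
    while (!try_lock(&n2z_lock)) { /* spin */ }
    dam_request();
    unlock(&n2z_lock);
    return inbound_event_could_run;
}

/* Fixed behaviour: drop the N2Z lock around the call, then reacquire it. */
static bool send_reset_dropping_lock(void)
{
    while (!try_lock(&n2z_lock)) { /* spin */ }
    unlock(&n2z_lock);              /* release before entering the DAM */
    dam_request();
    while (!try_lock(&n2z_lock)) { /* reacquire on return */ }
    unlock(&n2z_lock);
    return inbound_event_could_run;
}
```

In this model, `send_reset_holding_lock()` reports that the inbound path would have been blocked, while `send_reset_dropping_lock()` reports that it could proceed.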
JAGad24633 Symptoms: zconfig PORTSC enable/disable subchannel has insufficient validation. Defect/Fix: The zconfig PORTSC for enabling/disabling subchannels did not check
for valid card type and valid subchannel numbers. The zconfig call to 2-port and 8-port cards returns ok as long as
the port number is in range. The zconfig call to 4-port card accepts all subchannel numbers, including
40. The LDM should reject the subchannel configuration (for 2,8-port
cards) and the illegal subchannel number for 4-port card with ZCOM
error: ZESUBCH (-49, Illegal subchannel number). All the above problems have been corrected.
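
The added validation might look roughly like the following sketch. Only the `ZESUBCH` value (-49) comes from these notes; the function name, the card-type enum, and the upper subchannel limit are assumptions made for illustration, since the real limit is not stated here.

```c
#define ZESUBCH (-49)          /* Illegal subchannel number (from the notes) */

/* Assumption for illustration only: the notes do not give the real limit. */
#define HYPOTHETICAL_MAX_SUBCH 31

enum card_type { CARD_2_PORT, CARD_4_PORT, CARD_8_PORT };

/* Return 0 if a PORTSC enable/disable request is acceptable, else ZESUBCH. */
static int check_portsc(enum card_type type, int subch)
{
    if (type != CARD_4_PORT)
        return ZESUBCH;        /* 2- and 8-port cards: no subchannel config */
    if (subch < 0 || subch > HYPOTHETICAL_MAX_SUBCH)
        return ZESUBCH;        /* e.g. subchannel 40 is rejected */
    return 0;
}
```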
JAGad31008 Symptoms: MPU activity
field in the zmntr mx command shows incorrect value. Defect/Fix: The zmntr utility can be used to check the status of an ACC MUX.
The command zmntr mx also shows the utilization of the processor on the ACC
MUX. With the development of the TRAU/GPRS protocol, the customer ran
some tests, especially load tests, but the MPU activity with the
use of the TRAU/GPRS protocol always stayed at 0%. The problem occurred because the $STAT message in the DAM was given lower priority than
messages in the Express Queue, High Priority Queue, and Low Priority
Queue. Because of this, the status was always updated after the
messages in the queues had been processed. Hence the % activity always showed 0%, because
the CPU had already finished processing messages and was idle. The
code has been changed so that the $STAT message is given higher priority than the messages
in the Express Queue, High Priority Queue, and Low Priority Queue.
JAGad40862 Symptoms: The x25stat command needs enhancement to show "VC Up Time" for
all active virtual circuits. Defect/Fix: Provided -u option in the x25stat command so that the user can request the "VC
Up Time" for all active virtual circuits. A new ioctl has been added for this purpose. This option must be
used in conjunction with the -v option of the x25stat command.
JAGad56962 Symptoms: ACC firmware
does not free buffers when the terminal is deactivated and disabled. Defect/Fix: When messages
are sent down to an HDLC-LAPB terminal after enabling it, but before
the link is established, those messages are never completed or flushed
unless the link comes up. This leads to a variety of problems, including
card buffer limits being reached. The Level 2 state machine has been adjusted to make sure messages
are flushed after the first attempt at link establishment. After
the T1 x N2 SABM attempt, all messages are flushed and no further
messages are accepted by the protocol as the link down status is set.
JAGad66065 Symptoms: zmasterd stop causes system panic. Defect/Fix: The customer was
using ACC/ISDN cards with OTS and FTAM software for collecting information
from a phone PBX. During the night transfers, the card intermittently
hung and the transfer stalled. When the MC/SG scripts tried to stop the card,
stopping zmasterd caused a system panic. The panic was due to accessing a NULL pointer and trying to
do a bcopy of a value accessed through that NULL pointer. The defect
has been fixed by checking whether pda->resp_qdb is NULL. The bcopy is done only if it is not NULL.
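
A guard of this kind can be sketched as follows. The structure layout and the `copy_response` helper are hypothetical, as only the `pda->resp_qdb` field name appears in these notes, and `memcpy` stands in for the kernel's bcopy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified shape of the structure named in the fix. */
struct pda {
    const char *resp_qdb;   /* may be NULL if no response is queued */
};

/* Copy len bytes of the response into dst.
 * Returns the number of bytes copied, or 0 when there is nothing to copy. */
static size_t copy_response(const struct pda *pda, char *dst, size_t len)
{
    if (pda == NULL || pda->resp_qdb == NULL)
        return 0;                       /* skip the copy instead of faulting */
    memcpy(dst, pda->resp_qdb, len);    /* memcpy stands in for bcopy here */
    return len;
}
```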
JAGad66909 Symptoms: File /dev/n is getting created when executing zmasterd cold *.answ. Defect/Fix: Instead of
redirecting the output of ttgen to /dev/null it has been redirected to /dev/n ull. One extra space caused the problem. Hence /dev/n was getting created. This extra space has been removed
and now the output will be redirected to /dev/null.
JAGad67062 Symptoms: Kernel build
fails because the path of spinlock.h mentioned in zcomsys.h is incorrect. Defect/Fix: During kernel
build, the bx25 header file (/usr/conf/acc/csihdw.h) includes zcomsys.h from the /usr/conf/acc directory. zcomsys.h includes spinlock.h as follows:

    #ifdef _KERNEL
    #include <sys/spinlock.h>
    #endif

This makes the compiler get spinlock.h from /usr/include/sys/spinlock.h, so /usr/include/sys becomes the relative base directory (not /usr/conf/acc). /usr/include/sys/spinlock.h has the following lines:

    #ifdef _KERNEL_BUILD
    #include "../h/types.h"
    ...
    #endif

The kernel build fails here since it is looking for /usr/include/sys/../h/types.h, which does not exist. The exact error indicated by the compiler
is:

    cpp: "/usr/include/sys/spinlock.h", line 15: error 4036: Can't open include file '../h/types.h'.

The fix is to change zcomsys.h to use

    #include "../h/spinlock.h"

instead of

    #include <sys/spinlock.h>

JAGad67065 Symptoms: The zconfig.3x man-page
contained incorrect information about the DSC feature for the subchannels. Defect/Fix: The man-page
has been modified so that it contains the updated information for
supporting the DSC feature for subchannels.
JAGad67069 Symptoms: When an x25stat is done, the Max Frame size that is displayed is shown
in bits instead of bytes. Defect/Fix: n2z_upper.c has been changed such that the max frame size is displayed
in bytes instead of bits.
JAGad83768 Symptoms: Add support
for ACC PCI cards on Superdome systems. Defect/Fix: When running
ACC 3.30 with Z7340A cards on a Superdome system, zmasterd cold start fails, and the cards do not start up. zmon complains that there is no ACC card in the slot configured.
This is because an extra "0" was added to the
I/O path on Superdome systems. For example, if an N-class system
had an I/O path of 16/4/1/0, on a Superdome the same path would
look like "16/4/1/0/0". The problem is in the
function nacc1_get_interface_info() which looks at the path entries starting with the rightmost
element and working left. On a Superdome system, this yielded a bus
address of 4:1 and a slot address of 0 instead of 16:4 and slot
address 1. The address parsing has been changed to start with the leftmost
element and then work right for the PCI bus. This should result
in consistent bus and slot addresses regardless of the target system
that supports the PCI bus.
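
The effect of parsing from the left can be shown with a short sketch. It is not the nacc1_get_interface_info() code: the `pci_addr` structure and `parse_io_path` function are hypothetical, and the mapping of the first two path elements to the bus address and the third to the slot is inferred from the 16:4 / slot 1 example above.

```c
#include <stdio.h>

/* Hypothetical result type; the real driver's structures are not shown
 * in these notes. */
struct pci_addr {
    int bus_hi;   /* first path element  */
    int bus_lo;   /* second path element */
    int slot;     /* third path element  */
};

/* Parse a hardware path such as "16/4/1/0" from the left.
 * Returns 0 on success, -1 if fewer than three elements are present. */
static int parse_io_path(const char *path, struct pci_addr *out)
{
    if (sscanf(path, "%d/%d/%d", &out->bus_hi, &out->bus_lo, &out->slot) != 3)
        return -1;
    return 0;
}
```

Because only the leading elements are read, the extra trailing "0" on a Superdome path does not change the result.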
JAGad87791 Symptoms: ACC card description
in ioscan output is insufficient. Defect/Fix: ACC card
description in "ioscan -fC acc" output needs to be more informative. The code
has been changed so that "ioscan -fC acc" output displays the card number and the description.
For instance, the "Description" field of "ioscan -fC acc" output is as follows:

    Description
    ==============
    Z7340A PCI bridge
    Z7340A 8-port serial ACC

JAGad94415 Symptoms: ACC 3.xx sends
the calling address in call accept packet. Defect/Fix: ACC 3.xx
has a non NULL calling address in the call accept packet, which
is refused by some X.25 implementations. This problem has been fixed
by providing a tunable parameter, which can be set using /opt/acc/bin/n2z_cntrl. By default this flag is set to 0, so the calling address
in the call accept packet is 0 by default. If the flag is set to 1, the call accept
packet contains the calling address.
JAGad96420 Symptoms: ioctls in n2z_cntrl fail with "Bad address" error. Defect/Fix: When trying
to set the tunable values using n2z_cntrl, it fails with a "Bad address" error.
For example, n2z_cntrl -d0 will display "n2z_cntrl: Bad address". The reason is that the ioctls were called directly instead of using I_STR as the second parameter
of ioctl and a strioctl structure as the third parameter. This has been fixed as
follows. Instead of calling ioctl as:

    ioctl(ssmfd, N2Z_CTS_CD_FLAG, &flag)

ioctl is now called by filling the strioctl structure and then calling:

    ioctl(ssmfd, I_STR, &strioctl)

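
Filling the strioctl structure for an I_STR call could look like the sketch below. The field names are those of the standard STREAMS `struct strioctl`; the `build_flag_request` helper and the placeholder command value are hypothetical, since the real `N2Z_CTS_CD_FLAG` constant comes from the ACC headers and is not given in these notes. A fallback definition is included so the sketch compiles on systems without `<stropts.h>`.

```c
#include <stddef.h>

#if defined(__has_include)
#  if __has_include(<stropts.h>)
#    include <stropts.h>          /* real struct strioctl and I_STR */
#    define HAVE_STROPTS_H 1
#  endif
#endif
#ifndef HAVE_STROPTS_H
/* Fallback for systems without STREAMS headers: same field names as the
 * standard struct strioctl, so the sketch still compiles. */
struct strioctl {
    int   ic_cmd;      /* command to pass down the stream */
    int   ic_timout;   /* timeout in seconds; 0 selects the default */
    int   ic_len;      /* length of the data buffer */
    char *ic_dp;       /* pointer to the data buffer */
};
#endif

/* Placeholder value: the real N2Z_CTS_CD_FLAG constant is not given here. */
#define N2Z_CTS_CD_FLAG_PLACEHOLDER 0x1234

/* Fill a strioctl request that carries one int-sized tunable. */
static void build_flag_request(struct strioctl *req, int cmd, int *flag)
{
    req->ic_cmd    = cmd;
    req->ic_timout = 0;
    req->ic_len    = (int)sizeof(*flag);
    req->ic_dp     = (char *)flag;
}

/* On a STREAMS system the request would then be issued as:
 *     ioctl(ssmfd, I_STR, &req);
 */
```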
JAGad71268 Symptoms: The processing
in ZCOM API zconfig terminal clear/delete is incomplete. If this
feature is used, it may result in a memory leak in the ZCOM buffer
pool and/or a system panic. Defect/Fix: A new queue "pendg_txreq" was added to the PTT in order to support the DAM's
$DATA processing. This queue is used as a temporary holding place
for the selected tx requests while building a $DATA backplane transaction.
However, in zconfig terminal CLEAR & DELETE, this new queue
is not checked or cleared at all. This means the DAM may be actively
using this queue while the LDM is clearing or deleting the associated
terminal. Code has been added to the LDM to ensure this queue
is also checked and cleared.
JAGad74130 Symptoms: This is an
enhancement to improve the diagnostic capability. Defect/Fix: There are two parts to it:
1. Basic changes (zscan/zmntr):
   - make them display more formatted information
   - make them work on the zcom.memory file and OS core file
2. LDM trace changes:
   - replace all old LDM debug statements with a trace mechanism (similar
     to that used in the DAM)
A minor problem with the enhancement in zscan was found. It is fixed in JAGad69643. The JAGad69643
changes should also be included.
JAGad51644/JAGab66327 Symptoms: The zcom process
can't open /dev/zmlog and fails with the reason ENOENT ("No such file or directory"), which is displayed on the tty from which zmasterd
is run. The text of the diagnostics is as follows:

    zmlog: can't open ZCOM log: No such file or directory
    zmlog: file name: /dev/zmlog
    zmlog: program aborted (exit code = 3)

Defect/Fix: The sequence
of events leading to this problem is as follows:
1. zmlog executes and successfully calls check_dev_file(), which creates /dev/zmlog.
2. zmon executes and also calls check_dev_file(). It then successfully calls unlink() (which removes /dev/zmlog) because the device's major number does not appear to
   be correct.
3. zmlog executes and calls open(), which fails because /dev/zmlog doesn't exist.
The reason behind the problem is that the major number appears
to have the wrong value due to artificial sign extension. Eliminating
the artificial sign extension solves the problem.
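
These notes do not say where the sign extension was introduced. As one illustration of how such a bug can arise, the sketch below assumes a hypothetical layout with the major number in the top 8 bits of a 32-bit device value; the function names are invented for the example.

```c
#include <stdint.h>

/* Hypothetical layout: major number in the top 8 bits of a 32-bit dev value. */

/* Buggy variant: shifting a signed value copies the sign bit downwards on
 * typical compilers, so majors of 128 and above come out negative. */
static int major_sign_extended(int32_t dev)
{
    return dev >> 24;
}

/* Corrected variant: do the shift on an unsigned value and mask the result. */
static int major_unsigned(int32_t dev)
{
    return (int)(((uint32_t)dev >> 24) & 0xff);
}
```

A comparison of the extracted major against the expected value fails with the first variant whenever the top bit of the major is set, which matches the kind of mismatch described above.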
JAGad69876 Symptoms: Transmit messages
stop after ACC runs out of transmit buffers. Defect/Fix: When an ACC
card is heavily using its internal buffers, the DAM can, in rare situations, be
notified that the card is out of buffers for this subchannel or
port even though the driver does not have a single pending write
request on the card. This is an issue because the write completion
is what normally restarts the outbound data flow when the driver
goes into outbound flow control (e.g., the card says it is out of
buffers). In this special case, the driver keeps track of the subchannel(s)
which are in this state and starts a timer to restart the outbound
I/O since there are no write completions to start the outbound I/O. After analysis of the problem, it appears that the subchannel
is not being placed back onto the transmit linked list when the
timer pops. When the condition arises where the card rejects a write
request on a subchannel and the driver does not have any pending
write requests on that subchannel, the driver starts a timer. When
the timer pops, the driver finds all subchannels in this state and
attempts to restart any pending writes. The problem is that the driver is not placing the subchannel
back into the linked lists of subchannels with a pending write request.
Code has been added to insert the subchannel back into the transmit
linked list.
JAGab67070 Symptoms: No automatic
card restart after firmware failure. Defect/Fix: When a Z7200
mux is crashed deliberately (using zcbug, "ru 0 off ok"), the DAM detects the crash, but there is no
automatic card restart. During ZCOM startup, if there is a problem
with the mux $RSET, the DAM issues IRR (Interface Restart Requests) to
the IRR queue and a failure result is returned to ZMON. ZMON retries
the RESET 3 times. This ends up with 3 IRRs being queued up.
When ZCOM is ready (of course, with a bad mux), ZMON receives 3
IRRs in a row and tries to restart the same mux. ZMON forks a child
to handle each IRR, so a total of 3 zmon processes restart the same mux in parallel and end up with a "DAM has too many requests" error. The following changes have been made:
1. Controlled restart/shutdown to avoid
   concurrent control of a mux.
2. Provide info for selective retry when necessary.
JAGae00972 Symptoms: ACC X.25 IP/X25 no
longer works after the second x25init. Defect/Fix: If multiple
X.25 links are used concurrently on a system, then the following symptoms
occur:
1. 'IP over X.25' works with the 'x25init -c config_file -a ip_to_x121_map' command if it is issued before the sx25d process starts up.
2. If the X.25 link is restarted with the same command
   while sx25d is still running, then 'IP over X.25' does not work, while
   all the SVC and PVC operations for the link work normally.
3. If all the X.25 links are stopped via x25stop -K, or individually via x25stop -d, which causes the sx25d process to stop, and the above x25init command is then issued, 'IP over X.25' works.
In N2z_stop_link processing, we scan through all the VC structures and
cleanup all that are not in idle state. In this process, we check
to see if there is any stream associated with the VC. If so, the stype
of the stream context was set to N2Z_ERROR_STRM until the
X.25 upper layers released the previous connection established by
ping. So, when x25init is run for the second time within this period, the x25init path identifies the VC stream context type as an ERROR
stream and ping doesn't go through. The fix is to change
N2Z_ERROR_STRM to N2Z_UNKNOWN_STRM, so that the ping will go through.
JAGad97577 Symptoms: When there
is a lot of opening and closing of X.25 sockets and the ACC kernel limit n2z_max_devs is reached, X.25 socket creation with a socket() call returns invalid errno 65535 (-1). The following message is logged in the nettl log:

    Network NS_LS_N2Z Error 2001, pid 486745
    N2Z: Too many streams used! Increase the size of the kernel tunable n2z_max_devs in /stand/system and rebuild your kernel.

Defect/Fix: This problem
has been noticed with PVCs. While detaching the PVC in the N2Z_F_pvc_detach_up() routine, the VC Stream q_ptr field was made NULL. However, in the VC Stream close routine N2Z_F0_close(), if q_ptr is NULL, the further closing and clean-up of the
VC stream was not completed, and hence VC Streams were not returned
to the free pool. This caused the driver to run out of available
VC Streams, and the socket() call returned errno 65535 after the kernel limit n2z_max_devs was reached. A new 'lhvcp' flag mask has been introduced. This flag is set during
PVC detach processing to indicate that VC Stream close is still
pending, thereby avoiding the need to set the VC Stream q_ptr to NULL. A code change has been made to check for this
flag and to reset the flag during 'lhvcp' allocation.
JAGad98715 Symptoms: Panic with
Data Page Fault with the following stack trace:

    panic+0x6c
    report_trap_or_int_and_panic+0x94
    interrupt+0x208
    $ihndlr_rtn+0x0
    nacc1_cmplt_read+0x38
    nacc2_complete_req+0x34a8
    nacc2_end_io+0x580
    nacc2_isr+0x126c
    sapic_interrupt+0x2c
    ...

Defect/Fix: When stressing
the system by running the tests on 16 cards simultaneously on an
HP rp8400 server, the above panic is seen. The panic occurs because, in nacc1_cmplt_read, the DMA receive queue is found to be NULL. Dereferencing
this NULL pointer caused the above system panic. The fix is to check whether qhead is NULL. If it is NULL, a trace message is logged and the function
returns.
JAGad67070 Symptoms: System panics
with the Spinlock timeout failure. Defect/Fix: The Spinlock
timeout failure seems to happen because the function N2z_Disable_ZLUs()
calls zcntl() while holding the SPINLOCK(glock). Because there are
no buffers available, Zc_gosleep() is eventually called, which
in turn calls sleep(). This causes the spinlock timeout failure
because a spinlock should not be held when sleep()
is called. To fix this, the spinlock is released before calling
zcntl() in function N2z_Disable_ZLUs().
JAGae04681 Symptoms: This is an
enhancement to diagnostic capabilities of the ACC Firmware. Tracing
the ACC firmware to determine events leading up to and causes of
problems has been difficult to do without specialized equipment.
A better ACC firmware tracing mechanism was needed. Defect/Fix: A more advanced
ACC firmware tracing environment was developed to enable on-demand
and dynamic tracing of the ACC firmware, without the need for extra
equipment, such as a logic analyzer. This environment also enabled
the number of trace points within the firmware code to be substantially
increased, since the new mechanism has a very low impact on the
ACC card's performance. The new trace mechanism allows the amount
of active tracing to be varied, and tracing of individual modules
and protocols to be selectively enabled and disabled dynamically. The
mechanism allows tracing to be captured from a running ACC card. Notes for usage: When gathering
an ACC card firmware trace for any further problem analysis, it is recommended
to always use the raw option of the 'fwtrace' utility as shown below.

    $ /opt/acc/bin/fwtrace > dump -raw > /tmp/trace.out