Problem Resolution: RCF Solaris & Panasas Servers
Maurice Askinazi
*** Additions made to this doc for Panasas servers and accessing remote consoles via private network ***
To monitor the status of NFS servers, go to:
http://www.rhic.bnl.gov/RCF/UserInfo/Facilities/NFSService/Monitoring/Pong/index.html
http://www.rhic.bnl.gov/RCF/UserInfo/Facilities/NFSService/Monitoring/nfs_filesystems.shtml/
To access a map of NFS Fileservers to NFS Filesystems, go to:
http://www.rcf.bnl.gov/BNLOnly/NFS/RCF_filesystem_locations.html
To access metrics on network load go to:
To access metrics on filesystem and NFS server load, go to:
https://gcegang01.rcf.bnl.gov/NoAuth/NFSstats/
Machines:
RCF2, RNFS04
BRAHMS NFS - RMINE001, 002, 003, 004
BRAHMS PANASAS - RPAN001
PHENIX NFS - RMINE201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214
PHENIX PANASAS - RPAN201, 203
PHOBOS NFS - RMINE401, 402
PHOBOS PANASAS - RPAN401
STAR NFS - RMINE601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612
STAR PANASAS - RPAN601, 603, 605, 607, 609
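The shorthand used in the machine list (a prefix on the first name, then bare numbers) can be expanded into full machine names mechanically. The following is a minimal sketch, not an existing RCF tool; the function name expand_hosts is hypothetical, and it assumes every bare number reuses the prefix of the most recent full name.

```python
import re


def expand_hosts(shorthand: str) -> list[str]:
    """Expand 'RMINE001, 002, 003' into ['RMINE001', 'RMINE002', 'RMINE003'].

    Assumption: a bare number reuses the alphabetic prefix of the most
    recent full name that appeared before it in the list.
    """
    hosts = []
    prefix = ""
    for item in (part.strip() for part in shorthand.split(",")):
        if not item:
            continue  # tolerate a trailing comma
        match = re.fullmatch(r"([A-Za-z]+)(\d+)", item)
        if match:
            # Full name such as RMINE001: remember its prefix.
            prefix = match.group(1)
            hosts.append(item)
        elif item.isdigit() and prefix:
            # Bare number such as 002: attach the remembered prefix.
            hosts.append(prefix + item)
        else:
            raise ValueError(f"cannot interpret {item!r}")
    return hosts
```

For example, expand_hosts("RPAN201, 203") gives the two PHENIX Panasas names RPAN201 and RPAN203.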
Primary machine duties:
RCF2 - Solaris general work machine.
RNFS04 - NFS server. Users cannot log on to it.
NFS servers & Solaris data analysis machines -
RMINE001, 002, 003, 004
RMINE201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214
RMINE401, 402
RMINE601, 602, 603, 604
Network storage appliance - RPAN001, 201, 203, 401, 601, 603, 605, 607, 609
*********************************************************************************************
Experiment "Home directory" servers - RMINE001, RMINE202, RMINE401, RMINE602
If an experiment home directory machine is unreachable, this should be considered an
extremely critical problem and handled with urgency.
*********************************************************************************************
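When triaging an outage, it can help to check mechanically whether the affected machine is one of the home directory servers named above. This is a minimal sketch; the function name and the "critical"/"normal" labels are illustrative assumptions, not an official severity scale.

```python
# Experiment home directory servers, as listed in this document.
HOME_DIRECTORY_SERVERS = {"RMINE001", "RMINE202", "RMINE401", "RMINE602"}


def outage_priority(hostname: str) -> str:
    """Return 'critical' for a home directory server, otherwise 'normal'.

    Accepts short names (rmine202) or fully qualified names
    (rmine202.rcf.bnl.gov); only the part before the first dot is used.
    """
    short_name = hostname.strip().split(".")[0].upper()
    return "critical" if short_name in HOME_DIRECTORY_SERVERS else "normal"
```

A call such as outage_priority("rmine202") flags the PHENIX home directory server as critical, while RMINE203 is treated as normal.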
Anticipated problems/Recommended course of action:
Servers or NFS Filesystems are slow and appear unreachable:
Access
If so, the system is physically maxed out and should be left alone until load
diminishes.
Machine is unreachable when trying to log on (interactive use)
1. Machine is down
   Symptom: the machine is not reachable over the network, and its console does not respond.
   Action: call an administrator (check whether there is power to the machine).
2. Machine is up, TCP is down
   Symptom: the machine is not reachable over the network, but its console responds.
   Action: see if you can log on at the console. If not, call an administrator. If yes,
   check whether you can reach the network from the machine (outbound).
3. Network is down
   Symptom: the machine is not reachable over the network, and you can't reach any other
   machines either.
   Action: make sure the network administrator is aware.
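The "not reachable by network" symptom in the cases above can be checked with a simple TCP connection attempt before walking to the console. This is a rough sketch only: the choice of port 22 (ssh) as the probe port is an assumption, since this document does not say which service users log on through.

```python
import socket


def tcp_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Assumption: port 22 (ssh) is the default probe port. A False result
    covers refused connections, timeouts, and name lookup failures alike,
    so it does not by itself tell cases 1, 2, and 3 apart.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Probing a second, unrelated machine in the same way helps separate case 3 (network is down) from cases 1 and 2.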
NFS Filesystems are unreachable (from client machines)
1. Machine is down
   Symptom: the machine is not reachable over the network, and its console does not respond.
   Action: call an administrator (check whether there is power to the machine).
2. Machine is up, TCP is down
   Symptom: the machine is not reachable over the network, but its console responds.
   Action: see if you can log on at the console. If not, call an administrator. If yes,
   check whether you can reach the network from the machine (outbound).
3. Network is down
   Symptom: the machine is not reachable over the network, and you can't reach any other
   machines either.
   Action: make sure the network administrator is aware.
4. Machine is up, networking is up, all filesystems are down
   RAID hardware may be down. Call an administrator.
5. Machine is up, networking is up, select filesystems are down
   RAID hardware may be down or a filesystem is corrupted. Call an administrator.
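The five cases above amount to a small decision table. The sketch below encodes it as a function; the names are hypothetical, and the order in which the observations are tested (other machines first, then the console) is one reasonable reading of the list rather than a prescribed procedure.

```python
from enum import Enum


class Diagnosis(Enum):
    MACHINE_DOWN = "machine is down"
    TCP_DOWN = "machine is up, tcp is down"
    NETWORK_DOWN = "network is down"
    ALL_FILESYSTEMS_DOWN = "all filesystems are down"
    SELECT_FILESYSTEMS_DOWN = "select filesystems are down"
    NO_FAULT_FOUND = "no fault found"


def diagnose(server_reachable: bool, others_reachable: bool,
             console_responds: bool, filesystems_up: int,
             filesystems_total: int) -> Diagnosis:
    """Map observations to one of the cases in the list above."""
    if not server_reachable:
        if not others_reachable:
            return Diagnosis.NETWORK_DOWN      # case 3
        if console_responds:
            return Diagnosis.TCP_DOWN          # case 2
        return Diagnosis.MACHINE_DOWN          # case 1
    if filesystems_total > 0 and filesystems_up == 0:
        return Diagnosis.ALL_FILESYSTEMS_DOWN  # case 4
    if filesystems_up < filesystems_total:
        return Diagnosis.SELECT_FILESYSTEMS_DOWN  # case 5
    return Diagnosis.NO_FAULT_FOUND
```

Every result except NETWORK_DOWN and NO_FAULT_FOUND ends with calling an administrator; NETWORK_DOWN goes to the network administrator instead.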
Panasas Filesystems are unreachable (from client machines)
Look up which Panasas server manages the filesystem (a filesystem list is provided).
The Panasas servers are not logged on to; they are accessed through a browser.
Because of the RCF firewall, only a few administrator desktop machines can reach
the management interface from outside the computer room. So you either have
to work from the computer room, or log on to a machine behind the RCF firewall
and start a browser there. I recommend system MTI00. You can start a browser
by typing "firefox &".
Once a browser is started, access the server with https (for example, https://rpan609.rcf.bnl.gov).
The status window doesn't require a login. Looking at the logs or the status of the volumes
requires a login. The username is "admin". You'll have to ask for the password.
In the Panasas interface, there are tabs toward the top of the screen. The tab labeled "storage"
is useful: error messages relating to the filesystems are listed at the top, and the filesystems
are listed below. The right column in the filesystem table gives the state. It should be "Online".
If it's not, call an administrator.
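The address pattern and the "Online" check described above can be written down as two small helpers. This is only a sketch: the function names are hypothetical, the URL pattern is generalized from the single rpan609 example, and the filesystem names and non-Online state used in the usage note are made up for illustration (the states would have to be read off the "storage" tab by hand).

```python
def panasas_url(server: str, domain: str = "rcf.bnl.gov") -> str:
    """Build the https management address, e.g. RPAN609 -> https://rpan609.rcf.bnl.gov.

    Assumption: every Panasas server follows the pattern of the rpan609 example.
    """
    return f"https://{server.strip().lower()}.{domain}"


def filesystems_needing_attention(states: dict[str, str]) -> list[str]:
    """Return the names of filesystems whose state is not 'Online', sorted."""
    return sorted(name for name, state in states.items()
                  if state.strip() != "Online")
```

For instance, given the hypothetical table {"/vol/a": "Online", "/vol/b": "Offline"}, the second helper returns only "/vol/b", which is the filesystem to report to an administrator.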
Accessing the Remote System Consoles of NFS servers
The console ports of the SUN NFS servers were put on a private network that can only be
accessed from systems mti00 or bay00. The older E450 servers are connected to a Baytech
serial console switch. Typing "bay" on mti00 or bay00 will telnet to the Baytech console.
There is no account; just enter the password. Choose the machine's number from the list to
access that server's console. When done, four semicolons in a row ";;;;" will exit the server
console and return you to the list. The letter "t" will exit the Baytech switch.
Accessing the newer SUN servers, V480 and V240, does not require the Baytech switch.
They have built-in remote system consoles. From mti00 or bay00 you can reach the
system consoles by using menus that I've made there, or by telnetting directly to the
system using the remote system console's address.
Using menus - the script "rsc" has a menu to access systems that are installed in the shark
racks, in front of the administrators' desk in the computer room.
The script "apc" has a menu for systems that are installed in the apc racks, behind the
administrators' desk in the computer room.
Using telnet - the convention for naming a server's remote console port was to append "rsc"
to the system's name. So to access the remote console port for rmine201, you should telnet to
rmine201rsc.rcf.bnl.local. The login is not that of the system; the remote console runs its own
separate operating system. The login is "admin"; you have to ask for the password. From this
prompt, you have the ability to poweroff, poweron, send a break to the O/S, or access the console.
Note: when accessing the V240 systems, access the console with "console -f".
When done with the console, type "!." (exclamation, period) to return to the remote console port.
To exit the remote console port, type "logout".
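The naming convention for remote console ports is simple enough to express as a one-line rule. The helper below is a minimal sketch with a hypothetical name; it assumes the rcf.bnl.local domain from the rmine201 example applies to every server that has a built-in remote system console.

```python
def rsc_hostname(system: str, domain: str = "rcf.bnl.local") -> str:
    """Return the remote console address for a server.

    Follows the convention described above: append "rsc" to the system's
    short name, e.g. rmine201 -> rmine201rsc.rcf.bnl.local.
    Assumption: all such consoles live in the same private domain.
    """
    short_name = system.strip().lower().split(".")[0]
    return f"{short_name}rsc.{domain}"
```

The result is the name to give to telnet from mti00 or bay00, for example telnet rmine201rsc.rcf.bnl.local.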
Update: March 20, 2006 - MJA