Problem Resolution RCF Solaris & Panasas Servers

                                                                       Maurice Askinazi

 

 

 

***  Additions made to this doc for Panasas servers and accessing remote consoles via private network ***

 


To monitor the status of NFS servers, go to:
http://www.rhic.bnl.gov/RCF/UserInfo/Facilities/NFSService/Monitoring/Pong/index.html


To monitor the status of the NFS Filesystems, go to:
http://www.rhic.bnl.gov/RCF/UserInfo/Facilities/NFSService/Monitoring/nfs_filesystems.shtml/

 

To access a map of NFS Fileservers to NFS Filesystems, go to:
http://www.rcf.bnl.gov/BNLOnly/NFS/RCF_filesystem_locations.html

               

To access metrics on network load, go to:
http://net2.rcf.bnl.gov/alarms/rmine.html

                

To access metrics on filesystem and NFS server load, go to:
https://gcegang01.rcf.bnl.gov/NoAuth/NFSstats/

 

Machines:

RCF2, RNFS04
BRAHMS NFS        -    RMINE001, 002, 003, 004
BRAHMS PANASAS    -    RPAN001
PHENIX NFS        -    RMINE201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214
PHENIX PANASAS    -    RPAN201, 203
PHOBOS NFS        -    RMINE401, 402
PHOBOS PANASAS    -    RPAN401
STAR NFS          -    RMINE601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612
STAR PANASAS      -    RPAN601, 603, 605, 607, 609
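For scripted checks, the shorthand machine list above can be expanded into full host names. The following Python sketch assumes that a bare number such as "002" reuses the letter prefix of the first entry on its line, and that host names are the lowercase forms (as in the rmine201 and rpan609 examples elsewhere in this document). `MACHINES` and `expand` are illustrative names, not an existing RCF tool.

```python
import re

# Machine list transcribed from this document (experiment/role -> shorthand).
MACHINES = {
    "BRAHMS NFS": "RMINE001, 002, 003, 004",
    "BRAHMS PANASAS": "RPAN001",
    "PHENIX NFS": "RMINE201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214",
    "PHENIX PANASAS": "RPAN201, 203",
    "PHOBOS NFS": "RMINE401, 402",
    "PHOBOS PANASAS": "RPAN401",
    "STAR NFS": "RMINE601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612",
    "STAR PANASAS": "RPAN601, 603, 605, 607, 609",
}


def expand(entry: str) -> list[str]:
    """Expand 'RMINE001, 002' into ['rmine001', 'rmine002'].

    Assumption: a bare number reuses the letter prefix of the first entry,
    and host names are lowercase.
    """
    items = [item.strip() for item in entry.split(",")]
    first = re.fullmatch(r"([A-Za-z]+)\d+", items[0])
    if first is None:
        raise ValueError(f"unexpected first entry: {items[0]!r}")
    prefix = first.group(1).lower()
    # Full names are lowercased; bare numbers get the shared prefix.
    return [prefix + item if item.isdigit() else item.lower() for item in items]
```

For example, `expand(MACHINES["PHENIX PANASAS"])` returns `['rpan201', 'rpan203']`.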

 

Primary machine duties:

RCF2 - Solaris general work machine.

RNFS04 - NFS server; users cannot log on to it.

NFS servers & Solaris data analysis machines -
    RMINE001, 002, 003, 004
    RMINE201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214
    RMINE401, 402
    RMINE601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612

Network storage appliances (Panasas) - RPAN001, 201, 203, 401, 601, 603, 605, 607, 609

 

*********************************************************************************************

Experiment "home directory" servers - RMINE001, RMINE202, RMINE401, RMINE602

If an experiment home directory server is unreachable, treat it as an extremely critical problem and handle it with urgency.

*********************************************************************************************
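When several machines are unreachable at once, the rule above decides the order of work. A minimal Python sketch of that ordering, assuming the list of unreachable hosts comes from whatever monitoring page or notes the operator is using; the host list in the example is hypothetical.

```python
# Experiment home directory servers, as listed in this document.
HOME_DIRECTORY_SERVERS = {"rmine001", "rmine202", "rmine401", "rmine602"}


def is_critical(host: str) -> bool:
    """True if the host is an experiment home directory server."""
    return host.strip().lower() in HOME_DIRECTORY_SERVERS


def triage_order(unreachable: list[str]) -> list[str]:
    """Put home directory servers first.

    sorted() is stable, so the remaining hosts keep their original order.
    """
    return sorted(unreachable, key=lambda host: not is_critical(host))


if __name__ == "__main__":
    # Hypothetical input; prints ['RMINE202', 'rmine203', 'rpan601']
    print(triage_order(["rmine203", "RMINE202", "rpan601"]))
```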

 

Anticipated problems/Recommended course of action:

 

 

Servers or NFS filesystems are slow and appear unreachable:
Open http://net2.rcf.bnl.gov/alarms/rmine.html and check whether the network throughput for the system is at or near 400 Mb/s.
If it is, the system is at its physical throughput limit and should be left alone until the load diminishes.
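The check above reduces to a threshold comparison. A Python sketch, assuming the operator reads the throughput figure (in Mb/s) off the rmine.html page by hand; the 90% cut-off used for "near" is an assumption, not a documented threshold.

```python
# Figure from this document: at or near 400 Mb/s the link is maxed out.
LINK_LIMIT_MBPS = 400.0
# Assumption: "near" is read as 90% of the limit; adjust to local practice.
NEAR_FRACTION = 0.9


def link_saturated(throughput_mbps: float) -> bool:
    """True if the measured throughput is at or near the 400 Mb/s ceiling."""
    return throughput_mbps >= LINK_LIMIT_MBPS * NEAR_FRACTION


def advice(throughput_mbps: float) -> str:
    """Translate a throughput reading into the action this document recommends."""
    if link_saturated(throughput_mbps):
        return "link saturated: leave the system alone until the load diminishes"
    return "link not saturated: continue with the unreachable-machine checks"
```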

 

Machine is unreachable when trying to log on (interactive use)

 

1.      machine is down

Symptom: machine is not reachable over the network; at the console there is no response.

Action: call the administrator (check whether there is power to the machine).

2.      machine is up, TCP/IP is down

Symptom: machine is not reachable over the network; at the console there is a response.

Action: see if you can log on at the console. If not, call the administrator. If so, check whether you can use the network going out from the machine.

3.      network is down

Symptom: machine is not reachable over the network, and no other machines can be reached either.

Action: make sure the network administrator is aware.
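Cases 1-3 above can be written as a small decision function plus a reachability probe. A Python sketch under two assumptions: interactive logons use ssh on port 22 (this document does not say which protocol is used), and the "no other machines reachable" test is checked first, since a dead network makes every machine look down. `tcp_reachable` and `diagnose_unreachable` are hypothetical helper names.

```python
import socket


def tcp_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Try a TCP connection to host:port.

    Assumption: interactive logons go over ssh (port 22).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose_unreachable(console_responds: bool, other_machines_reachable: bool) -> str:
    """Map the operator's observations onto cases 1-3 above.

    Case 3 is tested first (an assumed ordering): if no other machines can be
    reached either, the network itself is the likely problem.
    """
    if not other_machines_reachable:
        return "3. network is down: make sure the network administrator is aware"
    if not console_responds:
        return "1. machine is down: check for power and call the administrator"
    return ("2. machine is up, TCP/IP is down: log on at the console; if that fails, "
            "call the administrator, otherwise check outbound networking")
```

For example, if `tcp_reachable("rmine201")` is `False` but other hosts still answer, walk to the console and call `diagnose_unreachable` with what you see there.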

 

 

NFS Filesystems are unreachable (from client machines)

 

Look up which machine serves the storage (a filesystem list is provided) and try to log on to it.

 

1.      machine is down

Symptom: machine is not reachable over the network; at the console there is no response.

Action: call the administrator (check whether there is power to the machine).

2.      machine is up, TCP/IP is down

Symptom: machine is not reachable over the network; at the console there is a response.

Action: see if you can log on at the console. If not, call the administrator. If so, check whether you can use the network going out from the machine.

3.      network is down

Symptom: machine is not reachable over the network, and no other machines can be reached either.

Action: make sure the network administrator is aware.

4.      machine is up, networking is up, all filesystems are down

Action: RAID hardware may be down; call the administrator.

5.      machine is up, networking is up, select filesystems are down

Action: RAID hardware may be down or the filesystem may be corrupted; call the administrator.
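Cases 4 and 5 differ only in whether every filesystem on the server is affected. A Python sketch of that distinction, assuming the operator has already recorded which filesystems on the server are usable; `diagnose_filesystems` is a hypothetical helper, and the filesystem names in the example are hypothetical.

```python
def diagnose_filesystems(filesystems: dict[str, bool]) -> str:
    """Apply cases 4 and 5 above once the server is up and on the network.

    `filesystems` maps each filesystem the server provides to whether it is
    currently usable (an observation made by the operator).
    """
    down = sorted(name for name, usable in filesystems.items() if not usable)
    if not down:
        return "no filesystems down on this server"
    if len(down) == len(filesystems):
        return "4. all filesystems down: RAID hardware may be down; call the administrator"
    return ("5. some filesystems down (" + ", ".join(down) + "): RAID hardware may be down "
            "or the filesystem may be corrupted; call the administrator")
```

For example, `diagnose_filesystems({"/fs1": True, "/fs2": False})` reports case 5 and names `/fs2`.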

 

 

Panasas Filesystems are unreachable (from client machines)

 

            Look up which Panasas server manages the filesystem (a filesystem list is provided).

 

            The Panasas servers are not logged on to; they are accessed through a web browser.

            Because of the RCF firewall, only a few administrator desktop machines can reach
            the management interface from outside the computer room. So you either have
            to work from the computer room, or log on to a machine behind the RCF firewall
            and start a browser there. I recommend system MTI00. You can start a browser
            by typing "firefox &".

 

            Once the browser is running, access the server over HTTPS (for example, https://rpan609.rcf.bnl.gov).

            The status window doesn't require a login, but looking at the logs or the status of the
            volumes does. The username is "admin"; you'll have to ask for the password.
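Before walking to the computer room, it can help to confirm that a Panasas management interface answers at all. A Python sketch, assuming every Panasas server follows the `<name>.rcf.bnl.gov` pattern of the rpan609 example and that the script is run from a machine behind the RCF firewall such as mti00. It counts a failed certificate check as "answered", on the assumption that the appliance may present a certificate the client does not trust; `panasas_url` and `management_answers` are hypothetical names.

```python
import ssl
import urllib.error
import urllib.request


def panasas_url(name: str) -> str:
    """Assumption: all Panasas servers follow the rpan609.rcf.bnl.gov pattern."""
    return f"https://{name.strip().lower()}.rcf.bnl.gov"


def management_answers(name: str, timeout: float = 10.0) -> bool:
    """True if the HTTPS management interface responds at all."""
    try:
        with urllib.request.urlopen(panasas_url(name), timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # an HTTP error status still means the interface answered
    except urllib.error.URLError as err:
        # Assumption: an untrusted certificate still proves TLS is answering.
        return isinstance(err.reason, ssl.SSLCertVerificationError)
    except OSError:
        return False  # e.g. a timeout while reading the response
```

A `False` from `management_answers("rpan609")` does not replace the browser check; it only says the interface did not answer from where the script ran.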

 

            In the Panasas interface there are tabs towards the top of the screen. The tab labeled
            "storage" is useful: error messages relating to the filesystems are listed at the top, and
            the filesystems are listed below. The right column of the filesystem table shows each
            filesystem's state, which should be "Online". If it's not, call an administrator.
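The "Online" rule above can be applied to rows copied from the filesystem table. A Python sketch, assuming the operator transcribes (filesystem, state) pairs by hand from the browser; the filesystem names in the example are hypothetical, and the comparison ignores case as a small assumption.

```python
def volumes_needing_admin(rows: list[tuple[str, str]]) -> list[str]:
    """Return the filesystems whose state is anything other than "Online".

    `rows` holds (filesystem, state) pairs copied from the storage tab.
    """
    return [name for name, state in rows if state.strip().lower() != "online"]
```

For example, `volumes_needing_admin([("/vol1", "Online"), ("/vol2", "Offline")])` returns `['/vol2']`, which is the list to give the administrator.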

 

 

Accessing the Remote System Consoles of NFS servers

 

            The console ports of the SUN NFS servers were put on a private network that can only be
            accessed from systems mti00 or bay00. The older E450 servers are connected to a Baytech
            serial console switch. Typing "bay" on mti00 or bay00 will telnet to the Baytech console.
            There is no account; just enter the password. Choose the machine's number from the list to
            access that server's console. When done, typing four semicolons in a row (";;;;") will exit
            the server console and return you to the list. The letter "t" will exit the Baytech switch.

 

            Accessing the newer SUN servers, V480 and V240, does not require the Baytech switch.
            They have built-in remote system consoles. From mti00 or bay00 you can reach the
            system consoles by using menus that I've made there, or by telnetting directly to
            the remote system console's address.

 

            ... racks, in front of the administrator's desk in the computer room.

            The script "apc" lists the systems that are installed in APC racks, behind the administrator's
            desk in the computer room.

            The remote system console's address is the system's name with "rsc.rcf.bnl.local" appended
            to it. So to access the remote console port for rmine201, you should telnet to
            rmine201rsc.rcf.bnl.local. The login is not that of the system: the remote console runs its
            own separate operating system. The login is "admin"; you have to ask for the password. From
            this prompt you can power the system off or on (poweroff, poweron), send a break to the O/S,
            or access the console.

            Note: when accessing the V240 systems, access the console with "console -f".

            When done with the console, type "!." (exclamation point, period) to return to the remote
            console port. To exit the remote console port, type "logout".
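The console procedures above can be summarized per server model. A Python sketch of that summary: the address rule comes from the rmine201 to rmine201rsc.rcf.bnl.local example, "console -f" is documented only for the V240, and using plain "console" on the V480 is an assumption. Which model each rmine host is does not appear in this document, so the models passed in are up to the operator; `rsc_address` and `console_route` are hypothetical names.

```python
# Suffix taken from the rmine201 -> rmine201rsc.rcf.bnl.local example.
RSC_SUFFIX = "rsc.rcf.bnl.local"


def rsc_address(system: str) -> str:
    """Remote system console address for a V480/V240 server."""
    return system.strip().lower() + RSC_SUFFIX


def console_route(system: str, model: str) -> str:
    """Summarize, per server model, how to reach the console from mti00 or bay00."""
    model = model.strip().upper()
    if model == "E450":
        return ("type 'bay', enter the password, choose the machine's number; "
                "';;;;' returns to the list, 't' exits the Baytech switch")
    if model in ("V480", "V240"):
        # "console -f" is documented for the V240; plain "console" for the
        # V480 is an assumption.
        command = "console -f" if model == "V240" else "console"
        return (f"telnet {rsc_address(system)}, log in as admin, run '{command}'; "
                "'!.' returns to the remote console port, 'logout' exits")
    raise ValueError(f"model not covered by this document: {model!r}")
```

For example, `rsc_address("rmine201")` returns `'rmine201rsc.rcf.bnl.local'`.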

 

 

Update: March 20, 2006 - MJA