US Atlas: Moving Dcache pool nodes
7/2/2008
Wed Jul 2 16:06:35 EDT 2008
This item has been posted to usatlas-users-l@lists.bnl.gov, usatlas-ddm-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov, usatlas-prodsys-l@lists.bnl.gov
Summary: 12 Dcache pool nodes moving to a new network switch
Duration: 4:15PM EDT to 4:30PM EDT
Group Responsible: BNL Dcache
Affected Area: BNL Dcache service
Expected User Impact: None
Maintenance Type: "Transparent"
Submitted By: Shigeki Misawa misawa@bnl.gov
Description: Additional Dcache pool nodes will be moved to a new network switch. Nodes will be moved one at a time and each node will be offline for about 10-30 seconds. No service glitches or interruptions are expected.
US Atlas: BNL Dcache system offline Tuesday July 8
7/2/2008
Wed Jul 2 15:53:27 EDT 2008
This item has been posted to usatlas-users-l@lists.bnl.gov, usatlas-computing-l@lists.bnl.gov, usatlas-ddm-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov, usatlas-prodsys-l@lists.bnl.gov, atlas-project-adc-operations@cern.ch
Summary:
BNL Dcache will be offline (unavailable) all day
on Tuesday July 8
Duration:
9:00AM EDT - 5:00PM EDT on Tuesday July 8
Group Responsible:
BNL Dcache Group
Affected Area:
All BNL Dcache service
Expected User Impact:
Production, data transfer, and user analysis at BNL will stop
Maintenance Type:
Downtime
Submitted By:
Shigeki Misawa (misawa@bnl.gov)
Description:
BNL Dcache will be unavailable all day Tuesday for
maintenance work. We will be doing the following:
1) Re-establishing backups of Dcache metadata
2) Upgrading the PNFS server (hardware)
More memory, more CPU cores, faster cores.
3) Moving PNFS backend Postgres database to
external RAID storage
4) Upgrading Dcache to the latest version
5) Upgrading the backend Postgres database to
Postgres 8.3.3
6) Changing the Postgres backup mechanism
US Atlas SSH, Samba, Interactive, and Web servers to be updated
7/2/2008
Wed Jul 2 15:44:15 EDT 2008
This item has been posted to usatlas-users-l@lists.bnl.gov
To all,
The US Atlas SSH gateways, Samba,
interactive, and web servers will be updated with
the latest kernel and system software security
patches next week. Since the kernel is being
updated, each system will need to be rebooted.
Each system will be unavailable for the time it
takes to perform a reboot. The list of systems
to be updated and their scheduled reboot times is:
Monday, July 7, 2008, shortly after 08:00 EDT:
atlasgw01.bnl.gov (SSH gateway),
asmb00.bnl.gov (Samba Server)
Tuesday, July 8, 2008, shortly after 08:00 EDT:
atlasgw00.bnl.gov (SSH gateway)
Wednesday, July 9, 2008, shortly after 08:00 EDT:
All US Atlas publicly accessible web sites
Thursday, July 10, 2008, shortly after 08:00 EDT:
atlas00.usatlas.bnl.gov (US Atlas interactive server),
rt.racf.bnl.gov (RT ticket system)
John M. (mccarthy@bnl.gov)
RHIC SSH, Samba, Interactive, Web, and Mail servers to be updated
7/2/2008
Wed Jul 2 15:39:35 EDT 2008
This item has been posted to rhic-rcf-l@lists.bnl.gov
To all,
The RHIC SSH gateways, Samba, interactive,
web, and mail servers will be updated with the
latest kernel and system software security patches
next week. Since the kernel is being updated,
each system will need to be rebooted. Each system
will be unavailable for the time it takes to
perform a reboot. The list of systems to be
updated and their scheduled reboot times is:
Monday, July 7, 2008, shortly after 08:00 EDT:
rssh04.rhic.bnl.gov (SSH gateway),
rssh03.rhic.bnl.gov (SSH gateway),
rsmb00.rhic.bnl.gov (Samba server),
www4.rcf.bnl.gov (RHIC user web server)
Tuesday, July 8, 2008, shortly after 08:00 EDT:
rssh02.rhic.bnl.gov (SSH gateway),
rssh01.rhic.bnl.gov (SSH gateway),
rcf.rhic.bnl.gov (RHIC mail server),
webmail.rhic.bnl.gov (RHIC web-mail server)
Wednesday, July 9, 2008, shortly after 08:00 EDT:
All RHIC publicly accessible web sites,
www.phenix.bnl.gov (Phenix web server)
Thursday, July 10, 2008, shortly after 08:00 EDT:
rcf2.rhic.bnl.gov (RHIC interactive server),
rt.racf.bnl.gov (RT ticket system)
John M. (mccarthy@bnl.gov)
US Atlas: Moving Dcache pool nodes
7/2/2008
Wed Jul 2 11:53:14 EDT 2008
This item has been posted to usatlas-users-l@lists.bnl.gov, usatlas-ddm-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov, usatlas-prodsys-l@lists.bnl.gov
Summary:
12 Dcache pool nodes moving to a new network
switch
Duration:
12:00PM EDT to 12:10PM EDT
Group Responsible:
Dcache
Affected Area:
Dcache service
Expected User Impact:
None
Maintenance Type:
"Transparent"
Submitted By:
Shigeki Misawa misawa@bnl.gov
Description:
12 Dcache pool nodes will be moved to a new
network switch. Nodes will be moved one at
a time and each node will be offline for
about 10-30 seconds. No service glitches
or interruptions are expected.
US Atlas: Atlasnfs02 back online
7/1/2008
Tue Jul 1 13:15:36 EDT 2008
This item has been posted to usatlas-users-l@lists.bnl.gov, usatlas-prodsys-l@lists.bnl.gov
Summary:
Atlasnfs02 is back online.
Duration:
12:50PM EDT
Group Responsible:
Atlas NFS
Affected Area:
Selected Atlas NFS file systems.
Expected User Impact:
Access restored to selected Atlas NFS file systems
Maintenance Type:
Service Interruption/Maintenance
Submitted By:
Shigeki Misawa misawa@bnl.gov
Description:
Atlasnfs02 is back online after a system hang.
The affected file systems were:
/usatlas/groups/sm,
/usatlas/groups/susy,
/usatlas/groups/tracking,
/usatlas/groups/calo,
/usatlas/dial,
/usatlas/ada_sw,
/usatlas/scratch2,
/usatlas/groups/higgs,
/usatlas/groups/exotics,
/usatlas/workarea,
/usatlas/OSG
We believe that the problem was caused by a
bug in the OS triggered by NFSv3 readdirplus
calls. The system has been patched with the latest
AIX Service Pack, which fixes this problem.
We also took the opportunity to increase system
memory from 4GB to 20GB to allow more space for
the file system cache.
Rebooting rafs11 hoping to fix volume problem
7/1/2008
Tue Jul 1 12:55:51 EDT 2008
This item has been posted to rhic-rcf-l@lists.bnl.gov
Summary:
Rebooting rafs11 hoping to fix volume problem
Duration:
Group Responsible:
GCE
Affected Area:
AFS services
Submitted By:
Morris Strongson
Description:
There is a problem with star.cvs.
US Atlas: Atlasnfs02 offline
7/1/2008
Tue Jul 1 10:36:31 EDT 2008
This item has been posted to usatlas-users-l@lists.bnl.gov
Summary:
Atlasnfs02 is offline
Duration:
10:34am
Group Responsible:
Atlas NFS
Affected Area:
Selected Atlas NFS directories
Expected User Impact:
Loss of access to /usatlas/groups and other NFS
directories. (Note that Atlas home directories
are on a separate server.)
Maintenance Type:
"Service Interruption"
Submitted By:
Shigeki Misawa misawa@bnl.gov
Description:
Loss of NFS service from atlasnfs02. We are
investigating. There is no estimated time to
resolution.
Update of PHENIX dCache SRM server COMPLETE
7/1/2008
Tue Jul 1 10:34:31 EDT 2008
This item has been posted to rhic-rcf-l@lists.bnl.gov
The PHENIX dCache SRM has been updated and restarted. Please report any problems via the RT StorageManagement queue.
Thanks,
Ofer
Update of PHENIX dCache SRM server
6/30/2008
Mon Jun 30 16:35:26 EDT 2008
This item has been posted to rhic-rcf-l@lists.bnl.gov
Summary:
phnxsrm will be rebooted to apply OS security updates and bug fixes
Duration:
7/1 10am-11am EDT
Group Responsible:
Storage Management
Affected Area:
PHENIX dCache SRM services only
Expected User Impact:
Connections to PHENIX dCache SRM will be temporarily unavailable.
Maintenance Type:
Service interruption
Submitted By:
Ofer Rind, rind@bnl.gov
US Atlas lxr web server update completed
6/30/2008
Mon Jun 30 14:08:50 EDT 2008
This item has been posted to usatlas-computing-l@lists.bnl.gov
To all,
The update of the US Atlas lxr web server,
alxr.usatlas.bnl.gov (reserve02.usatlas.bnl.gov)
has been completed successfully. The system is now
available.
John M.
US Atlas: Dcache maintenance cancelled
6/30/2008
Mon Jun 30 11:35:35 EDT 2008
This item has been posted to racf-wlcg-announce-l@lists.bnl.gov, usatlas-users-l@lists.bnl.gov, usatlas-ddm-l@lists.bnl.gov, usatlas-grid-l@lists.bnl.gov, usatlas-prodsys-l@lists.bnl.gov, atlas-project-adc-operations@cern.ch
Summary:
Cancellation of Dcache maintenance on July 1.
Group Responsible:
BNL Dcache
Affected Area:
Dcache
Expected User Impact:
None
Maintenance Type:
Submitted By:
Shigeki Misawa misawa@bnl.gov
Description:
The Dcache backup resynchronization and version
upgrade scheduled for July 1 have been cancelled
because of technical problems. Maintenance will
be rescheduled for a later date.
US Atlas lxr web server to be updated
6/30/2008
Mon Jun 30 11:23:19 EDT 2008
This item has been posted to usatlas-computing-l@lists.bnl.gov
To all,
The US Atlas lxr web server, alxr.usatlas.bnl.gov
(reserve02.usatlas.bnl.gov) will be updated with
the latest kernel and system software security
patches today, June 30, 2008, at 14:00 EDT.
Since the kernel is being updated, the system will
need to be rebooted. The system will be
unavailable for the time it takes to perform a
reboot.
John M. (mccarthy@bnl.gov)