Server Maintenance/Upgrade - Friday 29th May



Snowman
29-05-2008, 11:32 AM
The server will be taken offline for a RAM upgrade tomorrow night - Friday 30th May, between 7pm and 8pm AEST.

An extra 2GB of RAM will be added to the server, which will dramatically increase performance, particularly at peak times.

The maintenance should take no more than 10 minutes.

Michael
30-05-2008, 07:12 PM
We have been supplied with the incorrect RAM for the upgrade. We will take this up with the supplier and reschedule the upgrade for an evening next week.

I apologize for any inconvenience.

Snowman
03-06-2008, 05:07 PM
This work has been rescheduled for tomorrow morning - Wednesday 4th June, between 9am and 10am.

There should only be a few minutes' interruption to service.

Snowman
04-06-2008, 08:42 AM
Tanith is currently performing the RAM upgrade and the server will be back online in a few minutes.

Snowman
04-06-2008, 09:05 AM
The server is coming back online now.

Unfortunately the old RAM does not like the new RAM, despite both being the same DDR ECC registered type. I believe the clock speed may be slightly different, so we will get a further 2GB of the new RAM sent to the datacentre and will schedule a further upgrade, hopefully later this week.

Snowman
04-06-2008, 09:46 AM
Apologies for the delay. The server was brought back online and was responding to ping correctly, but it appears services did not restart correctly.

It's currently undergoing a second reboot to correct this and we are just waiting for an fsck to complete.

Michael
04-06-2008, 10:22 AM
There is an issue with the server failing to correctly read the currently installed RAM. We have a "hands on" tech at the DC reviewing this now. I will keep you updated.

I apologize for the inconvenience.

Michael
04-06-2008, 10:51 AM
We have found the cause of the load issue. One of the new sticks of RAM is faulty, so the server is running on 1GB of RAM instead of 2GB. We will replace the new RAM with the old RAM shortly and restore services ASAP.

Snowman
04-06-2008, 11:11 AM
We have taken the server offline again and Tanith has replaced the RAM, as it does appear one of the chips is faulty.

The server is currently running a manual fsck to check file system integrity, as this morning's issues may have caused file system errors.

Once this is complete the server will be back online and there should be no further issues.

I do apologise for this morning's events.

If any client would prefer to move to a new server, please open a ticket at the helpdesk and we can arrange this.

Snowman
04-06-2008, 12:03 PM
As you will know, we actually attempted the RAM upgrade the other night, but the wrong RAM was supplied.

We scheduled the upgrade for 9am. It should have taken only a few minutes to complete and caused little interruption; however, the server hardware didn't like the initial upgrade, and then one of the RAM sticks was found to be faulty.

Because of the frequent restarts that occurred this morning, file system issues arose, which has required us to run several fsck checks to correct the errors.

If we had done this late at night, when fewer clients might have been affected, we would not have been able to rely on datacentre staff to assist us with the issues that have arisen. Without their swift assistance so far, this outage might have lasted a lot longer than it will end up lasting.

The problem stems from the choice of hardware 2 years ago: we were using Dual Xeons on hardware requiring very specific registered ECC DDR RAM in matched pairs.

We have now seen that this RAM type, on what was then the top-of-the-line Xeon server configuration, is very problematic.

We are still working on the server, as it's currently looping back with further fsck issues that have to be worked through.

Snowman
04-06-2008, 12:41 PM
The server is now back online and responding correctly.

Tanith is arranging replacement RAM for the faulty stick. As soon as it arrives we will reschedule the upgrade and get this issue resolved once and for all, and have the server flying from then on.

Thank you to everyone for your patience.