So, as a lesson for those of you who have never been around Mechanical and Electrical (M&E) plant:
Data centres are graded using tiers 1-4+ (4+ having guns on the door). Most data centres in the UK would be around tier 3. Amongst other things, tier 3 requires that the DC is provided with power over diverse routes to the rack. That starts at a power substation and runs all the way to the building, into separate sub rooms, then into Power Distribution Units (PDUs), then finally to the rack. In the rack there will be two power bars (plug sockets to you); these allow the machines placed in the rack to be fed from two separate substations.
One of the power bars will also be hooked up to a UPS battery room. The UPS is not there to actually run the data centre; it is there to smooth over the break in supply between the main power failing and the generators kicking in. The generator start is almost instant, but not quick enough to stop a server from crashing, so the UPS can be thought of as a giant capacitor. Once the generators kick in they will charge the batteries and keep the machines up. There will be separate generators for the chillers and plant, so everything is covered and can be run like this indefinitely.
So if Joe Contractor is going to be an arse and break something, he would need to power down the UPS (if that is even possible), lock the generators to OFF, then wander off to two different PDU racks and switch them off... the whole process starts to look a lot like taking the Death Star down.
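To put the "how many things have to go wrong" argument in concrete terms, here is a rough sketch of the dual-feed logic. Everything in it (the Feed class, its field names, rack_powered) is made up for illustration, and it only models the steady state: the UPS ride-through while the generators start is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Feed:
    """One power path to the rack (hypothetical, simplified model)."""
    mains_ok: bool = True      # the substation supply is live
    generator_ok: bool = True  # the generator is free to start (not locked OFF)
    pdu_on: bool = True        # the PDU for this feed is switched on

    def live(self) -> bool:
        # A feed delivers power if its PDU is on and it has either
        # mains or a generator behind it.
        return self.pdu_on and (self.mains_ok or self.generator_ok)


def rack_powered(feed_a: Feed, feed_b: Feed) -> bool:
    # Dual-corded machines stay up as long as at least one feed is live.
    return feed_a.live() or feed_b.live()


if __name__ == "__main__":
    # One PDU switched off by mistake: the rack stays up on the other feed.
    print(rack_powered(Feed(pdu_on=False), Feed()))
    # Both PDUs switched off: only now does the rack go dark.
    print(rack_powered(Feed(pdu_on=False), Feed(pdu_on=False)))
```

In this toy model no single fault takes the rack down; it takes a failure on both feeds at the same time.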
Now, given that a DC's job is to stay up, there is a very stringent work process that makes any contractor doing M&E work fill out a statement of work and an associated risk management statement that explains how they will protect the uptime while working.
The contractor cannot possibly think that they are taking both power feeds down at once, because the statement of work explains exactly what they will be doing and how.
So no one pulled out a plug! It's not possible. The DC would be fully liable for the work and they won't take that sort of risk...
Now that the myth of "someone unplugged me and I broke" is dealt with, let's talk about data replication...
Big distributed systems use something called data synchronisation to keep copies of the database in sync in many places. Databases have become so good at doing this that EVEN Microsoft SQL Server can be set up to create geographically load-balanced database clusters that can be run as live on every node and instantly merge the data between the nodes. The safest method is to have your local application server write to the far end first, so that it always knows the transaction made it out of the DC before it was committed to the local database. That write can commit in many places at once and is also written to a transaction log, in case the whole lot dies and the data needs to be rebuilt from the logs. The whole process is almost bulletproof. I am a total and complete sod and even I have been unable to make it break when asked to.
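For anyone who wants to see the "write to the far end first" idea spelled out, here is a minimal in-memory sketch. It is not how SQL Server or any real product does it; Node, replicated_write and rebuild_from_log are invented names, and a real system would also have to deal with network failures, retries and conflicts.

```python
class Node:
    """A hypothetical database node: a transaction log plus a key-value table."""

    def __init__(self, name):
        self.name = name
        self.log = []    # append-only list of (txn_id, key, value)
        self.table = {}  # current state of the data

    def apply(self, txn_id, key, value):
        # Record the transaction in the log before touching the table,
        # so the table can always be rebuilt from the log.
        self.log.append((txn_id, key, value))
        self.table[key] = value

    def rebuild_from_log(self):
        # Recovery: throw the table away and replay every logged write in order.
        self.table = {}
        for _txn_id, key, value in self.log:
            self.table[key] = value


def replicated_write(local, remote, txn_id, key, value):
    """Write to the far end first, then commit locally.

    If the remote write raises, the local node is never touched, so the
    local copy cannot hold a transaction that never left the building.
    """
    remote.apply(txn_id, key, value)
    local.apply(txn_id, key, value)


if __name__ == "__main__":
    local, remote = Node("dc-a"), Node("dc-b")
    replicated_write(local, remote, 1, "balance", 100)
    replicated_write(local, remote, 2, "balance", 80)

    local.table = {}          # simulate the local table being lost
    local.rebuild_from_log()  # replay the log to get it back
    print(local.table == remote.table)
```

Losing data in this model means losing both nodes and both logs at once, which is the situation the next paragraph describes.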
So for Willie's story to be true, whoever is running this would need to have plugged all the servers into only one PDU socket, set up both sides of the cluster in the same place, then deleted the transaction logs so that they couldn't auto-rectify. All of which points to a stupid bunch of muppets with no understanding of basic mechanics or cluster layout. So yes, my money is completely on TCS being in the frame for this...