I want to transfer 7 terabytes of data over a 1Gb ethernet network using TCP/IP.
Am I right in thinking that theoretically:
1 Gigabit / second = 1024*1024*1024 = 1,073,741,824 bits/second = 134,217,728 bytes per second
Real network capacity is approx 80% of this, so a maximum of 107,374,182 bytes per second can be transferred. This "raw data" figure includes frame overhead, IPv4 overhead and TCP headers (more on this later).
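For anyone checking my sums, here's the arithmetic as a quick shell sanity check (the 80% figure is just my rule-of-thumb assumption from above):

echo $(( 1024 * 1024 * 1024 / 8 ))             # 134217728 bytes/second theoretical
echo $(( 1024 * 1024 * 1024 / 8 * 80 / 100 ))  # 107374182 bytes/second at 80%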
Ok so far?
My 7TB of data is made up of 7 million files in 5 million directories. I plan to use "tar" to roll it all up and squirt it over the network ...
tar cf - mydir | rsh remotehost tar xf -
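(For the record, the receiving tar wants to run in the destination directory, so in practice it would look more like

tar cf - mydir | rsh remotehost 'cd /some/dest && tar xf -'

where /some/dest stands in for whatever the real target directory turns out to be.)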
Then 7TB in 7 million files makes an average file size of 1MB, which is 2048 data blocks (of 512 bytes each - the standard tar block size). So the amount of tar data is:

7 million files x 2048 data blocks = 14,336,000,000 blocks
plus 7 million file header blocks and 5 million directory blocks = 14,348,000,000 blocks
plus 7 million padding blocks (say one extra block per file, since tar rounds each file's data up to a whole block) = 14,355,000,000 blocks

14,355,000,000 blocks x 512 bytes = 7.349 TB of data
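Same again in shell, using my assumed averages (1MB per file, one padding block per file):

echo $(( 7000000 * 2048 + 7000000 + 5000000 + 7000000 ))  # 14355000000 blocks
echo $(( 14355000000 * 512 ))                             # 7349760000000 bytes = ~7.349 TB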
OK so far?
Googling around, it seems that about 95% of the traffic sent over TCP/IP is "payload", with the other 5% being frame overhead, IPv4 headers and TCP headers, so to send my 7.349TB of real data I'll actually need to send 7.736TB of raw data.
7.736TB of raw data at 107,374,182 bytes/second will take 72,000 (approx) seconds = 20 hours
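And the final step checked with bc, since bash arithmetic is integer-only (the 95% payload figure is the googled estimate above, not something I've measured):

echo "7349760000000 / 0.95" | bc        # ~7736589473684 bytes of raw traffic
echo "7736589473684 / 107374182" | bc   # ~72052 seconds
echo "72052 / 3600" | bc                # ~20 hours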
Any networking bods care to comment and tell me if this looks about right?
Thanks!