
Network speed - sanity check please



    I want to transfer 7 terabytes of data over a 1Gb ethernet network using TCP/IP.

    Am I right in thinking that theoretically:

    1 Gigabit / second = 1024*1024*1024 = 1,073,741,824 bits/second = 134,217,728 bytes per second

    Real network capacity is approx 80% of this, so a maximum of 107,374,182 bytes per second can be transferred. This "raw data" figure includes frame overhead, IPv4 overhead and TCP headers (more on this later).

    Ok so far?
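    Spelled out as a quick Python sanity check (this assumes the binary reading of "gigabit" above, and the guessed 80% efficiency figure):

```python
# Theoretical 1 Gb/s link capacity, binary-prefix convention (1 Gb = 2**30 bits),
# with an assumed 80% real-world efficiency.
LINK_BITS_PER_SEC = 2**30                        # 1,073,741,824 bits/second
bytes_per_sec = LINK_BITS_PER_SEC // 8           # raw bytes/second
usable_bytes_per_sec = int(bytes_per_sec * 0.8)  # assumed usable rate

print(bytes_per_sec)         # 134217728
print(usable_bytes_per_sec)  # 107374182
```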

    My 7TB of data is made up of 7 million files in 5 million directories. Assuming I plan to use "tar" to roll it all up and squirt over the network ...

    tar cf - mydir | rsh remotehost tar xf -

    Then 7TB in 7 million files makes an average file size of 1MB, which is 2048 data blocks (of 512 bytes each, the standard tar block size). So the amount of tar data is: 7 million files x 2048 blocks, plus 7 million file header blocks, plus 5 million directory blocks = 14,348,000,000 blocks; add 7 million extra blocks (say one extra block per file) = 14,355,000,000 blocks = 7.349 TB of data.

    OK so far?
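    The same block-counting arithmetic as a Python sketch (all figures are the assumptions above: 512-byte tar blocks, one header block per file, one block per directory, one slack block per file):

```python
# Estimate tar archive size from file/directory counts and tar's 512-byte blocks.
FILES = 7_000_000
DIRS = 5_000_000
DATA_BLOCKS_PER_FILE = 2048      # 1MB average file / 512-byte blocks

blocks = (FILES * DATA_BLOCKS_PER_FILE  # file payload blocks
          + FILES                       # one header block per file
          + DIRS                        # one block per directory
          + FILES)                      # say one extra (slack) block per file
tar_bytes = blocks * 512

print(blocks)     # 14355000000
print(tar_bytes)  # 7349760000000, i.e. ~7.349 TB
```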

    Googling around, it seems that about 95% of the traffic sent over TCP/IP is "payload", with the other 5% being frame overhead, IPv4 headers and TCP headers, so to send my 7.349TB of real data I'll actually need to send 7.736TB of raw data.

    7.736TB of raw data at 107,374,182 bytes/second will take 72,000 (approx) seconds = 20 hours

    Any networking bods care to comment and tell me if this looks about right?

    Thanks!

    #2
    Any other users on the network during the transfer?

    Has the network topology been considered?
    "Never argue with stupid people, they will drag you down to their level and beat you with experience". Mark Twain

    Comment


      #3
      Your numbers stack up except:

      7.736 terabytes == 8,505,821,952,475 bytes

      So the theoretical transfer time at 107,374,182 bytes/second is about 22 hours.
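      Running those numbers in Python (this takes "7.736 TB" as binary tebibytes, i.e. 7.736 * 2**40 bytes, which is the convention used here):

```python
# Transfer-time check: 7.736 TB (binary) at the ~80%-efficiency rate above.
raw_bytes = 7736 * 2**40 // 1000     # 7.736 * 2**40, in exact integer maths
rate = 107_374_182                   # bytes/second

seconds = raw_bytes / rate
print(raw_bytes)                 # 8505821952475
print(round(seconds))            # 79217
print(round(seconds / 3600, 1))  # 22.0 hours
```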

      I'm not a networking type, this is just running the numbers.
      Scooterscot is right: other networking factors may have an effect here. The other thing to consider is whether performing a tar on the files will introduce another overhead. I expect it will for millions of files, but I don't know how to quantify that.
      Moving to Montana soon, gonna be a dental floss tycoon

      Comment


        #4
        Can't really be calculated like that, as it totally depends on the topology, what kit is in place, what other traffic there is, etc., as has been said.

        You're realistically not going to get anywhere near the max speeds of a gigabit network as those are only theoretical.

        Best bet is to copy a subset of the data, time it and scale up accordingly.
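        The scale-up step could be as simple as this sketch (the sample size and timing below are made-up placeholders; only the arithmetic matters):

```python
# Extrapolate total transfer time from a timed subset copy.
sample_bytes = 50 * 10**9      # hypothetical: a 50 GB subset was copied...
sample_seconds = 520.0         # ...and took this long (made-up figure)
total_bytes = 7.736 * 10**12   # estimated raw bytes to send

observed_rate = sample_bytes / sample_seconds   # bytes/second actually achieved
estimated_hours = total_bytes / observed_rate / 3600

print(round(observed_rate))       # ~96,153,846 bytes/second
print(round(estimated_hours, 1))  # ~22.3 hours
```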

        Comment


          #5
          I have used this to move a database from one machine to another.

          2 things to watch out for:

          Lots of small files slooooows the whole thing down

          Tar will error on files larger than 4 GB and not transfer them. You need to trap the error(s) and move them manually - not forgetting to change the ownership and so on.
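          A pre-flight scan for oversized files could look something like this sketch (the 4 GB threshold follows the point above, and /data/mydir is just a placeholder path):

```python
import os

LIMIT = 4 * 2**30   # 4 GiB threshold, per the tar limit mentioned above

def oversized(root):
    """Yield (path, size) for regular files under root larger than LIMIT."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue    # file vanished or unreadable; skip it
            if size > LIMIT:
                yield path, size

# List anything tar would choke on, so it can be moved manually.
for path, size in oversized("/data/mydir"):
    print(path, size)
```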
          "All around me I see chaos and confusion, my work here is done...."

          Comment


            #6
            You didn't say whether the files were compressed or not.

            I'd suggest compressing the output of tar as in:

            tar cf - mydir | gzip | rsh remotehost 'gunzip | tar xf -'

            or something similar.

            --Jatinder

            Comment


              #7
              Thanks everyone for the comments.

              I'm told that the network link between the two machines will be dedicated, with no other traffic. As for other topology issues, if you mean whether the traffic will be routed, switched, or whatever, then I just don't know.

              I take the point about compression, but I'm wondering if the CPU overhead of that might then become the bottleneck instead of the network. This would need to be tested beforehand to see if the compression will speed the process up or slow it down.

              I'm assured none of the files is > 4GB; nevertheless, the point stands that if the process halts or crashes unexpectedly, then recovery will be a manual hack rather than an automatic pick-up where we left off. This aspect is the part that disturbs me the most.

              You're right that a test on a subset is the only way to get a real metric, but the client has asked for a theoretical "ballpark" number to qualify the situation. If the answer had been "1 week" then the methodology would have been fatally flawed for this situation. However, an answer of "1 day" means it makes sense to explore this further.

              Finally, thanks for correcting the maths error!

              Cheers guys - very useful feedback

              Comment


                #8
                Just get one of these,

                http://www-03.ibm.com/systems/storag...130/index.html

                and milan...

                Much easier than fiddling with all that networky stuff
                ‎"See, you think I give a tulip. Wrong. In fact, while you talk, I'm thinking; How can I give less of a tulip? That's why I look interested."

                Comment


                  #9
                  Originally posted by Moscow Mule View Post
                  and milan...
                  don't confuse my quality posts with the other fella's nonsense

                  P.S. even with the tape drive, would take approx 10 hours

                  Comment


                    #10
                    iperf - real network speed
                    rsync - makes 7GB look like 100MB of changes.
                    Always forgive your enemies; nothing annoys them so much.

                    Comment
