How to Transfer Data

From DeepSense Docs

Revision as of 16:42, 24 October 2019

There are several methods for transferring data to and from the DeepSense platform. Which one you use depends on where you are transferring the data from, as well as on the size of the data.

To and From Your Personal Computer

Login Nodes

Because the two login nodes are the primary point of access to the platform, they may be in heavy use, and data transfers should not overload them unnecessarily. Please use the login nodes only for small amounts of data.

The most common method for transferring data securely between machines is scp, which is straightforward to use.

Example: scp -r /path/to/files/ username@login2.deepsense.ca:/data/projectdir/

You can also use rsync (see the man page). It has more options than scp and can synchronize files between two machines.

Example: rsync -rzvhP /path/to/files/ username@login2.deepsense.ca:/data/projectdir/

The rsync options above are:

  • r - recursively copy subdirectories
  • z - use compression when copying
  • v - verbose: list files copied
  • h - human readable: output numbers in human readable format
  • P - same as --partial --progress. Show progress while transferring, and keep partial files.

Note: do not use the -a, -o or -g options, because they preserve the file owner and group. This can cause problems with user quotas, which are based on the owner and group of files.
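
Because quotas depend on the group of each file, it can be useful to check a project directory after a transfer for entries that ended up with an unexpected group. The following is a minimal sketch, not an official DeepSense tool: the function name list_wrong_group is hypothetical, and the directory and group you pass in are placeholders for your own values.

```shell
# Hypothetical helper: list every file or directory under a path whose
# group differs from the expected one. Uses only POSIX find options.
list_wrong_group() {
    dir="$1"       # directory to scan, e.g. /data/projectdir
    group="$2"     # expected group name (or numeric group ID)
    find "$dir" ! -group "$group" -print
}
```

For example, list_wrong_group /data/projectdir mygroup prints nothing when every entry belongs to mygroup (a placeholder group name).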

Protocol Nodes

The protocol nodes (protocol1.deepsense.ca, protocol2.deepsense.ca) are intended specifically for large data transfers. However, they are accessible only via Samba (SMB).

Mac OSX

Connect via samba on OSX

On a Mac, open Finder and press ⌘-K, or use the menu Go -> Connect to Server. In the dialog box (see image), type the address of either protocol node and log in. This connects you to the /data filesystem.

To use rsync to transfer data via the protocol nodes, you first have to mount one of them. On a Mac, the easiest way is to connect to the protocol node as described in the previous paragraph, which mounts it at /Volumes/data/. You can then use rsync to copy files to your project's subdirectory.

Example: rsync -rzvh /path/to/files/ /Volumes/data/projectdir/
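
If the share is not mounted, the destination under /Volumes/data/ will not exist and the copy cannot reach the platform. A small wrapper can check for the destination first. This is only a sketch under assumptions: the function name sync_to_mount is hypothetical, and it assumes the project directory already exists on the mounted share.

```shell
# Hypothetical wrapper: refuse to run rsync unless the destination directory
# (for example /Volumes/data/projectdir/) exists, i.e. the share is mounted.
sync_to_mount() {
    src="$1"     # local source, e.g. /path/to/files/
    dest="$2"    # directory on the mounted share
    if [ ! -d "$dest" ]; then
        echo "Destination $dest not found; is the protocol node mounted?" >&2
        return 1
    fi
    # Same options as the example above; no -a, -o or -g.
    rsync -rzvh "$src" "$dest"
}
```

It would be called as sync_to_mount /path/to/files/ /Volumes/data/projectdir/ and returns a non-zero status without copying anything when the destination is missing.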

Windows

On a Windows computer, connect to //protocol1.deepsense.ca/data or //protocol2.deepsense.ca/data. You may also have to change an SMB security level setting as follows (this was necessary on Windows 10):

Control Panel > System and Security > Administrative Tools > Local Security Policy > Local Policies > Security Options > Network security: LAN Manager authentication level. In the field, choose "Send NTLMv2 response only", click Apply, then OK, and close all windows.


From the World Wide Web

The standard tool for downloading data from websites is wget; curl is also available. The two are compared in this StackExchange article.
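
As an illustration, the sketch below downloads a single file with whichever of the two tools is installed and resumes a partial download if one exists. The function name fetch is hypothetical, and any URL you pass is your own. The flags used are standard wget and curl options, but whether resuming works also depends on the remote server.

```shell
# Hypothetical helper: download one URL into the current directory,
# preferring wget and falling back to curl.
fetch() {
    url="$1"
    if command -v wget >/dev/null 2>&1; then
        # -c continues a partially downloaded file
        wget -c "$url"
    elif command -v curl >/dev/null 2>&1; then
        # -L follows redirects, -O keeps the remote file name, -C - resumes
        curl -L -O -C - "$url"
    else
        echo "Neither wget nor curl was found" >&2
        return 1
    fi
}
```

Run it from the directory where the file should be saved, for example a project directory under /data.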

Between DeepSense Filesystems

You may want to transfer data from your home directory to your data or scratch directories. Do not use the mv command for this. Instead, use the cp command and delete the files from the original filesystem afterwards. When files are copied to a new filesystem, new files are created with the proper group name. The mv command keeps the original group name, which can affect quota reporting.
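
The copy-then-delete pattern can be sketched as a small shell function. The name copy_then_remove is hypothetical and the paths are placeholders. The original is removed only if the copy succeeded.

```shell
# Hypothetical helper: copy a directory tree into an existing destination
# directory, then remove the original only if the copy succeeded.
copy_then_remove() {
    src="$1"     # e.g. a directory under your home directory
    dest="$2"    # e.g. an existing directory under /data or scratch
    # cp -r creates new files at the destination (no -p, so ownership
    # metadata is not carried over); rm runs only if cp exits successfully.
    cp -r "$src" "$dest" && rm -r "$src"
}
```

For example, copy_then_remove ~/results /data/projectdir would leave the copy at /data/projectdir/results; both paths are placeholders.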