Difference between revisions of "Storage policies"

From DeepSense Docs
(Created page with "== Overview == DeepSense is primarily a platform for big data analytics of oceans research data. However, it is not meant for long term data storage. Data will only be stor...")
 
m (Tightened up the language)
Line 13: Line 13:
 
=== Home directory ===
 
=== Home directory ===
  
Each user has a home directory in /dshome/.  Depending on your working group, it may be in a subdirectory (visitor, research, faculty, grad, etc.).  This is primarily designed for your personal use, and only you have permission to access it.  It is ideal for managing scripts, source code and test data sets.  
+
Each user has a home directory in /dshome/''subdir''/.  The subdirectory (visitor, research, faculty, grad, etc.) will depend on the type of LDAP account you have.  This is primarily designed for your personal use, and only you have permission to access it.  It is ideal for managing scripts, source code and test data sets. It is not meant for large data storage.
  
 
=== Data ===
 
=== Data ===
  
The data directory will house the bulk of your data.  It has a larger default quota.  The data should be stored under a group directory which everyone in your group will have access to.  This is the primary location for transferring large amounts of data.  It is accessible via [[Getting_started#4._Transfer_data|samba]].
+
Each project will have access to a directory in /data/''projectname''.  This will house the bulk of your data, and it has a larger default quota.  Everyone in your project group will have access to this directory.  This is also the primary location for transferring large amounts of data.  It is accessible via [[Getting_started#4._Transfer_data|samba]].
  
 
=== DB data ===
 
=== DB data ===
  
Data that resides in a database of some sort will be stored here.
+
Data that resides in a database will be stored here.
  
 
=== Scratch ===
 
=== Scratch ===
  
Each user will have a directory in /scratch/''username''.  The scratch filesystem is intended to support data used during job execution.  This is temporary space, and is not backed up.  Data which has not been accessed in 60 days may be purged.  Data needed for longer storage should be stored elsewhere.   
+
Each user will have access to a directory in /scratch/''projectname''.  The scratch filesystem is intended to support data used during job execution.  This is temporary space, and is not backed up.  Data which has not been accessed in 60 days may be purged, though we will contact you prior to doing so.  Data needed for longer storage should be stored elsewhere.   
  
 
Delete files that you no longer need as soon as you are done with them, rather than leaving large amounts of data sitting untouched.   
 
Delete files that you no longer need as soon as you are done with them, rather than leaving large amounts of data sitting untouched.   
Line 33: Line 33:
 
{| class="wikitable"
 
{| class="wikitable"
 
|-
 
|-
! Name
 
 
! Filesystem
 
! Filesystem
 +
! Location
 
! Default Quota
 
! Default Quota
 
! Grace Period
 
! Grace Period
Line 58: Line 58:
 
|-
 
|-
 
| Scratch
 
| Scratch
| /scratch/''username''
+
| /scratch/''projectname''
 
| 2Tb and 1M files per user
 
| 2Tb and 1M files per user
 
| 7 days
 
| 7 days
Line 74: Line 74:
 
|}
 
|}
  
The quotas are the default values.  Users requiring additional space should first clean out any old data, and if space is still required, contact [mailto:support@deepsense.ca support].  Please include a paragraph justifying your need.
+
The quotas are the default values.  Users requiring additional space should first clean out any old data, and if space is still required, contact [mailto:support@deepsense.ca support@deepsense.ca].  Please include a paragraph justifying your need.
  
 
There is a soft limit of 90% of the quota.  If you go over this, you will be given the grace period above to clean out old data and try to get below the soft limit.  If after the grace period, you are still above the soft limit, you will not be able to write further data.  You will never be allowed to write more than the actual quota.
 
There is a soft limit of 90% of the quota.  If you go over this, you will be given the grace period above to clean out old data and try to get below the soft limit.  If after the grace period, you are still above the soft limit, you will not be able to write further data.  You will never be allowed to write more than the actual quota.
  
=== Checking space usage ===
+
=== Checking disk usage ===
  
 
You can check how much space you have used on the various filesystems using: <code>/software/scripts/diskusage_report.py</code>.   
 
You can check how much space you have used on the various filesystems using: <code>/software/scripts/diskusage_report.py</code>.   
Line 88: Line 88:
 
=== Backup policy ===
 
=== Backup policy ===
  
As noted above, the home directory, data and db-data filesystems are backed up each evening.  The backup server keeps seven versions of each file.  Once a user deletes a file, the newest version is kept for 30 days, after which recovery will not be possible.
+
As noted above, the home directory, data and db-data filesystems are backed up each evening.  The backup server keeps seven versions of each file.  Once a user deletes a file, the newest version is kept on the backup server for 30 days, after which recovery will not be possible.
  
If you need to recover an older version of a file, or a deleted file, please contact [mailto:support@deepsense.ca support].
+
If you need to recover an older version of a file, or a deleted file, please contact [mailto:support@deepsense.ca support@deepsense.ca].
  
  

Revision as of 19:03, 7 March 2019

Overview

DeepSense is primarily a platform for big data analytics of oceans research data. However, it is not meant for long-term data storage. Data will only be stored for as long as your project is ongoing. Once a project is completed, users are expected to remove their data in a timely fashion.

Each filesystem has a default quota, with more space available upon request. While some of the filesystems are backed up, it is the responsibility of the user to maintain their original data on their own site.

The filesystems provided are a shared resource. It is expected that users make sensible use of the space, and follow the guidelines outlined here. It is possible the quotas and policies will change in the future, but we will strive to provide plenty of notice.

Filesystems

There are several network filespaces that can be accessed from any of the nodes. They each have separate purposes.

Home directory

Each user has a home directory in /dshome/subdir/. The subdirectory (visitor, research, faculty, grad, etc.) will depend on the type of LDAP account you have. This is primarily designed for your personal use, and only you have permission to access it. It is ideal for managing scripts, source code and test data sets. It is not meant for large data storage.

Data

Each project will have access to a directory in /data/projectname. This will house the bulk of your data, and it has a larger default quota. Everyone in your project group will have access to this directory. This is also the primary location for transferring large amounts of data. It is accessible via samba.

DB data

Data that resides in a database will be stored here.

Scratch

Each user will have access to a directory in /scratch/projectname. The scratch filesystem is intended to support data used during job execution. This is temporary space, and is not backed up. Data which has not been accessed in 60 days may be purged, though we will contact you prior to doing so. Data needed for longer storage should be stored elsewhere.

Delete files that you no longer need as soon as you are done with them, rather than leaving large amounts of data sitting untouched.
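
As a rough aid for cleaning up scratch space, the following sketch lists files whose last access time is older than the 60-day purge threshold. It is not a DeepSense tool: the function name and its parameters are illustrative, and it assumes the filesystem records access times (some mount options update them only approximately).

```python
import os
import time

PURGE_AGE_DAYS = 60  # purge threshold from the scratch policy
SECONDS_PER_DAY = 86400


def find_stale_files(root, max_age_days=PURGE_AGE_DAYS, now=None):
    """Return (path, days_since_access) pairs for files under root
    that have not been accessed within max_age_days, oldest first."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks and does not alter atime
                atime = os.lstat(path).st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < cutoff:
                days = int((now - atime) // SECONDS_PER_DAY)
                stale.append((path, days))
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

For example, `find_stale_files("/scratch/projectname")` (with your own project name substituted) returns the candidates to review before deleting anything.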

Quotas and Policies

{| class="wikitable"
|-
! Filesystem
! Location
! Default Quota
! Grace Period
! Backed up?
! Purged?
! Notes
|-
| Home
| /dshome/subdir/username
| 1 TB and 500K files per user
| 7 days
| yes
| no
| subdir may be visitor, research, grad, etc.
|-
| Data
| /data/projectname
| 2 TB and 500K files per project
| 7 days
| yes
| no
|
|-
| Scratch
| /scratch/projectname
| 2 TB and 1M files per user
| 7 days
| no
| yes
| Data not accessed in 60 days may be purged
|-
| DB data
| /db-data
| ??
| ??
| yes
| no
| Houses databases
|}

The quotas are the default values. Users requiring additional space should first clean out any old data, and if space is still required, contact support@deepsense.ca. Please include a paragraph justifying your need.

There is a soft limit of 90% of the quota. If you go over this, you will be given the grace period above to clean out old data and get back below the soft limit. If you are still above the soft limit after the grace period, you will not be able to write further data. You will never be allowed to write more than the actual quota.
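
The soft-limit rule can be sketched as a small function. This is only an illustration of the policy as described here, not how the filesystem enforces it: the function name, the status strings, and the treatment of the exact boundary cases (usage exactly at the soft limit, or exactly on the last grace day) are assumptions.

```python
SOFT_LIMIT_PERCENT = 90  # soft limit as a percentage of the quota
DEFAULT_GRACE_DAYS = 7   # grace period listed in the quota table


def quota_status(used, quota, days_over_soft=0, grace_days=DEFAULT_GRACE_DAYS):
    """Classify usage against a quota.

    used and quota are integers in the same unit (bytes or file count);
    days_over_soft is how long usage has been above the soft limit.
    """
    soft_limit = quota * SOFT_LIMIT_PERCENT // 100
    if used >= quota:
        # Writes beyond the actual quota are never allowed.
        return "blocked: hard limit reached"
    if used <= soft_limit:
        return "ok"
    if days_over_soft > grace_days:
        # Still above the soft limit once the grace period has passed.
        return "blocked: grace period expired"
    return "warning: over soft limit, grace period running"
```

With a quota of 1000 units, usage of 950 units is allowed for the first 7 days and blocked afterwards, unless usage drops back to 900 units or less.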

Checking disk usage

You can check how much space you have used on the various filesystems using: /software/scripts/diskusage_report.py. This will show your user quotas. If you are a part of a group, you will have a shared group data folder. To check your group quota, use: /software/scripts/diskusage_report.py -g groupname.

You will not be able to write more data if you are over the hard limit.
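
If you want to call the report from your own tooling, a thin wrapper such as the one below may help. The script path and the `-g` option are taken from the text above; the wrapper functions themselves are hypothetical, and the report's output format is not assumed, so the text is returned unparsed.

```python
import subprocess

REPORT_SCRIPT = "/software/scripts/diskusage_report.py"


def report_command(group=None):
    """Build the command line for the usage report.

    Without a group, the report covers your user quotas; with a group,
    it adds the -g option to report the group quota instead.
    """
    command = [REPORT_SCRIPT]
    if group is not None:
        command += ["-g", group]
    return command


def run_report(group=None):
    """Run the report and return its raw text output.

    Raises subprocess.CalledProcessError if the script exits with an error.
    """
    result = subprocess.run(
        report_command(group), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `print(run_report())` on a DeepSense node would show your user quotas, and `print(run_report("groupname"))` the quota for that group.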

Backup policy

As noted above, the home directory, data and db-data filesystems are backed up each evening. The backup server keeps seven versions of each file. Once a user deletes a file, the newest version is kept on the backup server for 30 days, after which recovery will not be possible.

If you need to recover an older version of a file, or a deleted file, please contact support@deepsense.ca.
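
To make the retention numbers concrete, here is a minimal sketch of the arithmetic. It assumes that the 30 days are counted in calendar days from the deletion date and that every backup date passed in produced a new version of the file; the backup server's actual behaviour may differ in these details.

```python
from collections import deque
from datetime import date, timedelta

VERSIONS_KEPT = 7            # versions of each file kept by the backup server
DELETED_RETENTION_DAYS = 30  # how long the newest version of a deleted file is kept


def recovery_deadline(deleted_on):
    """Last date on which a file deleted on deleted_on might still be recoverable."""
    return deleted_on + timedelta(days=DELETED_RETENTION_DAYS)


def retained_versions(backup_dates):
    """Given backup dates in chronological order, return those still retained."""
    return list(deque(backup_dates, maxlen=VERSIONS_KEPT))
```

Under these assumptions, a file deleted on 7 March 2019 would be recoverable until 6 April 2019, and a file modified every day would have only its last seven nightly versions available.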


Data retention

Once a user leaves DeepSense, or a project is completed, the user is expected to take their results and remove their data in a timely fashion. Data may be removed 30 days after the project is completed. For users working on multiple projects, their home directory will remain until they are done with all projects.

Best Practices

  • Small amounts of data may be transferred via the login nodes, using scp or rsync
  • Large amounts of data must be transferred via the protocol nodes, using samba
  • Regularly clean scratch files, as this will be used for vast amounts of temporary data
  • Data no longer used should be removed, as DeepSense is not intended for long-term storage
  • Check your space usage regularly to make sure you are below your quota
  • When your project or account is closed, please remove all data
  • To help us avoid restricting the storage policies, please use the space conscientiously