Storage policies

From DeepSense Docs

Latest revision as of 17:53, 7 July 2020

Overview

DeepSense is a platform for AI/ML for oceans research data. All users will have a home directory, and access to various filesystems. Each filesystem has a default quota, with more space available upon request. While some of the filesystems are backed up, it is the responsibility of the user to maintain their original data on their own site.

DeepSense is not meant for long-term data storage. Data will only be stored as long as your project is ongoing. Once a project is completed, users are expected to remove their data in a timely fashion.

DeepSense is not intended to be used for data sharing. While each user in your project/group will have access to shared space, it won't be accessible by any other users. We do not host databases for sharing data, or for web access.

The filesystems provided are a resource shared by all users. It is expected that users make sensible use of the space, and follow the guidelines outlined here. It is possible the quotas and policies will change in the future, but we will strive to provide plenty of notice.

Filesystems

There are several different filespaces available to users. They are shared filesystems that can be accessed from any of the nodes. They each have separate purposes. See also Transferring Data.

Home directory

Each user has a home directory in /dshome/subdirectory/. The subdirectory (visitor, research, faculty, grad, etc.) will depend on the type of LDAP account you have. This is primarily designed for your personal use, and only you have permission to access it. It is ideal for managing scripts, source code and test data sets. It is not meant for large data storage.

Data

Each user/project will have access to a directory in the data filesystem.

  • If you are a member of a project, the directory will be /data/projectname (or groupname). Everyone in your project group will have access to this directory.
  • If you are an individual student, your directory is /data/username. Only you will have access to this directory.

The data filesystem will house the bulk of your data, and it has a larger default quota. This is also the primary location for transferring large amounts of data. It is accessible via samba.

Scratch

Each user/project will have access to a directory in the scratch filesystem.

  • If you are a member of a project, the directory will be /scratch/projectname (or groupname). Everyone in your project group will have access to this directory.
  • If you are an individual student, your directory is /scratch/username. Only you will have access to this directory.

The scratch filesystem is intended to support data used during job execution. It has a larger default quota, and can support a larger number of files. Note: this is temporary space, and is not backed up. Data which has not been accessed in 60 days may be purged, though we will contact you prior to doing so. Data needed for longer storage should be stored elsewhere.

Delete files that you no longer need as soon as you are done with them, rather than leaving large amounts of data sitting untouched.
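
To see which files might fall under the 60-day rule, a minimal sketch along the following lines may help. It walks a directory tree and lists files whose last access time is older than a cutoff. The function name find_stale_files and the example path are illustrative assumptions, not part of any DeepSense tooling. Access times also depend on how the filesystem is mounted, so treat the result as a rough guide only.

```python
import os
import time

SECONDS_PER_DAY = 86400


def find_stale_files(root, max_age_days=60, now=None):
    """Return sorted paths under root not accessed within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                atime = os.stat(path, follow_symlinks=False).st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < cutoff:
                stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    # Hypothetical project directory; substitute your own scratch path.
    for stale_path in find_stale_files("/scratch/projectname"):
        print(stale_path)
```

The script only prints candidates; deciding what to delete is left to you.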

Quotas and Policies

Filesystem | Location                | Default Quota                  | Grace Period | Backed up? | Purged? | Notes
Home       | /dshome/subdir/username | 1TB and 500K files per user    | 7 days       | yes        | no      | subdir may be visitor, research, grad, etc.
Data       | /data/projectname       | 2TB and 500K files per project | 7 days       | yes        | no      |
Scratch    | /scratch/projectname    | 2TB and 1M files per project   | 24 hours     | no         | yes     | Data not accessed in 60 days may be purged

The quotas listed are the default values. Users requiring additional space should first clean out any old data. If more space is still required, contact support@deepsense.ca and include a paragraph justifying your need.

There is a soft limit of 90% of the quota. If you go over this, you will be given the grace period listed above to clean out old data and get back below the soft limit. If you are still above the soft limit after the grace period, you will not be able to write further data. You will never be allowed to write more than the actual quota (the hard limit).

Note: If you go over the hard limit while a job is running, the job may not be able to write output, and may therefore crash. If it can't write output, it also cannot record the reason for the crash. Please be aware of how much data you have stored, especially if you are near your quota.
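
The soft and hard limit rules can be sketched as a small function. This is only an illustration of the policy as worded here, not the actual enforcement mechanism, which is handled by the filesystem. The name can_write and its parameters are hypothetical, and the sketch assumes the grace period starts when usage first exceeds the soft limit.

```python
from datetime import datetime, timedelta


def can_write(used, quota, over_soft_since, grace, now):
    """Return True if further writes would be allowed under the stated rules.

    used, quota: usage and quota in the same unit (bytes or file count).
    over_soft_since: when usage first exceeded the soft limit, or None.
    grace: grace period for the filesystem (e.g. 7 days or 24 hours).
    """
    if used >= quota:
        return False  # hard limit: the actual quota can never be exceeded
    if used * 10 <= quota * 9:
        return True  # at or below the soft limit (90% of the quota)
    if over_soft_since is None:
        return True  # soft limit just crossed; the grace period starts now
    return now - over_soft_since <= grace  # still inside the grace period?


now = datetime(2020, 7, 7)
print(can_write(95, 100, now - timedelta(days=3), timedelta(days=7), now))
print(can_write(95, 100, now - timedelta(days=8), timedelta(days=7), now))
```

The two calls show a project at 95% of its quota: three days into a 7-day grace period writes are still allowed, while eight days in they are not.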

Checking disk usage

You can check how much space you have used on the various filesystems using: /software/scripts/diskusage_report.py. This will show your user quotas, as well as the group quotas for any group you are in.

You will not be able to write more data if you are over the hard limit. If you do reach your quota, we will send you an e-mail with this information.
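
The report script above is the authoritative source for your usage. If you want a rough local estimate for a single directory, for example to compare against the file-count quota, a sketch like the following could be used. The function names tree_usage and percent_of_quota are illustrative assumptions, and the numbers it produces may differ from what the quota system reports.

```python
import os


def tree_usage(root):
    """Return (total_bytes, file_count) for files under root.

    Symbolic links are counted as themselves and are not followed.
    """
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total_bytes += os.lstat(path).st_size
            except OSError:
                continue  # file vanished or is unreadable; skip it
            file_count += 1
    return total_bytes, file_count


def percent_of_quota(used, quota):
    """Express usage as a percentage of the quota."""
    return 100.0 * used / quota


if __name__ == "__main__":
    # Hypothetical path and the default Data file-count quota (500K files).
    _bytes_used, files_used = tree_usage("/data/projectname")
    print(f"{percent_of_quota(files_used, 500_000):.1f}% of file quota used")
```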

Backup policy

As noted above, the home directory and data filesystems are backed up each evening (in addition to software and other system data). Our backup server keeps seven versions of each file. That is, if you change the contents of a file that has been backed up, the server will keep the old versions (up to 6 of them) and back up the new version. Once a user deletes a file, the newest version is kept on the backup server for 30 days, after which recovery will not be possible. Previous versions of a file deleted by the user will not be saved.

If you need to recover an older version of a file, or a deleted file, please contact support@deepsense.ca.
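
The versioning rules above can be modelled with a short sketch. This is a toy model of the policy as described, not a description of how the backup server is implemented. The class BackedUpFile and its methods are hypothetical names, and the model assumes a new version is recorded only when the contents have changed.

```python
from collections import deque
from datetime import date, timedelta

MAX_VERSIONS = 7  # the current version plus up to 6 older ones
DELETED_RETENTION = timedelta(days=30)


class BackedUpFile:
    """Toy model of one file's history on the backup server."""

    def __init__(self):
        # A bounded deque drops the oldest version automatically.
        self.versions = deque(maxlen=MAX_VERSIONS)
        self.deleted_on = None

    def backup(self, content):
        """Nightly backup: store a new version only if the content changed."""
        if not self.versions or self.versions[-1] != content:
            self.versions.append(content)

    def delete(self, today):
        """The user deleted the file: only the newest version is retained."""
        if self.versions:
            newest = self.versions[-1]
            self.versions.clear()
            self.versions.append(newest)
        self.deleted_on = today

    def recoverable(self, today):
        """Can at least one version still be restored on the given day?"""
        if not self.versions:
            return False
        if self.deleted_on is None:
            return True
        return today - self.deleted_on <= DELETED_RETENTION
```

In this model, a file changed ten times keeps only its seven most recent versions, and once deleted it can be restored from its newest version for 30 days.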


Data retention policy

When a project starts, it must have a completion date. Once a user leaves DeepSense, a project is completed, or the completion date passes, the user is expected to take their results and remove their data in a timely fashion. Data may be purged 30 days after the project is completed. This can be extended to three months, if required. To do so, contact support@deepsense.ca and include a paragraph justifying your need.

For users working on multiple projects, their home directory will remain until they are done with all projects.
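
As a worked example of the retention timeline, the following sketch computes the earliest date on which data may be purged. The function name earliest_purge_date is hypothetical, and treating "three months" as 90 days is an assumption made only for this illustration.

```python
from datetime import date, timedelta

DEFAULT_RETENTION = timedelta(days=30)
# Assumption: "three months" is approximated as 90 days for this sketch.
EXTENDED_RETENTION = timedelta(days=90)


def earliest_purge_date(completion, extended=False):
    """Return the first date on which project data may be purged."""
    retention = EXTENDED_RETENTION if extended else DEFAULT_RETENTION
    return completion + retention


print(earliest_purge_date(date(2020, 7, 7)))                 # 2020-08-06
print(earliest_purge_date(date(2020, 7, 7), extended=True))  # 2020-10-05
```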

Best Practices

  • Small amounts of data may be transferred via the login nodes, using scp or rsync
  • Large amounts of data must be transferred via the protocol nodes, using samba
  • Regularly remove temporary data from the scratch filesystem, as it is shared and holds vast amounts of temporary data
  • Data no longer used should be removed, as DeepSense is not intended for long-term storage
  • Check your space usage regularly to make sure you are below your quota
  • When your project/account is closed, please remove all data
  • To help us avoid restricting the storage policies, please use the space conscientiously