Storage policies


Overview

DeepSense is an AI/ML platform for ocean research data. Every user has a home directory and access to several filesystems. Each filesystem has a default quota, with more space available upon request. Although some of the filesystems are backed up, users are responsible for keeping their original data at their own site.

DeepSense is not meant for long term data storage. Data will only be stored so long as your project is ongoing. Once a project is completed, it is expected that the users will remove their data in a timely fashion.

DeepSense is not intended to be used for data sharing. While each user in your project/group will have access to shared space, it won't be accessible by any other users. We do not host databases for sharing data, or for web access.

The filesystems provided are a resource shared by all users. It is expected that users make sensible use of the space, and follow the guidelines outlined here. It is possible the quotas and policies will change in the future, but we will strive to provide plenty of notice.

Filesystems

Several filespaces are available to users. All of them are shared filesystems that can be accessed from any of the nodes, and each serves a different purpose. See also Transferring Data.

Home directory

Each user has a home directory in /dshome/subdirectory/. The subdirectory (visitor, research, faculty, grad, etc.) will depend on the type of LDAP account you have. This is primarily designed for your personal use, and only you have permission to access it. It is ideal for managing scripts, source code and test data sets. It is not meant for large data storage.

Data

Each user/project will have access to a directory in the data filesystem.

  • If you are a member of a project, the directory will be /data/projectname (or groupname). Everyone in your project group will have access to this directory.
  • If you are an individual student, your directory is /data/username. Only you will have access to this directory.

The data filesystem will house the bulk of your data, and it has a larger default quota than the home directory. This is also the primary location for transferring large amounts of data. It is accessible via Samba.

Scratch

Each user/project will have access to a directory in the scratch filesystem.

  • If you are a member of a project, the directory will be /scratch/projectname (or groupname). Everyone in your project group will have access to this directory.
  • If you are an individual student, your directory is /scratch/username. Only you will have access to this directory.

The scratch filesystem is intended to support data used during job execution. Its default quota is larger than that of the home directory, and it allows a larger number of files than the other filesystems. Note: this is temporary space, and it is not backed up. Data that has not been accessed in 60 days may be purged, though we will contact you before doing so. Data that must be kept longer should be stored elsewhere.

Delete files that you no longer need as soon as you are done with them, rather than leaving large amounts of data sitting untouched.
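As a rough way to spot candidates for cleanup before the 60-day purge window is reached, the following Python sketch lists regular files whose last access time is older than a threshold. It is an illustration only: the function name find_stale_files is hypothetical, it assumes the filesystem records access times (st_atime), and the criteria used by the actual purge process may differ.

```python
import os
import stat
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def find_stale_files(root, max_age_days=60, now=None):
    """Return (path, size_bytes) for regular files under root whose
    last access time is more than max_age_days before `now`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.lstat()  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets, and other non-regular files
            if info.st_atime < cutoff:
                stale.append((path, info.st_size))
    return stale
```

For example, find_stale_files("/scratch/projectname") would return the files in a project's scratch directory that have not been read in 60 days, which you can then review and delete by hand.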

Quotas and Policies

| Filesystem | Location | Default quota | Grace period | Backed up? | Purged? | Notes |
|---|---|---|---|---|---|---|
| Home | /dshome/subdir/username | 1 TB and 500K files per user | 7 days | yes | no | subdir may be visitor, research, grad, etc. |
| Data | /data/projectname | 2 TB and 500K files per project | 7 days | yes | no | |
| Scratch | /scratch/projectname | 2 TB and 1M files per project | 24 hours | no | yes | Data not accessed in 60 days may be purged |

The quotas are the default values. Users requiring additional space should first clean out any old data, and if space is still required, contact (support@deepsense.ca). Please include a paragraph justifying your need.

There is a soft limit of 90% of the quota. If you go over this, you will be given the grace period above to clean out old data and get back below the soft limit. If you are still above the soft limit after the grace period, you will not be able to write further data. You will never be allowed to write more than the actual quota (the hard limit).
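The soft and hard limit rules can be summarized in a short Python sketch. This is only a model of the policy text, not the quota system itself: the function name quota_status and its return strings are hypothetical, and the handling of usage exactly at a limit is an assumption. The same check applies to either bytes or file counts.

```python
def quota_status(used, quota, grace_expired=False):
    """Classify usage against the hard quota and the 90% soft limit.

    `used` and `quota` must be in the same unit (bytes or number of files).
    """
    if quota <= 0:
        raise ValueError("quota must be positive")
    if used >= quota:
        # Assumption: reaching the hard limit blocks writes immediately.
        return "at hard limit: writes blocked"
    # Integer comparison equivalent to used > 0.90 * quota, avoiding float rounding.
    if used * 10 > quota * 9:
        if grace_expired:
            return "over soft limit, grace period expired: writes blocked"
        return "over soft limit: within grace period"
    return "ok"
```

With the default data quota of 500K files, for instance, the soft limit falls at 450K files, so quota_status(460_000, 500_000) reports that the grace period has started.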

Note: If you go over the hard limit while a job is running, the job may not be able to write output and may therefore crash. If it cannot write output, it also cannot record the reason for the crash. Please be aware of how much data you have stored, especially if you are near your quota.

Checking disk usage

You can check how much space you have used on the various filesystems by running /software/scripts/diskusage_report.py, which shows your user quotas. If you are part of a group, you will also have a shared group data folder. To check your group quota, run /software/scripts/diskusage_report.py -g groupname.
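Because quotas apply to both total size and number of files, it can also help to see which of your own directories contribute most. The Python sketch below tallies regular files under a directory. It is not a replacement for diskusage_report.py: the function name tally_usage is hypothetical, and it sums apparent file sizes, which may differ from what the quota system accounts for.

```python
import os
import stat


def tally_usage(root):
    """Return (file_count, total_bytes) for regular files under root."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode):
                file_count += 1
                total_bytes += info.st_size  # apparent size, not allocated blocks
    return file_count, total_bytes
```

Running tally_usage on individual subdirectories of your data or scratch space shows where many small files or a few very large ones are accumulating.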

You will not be able to write more data if you are over the hard limit. If you do reach your quota, we will send you an e-mail with this information.

Backup policy

As noted above, the home and data filesystems are backed up each evening (in addition to software and other things). Our backup server keeps seven versions of each file. That is, if you change the contents of a file that has been backed up, the server will keep the old versions (up to six of them) and back up the new version. Once a user deletes a file, the newest version is kept on the backup server for 30 days, after which recovery will not be possible. Previous versions of a file deleted by the user will not be saved.
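The versioning rules can be illustrated with a toy model in Python. This is a sketch of the policy as described here, not of the backup server: the class BackedUpFile and its methods are hypothetical, days are plain integers, and treating day 30 after deletion as still recoverable is an assumption.

```python
from collections import deque

MAX_VERSIONS = 7             # six old versions plus the newest
DELETED_RETENTION_DAYS = 30  # newest version kept this long after deletion


class BackedUpFile:
    """Toy model of the versions the backup server holds for one file."""

    def __init__(self):
        # Oldest version first; the deque drops the oldest beyond MAX_VERSIONS.
        self.versions = deque(maxlen=MAX_VERSIONS)
        self.deleted_on_day = None

    def backup(self, content):
        """Record an evening backup of changed content."""
        self.deleted_on_day = None
        self.versions.append(content)

    def delete(self, day):
        """The user deletes the file: only the newest version is kept."""
        if self.versions:
            newest = self.versions[-1]
            self.versions.clear()
            self.versions.append(newest)
        self.deleted_on_day = day

    def recoverable(self, day):
        """Return the versions that could still be requested on `day`."""
        if (self.deleted_on_day is not None
                and day - self.deleted_on_day > DELETED_RETENTION_DAYS):
            return []
        return list(self.versions)
```

In this model, a file changed nine times keeps only its seven most recent versions, and after deletion only the last one remains available until the 30-day window closes.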

If you need to recover an older version of a file, or a deleted file, please contact (support@deepsense.ca).


Data retention policy

When a project starts, it must have a completion date. Once a user leaves DeepSense, a project is completed, or the completion date passes, it is expected that the user will take their results and remove their data in a timely fashion. Data may be purged 30 days after the project is completed. This can be extended to three months, if required. To do so, contact (support@deepsense.ca). Please include a paragraph justifying your need.

For users working on multiple projects, their home directory will remain until they are done with all projects.
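To plan data removal around a project's completion date, the retention window can be worked out with a few lines of Python. This is a sketch under stated assumptions: the function name earliest_purge_date is hypothetical, "three months" is approximated as 90 days, and the extension is assumed to be counted from the completion date.

```python
from datetime import date, timedelta

DEFAULT_RETENTION_DAYS = 30
EXTENDED_RETENTION_DAYS = 90  # assumption: "three months" taken as 90 days


def earliest_purge_date(completion, extended=False):
    """Return the earliest date on which project data may be purged."""
    days = EXTENDED_RETENTION_DAYS if extended else DEFAULT_RETENTION_DAYS
    return completion + timedelta(days=days)
```

For a project completing on 1 January 2024, earliest_purge_date(date(2024, 1, 1)) gives 31 January 2024, and with extended=True it gives 31 March 2024.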

Best Practices

  • Small amounts of data may be transferred via the login nodes, using scp or rsync
  • Large amounts of data must be transferred via the protocol nodes, using samba
  • Regularly remove temporary data stored in the scratch filesystem, as this will be used for vast amounts of temporary data
  • Data no longer used should be removed, as DeepSense is not intended for long term storage
  • Check your space usage regularly to make sure you are below your quota
  • When your project or account is closed, please remove all data
  • To help us avoid restricting the storage policies, please use the space conscientiously