Storage policies

Overview

DeepSense is primarily a platform for big data analytics of ocean research data; it is not meant for long-term data storage. Data will be stored only while your project is ongoing. Once a project is completed, users are expected to remove their data in a timely fashion.

Each filesystem has a default quota, with more space available upon request. While some of the filesystems are backed up, it is the responsibility of the user to maintain their original data on their own site.

The filesystems provided are a shared resource. It is expected that users make sensible use of the space, and follow the guidelines outlined here. It is possible the quotas and policies will change in the future, but we will strive to provide plenty of notice.

Filesystems

There are several network filespaces that can be accessed from any of the nodes. They each have separate purposes. See also Transferring Data.

Home directory

Each user has a home directory in /dshome/subdir/username. The subdirectory (visitor, research, faculty, grad, etc.) depends on the type of LDAP account you have. The home directory is designed primarily for your personal use, and only you have permission to access it. It is ideal for managing scripts, source code and test data sets. It is not meant for large data storage.

Data

Each project will have access to a directory in /data/projectname. This will house the bulk of your data, and it has a larger default quota. Everyone in your project group will have access to this directory. This is also the primary location for transferring large amounts of data. It is accessible via Samba.

DB data

Data that resides in a database is stored in /db-data.

Scratch

Each user will have access to a directory in /scratch/projectname. The scratch filesystem is intended to support data used during job execution. This is temporary space, and is not backed up. Data which has not been accessed in 60 days may be purged, though we will contact you prior to doing so. Data needed for longer storage should be stored elsewhere.

Delete files that you no longer need as soon as you are done with them, rather than leaving large amounts of data sitting untouched.
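To see which of your scratch files are candidates for purging, you could list files by last access time. The following is a minimal sketch, not an official DeepSense tool: the function name find_stale_files is hypothetical, and it assumes the filesystem records access times (st_atime) reliably. Walking a very large directory tree can be slow.

```python
import os
import time

SECONDS_PER_DAY = 86400


def find_stale_files(root, max_age_days=60, now=None):
    """Return sorted (path, days_since_access) pairs for files under root
    whose last access time is more than max_age_days in the past."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: report on symlinks themselves, do not follow them
                atime = os.lstat(path).st_atime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if atime < cutoff:
                days = int((now - atime) // SECONDS_PER_DAY)
                stale.append((path, days))
    return sorted(stale)
```

For example, calling find_stale_files("/scratch/projectname") returns the files in your scratch directory that have not been accessed in the last 60 days. Review the list before deleting anything.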

Quotas and Policies

Filesystem | Location | Default quota | Grace period | Backed up? | Purged? | Notes
Home | /dshome/subdir/username | 1 TB and 500K files per user | 7 days | yes | no | subdir may be visitor, research, grad, etc.
Data | /data/projectname | 2 TB and 500K files per project | 7 days | yes | no |
Scratch | /scratch/projectname | 2 TB and 1M files per user | 7 days | no | yes | Data not accessed in 60 days may be purged
DB data | /db-data | ?? | ?? | yes | no | Houses databases

The quotas shown are the default values. Users requiring additional space should first clean out any old data and, if more space is still required, contact support@deepsense.ca. Please include a paragraph justifying your need.

There is a soft limit of 90% of the quota. If you go over it, you have the grace period listed above to clean out old data and get back below the soft limit. If you are still above the soft limit after the grace period, you will not be able to write further data. You will never be allowed to write more than the actual quota.
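As a worked illustration of the soft and hard limits, the sketch below classifies a usage figure against a quota. The function names are hypothetical, and it assumes 1 TB means 10**12 bytes; the filesystem may count in binary units instead. It does not model the grace period.

```python
TB = 10**12  # assumption: decimal terabytes


def soft_limit(quota_bytes, soft_percent=90):
    """Soft limit in bytes: 90% of the quota by default."""
    return quota_bytes * soft_percent // 100


def quota_status(used_bytes, quota_bytes, soft_percent=90):
    """Classify usage as 'ok', 'over soft limit' or 'at hard limit'."""
    if used_bytes >= quota_bytes:
        return "at hard limit"      # no further writes are possible
    if used_bytes > soft_limit(quota_bytes, soft_percent):
        return "over soft limit"    # the grace period applies
    return "ok"
```

With the default 2 TB quota, soft_limit(2 * TB) is 1.8 TB, so a project using 1.9 TB would be over the soft limit but still able to write until the grace period ends.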

Checking disk usage

You can check how much space you have used on the various filesystems with:

    /software/scripts/diskusage_report.py

This shows your user quotas. If you are part of a group, you will have a shared group data folder. To check your group quota, use:

    /software/scripts/diskusage_report.py -g groupname

You will not be able to write more data if you are over the hard limit.
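If you want to check usage from your own scripts, for instance before starting a large job, you could call the report script from Python. This is a sketch only: the helper names are hypothetical, and because the report's output format is not described here, the text is returned unparsed. Only the script path and the -g option come from this page.

```python
import subprocess

DISKUSAGE_SCRIPT = "/software/scripts/diskusage_report.py"


def diskusage_command(group=None):
    """Build the argument list for the usage report.
    With no group, the report covers your user quotas."""
    cmd = [DISKUSAGE_SCRIPT]
    if group is not None:
        cmd += ["-g", group]  # report the shared group quota instead
    return cmd


def run_diskusage(group=None):
    """Run the report and return its raw text output.
    Raises CalledProcessError if the script exits with an error."""
    result = subprocess.run(
        diskusage_command(group),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, print(run_diskusage("groupname")) prints the group report in the same form you would see on the command line.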

Backup policy

As noted above, the home directory, data and db-data filesystems are backed up each evening. The backup server keeps seven versions of each file. Once a user deletes a file, the newest version is kept on the backup server for 30 days, after which recovery will not be possible.

If you need to recover an older version of a file, or a deleted file, please contact support@deepsense.ca.
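The 30-day window for deleted files can be worked out with simple date arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes the window is counted in calendar days from the day of deletion, which may not match exactly how the backup server counts.

```python
from datetime import date, timedelta

# From the backup policy: the newest version of a deleted file is kept 30 days
DELETED_FILE_RETENTION = timedelta(days=30)


def last_recoverable_day(deleted_on):
    """Return the assumed last day on which a deleted file can be restored."""
    return deleted_on + DELETED_FILE_RETENTION
```

For a file deleted on 1 March 2024, last_recoverable_day(date(2024, 3, 1)) gives 31 March 2024. Contact support well before that date rather than relying on the exact boundary.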


Data retention policy

When a project starts, it must have a completion date. Once a user leaves DeepSense, a project is completed, or the completion date passes, the user is expected to take their results and remove their data in a timely fashion. Data may be purged 30 days after the project is completed. This period can be extended to three months if required. To request an extension, contact support@deepsense.ca. Please include a paragraph justifying your need.

For users working on multiple projects, their home directory will remain until they are done with all projects.

Best Practices

  • Small amounts of data may be transferred via the login nodes, using scp or rsync
  • Large amounts of data must be transferred via the protocol nodes, using Samba
  • Regularly clean scratch files, as this will be used for vast amounts of temporary data
  • Data no longer used should be removed, as DeepSense is not intended for long-term storage
  • Check your space usage regularly to make sure you are below your quota
  • When your project or account is closed, please remove all data
  • To help us avoid restricting the storage policies, please use the space conscientiously