Storage policies
Overview
DeepSense is primarily a platform for big data analytics of ocean research data. It is not meant for long-term data storage. Data will only be stored while your project is ongoing. Once a project is completed, users are expected to remove their data in a timely fashion.
Each filesystem has a default quota, with more space available upon request. While some of the filesystems are backed up, it is the responsibility of the user to maintain their original data on their own site.
The filesystems provided are a shared resource. It is expected that users make sensible use of the space, and follow the guidelines outlined here. It is possible the quotas and policies will change in the future, but we will strive to provide plenty of notice.
Filesystems
There are several network filespaces that can be accessed from any of the nodes. They each have separate purposes.
Home directory
Each user has a home directory in /dshome/. Depending on your working group, it may be in a subdirectory (visitor, research, faculty, grad, etc.). This is primarily designed for your personal use, and only you have permission to access it. It is ideal for managing scripts, source code and test data sets.
Data
The data directory houses the bulk of your data and has a larger default quota than the home directory. Data should be stored under a group directory (/data/projectname) that everyone in your group can access. This is the primary location for transferring large amounts of data. It is accessible via Samba.
DB data
Data that resides in a database is stored here.
Scratch
Each user will have a directory in /scratch/username. The scratch filesystem is intended to support data used during job execution. This is temporary space, and is not backed up. Data which has not been accessed in 60 days may be purged. Data needed for longer storage should be stored elsewhere.
Delete files that you no longer need as soon as you are done with them, rather than leaving large amounts of data sitting untouched.
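To find purge candidates before the 60-day window runs out, a short script can list files by last access time. The following is a minimal sketch, not a DeepSense-provided tool: the function name `find_stale_files` is hypothetical, and it assumes the scratch filesystem records access times (atime), which some mount options disable.

```python
import os
import time

SECONDS_PER_DAY = 86400


def find_stale_files(root, max_age_days=60, now=None):
    """Return (path, whole_days_since_access) for files under root
    whose last access time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat does not follow symlinks, so a link is judged by itself
                atime = os.lstat(path).st_atime
            except OSError:
                continue  # file disappeared or is unreadable; skip it
            if atime < cutoff:
                stale.append((path, int((now - atime) // SECONDS_PER_DAY)))
    return stale
```

Calling `find_stale_files("/scratch/username")` with your own username returns the files that have gone untouched for more than 60 days, which you can then review and delete or move elsewhere.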
Quotas and Policies
Name | Filesystem | Default Quota | Grace Period | Backed up? | Purged? | Notes |
---|---|---|---|---|---|---|
Home | /dshome/subdir/username | 1 TB and 500K files per user | 7 days | yes | no | subdir may be visitor, research, grad, etc. |
Data | /data/projectname | 2 TB and 500K files per project | 7 days | yes | no | |
Scratch | /scratch/username | 2 TB and 1M files per user | 7 days | no | yes | Data not accessed in 60 days may be purged |
DB data | /db-data | ?? | ?? | yes | no | Houses databases |
The quotas are the default values. Users requiring additional space should first clean out any old data, and if space is still required, contact support. Please include a paragraph justifying your need.
There is a soft limit at 90% of the quota. If you exceed it, you have the grace period listed above to clean out old data and get back below the soft limit. If you are still above the soft limit after the grace period, you will not be able to write further data. You can never write more than the actual quota (the hard limit).
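The soft-limit rules can be summarised as a small decision function. This is an illustrative sketch only, not how the filesystem enforces quotas: the names `quota_status` and `QuotaStatus` are hypothetical, and the handling of exact boundary values (usage exactly at the soft limit, or exactly on the last grace day) is an assumption.

```python
from dataclasses import dataclass

SOFT_LIMIT_PERCENT = 90  # soft limit is 90% of the quota


@dataclass
class QuotaStatus:
    state: str       # "ok", "grace", or "blocked"
    can_write: bool


def quota_status(used_bytes, quota_bytes, days_over_soft=0, grace_days=7):
    """Classify usage against the soft limit, grace period, and hard limit."""
    soft_bytes = quota_bytes * SOFT_LIMIT_PERCENT // 100  # integer math avoids float rounding
    if used_bytes >= quota_bytes:
        return QuotaStatus("blocked", False)   # at the hard limit: no further writes
    if used_bytes <= soft_bytes:
        return QuotaStatus("ok", True)         # below the soft limit
    if days_over_soft <= grace_days:
        return QuotaStatus("grace", True)      # over the soft limit, still within grace
    return QuotaStatus("blocked", False)       # grace period expired while over the soft limit
```

For a 2 TB scratch quota (taken here as 2 * 10**12 bytes, which is an assumption about how the units are counted), the soft limit works out to 1.8 TB: usage of 1.9 TB is writable for the 7-day grace period and blocked afterwards.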
Checking space usage
You can check how much space you have used on the various filesystems using:
/software/scripts/diskusage_report.py
This will show your user quotas. If you are a part of a group, you will have a shared group data folder. To check your group quota, use:
/software/scripts/diskusage_report.py -g groupname
You will not be able to write more data if you are over the hard limit.
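Because the quotas limit the number of files as well as the total size, it can help to see which directory is responsible for your usage. The sketch below is a hypothetical helper, not part of diskusage_report.py: it adds up apparent file sizes and counts regular directory entries under one path. The filesystem's own accounting may differ (for example in how directories, block sizes, or snapshots are counted), so treat the numbers as an estimate.

```python
import os


def summarize_usage(root):
    """Return (total_bytes, file_count) for all files under root."""
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                # lstat reports the size of a symlink itself, not its target
                total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # file disappeared or is unreadable; skip it
            file_count += 1
    return total_bytes, file_count


def percent_of_quota(used, quota):
    """Express usage as a percentage of a quota (bytes or file count)."""
    return 100.0 * used / quota
```

For example, `percent_of_quota(file_count, 500_000)` compares a directory's file count with the 500K-file default for home directories.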
Backup policy
As noted above, the home directory, data and db-data filesystems are backed up each evening. The backup server keeps seven versions of each file. Once a user deletes a file, the newest version is kept for 30 days, after which recovery will not be possible.
If you need to recover an older version of a file, or a deleted file, please contact support.
Data retention
Once a user leaves DeepSense, or a project is completed, the user is expected to take their results and remove their data in a timely fashion. Data may be removed 30 days after the project is completed. For users working on multiple projects, their home directory will remain until they are done with all projects.
Best Practices
- Small amounts of data may be transferred via the login nodes, using scp or rsync
- Large amounts of data must be transferred via the protocol nodes, using Samba
- Regularly clean scratch files, as this will be used for vast amounts of temporary data
- Data no longer used should be removed, as DeepSense is not intended for long term storage
- Check your space usage regularly to make sure you are below your quota
- When your project or account is closed, please remove all data
- To help us avoid restricting the storage policies, please use the space conscientiously