Cray has a diverse set of computer systems set up in its computer center. Most of these systems are set up in similar configurations, partly because they all run the UNICOS or UNICOS/mk operating system. This document explains how to make the best use of some of these common features.
The system message of the day is printed each time you log in. It often contains important operational information, such as whether home directory files are being backed up or which compiler versions are installed as the default.
In addition to the message of the day, system administrators often post news items in the /usr/news directory, which you can read with the news command.
One on-line resource for Cray documentation direct from Cray is the Cray Software Publications Library. It contains some (but not all) of the manuals available in hardbound form. I've noticed that some of the manuals do not show up in the index, even though using the Search function for keywords of interest finds them. So you might want to try Search if you don't find the information you are seeking.
Another good resource is the on-line man pages. You'll need to log into one of the systems to view the man pages.
Some documentation is common between SGI and Cray, and can be found at the SGI Technical Publications web site. There's not too much Cray-specific stuff there, but information on MPI and SHMEM should be very similar on Cray and SGI systems. Many scientific library routines (e.g., FFT) are similar or identical.
The df command will give you actual file system sizes.
Here is an abbreviated sample df command output:
% df
/tmp      (/dev/dsk/tmp     ):  99846400 .5K blocks ( 97.6%)  1339703 I-nodes
/usr/tmp  (/dev/dsk/usr_tmp ): 241700064 .5K blocks ( 97.8%)  7728934 I-nodes
/ptmp     (/dev/dsk/ptmp    ):  98803008 .5K blocks ( 77.1%)   249028 I-nodes
/rain/u7  (/dev/dsk/u7      ):   3386688 .5K blocks ( 64.1%)    80362 I-nodes
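The block counts that df reports are in .5K (512-byte) units, which are awkward to read at a glance. As a rough sketch, the conversion to megabytes can be done with awk; blocks_to_mib is a hypothetical helper name, and a megabyte is assumed here to mean 2^20 bytes.

```shell
# Convert a count of 512-byte (.5K) blocks to megabytes (2^20 bytes),
# rounded to the nearest whole megabyte.
# blocks_to_mib is a hypothetical helper, not a system command.
blocks_to_mib() {
    awk -v blocks="$1" 'BEGIN { printf "%.0f\n", blocks * 512 / (1024 * 1024) }'
}

blocks_to_mib 3386688     # block count shown for /rain/u7 in the sample
blocks_to_mib 99846400    # block count shown for /tmp in the sample
```

For the sample output this gives roughly 1654 megabytes for /rain/u7 and roughly 48753 megabytes for /tmp.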
There are four file systems that you will be interested in (among the dozens that may show up in the df output):
Home directory: In the example above, /rain/u7. The seemingly mysterious last digit is always the same as the last digit of your user id (see the output of the id command).
/ptmp: A temporary working area.
/tmp: A scratch area for executing jobs and for other large files.
/usr/tmp: A scratch area for executing jobs.
File systems with tmp in the name are periodically scrubbed, meaning that files which have not been accessed recently are deleted. /tmp and /usr/tmp usually are scrubbed of files not accessed for over 24 hours, and /ptmp is usually scrubbed of files not accessed for 5 days.
/ptmp: This is where you should normally work if you are manipulating large files, or a large number of files. As the name implies, it is temporary space, and files not used for 5 days are scrubbed. A directory /ptmp/login_name is automatically created for you, and the environment variable $PTMPDIR is set to this directory.
/tmp: This is a large, fast file system for temporary files. $TMPDIR is automatically set to a unique directory in /tmp when you log in (and for every batch job that you submit). You can also create your own directories (e.g., /tmp/login_name) here. This file system is usually scrubbed with a one-day expiration, and often also when the system is rebooted for maintenance.
/usr/tmp: This is generally the fastest file system, used for scratch files for executing jobs. Don't store files here that you want to keep for long. It is scrubbed on a one-day basis and upon system reboot.
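Because scrubbing is based on when a file was last accessed, it can be useful to list the files that are getting close to the limit. The following is a minimal sketch using the standard find command; stale_files is a hypothetical helper name, and the actual scrubber may apply different criteria than this.

```shell
# List regular files under DIR whose last access is more than DAYS full days old.
# stale_files is a hypothetical helper, not a system command.
stale_files() {
    dir=$1
    days=$2
    # -atime +N matches files last accessed more than N 24-hour periods ago.
    find "$dir" -type f -atime +"$days" -print
}

# Example (assumes $PTMPDIR is set as described above):
#   stale_files "$PTMPDIR" 3
```

With a 5-day scrub on /ptmp, a threshold of 3 lists files that have gone at least 4 days without being accessed.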
You usually don't need to worry about /tmp and /usr/tmp unless you're running a program with large scratch files. Most often you will work out of your home directory (small files) or /ptmp/login_name (large files). On many (but not all) systems, your home directory is backed up to tape (in case of disk failure), but /ptmp is never backed up (you must do this yourself). It's smart to back up any valuable data in your home directory to iss, the Archive File Server.
Access to files from other systems (such as your home directory on another system) is available via NFS as /cray/host/path. For example, /cray/wind/u7/kjt is my home directory on wind. For transferring large files, ftp is usually the fastest, with rcp a close second, while access via NFS-mounted file systems is the slowest.
For disk quotas, use the quota command. If it says that quotas are not active, then there is no specific limit. Here is an abbreviated sample quota command output:
% quota
File system: /ptmp
User: kjt, Id: 2257
             File blocks (512 bytes)    Inodes
User Quota:   3200000 (  0.0%)           50000 (  0.0%)
Warning:      2880000 (  0.0%)           45000 (  0.0%)
Usage:              8                        1
File system: /rain/u7
User: kjt, Id: 2257
             File blocks (512 bytes)    Inodes
User Quota:     80000 ( 67.4%)            2000 ( 37.9%)
Warning:        72000 ( 74.9%)            1800 ( 42.1%)
Usage:          53920                      758
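The percentages in the quota output appear to be current usage divided by each limit. A small sketch that reproduces the /rain/u7 figures from the sample; usage_percent is a hypothetical helper name, and the interpretation of the percentage column is an assumption based on the numbers shown.

```shell
# Print usage as a percentage of a limit, to one decimal place.
# usage_percent is a hypothetical helper, not a system command.
usage_percent() {
    awk -v used="$1" -v limit="$2" 'BEGIN { printf "%.1f\n", used * 100 / limit }'
}

usage_percent 53920 80000   # file blocks vs. user quota
usage_percent 53920 72000   # file blocks vs. warning level
usage_percent 758 2000      # inodes vs. user quota
usage_percent 758 1800      # inodes vs. warning level
```

These four calls give 67.4, 74.9, 37.9, and 42.1, matching the sample output.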
For CPU quotas, use the udbsee command. By default, it prints out your UDB (user database) entry. You'll see cpuquota and cpuquotaused, which are monthly values in CPU seconds. If the quota values are not printed, then there is no quota (i.e., unlimited CPU time). Here is an abbreviated sample udbsee command output:
% udbsee
create :kjt: uid :2257:
cpuquotaused :711316.8:
jcpulim[b] :unlimited:
jmemlim[b] :unlimited:
pcpulim[b] :unlimited:
pmemlim[b] :unlimited:
jcpulim[i] :unlimited:
pcpulim[i] :1000:
pmemlim[i] :32000:
The [b] limits are limits that apply to batch jobs, while the [i] limits apply to interactive logins. Likewise, j limits apply to jobs (or login sessions) as a whole, but p limits apply to each process individually.
Typically, the batch limits are not set, but resource limits are enforced by the batch queueing system (NQE). In the example above, there is a 1000-second limit on CPU time for interactive processes (about 17 minutes), and the maximum process size is 32000 clicks, or 15.6 megawords. A click is an arcane and dense unit of measure equal to 512 words, or 4096 bytes.
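The unit conversions above can be checked with a few lines of awk. This is only a sketch: the helper names are hypothetical, and a megaword is assumed to be 2^20 words, which is what makes 32000 clicks come out to about 15.6 megawords.

```shell
# Unit conversions for the limits shown by udbsee.
# All helper names here are hypothetical.

# Clicks to megawords: 1 click = 512 words, 1 megaword = 2^20 words (assumed).
clicks_to_megawords() {
    awk -v c="$1" 'BEGIN { printf "%.3f\n", c * 512 / 1048576 }'
}

# Clicks to megabytes: 1 click = 4096 bytes, 1 megabyte = 2^20 bytes (assumed).
clicks_to_megabytes() {
    awk -v c="$1" 'BEGIN { printf "%.0f\n", c * 4096 / 1048576 }'
}

# CPU seconds to minutes.
seconds_to_minutes() {
    awk -v s="$1" 'BEGIN { printf "%.1f\n", s / 60 }'
}

clicks_to_megawords 32000   # pmemlim[i] from the sample
clicks_to_megabytes 32000
seconds_to_minutes 1000     # pcpulim[i] from the sample
```

For the sample limits this works out to 15.625 megawords, 125 megabytes, and 16.7 minutes.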
For archiving and backing up your own work, use the ARCHIVE directory on the iss system. This file system is automatically migrated to and from tape, so it can contain a large amount of data. Generally only put large files (e.g., a tar file of your /ptmp/login_name work directory) here. Note that the iss ARCHIVE directory is not backed up, so data could be lost if a disk or tape error occurs.
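One way to bundle a work directory into a single large file before archiving it is sketched below. make_archive is a hypothetical helper name, and the transfer step is shown only as a comment because the exact path of your ARCHIVE directory on iss is an assumption.

```shell
# Bundle the contents of a directory into one tar file.
# make_archive is a hypothetical helper, not a system command.
make_archive() {
    src=$1      # directory to archive, e.g. /ptmp/login_name
    out=$2      # tar file to create (should live outside of $src)
    # Run tar from inside the source directory so the archive holds relative paths.
    (cd "$src" && tar cf - .) > "$out"
}

# Example (paths are assumptions; adjust to your own directories):
#   make_archive "$PTMPDIR" "$TMPDIR/work.tar"
#   rcp "$TMPDIR/work.tar" iss:ARCHIVE/
```

Writing one tar file rather than copying many small files fits the advice to keep only large files in the ARCHIVE directory.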
To run a program which takes longer than a few minutes of CPU time or more than a few megawords of memory, you need to create a batch job consisting of a small shell script submitted via the qsub command. The file you submit will usually have some parameters at the top to specify the requested maximum memory and maximum run time. These parameters determine the queue in which your job is placed, and may affect the priority given to your job.
Here is an example job script.
#QSUB -lm 32mw
#QSUB -lt 1:00:00
#QSUB -eo
cd $QSUB_WORKDIR
./a.out
The -lm option sets the maximum memory used by a process, and -lt sets the maximum amount of CPU time used by a process (in this case 32 megawords and 1 hour, respectively). The -eo option directs standard error to the same file as standard output. $QSUB_WORKDIR is set by NQE to the directory from which you executed the qsub command. Your job script always executes under /bin/sh unless you add a line like #!/bin/csh to the top of the file.
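If you submit many similar jobs, the script can be generated rather than edited by hand. The following is a minimal sketch; make_job_script and the file name job.sh are hypothetical, and the qsub line is left as a comment because it only works on a system where NQE is installed.

```shell
# Write a job script like the example above to standard output.
# make_job_script is a hypothetical helper, not part of NQE.
make_job_script() {
    mem=$1      # per-process memory limit, e.g. 32mw
    cpu=$2      # per-process CPU time limit, e.g. 1:00:00
    prog=$3     # program to run, e.g. ./a.out
    # \$QSUB_WORKDIR is escaped so that it is expanded when the job runs,
    # not when the script is generated.
    cat <<EOF
#QSUB -lm $mem
#QSUB -lt $cpu
#QSUB -eo
cd \$QSUB_WORKDIR
$prog
EOF
}

# Example:
#   make_job_script 32mw 1:00:00 ./a.out > job.sh
#   qsub job.sh
```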
The modules system of organizing and selecting software is used extensively within Cray. It is particularly useful to allow access to old, current, and new software releases. A typical setup for csh looks like this.
source /opt/modules/modules/init/csh
module load PrgEnv nqe
This gives you access to the Programming Environment (compilers, linkers, and tools) and to NQE (the batch queueing system). The module command works by adding the bin directories for commands like f90, ld, and qsub to your $PATH environment variable.
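To illustrate the effect on $PATH, here is a rough sh sketch of prepending a directory only if it is not already present. This is not how the module command is implemented; path_prepend and the directory /opt/example/bin are hypothetical and serve only to show the idea.

```shell
# Prepend a directory to PATH unless it is already listed.
# path_prepend is a hypothetical helper that mimics only the visible
# effect of "module load"; it is not part of the modules system.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present: leave PATH unchanged
        *) PATH="$1:$PATH" ;;        # otherwise put it at the front
    esac
}

# Example (hypothetical directory):
#   path_prepend /opt/example/bin
```

Putting the new directory at the front means its commands are found first, which is what lets you switch between old, current, and new releases of the same tool.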
To build MPI programs, you will also want to load the module mpt.