SRA Toolkit


SRAToolkit Encryption

The portion of the toolkit that handles encryption and decryption is a module/library called 'krypto'. This matters to users of the NCBI utilities only because the name shows up in various places. It is spelled with a 'k' to differentiate it from the common 'crypto' library name, particularly on Unix-like platforms using OpenSSL.

Configuration

Encrypted file formats

Currently the SRA Toolkit utilities use encryption in only two file formats, both defined by NCBI. Encryption is expected to be used in more ways in the future as the need arises.
To meet Federal Information Processing Standards (FIPS), the SRA Toolkit uses the Advanced Encryption Standard (AES) for the actual encryption of data.

Early Encryption Format - ncbi_enc

The first encryption format supported by the SRA Toolkit typically uses the extension ".ncbi_enc". It was used to encrypt files that were being sent to computers not on the NCBI network. It is now a deprecated format because a new requirement in file validation could not be supported. The SRA Toolkit fully supports reading and decrypting this format but no longer encrypts into this format.

Current Encryption Format - nenc

The current NCBI encryption format was designed for efficient processing without explicitly decrypting a file to the disk and to allow content validation without needing the password used to encrypt the file.

Password File

The toolkit starts from the position that passwords given on a command line are an inherent security risk, because command line parameters can be seen by other users of a computer through Unix commands like 'ps'. To address that weakness, the SRA Toolkit reads passwords from secure files or from the environment, with password files preferred.
The SRA Toolkit uses its configuration to locate the password file via an entry named 'krypto/pwfile'.
The contents of an encryption password file are normally some sort of text, but that is not a strict requirement. The password is up to 4096 bytes, which with UTF-8 or other multi-byte encodings might be fewer characters. The only bytes that cannot appear in the password are the ASCII control bytes Carriage Return (CR) and Line Feed (LF, also called Newline). The cryptographic library uses the first part of the file, up to the first CR or LF. If the file is longer than 4096 bytes and does not have a CR or LF before the 4097th byte, the library treats the file as invalid.
Bytes after the first CR or LF are reserved for future use, perhaps for previously used passwords.
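
The password file rule can be sketched in Python. This is only an illustration of the rule as described on this page, not the krypto library's own code. The names 'extract_password', 'read_password_file' and 'MAX_PASSWORD_BYTES' are hypothetical, and the treatment of a CR or LF that sits exactly at the 4097th byte follows a literal reading of the text above.

```python
MAX_PASSWORD_BYTES = 4096          # limit stated in the text above
TERMINATORS = (0x0D, 0x0A)         # CR and LF


def extract_password(data: bytes) -> bytes:
    """Return the password bytes found at the start of a password file.

    Assumption: the password is everything before the first CR or LF.
    A file longer than 4096 bytes with no CR or LF in its first 4096
    bytes is rejected, as the text describes.
    """
    head = data[:MAX_PASSWORD_BYTES]
    for index, byte in enumerate(head):
        if byte in TERMINATORS:
            # Bytes after the terminator are reserved and ignored here.
            return head[:index]
    if len(data) > MAX_PASSWORD_BYTES:
        raise ValueError("password file too long and has no CR or LF terminator")
    # Short file without a terminator: the whole file is the password.
    return head


def read_password_file(path: str) -> bytes:
    """Read just enough of the file to apply the rule above."""
    with open(path, "rb") as handle:
        return extract_password(handle.read(MAX_PASSWORD_BYTES + 1))
```

For example, a file containing "secret" followed by a newline and any other bytes yields the six-byte password "secret". The sketch works on bytes rather than text because the page says the contents need not be text.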

Encrypted command line arguments

Encryption specification in a URL

Some SRA Toolkit utilities use a Uniform Resource Identifier (URI) syntax to identify files. The first incarnation of this is the "ncbi-file" scheme, an extension of the standard "file" scheme for a URL. The extension is the addition of a query string. Two keys have been defined at this point: 'encrypt' (or 'enc') and 'pwfile'. The 'encrypt' key has no value, while the value of 'pwfile' is the path to the password file. The syntax for the hierarchical part of the URI is the same as that of the file scheme for the platform on which the Toolkit is running. As an example, "ncbi-file:/home/usr/me/read1.nenc?encrypt&pwfile=/home/usr/me/password" has the 'ncbi-file' scheme, a hierarchical part of '/home/usr/me/read1.nenc' and a query part of 'encrypt&pwfile=/home/usr/me/password'. This identifies a file named 'read1.nenc' in the common Unix home directory for user 'me', encrypted using a password stored in the file named 'password' in the same directory. If the program were being run on Windows, the URI could look more like "ncbi-file:C:\Users\me?encrypt&pwfile=C:\Users\me\password", with its different syntax for the hierarchical part.
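
The way such a URI breaks down into its parts can be sketched in Python. This is a minimal illustration under stated assumptions, not the Toolkit's own parser: the function name 'parse_ncbi_file_uri' is hypothetical, values are taken verbatim with no percent-decoding, and fragments are not handled, since this page does not say how the Toolkit treats either.

```python
from typing import Optional, Tuple


def parse_ncbi_file_uri(uri: str) -> Tuple[str, bool, Optional[str]]:
    """Split an "ncbi-file" URI into (path, encrypted, pwfile).

    Hypothetical helper: only the two query keys described above are
    recognised; any other key is ignored.
    """
    # The scheme ends at the first colon, so a Windows drive letter
    # such as "C:" stays inside the hierarchical part.
    scheme, colon, rest = uri.partition(":")
    if not colon or scheme.lower() != "ncbi-file":
        raise ValueError("not an ncbi-file URI: %r" % uri)

    # Everything before the first '?' is the hierarchical part (the path).
    path, _, query = rest.partition("?")

    encrypted = False
    pwfile = None
    for item in query.split("&"):
        key, _, value = item.partition("=")
        if key in ("encrypt", "enc"):
            encrypted = True        # the key carries no value
        elif key == "pwfile":
            pwfile = value          # path to the password file
    return path, encrypted, pwfile


path, encrypted, pwfile = parse_ncbi_file_uri(
    "ncbi-file:/home/usr/me/read1.nenc?encrypt&pwfile=/home/usr/me/password"
)
print(path)       # /home/usr/me/read1.nenc
print(encrypted)  # True
print(pwfile)     # /home/usr/me/password
```

The sketch splits the string by hand rather than using a general URL library so that native paths, including Windows paths with backslashes, pass through unchanged.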

Encryption Configuration

Password File - krypto/pwfile

Encryption (krypto) expects a configuration entry, 'krypto/pwfile', that identifies the location of the default password file. The path should be in the native file system notation.

Encryption Tools

nenctool

This tool will encrypt, decrypt or re-encrypt a single file.

nencvalid

This tool will verify the data integrity of an encrypted file. One would typically run this tool after downloading an encrypted archive or other large file, to ensure the download was complete and successful.

vdb-passwd

This tool uses the configuration to find the designated password file and changes its contents.