Type:
interface
Header:
align/iterator.h
Revision History:
2012-Feb-02 | initial
2012-Feb-28 | reflect API modifications to constructors, structures
2012-May-04 | documented negative placement starting coords
Contents:
The PlacementIterator is an interface that allows for walking a window of placements along the reference of a single run. On each iteration, one or more placements become available at a position until the placements are exhausted within the window.
The placement record is defined as an open structure because users are required to be able to extend it.
open structure of PlacementRecord - to be extended by user
struct PlacementRecord
{
    DLNode n;
    int64_t id;
    const ReferenceObj *ref;
    INSDC_coord_zero pos;
    INSDC_coord_len len;
    int32_t mapq;
};
n
the structure is designed for inclusion in a doubly-linked list
id
the row-id of the placement (alignment) within its alignment table
ref
object representing reference sequence
each record gets its own counted reference
pos
the starting position of the placement on the reference
coordinates are zero-based
NB - pos can be negative (see below)
len
the length of the placement on the reference
mapq
stated mapping quality of alignment
The idea of this structure is to provide an interface both for its consumer and its producer, to be handled by the iterator.
When the iterator is used only to walk placements, without looking into the actual alignments, this structure is unlikely to be extended: as it stands, it lets the user quickly determine spatial relationships without detail at each position.
However, when zooming in on base-per-base alignments, the mode of operation will shift toward creation of richly populated records that can be individually examined at the resolution of a single base position.
The inclusion of mapq here is for the purposes of denormalization, giving the earliest possible filtering.
There is a case when alignment placements may be given with a negative starting coordinate. This happens when an alignment has been found to wrap around a circular reference and terminate at a lower coordinate than where it starts. These alignments are linearized by subtracting the length of the circular reference from the starting coordinate. This keeps the start < end.
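The linearization described above can be sketched as a small helper. This is only an illustration: linearize_start and its parameters are hypothetical names, not part of align/iterator.h, and the sketch assumes that a wrapped alignment is recognized by its end coordinate being lower than its start.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, NOT part of align/iterator.h.
 *  "start"   - zero-based starting coordinate on the circular reference
 *  "end"     - coordinate where the alignment terminates; a value lower
 *              than "start" is taken to mean the alignment wrapped around
 *  "ref_len" - total length of the circular reference
 * Returns the linearized starting coordinate, which may be negative. */
int64_t linearize_start ( int64_t start, int64_t end, int64_t ref_len )
{
    assert ( ref_len > 0 );

    /* wrapped: shift the start back by one full turn of the reference */
    if ( end < start )
        return start - ref_len;

    /* not wrapped: coordinate is already linear */
    return start;
}
```

For example, on a circular reference of length 10000, a placement starting at 9990 and terminating at 40 would be reported with a starting coordinate of -10, so that start < end holds again.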
cast a placement record to one of two possible extension objects
allows up to three independent classes to be combined
douse a placement record
calls user code if provided
void PlacementRecordWhack ( const PlacementRecord *self );
If the user provided a whack function (automatically stored within the object), it will be called to clean up and dispose of the record.
Otherwise, the implementation will simply call free() to release memory.
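The dispatch described above (user whack function if one was provided, otherwise free()) might look roughly like the following. All names here are stand-ins for illustration: the real PlacementRecord stores its whack function internally in a way this document does not describe, and ignoring a NULL record is an assumption of the sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in record type, for illustration only. */
typedef struct SketchRecord SketchRecord;
struct SketchRecord
{
    void ( * whack ) ( void *obj );   /* user destructor, or NULL */
    int payload;
};

/* counts invocations of the user destructor so the dispatch can be observed */
static int user_whack_calls = 0;

/* example user destructor: it MUST deallocate the object itself */
void sketch_user_whack ( void *obj )
{
    ++ user_whack_calls;
    free ( obj );
}

/* allocate a record and remember the optional user destructor */
SketchRecord *sketch_make ( void ( * whack ) ( void *obj ) )
{
    SketchRecord *rec = malloc ( sizeof * rec );
    assert ( rec != NULL );
    rec -> whack = whack;
    rec -> payload = 0;
    return rec;
}

/* call user code if provided, otherwise fall back to free() */
void SketchRecordWhack ( const SketchRecord *self )
{
    if ( self == NULL )
        return;                       /* assumption: NULL is ignored */

    if ( self -> whack != NULL )
        ( * self -> whack ) ( ( void* ) self );
    else
        free ( ( void* ) self );
}
```

A record made with sketch_make ( sketch_user_whack ) is disposed of by the user function, while one made with sketch_make ( NULL ) is simply freed.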
ask the alignment manager to create an iterator from individual components
rc_t AlignMgrMakePlacementIterator ( const AlignMgr *self,
    PlacementIterator **iter, uint64_t ref_pos, uint32_t ref_len,
    int64_t starting_ref_row, uint32_t ref_row_count,
    const VCursor *ref, const VCursor *align, bool secondary,
    rc_t ( * CC populate ) ( PlacementRecord **rec, const VCursor *align,
        int64_t id, uint64_t pos, uint32_t len, void *data ),
    void *data, void ( * CC whack ) ( void *obj ) );
iter - OUT
return parameter for the iterator
ref_pos
starting position of alignment in reference coordinates
ref_len
length of projection onto reference in reference space
starting_ref_row
starting row within ref cursor
externally determined to include desired window
ref_row_count
the number of rows to read from ref cursor
ref
cursor onto REFERENCE table of run
will be modified as necessary to include required columns
will be opened by iterator
align
cursor onto either the PRIMARY_ALIGNMENT or the SECONDARY_ALIGNMENT table of run
which one is indicated by secondary param
will be modified as necessary to include required columns
will be opened by iterator
secondary
boolean true if align cursor is on SECONDARY_ALIGNMENT table
populate - NULL OKAY
optional callback function to generate richly populated PlacementRecord
data - OPAQUE
user data sent in callback to populate function
whack - NULL OKAY
optional destructor/deallocator function
may be ignored if populate is NULL
The user will translate the position and length of the window onto the reference into a range of row-ids within the REFERENCE table. This range should be sufficiently ample to discover placements that may begin BEFORE the window but still intersect with it.
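One way to picture this translation is the sketch below. It rests on assumptions that this document does not state: that every REFERENCE row covers the same number of bases (bases_per_row), that row-ids are contiguous starting at first_row, and that looking back a fixed number of rows (lookback_rows) is enough to catch placements that begin before the window. RowRange and window_to_rows are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: maps onto the starting_ref_row and
 * ref_row_count parameters of AlignMgrMakePlacementIterator. */
typedef struct RowRange
{
    int64_t start_row;
    uint32_t row_count;
} RowRange;

/* Translate a window ( ref_pos, ref_len ) into a range of row-ids,
 * widened backwards by "lookback_rows" rows. */
RowRange window_to_rows ( uint64_t ref_pos, uint32_t ref_len,
    uint32_t bases_per_row, int64_t first_row, uint32_t lookback_rows )
{
    RowRange r;
    uint64_t first_idx, last_idx;

    assert ( bases_per_row > 0 );
    assert ( ref_len > 0 );

    /* zero-based indices of the rows holding the first and last base */
    first_idx = ref_pos / bases_per_row;
    last_idx = ( ref_pos + ref_len - 1 ) / bases_per_row;

    /* widen toward lower rows, clamping at the first row */
    first_idx = ( first_idx > lookback_rows ) ? first_idx - lookback_rows : 0;

    r . start_row = first_row + ( int64_t ) first_idx;
    r . row_count = ( uint32_t ) ( last_idx - first_idx + 1 );
    return r;
}
```

With 5000 bases per row, row-ids starting at 1 and a one-row lookback (values chosen purely for illustration), a window at position 12000 of length 3000 falls entirely within the third row, and the widened range covers rows 2 and 3.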
The user will create two read-only cursors for a given cSRA object - one on the REFERENCE table and another on one of the two possible alignment tables, depending upon whether primary or secondary alignments are being examined. These will be used to construct the iterator object.
Indication of whether the align table is primary or secondary affects the iterator's query onto the reference table, which is why it is supplied as a stand-alone parameter.
If the user intends to examine a placement in any greater detail than its id, position and length projected upon the reference, then a callback function should be supplied. This function will allocate a structure having as its first member a PlacementRecord and should initialize any additional members within the function:
struct MyPlacementRecord
{
    PlacementRecord dad;          /* must be the first member */
    const INSDC_dna_text *read;
};

static rc_t MyPopulateFunc ( PlacementRecord **recp, const VCursor *align,
    int64_t id, uint64_t pos, uint32_t len, void *data )
{
    rc_t rc;
    struct MyPlacementRecord *rec;

    /* allocate structure - error handling omitted... */
    rec = malloc ( sizeof * rec );

    /* id, pos and len are provided for convenience,
       but I don't have to use them or fill out dad. */

    /* initialize my part of the record */
    rc = read_and_copy_READ ( align, & rec -> read );
    if ( rc != 0 )
    {
        /* don't hand a half-initialized record to the iterator */
        free ( rec );
        rec = NULL;
    }

    /* return to iterator */
    * recp = ( rec != NULL ) ? & rec -> dad : NULL;
    return rc;
}

static void MyWhackFunc ( void *obj )
{
    struct MyPlacementRecord *rec = obj;

    /* cast away const: free() takes a non-const pointer */
    free ( ( void* ) rec -> read );
    free ( rec );
}
As shown above, a custom populate function will often beg a custom destructor/deallocator function. NB: if you provide such a function, it MUST deallocate the object.
duplicate an existing reference
rc_t PlacementIteratorAddRef ( const PlacementIterator *self );
The object is defined as being reference counted. In VDB-2, references are direct pointers to objects and the objects maintain a reference counter.
release an existing reference
potentially whacks object
rc_t PlacementIteratorRelease ( const PlacementIterator *self );
The object is defined as being reference counted. In VDB-2, references are direct pointers to objects and the objects maintain a reference counter.
NULL pointers are ignored.
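The reference-counting contract can be illustrated with a deliberately simplified stand-in. SketchObj and its functions are hypothetical, the counter is not atomic, and ignoring NULL in AddRef is an assumption (the text above states it only for Release). The actual implementation behind PlacementIterator is not described in this document.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in object that maintains its own reference counter. */
typedef struct SketchObj
{
    int refcount;
} SketchObj;

/* counts destroyed objects so the effect of Release can be observed */
static int sketch_destroyed = 0;

/* a new object starts out with a single reference, owned by the caller */
SketchObj *SketchObjMake ( void )
{
    SketchObj *obj = malloc ( sizeof * obj );
    assert ( obj != NULL );
    obj -> refcount = 1;
    return obj;
}

/* duplicate an existing reference */
int SketchObjAddRef ( const SketchObj *self )
{
    if ( self != NULL )
        ++ ( ( SketchObj* ) self ) -> refcount;
    return 0;
}

/* release a reference, whacking the object when the last one goes away */
int SketchObjRelease ( const SketchObj *self )
{
    if ( self != NULL )
    {
        SketchObj *obj = ( SketchObj* ) self;
        assert ( obj -> refcount > 0 );
        if ( -- obj -> refcount == 0 )
        {
            ++ sketch_destroyed;
            free ( obj );
        }
    }
    return 0;
}
```

Every AddRef must be balanced by exactly one Release, and the object is only destroyed by the Release that drops the counter to zero.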
check the next available position on reference having one or more placements
returns position and optionally length
rc_t PlacementIteratorNextAvailPos ( const PlacementIterator *self, uint64_t *pos, uint64_t *len );
pos - OUT
the reference position where the next available placement starts
NB - can be negative if the alignment wraps around
len - OUT, NULL OKAY
optional parameter returning the length of the next available placement
This message returns information about the next available placement, or if none are available, causes the iterator to search for more in its open cursors.
If no further placements are found, a non-zero return code will be issued. TBD
The exact position returned is used to read placement records using either NextRecordAt or NextIdAt.
The optional returned length is useful for performing a merge-sort on the available placements from several iterators. This message may be safely invoked any number of times, where the only side-effect possible is a single attempt at retrieving more data (on the initial invocation).
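The merge across several iterators can be sketched with a mock that stands in for PlacementIterator. Everything here is hypothetical: MockPosIter, MockNextAvailPos and pick_first are illustration-only names, the mock uses a signed position so that negative starting coordinates order correctly, and breaking ties by shorter length is just one possible ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock iterator over pre-sorted placements, for illustration only. */
typedef struct MockPosIter
{
    const int64_t *pos;     /* starting positions, ascending */
    const uint64_t *len;    /* matching lengths */
    size_t count;           /* number of placements */
    size_t next;            /* index of next unconsumed placement */
} MockPosIter;

/* Peek at the next placement without consuming it.
 * Returns 0 on success, non-zero when no placements remain. */
int MockNextAvailPos ( const MockPosIter *self, int64_t *pos, uint64_t *len )
{
    assert ( self != NULL && pos != NULL );
    if ( self -> next >= self -> count )
        return -1;
    * pos = self -> pos [ self -> next ];
    if ( len != NULL )      /* len is optional */
        * len = self -> len [ self -> next ];
    return 0;
}

/* Return the index of the iterator whose next placement sorts first
 * ( lowest position, then shortest length ), or -1 if all are exhausted. */
int pick_first ( const MockPosIter *iters, size_t n )
{
    int best = -1;
    int64_t best_pos = 0;
    uint64_t best_len = 0;
    size_t i;

    for ( i = 0; i < n; ++ i )
    {
        int64_t pos;
        uint64_t len;

        /* skip exhausted iterators */
        if ( MockNextAvailPos ( & iters [ i ], & pos, & len ) != 0 )
            continue;

        if ( best < 0 || pos < best_pos ||
             ( pos == best_pos && len < best_len ) )
        {
            best = ( int ) i;
            best_pos = pos;
            best_len = len;
        }
    }
    return best;
}
```

Because peeking does not consume anything, pick_first can be called repeatedly; the caller then reads the records at the winning position from the selected iterator before picking again.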
retrieve and consume next available PlacementRecord
rc_t PlacementIteratorNextRecordAt ( PlacementIterator *self, uint64_t pos, const PlacementRecord **rec );
pos
the exact position returned by NextAvailPos
identifies location being queried
rec - OUT
return parameter for the next available placement at pos
This message allows a single record to be obtained on each invocation, where the intent is that the caller will loop until no further records are found at the stated position.
By looping, the code is not forced to create lists of placements that align at the exact same starting point, which further allows using multiple iterators in a sort-merge configuration.
As mentioned before, the record is designed to be held in a doubly-linked list and freed independently. The caller obtains locally sorted records from this iterator and places them into the list.
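The intended calling pattern, an outer loop over available positions and an inner loop over the records at each position, is sketched below against a mock. MockRec, MockRecIter and the Mock* functions are hypothetical stand-ins that only imitate the behavior described in this document; they are not the library's types, and the sketch collects row-ids into an array rather than linking records into a list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock record and iterator, for illustration only. */
typedef struct MockRec
{
    int64_t id;
    int64_t pos;
    uint32_t len;
} MockRec;

typedef struct MockRecIter
{
    const MockRec *recs;    /* records sorted by pos */
    size_t count;
    size_t next;
} MockRecIter;

/* peek at the next position; non-zero when exhausted */
int MockNextAvailPos ( const MockRecIter *self, int64_t *pos )
{
    assert ( self != NULL && pos != NULL );
    if ( self -> next >= self -> count )
        return -1;
    * pos = self -> recs [ self -> next ] . pos;
    return 0;
}

/* consume the next record if it starts exactly at "pos";
 * non-zero when no further record is found at that position */
int MockNextRecordAt ( MockRecIter *self, int64_t pos, const MockRec **rec )
{
    assert ( self != NULL && rec != NULL );
    if ( self -> next >= self -> count ||
         self -> recs [ self -> next ] . pos != pos )
        return -1;
    * rec = & self -> recs [ self -> next ++ ];
    return 0;
}

/* walk the whole window: returns the number of records seen, stores
 * their ids in order, and counts the distinct positions visited */
size_t walk_all ( MockRecIter *it, int64_t *ids, size_t max_ids,
    size_t *num_positions )
{
    size_t n = 0;
    int64_t pos;

    * num_positions = 0;
    while ( MockNextAvailPos ( it, & pos ) == 0 )
    {
        const MockRec *rec;
        ++ * num_positions;

        /* loop until no further records are found at this position */
        while ( MockNextRecordAt ( it, pos, & rec ) == 0 )
        {
            assert ( n < max_ids );
            ids [ n ++ ] = rec -> id;
        }
    }
    return n;
}
```

In real use the inner loop would typically append each returned record to a doubly-linked list and later dispose of it with PlacementRecordWhack.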
retrieve information from the next available PlacementRecord
douse the record upon return
rc_t PlacementIteratorNextIdAt ( PlacementIterator *self, uint64_t pos, int64_t *row_id, uint64_t *len );
pos
the exact position returned by NextAvailPos
identifies location being queried
row_id - OUT
return parameter for the next placement's id
len - OUT, NULL OKAY
optional return parameter for the next placement's length
This message simply extracts information held within internal records. See NextRecordAt.