3.1 (harebrained bio)

These are libraries for bioinformatics/computational biology. (harebrained bio sequences) provides procedures dealing with biological sequences. (harebrained bio collections) provides a general container format to be used throughout (harebrained bio). (harebrained bio read) contains sublibraries to read common file formats in bioinformatics (e.g. FASTA).

The following graph shows the hierarchy of record types defined by (harebrained bio). An arrow denotes a "parent of" relationship under single inheritance.

3.1.1 (harebrained bio sequences)

record-type
<bioseq> : bioseq?
= (define-record-type <bioseq> (parent <named>))

The record type to represent a biological sequence.

record-type
<dna> : dna? = (define-record-type <dna> (parent <bioseq>))

A specialised record-type to represent DNA sequences. It is a child of <bioseq>.

record-type
<rna> : rna? = (define-record-type <rna> (parent <bioseq>))

A specialised record-type to represent RNA sequences. It is a child of <bioseq>.

record-type
<protein> : protein?
= (define-record-type <protein> (parent <bioseq>))

A specialised record-type to represent amino acid sequences. It is a child of <bioseq>.
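Because each specialised type has <bioseq> as its parent, a predicate of an ancestor type also accepts instances of its children. The following is a minimal R6RS sketch of that relationship. The record definitions are hypothetical stand-ins: the field names (name, data) and the constructor names (make-dna*, make-rna*) are assumptions made for illustration and are not taken from the library.

```scheme
(import (rnrs))

;; Hypothetical stand-ins for the library's record types. The fields
;; `name` and `data` and the starred constructors are assumptions.
(define-record-type (<named> make-named named?)
  (fields name))

(define-record-type (<bioseq> make-bioseq* bioseq?)
  (parent <named>)
  (fields data))

(define-record-type (<dna> make-dna* dna?)
  (parent <bioseq>))

(define-record-type (<rna> make-rna* rna?)
  (parent <bioseq>))

;; A child constructor takes the fields of all ancestors, root first.
(define d (make-dna* "seq1" "ACGT"))

;; A <dna> is also a <bioseq> and a <named>, but not an <rna>.
(display (list (dna? d) (bioseq? d) (named? d) (rna? d))) ; prints (#t #t #t #f)
(newline)
```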

procedure
(bioseq-length bs) → integer?
bs : bioseq?

Return the length of the bioseq bs.

procedure
(bioseq-ref bs i) → symbol?
bs : bioseq?
i : integer?

Return the symbol at position i of the bioseq.

procedure
(bioseq->string bs) → string?
bs : bioseq?

Return the sequence represented by bs as a string.

procedure
(dna-complement bs) → dna?
bs : dna?

Return the complement of bs.

procedure
(bioseq-reverse bs) → bioseq?
bs : bioseq?

Return the reverse sequence of bs.

procedure
(dna-reverse-complement bs) → dna?
bs : dna?

Return the reverse complement of bs.
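To make the difference between complement, reverse and reverse complement concrete, here is a small sketch that works on plain strings. It is not the library's implementation: the helper names (complement-base, string-complement, string-reverse-complement) are hypothetical, and only the upper-case nucleotides A, C, G and T are handled.

```scheme
(import (rnrs))

;; Complement of a single nucleotide (upper-case A, C, G, T only).
(define (complement-base c)
  (case c
    ((#\A) #\T)
    ((#\T) #\A)
    ((#\C) #\G)
    ((#\G) #\C)
    (else (error 'complement-base "not a DNA nucleotide" c))))

;; Complement every base while keeping the original order.
(define (string-complement s)
  (list->string (map complement-base (string->list s))))

;; Reverse the sequence first, then complement every base.
(define (string-reverse-complement s)
  (list->string (map complement-base (reverse (string->list s)))))

(display (string-complement "AACG"))         ; prints TTGC
(newline)
(display (string-reverse-complement "AACG")) ; prints CGTT
(newline)
```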

procedure
(bioseq-subsequence bs i j) → bioseq?
  bs : bioseq?
  i : integer?
  j : integer?

Return the subsequence of bs from i (included) to j (excluded). Indices are 0 based.
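The half-open, 0-based index convention is the same one that the standard R6RS procedure substring uses for strings, so a plain string serves as a quick illustration. The sketch below uses only substring and does not call the library.

```scheme
(import (rnrs))

(define s "ACGTAC")

;; Start index 1 is included, end index 4 is excluded,
;; so the result has 4 - 1 = 3 symbols.
(define sub (substring s 1 4))

(display sub) ; prints CGT
(newline)
```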

procedure
(dna->rna dna) → rna?
dna : dna?

Transcribe a DNA sequence into RNA by substituting all T nucleotides with U. Returns a bioseq of type <rna>.

procedure
(rna->dna rna) → dna?
rna : rna?

Reverse-transcribe an RNA sequence into DNA by substituting all U nucleotides with T. Returns a bioseq of type <dna>.
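Both conversions amount to a single-character substitution. The sketch below shows this on plain strings; the names swap-base, string-dna->rna and string-rna->dna are hypothetical helpers for illustration, not library procedures.

```scheme
(import (rnrs))

;; Build a procedure that replaces every occurrence of `from` with `to`.
(define (swap-base from to)
  (lambda (s)
    (list->string
     (map (lambda (c) (if (char=? c from) to c))
          (string->list s)))))

(define string-dna->rna (swap-base #\T #\U))
(define string-rna->dna (swap-base #\U #\T))

(display (string-dna->rna "GATTACA")) ; prints GAUUACA
(newline)
(display (string-rna->dna "GAUUACA")) ; prints GATTACA
(newline)
```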

procedure
(bioseq-counts bs) → hashtable?
bs : bioseq?

Return a hashtable that maps each distinct symbol of the sequence to the number of times it occurs.
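The counting itself can be sketched with a standard R6RS hashtable. This example works on a plain string and uses characters as keys, which is an assumption: the key type of the hashtable returned by bioseq-counts is not stated here. The name string-counts is a hypothetical helper.

```scheme
(import (rnrs))

;; Count how often each character occurs in the string s.
(define (string-counts s)
  (let ((table (make-eqv-hashtable)))
    (string-for-each
     (lambda (c)
       ;; Increment the count for c, starting from 0 if c is new.
       (hashtable-update! table c (lambda (n) (+ n 1)) 0))
     s)
    table))

(define counts (string-counts "GATTACA"))

(display (hashtable-ref counts #\A 0)) ; prints 3
(newline)
(display (hashtable-ref counts #\T 0)) ; prints 2
(newline)
```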

procedure
(bioseq-rename bs name) → bioseq?
bs : bioseq?
name : string?

Return a bioseq of the same type as bs with the name name.

3.1.2 (harebrained bio collections)

record-type
<biocol> : biocol?
= (define-record-type <biocol> (parent <named>))

A biocollection (biocol for short) is a container data structure that allows access by numeric index as well as by string key. As such it is both a sequence of values and a hash. A biocol maintains the insertion order of its elements. It is designed to contain instances of <named> types, although this is not a requirement. Procedures that treat a <biocol> like a hash are listed in Hash based procedures, and procedures that treat a <biocol> like a sequence are listed in Sequence based procedures below.

procedure
(make-biocol name values [names]) → biocol?
  name : string?
  values : (or/c list? vector?)
   names : (or/c (list-of string?) (vector-of string?))
= (map name values)

Create a <biocol> instance. The first argument is the name of the resulting instance. values is either a list or a vector of objects to be inserted into the collection. The optional names argument is a list or vector of strings. If it is omitted, values is assumed to contain only elements of type <named>, and each name is extracted by applying the procedure name to the element.
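The optional names argument follows a common pattern: when it is left out, the names are derived from the values themselves. A minimal sketch of that pattern with case-lambda follows. The record type item, its fields, and the procedure pair-up are hypothetical and only stand in for <named> values and the collection's internal pairing.

```scheme
(import (rnrs))

;; Hypothetical stand-in for a <named> value.
(define-record-type item (fields name payload))

;; Pair each value with a name. With one argument, the names are
;; extracted from the values; with two, the given names are used.
(define pair-up
  (case-lambda
    ((vals) (pair-up vals (map item-name vals)))
    ((vals names) (map cons names vals))))

;; Names taken from the values themselves.
(display (map car (pair-up (list (make-item "x" 1) (make-item "y" 2)))))
(newline)

;; Names supplied explicitly, so the values need not be named.
(display (pair-up '(1 2) '("a" "b")))
(newline)
```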

procedure
(biocol-name bc) → string?
bc : biocol?

Return the name of a <biocol>.

procedure
(biocol-rename bc name) → biocol?
bc : biocol?
name : string?

Return a new <biocol> with name name.

procedure
(biocol-copy bc) → biocol?
bc : biocol?

Return a deep copy of bc. The copy is a distinct object, so

(eq? bc (biocol-copy bc))

evaluates to #f for every biocol bc.

procedure
(biocol-ref bc index [default]) → any?
  bc : biocol?
  index : (or/c integer? string?)
   default : thunk?
= (lambda() (error 'biocol-ref "key not found" index))

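Retrieve the value at index. index can be either a numeric position in the sequence of values or a string key in the association. As the default value in the signature indicates, a missing index raises a "key not found" error unless a default thunk is supplied.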
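The default argument is a thunk, that is, a procedure of no arguments. The sketch below illustrates how such a thunk is typically used for a failed lookup. It assumes that default is called only when the key is missing, which the signature suggests but the text does not spell out. The table and the procedure lookup are hypothetical and use a plain R6RS hashtable rather than a <biocol>.

```scheme
(import (rnrs))

;; A plain string-keyed hashtable standing in for a collection.
(define table (make-hashtable string-hash string=?))
(hashtable-set! table "seq1" 'first)

;; Look up key; call the thunk `default` only if the key is absent.
;; Without a thunk, a missing key raises an error.
(define lookup
  (case-lambda
    ((key)
     (lookup key (lambda () (error 'lookup "key not found" key))))
    ((key default)
     (if (hashtable-contains? table key)
         (hashtable-ref table key #f)
         (default)))))

(display (lookup "seq1"))                           ; prints first
(newline)
(display (lookup "missing" (lambda () 'fallback)))  ; prints fallback
(newline)
```

Passing a thunk instead of a plain value means the fallback is only computed when it is needed, and it lets the caller choose between returning a value and raising an error.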

Hash based procedures

procedure
(biocol-contains? bc index) → boolean?
bc : biocol?
index : string?

Return #t if biocol bc contains the key index, and #f otherwise.

procedure
(biocol-set bc key val) → biocol?
  bc : biocol?
  key : string?
  val : any?

If key is not yet present, add a new association between key and val to the hash and append val to the sequence. Otherwise, set the value at key to val and update the corresponding object in the sequence.
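The combination of insertion order, access by position and access by key can be sketched with an ordered list of (key . value) pairs. This is only an illustration of the described behaviour, not the library's data structure. The procedures col-set and col-ref are hypothetical, and col-set is written as a non-mutating update, which is an assumption.

```scheme
(import (rnrs))

;; A collection is modelled as a list of (key . value) pairs
;; kept in insertion order.

;; Return a collection in which key maps to val: an existing key is
;; replaced at its current position, a new key is appended at the end.
(define (col-set col key val)
  (cond
    ((null? col) (list (cons key val)))
    ((string=? (caar col) key) (cons (cons key val) (cdr col)))
    (else (cons (car col) (col-set (cdr col) key val)))))

;; Look up by string key or by 0-based numeric position.
(define (col-ref col index)
  (let ((entry (if (string? index)
                   (assoc index col)
                   (and (< -1 index (length col))
                        (list-ref col index)))))
    (if entry
        (cdr entry)
        (error 'col-ref "key not found" index))))

(define c1 (col-set '() "a" 1))
(define c2 (col-set c1 "b" 2))
(define c3 (col-set c2 "a" 10)) ; "a" exists: replaced, position kept

(display (list (col-ref c3 0) (col-ref c3 "b") (length c3))) ; prints (10 2 2)
(newline)
```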

Sequence based procedures

procedure
(biocol-length bc) → integer?
bc : biocol?

Return the length of biocol bc, i.e. the number of elements in bc.

3.1.3 (harebrained bio read)

The set of libraries under (harebrained bio read) collects procedures to read common biological file formats.

(harebrained bio read fasta)

procedure
(read-fasta file) → (or/c bioseq? biocol?)
file : (or/c port? string?)

Read a file in FASTA format. The file argument can be a path to a file, a textual input port, or a binary input port (as from a gzip connection), in which case the data is assumed to be UTF-8 encoded. If the file contains a single sequence, a bioseq is returned; otherwise a biocol is returned. The name of each parsed sequence is its complete refline (i.e. everything after the ’>’ symbol).
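To show what "everything after the ’>’ symbol" means in practice, here is a minimal FASTA parsing sketch over a textual input port. It is not the parser used by read-fasta: the procedure parse-fasta is hypothetical, it returns a plain list of (name . sequence) pairs instead of bioseq or biocol values, and it makes no attempt to handle binary ports or malformed input.

```scheme
(import (rnrs))

;; Parse FASTA text from a textual input port into a list of
;; (name . sequence) pairs, in file order.
(define (parse-fasta port)
  (let loop ((line (get-line port))
             (name #f)      ; refline of the record being read, or #f
             (chunks '())   ; sequence lines of that record, newest first
             (records '())) ; finished records, newest first
    ;; Add the record in progress (if any) to the finished records.
    (define (flush)
      (if name
          (cons (cons name (apply string-append (reverse chunks)))
                records)
          records))
    (cond
      ((eof-object? line) (reverse (flush)))
      ;; A line starting with > begins a new record; its name is the
      ;; whole line after the > character.
      ((and (> (string-length line) 0)
            (char=? (string-ref line 0) #\>))
       (loop (get-line port)
             (substring line 1 (string-length line))
             '()
             (flush)))
      ;; Any other line is a piece of the current sequence.
      (else
       (loop (get-line port) name (cons line chunks) records)))))

(define records
  (parse-fasta
   (open-string-input-port ">seq1 human\nACGT\nAC\n>seq2\nGG\n")))

(display records) ; prints ((seq1 human . ACGTAC) (seq2 . GG))
(newline)
```

Note that the name of the first record is "seq1 human", including the description after the identifier, because the complete refline is kept.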

procedure
(make-bioseq seq [name]) → bioseq?
seq : string?
name : string? = ""

Create a bioseq named name from the string seq. If name is omitted, the empty string is used.
