chado.feature package

Module contents

Provides methods for interacting with Chado features.

class chado.feature.FeatureClient(engine, metadata, session, ci)

Bases: chado.client.Client

Provides access to Chado features.

delete_features(organism_id=None, analysis_id=None, name=None, uniquename=None)

Delete all features, or only those matching the given filters

Parameters:
  • organism_id (int) – organism_id filter
  • analysis_id (int) – analysis_id filter
  • name (str) – name filter
  • uniquename (str) – uniquename filter
Return type:

list of dict

Returns:

Features information

get_feature_analyses(feature_id)

Get analyses associated with a feature

Parameters: feature_id (int) – ID of the feature
Return type: list
Returns: Feature analyses

get_feature_cvterms(feature_id)

Get cvterms associated with a feature

Parameters: feature_id (int) – ID of the feature
Return type: list
Returns: Feature cvterms

get_features(organism_id=None, analysis_id=None, name=None, uniquename=None)

Get all features, or only those matching the given filters

Parameters:
  • organism_id (int) – organism_id filter
  • analysis_id (int) – analysis_id filter
  • name (str) – name filter
  • uniquename (str) – uniquename filter
Return type:

list of dict

Returns:

Features information
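The filters of get_features (and of delete_features, which takes the same arguments) narrow the set of features returned. The sketch below models that on plain Python dicts and does not call chado.feature. It assumes that an omitted filter (None) is ignored and that the remaining filters are combined with AND. FEATURES and filter_features are hypothetical names, and analysis_id is flattened into each row for brevity (in the Chado schema, features are linked to analyses through the analysisfeature table).

```python
from typing import Dict, List, Optional

# Hypothetical in-memory stand-in for Chado features (NOT the chado.feature API).
# analysis_id is flattened into each row; real Chado stores it in analysisfeature.
FEATURES: List[Dict] = [
    {"feature_id": 1, "organism_id": 1, "analysis_id": 10, "name": "geneA", "uniquename": "geneA"},
    {"feature_id": 2, "organism_id": 1, "analysis_id": 11, "name": "geneB", "uniquename": "geneB"},
    {"feature_id": 3, "organism_id": 2, "analysis_id": 10, "name": "geneA", "uniquename": "org2_geneA"},
]


def filter_features(features: List[Dict], organism_id: Optional[int] = None,
                    analysis_id: Optional[int] = None, name: Optional[str] = None,
                    uniquename: Optional[str] = None) -> List[Dict]:
    """Keep the features matching every filter that is not None (assumed AND semantics)."""
    requested = {"organism_id": organism_id, "analysis_id": analysis_id,
                 "name": name, "uniquename": uniquename}
    # Unset filters are dropped, so calling with no filters returns every feature.
    active = {key: value for key, value in requested.items() if value is not None}
    return [f for f in features if all(f[key] == value for key, value in active.items())]


print([f["feature_id"] for f in filter_features(FEATURES)])                               # [1, 2, 3]
print([f["feature_id"] for f in filter_features(FEATURES, name="geneA")])                 # [1, 3]
print([f["feature_id"] for f in filter_features(FEATURES, name="geneA", organism_id=2)])  # [3]
```

Filtering on name alone can match features from several organisms, which is why combining it with organism_id (or filtering on uniquename) is the more precise choice.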

load_fasta(fasta, organism_id, sequence_type='contig', analysis_id=None, re_name=None, re_uniquename=None, match_on_name=False, update=False, db=None, re_db_accession=None, rel_type=None, re_parent=None, parent_type=None)

Load features from a FASTA file

Parameters:
  • fasta (str) – Path to the FASTA file to load
  • organism_id (int) – Organism ID
  • sequence_type (str) – Sequence type
  • analysis_id (int) – Analysis ID
  • re_name (str) – Regular expression to extract the feature name from the FASTA sequence ID (the first capturing group is used).
  • re_uniquename (str) – Regular expression to extract the feature uniquename from the FASTA sequence ID (the first capturing group is used).
  • match_on_name (bool) – Match existing features using their name instead of their uniquename
  • update (bool) – Update existing features with the new sequence instead of raising an error
  • db (int) – ID of the external database to cross-reference to.
  • re_db_accession (str) – Regular expression to extract an external database accession from the FASTA sequence ID (the first capturing group is used).
  • rel_type (str) – Relation type to the parent feature (‘part_of’ or ‘derives_from’).
  • re_parent (str) – Regular expression to extract the parent uniquename from the FASTA sequence ID (the first capturing group is used).
  • parent_type (str) – Sequence type of the parent feature
Return type:

dict

Returns:

Number of inserted sequences
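The re_name, re_uniquename, re_db_accession and re_parent parameters share one rule: apply a regular expression to the FASTA sequence ID and keep the first capturing group. The sketch below shows that rule with Python's re module. It assumes re.search semantics (the pattern may match anywhere unless anchored) and returns None when nothing matches; how chado itself handles a non-matching ID is not documented here. The helper first_group and the sequence ID layout are hypothetical.

```python
import re
from typing import Optional


def first_group(pattern: Optional[str], seq_id: str) -> Optional[str]:
    """Return the first capturing group of `pattern` searched in `seq_id`.

    Assumptions, not confirmed by the chado docs: re.search semantics, and
    None when no pattern is given or when the pattern does not match.
    """
    if pattern is None:
        return None
    match = re.search(pattern, seq_id)
    return match.group(1) if match else None


# Hypothetical FASTA sequence ID packing a uniquename, a name and a parent ID.
seq_id = "mRNA:gene0001.t1|name=abc1-t1|parent=gene0001"

uniquename = first_group(r"^mRNA:([^|]+)", seq_id)  # e.g. a re_uniquename pattern
name = first_group(r"name=([^|]+)", seq_id)         # e.g. a re_name pattern
parent = first_group(r"parent=([^|]+)", seq_id)     # e.g. a re_parent pattern
print(uniquename, name, parent)  # gene0001.t1 abc1-t1 gene0001
```

When passing such patterns to the loader, raw strings (r"...") avoid having to double every backslash.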

load_featureprops(tab_file, analysis_id, organism_id, prop_type, feature_type=None, match_on_name=False)

Load feature properties from a tabular file (column 1: feature name or uniquename; column 2: property value)

Parameters:
  • tab_file (str) – Path to the tabular file to load
  • analysis_id (int) – Analysis ID
  • organism_id (int) – Organism ID
  • prop_type (str) – Type of the feature property (cvterm will be created if it doesn’t exist)
  • feature_type (str) – Type of the target features in sequence ontology (will speed up loading if specified)
  • match_on_name (bool) – Match features using their name instead of their uniquename
Return type:

dict

Returns:

Number of inserted feature properties
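Each row of the load_featureprops input pairs a feature identifier with one property value. Per the parameters above, the first column is matched against uniquenames (or names, when match_on_name=True) and the value is stored under the prop_type term. A minimal reader for that layout might look like the following; it assumes tab-separated columns (the description only says "tabular"), and read_featureprops is a hypothetical helper, not part of chado.feature.

```python
import io
from typing import Iterable, Iterator, Tuple

# Hypothetical two-column input: feature name or uniquename <TAB> property value.
TAB_FILE = "geneA\tputative kinase\ngeneB\tunknown function\n\n"


def read_featureprops(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (feature identifier, property value) pairs from a two-column table."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        # Split on the first tab only, so a value may itself contain tabs.
        # A line without any tab raises ValueError here.
        feature, value = line.split("\t", 1)
        yield feature, value


pairs = list(read_featureprops(io.StringIO(TAB_FILE)))
print(pairs)  # [('geneA', 'putative kinase'), ('geneB', 'unknown function')]
```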

load_gff(gff, analysis_id, organism_id, landmark_type=None, re_protein=None, re_protein_capture='^(.*?)$', fasta=None, no_seq_compute=False, quiet=False, add_only=False, protein_id_attr=None)

Load features from a GFF file

Parameters:
  • gff (str) – Path to the GFF file to load
  • analysis_id (int) – Analysis ID
  • organism_id (int) – Organism ID
  • landmark_type (str) – Type of the landmark features, e.g. contig; should be a Sequence Ontology term (speeds up loading if provided)
  • re_protein (str) – Replacement string for the protein name, using the capturing groups defined by re_protein_capture
  • re_protein_capture (str) – Regular expression capturing groups in the mRNA name for use in re_protein (e.g. "^(.*?)-R([A-Z]+)$"; default="^(.*?)$")
  • protein_id_attr (str) – Attribute containing the protein uniquename. It is looked up on the mRNA first and, if not found there, on the CDS.
  • fasta (str) – Path to a FASTA file containing sequences for some features. When a feature is created and its sequence is in this file, that sequence is loaded. Otherwise, mRNA and polypeptide sequences are computed from the genome sequence (if available); failing that, the sequence is left empty.
  • no_seq_compute (bool) – Disable computing mRNA and polypeptide sequences from the genome sequence and feature positions.
  • quiet (bool) – Hide progress information
  • add_only (bool) – Set this flag if you are only adding new features to the selected analysis and organism, not updating existing ones. It speeds up loading and reduces memory usage, but may produce errors if a feature already exists.
Return type:

None

Returns:

None
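The re_protein_capture and re_protein parameters act as a search pattern and a replacement string applied to the mRNA name. Assuming they behave like Python's re.sub (an assumption; the backreference syntax in particular is not specified above), the example pattern from the re_protein_capture description turns mRNA names such as gene42-RA into protein names such as gene42-PA with the hypothetical replacement \1-P\2. The helper protein_name and the example names are illustrative only.

```python
import re

# Example capture pattern quoted from the re_protein_capture description.
RE_PROTEIN_CAPTURE = r"^(.*?)-R([A-Z]+)$"
# Hypothetical replacement: keep group 1, swap the "-R" suffix for "-P".
RE_PROTEIN = r"\1-P\2"


def protein_name(mrna_name: str, capture: str = RE_PROTEIN_CAPTURE,
                 replacement: str = RE_PROTEIN) -> str:
    """Derive a protein name from an mRNA name, assuming re.sub semantics."""
    # re.sub leaves the name unchanged when the pattern does not match.
    return re.sub(capture, replacement, mrna_name)


print(protein_name("gene42-RA"))  # gene42-PA
print(protein_name("gene42-RB"))  # gene42-PB
print(protein_name("gene42"))     # gene42 (no match, unchanged)
```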

load_go(input, organism_id, analysis_id, query_type='polypeptide', match_on_name=False, name_column=2, go_column=5, re_name=None, skip_missing=False)

Load GO annotations from a tabular file

Parameters:
  • input (str) – Path to the input tabular file to load
  • organism_id (int) – Organism ID
  • analysis_id (int) – Analysis ID
  • query_type (str) – The feature type (e.g. ‘gene’, ‘mRNA’, ‘polypeptide’, ‘contig’) of the query. It must be a valid Sequence Ontology term.
  • match_on_name (bool) – Match features using their name instead of their uniquename
  • name_column (int) – Column containing the feature identifiers (2, 3, 10 or 11; default=2).
  • go_column (int) – Column containing the GO id (default=5).
  • re_name (str) – Regular expression to extract the feature name from the input file (first capturing group will be used).
  • skip_missing (bool) – Skip lines with unknown features or GO id instead of aborting everything.
Return type:

dict

Returns:

Number of inserted GO terms
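name_column and go_column appear to be 1-based column numbers: the default go_column=5 and the allowed name_column values 2, 3, 10 and 11 line up with the GO ID, DB Object ID, DB Object Symbol, DB Object Name and DB Object Synonym columns of the GAF format. The sketch below picks those columns out of a GAF-style line. It assumes tab separation and skips lines starting with "!" (GAF header comments); whether load_go does the same is an assumption. The annotation contents and the helper read_go_annotations are hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical annotation using the first 15 GAF columns (tab-separated).
GAF_LINE = "\t".join([
    "MyDB",           # 1  DB
    "PROT_0001",      # 2  DB Object ID
    "abc1",           # 3  DB Object Symbol
    "",               # 4  Qualifier
    "GO:0005515",     # 5  GO ID
    "MyDB:REF_0001",  # 6  DB:Reference
    "IEA",            # 7  Evidence Code
    "",               # 8  With (or) From
    "F",              # 9  Aspect
    "ABC protein 1",  # 10 DB Object Name
    "ABC1",           # 11 DB Object Synonym
    "protein",        # 12 DB Object Type
    "taxon:9606",     # 13 Taxon
    "20240101",       # 14 Date
    "MyDB",           # 15 Assigned By
])


def read_go_annotations(lines: Iterable[str], name_column: int = 2,
                        go_column: int = 5) -> Iterator[Tuple[str, str]]:
    """Yield (feature identifier, GO id) pairs using 1-based column numbers."""
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF header/comment lines
        fields = line.rstrip("\n").split("\t")
        yield fields[name_column - 1], fields[go_column - 1]


lines = ["!gaf-version: 2.2\n", GAF_LINE + "\n"]
print(list(read_go_annotations(lines)))                 # [('PROT_0001', 'GO:0005515')]
print(list(read_go_annotations(lines, name_column=3)))  # [('abc1', 'GO:0005515')]
```

Choosing name_column therefore decides which identifier in the file must match the uniquename (or, with match_on_name=True, the name) of the query_type features already in Chado.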