chado.feature package

Module contents

Contains possible interactions with Chado features.

class chado.feature.FeatureClient(engine, metadata, session, ci)

Bases: chado.client.Client

Access to Chado features.

delete_features(organism_id=None, analysis_id=None, name=None, uniquename=None)

Delete all or some features

Parameters:
- organism_id (int) – organism_id filter
- analysis_id (int) – analysis_id filter
- name (str) – name filter
- uniquename (str) – uniquename filter
Return type:	list of dict
Returns:	Features information

get_feature_analyses(feature_id)

Get analyses associated with a feature

Parameters:	feature_id (int) – Id of the feature
Return type:	list
Returns:	Feature analyses

get_feature_cvterms(feature_id)

Get cvterms associated with a feature

Parameters:	feature_id (int) – Id of the feature
Return type:	list
Returns:	Feature cvterms

get_features(organism_id=None, analysis_id=None, name=None, uniquename=None)

Get all or some features

Parameters:
- organism_id (int) – organism_id filter
- analysis_id (int) – analysis_id filter
- name (str) – name filter
- uniquename (str) – uniquename filter
Return type:	list of dict
Returns:	Features information
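
The optional filters of get_features() follow a common pattern: a filter left at None is ignored, so calling the method with no arguments returns every feature. The sketch below mimics that pattern on an in-memory list only. It does not query a database, the row dicts and the filter_features helper are hypothetical, and it assumes the active filters are combined with AND. In the Chado schema the feature-analysis link lives in the analysisfeature table, so the analysis_id key on each row is a simplification.

```python
from typing import Optional


def filter_features(features: list[dict],
                    organism_id: Optional[int] = None,
                    analysis_id: Optional[int] = None,
                    name: Optional[str] = None,
                    uniquename: Optional[str] = None) -> list[dict]:
    """Hypothetical stand-in: keep rows matching every filter that is not None."""
    requested = {
        "organism_id": organism_id,
        "analysis_id": analysis_id,
        "name": name,
        "uniquename": uniquename,
    }
    # Filters left at None are dropped, so no filters means "return everything".
    active = {key: value for key, value in requested.items() if value is not None}
    # Assumption: active filters are ANDed together.
    return [row for row in features
            if all(row.get(key) == value for key, value in active.items())]


# Hypothetical rows; analysis_id is flattened onto each row for simplicity.
rows = [
    {"feature_id": 1, "organism_id": 1, "analysis_id": 10, "name": "g1", "uniquename": "GENE1"},
    {"feature_id": 2, "organism_id": 1, "analysis_id": 11, "name": "g2", "uniquename": "GENE2"},
    {"feature_id": 3, "organism_id": 2, "analysis_id": 10, "name": "g1", "uniquename": "GENE3"},
]

print([r["feature_id"] for r in filter_features(rows, organism_id=1)])  # [1, 2]
print([r["feature_id"] for r in filter_features(rows, organism_id=1, analysis_id=10)])  # [1]
```

Under the same assumption, delete_features() takes the same four filters, so it would select the same set of features before deleting them.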

load_fasta(fasta, organism_id, sequence_type='contig', analysis_id=None, re_name=None, re_uniquename=None, match_on_name=False, update=False, db=None, re_db_accession=None, rel_type=None, re_parent=None, parent_type=None)

Load features from a FASTA file

Parameters:
- fasta (str) – Path to the FASTA file to load
- organism_id (int) – Organism ID
- sequence_type (str) – Sequence type
- analysis_id (int) – Analysis ID
- re_name (str) – Regular expression to extract the feature name from the FASTA sequence id (the first capturing group will be used)
- re_uniquename (str) – Regular expression to extract the feature uniquename from the FASTA sequence id (the first capturing group will be used)
- match_on_name (bool) – Match existing features using their name instead of their uniquename
- update (bool) – Update existing features with the new sequence instead of throwing an error
- db (int) – External database to cross-reference to
- re_db_accession (str) – Regular expression to extract an external database accession from the FASTA sequence id (the first capturing group will be used)
- rel_type (str) – Relation type to the parent feature (‘part_of’ or ‘derives_from’)
- re_parent (str) – Regular expression to extract the parent uniquename from the FASTA sequence id (the first capturing group will be used)
- parent_type (str) – Sequence type of the parent feature
Return type:	dict
Returns:	Number of inserted sequences
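
Four load_fasta() parameters (re_name, re_uniquename, re_db_accession and re_parent) share one rule: the regular expression is applied to the FASTA sequence id and its first capturing group is kept. The sketch below reproduces only that rule, which can help when checking a pattern before loading. It is not the library's code: it assumes the sequence id is the first whitespace-separated word of the header, that patterns are applied with re.search, and that a non-matching pattern yields None. The extract_ids helper is hypothetical.

```python
import re
from typing import Optional


def extract_ids(header: str,
                re_name: Optional[str] = None,
                re_uniquename: Optional[str] = None,
                re_db_accession: Optional[str] = None,
                re_parent: Optional[str] = None) -> dict:
    """Hypothetical helper: apply each regex to the sequence id, keep group 1."""
    # Assumption: the sequence id is the first word after '>'.
    seq_id = header.lstrip(">").split()[0]

    def first_group(pattern: Optional[str]) -> Optional[str]:
        if pattern is None:
            return None
        match = re.search(pattern, seq_id)
        # Assumption: no match means the value is simply not extracted.
        return match.group(1) if match else None

    return {
        "name": first_group(re_name),
        "uniquename": first_group(re_uniquename),
        "db_accession": first_group(re_db_accession),
        "parent": first_group(re_parent),
    }


ids = extract_ids(">scaffold_12|GB:AB000001.1 assembled contig",
                  re_name=r"^scaffold_(\d+)",
                  re_uniquename=r"^([^|]+)",
                  re_db_accession=r"GB:(\S+)$")
print(ids)
# {'name': '12', 'uniquename': 'scaffold_12', 'db_accession': 'AB000001.1', 'parent': None}
```

For a polypeptide FASTA loaded with rel_type='derives_from', a pattern such as re_parent=r"^(.+)\.p\d+$" would, under these assumptions, map the id GENE1-RA.p1 to the parent uniquename GENE1-RA.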

load_featureprops(tab_file, analysis_id, organism_id, prop_type, feature_type=None, match_on_name=False)

Load feature properties from a tabular file (column 1: feature name or uniquename; column 2: property value)

Parameters:
- tab_file (str) – Path to the tabular file to load
- analysis_id (int) – Analysis ID
- organism_id (int) – Organism ID
- prop_type (str) – Type of the feature property (the cvterm will be created if it doesn’t exist)
- feature_type (str) – Type of the target features in the Sequence Ontology (will speed up loading if specified)
- match_on_name (bool) – Match features using their name instead of their uniquename
Return type:	dict
Returns:	Number of inserted featureprops
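
The input of load_featureprops() has one feature per line: the feature name or uniquename in column 1 and the property value in column 2. Below is a minimal sketch for producing such a file. It assumes tab-separated columns and no header line, neither of which is stated above, and write_featureprops is a hypothetical helper, not part of chado.feature.

```python
import io
from typing import Iterable, TextIO


def write_featureprops(pairs: Iterable[tuple[str, str]], handle: TextIO) -> int:
    """Hypothetical helper: write (feature, value) pairs as two tab-separated columns."""
    count = 0
    for feature, value in pairs:
        # A tab or newline inside a field would shift or split the columns.
        for field in (feature, value):
            if "\t" in field or "\n" in field:
                raise ValueError(f"field contains a tab or newline: {field!r}")
        handle.write(f"{feature}\t{value}\n")
        count += 1
    return count


buf = io.StringIO()
write_featureprops([("GENE1", "kinase"), ("GENE2", "transporter")], buf)
print(buf.getvalue(), end="")
```

Writing to a file opened with open(path, "w") instead of a StringIO gives a path that can be passed as tab_file.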

load_gff(gff, analysis_id, organism_id, landmark_type=None, re_protein=None, re_protein_capture='^(.*?)$', fasta=None, no_seq_compute=False, quiet=False, add_only=False, protein_id_attr=None)

Load features from a GFF file

Parameters:
- gff (str) – Path to the GFF file to load
- analysis_id (int) – Analysis ID
- organism_id (int) – Organism ID
- landmark_type (str) – Type of the landmarks (will speed up loading if provided, e.g. contig; should be a term of the Sequence Ontology)
- re_protein (str) – Replacement string for the protein name, using the capturing groups defined by re_protein_capture
- re_protein_capture (str) – Regular expression to capture groups in the mRNA name for use in re_protein (e.g. "^(.*?)-R([A-Z]+)$"; default "^(.*?)$")
- protein_id_attr (str) – Attribute containing the protein uniquename. It is searched for at the mRNA level and, if not found there, at the CDS level.
- fasta (str) – Path to a FASTA file containing sequences for some features. When a feature is created, its sequence is loaded from this file if present there. Otherwise, for mRNAs and polypeptides it is computed from the genome sequence (if available); failing that, it is left empty.
- no_seq_compute (bool) – Disable the computation of mRNA and polypeptide sequences from the genome sequence and positions.
- quiet (bool) – Hide progress information
- add_only (bool) – Use this flag if you are not updating existing features but only adding new features to the selected analysis and organism. It speeds up loading and reduces memory usage, but might produce errors if a feature already exists.
Return type:	None
Returns:	None
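
The re_protein and re_protein_capture parameters of load_gff() derive a protein name from an mRNA name: re_protein_capture defines capturing groups, and re_protein is a replacement string that reuses them. The sketch below assumes Python re.sub semantics with \1-style backreferences and that a non-matching mRNA name is left unchanged. The library's exact handling may differ, and protein_name is a hypothetical helper.

```python
import re


def protein_name(mrna_name: str,
                 re_protein: str,
                 re_protein_capture: str = r"^(.*?)$") -> str:
    """Hypothetical helper: rewrite an mRNA name into a protein name."""
    # count=1: the capture pattern is meant to describe the whole name once.
    # If the pattern does not match, re.sub returns the name unchanged.
    return re.sub(re_protein_capture, re_protein, mrna_name, count=1)


# With the documented example pattern, -RA mRNA names become -PA protein names.
print(protein_name("GENE1-RA", r"\1-P\2", r"^(.*?)-R([A-Z]+)$"))  # GENE1-PA
# With the default capture, group 1 is the whole mRNA name.
print(protein_name("mRNA42", r"\1_prot"))  # mRNA42_prot
```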

load_go(input, organism_id, analysis_id, query_type='polypeptide', match_on_name=False, name_column=2, go_column=5, re_name=None, skip_missing=False)

Load GO annotation from a tabular file

Parameters:
- input (str) – Path to the input tabular file to load
- organism_id (int) – Organism ID
- analysis_id (int) – Analysis ID
- query_type (str) – The feature type of the query (e.g. ‘gene’, ‘mRNA’, ‘polypeptide’, ‘contig’). It must be a valid Sequence Ontology term.
- match_on_name (bool) – Match features using their name instead of their uniquename
- name_column (int) – Column containing the feature identifiers (2, 3, 10 or 11; default=2)
- go_column (int) – Column containing the GO id (default=5)
- re_name (str) – Regular expression to extract the feature name from the input file (the first capturing group will be used)
- skip_missing (bool) – Skip lines with unknown features or GO ids instead of aborting everything
Return type:	dict
Returns:	Number of inserted GO terms
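
The column numbers accepted by load_go() appear to match the 1-based columns of the GAF 2.x format: 2 (DB Object ID), 3 (DB Object Symbol), 10 (DB Object Name) and 11 (DB Object Synonym) for feature identifiers, and 5 (GO ID) for the term. Assuming a GAF-style tab-separated input whose header lines start with "!", the sketch below shows which values name_column, go_column and re_name would select. The go_pairs helper is hypothetical, and skipping identifiers that re_name cannot parse is an assumption, not documented behavior.

```python
import re
from typing import Iterable, Iterator, Optional


def go_pairs(lines: Iterable[str],
             name_column: int = 2,
             go_column: int = 5,
             re_name: Optional[str] = None) -> Iterator[tuple[str, str]]:
    """Hypothetical helper: yield (feature identifier, GO id) from GAF-like lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):
            continue  # blank line or GAF header/comment line
        cols = line.split("\t")
        # Column numbers are 1-based, like the load_go() defaults.
        feature = cols[name_column - 1]
        if re_name is not None:
            match = re.search(re_name, feature)
            if match is None:
                continue  # assumption: identifiers the regex cannot parse are skipped
            feature = match.group(1)
        yield feature, cols[go_column - 1]


# One hypothetical 15-column GAF 2.x annotation line.
gaf_line = "\t".join([
    "mydb", "GENE1-PA", "gene1", "", "GO:0005524", "REF:0001", "IEA", "",
    "F", "ATP-binding protein", "ABC1", "protein", "taxon:7227", "20240101", "mydb",
])
lines = ["!gaf-version: 2.2\n", gaf_line + "\n"]

print(list(go_pairs(lines)))                             # [('GENE1-PA', 'GO:0005524')]
print(list(go_pairs(lines, name_column=3)))              # [('gene1', 'GO:0005524')]
print(list(go_pairs(lines, re_name=r"^(.+)-P[A-Z]+$")))  # [('GENE1', 'GO:0005524')]
```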