chado.feature package
Module contents
Contains possible interactions with the Chado Features
-
class chado.feature.FeatureClient(engine, metadata, session, ci)
Bases: chado.client.Client
Access to the chado features
-
delete_features(organism_id=None, analysis_id=None, name=None, uniquename=None)
Delete all or some features
Parameters: - organism_id (int) – organism_id filter
- analysis_id (int) – analysis_id filter
- name (str) – name filter
- uniquename (str) – uniquename filter
Return type: list of dict
Returns: Features information
-
get_feature_analyses(feature_id)
Get analyses associated with a feature
Parameters: feature_id (int) – Id of the feature
Return type: list
Returns: Feature analyses
-
get_feature_cvterms(feature_id)
Get cvterms associated with a feature
Parameters: feature_id (int) – Id of the feature
Return type: list
Returns: Feature cvterms
-
get_features(organism_id=None, analysis_id=None, name=None, uniquename=None)
Get all or some features
Parameters: - organism_id (int) – organism_id filter
- analysis_id (int) – analysis_id filter
- name (str) – name filter
- uniquename (str) – uniquename filter
Return type: list of dict
Returns: Features information
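
A minimal sketch of how the filters might be combined. It assumes `client` is any object exposing the `get_features` signature documented above. `StubFeatureClient`, its sample rows, and the dict keys are hypothetical stand-ins so the snippet runs without a database; they are not a confirmed result schema. The stub also assumes that filters combine with AND and that `None` means "no filter", which the reference does not state explicitly.

```python
class StubFeatureClient:
    """Hypothetical stand-in mimicking the documented get_features signature."""

    def __init__(self, rows):
        self._rows = rows

    def get_features(self, organism_id=None, analysis_id=None,
                     name=None, uniquename=None):
        filters = {
            "organism_id": organism_id,
            "analysis_id": analysis_id,
            "name": name,
            "uniquename": uniquename,
        }
        # Keep a row only if it matches every filter that was actually given.
        return [
            row for row in self._rows
            if all(v is None or row.get(k) == v for k, v in filters.items())
        ]


def feature_names(client, organism_id):
    """Return the names of all features belonging to one organism."""
    return [f["name"] for f in client.get_features(organism_id=organism_id)]


# Illustrative rows only; real results come from the Chado database.
rows = [
    {"organism_id": 1, "analysis_id": 3, "name": "geneA", "uniquename": "geneA-1"},
    {"organism_id": 2, "analysis_id": 3, "name": "geneB", "uniquename": "geneB-1"},
]
client = StubFeatureClient(rows)
names = feature_names(client, organism_id=1)
print(names)
```

With a real `FeatureClient`, the helper `feature_names` would be called the same way, since it relies only on the documented keyword arguments.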
-
load_fasta(fasta, organism_id, sequence_type='contig', analysis_id=None, re_name=None, re_uniquename=None, match_on_name=False, update=False, db=None, re_db_accession=None, rel_type=None, re_parent=None, parent_type=None)
Load features from a fasta file
Parameters: - fasta (str) – Path to the Fasta file to load
- organism_id (int) – Organism ID
- sequence_type (str) – Sequence type
- analysis_id (int) – Analysis ID
- re_name (str) – Regular expression to extract the feature name from the fasta sequence id (first capturing group will be used).
- re_uniquename (str) – Regular expression to extract the feature uniquename from the fasta sequence id (first capturing group will be used).
- match_on_name (bool) – Match existing features using their name instead of their uniquename
- update (bool) – Update existing feature with new sequence instead of throwing an error
- db (int) – External database to cross reference to.
- re_db_accession (str) – Regular expression to extract an external database accession from the fasta sequence id (first capturing group will be used).
- rel_type (str) – Relation type to parent feature (‘part_of’ or ‘derives_from’).
- re_parent (str) – Regular expression to extract parent uniquename from the fasta sequence id (first capturing group will be used).
- parent_type (str) – Sequence type of the parent feature
Return type: dict
Returns: Number of inserted sequences
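
The "first capturing group" rule used by `re_name`, `re_uniquename`, `re_db_accession` and `re_parent` can be tried out with Python's standard `re` module before running a load. This is only a sketch: the helper `extract_first_group` and the sample sequence id are hypothetical, and it assumes the pattern is searched anywhere in the id, which the reference above does not confirm.

```python
import re


def extract_first_group(pattern, seq_id):
    """Return the first capturing group of pattern in seq_id, or None if no match."""
    match = re.search(pattern, seq_id)
    return match.group(1) if match else None


# Hypothetical FASTA sequence id (the text after '>' up to the first space).
seq_id = "scaffold_12|len=4532"

# Capture everything before the first '|' as the feature name.
name = extract_first_group(r"^([^|]+)", seq_id)

# A pattern that does not match yields None in this sketch.
missing = extract_first_group(r"^contig_(\d+)", seq_id)

print(name, missing)
```

Checking patterns this way against a few real ids from the file helps catch an expression that silently captures the wrong part of the header.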
-
load_featureprops(tab_file, analysis_id, organism_id, prop_type, feature_type=None, match_on_name=False)
Load feature properties from a tabular file (Column1: feature name or uniquename, Column2: property value)
Parameters: - tab_file (str) – Path to the tabular file to load
- analysis_id (int) – Analysis ID
- organism_id (int) – Organism ID
- prop_type (str) – Type of the feature property (cvterm will be created if it doesn’t exist)
- feature_type (str) – Type of the target features in sequence ontology (will speed up loading if specified)
- match_on_name (bool) – Match features using their name instead of their uniquename
Return type: dict
Returns: Number of inserted featureprop
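
The two-column layout described above (feature identifier, then property value) could be produced as follows. This is a sketch under assumptions: the feature identifiers and values are invented, and whether the loader tolerates a header row or extra columns is not stated here, so none are written.

```python
import csv
import os
import tempfile

# Hypothetical (uniquename, property value) pairs.
props = [
    ("geneA-1", "hypothetical protein"),
    ("geneB-1", "kinase"),
]

# Write a tab-separated file with exactly two columns and no header.
fd, tab_path = tempfile.mkstemp(suffix=".tsv")
with os.fdopen(fd, "w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerows(props)

# Read it back to confirm the layout before loading it.
with open(tab_path, newline="") as handle:
    rows_read = list(csv.reader(handle, delimiter="\t"))

print(rows_read)
```

The resulting `tab_path` is the kind of value that would be passed as `tab_file`, together with the analysis, organism and property type.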
-
load_gff(gff, analysis_id, organism_id, landmark_type=None, re_protein=None, re_protein_capture='^(.*?)$', fasta=None, no_seq_compute=False, quiet=False, add_only=False, protein_id_attr=None)
Load features from a gff file
Parameters: - gff (str) – Path to the GFF file to load
- analysis_id (int) – Analysis ID
- organism_id (int) – Organism ID
- landmark_type (str) – Type of the landmarks (will speed up loading if provided, e.g. contig, should be a term of the Sequence ontology)
- re_protein (str) – Replacement string for the protein name, using the capturing groups defined by re_protein_capture
- re_protein_capture (str) – Regular expression whose capturing groups, applied to the mRNA name, are used in re_protein (e.g. "^(.*?)-R([A-Z]+)$", default="^(.*?)$")
- protein_id_attr (str) – Attribute containing the protein uniquename. It is looked up at the mRNA level and, if not found there, at the CDS level.
- fasta (str) – Path to a Fasta containing sequences for some features. When creating a feature, if its sequence is in this fasta file it will be loaded. Otherwise for mRNA and polypeptides it will be computed from the genome sequence (if available), otherwise it will be left empty.
- no_seq_compute (bool) – Disable the computation of mRNA and polypeptides sequences based on genome sequence and positions.
- quiet (bool) – Hide progress information
- add_only (bool) – Use this flag if you’re not updating existing features, but only adding new features to the selected analysis and organism. It will speed up loading and reduce memory usage, but might produce errors if some features already exist.
Return type: None
Returns: None
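
How `re_protein` and `re_protein_capture` work together can be previewed with `re.sub`. This sketch assumes the replacement string uses Python backreference syntax (`\1`, `\2`) and that the substitution is applied to the mRNA name; the helper `protein_name` and the sample name are hypothetical, and the library's internal implementation may differ.

```python
import re


def protein_name(mrna_name, re_protein, re_protein_capture=r"^(.*?)$"):
    """Derive a protein name from an mRNA name by regex substitution."""
    return re.sub(re_protein_capture, re_protein, mrna_name)


# Capture the gene part and the isoform letter, then rebuild with '-P'.
derived = protein_name("GENE0001-RA", r"\1-P\2", r"^(.*?)-R([A-Z]+)$")

# With the default capture, group 1 is the whole mRNA name.
suffixed = protein_name("GENE0001-RA", r"\1-protein")

print(derived, suffixed)
```

Previewing a handful of mRNA names from the GFF this way shows whether the derived protein names line up with the ids in the protein FASTA.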
-
load_go(input, organism_id, analysis_id, query_type='polypeptide', match_on_name=False, name_column=2, go_column=5, re_name=None, skip_missing=False)
Load GO annotation from a tabular file
Parameters: - input (str) – Path to the input tabular file to load
- organism_id (int) – Organism ID
- analysis_id (int) – Analysis ID
- query_type (str) – The feature type (e.g. ‘gene’, ‘mRNA’, ‘polypeptide’, ‘contig’) of the query. It must be a valid Sequence Ontology term.
- match_on_name (bool) – Match features using their name instead of their uniquename
- name_column (int) – Column containing the feature identifiers (2, 3, 10 or 11; default=2).
- go_column (int) – Column containing the GO id (default=5).
- re_name (str) – Regular expression to extract the feature name from the input file (first capturing group will be used).
- skip_missing (bool) – Skip lines with unknown features or GO id instead of aborting everything.
Return type: dict
Returns: Number of inserted GO terms
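
The defaults `name_column=2` and `go_column=5` are consistent with 1-based column numbering in a GAF-style file, where column 2 holds the object id and column 5 the GO id. The sketch below illustrates that reading; the 1-based interpretation is an assumption, and both the helper `pick_columns` and the sample line are hypothetical.

```python
def pick_columns(line, name_column=2, go_column=5):
    """Return (feature identifier, GO id) from one tab-separated line.

    Column numbers are treated as 1-based, as assumed in the text above.
    """
    fields = line.rstrip("\n").split("\t")
    return fields[name_column - 1], fields[go_column - 1]


# Hypothetical annotation line with seven tab-separated columns.
line = "UniProtKB\tP12345\tGENE1\t\tGO:0005524\tPMID:1\tIEA\n"

by_id = pick_columns(line)
by_symbol = pick_columns(line, name_column=3)

print(by_id, by_symbol)
```

If the identifiers in the chosen column carry a prefix or suffix, `re_name` can be used to extract the part that matches the feature name or uniquename.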
-