Chado Library

A Python library for interacting with a Chado database.

Installation

$ pip install chado

# On first use, you'll need to create a config file to connect to the database. Just run:

$ chakin init
Welcome to Chado's Chakin! (茶巾)
PGHOST: xxxx
PGDATABASE: xxxx
PGUSER: xxxx
PGPASS:
PGPORT: 5432
PGSCHEMA: public

This creates a chakin config file at ~/.chakin.yml.
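In Python, `ChadoInstance` takes the connection settings as keyword arguments (`dbhost`, `dbname`, `dbuser`, `dbpass`, `dbschema`, `dbport`). If you prefer to keep credentials out of your scripts, the following minimal sketch builds those arguments from environment variables. It assumes a hypothetical convention where the variables are named like the `chakin init` prompts (libpq's own password variable is `PGPASSWORD`, not `PGPASS`). It does not read `~/.chakin.yml`, and the helper names are illustrative, not part of the chado package.

```python
import os
from typing import Mapping


def chado_kwargs_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Map PG* environment variables to ChadoInstance keyword arguments.

    Hypothetical convention: variable names mirror the `chakin init` prompts.
    Missing variables fall back to the values used in the README's
    ChadoInstance example.
    """
    return {
        "dbhost": env.get("PGHOST", "localhost"),
        "dbname": env.get("PGDATABASE", "chado"),
        "dbuser": env.get("PGUSER", "chado"),
        "dbpass": env.get("PGPASS", "chado"),
        "dbschema": env.get("PGSCHEMA", "public"),
        # Environment values are strings; the port is passed as an int.
        "dbport": int(env.get("PGPORT", "5432")),
    }


def connect_from_env():
    """Open a ChadoInstance using chado_kwargs_from_env() (hypothetical helper)."""
    # Imported lazily so the mapping above works without the package or a database.
    from chado import ChadoInstance
    return ChadoInstance(**chado_kwargs_from_env())
```

Keeping the mapping as a pure function makes it easy to check without a running PostgreSQL server.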

Examples

from chado import ChadoInstance

ci = ChadoInstance(dbhost="localhost", dbname="chado", dbuser="chado",
                   dbpass="chado", dbschema="public", dbport=5432)

# Create the human species (the returned dict includes its organism_id)
human = ci.organism.add_organism(genus="Homo", species="sapiens",
                                 common="Human", abbr="H.sapiens")

# Then display the list of organisms (each organism is a dict)
orgs = ci.organism.get_organisms()

for org in orgs:
    print('{} {}'.format(org['genus'], org['species']))

# Create an analysis
an = ci.analysis.add_analysis(name="My cool analysis",
                              program="Something",
                              programversion="1.0",
                              algorithm="Google",
                              sourcename="src",
                              sourceversion="2.1beta",
                              sourceuri="http://example.org/",
                              date_executed="2018-02-03")

# And load some data for the organism created above
ci.feature.load_fasta(fasta="./test-data/genome.fa", analysis_id=an['analysis_id'], organism_id=human['organism_id'])
ci.feature.load_gff(gff="./test-data/annot.gff", analysis_id=an['analysis_id'], organism_id=human['organism_id'])
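When you need the ID of an organism that is already in the database, rather than one you just created, taking the first element of `get_organisms()` is fragile once several organisms are stored. Here is a minimal sketch that selects the ID by genus and species instead. It assumes each entry is a dict with the keys shown in the CLI output below (`organism_id`, `genus`, `species`). `find_organism_id` is a hypothetical helper, not a chado API.

```python
from typing import Iterable, Mapping, Optional


def find_organism_id(organisms: Iterable[Mapping], genus: str,
                     species: str) -> Optional[int]:
    """Return the organism_id matching genus and species, or None if absent.

    `organisms` is assumed to be a list of dicts shaped like the
    get_organisms() output (keys: organism_id, genus, species, ...).
    """
    for org in organisms:
        if org["genus"] == genus and org["species"] == species:
            return org["organism_id"]
    return None


# Hypothetical usage with a live connection:
# human_id = find_organism_id(ci.organism.get_organisms(), "Homo", "sapiens")
```

Returning `None` rather than raising lets the caller decide whether a missing organism should be created or treated as an error.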

Or with the Chakin client:

$ my_org=$(chakin organism add_organism --species sapiens Homo Human H.sapiens | jq -r '.organism_id')

$ chakin organism get_organisms
[
    {
        "organism_id": 1133,
        "genus": "Homo",
        "species": "sapiens",
        "abbreviation": "H.sapiens",
        "common_name": "Human",
        "comment": null
    }
]

# Create an analysis, then load some data
$ my_analysis=$(chakin analysis add_analysis \
    "My cool analysis" \
    "Something" \
    "v1.0" \
    "src" | jq -r '.analysis_id')


$ chakin feature load_fasta \
    --analysis_id $my_analysis \
    --sequence_type contig \
    ./test-data/genome.fa $my_org
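To script the same steps from Python without backticks or `jq`, here is a minimal sketch that only uses the subcommands, flags, and positional arguments shown in the shell session above. It assumes `chakin` is on `PATH` and that `add_organism` prints a JSON object, as implied by the `jq -r '.organism_id'` pipe. The helper names (`add_organism_argv`, `load_fasta_argv`, `chakin_json`) are hypothetical.

```python
import json
import subprocess
from typing import List


def add_organism_argv(genus: str, species: str, common: str,
                      abbr: str) -> List[str]:
    """argv for `chakin organism add_organism`, in the order used above."""
    return ["chakin", "organism", "add_organism",
            "--species", species, genus, common, abbr]


def load_fasta_argv(fasta: str, organism_id: int, analysis_id: int,
                    sequence_type: str = "contig") -> List[str]:
    """argv for `chakin feature load_fasta`, mirroring the shell example."""
    return ["chakin", "feature", "load_fasta",
            "--analysis_id", str(analysis_id),
            "--sequence_type", sequence_type,
            fasta, str(organism_id)]


def chakin_json(argv: List[str]):
    """Run a chakin command whose stdout is JSON and parse it (the jq step)."""
    # check=True raises CalledProcessError if chakin exits with an error.
    result = subprocess.run(argv, check=True, stdout=subprocess.PIPE,
                            universal_newlines=True)
    return json.loads(result.stdout)


# Hypothetical usage, equivalent to the shell session:
# org_id = chakin_json(add_organism_argv("Homo", "sapiens", "Human", "H.sapiens"))["organism_id"]
# subprocess.run(load_fasta_argv("./test-data/genome.fa", org_id, analysis_id), check=True)
```

Because the arguments are passed as a list, no shell is involved, so values such as "My cool analysis" need no quoting.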

History

2.3.9
- URL-decode GFF IDs when loading BLAST, InterProScan and other results
2.3.8
- Fix connection closed error when loading big interproscan files
2.3.7
- Fix loading of expression data when first column header is not empty
2.3.6
- Fix loading of GO terms from GFF
2.3.5
- Fix has_table() calls with recent sqlalchemy versions
2.3.4
- Now requires biopython >=1.78
- Fix Biopython sequence usage with recent Biopython versions
2.3.3
- Now requires python >= 3.6
- Better error reporting for blast loader
2.3.2
- Fix interproscan loader only loading the first result of XML v5
- Fix interproscan loader failing to load IPR by name
2.3.1
- Fix data loading in Tripal database
2.3.0
- Fix non-working --re_parent option in FASTA loader
- allow connection using a preformatted url (needed by galaxy tools using pgutil)
- added loading of Blast and InterProScan data
- moved chakin feature load_go to chakin load go
- fix sequence computing when landmark sequence is available in the db
- add more options to match features in expression matrix loader (query_type, match_on_name, re_name, skip_missing)
2.2.6
- fix requirement name for psycopg2 (name change for version >=2.8)
2.2.5
- Added support for units in expression loaders
- Fix error in load_gff when no source is specified
2.2.4
- Fix broken --skip_missing option for load_go
2.2.3
- Throw a warning instead of an exception when a GFF target feature does not exist
2.2.2
- Bug fixes and improvements to the expression module
2.2.1
- Minor release to fix broken package on PyPI, no code change
2.2.0
- Added feature.load_go() to load GO annotation (blast2go results)
- Added feature.get_feature_analyses() to fetch the analyses associated with a feature
- Added feature.get_feature_cvterms() to fetch the cvterms associated with a feature
- Added support for biomaterial/expression data (as used by tripal_analysis_expression)
- New --protein_id_attr option for feature.load_gff()
2.1.5
- bugfix: fix feature deletion when deleting an analysis
2.1.4
- bugfix: fix sporadic errors with AnalysisFeature class declaration
2.1.3
- bugfix: make --species a mandatory arg for organism creation
- bugfix: fix feature deletion when deleting an analysis or an organism
- update chado docker image
2.1.2
- skip whole database schema reflection for simple tasks (analysis and organism management)
- fix polypeptide creation for genes beginning at position 0
- fix various small bugs in phylogeny and featureprop loading
- fix bug in cvterm creation
- fix crashes in gbk/gff exporters
2.1.1
- newick: remove prefix from node labels too
- newick: fix errors with named internal nodes
2.1
- auto reflect db schema
- add phylogeny module
- load features from fasta
- load features from gff3
- load featureprops from tabular file
- make chakin util commands work when db is offline
- add unit tests
2.0
- “Chakin” CLI utility
- Complete package restructure
- Nearly all functions renamed

License

Available under the MIT License