import os
from typing import List
from pprint import pprint
from biokb_taxtree import get_session
from biokb_taxtree.db import models
os.environ.pop("CONNECTION_STR", None) # to make sure no environment variable is used
pass
Query and search data¶
BioKb-TaxTree uses SQLAlchemy to define the database schema for storing chemical compound data from the TaxTree database. The following diagram illustrates the main entities and their relationships:
The central entity in the data model is the Compound, which represents a chemical compound in TaxTree. Other key entities include ChemicalData, Structure, Name, Relation, Reference, DatabaseAccession, Source, Comment, and Status. Each entity has its own set of attributes and relationships with other entities.
Overview¶
You can query the database using SQLAlchemy.
First import the data using the import_data function. You can skip this, if you have already done so. This will download the TaxTree data files, parse them, and populate the database. Depending on your system and internet connection, this may take some time.
from biokb_taxtree import import_data
import_data()
Example Query¶
from sqlalchemy import select
import pandas as pd
with get_session() as session:
stmt = (
select(
models.Node.tax_id,
models.Name.name_txt.label("organism_name"),
)
.select_from(models.Name)
.join(models.Node)
.where(models.Name.name_txt.ilike("%Asteraceae%"))
.limit(3)
)
organism = session.execute(stmt).all()
df = pd.DataFrame(organism)
df