FDA-ARGOS FAQs

From HIVE Lab
What is the ArgosDB and how is it organized?

ArgosDB was developed as a result of expanded funding for the FDA-ARGOS project, which is described in detail on the About page of this website. The database stores cross-kingdom QC attributes of clinically relevant organisms, organized into datasets. The current datasets (as of March 2025) are ngsQC_ARGOS, ngsQC_ARGOS_unreviewed, assemblyQC_ARGOS, assemblyQC_ARGOS_unreviewed, biosampleMeta_ARGOS, and biosampleMeta_ARGOS_unreviewed. The original four key datasets are ngsQC_HIVE, assemblyQC_HIVE, siteQC_HIVE, and biosampleMeta_HIVE. These datasets are associated with core QC protocols, which are documented as BioCompute Objects (BCOs) and organized under their BCO IDs.

When new QC data are produced for an organism of interest, the corresponding datasets are appended, and the change is documented per data release in the FDA-ARGOS GitHub (https://github.com/FDA-ARGOS). All datasets conform to the current data dictionary (v1.6 as of March 2025), which guides the QC process and defines the column headers for each dataset. Datasets are available as .tsv files, or as FASTA files when they are associated with a genome assembly, and all datasets and BCOs are available for download. Data provenance and curation are captured in, and reproducible from, each dataset's BCO. Additional datasets include data from the original FDA BioProject, the Data Dictionary, Drug Resistance Mutations, multiple Genome Assemblies, and a mapping key that helps link the available data through important accessions.

A total of 24 datasets are available as of 03/2025.
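
Because the tabular datasets are distributed as .tsv files, a downloaded file can be inspected with nothing more than the Python standard library. The sketch below is illustrative only: the column names and values in SAMPLE_TSV are hypothetical placeholders, not taken from the actual data dictionary.

```python
import csv
import io

# Hypothetical excerpt of a dataset file. The column names and values are
# illustrative placeholders, not the real ARGOS column headers.
SAMPLE_TSV = (
    "organism_name\tgenome_assembly_id\tschema_version\n"
    "Example organism A\tASSEMBLY_0001\tv1.6\n"
    "Example organism B\tASSEMBLY_0002\tv1.6\n"
)

def read_tsv(handle):
    """Return the header list and the rows (as dicts) of a tab-separated file."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    return reader.fieldnames, rows

header, rows = read_tsv(io.StringIO(SAMPLE_TSV))
print(header)     # column headers, as defined by the data dictionary
print(len(rows))  # number of data rows
```

For a real download, the same function can be given a file handle opened with open(path, newline="", encoding="utf-8") in place of the in-memory sample.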

How can I view or access the previous versions of the data?

First, go to the Release History tab on the ARGOS home page. Next, click details on the desired data object. Then select the desired version and data from the version transition dropdown in the top left corner to view metrics such as field count, fields added, fields removed, row count, row count prev, row count change, ID count, IDs added, and IDs removed.
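
If two versions of a dataset have been downloaded, comparable metrics can be recomputed locally. The following is a minimal sketch of one way to do so; it is not necessarily how the site derives its numbers. The function name, the "id" field, and the sample rows are assumptions made for illustration.

```python
def transition_metrics(prev_rows, curr_rows, id_field):
    """Compare two versions of a table given as lists of dicts."""
    # Fields are taken from the first row of each version (empty if no rows).
    prev_fields = set(prev_rows[0]) if prev_rows else set()
    curr_fields = set(curr_rows[0]) if curr_rows else set()
    prev_ids = {row[id_field] for row in prev_rows}
    curr_ids = {row[id_field] for row in curr_rows}
    return {
        "field_count": len(curr_fields),
        "fields_added": sorted(curr_fields - prev_fields),
        "fields_removed": sorted(prev_fields - curr_fields),
        "row_count": len(curr_rows),
        "row_count_prev": len(prev_rows),
        "row_count_change": len(curr_rows) - len(prev_rows),
        "id_count": len(curr_ids),
        "ids_added": sorted(curr_ids - prev_ids),
        "ids_removed": sorted(prev_ids - curr_ids),
    }

# Hypothetical rows standing in for two releases of the same dataset.
previous = [{"id": "A", "x": "1"}, {"id": "B", "x": "2"}]
current = [
    {"id": "B", "x": "2", "y": "n"},
    {"id": "C", "x": "3", "y": "m"},
    {"id": "D", "x": "4", "y": "k"},
]
metrics = transition_metrics(previous, current, id_field="id")
print(metrics)
```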

What does the schema version in the datasets refer to?

The schema version refers to how the ARGOS data are organized within the data model. It reflects the FDA-ARGOS data dictionary version currently applied to all updated datasets. As of March 2025, the current schema is v1.6, which can be found in the FDA-ARGOS GitHub.
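
As a rough illustration, a downloaded table could be checked for rows that do not carry the current schema version. This sketch assumes the version is stored in a column named schema_version; that column name and the sample rows are hypothetical.

```python
from collections import Counter

EXPECTED_SCHEMA = "v1.6"  # current schema version as of March 2025

def schema_version_counts(rows, column="schema_version"):
    """Count rows per schema version; the column name is an assumption."""
    return Counter(row.get(column, "") for row in rows)

# Hypothetical rows; a real table would be loaded from a downloaded file.
sample_rows = [
    {"schema_version": "v1.6"},
    {"schema_version": "v1.6"},
    {"schema_version": "v1.5"},
]
counts = schema_version_counts(sample_rows)
# Rows whose version differs from the expected one.
outdated = sum(n for version, n in counts.items() if version != EXPECTED_SCHEMA)
print(counts, outdated)
```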

Does ARGOS have a tutorial on how to use the site?

Yes! The basic instructions below explain how to navigate the database:

How to find and search a dataset

On the data.argosdb.org home page, you can search for a dataset by entering a keyword in Search Datasets.

Keywords can be a BCO ID, an organism name, or even a term that describes a biological process. In the following example, three results appear for the search term ebola.

To narrow the results further, select filters in the left sidebar. Alternatively, you can find datasets without a keyword by selecting the relevant filters in the left sidebar.

data.argosdb Home Page
'ebola' searched in the search bar of the ARGOS database
How to select a dataset

Next, to select a dataset, click view details under DETAILS. Previously released versions of the dataset are available from the dropdown button.

View of an example dataset after clicking the '...view details' link on the home page. The dropdown menu at the top lets you select data versions.
How to view and download the BCO corresponding to a dataset

To download the dataset, click the DOWNLOADS tab and select the download format for the target dataset. Clicking Download BCO downloads the BCO JSON, which opens automatically as a .txt file. Clicking Download dataset file downloads the dataset, which opens automatically as either a .tsv or .csv file.

BCO JSON tab of the dataset.
Downloads tab for the dataset. BCO and table can be downloaded here.
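
Once a BCO has been downloaded, its JSON can be read programmatically. The sketch below assumes the file follows the top-level layout of the BioCompute Object standard (IEEE 2791-2020), with an object_id and a provenance_domain containing name and version. The sample content, including the example.org identifier, is hypothetical and does not come from a real ARGOS BCO.

```python
import json

# Hypothetical, heavily trimmed BCO. A real download contains more domains
# and real identifiers; this layout is assumed from IEEE 2791-2020.
SAMPLE_BCO = """{
  "object_id": "https://example.org/BCO_000000/1.0",
  "provenance_domain": {"name": "Example QC pipeline", "version": "1.0"}
}"""

def summarize_bco(text):
    """Extract the identifier, name, and version from BCO JSON text."""
    bco = json.loads(text)
    provenance = bco.get("provenance_domain", {})
    return {
        "object_id": bco.get("object_id"),
        "name": provenance.get("name"),
        "version": provenance.get("version"),
    }

summary = summarize_bco(SAMPLE_BCO)
print(summary)
```

Since the download opens as a .txt file, its contents can be passed to summarize_bco after reading the file as text; fields missing from a given BCO are returned as None rather than raising an error.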