Linux Route

Retrieving sequencing data on Linux requires finding the accession numbers and setting up a suitable environment. If you’re familiar with the command line, you can use the SRA Toolkit: after identifying the accession numbers of the chosen FASTQ files, use commands such as prefetch and fastq-dump to download them directly to your local machine.

Retrieve an SRA accession list

  1. Visit the NCBI Sequence Read Archive (SRA) website (https://www.ncbi.nlm.nih.gov/sra).

  2. In the search bar, input “PRJNA834801” and press Enter.

  3. The search results will display the project page for PRJNA834801. Click on it to access the project details.

  4. Look for information regarding the experimental design, sample annotations, or metadata that differentiate Control and Non-Control groups within the project.

  5. Use the NCBI SRA Run Selector (https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA834801) to filter the runs, then select ten random Control and Non-Control FASTQ files to download with the SRA Toolkit, or download them all if you have the disk space. Use filters such as sample attributes, experimental factors, or other annotations to select the runs that correspond to the Control and Non-Control groups of interest.

  6. Download the list of selected accession numbers for further analysis or processing in your research.
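Step 5 asks for ten randomly chosen runs. If you export the full accession list from the Run Selector, the random draw can be done in the shell. This is a minimal sketch, assuming the exported list holds one run accession (SRR, ERR, or DRR) per line and that GNU coreutils shuf is installed; pick_random_runs is a hypothetical helper name, not part of the SRA Toolkit.

```shell
# Hypothetical helper (not part of the SRA Toolkit): print N random run
# accessions from an accession list with one accession per line.
# Windows line endings and non-accession lines (headers, blanks) are dropped.
pick_random_runs() {
  local list_file="$1"
  local n="${2:-10}"   # default: ten runs, as in step 5
  tr -d '\r' < "$list_file" | grep -E '^[SED]RR[0-9]+$' | shuf -n "$n"
}
```

For example, pick_random_runs SRR_Acc_List.txt 10 > PRJNA834801_accession_subselection.txt writes ten accessions to a new list (SRR_Acc_List.txt is assumed here as the name of the exported file). If you want ten runs from each group, filter the Control and Non-Control runs separately in the Run Selector and run the function once per exported list.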

Once you have selected the FASTQ files of interest from the NCBI Sequence Read Archive (SRA), there are several ways to download them.

Download the accession list

In this example, I use a Linux system on which fastq-dump (part of the SRA Toolkit) and its prerequisites are already set up.

SRA Run Selector Web Interface: Using the NCBI SRA Run Selector interface (https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA834801), you can select the desired runs and then click the “Accession List” button to download a list of their accession numbers. Afterward, use the SRA Toolkit (prefetch and fastq-dump) to download the files.

Provided below is the accession list (list of sequence identifiers) of a randomized selection of sequence runs within the selected project.

Set up SRA Toolkit on Linux

To set up the SRA Toolkit and its prerequisites on Linux, follow these steps:

For Debian/Ubuntu-based systems

sudo apt-get update
sudo apt-get install sra-toolkit

For Red Hat-based systems

sudo yum install sra-toolkit

Install prerequisites

Ensure you have the required dependencies installed. These typically include wget, curl, and libxml-libxml-perl:

For Debian/Ubuntu-based systems

sudo apt-get install wget curl libxml-libxml-perl

For Red Hat-based systems

sudo yum install wget curl perl-XML-LibXML
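The Debian/Ubuntu and Red Hat commands above can be wrapped in one function that picks whichever package manager is present. This is a sketch under assumptions: install_sra_toolkit is a hypothetical name, the package names are the ones used in this guide (with perl-XML-LibXML as the Red Hat-side package for the Perl XML::LibXML module), and whether sra-toolkit is available depends on your configured repositories. The -y flag answers the package manager's confirmation prompt automatically.

```shell
# Hypothetical helper: install the SRA Toolkit and its prerequisites with
# whichever package manager is available. Package names follow this guide;
# their availability in your repositories is an assumption.
install_sra_toolkit() {
  if command -v apt-get >/dev/null 2>&1; then
    # Debian/Ubuntu-based systems
    sudo apt-get update &&
      sudo apt-get install -y sra-toolkit wget curl libxml-libxml-perl
  elif command -v yum >/dev/null 2>&1; then
    # Red Hat-based systems
    sudo yum install -y sra-toolkit wget curl perl-XML-LibXML
  else
    echo "Neither apt-get nor yum found; install the SRA Toolkit manually." >&2
    return 1
  fi
}
```

Run install_sra_toolkit once, then continue with the verification step.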

Verify installation

  1. After installation, verify that the SRA Toolkit is correctly installed by running the command:

fastq-dump --version

This command should display the installed SRA Toolkit version information if the installation was successful.
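fastq-dump --version checks only one tool. The commands in this guide also call prefetch, and the prerequisites include wget and curl, so a check over all of them can catch a missing piece early. check_tools is a hypothetical helper sketched here; it only tests whether each command is on your PATH, not whether it works.

```shell
# Hypothetical helper: report whether each named command is on PATH.
# Prints "found:" lines to stdout and "missing:" lines to stderr;
# returns 0 only if every command was found.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_tools prefetch fastq-dump wget curl && fastq-dump --version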

Once installed, you can use the fastq-dump command to download FASTQ files from the NCBI Sequence Read Archive (SRA) using their accession numbers or URLs.

Download selected accession files

After verifying that the SRA Toolkit is installed, follow these steps to download the selected SRA files using an accession list:

  1. Accession List Preparation:

    • Create a text file containing the accession numbers of the SRA files you want to download. Each accession number should be on a separate line in the file.

  2. Navigate to the Accession List:

    • Open a terminal window and navigate to the directory where you have the accession list text file.

  3. Use SRA Toolkit to Download:

    • Run the prefetch command from the SRA Toolkit, providing the accession list file as input. For example:

prefetch --option-file accession_list.txt

Replace accession_list.txt with the actual filename containing your accession numbers. For example: PRJNA834801_accession_subselection.txt.

  4. Conversion to FASTQ:

    • Once the prefetch command completes, use the fastq-dump command to convert the downloaded SRA files to FASTQ format. For instance:

fastq-dump --split-files SRRXXXXXX

Replace SRRXXXXXX with one of the accession numbers you downloaded, and repeat the command for each accession. With --split-files, the reads of a paired-end run are written to separate files (SRRXXXXXX_1.fastq and SRRXXXXXX_2.fastq). Adjust as necessary for your downloaded files.

These commands will download the selected SRA files based on the accession list you’ve prepared and subsequently convert them to FASTQ format for further analysis or processing in your research.
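The prefetch and fastq-dump steps above can be combined into one function that reads the same accession list. This is a sketch, not an official SRA Toolkit workflow: download_and_convert is a hypothetical name, and it assumes one accession per line and, as in the steps above, that fastq-dump run from the same directory finds the runs prefetch downloaded.

```shell
# Hypothetical helper: download every run in an accession list with prefetch,
# then convert each one to FASTQ with fastq-dump --split-files.
# Assumes one accession per line; blank lines and \r characters are ignored.
download_and_convert() {
  local list_file="$1"
  # Step 3: download all runs named in the list.
  prefetch --option-file "$list_file" || return 1
  # Step 4: convert each run. The loop runs in an explicit subshell, so
  # "exit 1" ends only the loop and becomes the function's return status.
  tr -d '\r' < "$list_file" | (
    while IFS= read -r acc || [ -n "$acc" ]; do
      [ -n "$acc" ] || continue
      echo "Converting $acc" >&2
      fastq-dump --split-files "$acc" || exit 1
    done
  )
}
```

The function stops at the first failed conversion and returns a non-zero status, so a partially processed list is easy to spot.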

Download selected accession files to a specific location

To download the SRA files to a specific location, such as a directory named ‘sra_downloads’ on a mounted D drive, follow these modified steps:

  1. Accession List Preparation:

    • Create a text file containing the accession numbers of the SRA files you want to download. Each accession number should be on a separate line in the file.

  2. Download SRA Files to a Specific Location:

    • Open a terminal window and navigate to the directory where you have the accession list text file.

    • Use the prefetch command from the SRA Toolkit, specifying the target directory using the -O option. For example:

prefetch --option-file accession_list.txt -O /mnt/d/sra_downloads

Replace accession_list.txt with the actual filename containing your accession numbers. The -O flag followed by the directory path /mnt/d/sra_downloads specifies where the downloaded files will be stored; recent versions of prefetch place each run in its own subdirectory (for example, /mnt/d/sra_downloads/SRRXXXXXX/SRRXXXXXX.sra).
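Before converting, you can check that prefetch actually saved a file for every accession in the list. The sketch below is a hypothetical helper (missing_downloads) that prints each accession with no matching .sra file in the download directory. It looks for both DIR/ACC.sra and DIR/ACC/ACC.sra because the layout can differ between toolkit versions; if your toolkit saves files under a different extension, adjust the two tests accordingly.

```shell
# Hypothetical helper: print each accession from the list that has no .sra
# file in OUT_DIR, checking both OUT_DIR/ACC.sra and OUT_DIR/ACC/ACC.sra.
# Assumes one accession per line; blank lines and \r characters are ignored.
missing_downloads() {
  local list_file="$1" out_dir="$2"
  tr -d '\r' < "$list_file" | while IFS= read -r acc || [ -n "$acc" ]; do
    [ -n "$acc" ] || continue
    if [ ! -e "$out_dir/$acc.sra" ] && [ ! -e "$out_dir/$acc/$acc.sra" ]; then
      echo "$acc"
    fi
  done
}
```

For example, missing_downloads accession_list.txt /mnt/d/sra_downloads prints nothing when every run is present; pass any printed accessions to prefetch again.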

Convert SRA files to FASTQ in a specific location

  1. Conversion to FASTQ:

    • After downloading the SRA files, navigate to the directory where they are stored (/mnt/d/sra_downloads in this example).

    • Use the fastq-dump command to convert the downloaded SRA files to FASTQ format. For instance:

fastq-dump --split-files -O /mnt/d/sra_downloads SRRXXXXXX

Replace SRRXXXXXX with the specific accession number obtained from the SRA. This command converts the downloaded SRA file to FASTQ format and writes the resulting FASTQ file(s) to the directory given with -O.

These commands will download the selected SRA files based on the accession list to the specified location on your mounted D drive and convert them to FASTQ format for further analysis. Adjust paths and commands according to your system setup and requirements.
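The download-to-a-directory and conversion steps can be combined in one function. This is a sketch: fetch_to_dir is a hypothetical name, and it assumes one accession per line. It resolves the target directory to an absolute path so that the -O value stays correct after the function changes into that directory to run fastq-dump, mirroring the navigate-then-convert steps above.

```shell
# Hypothetical helper: prefetch every run in an accession list into OUT_DIR,
# then convert each run to FASTQ, writing the FASTQ files to the same OUT_DIR.
fetch_to_dir() {
  local list_file="$1" out_dir="$2"
  mkdir -p "$out_dir" || return 1
  # Absolute path, so -O still points to the right place after the cd below.
  out_dir=$(cd "$out_dir" >/dev/null && pwd) || return 1
  prefetch --option-file "$list_file" -O "$out_dir" || return 1
  tr -d '\r' < "$list_file" | (
    # Subshell: the cd and any "exit 1" do not affect the caller's shell.
    cd "$out_dir" || exit 1
    while IFS= read -r acc || [ -n "$acc" ]; do
      [ -n "$acc" ] || continue
      fastq-dump --split-files -O "$out_dir" "$acc" || exit 1
    done
  )
}
```

For example: fetch_to_dir accession_list.txt /mnt/d/sra_downloads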

Download and convert multiple SRA files at once

To convert multiple downloaded SRA files to FASTQ format within the same fastq-dump command, you can do the following:

  1. Download SRA Files:

    • Use prefetch to download the SRA files to the specified directory on your D drive:

prefetch --option-file accession_list.txt -O /mnt/d/sra_downloads
  2. Conversion to FASTQ for Multiple Files:

    • Navigate to the directory where the downloaded SRA files are stored (/mnt/d/sra_downloads).

    • Use the fastq-dump command with the --split-files option to convert multiple SRA files to FASTQ format in one command:

fastq-dump --split-files *.sra

This command uses the wildcard *.sra to convert every SRA file in the current directory to FASTQ format. If prefetch stored each run in its own subdirectory (SRRXXXXXX/SRRXXXXXX.sra), use the pattern */*.sra instead.

Make sure you are in the directory where your downloaded SRA files are stored before running the wildcard fastq-dump command; it converts all matching SRA files in a single command, processing them one after another. Adjust paths and file extensions according to your specific setup if needed.
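If you prefer to convert the files one at a time, for example to see which file fails, a loop is an alternative to the single wildcard command. This sketch (convert_all_sra is a hypothetical name) matches both DIR/*.sra and the per-run layout DIR/*/*.sra, writes the FASTQ files to DIR with -O, and stops at the first failed conversion.

```shell
# Hypothetical helper: convert every .sra file under DIR to FASTQ, one file
# at a time, writing the FASTQ files to DIR. Stops at the first failure.
convert_all_sra() {
  local dir="$1" f count=0
  for f in "$dir"/*.sra "$dir"/*/*.sra; do
    [ -e "$f" ] || continue   # an unmatched pattern stays literal; skip it
    echo "Converting $f" >&2
    fastq-dump --split-files -O "$dir" "$f" || return 1
    count=$((count + 1))
  done
  echo "Converted $count file(s)"
}
```

For example: convert_all_sra /mnt/d/sra_downloads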

EzBioCloud© 2024. All Rights Reserved