Linux Route
EzBioCloud© 2024. All Rights Reserved
Retrieving sequencing data on Linux requires finding the accession numbers and setting up a suitable environment. If you're familiar with the command line, you can use the SRA Toolkit: after identifying the accession numbers of the chosen runs, use commands such as fastq-dump to download them as FASTQ files directly to your local machine.
Visit the NCBI Sequence Read Archive (SRA) website (https://www.ncbi.nlm.nih.gov/sra).
In the search bar, input "PRJNA834801" and press Enter.
The search results will display the project page for PRJNA834801. Click on it to access the project details.
Look for information regarding the experimental design, sample annotations, or metadata that differentiate Control and Non-Control groups within the project.
Use the SRA Toolkit or the NCBI SRA Run Selector (https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA834801) to filter and download ten random Control and Non-Control FASTQ files or go ahead and download them all if you have the space. Utilize filters such as sample attributes, experimental factors, or other annotations to select the specific files corresponding to the Control and Non-Control groups of interest.
Download the list of selected accession numbers for further analysis or processing in your research.
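If you export the Run Selector metadata table for the project, the random selection described above can be scripted. The sketch below assumes a simple comma-separated file with a header row that contains a Run column and no quoted commas. The file name SraRunTable.csv, the column name group, its values, and the helper pick_random_runs are all hypothetical, so check them against the actual metadata first.

```shell
# Print up to N randomly chosen run accessions whose COLUMN equals VALUE.
# Usage: pick_random_runs TABLE COLUMN VALUE N
pick_random_runs() {
    awk -F',' -v col="$2" -v val="$3" '
        NR == 1 {
            # Locate the Run column and the grouping column by header name.
            for (i = 1; i <= NF; i++) {
                if ($i == "Run") run_idx = i
                if ($i == col)   col_idx = i
            }
            next
        }
        run_idx && col_idx && $col_idx == val { print $run_idx }
    ' "$1" | shuf -n "$4"
}

# Hypothetical usage: ten runs per group, collected into one accession list.
# pick_random_runs SraRunTable.csv group Control 10      > accession_list.txt
# pick_random_runs SraRunTable.csv group Non-Control 10 >> accession_list.txt
```

shuf simply returns every matching run when a group has fewer than N entries, so small groups need no special handling.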
Once you have selected the FASTQ files of interest from the NCBI Sequence Read Archive (SRA), there are several ways to download them.
In this example, I use a Linux system that already has fastq-dump and its prerequisites set up.
SRA Run Selector Web Interface: Using the NCBI SRA Run Selector interface (https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA834801), you can select the desired files and then click on the "Accession List" button to access a list of accession numbers. Afterward, utilize the fastq-dump command from the SRA Toolkit or the NCBI SRA Toolkit web browser to download the files directly.
Provided below is the accession list (list of sequence identifiers) of a randomized selection of sequence runs within the selected project.
To set up the SRA Toolkit and its prerequisites on Linux, follow these steps:
Ensure you have the required dependencies installed. These typically include wget, curl, and libxml-libxml-perl:
For Debian/Ubuntu-based systems
For Red Hat-based systems
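For both distribution families, the prerequisite step might look like the sketch below. The Debian/Ubuntu package names come from the dependency list above; perl-XML-LibXML as the Red Hat counterpart of libxml-libxml-perl, the use of yum, and the helper function names are assumptions. The installer functions are only defined here, so nothing is installed until you call the one that matches your system.

```shell
# Print every tool from the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Debian/Ubuntu: install the prerequisites named in the text.
install_deps_debian() {
    sudo apt-get update
    sudo apt-get install -y wget curl libxml-libxml-perl
}

# Red Hat family: perl-XML-LibXML is assumed to be the matching package.
install_deps_redhat() {
    sudo yum install -y wget curl perl-XML-LibXML
}

# Report what is still missing before choosing an installer.
missing_tools wget curl perl
```

An empty result from missing_tools means the command-line prerequisites are already present and neither installer needs to run.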
After installation, verify that the SRA Toolkit is correctly installed by running the command:
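A guarded form of the check is sketched here, on the assumption that the toolkit's fastq-dump binary is expected on PATH. The helper name tool_on_path is made up for this example.

```shell
# Return success when the named tool is on PATH.
tool_on_path() {
    command -v "$1" >/dev/null 2>&1
}

if tool_on_path fastq-dump; then
    # Print the installed toolkit version.
    fastq-dump --version
else
    echo "fastq-dump not found on PATH; the SRA Toolkit may not be installed." >&2
fi
```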
This command should display the installed SRA Toolkit version information if the installation was successful.
Once installed, you can use the fastq-dump command to download FASTQ files from the NCBI Sequence Read Archive (SRA) using their accession numbers or URLs.
Once the SRA Toolkit installation is verified, follow these steps to download the selected SRA files using an accession list:
Accession List Preparation:
Create a text file containing the accession numbers of the SRA files you want to download. Each accession number should be on a separate line in the file.
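As an illustration, the list can be written with a here-document and sanity-checked before use. The three accessions below are placeholders rather than real runs from PRJNA834801, and the file name and the invalid_accessions helper are assumptions of this sketch.

```shell
# Placeholder accessions; substitute the runs exported from the Run Selector.
cat > accession_list.txt <<'EOF'
SRR0000001
SRR0000002
SRR0000003
EOF

# Print any non-empty line that does not look like a run accession
# (SRR, ERR, or DRR followed by digits). Succeeds only if such lines exist.
invalid_accessions() {
    grep -Ev '^([SED]RR[0-9]+)?$' "$1"
}

if invalid_accessions accession_list.txt; then
    echo "Fix the lines listed above before downloading." >&2
else
    echo "accession_list.txt: $(grep -c . accession_list.txt) accessions look well formed"
fi
```

A list saved with Windows line endings would be flagged by this check, which is useful when the file was prepared outside Linux.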
Download SRA Files:
Open a terminal window and navigate to the directory where you have the accession list text file.
Use SRA Toolkit to Download:
Run the prefetch command from the SRA Toolkit, providing the accession list file as input. For example:
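A minimal sketch of this step, assuming the list is saved as accession_list.txt in the current directory and using prefetch's --option-file flag to pass it in. The PREFETCH variable and the download_runs helper are conveniences invented for this example.

```shell
PREFETCH="${PREFETCH:-prefetch}"      # override if prefetch is not on PATH
ACCESSION_LIST="accession_list.txt"   # file name used in the text

# Download every run named in the list file ($1). --option-file tells
# prefetch to read its arguments, here one accession per line, from a file.
download_runs() {
    "$PREFETCH" --option-file "$1"
}

# Start the download only when both the tool and the list are present.
if command -v "$PREFETCH" >/dev/null 2>&1 && [ -s "$ACCESSION_LIST" ]; then
    download_runs "$ACCESSION_LIST"
else
    echo "Skipping: $PREFETCH or $ACCESSION_LIST not found." >&2
fi
```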
Replace accession_list.txt with the actual filename containing your accession numbers. For example: PRJNA834801_accession_subselection.txt.
Conversion to FASTQ:
Once the prefetch command completes, use the fastq-dump command to convert the downloaded SRA files to FASTQ format. For instance:
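The conversion could be sketched as follows. SRRXXXXXX is the placeholder used in the text, not a real accession, and the FASTQ_DUMP variable and both helper functions are assumptions of this example. Only the functions are defined, so nothing is converted until you call one.

```shell
FASTQ_DUMP="${FASTQ_DUMP:-fastq-dump}"   # override if the tool is not on PATH

# Convert a single run; $1 is an accession such as the placeholder SRRXXXXXX.
convert_run() {
    "$FASTQ_DUMP" "$1"
}

# Convert every accession in a list file ($1), skipping blank lines.
# The list is read on file descriptor 3 so the tool's stdin stays untouched.
convert_list() {
    while IFS= read -r acc <&3 || [ -n "$acc" ]; do
        if [ -n "$acc" ]; then
            convert_run "$acc"
        fi
    done 3< "$1"
}

# Single run, as in the text:      convert_run SRRXXXXXX
# Whole list (assumed file name):  convert_list accession_list.txt
```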
Replace SRRXXXXXX with the specific accession numbers obtained from the SRA. This command will convert the downloaded SRA files to FASTQ format. Adjust as necessary for your downloaded files.
These commands will download the selected SRA files based on the accession list you've prepared and subsequently convert them to FASTQ format for further analysis or processing in your research.
To download the SRA files to a specific location, such as a directory named "sra_downloads" on a mounted D drive, follow these modified steps:
Accession List Preparation:
Create a text file containing the accession numbers of the SRA files you want to download. Each accession number should be on a separate line in the file.
Download SRA Files to a Specific Location:
Open a terminal window and navigate to the directory where you have the accession list text file.
Use the prefetch command from the SRA Toolkit, specifying the target directory using the -O option. For example:
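A sketch of the download with an explicit target directory, assuming the same accession_list.txt as before. The download_runs_to helper and the PREFETCH variable are illustrative names, and creating the directory first with mkdir -p is an added precaution rather than something the text requires.

```shell
PREFETCH="${PREFETCH:-prefetch}"   # override if prefetch is not on PATH

# $1 = accession list file, $2 = directory that should receive the downloads.
download_runs_to() {
    mkdir -p "$2" && "$PREFETCH" --option-file "$1" -O "$2"
}

# Equivalent one-off command for the paths used in the text:
#   prefetch --option-file accession_list.txt -O /mnt/d/sra_downloads
# Or through the helper:
#   download_runs_to accession_list.txt /mnt/d/sra_downloads
```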
Replace accession_list.txt with the actual filename containing your accession numbers. The -O flag followed by the directory path /mnt/d/sra_downloads indicates the specific location where the downloaded files will be stored.
Conversion to FASTQ:
After downloading the SRA files, navigate to the directory where they are stored (/mnt/d/sra_downloads in this example).
Use the fastq-dump command to convert the downloaded SRA files to FASTQ format. For instance:
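One way to write this, sketched with fastq-dump's --outdir option so the FASTQ output lands in a known directory regardless of where the command is run. The convert_run_to helper and the FASTQ_DUMP variable are assumptions of this example, and SRRXXXXXX remains a placeholder.

```shell
FASTQ_DUMP="${FASTQ_DUMP:-fastq-dump}"   # override if the tool is not on PATH

# $1 = accession or path to a .sra file, $2 = directory for the FASTQ output.
convert_run_to() {
    "$FASTQ_DUMP" --outdir "$2" "$1"
}

# For the paths used in the text:
#   cd /mnt/d/sra_downloads
#   convert_run_to SRRXXXXXX /mnt/d/sra_downloads
```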
Replace SRRXXXXXX with the specific accession number obtained from the SRA. This command will convert the downloaded SRA file to FASTQ format and place it in the specified directory.
These commands will download the selected SRA files based on the accession list to the specified location on your mounted D drive and convert them to FASTQ format for further analysis. Adjust paths and commands according to your system setup and requirements.
To convert multiple downloaded SRA files to FASTQ format within the same fastq-dump command, you can do the following:
Download SRA Files:
Use prefetch to download the SRA files to the specified directory on your D drive:
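This is the same prefetch call as in the previous procedure. Before converting in bulk, it can help to confirm that every accession actually produced a .sra file. The check below is a sketch built on that assumption; missing_downloads is a hypothetical helper, and it searches subdirectories as well because some prefetch versions store each run in its own folder.

```shell
# Same download as before:
#   prefetch --option-file accession_list.txt -O /mnt/d/sra_downloads

# Print each accession from the list ($1) that has no matching .sra file
# anywhere under the download directory ($2).
missing_downloads() {
    while IFS= read -r acc || [ -n "$acc" ]; do
        [ -z "$acc" ] && continue
        if [ -z "$(find "$2" -name "$acc.sra" -print 2>/dev/null)" ]; then
            echo "$acc"
        fi
    done < "$1"
}

# Hypothetical usage:
#   missing_downloads accession_list.txt /mnt/d/sra_downloads
```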
Conversion to FASTQ for Multiple Files:
Navigate to the directory where the downloaded SRA files are stored (/mnt/d/sra_downloads).
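Use the fastq-dump command with a wildcard to convert multiple SRA files to FASTQ format in one command. The --split-files option additionally writes each read of a spot to its own file, which separates the two mates of paired-end data: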
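A sketch of the bulk conversion follows. The flat *.sra wildcard matches the layout described in the text; the loop also looks one level deeper, on the assumption that your prefetch version may have placed each run in its own subdirectory. The convert_all helper and the FASTQ_DUMP variable are names invented for this example.

```shell
FASTQ_DUMP="${FASTQ_DUMP:-fastq-dump}"   # override if the tool is not on PATH

# Convert every .sra file found in the directory ($1) or one level below it,
# writing the FASTQ output into $1.
convert_all() {
    for sra in "$1"/*.sra "$1"/*/*.sra; do
        [ -e "$sra" ] || continue      # skip patterns that matched nothing
        "$FASTQ_DUMP" --split-files --outdir "$1" "$sra"
    done
}

# Flat layout, as described in the text:
#   cd /mnt/d/sra_downloads && fastq-dump --split-files *.sra
# Either layout, through the helper:
#   convert_all /mnt/d/sra_downloads
```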
This command utilizes the wildcard *.sra to specify that all SRA files present in the current directory should be converted to FASTQ format.
Before running fastq-dump with the wildcard, make sure you are in the directory that holds the downloaded SRA files, since the wildcard only matches files in the current directory. Adjust paths and file extensions to your specific setup if needed.