Running and Verification
Procedure
- Use PuTTY to log in to the server as the root user.
- Download the case files.
wget http://labshare.cshl.edu/shares/schatzlab/www-data/ectools/w303/Illumina_500bp_2x300_R1.fastq.gz
wget http://labshare.cshl.edu/shares/schatzlab/www-data/ectools/w303/Pacbio.fasta.gz
- Decompress the case files.
gzip -d Pacbio.fasta.gz
gzip -d Illumina_500bp_2x300_R1.fastq.gz
- Process the data.
SelectLongestReads sum 600000000 longest 0 o Illumina_50x.fastq f Illumina_500bp_2x300_R1.fastq

SelectLongestReads sum 260000000 longest 0 o Pacbio_20x.fasta f Pacbio.fasta
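
The output names (Illumina_50x, Pacbio_20x) suggest that the sum values are total-base targets relative to the 12000000 bp genome size passed as GS in the assembly step. The sketch below, which assumes that reading and assumes a four-line-per-record FASTQ layout, counts the bases in each subset and converts them to an approximate coverage. The function names are illustrative and are not part of DBG2OLC.

```shell
# Total bases in a FASTA file: sum the lengths of all non-header lines.
fasta_bases() {
    awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1"
}

# Total bases in a FASTQ file, assuming exactly four lines per record
# (the sequence is the second line of each record).
fastq_bases() {
    awk 'NR % 4 == 2 { n += length($0) } END { print n + 0 }' "$1"
}

# Integer coverage estimate: total bases divided by genome size.
coverage() {
    echo $(( $1 / $2 ))
}

# Example usage after the subsets have been written:
#   coverage "$(fastq_bases Illumina_50x.fastq)" 12000000
#   coverage "$(fasta_bases Pacbio_20x.fasta)" 12000000
```

With the targets used here, 600000000 / 12000000 gives 50, and 260000000 / 12000000 gives roughly 21, which is consistent with the 50x and 20x labels in the file names.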

- Create an Illumina_data directory and copy the generated FASTQ file to the Illumina_data directory.
mkdir Illumina_data && cp Illumina_50x.fastq Illumina_data/
- Create a Pacbio_data directory and copy the generated FASTA file to the Pacbio_data directory.
mkdir Pacbio_data && cp Pacbio_20x.fasta Pacbio_data/
- Create a step1 directory and switch to the directory.
mkdir step1 && cd step1
- Assemble contigs from the Illumina fragment library data.
SparseAssembler LD 0 k 51 g 15 NodeCovTh 1 EdgeCovTh 0 GS 12000000 f ../Illumina_data/Illumina_50x.fastq

SparseAssembler LD 1 NodeCovTh 2 EdgeCovTh 1 k 51 g 15 GS 12000000 f ../Illumina_data/Illumina_50x.fastq
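
As a quick sanity check on the assembly, the contig count, total length, and N50 can be summarized before moving on. This is a minimal sketch that assumes the contigs are written to Contigs.txt in FASTA format (the file name passed to DBG2OLC in the next step). The contig_stats function is a hypothetical helper, not part of SparseAssembler.

```shell
# Summarize a FASTA file: number of records, total bases, and N50.
contig_stats() {
    # First pass: print one length per record, handling multi-line sequences.
    awk '/^>/ { if (len) print len; len = 0; next }
         { len += length($0) }
         END { if (len) print len }' "$1" |
    # Sort lengths from longest to shortest.
    sort -rn |
    # Second pass: N50 is the length at which the running sum of the sorted
    # lengths first reaches half of the total.
    awk '{ lens[NR] = $1; total += $1 }
         END {
             half = total / 2
             for (i = 1; i <= NR; i++) {
                 run += lens[i]
                 if (run >= half) { n50 = lens[i]; break }
             }
             printf "contigs=%d total=%d n50=%d\n", NR, total, n50
         }'
}

# Example usage in the step1 directory:
#   contig_stats Contigs.txt
```

A total length far from the 12000000 bp genome size given as GS may indicate that the thresholds need adjusting.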

The following files are generated:

- Find overlaps between the contigs and the PacBio reads, and perform layout.
DBG2OLC k 17 AdaptiveTh 0.0001 KmerCovTh 2 MinOverlap 20 RemoveChimera 1 Contigs Contigs.txt f ../Pacbio_data/Pacbio_20x.fasta
Information similar to the following is displayed:

The following files are generated:

- Use the Python and shell scripts in the /opt/biosoft/DBG2OLC/utility/ directory to call blasr and the Sparc consensus module to compute the consensus.
- Modify the split_and_run_sparc.sh script.
vi split_and_run_sparc.sh
- Press i to enter the insert mode, comment out line 27, and add line 28.

- Press Esc, type :wq!, and press Enter to save the file and exit.
- Run the following command in the step1 directory:
cp ../Pacbio_20x.fasta .
cat Contigs.txt Pacbio_20x.fasta > ctg_pb.fasta
mkdir consensus_dir
split_and_run_sparc.sh backbone_raw.fasta DBG2OLC_Consensus_info.txt ctg_pb.fasta ./consensus_dir 2 >cns_log.txt
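
The consensus step depends on files produced by the earlier steps, so it can help to confirm that they exist before starting it. The following is a minimal sketch. The require_files function is a hypothetical helper, and the file list in the usage comment is taken from the commands in this step.

```shell
# Return 0 if every named file exists and is non-empty; otherwise report
# each missing or empty file on stderr and return 1.
require_files() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage in the step1 directory before running the consensus step:
#   require_files Contigs.txt backbone_raw.fasta DBG2OLC_Consensus_info.txt \
#       ../Pacbio_20x.fasta || echo "rerun the earlier steps first"
```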

The following files are generated:

