Global Asm4pg assembly report for: pepper_trio

Asm4pg Parameters

  • Hifiasm mode: trio

  • Input files (Hi-C or trio mode):

    • parent1/r1: sample_combined_R1.fastq.gz
    • parent2/r2: sample_combined_R2.fastq.gz
  • Hifiasm purge force : 3

  • Purge_dups executed: No

  • Genome scaffolded: No

  • Busco lineage: eudicots_odb10

  • Genome ploidy: 2

  • K-mer size: 21

Raw data QC

Reads statistics (the statistics tool labels each read as a "contig")

# number of contigs:     11829089
# total contigs length:  201930935177
# mean contig size:      17070.71
# contig size first quartile: 12047
# median contig size:         16879
# contig size third quartile: 21801
# longest contig:             62198
# shortest contig:            45
# contigs > 500 nt:           11828257 (99.99 %)
# contigs > 1K nt:            11827770 (99.99 %)
# contigs > 10K nt:           9780945 (82.69 %)
# contigs > 100K nt:          0 (0.00 %)
# contigs > 1M nt:            0 (0.00 %)
# N50:                   19794
# L50:                   4057061
# N80:                   14260
# L80:                   7619242

QC on final assembly

Assembly statistics

Hap 1

# number of contigs:     225
# total contigs length:  3093706697
# mean contig size:      13749807.54
# contig size first quartile: 17342
# median contig size:         32746
# contig size third quartile: 131209
# longest contig:             270751445
# shortest contig:            1727
# contigs > 500 nt:           225 (100.00 %)
# contigs > 1K nt:            225 (100.00 %)
# contigs > 10K nt:           216 (96.00 %)
# contigs > 100K nt:          60 (26.67 %)
# contigs > 1M nt:            36 (16.00 %)
# N50:                   189955480
# L50:                   7
# N80:                   122568117
# L80:                   13

Hap 2

# number of contigs:     313
# total contigs length:  3104393956
# mean contig size:      9918191.55
# contig size first quartile: 41870
# median contig size:         58748
# contig size third quartile: 101756
# longest contig:             263434072
# shortest contig:            14035
# contigs > 500 nt:           313 (100.00 %)
# contigs > 1K nt:            313 (100.00 %)
# contigs > 10K nt:           313 (100.00 %)
# contigs > 100K nt:          78 (24.92 %)
# contigs > 1M nt:            39 (12.46 %)
# N50:                   228317904
# L50:                   7
# N80:                   122680256
# L80:                   12
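
The N50/L50 and N80/L80 values above follow the usual definition: sort the contigs by decreasing length and report the length (Nx) and rank (Lx) of the contig at which the running sum first reaches 50 % or 80 % of the total assembly length. The following is a minimal Python sketch of that definition, not the code Asm4pg itself runs; the function name `nx_lx` and the toy lengths are hypothetical.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of contig lengths.

    Nx is the length of the contig at which the cumulative sum of
    lengths (largest first) reaches `fraction` of the total; Lx is
    the number of contigs needed to get there.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    threshold = fraction * sum(lengths)
    running = 0
    for rank, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= threshold:
            return length, rank
    # Only reachable through floating-point rounding at fraction == 1.
    return min(lengths), len(lengths)


# Toy example with made-up contig lengths (not taken from this report).
toy = [10, 8, 7, 4, 2]
n50, l50 = nx_lx(toy, 0.5)
n80, l80 = nx_lx(toy, 0.8)
print(n50, l50, n80, l80)
```

Applied to the contig lengths of a haplotype FASTA, the same function should reproduce the N50/L50 and N80/L80 lines of the corresponding block.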

K-mer profiles

[Figures: k-mer profile plots for Hap 1 and Hap 2]

K-mer completeness and error rate

Completeness

Assembly            K-mer set                   K-mers found    K-mers in set   Completeness (%)
pepper_trio_hap1    all                         1747538069      1862114111      93.847
pepper_trio_hap2    all                         1747171574      1862114111      93.8273
both                all                         1849264697      1862114111      99.31
pepper_trio_hap1    pepper_trio_P1-db.hapmer    86647986        86662834        99.9829
pepper_trio_hap1    pepper_trio_P2-db.hapmer    14145           93664470        0.0151018
pepper_trio_hap2    pepper_trio_P1-db.hapmer    16519           86662834        0.0190612
pepper_trio_hap2    pepper_trio_P2-db.hapmer    91915238        93664470        98.1324
both                pepper_trio_P1-db.hapmer    86648522        86662834        99.9835
both                pepper_trio_P2-db.hapmer    91915638        93664470        98.1329

Error rate

Assembly            K-mers only in assembly   Total k-mers in assembly   QV        Error rate
pepper_trio_hap1    4807                      3093702197                 71.3082   7.39906e-08
pepper_trio_hap2    8155                      3104387696                 69.0277   1.25092e-07
Both                12962                     6198089893                 70.018    9.95853e-08
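
The QV and error-rate columns can be recomputed from the two k-mer counts on each row. The sketch below assumes the Merqury-style formula, error = 1 - (1 - asm_only / total)^(1/k) and QV = -10 * log10(error), with k = 21 as listed in the parameters. The function names are hypothetical, and this is a cross-check rather than the pipeline's own code.

```python
import math


def total_kmers(assembly_length, n_contigs, k=21):
    """Number of k-mers in an assembly: each contig of length L has L - k + 1."""
    return assembly_length - n_contigs * (k - 1)


def kmer_qv(asm_only_kmers, total_kmers_in_asm, k=21):
    """Return (QV, per-base error rate) from k-mer counts.

    asm_only_kmers: k-mers seen in the assembly but not in the reads.
    total_kmers_in_asm: all k-mers counted in the assembly.
    """
    if asm_only_kmers == 0:
        return math.inf, 0.0
    # 1 - (1 - x) ** (1 / k), written with log1p/expm1 for numerical accuracy.
    x = asm_only_kmers / total_kmers_in_asm
    error = -math.expm1(math.log1p(-x) / k)
    return -10 * math.log10(error), error


# Hap 1 values taken from the tables in this report.
n_kmers = total_kmers(3093706697, 225)   # assembly length, number of contigs
qv, err = kmer_qv(4807, n_kmers)
print(n_kmers, round(qv, 4), err)
```

The total k-mer count derived from the Hap 1 assembly length and contig count (3093702197) is the same number that appears in the Hap 1 error-rate row, which supports reading that column as the total k-mers in the assembly.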

BUSCO score

Hap 1

# BUSCO version is: 5.7.1 
# The lineage dataset is: eudicots_odb10 (Creation date: 2024-01-08, number of genomes: 31, number of BUSCOs: 2326)
# Summarized benchmarking in BUSCO notation for file /home/lpiat/work/asm_article_benchmark/asm_trio/results/pepper_trio_results/02_final_assembly/hap1/pepper_trio_final_hap1.fasta
# BUSCO was run in mode: euk_genome_min
# Gene predictor used: miniprot

    ***** Results: *****

    C:97.9%[S:94.9%,D:3.0%],F:1.6%,M:0.5%,n:2326,E:3.5%    
    2277    Complete BUSCOs (C) (of which 80 contain internal stop codons)         
    2207    Complete and single-copy BUSCOs (S)    
    70  Complete and duplicated BUSCOs (D)     
    37  Fragmented BUSCOs (F)              
    12  Missing BUSCOs (M)             
    2326    Total BUSCO groups searched        

Dependencies and versions:
    hmmsearch: 3.1
    bbtools: 39.01
    miniprot_index: 0.13-r248
    miniprot_align: 0.13-r248
    python: sys.version_info(major=3, minor=7, micro=12, releaselevel='final', serial=0)
    busco: 5.7.1

Hap 2

# BUSCO version is: 5.7.1 
# The lineage dataset is: eudicots_odb10 (Creation date: 2024-01-08, number of genomes: 31, number of BUSCOs: 2326)
# Summarized benchmarking in BUSCO notation for file /home/lpiat/work/asm_article_benchmark/asm_trio/results/pepper_trio_results/02_final_assembly/hap2/pepper_trio_final_hap2.fasta
# BUSCO was run in mode: euk_genome_min
# Gene predictor used: miniprot

    ***** Results: *****

    C:97.0%[S:94.2%,D:2.8%],F:1.6%,M:1.4%,n:2326,E:3.5%    
    2255    Complete BUSCOs (C) (of which 80 contain internal stop codons)         
    2190    Complete and single-copy BUSCOs (S)    
    65  Complete and duplicated BUSCOs (D)     
    37  Fragmented BUSCOs (F)              
    34  Missing BUSCOs (M)             
    2326    Total BUSCO groups searched        

Dependencies and versions:
    hmmsearch: 3.1
    bbtools: 39.01
    miniprot_index: 0.13-r248
    miniprot_align: 0.13-r248
    python: sys.version_info(major=3, minor=7, micro=12, releaselevel='final', serial=0)
    busco: 5.7.1
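
For comparing haplotypes programmatically, the one-line BUSCO notation (C = complete, S = single-copy, D = duplicated, F = fragmented, M = missing, n = total groups) can be parsed with a regular expression. The sketch below is only an illustration; the pattern and the function name `parse_busco` are assumptions written against the summary lines shown in this report, not part of BUSCO or Asm4pg.

```python
import re

# Matches e.g. "C:97.9%[S:94.9%,D:3.0%],F:1.6%,M:0.5%,n:2326,E:3.5%".
# The trailing E field is ignored here.
BUSCO_LINE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco(line):
    """Return the BUSCO percentages as floats and n as an int."""
    match = BUSCO_LINE.search(line)
    if match is None:
        raise ValueError("not a BUSCO summary line: %r" % line)
    scores = {key: float(match.group(key)) for key in "CSDFM"}
    scores["n"] = int(match.group("n"))
    return scores


hap1 = parse_busco("C:97.9%[S:94.9%,D:3.0%],F:1.6%,M:0.5%,n:2326,E:3.5%")
hap2 = parse_busco("C:97.0%[S:94.2%,D:2.8%],F:1.6%,M:1.4%,n:2326,E:3.5%")
# Difference in completeness between the two haplotypes, in percentage points.
print(round(hap1["C"] - hap2["C"], 1))
```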

Telomeres

Telomeres present in assembly

Hap 1

##########
225 sequences to analyze for telomeric repeats (TTAGGG/CCCTAA) in file /home/lpiat/work/asm_article_benchmark/asm_trio/results/pepper_trio_results/02_final_assembly/hap1/pepper_trio_final_hap1.fasta
##########

h1tg000002l      Forward (start of sequence)     TAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCT
h1tg000005l      Forward (start of sequence)     AAACCCTAAACCCTAAACCCTAAACCCTGAACCCTAAACCCTAAACCCTA
h1tg000007l      Reverse (end of sequence)   GGTTTAGGGTTTAGGGTTAGGGTTTAGGGATAGGGATTTCAGGGTTTAGG
h1tg000008l      Reverse (end of sequence)   GTTTAGGGTTTAGGTTTAGGTTTAGGGTTTAGGTTTAGGGTTTAGGGTTT
h1tg000009l      Reverse (end of sequence)   GGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGG
h1tg000010l      Forward (start of sequence)     AAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTA
h1tg000016l      Forward (start of sequence)     AAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTA
h1tg000019l      Reverse (end of sequence)   TTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGTTTAGGGTTTAGGGTTT
h1tg000020l      Forward (start of sequence)     CCTAAACCCTAAACCCTAACCCTAAACCCTAAACCCTAAACCCTAAACCC
h1tg000021l      Forward (start of sequence)     ACCCTAACCCTAAACCCTAAAACCTAAACCCTAAACCCTAAACCCTAAAC
h1tg000023l      Forward (start of sequence)     ACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAA
h1tg000026l      Forward (start of sequence)     CTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCC
h1tg000032l      Forward (start of sequence)     CAAACCTAAACCCAATAAAACCCTAACCCTAACCCTAAACCCTAAACCCT

Telomeres found: 13 (9 forward, 4 reverse)

Hap 2

##########
313 sequences to analyze for telomeric repeats (TTAGGG/CCCTAA) in file /home/lpiat/work/asm_article_benchmark/asm_trio/results/pepper_trio_results/02_final_assembly/hap2/pepper_trio_final_hap2.fasta
##########

h2tg000006l      Forward (start of sequence)     CCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCTAAACCC
h2tg000008l      Reverse (end of sequence)   AGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTAG
h2tg000015l      Forward (start of sequence)     CCTAAACCCTAAACCCTAAACCCTAAATCCCGTAAACCCTAACCCTAAAC
h2tg000017l      Forward (start of sequence)     CTAAACCCTAAACCCTAACCTAACCCTAAACCCTAAACCCTAAACCCTAA
h2tg000018l      Forward (start of sequence)     CCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAAC
h2tg000020l      Forward (start of sequence)     CCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACC
h2tg000027l      Reverse (end of sequence)   GGGTTTAGGGTTTAGGGTTTAGGGTTAGGGTTTAGGGTTTAGGGTTTTAG
h2tg000033l      Forward (start of sequence)     CCTAAACCCTGAAATCCCTATCCCTAAACCCTAACCCTAAACCCTAAACC
h2tg000035l      Forward (start of sequence)     CCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAAC
h2tg000037l      Forward (start of sequence)     TAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCT
h2tg000039l      Forward (start of sequence)     CCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACC
h2tg000047l      Forward (start of sequence)     TAACCCTAAACCTAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAA
h2tg000079l      Forward (start of sequence)     AAAACCCTAAACCCTAAACCCTAAACCCTAAAACCCTAAACCTCTAACCC
h2tg000082l      Forward (start of sequence)     CTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCTAAACCCT
h2tg000098l      Forward (start of sequence)     CCCTAAACCCTAACCCTAAATACCCTAAACCCTAACCCCCTAAACCCTAA
h2tg000144l      Forward (start of sequence)     CCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACC
h2tg000169l      Forward (start of sequence)     CCTAAACCTAAACCCTTAACCCTAAACCCTAAACCCTAAACCTCTAAACC
h2tg000188l      Reverse (end of sequence)   GGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGG
h2tg000220l      Forward (start of sequence)     CCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACC

Telomeres found: 19 (16 forward, 3 reverse)
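
The listings above report a contig when its first or last 50 nt are rich in the searched motif (CCCTAA at the start, TTAGGG at the end). A rough Python sketch of that kind of end scan follows; the window size mirrors the 50-nt excerpts shown above, while the `min_hits` threshold and the function name `telomere_ends` are assumptions, since the report does not state the detection criteria actually used.

```python
FORWARD_MOTIF = "CCCTAA"   # expected at the start of a sequence
REVERSE_MOTIF = "TTAGGG"   # expected at the end of a sequence


def telomere_ends(seq, window=50, min_hits=4):
    """Return which ends of `seq` look telomeric.

    Counts non-overlapping motif occurrences in the first and last
    `window` bases. `min_hits` is an illustrative threshold only.
    """
    seq = seq.upper()
    found = []
    if seq[:window].count(FORWARD_MOTIF) >= min_hits:
        found.append("forward")
    if seq[-window:].count(REVERSE_MOTIF) >= min_hits:
        found.append("reverse")
    return found


# 50-nt excerpts copied from the Hap 1 listing (h1tg000010l and h1tg000009l).
start = "AAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTAAACCCTA"
end = "GGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGGGTTTAGG"
filler = "ACGT" * 50   # made-up non-telomeric sequence

print(telomere_ends(start + filler))
print(telomere_ends(filler + end))
print(telomere_ends(start + filler + end))
```

Note that the repeats in these excerpts are 7-mers (CCCTAAA/TTTAGGG); the 6-mer search motif still matches because it is contained in each repeat unit.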

Transposable element analysis

Hap 1

LTR recap

==================================================
file name: tmp_hap.fasta            
sequences:           225
total length: 3093706697 bp  (3093706697 bp excl N/X-runs)
GC level:         34.96 %
bases masked: 1728517665 bp ( 55.87 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
Retroelements       1962721   1728517665 bp   55.87 %
   SINEs:                0            0 bp    0.00 %
   Penelope              0            0 bp    0.00 %
   LINEs:                0            0 bp    0.00 %
    CRE/SLACS            0            0 bp    0.00 %
     L2/CR1/Rex          0            0 bp    0.00 %
     R1/LOA/Jockey       0            0 bp    0.00 %
     R2/R4/NeSL          0            0 bp    0.00 %
     RTE/Bov-B           0            0 bp    0.00 %
     L1/CIN4             0            0 bp    0.00 %
   LTR elements:    1962721   1728517665 bp   55.87 %
     BEL/Pao             0            0 bp    0.00 %
     Ty1/Copia      291155    171016422 bp    5.53 %
     Gypsy/DIRS1    615267    570505650 bp   18.44 %
       Retroviral        0            0 bp    0.00 %

DNA transposons          0            0 bp    0.00 %
   hobo-Activator        0            0 bp    0.00 %
   Tc1-IS630-Pogo        0            0 bp    0.00 %
   En-Spm                0            0 bp    0.00 %
   MuDR-IS905            0            0 bp    0.00 %
   PiggyBac              0            0 bp    0.00 %
   Tourist/Harbinger     0            0 bp    0.00 %
   Other (Mirage,        0            0 bp    0.00 %
    P-element, Transib)

Rolling-circles          0            0 bp    0.00 %

Unclassified:            0            0 bp    0.00 %

Total interspersed repeats:  1728517665 bp   55.87 %


Small RNA:               0            0 bp    0.00 %

Satellites:              0            0 bp    0.00 %
Simple repeats:          0            0 bp    0.00 %
Low complexity:          0            0 bp    0.00 %
==================================================

LAI

Chr From    To  Intact  Total   raw_LAI LAI
whole_genome    1   3093706697  0.0049  0.5541  0.89    6.52

Hap 2

LTR recap

==================================================
file name: tmp_hap.fasta            
sequences:           313
total length: 3104393956 bp  (3104393956 bp excl N/X-runs)
GC level:         34.97 %
bases masked: 1782671732 bp ( 57.42 %)
==================================================
               number of      length   percentage
               elements*    occupied  of sequence
--------------------------------------------------
Retroelements       1963285   1782671732 bp   57.42 %
   SINEs:                0            0 bp    0.00 %
   Penelope              0            0 bp    0.00 %
   LINEs:                0            0 bp    0.00 %
    CRE/SLACS            0            0 bp    0.00 %
     L2/CR1/Rex          0            0 bp    0.00 %
     R1/LOA/Jockey       0            0 bp    0.00 %
     R2/R4/NeSL          0            0 bp    0.00 %
     RTE/Bov-B           0            0 bp    0.00 %
     L1/CIN4             0            0 bp    0.00 %
   LTR elements:    1963285   1782671732 bp   57.42 %
     BEL/Pao             0            0 bp    0.00 %
     Ty1/Copia      214995    130467566 bp    4.20 %
     Gypsy/DIRS1    586277    582670353 bp   18.77 %
       Retroviral        0            0 bp    0.00 %

DNA transposons          0            0 bp    0.00 %
   hobo-Activator        0            0 bp    0.00 %
   Tc1-IS630-Pogo        0            0 bp    0.00 %
   En-Spm                0            0 bp    0.00 %
   MuDR-IS905            0            0 bp    0.00 %
   PiggyBac              0            0 bp    0.00 %
   Tourist/Harbinger     0            0 bp    0.00 %
   Other (Mirage,        0            0 bp    0.00 %
    P-element, Transib)

Rolling-circles          0            0 bp    0.00 %

Unclassified:            0            0 bp    0.00 %

Total interspersed repeats:  1782671732 bp   57.42 %


Small RNA:               0            0 bp    0.00 %

Satellites:              0            0 bp    0.00 %
Simple repeats:          0            0 bp    0.00 %
Low complexity:          0            0 bp    0.00 %
==================================================

LAI

Chr From    To  Intact  Total   raw_LAI LAI
whole_genome    1   3104393956  0.0051  0.5696  0.89    6.52