Skip to content

Enable pwrite for raw FASTQ file output#700

Closed
KimYannn wants to merge 1 commit into
OpenGene:masterfrom
KimYannn:raw-fastq-pwrite-output
Closed

Enable pwrite for raw FASTQ file output#700
KimYannn wants to merge 1 commit into
OpenGene:masterfrom
KimYannn:raw-fastq-pwrite-output

Conversation

@KimYannn

@KimYannn KimYannn commented Jun 6, 2026

Copy link
Copy Markdown
Member

Summary

  • Extend the ordered pwrite writer path to multi-threaded raw FASTQ file output while preserving gzip compression and stdout behavior.
  • Reuse the existing sequence/offset ring ordering so raw and gzip file writers share the direct-write flow.

Test plan

  • make -j INCLUDE_DIRS=/opt/homebrew/include LIBRARY_DIRS=/opt/homebrew/lib
  • ./fastp test
  • ./fastp --version
  • ./fastp -i testdata/R1.fq -o /dev/null
  • SE raw output comparison: 1 thread vs 4 threads identical
  • PE raw output comparison: 1 thread vs 4 threads identical
  • gzip pwrite output validates with gzip -t
  • 10M PE150 benchmark, 4 threads: gz→gz 20.65s vs 20.66s; fq→fq 8.78s vs 8.86s; all outputs identical

💘 Generated with Crush

Reuse the existing ordered pwrite path for multi-threaded non-stdout FASTQ output so raw and gzip file writers share the same direct-write flow.

Constraint: Preserve gzip compression behavior and stdout writer path
Confidence: medium
Scope-risk: narrow

💘 Generated with Crush

Co-Authored-By: Crush <crush@charm.land>
@sfchen

sfchen commented Jun 8, 2026

Copy link
Copy Markdown
Member

Hi @KimYannn I tested this pr, but it didn't work.

My command with a pair of SRA paired-end files.

./fastp -i SRR27128759_1.fastq.gz -I SRR27128759_2.fastq.gz -o o1.fq -O o2.fq -w 16

@KimYannn

Copy link
Copy Markdown
Member Author

Hi @sfchen, thanks for testing. I reproduced your case and you're right — for raw (uncompressed) FASTQ output this PR gives no speedup, so I'd hold off merging it as-is.

Repro (4M PE150 reads, gz input → raw .fq output, -w 16, macOS):

build command time (3 runs) output
master -o o1.fq -O o2.fq -w 16 2.80 / 2.75 / 2.79 s
this PR -o o1.fq -O o2.fq -w 16 3.62 / 2.97 / 3.09 s byte-identical to master

So output is correct (identical to the single-writer-thread path and to -w 1), but pwrite is if anything marginally slower.

Why: for raw output the write stage isn't the bottleneck. A single writer thread doing write() into the OS page cache keeps up with the workers easily — there's no compression to parallelize. This PR moves the same cheap writes onto the worker threads but adds the ordered offset-ring synchronization (each batch waits on the previous seq's published offset before it can pwrite). That sync overhead slightly outweighs the benefit of removing the one writer thread.

pwrite only pays off for .gz output, where it parallelizes the libdeflate compression (CPU-bound) across workers — and that path already exists on master. Extending it to raw output doesn't help.

Suggest we close this PR. If you'd still like the unified write path for code-simplicity reasons (raw + gz sharing one flow) rather than performance, I can rescope and re-benchmark — let me know.

@sfchen

sfchen commented Jun 15, 2026

Copy link
Copy Markdown
Member

ok, closing

@sfchen sfchen closed this Jun 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants