Enable pwrite for raw FASTQ file output by KimYannn · Pull Request #700 · OpenGene/fastp

KimYannn · 2026-06-06T13:42:25Z

Summary

Extend the ordered pwrite writer path to multi-threaded raw FASTQ file output while preserving gzip compression and stdout behavior.
Reuse the existing sequence/offset ring ordering so raw and gzip file writers share the direct-write flow.

Test plan

make -j INCLUDE_DIRS=/opt/homebrew/include LIBRARY_DIRS=/opt/homebrew/lib
./fastp test
./fastp --version
./fastp -i testdata/R1.fq -o /dev/null
SE raw output comparison: 1 thread vs 4 threads identical
PE raw output comparison: 1 thread vs 4 threads identical
gzip pwrite output validates with gzip -t
10M PE150 benchmark, 4 threads: gz→gz 20.65s vs 20.66s; fq→fq 8.78s vs 8.86s; all outputs identical

💘 Generated with Crush

Reuse the existing ordered pwrite path for multi-threaded non-stdout FASTQ output so raw and gzip file writers share the same direct-write flow. Constraint: Preserve gzip compression behavior and stdout writer path Confidence: medium Scope-risk: narrow 💘 Generated with Crush Co-Authored-By: Crush <crush@charm.land>

sfchen · 2026-06-08T06:43:23Z

Hi @KimYannn I tested this pr, but it didn't work.

My command with a pair of SRA paired-end files.

./fastp -i SRR27128759_1.fastq.gz -I SRR27128759_2.fastq.gz -o o1.fq -O o2.fq -w 16

KimYannn · 2026-06-15T05:57:25Z

Hi @sfchen, thanks for testing. I reproduced your case and you're right — for raw (uncompressed) FASTQ output this PR gives no speedup, so I'd hold off merging it as-is.

Repro (4M PE150 reads, gz input → raw .fq output, -w 16, macOS):

build	command	time (3 runs)	output
master	`-o o1.fq -O o2.fq -w 16`	2.80 / 2.75 / 2.79 s	—
this PR	`-o o1.fq -O o2.fq -w 16`	3.62 / 2.97 / 3.09 s	byte-identical to master

So output is correct (identical to the single-writer-thread path and to -w 1), but pwrite is if anything marginally slower.

Why: for raw output the write stage isn't the bottleneck. A single writer thread doing write() into the OS page cache keeps up with the workers easily — there's no compression to parallelize. This PR moves the same cheap writes onto the worker threads but adds the ordered offset-ring synchronization (each batch waits on the previous seq's published offset before it can pwrite). That sync overhead slightly outweighs the benefit of removing the one writer thread.

pwrite only pays off for .gz output, where it parallelizes the libdeflate compression (CPU-bound) across workers — and that path already exists on master. Extending it to raw output doesn't help.

Suggest we close this PR. If you'd still like the unified write path for code-simplicity reasons (raw + gz sharing one flow) rather than performance, I can rescope and re-benchmark — let me know.

sfchen · 2026-06-15T06:54:55Z

ok, closing

sfchen closed this Jun 15, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Enable pwrite for raw FASTQ file output#700

Enable pwrite for raw FASTQ file output#700
KimYannn wants to merge 1 commit into
OpenGene:masterfrom
KimYannn:raw-fastq-pwrite-output

KimYannn commented Jun 6, 2026

Uh oh!

sfchen commented Jun 8, 2026

Uh oh!

KimYannn commented Jun 15, 2026

Uh oh!

sfchen commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

KimYannn commented Jun 6, 2026

Summary

Test plan

Uh oh!

sfchen commented Jun 8, 2026

Uh oh!

KimYannn commented Jun 15, 2026

Uh oh!

sfchen commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants