Splitting peptides is slow. If I recall correctly, we only achieved roughly 2 MB/s. A simple reimplementation in sed was equally slow. It's worth experimenting with faster implementations because this is becoming a bottleneck.
One option might be to stop using regular expressions and instead iterate over the characters of the string to determine the split sites. The current implementation can be kept as a fallback for when a user supplies their own pattern.
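To make the idea concrete, here is a minimal sketch of a character-by-character splitter. It is written in Python purely for illustration and is not the project's code. It assumes the default rule is a tryptic digest (cleave after K or R unless the next residue is P); the function name `split_tryptic` is hypothetical.

```python
from typing import List


def split_tryptic(protein: str) -> List[str]:
    """Split a protein sequence without regular expressions.

    Assumed rule (not confirmed by this issue): cleave after K or R,
    except when the next residue is P.
    """
    peptides = []
    start = 0                 # index where the current peptide begins
    last = len(protein) - 1   # a K/R at the very end produces no extra split
    for i, residue in enumerate(protein):
        if (residue == "K" or residue == "R") and i < last and protein[i + 1] != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # trailing peptide
    return peptides


if __name__ == "__main__":
    print(split_tryptic("AKRPGKLR"))  # ['AK', 'RPGK', 'LR']
```

A single pass like this does no backtracking and no capture-group bookkeeping, which is where the speedup would be expected to come from. Whether it actually beats the regex version would need to be benchmarked in the language the tool is written in.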
Original issue by @bmesuere on Sun Dec 20 2015 at 15:12.