Using Scala to Read Really, Really Large Files – Part 8: Summary


For those readers who like jumping to the last page of a novel to decide whether they want to read it, head down to the next section for the results. If you’d like to dive back into a particular part, head back over to the Table of Contents.

For those readers who’ve read the whole series: thank you!

Table of Results: Normal Operating Conditions

Library                  | Ergonomics | Safety | Performance (mm:ss ± %) | Memory Usage (MB ± %)
Scala Standard Libraries | 😐         | 😕     | 00:36.643 ±  1.91 %     |  916.20 ±  7.59 %
better-files             | 😀         | 😕     | 00:36.818 ±  2.46 %     |  920.19 ±  9.03 %
Akka Streams             | 😐         | 😀     | 00:55.586 ±  2.30 %     | 1434.66 ±  0.42 %
fs2-io                   | 😀         | 😀     | 01:28.475 ±  1.04 %     |  681.90 ±  0.45 %
FS2 (core)               | 😀         | 😀     | 00:41.183 ±  4.43 %     |  949.69 ± 13.13 %
Java Standard Libraries  | 😕         | 😞     | 01:17.751 ±  4.42 %     |  321.51 ±  7.93 %

Table of Results: Memory Constrained Conditions

Library                  | Ergonomics | Safety | Performance (mm:ss ± %) | Memory Usage (MB ± %)
Scala Standard Libraries | 😐         | 😕     | 02:02.973 ±  8.83 %     | 365.64 ± 0.06 %
better-files             | 😀         | 😕     | 02:04.564 ±  3.09 %     | 365.59 ± 0.07 %
Akka Streams             | 😐         | 😀     | 03:51.666 ±  4.68 %     | 367.27 ± 0.56 %
fs2-io                   | 😀         | 😀     | 11:46.379 ± 26.66 %     | 365.86 ± 0.05 %
FS2 (core)               | 😀         | 😀     | 02:24.048 ± 31.37 %     | 365.77 ± 0.07 %
Java Standard Libraries  | 😕         | 😞     | 03:00.161 ± 23.98 %     | 328.89 ± 9.71 %
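
To compare the two tables at a glance, it can help to turn the mm:ss timings into a slowdown factor per library. The following is only a rough sketch in plain Scala: the object name Slowdown and its helpers are hypothetical names made up for this illustration, and the timings are the mean values copied from the tables, with the ± spread ignored.

```scala
object Slowdown {
  // Parse an "mm:ss.SSS" timing, as printed in the tables, into seconds.
  def toSeconds(mmss: String): Double =
    mmss.split(":") match {
      case Array(minutes, seconds) => minutes.toInt * 60 + seconds.toDouble
      case _ => throw new IllegalArgumentException(s"expected mm:ss.SSS, got: $mmss")
    }

  // (library, normal conditions, memory-constrained conditions),
  // mean timings copied from the two tables; the ± percentages are ignored.
  val timings: Seq[(String, String, String)] = Seq(
    ("Scala Standard Libraries", "00:36.643", "02:02.973"),
    ("better-files",             "00:36.818", "02:04.564"),
    ("Akka Streams",             "00:55.586", "03:51.666"),
    ("fs2-io",                   "01:28.475", "11:46.379"),
    ("FS2 (core)",               "00:41.183", "02:24.048"),
    ("Java Standard Libraries",  "01:17.751", "03:00.161")
  )

  // Slowdown factor: constrained run time divided by normal run time.
  def slowdowns: Seq[(String, Double)] =
    timings.map { case (library, normal, constrained) =>
      (library, toSeconds(constrained) / toSeconds(normal))
    }

  def main(args: Array[String]): Unit =
    slowdowns.foreach { case (library, factor) =>
      println(f"$library%-26s $factor%.1fx slower")
    }
}
```

By this measure fs2-io slows down by roughly 8x, Java by roughly 2.3x, and the rest land between about 3.4x and 4.2x. Given the large ± values on several of the memory-constrained timings, these factors should be read as ballpark figures only.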

Recommendations

  • The Scala Standard Library: solid, but there are better options
  • better-files: great for simple one-offs, but doesn’t scale with complexity
  • Akka Streams: when you absolutely must have backpressure, but not worth the speed & memory penalty in most cases
  • fs2-io: keep an eye on this one, but pass for now
  • FS2 (core): a solid default
  • Java: only if there’s no access to a Scala compiler, or memory on the target machine is extremely constrained

A Sanity Check

Even the slowest library (fs2-io) chewed through 18 million records in a 3.4 GB file in under a minute and a half under normal conditions, so the differences probably aren’t worth rewriting code that already exists.
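
For a sense of scale, here is a back-of-the-envelope throughput calculation in Scala. It is a sketch under stated assumptions: "18 million" and "3.4G" are taken as the rounded figures quoted above, the file size is read as 3,400 MB, and the object name Throughput is hypothetical. The two run times are the slowest and fastest means from the normal-conditions table.

```scala
object Throughput {
  val records: Double = 18e6       // "18 million records", a rounded figure from the text
  val fileSizeMb: Double = 3400.0  // "3.4G", read here as 3,400 MB (an assumption)

  def recordsPerSecond(seconds: Double): Double = records / seconds
  def mbPerSecond(seconds: Double): Double = fileSizeMb / seconds

  // Mean run times under normal conditions, in seconds.
  val runs: Seq[(String, Double)] = Seq(
    ("fs2-io (slowest)", 88.475),                   // 01:28.475
    ("Scala Standard Libraries (fastest)", 36.643)  // 00:36.643
  )

  def main(args: Array[String]): Unit =
    runs.foreach { case (library, seconds) =>
      val rps = recordsPerSecond(seconds)
      val mbps = mbPerSecond(seconds)
      println(f"$library%-36s $rps%.0f records/s, $mbps%.1f MB/s")
    }
}
```

Under those assumptions the slowest run still processes a little over 200,000 records per second, and the fastest a little under 500,000.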

A green-field project on the other hand …