Using Scala to Read Really, Really Large Files – Part 6: FS2 (core)


In Part 5 we evaluated FS2, a functional streams library which aims to be safe and simple, using the fs2-io helper module. In this part, we’ll evaluate FS2 again, this time using only the core module.


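object FS2Core extends FileReader {
  override def consume(path: Path): Result =
    // [elided: a stream that emits a managed java.io.BufferedReader for `path`]
      .flatMap {
        Stream.unfold(_) { reader =>
          // readLine() returns null at end of input, so Option(...) ends the unfold
          Option(reader.readLine()).map(_ -> reader)
        }
      }
      .fold(LineMetricsAccumulator.empty)(_ addLine _)
    // [elided: compiling and running the stream to produce the Result]

  override def description: String = "fs2-core"
}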

Ergonomics 😀

Once written, this version is easier to work with than the one using the fs2-io helpers: there are no implicits to keep track of, and relying on the Java BufferedReader simplifies the transformations.

Writing it was a bit of a chore, as the core module is not as well documented as fs2-io. It’s simply not as discoverable. A good example is unfold, which makes sense as the inverse of fold, but only in retrospect: (A, A => Option[(B, A)]) => C[B] is a sensible inverse of (C[B], A, (A, B) => A) => A.
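The relationship between unfold and fold can be sketched with the standard library’s Iterator.unfold (Scala 2.13+), used here as an analog rather than FS2’s own Stream.unfold. The step function returns the next element together with the next seed, or None to stop; the object and method names are made up for illustration.

```scala
import java.io.{BufferedReader, StringReader}

object UnfoldSketch {
  // unfold: grow a sequence from a seed. The step function returns
  // Some((element, nextSeed)) to continue, or None to stop.
  def lines(reader: BufferedReader): Iterator[String] =
    Iterator.unfold(reader) { r =>
      // readLine() returns null at end of input, which Option(...) turns into None
      Option(r.readLine()).map(line => (line, r))
    }

  // fold: collapse the sequence back into a single value.
  def totalLength(reader: BufferedReader): Int =
    lines(reader).foldLeft(0)((acc, line) => acc + line.length)

  def main(args: Array[String]): Unit = {
    val reader = new BufferedReader(new StringReader("alpha\nbeta\ngamma"))
    println(totalLength(reader)) // 5 + 4 + 5 characters
  }
}
```

Here the seed never changes (it is always the same reader), because the reader itself carries the position in the file. That is the same trick the FS2 version relies on.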

Safety 😀

The way the resource is handled is quite nice, and there are quite a few options. An equally valid way to create the initial stream would be to explicitly specify the way to close the resource using Stream.bracket.
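A minimal sketch of the Stream.bracket variant, assuming FS2 3.x with cats-effect 3 (the versions used when this series was written may differ). LineStats is a hypothetical stand-in for the series’ LineMetricsAccumulator, and FS2BracketSketch is not part of the benchmark code.

```scala
import java.nio.file.{Files, Path}

import cats.effect.IO
import cats.effect.unsafe.implicits.global
import fs2.Stream

// Hypothetical stand-in for the series' LineMetricsAccumulator.
final case class LineStats(lines: Long, chars: Long) {
  def addLine(line: String): LineStats = LineStats(lines + 1, chars + line.length)
}

object FS2BracketSketch {
  def consume(path: Path): LineStats =
    Stream
      // Acquire the reader, and state explicitly how it is released.
      .bracket(IO.blocking(Files.newBufferedReader(path)))(reader => IO.blocking(reader.close()))
      .flatMap { reader =>
        // Emit lines until readLine() returns null at end of input.
        Stream.unfold(reader)(r => Option(r.readLine()).map(_ -> r))
      }
      .fold(LineStats(0L, 0L))(_ addLine _)
      // fold emits exactly one value, so lastOrError has something to return.
      .compile
      .lastOrError
      .unsafeRunSync()
}
```

The release action runs when the stream finishes, whether it completes normally or fails partway through the file.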


Otherwise, the same emphasis on safety present in the fs2-io helpers is also present here – or more likely, the emphasis on safety in fs2-core was inherited by fs2-io.


Performance

The performance was very close to the Scala Standard Library and better-files, and faster than Akka Streams – possibly enough to back up their claims that Akka is overcomplicated.

Interestingly enough, there appears to be a trade-off reversal: the fs2-core version is faster with greater variation, and the fs2-io version is slower with minimal variation.

In a pleasant deviation from the norm, fs2-core was one of the few that both provided a nice DSL and didn’t suffer greatly under constrained memory conditions.

library       env    wall clock (mm:ss ± %)   % of best in env   % of best   % of reference   % change from local
Scala StdLib  local  00:36.643 ±  1.91 %      100.00 %           100.00 %     20.34 %            0.00 %
fs2-core      local  00:41.183 ±  4.43 %      112.39 %           112.39 %     22.86 %            0.00 %
Akka Streams  local  00:55.586 ±  2.30 %      151.69 %           151.69 %     30.85 %            0.00 %
Scala StdLib  EC2    02:02.973 ±  8.83 %      100.00 %           335.59 %     68.26 %          235.59 %
fs2-core      EC2    02:24.048 ± 31.37 %      117.14 %           393.10 %     79.96 %          249.77 %
Java StdLib   EC2    03:00.161 ± 23.98 %      146.50 %           491.66 %    100.00 %          131.71 %
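The derived columns appear to be simple ratios of the wall clock values. The sketch below assumes that "% of best in env" is a time divided by the fastest time in the same environment, and that "% change from local" is the relative increase over the same library's local time. The helper names are made up for illustration, and results can differ from the table in the last digit because the table was presumably computed from unrounded data.

```scala
object TableColumns {
  // Convert a "mm:ss.SSS" wall clock string into seconds.
  def seconds(mmss: String): Double = {
    val parts = mmss.split(":")
    parts(0).toInt * 60 + parts(1).toDouble
  }

  // A measurement as a percentage of a baseline.
  def percentOf(value: Double, baseline: Double): Double =
    value / baseline * 100.0

  // Relative increase of a measurement over a baseline, as a percentage.
  def percentChange(value: Double, baseline: Double): Double =
    percentOf(value, baseline) - 100.0

  def main(args: Array[String]): Unit = {
    val stdLibLocal = seconds("00:36.643")
    val fs2Local    = seconds("00:41.183")
    val fs2Ec2      = seconds("02:24.048")

    // fs2-core local as "% of best in env" (Scala StdLib is the best local entry shown)
    println(percentOf(fs2Local, stdLibLocal))
    // fs2-core EC2 as "% change from local"
    println(percentChange(fs2Ec2, fs2Local))
  }
}
```

With the rounded figures from the table, the first value comes out at roughly 112.39 and the second at roughly 249.8, in line with the fs2-core rows.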

Memory Usage

Memory usage is close enough to the Scala Standard Library measurements that, most of the time, the two will be effectively equivalent. The worst case appears to be marginally better than the Akka Streams best case, but the range of variation in memory usage is quite large.

library       env    peak memory used (MB ± %)   % of best in env   % of best   % of reference
Java StdLib   EC2     328.89 ±  9.71 %           100.00 %           102.30 %    100.00 %
Scala StdLib  EC2     365.64 ±  0.06 %           111.17 %           113.73 %    111.17 %
fs2-core      EC2     365.77 ±  0.07 %           111.21 %           113.77 %    111.21 %
Akka Streams  EC2     367.27 ±  0.56 %           111.67 %           114.23 %    111.67 %
Scala StdLib  local   916.20 ±  7.59 %           284.97 %           284.97 %    278.57 %
fs2-core      local   949.69 ± 13.13 %           295.39 %           295.39 %    288.75 %
Akka Streams  local  1434.66 ±  0.42 %           446.23 %           446.23 %    436.21 %


FS2 appears to be well-positioned as a good default choice. It’s not the fastest or the most memory-efficient, but it does strike a good balance between the two.


Up next: a visit to the Land of Nouns
Or: jump right to the summary