Using Scala to Read Really, Really Large Files – Part 6: FS2 (core)


In Part 5 we evaluated FS2, a functional streams library that aims to be safe and simple, using the fs2-io helper module. In this part, we’ll evaluate FS2 using only the core module.

Implementation

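import java.nio.file.{Files, Path}

import cats.effect.{IO, Resource}
import fs2.Stream

// FileReader, Result and LineMetricsAccumulator are defined elsewhere in this series' project.
object FS2Core extends FileReader {
  override def consume(path: Path): Result =
    Stream
      // Acquire the reader as a Resource so it is closed when the stream ends, fails or is cancelled.
      .resource(Resource.fromAutoCloseable(IO(Files.newBufferedReader(path))))
      .flatMap {
        // Emit one line per step; readLine() returns null at end of file, which ends the unfold.
        Stream.unfold(_) { reader =>
          Option(reader.readLine()).map(_ -> reader)
        }
      }
      .compile
      // Fold every line into the running metrics.
      .fold(LineMetricsAccumulator.empty)(_ addLine _)
      .map(_.asResult)
      // On cats-effect 3 this needs an implicit IORuntime in scope.
      .unsafeRunSync()

  override def description: String = "fs2-core"
}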

Ergonomics 😀

Once written, this version is easier to work with than the one using the fs2-io helpers: there are no implicits to keep track of, and relying on the Java BufferedReader simplifies the transformations.

Writing it was a bit of a chore, as the core module is not as well documented as fs2-io. It’s simply not as discoverable. A good example is unfold, which makes sense as the inverse of fold, but only in retrospect: (A, A => Option[(B, A)]) => C[B] is a sensible inverse of (C[B], A, (A, B) => A) => A.
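The fold/unfold symmetry may be easier to see without FS2 in the way. The following is a minimal sketch using `Iterator.unfold` from the Scala 2.13 standard library rather than `Stream.unfold`; the object name, the helper names and the in-memory `StringReader` input are all made up for illustration.

```scala
import java.io.{BufferedReader, StringReader}

object UnfoldSketch {
  // fold: collapse many elements into one accumulated value.
  def countChars(lines: Iterator[String]): Int =
    lines.foldLeft(0)((total, line) => total + line.length)

  // unfold: grow many elements out of one seed value. The step function
  // returns Some((element, nextSeed)) to continue, or None to stop.
  // Option(null) is None, so a null from readLine() ends the iterator.
  def linesOf(reader: BufferedReader): Iterator[String] =
    Iterator.unfold(reader) { r =>
      Option(r.readLine()).map(line => (line, r))
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical input: three lines held in memory instead of a file.
    val reader = new BufferedReader(new StringReader("alpha\nbeta\ngamma"))
    val lines  = linesOf(reader).toList
    println(lines)                      // List(alpha, beta, gamma)
    println(countChars(lines.iterator)) // 14
  }
}
```

Here the seed never changes (it is always the same reader), which is why the implementation above can simply pass `reader` back as the next state.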

Safety 😀

The way the resource is handled is quite nice, and there are quite a few options. An equally valid way to create the initial stream would be to explicitly specify the way to close the resource using Stream.bracket.

// Suspend close() in IO so the side effect runs only when the finalizer is executed.
Stream.bracket(IO(Files.newBufferedReader(path)))(reader => IO(reader.close()))
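To see the release step do its job, here is a small sketch that stops reading early and then checks whether the finalizer ran. It assumes fs2 3.x on cats-effect 3 (hence the `unsafe.implicits.global` import; on cats-effect 2 that import would be dropped), and the object name, the `closed` flag and the in-memory reader are hypothetical stand-ins for a real file.

```scala
import java.io.{BufferedReader, StringReader}
import java.util.concurrent.atomic.AtomicBoolean

import cats.effect.IO
import cats.effect.unsafe.implicits.global // assumption: cats-effect 3
import fs2.Stream

object BracketSketch {
  // Hypothetical flag that records whether the release action has run.
  val closed = new AtomicBoolean(false)

  // Hypothetical in-memory reader standing in for Files.newBufferedReader(path).
  val open: IO[BufferedReader] =
    IO(new BufferedReader(new StringReader("a\nb\nc")))

  val lines: Stream[IO, String] =
    Stream
      .bracket(open)(reader => IO { reader.close(); closed.set(true) })
      .flatMap { reader =>
        Stream.unfold(reader)(r => Option(r.readLine()).map(_ -> r))
      }

  // Take only two of the three lines, so the stream ends before end of input.
  def firstTwo(): List[String] =
    lines.take(2).compile.toList.unsafeRunSync()

  def main(args: Array[String]): Unit = {
    println(firstTwo()) // List(a, b)
    println(closed.get) // true: the release ran even though we stopped early
  }
}
```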

Otherwise, the same emphasis on safety present in the fs2-io helpers is also present here, or more likely, the emphasis on safety in fs2-core was inherited by fs2-io.

Performance

The performance was very close to the Scala Standard Library and better-files, and faster than Akka Streams (41.2 s versus 55.6 s locally), possibly by enough to back up the claim that Akka is overcomplicated.

Interestingly enough, there appears to be a trade-off reversal: the fs2-core version is fast with greater variation, and the fs2-io version is slower with minimal variation.

In a pleasant deviation from the norm, fs2-core was one of the few that both provided a nice DSL and didn’t suffer greatly under constrained memory conditions.

library | env | wall clock (mm:ss ± %) | % of best in env | % of best | % of reference | % change from local
Scala StdLib | local | 00:36.643 ± 1.91 % | 100.00 % | 100.00 % | 20.34 % | 0.00 %
fs2-core | local | 00:41.183 ± 4.43 % | 112.39 % | 112.39 % | 22.86 % | 0.00 %
Akka Streams | local | 00:55.586 ± 2.30 % | 151.69 % | 151.69 % | 30.85 % | 0.00 %
Scala StdLib | EC2 | 02:02.973 ± 8.83 % | 100.00 % | 335.59 % | 68.26 % | 235.59 %
fs2-core | EC2 | 02:24.048 ± 31.37 % | 117.14 % | 393.10 % | 79.96 % | 249.77 %
Java StdLib | EC2 | 03:00.161 ± 23.98 % | 146.50 % | 491.66 % | 100.00 % | 131.71 %
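The derived columns are not defined in the text, but the numbers are consistent with simple ratios: each time divided by the fastest time in the same environment, by the Java StdLib time on EC2 (the row showing 100.00 % of reference), or by the same library's local time. A rough sketch under that assumption, with made-up helper names, reproduces the fs2-core rows to within rounding:

```scala
object DerivedColumns {
  // Convert a "mm:ss.SSS" wall-clock string into seconds.
  def seconds(mmss: String): Double = {
    val Array(m, s) = mmss.split(":")
    m.toInt * 60 + s.toDouble
  }

  // value as a percentage of a baseline (100 means equal).
  def percentOf(value: Double, baseline: Double): Double =
    value / baseline * 100.0

  // relative increase over a baseline (0 means equal).
  def percentChange(value: Double, baseline: Double): Double =
    (value / baseline - 1.0) * 100.0

  // Mean wall-clock times copied from the table.
  val scalaLocal: Double = seconds("00:36.643")
  val fs2Local: Double   = seconds("00:41.183")
  val scalaEc2: Double   = seconds("02:02.973")
  val fs2Ec2: Double     = seconds("02:24.048")
  val javaEc2: Double    = seconds("03:00.161") // assumed to be the reference row

  def main(args: Array[String]): Unit = {
    println(f"fs2-core local, %% of best in env: ${percentOf(fs2Local, scalaLocal)}%.2f")
    println(f"fs2-core EC2,   %% of best in env: ${percentOf(fs2Ec2, scalaEc2)}%.2f")
    println(f"fs2-core EC2,   %% of reference:   ${percentOf(fs2Ec2, javaEc2)}%.2f")
    println(f"fs2-core EC2,   %% change from local: ${percentChange(fs2Ec2, fs2Local)}%.2f")
  }
}
```

Small differences in the last digit are expected, since the table was presumably computed from unrounded measurements.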

Memory Usage

Memory usage is close enough to the Scala Standard Library measurements that, most of the time, they’ll be nicely comparable. The worst case appears to be marginally better than the Akka Streams best case, but the range of variation in memory usage is quite large (± 13.13 % locally).

library | env | peak memory used (MB ± %) | % of best in env | % of best | % of reference
Java StdLib | EC2 | 328.89 ± 9.71 % | 100.00 % | 102.30 % | 100.00 %
Scala StdLib | EC2 | 365.64 ± 0.06 % | 111.17 % | 113.73 % | 111.17 %
fs2-core | EC2 | 365.77 ± 0.07 % | 111.21 % | 113.77 % | 111.21 %
Akka Streams | EC2 | 367.27 ± 0.56 % | 111.67 % | 114.23 % | 111.67 %
Scala StdLib | local | 916.20 ± 7.59 % | 284.97 % | 284.97 % | 278.57 %
fs2-core | local | 949.69 ± 13.13 % | 295.39 % | 295.39 % | 288.75 %
Akka Streams | local | 1434.66 ± 0.42 % | 446.23 % | 446.23 % | 436.21 %
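One way to compare worst and best cases is to treat each ± figure as a symmetric band around the mean. That interpretation is an assumption (the table does not say how the variation was measured), and the names below are invented for the sketch:

```scala
object MemoryBands {
  // A mean with a symmetric +/- percentage band around it (assumed reading of the table).
  final case class Measurement(meanMb: Double, plusMinusPercent: Double) {
    def lower: Double = meanMb * (1.0 - plusMinusPercent / 100.0)
    def upper: Double = meanMb * (1.0 + plusMinusPercent / 100.0)
  }

  // Peak-memory figures copied from the table.
  val fs2Local: Measurement  = Measurement(949.69, 13.13)
  val akkaLocal: Measurement = Measurement(1434.66, 0.42)
  val fs2Ec2: Measurement    = Measurement(365.77, 0.07)
  val akkaEc2: Measurement   = Measurement(367.27, 0.56)

  def main(args: Array[String]): Unit = {
    println(f"local: fs2-core upper ${fs2Local.upper}%.2f MB, Akka Streams lower ${akkaLocal.lower}%.2f MB")
    println(f"EC2:   fs2-core upper ${fs2Ec2.upper}%.2f MB, Akka Streams lower ${akkaEc2.lower}%.2f MB")
  }
}
```

Under that reading, the local fs2-core band stays well below the Akka Streams band, while on EC2 the two bands sit within about a megabyte of each other.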

Conclusion

FS2 appears to be well positioned as a good default choice. It’s neither the fastest nor the most memory-efficient, but it strikes a good balance between the two.

See in git repo
