Using Scala to Read Really, Really Large Files – Part 5: FS2 (with IO helpers)


FS2 takes the stance that the Reactive Streams approach is mutable, unsafe, and generally more complicated than it needs to be. To this end, it aims to provide an expressive, safe, and composable DSL for defining and manipulating streaming I/O.

It is included primarily because it’s part of the Cats ecosystem, so it plays nicely with our other libraries. Because this is a library we’re less familiar with, we evaluated two approaches. This turned out to be a good thing, as there are important differences in usability and performance between the two versions.

Because it’s presented as the default way to do file IO, the first FS2 version we’ll evaluate uses the fs2-io helper module.

Implementation

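// Imports assume fs2 1.x with cats-effect 1.x, where ContextShift is still required.
import java.nio.file.Path
import java.util.concurrent.Executors

import cats.effect.{ContextShift, IO, Resource}
import fs2.{Stream, io, text}

import scala.concurrent.ExecutionContext

// FileReader, Result, and LineMetricsAccumulator come from this series' shared code.
object FS2IO extends FileReader {
  override def consume(path: Path): Result = {
    implicit val executionContext: ExecutionContext = ExecutionContext.global
    implicit val contextShift:     ContextShift[IO] = IO.contextShift(executionContext)

    // Dedicated pool for blocking file reads; shut down when the stream finishes.
    val readingExecutionContext =
      Resource.make(IO(ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(2))))(ec => IO(ec.shutdown()))

    Stream
      .resource(readingExecutionContext)
      .flatMap { blockingEc =>
        io.file
          .readAll[IO](path, blockingEc, 4096) // read in 4096-byte chunks
          .through(text.utf8Decode)            // bytes to strings
          .through(text.lines)                 // strings to lines
      }
      .filter(_.nonEmpty)
      .compile
      .fold(LineMetricsAccumulator.empty)(_ addLine _)
      .map(_.asResult)
      .unsafeRunSync()
  }

  override def description: String = "fs2-io"
}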

Ergonomics 😀

The DSL is quite interesting, and easy enough to work with once you wrap your head around the idea that everything is a Stream. The implicits are tolerable mostly because they don’t have a lifecycle that needs to be explicitly managed.
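To make the "everything is a Stream" idea concrete, here is a minimal sketch of how a managed resource becomes a one-element stream whose cleanup runs when the stream ends. It assumes fs2 with cats-effect 1.x/2.x (as in the implementation above); the `ResourceAsStream` object, the `"pool"` value, and the in-memory log are hypothetical stand-ins for the thread pool, not part of the original code.

```scala
import cats.effect.{IO, Resource}
import fs2.Stream

import scala.collection.mutable.ListBuffer

object ResourceAsStream {
  // Records the order of events so the lifecycle is visible.
  def run(): List[String] = {
    val log = ListBuffer.empty[String]

    // Hypothetical resource: acquisition and release are both plain IO values.
    val pool: Resource[IO, String] =
      Resource.make(IO { log += "acquire"; "pool" })(_ => IO { log += "release"; () })

    Stream
      .resource(pool)                      // a stream that emits the acquired value once
      .flatMap(name => Stream("a", "b").covary[IO].map(line => s"$name:$line"))
      .evalMap(s => IO { log += s; () })   // consume each element
      .compile
      .drain
      .unsafeRunSync()

    log.toList
  }
}
```

The release action runs after the last element has been consumed, without any explicit `try`/`finally` at the call site.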

Safety 😀

The DSL has been structured with a noticeable emphasis on safety. The decoding and handling of the file is represented in the type signature of each stage, so it’s impossible to get to runtime with a missing step.
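As a rough illustration of that point, the sketch below spells out the type of each stage using an in-memory byte stream instead of a file. It assumes the same fs2 1.x/2.x API used in the implementation (`text.utf8Decode`, `text.lines`); the `TypedStages` object and the sample input are made up for the example.

```scala
import cats.effect.IO
import fs2.{Pipe, Stream, text}

object TypedStages {
  // Source: a stream of raw bytes.
  val bytes: Stream[IO, Byte] =
    Stream.emits("first\n\nsecond\n".getBytes("UTF-8").toList).covary[IO]

  // Each stage states what it consumes and what it produces.
  val decode: Pipe[IO, Byte, String]   = text.utf8Decode
  val split:  Pipe[IO, String, String] = text.lines

  val lines: Stream[IO, String] =
    bytes.through(decode).through(split).filter(_.nonEmpty)

  // bytes.through(split) // would not compile: text.lines expects String, not Byte

  def run(): List[String] = lines.compile.toList.unsafeRunSync()
}
```

Skipping the decoding step is a compile error rather than a runtime surprise, because a `Pipe[IO, String, String]` cannot be applied to a `Stream[IO, Byte]`.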

Performance

Unfortunately, the performance was underwhelming, particularly when memory was constrained: fs2-io took 01:28 locally against 00:55 for Akka Streams, and 11:46 on EC2 against 03:51. Hopefully this will improve as the library matures. In the performance arena, it’s not yet able to topple Akka.

| library | env | wall clock (mm:ss ± %) | % of best in env | % of best | % of reference | % change from local |
|---|---|---|---|---|---|---|
| Akka Streams | local | 00:55.586 ± 2.30 % | 151.69 % | 151.69 % | 30.85 % | 0.00 % |
| fs2-io | local | 01:28.475 ± 1.04 % | 241.45 % | 241.45 % | 49.11 % | 0.00 % |
| Java StdLib | EC2 | 03:00.161 ± 23.98 % | 146.50 % | 491.66 % | 100.00 % | 131.71 % |
| Akka Streams | EC2 | 03:51.666 ± 4.68 % | 188.39 % | 632.21 % | 128.59 % | 316.77 % |
| fs2-io | EC2 | 11:46.379 ± 26.66 % | 574.42 % | 1927.69 % | 392.08 % | 698.39 % |
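The relative columns appear to be plain ratios of wall-clock times: "% of reference" divides by the Java StdLib EC2 time, and "% change from local" compares a library's EC2 time with its own local time. The sketch below reproduces two of the fs2-io figures under that reading; the `RelativeTimings` object and its helper names are made up for illustration and are not taken from the benchmark code.

```scala
object RelativeTimings {
  // Parse a "mm:ss.SSS" wall-clock string into seconds.
  def seconds(wallClock: String): Double = {
    val Array(minutes, secs) = wallClock.split(":")
    minutes.toInt * 60 + secs.toDouble
  }

  // Round to two decimal places, as in the tables.
  private def round2(x: Double): Double =
    BigDecimal(x).setScale(2, BigDecimal.RoundingMode.HALF_UP).toDouble

  // A value expressed as a percentage of a baseline.
  def percentOf(value: Double, baseline: Double): Double =
    round2(value / baseline * 100)

  // Relative increase from one value to another, as a percentage.
  def percentChange(from: Double, to: Double): Double =
    round2((to - from) / from * 100)
}
```

For example, `percentOf(seconds("11:46.379"), seconds("03:00.161"))` gives 392.08 and `percentChange(seconds("01:28.475"), seconds("11:46.379"))` gives 698.39, matching the fs2-io EC2 row.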

Memory Usage

Surprisingly, memory usage was the lowest out of all the Scala implementations and one of the most consistent. Unfortunately that only goes so far: the constrained-memory test environment had only about half of what fs2-io used locally (peak usage was about 366 MB on EC2 versus about 682 MB locally), leading to the extremely degraded performance noted in the last section.

| library | env | peak memory used (MB ± %) | % of best in env | % of best | % of reference |
|---|---|---|---|---|---|
| Java StdLib | EC2 | 328.89 ± 9.71 % | 100.00 % | 102.30 % | 100.00 % |
| Scala StdLib | EC2 | 365.64 ± 0.06 % | 111.17 % | 113.73 % | 111.17 % |
| fs2-io | EC2 | 365.86 ± 0.05 % | 111.24 % | 113.80 % | 111.24 % |
| fs2-io | local | 681.90 ± 0.45 % | 212.09 % | 212.09 % | 207.33 % |
| Scala StdLib | local | 916.20 ± 7.59 % | 284.97 % | 284.97 % | 278.57 % |

Conclusion

The IO helpers for FS2 provide an interesting possibility for lightly resource-constrained environments. It’s certainly one to keep an eye on to see if the performance improves as it matures, less on its own merits and more because of what we found out when testing the core fs2 library.

See in git repo

Up next: FS2, take 2