Why Scala?

In January 2013, we engineers at Avention’s Austin office were given an opportunity to build the next generation of the company’s products from scratch. The products involve lots of big data analysis, complex use of search engines, web UI’s, and the ability to scale to a very large existing user base. The company gave the founding members of the Austin engineering team a blank check to bring in the technologies we wanted, the tools we wanted, the methodologies we wanted, and to build the team we wanted. It was a great opportunity.

One of our first decisions was to pick a language for the products. Over the years at various other companies, our team members had built major products in Java, Ruby, Python, Groovy, and even C++. While we could have used any of these languages, we chose instead to go with Scala, a language none of us had ever used. This blog post examines why we chose to go this route.

We evaluated languages on the following criteria:

  • Developer Productivity. We would be working on a tight schedule. It was highly desirable to use a language that would allow us to write code as quickly as possible and get it into production as quickly as possible.
  • Risk of Bugs. One of our prime directives was to do no harm to the existing Avention base of users. Avention had a huge base of users on its legacy products, and those users would be moving over to the new products. The last thing we wanted was to make those users angry by releasing buggy software. The language we picked would have to lend itself to creating high quality code with reduced risk of bugs.
  • Libraries. The language would need good libraries for things like accessing databases, web frameworks, search engines, concurrency, big data analytics, and so on. The greatest language in the world is useless without a strong ecosystem.
  • Hirability. We were going to be recruiting a number of developers for our team in a highly competitive Austin market where good developers are in high demand. It was critical that we be able to hire a number of developers to work in whatever language we picked.

We’ll see below how Scala is unique among the languages we evaluated in that it scores highly across all of these criteria.

Developer Productivity

Some programming languages are more expressive than others, meaning that you can get more done with fewer lines of code. In general, if you can accomplish a task with fewer lines of code, you can complete the task more quickly. This idea that expressivity leads to higher productivity goes back to Fred Brooks’ seminal 1975 book The Mythical Man-Month. Brooks argues that the average number of lines of code a programmer writes per day is constant no matter what language is used. So, if a program takes 10,000 lines in one language and 20,000 lines in another, Brooks asserts that it would take twice as long to write the program in the second language as it would in the first.

Why do expressive languages lead to higher productivity? It’s not that programmers are typing away as fast as they possibly can all day, and that less typing makes them faster. The biggest advantage of expressive languages is that they help keep you in the zone. The more you have to type in to accomplish the next step in the algorithm, the more you get distracted from your train of thought. The less you have to type in, the faster you get back to your train of thought.

Additionally, expressive languages have a couple of other benefits. First, in non-expressive languages important logic sometimes gets hidden in a big mess of boilerplate code. You have to read through a lot of junk to get to the key points in the code. Or worse, you may not notice some of the key logic points in the code at all because they’re so buried. With expressive languages, the key logic is much easier to spot. Second, expressive languages allow you to view more code at once on your monitor without having to scroll around as much.

We programmers all have this sense that some languages lead to higher productivity than others. That’s why we code in modern high-level languages rather than assembly. But could some high-level languages lead to higher productivity than others? We’ll see with some examples below how statically typed languages like Java are less expressive than dynamically typed languages like Python, Ruby, JavaScript, Groovy, and the various Lisps. We’ll use Python as a fairly representative dynamically typed language. We’ll also see how functional languages like Scala stack up.

A Class to Hold a Point

Let’s start by creating a Java class to store the X/Y coordinates for a point:

class Point {
  private int x;
  private int y;

  public Point(int x, int y) {
    setX(x);
    setY(y);
  }

  public int getX() {
    return x;
  }

  public void setX(int x) {
    this.x = x;
  }

  public int getY() {
    return y;
  }

  public void setY(int y) {
    this.y = y;
  }

  @Override
  public boolean equals(Object other) {
    if (other instanceof Point) {
      Point otherPoint = (Point) other;
      return otherPoint.getX() == getX() &&
        otherPoint.getY() == getY();
    } else {
      return false;
    }
  }

  @Override
  public int hashCode() {
    // Hash the field values; calling hashCode() on an array is identity-based.
    return java.util.Arrays.hashCode(new int[] {getX(), getY()});
  }
}

That’s a lot of code for a simple class. Here’s the same thing in Python:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if isinstance(other, Point):
            return other.x == self.x and other.y == self.y
        else:
            return False

    def __hash__(self):
        return hash((self.x, self.y))

That’s a lot better. Unlike in Java, we didn’t need to define the member variables outside of the constructor; setting the member variables inside the constructor is sufficient. We also didn’t need to write getter and setter methods. If at some point in the future, we want to add a constraint that x has to be in the range (-100, 100), we could add a setter in the Python code, and we wouldn’t have to change any other code that uses Point to use this setter. That’s different from Java. In Java, suppose we hadn’t written getters and setters and all of our code just referenced the x value as somePoint.x. If we later wanted to enforce the (-100, 100) range, we’d have to add a setter and modify all of our code to use it. That’s a pain, and that’s why idiomatic Java encourages getters and setters from the beginning.

But imagine for a second what the Java and Python code would look like if we had setters that enforce the range of x. That would be one extra line of logic buried in the middle of all that boilerplate Java code. Someone looking at the code base might not even notice the constraint was there. But in the Python, it would be really obvious that we’ve added a constraint on x.

Here’s the equivalent code in Scala:

final case class Point(var x: Int, var y: Int)

That’s it! One line of code. And this code does everything that the Java and Python examples are doing. We’ve got member variables x and y, a constructor that sets x and y, getter and setter methods, and equals() and hashCode() methods. In the future, we could choose to override the setter for x to enforce the range. All in one simple line of code!
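As a rough sketch of what enforcing the range might look like later, one option is to swap the case class for a regular class with a hand-written setter. The name BoundedPoint and the use of require() are assumptions made for illustration, and a plain class like this no longer gets the generated equals() and hashCode() for free.

```scala
// Hypothetical variant of Point whose x must stay in the open range (-100, 100).
final class BoundedPoint(initialX: Int, var y: Int) {
  private var _x: Int = 0

  // Getter: callers still read the value as point.x.
  def x: Int = _x

  // Setter: callers still write point.x = 5, but the range is now checked.
  def x_=(value: Int): Unit = {
    require(value > -100 && value < 100, s"x must be in (-100, 100), got $value")
    _x = value
  }

  // Route the constructor argument through the setter so it is checked too.
  this.x = initialX
}
```

Code that reads point.x or assigns point.x = 5 keeps compiling unchanged. An out-of-range assignment throws an IllegalArgumentException instead of silently storing the value.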

Building a Map from Words to Sentences

Suppose we have a method words() that tokenizes a sentence into a list of unique words. Suppose we want to build a new method that takes in a list of sentences and returns a map from each word to the sentences containing that word. Here’s the Java code:

import java.util.List;
import java.util.ArrayList;
import java.util.Map;
import java.util.HashMap;

...

public Map<String, List<String>> makeWordMap(
    List<String> sentences) {

  Map<String, List<String>> result =
    new HashMap<String, List<String>>();

  for (String sentence: sentences) {
    for (String word: words(sentence)) {
      List<String> sentencesForWord = result.get(word);
      if (sentencesForWord == null) {
        sentencesForWord = new ArrayList<String>();
        result.put(word, sentencesForWord);
      }
      sentencesForWord.add(sentence);
    }
  }

  return result;
}

...

List<String> sentences = new ArrayList<String>();
sentences.add("This is the first sentence.");
sentences.add("This is the second sentence.");
sentences.add("This is the third sentence.");

Map<String, List<String>> wordMap = makeWordMap(sentences);

A few observations about the Java code:

  • We have to explicitly import the list and map types, even though these types are used in a high percentage of all Java code.
  • When we define variables result and sentences, we have to list the type parameters on both the left side of the equals sign and then again on the right side.
  • Even though Java knows that makeWordMap() returns a map from strings to lists of strings, when we call makeWordMap(), we still have to specify the full type with type parameters when defining our wordMap variable. Shouldn’t Java be smart enough to know the variable’s type?
  • Because the get() method on Java’s Map interface doesn’t allow us to specify a default value, we have to add an if() statement in the middle of makeWordMap() to cover the case when the current word isn’t found in the map.
  • When we build up our test data at the bottom, the ArrayList constructor doesn’t allow us to construct the list with a set of strings. We have to call sentences.add() over and over to actually populate the list.

The Java code is pretty verbose. Let’s see how Python does.

def makeWordMap(sentences):
    result = {}
    for sentence in sentences:
        for word in words(sentence):
            sentencesForWord = result.get(word, [])
            sentencesForWord.append(sentence)
            result[word] = sentencesForWord
    return result

...

sentences = [
    "This is the first sentence.",
    "This is the second sentence.",
    "This is the third sentence."
]

wordMap = makeWordMap(sentences)

Much better. No imports. The code isn’t cluttered up with type parameters. We don’t need the if() statement within makeWordMap() since Python’s dictionary lookup method get() takes a default value. Our test list of sentences is much easier to build; just surround a comma-separated list of strings in brackets and we have a list. The Python is a lot easier to type in, and it’s a lot easier to look at the code and understand what it’s doing.

Finally, here’s the Scala:

def makeWordMap(sentences: List[String]):
    Map[String, List[String]] = {

  val initMap = Map.empty[String, List[String]]

  sentences.foldLeft(initMap) { (map1, sentence) =>
    words(sentence).foldLeft(map1) { (map2, word) =>
      map2 +
        (word -> (sentence :: map2.getOrElse(word, Nil)))
    }
  }
}

...

val sentences = List(
  "This is the first sentence.",
  "This is the second sentence.",
  "This is the third sentence.")

val wordMap = makeWordMap(sentences)

For the Scala code, we’ve switched from an imperative solution with mutable variables to a functional solution with immutable values. The outer foldLeft walks through each sentence, and the inner foldLeft walks through each word in the current sentence. Using getOrElse(), the code looks up the sentences for the current word in the map. If the word isn’t in the map, getOrElse() returns an empty list of sentences (Nil). The current sentence is prepended to that list, and the new list is associated with the current word in a new map, which is passed along to the next step of the fold.

Like the Python code, the Scala is very compact. There are no imports. Only a few types and type parameters are required. We don’t need any if() statements to handle the case where the current word isn’t in the map yet. Building up a list of test data is pretty easy; we just enclose the sample sentences in List() instead of in brackets.
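To try the Scala version end to end, something has to stand in for words(). The sketch below assumes a simple tokenizer that lowercases the sentence and splits on non-word characters; that helper and the WordMapSketch wrapper object are hypothetical and only there to make the example self-contained.

```scala
object WordMapSketch {
  // Hypothetical tokenizer: lowercases, splits on non-word characters,
  // and removes duplicates. The post itself does not define words().
  def words(sentence: String): List[String] =
    sentence.toLowerCase.split("\\W+").filter(_.nonEmpty).distinct.toList

  def makeWordMap(sentences: List[String]): Map[String, List[String]] = {
    val initMap = Map.empty[String, List[String]]
    sentences.foldLeft(initMap) { (map1, sentence) =>
      words(sentence).foldLeft(map1) { (map2, word) =>
        // "+" returns a new map; map2 itself is never modified.
        map2 + (word -> (sentence :: map2.getOrElse(word, Nil)))
      }
    }
  }
}
```

Because each sentence is prepended with ::, a word that appears in several sentences maps to them in reverse order of appearance.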

Plotting Points

Suppose we have a function that generates some points and then calls out to a plotter object to plot those points. We’ll keep the definition of the plotter object abstract so that we can plot to the screen, a printer, a file, or anything else we can dream up. In Java, this means creating a CanPlot interface.

interface CanPlot {
  void plot(Point point);
}

...

void plotSquares(int n, CanPlot plotter) {
  for (int x = 0; x <= n; x++) {
    plotter.plot(new Point(x, x * x));
  }
}

Because Python supports duck typing, the Python implementation doesn’t need an interface. All we need is

def plotSquares(n, plotter):
    for x in range(0, n + 1):
        plotter.plot(Point(x, x * x))

Like Java, Scala needs a formal declaration of what methods are available on the plotter. The only difference from Java is that Scala calls this a trait rather than an interface. So the Scala code is basically the same as the Java:

trait CanPlot {
  def plot(point: Point): Unit
}

def plotSquares(n: Int, plotter: CanPlot): Unit = {
  (0 to n) foreach { x =>
    plotter.plot(Point(x, x * x))
  }
}

In this example, Scala is no better than Java, and Python is the clear winner.
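For completeness, here is a minimal sketch of what one concrete plotter might look like in Scala. RecordingPlotter is a hypothetical implementation that just collects points in memory, standing in for a screen, printer, or file plotter.

```scala
final case class Point(var x: Int, var y: Int)

trait CanPlot {
  def plot(point: Point): Unit
}

// Hypothetical plotter that records points instead of drawing them.
final class RecordingPlotter extends CanPlot {
  private val buffer = scala.collection.mutable.ListBuffer.empty[Point]

  def plot(point: Point): Unit = {
    buffer += point
    ()
  }

  // Snapshot of everything plotted so far, in plotting order.
  def plotted: List[Point] = buffer.toList
}

object PlotSketch {
  def plotSquares(n: Int, plotter: CanPlot): Unit =
    (0 to n).foreach { x => plotter.plot(Point(x, x * x)) }
}
```

Calling PlotSketch.plotSquares(3, plotter) with a RecordingPlotter leaves the points (0, 0), (1, 1), (2, 4), and (3, 9) in plotter.plotted.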

Looking up Songs by an Artist

Let’s look at one last example where Scala is a clear winner over both Java and Python. Suppose we have an Artist object that has a getAlbum() method. Method getAlbum() is passed an album name, and it returns an Album object. The Album object then has a getSong() method that is passed a song name and returns a Song object. We want to write a method that takes an Artist, an album name, and a song name as parameters and returns the corresponding song’s length in seconds. Here’s the Java:

Integer getSongLength(
    Artist artist,
    String albumName,
    String songName) {

  Album album = artist.getAlbum(albumName);
  if (album != null) {
    Song song = album.getSong(songName);
    if (song != null) {
      return song.getLength();
    } else {
      return null;
    }
  } else {
    return null;
  }
}

All those if() statements to check for errors make the code pretty ugly. Just by glancing at the code, it’s hard to immediately see that all we’re really doing is calling getAlbum() and getSong() sequentially. The Python code also suffers from obfuscation due to error checking:

def getSongLength(artist, albumName, songName):
    album = artist.getAlbum(albumName)
    if album:
        song = album.getSong(songName)
        if song:
            return song.getLength()

The Python isn’t quite as bad as the Java because we don’t have to specify the else cases. That’s because Python functions return None if they terminate without hitting a return statement. Regardless, we still have a couple of nested if() statements obfuscating the most important parts of the logic.

Finally, here’s the Scala:

def getSongLength(
    artist: Artist,
    albumName: String,
    songName: String): Option[Int] = {

  for {
    album <- artist.getAlbum(albumName)
    song  <- album.getSong(songName)
  } yield song.getLength
}

We’ve switched all of our methods to return Option’s. That is, artist.getAlbum() returns Some[Album] if the album is found and None otherwise. Similarly, album.getSong() returns Some[Song] if the song is found and None otherwise. That allows getSongLength() to run in a for() comprehension without having to explicitly check any error conditions.

But even though we haven’t explicitly checked any errors, the errors are in fact being checked. That is, if the album can’t be found, getSongLength() returns None. Similarly if the song can’t be found on the album, getSongLength() again returns None. Only if both the album and song are found does getSongLength() return a Some[Int] holding the length.

Without the if() statements for error checking cluttering things up, it’s easy to see the main logic. That is, it’s easy to see that we’re getting an album from the artist, getting a song from the album, and returning the resulting song’s length.
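To make the example runnable, the sketch below fills in a possible shape for Artist, Album, and Song. These case classes, their fields, and the Map-based lookups are assumptions for illustration; the post only says that getAlbum() and getSong() return Options.

```scala
// Hypothetical domain model; the post does not show these classes.
final case class Song(name: String, lengthInSeconds: Int) {
  def getLength: Int = lengthInSeconds
}

final case class Album(name: String, songs: Map[String, Song]) {
  // Map.get already returns Option[Song]: Some(song) or None.
  def getSong(songName: String): Option[Song] = songs.get(songName)
}

final case class Artist(name: String, albums: Map[String, Album]) {
  def getAlbum(albumName: String): Option[Album] = albums.get(albumName)
}

object SongLookup {
  def getSongLength(
      artist: Artist,
      albumName: String,
      songName: String): Option[Int] =
    for {
      album <- artist.getAlbum(albumName) // stops here with None if missing
      song  <- album.getSong(songName)    // stops here with None if missing
    } yield song.getLength
}
```

With an artist holding one album "Debut" that contains a 95-second song "Intro", looking up ("Debut", "Intro") gives Some(95), while a wrong album or song name gives None.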

Bottom line: We’ve looked at four examples. In every case, Java was the least expressive language or tied for that spot. In some of the cases, Scala was the most expressive, while in others Python was the winner. But it’s pretty clear that Scala and Python in general are much more expressive than Java. We’d expect developers using these languages to be much more productive than Java developers. And recall that we used Python as a representative example of all the modern dynamically typed languages. So we would expect a Ruby, Groovy, JavaScript, or Lisp programmer to be pretty productive as well. It’s also worth noting that there’s another functional language called Haskell that scores very well on developer productivity for much the same reasons that Scala does well.

Based on developer productivity, then, we eliminated Java from consideration at Avention. That left Scala, Haskell, Python, Ruby, Groovy, JavaScript, and Lisp as candidates.

Risk of Bugs

Just like some languages lead to greater productivity than others, some languages lead to lower risks of bugs than others. Yes, this is the old debate about unit tests versus compile-time type checking, or more appropriately unit tests alone on the one hand versus compile-time type checking plus unit tests on the other hand. Simply put, when you use statically typed languages like Java, the compiler finds a subset of the bugs for you and forces you to fix them before your program will run. You still need unit tests to find the rest of the bugs, but at least the compiler gives you some help. When you use dynamically typed languages like Python, Ruby, JavaScript, Groovy, or Lisp you get no help from the compiler. You only get run-time errors that are harder to find, even if you invest in thorough unit tests.

Imagine a large code base with hundreds of thousands of lines. Suppose you want to add a parameter to a method or change a method’s return type. How can you be sure you’ve fixed every reference to that method? In Java it’s easy. The code won’t compile until you’ve fixed every reference. Furthermore, the compiler will quickly tell you exactly what spots in your code must be fixed. In a large Python code base, you have to grep through the code hoping you’ve found every instance. Then, all you have as a safety net are your unit tests. They probably take a lot longer to run than the Java compiler. When they fail, tracking down what went wrong is not as easy or fast as going through the compiler output in Java. But most importantly, you never quite know if your unit tests found all the problems created by updating the method or if there still are some hidden problems just waiting for your biggest customer to find.

This debate has gone on for years. Are unit tests sufficient or does a combination of unit tests and compile-time type checking produce code with fewer bugs? In his 2011 Master’s Thesis, Evan Farrer set out to study this question. He started with four open source Python projects that all had extensive unit tests. He ported each of these projects line-for-line over to Haskell. Like Java, Haskell is a compiled, statically typed language. The idea was that by porting the code line-for-line and compiling it, he could see what type errors the Haskell compiler caught that had snuck through the Python unit tests. He uncovered a total of 17 type errors using the Haskell compiler. So for at least these four programs, unit testing alone was not as effective at finding bugs as a combination of unit testing along with compile-time type checking.

So, until functional programming started to catch on, there were proponents of statically typed languages in one corner shouting “lower risk of bugs, lower risk of bugs”, while proponents of dynamically typed languages were in the opposite corner shouting “productivity, productivity”. It didn’t seem possible to get both low risk of bugs and high productivity. You had to choose sides between these two seemingly incompatible values. The internet is littered with debates between the two camps.

But then along came functional programming languages like Scala and Haskell. (Actually Haskell has been around since 1990. But nobody in the mainstream seemed to notice its beauty and power until the last 5 or 10 years.) As we saw in the previous section, these languages are very expressive, leading to high productivity. But, they are also statically typed and provide all of the compile-time type checking of languages like Java. That is, if you use Scala or Haskell, you get high productivity and lower risk of bugs. You get the best of both worlds, and you can sit back and relax with a smug grin on your face while the Java guys continue to fight it out with the Python, Ruby, JavaScript, Groovy, and Lisp guys.

But it’s even better than that. Languages like Scala and Haskell actually do a better job at compile-time type checking than languages like Java. That is, there are classes of errors that Scala and Haskell catch during compilation that languages like Java typically only catch at runtime. Let’s consider null pointer exceptions.

When is the last time you debugged a null pointer exception? What about a null pointer exception in production code? If your experience is anything like mine, NPE’s consume a lot of your time, and odds are you’ve seen one within the last week, if not within the last day.

The idea of null pointers was introduced by Tony Hoare back in 1965 as part of ALGOL W. Years later, in a 2009 presentation at QCon, Hoare called null pointers his billion dollar mistake. A billion dollars is his estimate of the financial impact of NPE’s. I’m guessing that’s a gross underestimate.

To see why NPE’s are so common, consider the following Java type signature from the previous section:

Integer getSongLength(
    Artist artist,
    String albumName,
    String songName)

According to the type signature, the method receives three inputs (an Artist and two String’s), and it returns an Integer. The problem is that the type signature is a big lie. There are actually two possibilities for the return value. The method can return either an Integer or null. But the signature doesn’t tell you anything about the null. How do you know it can return a null? You have to read the code, or maybe someone was nice and alerted you to the null in a comment. Put simply, you just have to know that it can return an Integer or a null. And of course, the operations you can perform on an Integer and a null are quite different.

Let’s look at this another way. Suppose you got confused and thought getSongLength() returned a Song object. If you started using the result like a Song (perhaps calling a Song method), the compiler would give you an error. That’s good. But suppose you get confused and think the return value is an Integer when it’s actually null. You won’t get any help from the compiler at all. But at runtime, you’ll get the dreaded NPE. You’re on your own to know when you have to check for null; the compiler won’t help you one bit.

To make matters worse, it may be that today a particular method does not ever return null. But 6 months in the future, someone changes the method so that it returns null. Or worse, someone changes a method that’s called by a method that’s called by a method that’s called by a method that’s called by this particular method, and that change in some non-obvious way causes it to sometimes return null. Again, you get no help from the compiler, and your code, which used to work just fine, now starts throwing NPE’s in production.

And of course, the type signature above also lies when it comes to method inputs. For getSongLength(), the compiler will happily let us pass a null for the artist, albumName, or songName whether or not the method implementation actually allows null values.

In Scala, the Option type provides a nice alternative to using null. Option is an abstract type that takes a type parameter. I’ll refer to it as Option[A], where A can be a String, an Artist, a Song, or any other type at all. Option[A] has two subtypes: Some[A] and None. The idea is that if a method like getSongLength() doesn’t always return a value, wrap the result in Option rather than returning null. So in Scala, the getSongLength() signature would look like this:

def getSongLength(
    artist: Artist,
    albumName: String,
    songName: String): Option[Int]

The actual return value will be a Some[Int] if the song is found or a None otherwise. Here’s the key point: The signature now includes information that the method may not return a value. The caller is required to be aware of this possibility of not returning a value. For example, if you call getSongLength() and try to use the result like an Int by adding another number to it, the compiler will give you an error that the addition method is undefined on Option’s. You are required to test whether the resulting Option[Int] is actually a Some[Int] or a None. If it’s a Some[Int], you can unpackage it to get the Int out. Only after this unpackaging can you add another number to it.

Trying to add a number directly to the resulting Option[Int] in Scala is analogous to forgetting to check for null in Java. The difference is that the compiler catches the bug in Scala, while the bug isn’t caught until runtime in Java.
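A small sketch of what that unpackaging can look like in practice follows. The findLength() helper and its sample data are hypothetical stand-ins for getSongLength(); pattern matching, map, and getOrElse are standard methods on Option.

```scala
object OptionUsage {
  // Stand-in for a lookup that may fail; the name and data are hypothetical.
  def findLength(songName: String): Option[Int] =
    Map("Intro" -> 95).get(songName)

  // val broken: Int = findLength("Intro") + 10   // rejected by the compiler

  // Pattern matching handles the Some and None cases explicitly.
  def describe(songName: String): String =
    findLength(songName) match {
      case Some(seconds) => s"$songName runs ${seconds + 10} seconds with padding"
      case None          => s"$songName not found"
    }

  // map transforms the value only if it is present.
  def paddedLength(songName: String): Option[Int] =
    findLength(songName).map(_ + 10)

  // getOrElse unpacks the value, supplying a default for the None case.
  def lengthOrZero(songName: String): Int =
    findLength(songName).getOrElse(0)
}
```

In each variant, the arithmetic only happens on a real Int, and the "not found" case has to be dealt with somewhere before the code compiles.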

And of course the same thing holds for input parameters to methods. If a parameter is optional, wrap its type in an Option rather than allowing null’s to be passed in. Bottom line: In Scala, you should establish coding rules early on that you should never use null under any circumstances. Use Option instead, and all of your NPE’s will disappear.
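Here is a minimal sketch of both halves of that rule: an optional input declared as an Option, and a nullable value from a Java API wrapped at the boundary. The greet() and lookup() functions are hypothetical; Option(...) itself is the standard factory that turns null into None.

```scala
import java.util.{HashMap => JHashMap}

object OptionalInputs {
  // An optional parameter is declared as Option[String] rather than
  // accepting null. The greeting format is purely illustrative.
  def greet(name: String, title: Option[String]): String =
    title match {
      case Some(t) => s"Hello, $t $name"
      case None    => s"Hello, $name"
    }

  // Java APIs can still hand back null. Option(...) turns null into None
  // and any other value into Some(value).
  def lookup(javaMap: JHashMap[String, String], key: String): Option[String] =
    Option(javaMap.get(key))
}
```

Wrapping Java results this way keeps null from leaking past the point where the Java library is called.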

Haskell takes this one step further. While avoiding null is only a convention in Scala, Haskell enforces it: the language literally has no concept of null. In Haskell, if a function is declared as returning an object of type Foo, you can be certain that it’s returning a Foo and nothing else. No possibility of lying. So in this sense, Haskell provides even better compile-time type checking than Scala. In practice, though, using Option as a convention in Scala seems to work quite well.

It’s worth noting that types like Option, Some, and None could be written in Java. But using these types instead of null in Java would be cumbersome. Scala and Haskell give you all sorts of tools like map, flatMap, filter, and for() comprehensions that make it easy to work with Option’s. As an example, refer back to the Scala getSongLength() example from the previous section. In that example, a for() comprehension hides all the details of checking for Some versus None.
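To show how these tools relate, the sketch below writes the same logic twice: once as a for() comprehension with a guard, and once by hand with flatMap, filter, and map. The parse() helper and the sumPositive functions are hypothetical examples, and the hand-written version is only approximately what the compiler generates.

```scala
import scala.util.Try

object OptionCombinators {
  // Hypothetical helper: parse a string, yielding None on bad input.
  def parse(s: String): Option[Int] = Try(s.trim.toInt).toOption

  // for() comprehension form, including a guard.
  def sumPositive(a: String, b: String): Option[Int] =
    for {
      x <- parse(a)
      y <- parse(b)
      if x > 0 && y > 0
    } yield x + y

  // A hand-written equivalent using flatMap, filter, and map.
  def sumPositiveByHand(a: String, b: String): Option[Int] =
    parse(a).flatMap { x =>
      parse(b).filter(y => x > 0 && y > 0).map(y => x + y)
    }
}
```

Both versions return None as soon as either input fails to parse or the guard rejects the values, with no explicit Some versus None checks in sight.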

Libraries

We’ve now seen that functional languages like Scala and Haskell are unique in that they score well both with respect to developer productivity and lowering the risk of bugs. But Scala has a huge advantage over Haskell: Scala runs on the JVM. In fact, Scala compiles down to .class files just like Java does, making Scala completely interoperable with Java. Not only can you call Java code from Scala and vice versa, but you can even write plugins in Scala to Java-based systems. Want to write a servlet completely in Scala or a Scala plugin to elasticsearch? No problem. One way to think of Scala is that it is just Java with a more expressive syntax.
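As a small illustration of that interoperability, the sketch below calls two classes from the Java standard library directly from Scala. The JavaInterop object and sortedCopy() are hypothetical names; ArrayList and Collections.sort are the ordinary java.util APIs.

```scala
import java.util.{ArrayList, Collections}

object JavaInterop {
  // Build a java.util.ArrayList and sort it with java.util.Collections,
  // calling both exactly as Java code would.
  def sortedCopy(words: Seq[String]): ArrayList[String] = {
    val list = new ArrayList[String]()
    words.foreach(w => list.add(w))
    Collections.sort(list)
    list
  }
}
```

No wrappers or bindings are involved; the Java classes are imported and used like any Scala class.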

Running on the JVM opens up a world of robust, well-tested, and heavily used libraries to Scala developers. Whether you’re building web applications, using a search engine like Solr or elasticsearch, talking to a relational or obscure NoSQL database, the JVM gives you great libraries to work with. The same goes for big data analytics. Scala works seamlessly with Hadoop, Storm, Spark, Mahout, and so on.

In short, Scala doesn’t attempt to reinvent the wheel, but rather runs in an environment that allows it to take advantage of all the mainstream Java libraries built up over the last 15 years or so. That makes Scala a very practical language.

For comparison, while Haskell is a mathematically beautiful and pure language, it does attempt to reinvent the wheel when it comes to libraries. Many Haskell libraries have been created over the last few years. But all too often, those libraries are incomplete, or buggy, or poorly documented. Or the libraries just aren’t there. In fairness, Haskell developers are working hard to close the library gap and are making great progress. But they have a lot of ground to make up, and they’re shooting for a moving target. The result is that if you build a large project in Haskell, you’re likely to spend a lot of time fighting with libraries or worse, writing libraries that you could have had for free if you were on the JVM.

This library gap between Scala and the JVM on the one hand and Haskell with its custom libraries on the other hand is what makes Scala a very pragmatic, safe language to use. Haskell is more of an academic, risky language, perhaps not as well suited for real-world applications. It’s hard to look your boss in the eye and tell him that Haskell is a safe choice. It’s a lot easier to tell him that Scala is just a nicer syntax for Java. Haskell does have some nice advantages over Scala. For example, in the previous section we saw that Scala only encourages you by convention to avoid null, while Haskell enforces it. But these Haskell advantages are way overshadowed by the library gap.

Hirability

Hiring good developers is tough. There aren’t a lot of them out there, and they’re in high demand. Furthermore, there aren’t a lot of developers on the market who know Scala. So if you go to a recruiter and ask for candidates who already know Scala, you won’t end up interviewing many, if any, candidates at all. It seems like Scala might be a barrier to hiring.

There are two basic approaches to building a team of developers. The first approach is to hire a bunch of average developers. We’ll call this the commodity developer approach. With this approach, you increase your total development capacity by hiring more and more programmers. Because you’re hiring a bunch of developers, you want them to be cheap and easily replaceable. You might be using offshore developers to save money. If you go the commodity route, programmers are easily replaceable, and some turnover is ok. You want to pick a technology stack that is simple and that lots of developers know. That way, you can hire only developers who know your stack, and they can get up to speed quickly. Scala is a poor choice for the commodity approach because 1) not a lot of developers know it, and 2) it’s complicated to learn. Mastering the functional techniques that make Scala so powerful may be beyond the capabilities of an average, commodity developer. Stacks based on Java or PHP make much more sense.

But there’s an inherent problem with the commodity approach. The following graph shows how your total development capacity increases as you add more developers to your team:

[Figure: Development Capacity]

As you add more and more developers to your team, each new developer adds less and less to your total development capacity. Put another way, the bigger your team, the lower the productivity of each individual developer. Why? Because the bigger the team, the more the developers spend their time communicating with each other and the less time they spend writing code. Also, as your team gets bigger, you’ll have to pull your best developers out from writing code at all; you’ll need them as managers of the lesser developers instead. This phenomenon has been well understood since 1975, when The Mythical Man-Month was published.

This brings us to the second approach to building a team of developers: Keep the team small and focus on making each developer as productive as possible. How do you maximize developer productivity? Start with letting the developers use the best tools, the best hardware, the best methodologies, and the best languages. We’ve already seen how Scala leads to higher productivity than other languages. But the most important way to maximize developer productivity is to set high standards and only hire the very best, elite developers. This second approach is the exact opposite of the commodity developer approach. You want to build a small team of elite developers so that you’re in the sweet spot of the development capacity curve. Elite developers are hard to find and even harder to hire. Turnover is deadly; you want to hang onto your developers once you find them.

With the elite approach, you can be more willing to hire developers who don’t already know your technology stack. Since you’re only hiring a few developers and you’re hoping to keep them for a long time, you can afford to train your new developers on a new language. Furthermore, if you’re only hiring really smart developers who have strong records of learning new technologies, they’ll quickly pick up any language you throw at them, including Scala. So if you go with the elite developer approach and are willing to hire developers who don’t know your stack already, use of Scala won’t hinder your ability to build a team.

But it turns out that not only will Scala not hinder your hiring efforts, it will actually help them. Elite developers are hard to find and are in high demand. When an elite developer decides to look for a new job, you can bet that he’ll have a bunch of companies going after him. You’ll be just one of many companies trying to hire that developer. You need some way of standing out in the crowd. Do you want to be just another one of the n Java or Ruby shops making their pitch to the developer? Using Scala will make your company really stand out. Why? Two things elite developers really like are playing with new technologies and working with other elite developers. Having an opportunity to take a job using a new language will be very intriguing to many of the best developers on the market. And use of Scala is a key indicator to candidates that you’ve got a really strong team.

So, if you’re out to build a small team of great developers, Scala scores really well on hirability.

Conclusions

Scala is unique among languages in that it grades well on developer productivity, risk of bugs, libraries, and hirability. We’ve seen how Java scores poorly on productivity. Dynamically typed scripting languages like Python score poorly on risk of bugs. Haskell scores poorly on libraries.

Scala is very expressive, allowing you to write a small amount of code that does a lot of work. It has powerful compile-time type checking. Furthermore, it encourages techniques like using Option’s instead of null that allow the compiler to catch lots of errors that would be caught at runtime in other statically typed languages like Java. With Scala, you have access to all of the JVM libraries. And, if your goal is to build a small team of really strong developers, Scala will help you stand out from the crowd in your recruiting efforts.

By Clint Miller

CTO at CognitOps

27 replies on “Why Scala?”

Great post! I agree with many of your conclusions.

Recently in my team we had a similar scenario with a new project. We are considering Scala as a very reasonable language (we have some experienced Java developers), but we cannot use Scala at the moment, and the main reason is hirability. It’s difficult to find experienced developers, and when you do find them they will be considerably expensive.

Personally I don’t agree with this kind of policy, but in some big companies it is really difficult to change the philosophy.

Nice post!

You have a “bug” in your Java getSongLength method: if the passed artist object is null, you will get an NPE. 🙂

If you apply NULL OBJECT Pattern, your Java code could look like this:

    public Integer getSongLength(String artistName, String albumName, String songName) {
        Artist artist = getArtist(artistName);
        Album album = artist.getAlbum(albumName);
        Song song = album.getSong(songName);
        return song.getLength();
    }

Full source:

package com.blog;

import static junit.framework.Assert.assertEquals;

public class SongTest {

    public static void main(String[] args) {
        SongTest st = new SongTest();

        assertEquals(0, st.getSongLength("artistName", "albumName", "songName").intValue());

    }

    public Integer getSongLength(String artistName, String albumName, String songName) {
        Artist artist = getArtist(artistName);
        Album album = artist.getAlbum(albumName);
        Song song = album.getSong(songName);
        return song.getLength();
    }

    public Artist getArtist(String artistName) {
        // assume artist was not found - return NULL artist
        return Artist.NULL;
    }
}


interface Artist {
    public Album getAlbum(String name);

    public static final Artist NULL = new Artist() {
        public Album getAlbum(String name) {
            return Album.NULL;
        }
    };
}

interface Album {
    public Song getSong(String name);

    public static final Album NULL = new Album() {
        public Song getSong(String name) {
            return Song.NULL;
        }
    };
}

interface Song {
    public Integer getLength();

    public static final Song NULL = new Song() {
        public Integer getLength() {
            return 0;
        }
    };
}

Note that you have to be super careful when using case classes with variable fields. Here’s a warning example:

```
scala> case class Point(var x: Int)
defined class Point

scala> val p = Point(1)
p: Point = Point(1)

scala> val m = collection.immutable.HashSet.empty + p
m: scala.collection.immutable.HashSet[Point] = Set(Point(1))

scala> m(p)
res8: Boolean = true

scala> p.x = 10
p.x: Int = 10

scala> m(p)
res9: Boolean = false

scala> m(Point(1))
res10: Boolean = false

scala> m(Point(10))
res11: Boolean = false

scala> m
res12: scala.collection.immutable.HashSet[Point] = Set(Point(10))
```

Good point. I should have mentioned in the article that I only used var’s in the case class for parity between the Scala, Java, and Python. In practice, I can’t think of a good reason to ever have a case class with mutable members.

Never is a strong word. It’s pretty hard to write something like a cache or a connection pool without var’s. But, var’s should only be used when absolutely necessary, and you should access them as low in your call stack as possible so that as much of the top of your stack is pure functions as possible (more on that in my next blog post).

But to be clear, my use of var in the Point class is not an appropriate use of var. I only did that so that the code was functionally identical to the Java and Python code.
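
As a minimal sketch of the immutable alternative discussed above (not code from the article; the object name `ImmutablePointSketch` is made up for illustration), the same Point can be declared with a val field. Its hash code then cannot change after it has been put into a set, and a "modified" point is simply a new value built with `copy`:

```scala
// Immutable variant of the Point from the warning example: x is a val.
final case class Point(x: Int)

// Hypothetical wrapper object, used only to make the sketch self-contained.
object ImmutablePointSketch {
  val p: Point = Point(1)
  val points: Set[Point] = Set(p)

  // copy builds a new Point; p itself is left untouched.
  val moved: Point = p.copy(x = 10)

  def main(args: Array[String]): Unit = {
    println(points(p))        // membership of the original value
    println(points(Point(1))) // lookup with an equal value
    println(points(moved))    // lookup with the copied value, which was never added
  }
}
```

Here the first two lookups succeed and the third does not, so set membership stays consistent with equality.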

There are two things that irritated me while reading your article:
1. The disservice done to Java by ignoring good practices, such as switching from nulls to Optional (either from Guava or, better, from Java 8) or to Null Objects, and by ignoring the advantages of good language verbosity.
2. Ignoring the subject of source code maintainability, which is crucial to applications that are going to be used and changed for more than just a few months.

It eventually pushed me into writing a response on my blog, which can be found here: http://bandrzejczak.blogspot.com/2015/01/why-not-scala.html

I’d be happy to respond to all concerns 🙂

I asked on StackOverflow whether there’s really a difference between Java Optional and Scala Option: stackoverflow.com/q/21714594/1360888 (the comment system does not allow me to use links).
While there’s no syntactical difference, Optional was added on top of the Java standard libraries, so there are many cases (read: collections) where you cannot use it.

I’m interested in your point of view, and I’m going to read why-not-scala.
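
To make the collections point concrete, here is a small sketch of how Option shows up in the Scala standard collections. The data and the object name `OptionInCollectionsSketch` are invented for illustration; the calls themselves (`Map#get`, `find`, `headOption`) are standard library methods that return Option instead of null:

```scala
// Hypothetical wrapper object and sample data, for illustration only.
object OptionInCollectionsSketch {
  val songLengths: Map[String, Int] = Map("Intro" -> 65, "Finale" -> 312)

  // Map#get returns an Option rather than null for a missing key.
  val found: Option[Int] = songLengths.get("Intro")
  val missing: Option[Int] = songLengths.get("Unknown")

  // find and headOption also signal "no result" with None.
  val firstLong: Option[Int] = List(65, 312).find(_ > 300)
  val emptyHead: Option[Int] = List.empty[Int].headOption

  // The caller has to decide what a missing value means.
  val lengthOrZero: Int = missing.getOrElse(0)
}
```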

Reblogged this on Power to Build and commented:
This is a great post. I am still “looking” into Scala. This gave a nice introduction comparing it with 2 languages I’ve seen/used – Java and Python. Thanks! Look forward to more like this.


Thanks for putting together the Haskell versions. I’m always amazed at what a cool language Haskell is.

Perhaps not as readable, but you can cut the Scala version of the word counter down even further:

sentences.flatMap(s => s.split(" ").map(w => w -> s)).groupBy(x => x._1).mapValues(_.map(_._2))
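
For readers who want to try the one-liner, here is a runnable sketch under a few assumptions: `sentences` is taken to be a list of strings, the sample data is invented, and the object name `WordIndexSketch` is hypothetical. The final step uses `map` with a pattern match instead of `mapValues`, since `mapValues` is deprecated in Scala 2.13 and returns a lazy view there:

```scala
// Hypothetical wrapper object; the original `sentences` value is not shown in the comment.
object WordIndexSketch {
  val sentences: List[String] = List("scala is fun", "java is verbose")

  // Pair each word with the sentence it came from, then group the sentences by word.
  val index: Map[String, List[String]] =
    sentences
      .flatMap(s => s.split(" ").toList.map(w => w -> s))
      .groupBy(_._1)
      .map { case (word, pairs) => word -> pairs.map(_._2) }
}
```

With this sample input, `WordIndexSketch.index("is")` holds both sentences, while `WordIndexSketch.index("scala")` holds only the first.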

Hi Clint, thank you for sharing this article because it’s really nice.
Anyhow, I want to share my personal experience: in my company we have been using Scala in production since 2013, and I can assure you that maintainability has become a problem because the language is really expressive and smart developers in a hurry tend to wrap code way too much. Not to mention the bad habit of not writing tests (it’s too slow…).
Don’t get me wrong: I know perfectly well that this is not a problem with Scala itself, but because of its power, I strongly warn anyone against “going fast, writing concise”, because it’s really too easy with Scala.

I think the really cool thing about Scala is pattern matching:

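    def main(args: Array[String]): Unit = {
      // Match on the shape of the argument list.
      args.toList match {
        case "copy" :: src :: dest :: Nil => copy(src, dest)
        case "add" :: item :: to :: Nil => add(item, to)
        // Print the message; a bare string here would be silently discarded.
        case _ => println("Invalid input")
      }
    }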

Each time I write some command line utility this always makes me really happy.
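
A self-contained sketch of this idea follows. The `copy` and `add` functions are hypothetical stand-ins (the comment does not show the real ones), and they return a description string so that the dispatch is easy to check; `CommandLineSketch` and `run` are likewise invented names:

```scala
object CommandLineSketch {
  // Hypothetical stand-ins for the real copy and add operations.
  def copy(src: String, dest: String): String = s"copy $src -> $dest"
  def add(item: String, to: String): String = s"add $item to $to"

  // Dispatch on the shape of the argument list.
  def run(args: List[String]): String = args match {
    case "copy" :: src :: dest :: Nil => copy(src, dest)
    case "add" :: item :: to :: Nil   => add(item, to)
    case _                            => "Invalid input"
  }

  def main(args: Array[String]): Unit = println(run(args.toList))
}
```

Keeping the match in a function that returns a value, rather than inside `main`, makes the fallback case visible to callers and straightforward to test.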

This is really a nice and long post.

Thanks for sharing.

Someone once compared Java to the US, Python to the UK, and Scala to Hungary.

My point is that Scala is really the only plausible way for me to try functional programming and plug it into my job’s JVM environment. I’m trying to make myself a non-commodity while keeping my boss from complaining too much.

Some advantages of Scala’s functional aspects are missing from the post, but that is fair enough given its purpose.

For the instance where you say Python is the better option, you could actually remove the trait and rely on Scala’s Structural Typing in the parameter: i.e. def plotSquares(n: Int, plotter: {def plot(point: Point): Unit})
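
A sketch of what that could look like, assuming Scala 2 (where structural types are called through reflection and need the `reflectiveCalls` language import). The `Point` fields, the body of `plotSquares`, and `RecordingPlotter` are all assumptions made for illustration; only the signature comes from the comment above:

```scala
import scala.language.reflectiveCalls

// Assumed shape of Point; the article's definition is not repeated in this comment.
final case class Point(x: Int, y: Int)

object StructuralTypingSketch {
  // Accepts any object that has a matching plot method; no shared trait is required.
  def plotSquares(n: Int, plotter: { def plot(point: Point): Unit }): Unit =
    (1 to n).foreach(i => plotter.plot(Point(i, i * i))) // assumed body

  // Hypothetical plotter that just records the points it receives.
  class RecordingPlotter {
    val seen = scala.collection.mutable.ListBuffer.empty[Point]
    def plot(point: Point): Unit = seen += point
  }
}
```

`RecordingPlotter` extends nothing, yet an instance can be passed to `plotSquares` because it has the right method. The trade-off is that the call goes through reflection at runtime.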

It’s worth mentioning that using a Java library in Scala exposes you to the danger of getting nulls, so you have to wrap any third-party library call that may return null in Option(...).
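
As a minimal sketch of that wrapping, here is a lookup against a `java.util.HashMap`, whose `get` returns null for a missing key. The map contents, `NullWrappingSketch`, and `songLength` are invented for illustration; `Option(x)` yields `None` when `x` is null and `Some(x)` otherwise:

```scala
import java.util.{HashMap => JHashMap}

// Hypothetical wrapper object and sample data, for illustration only.
object NullWrappingSketch {
  val lengths = new JHashMap[String, Integer]()
  lengths.put("Song A", 215)

  // Wrap the possibly-null Java result at the boundary, then work with Option.
  def songLength(name: String): Option[Int] =
    Option(lengths.get(name)).map(_.intValue)
}
```

Doing the wrapping right at the call into the Java library keeps nulls from leaking into the rest of the Scala code.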
