Scalar UDFs with Scala’s Option (Option, Some, None)

If you are using Spark SQL and need custom logic applied to each row, a UDF (User Defined Function) is a good way to keep that complex logic out of the query itself.

Right, but what happens when some values are null and you don’t want to add if blocks that make your code less readable? Well, you can wrap the value in Scala’s Option type (Option, Some, None) and pattern match on it, which keeps your UDFs cleaner.
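Before moving to Spark, here is a minimal plain-Scala sketch of the idea. The helper names (labelWithIf, labelWithOption) and the labels they return are made up for illustration; the only thing taken from the standard library is that Option(x) yields None when x is null and Some(x) otherwise.

```scala
object OptionBasics {
  // Hypothetical helper: the version with explicit null checks.
  def labelWithIf(country: String): String =
    if (country == null) "unknown"
    else if (country == "Peru") "LATAM"
    else "other"

  // Same logic, but the null case becomes just another pattern.
  def labelWithOption(country: String): String =
    Option(country) match {
      case None         => "unknown"
      case Some("Peru") => "LATAM"
      case Some(_)      => "other"
    }

  def main(args: Array[String]): Unit = {
    println(Option(null: String))   // None
    println(Option("Peru"))         // Some(Peru)
    println(labelWithOption(null))  // unknown
  }
}
```

Both helpers return the same result for every input; the Option version simply makes the null branch explicit in the match instead of hiding it in a chain of if/else.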

Let’s see an example:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession
  .builder()
  .appName("Spark SQL UDF scalar example")
  .getOrCreate()

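// country may be null in the data; Option(country) turns null into None,
// so the null case is matched like any other value.
val applyLogicByCountry = (country: String, amount: Long) => {
  Option(country) match {
    case Some("Peru") | Some("Spain") => amount * 2
    case Some("Brazil") | None | Some("Portugal") => amount * 3
    case _ => amount
  }
}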

spark.udf.register("apply_logic_by_country", applyLogicByCountry)

spark.sql("SELECT id, country, amount, apply_logic_by_country(country, amount) FROM test").show()
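The query above expects a table or view called test to exist already. As a rough end-to-end sketch, the snippet below builds a small temporary view with made-up sample rows (including a null country) so the UDF can be tried locally. The sample data, the local[*] master, the object name UdfOptionDemo, and the result column alias are all assumptions for illustration, not part of the original example.

```scala
import org.apache.spark.sql.SparkSession

object UdfOptionDemo {
  // Same logic as the UDF above, repeated so this sketch is self-contained.
  val applyLogicByCountry: (String, Long) => Long = (country, amount) =>
    Option(country) match {
      case Some("Peru") | Some("Spain")             => amount * 2
      case Some("Brazil") | None | Some("Portugal") => amount * 3
      case _                                        => amount
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("Spark SQL UDF scalar example")
      .master("local[*]") // assumption: running locally for a quick test
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows; the last one has a null country.
    Seq[(Int, String, Long)](
      (1, "Peru", 10L),
      (2, "Brazil", 10L),
      (3, "Chile", 10L),
      (4, null, 10L)
    ).toDF("id", "country", "amount").createOrReplaceTempView("test")

    spark.udf.register("apply_logic_by_country", applyLogicByCountry)

    spark.sql(
      "SELECT id, country, amount, apply_logic_by_country(country, amount) AS result FROM test"
    ).show()

    spark.stop()
  }
}
```

With this data the null row falls into the None branch, so its amount is tripled just like Brazil’s. Keep in mind that amount is a primitive Long here: according to Spark’s udf documentation, a Scala UDF with a primitive parameter returns null when that input is null, so the Option match only takes care of country.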

I hope it helps!

References:

https://spark.apache.org/docs/latest/sql-ref-functions-udf-scalar.html

https://www.baeldung.com/scala/option-type

https://www.geeksforgeeks.org/scala-option/