Are Spark transformations lazy? Yes, but not all of them. As most people know, Spark transformations are lazily evaluated, but a few of them are not fully lazy. In this article we will see which Spark transformations are not fully lazily evaluated. Let's dive into the sorted word count example.
val textFile = sc.textFile("README.md")
val words = textFile.flatMap(line => line.split(" "))
val wordPair = words.map((_,1))
val wordCount = wordPair.reduceByKey(_+_)
The code snippet above gives us the count for each word in the README.md file. Now it's time to sort the result by word.
val wordCountSort = wordCount.sortByKey()
wordCountSort: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[7] at sortByKey at <console>:32
So now our result is sorted by word. But so far nothing should have been computed, because we have only used transformations. Let's confirm in the Spark UI at localhost:4040 that we didn't fire any job.
Surprise? Yes, sortByKey() is a transformation, but it is not fully lazily evaluated. If you see…
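If you would rather check this from code than from the UI, here is a minimal sketch. It assumes spark-core is on the classpath and uses a local master; the object name, the accumulator and the sample lines are all made up for illustration. The idea is to count records in the map step with an accumulator: the count stays at zero until something actually runs a job, and calling sortByKey() is enough to move it, because sortByKey() builds a RangePartitioner that samples the data to pick partition boundaries.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SortByKeyCheck {

  // Returns how many records had gone through the map step
  // before and after calling sortByKey(). No action is ever called here.
  def recordsTouched(): (Long, Long) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("sortByKey-check")
    val sc = new SparkContext(conf)
    try {
      // Incremented once for every record that is really processed
      val touched = sc.longAccumulator("touched")

      // Two partitions, so sortByKey() has partition boundaries to work out
      val lines = sc.parallelize(Seq("b a c", "a c a", "d b"), 2)
      val wordCount = lines
        .flatMap(_.split(" "))
        .map { word => touched.add(1); (word, 1) }
        .reduceByKey(_ + _)

      val before: Long = touched.value   // only transformations so far
      val sorted = wordCount.sortByKey() // still "just a transformation"
      val after: Long = touched.value    // the sampling job has already run
      (before, after)
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = {
    val (before, after) = recordsTouched()
    println(s"records processed before sortByKey: $before, after: $after")
  }
}
```

Note that with a single partition there are no boundaries to compute, so the sketch deliberately uses two.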
I know this one is not technical, but it's worth sharing! We'll be writing this series! Please follow!
Hey guys,
This has been a great week so far.
Learned a lot! Slept like nothing! Won arguments! Lost some. But most of all, won a lot of hearts.
An update from our side: we presented one of our products at Empressario 2018 in the Product and Services track. We wanted to share our knowledge and experience with you so that we can get as many of you as possible to take part in such events, and to just stop what you are doing and think about the startup idea you have been planning to execute for a very long time but have not started because of money issues, time issues, relationship issues, etc.
This time we are gonna talk about Empressario. It is the annual business model competition organized by the Entrepreneurship Cell, IIT Kharagpur, in association with the International Business Model Competition (IBMC).
Today we are gonna talk about solving mathematical optimization problems. Let's take a quick example.
Have you ever been in a situation where you have a set of inequalities and need to find the values of the variables for which a particular expression is at its maximum or minimum?
For example: there are basically two expressions, one that you need to maximize and another that acts as a constraint.
Equation to be maximized : -2 * x + 5 * y
Additional Constraint : y >:= 200 - x
Range of Values:
For X: 100<x<=200
For Y: 80<y<=170
So the equation reaches its maximum at x=100 and y=170 (treating the bounds as inclusive), where it evaluates to 650. (Check out your mathematical skills :D)
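If you want to double-check that answer by hand before bringing in a library, here is a small sketch in plain Scala. It relies on the fact that a linear objective over a bounded region like this one reaches its maximum at a corner, so it only looks at the corners of the box and the points where the constraint line y = 200 - x crosses the box edges. The object and method names are made up for this illustration, and the bounds are treated as inclusive.

```scala
object LpVertexCheck {

  final case class Point(x: Double, y: Double)

  // The expression to maximize: -2 * x + 5 * y
  def objective(p: Point): Double = -2 * p.x + 5 * p.y

  // Box bounds (taken as inclusive) plus the extra constraint y >= 200 - x
  def feasible(p: Point): Boolean =
    p.x >= 100 && p.x <= 200 && p.y >= 80 && p.y <= 170 && p.y >= 200 - p.x

  // Candidate corners: the four box corners, plus the points where
  // the line y = 200 - x meets the lines x = 100, x = 200, y = 80, y = 170
  val candidates: Seq[Point] = {
    val xs = Seq(100.0, 200.0)
    val ys = Seq(80.0, 170.0)
    val corners = for (x <- xs; y <- ys) yield Point(x, y)
    val onLine = xs.map(x => Point(x, 200 - x)) ++ ys.map(y => Point(200 - y, y))
    corners ++ onLine
  }

  // Keep only the feasible candidates and pick the one with the largest objective
  val best: Point = candidates.filter(feasible).maxBy(objective)

  def main(args: Array[String]): Unit =
    println(s"best = $best, objective = ${objective(best)}")
}
```

This only works because the problem is tiny and two-dimensional; for anything bigger you want a real solver.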
Now we need to solve such equations programmatically. This is where mathematical optimization libraries come in.
In our case we are going to use Optimus.
Now let's get started with the programming:
First, you need to add the Optimus dependency to your build.sbt, so your build.sbt should look like this:
name := "scala-mip"
version := "0.1"
scalaVersion := "2.12.4"
organization := "com.foobar"

// The necessary dependencies can be added here
libraryDependencies ++= Seq(
  "com.typesafe" % "config" % "1.3.1",
  // Mathematical Programming Dependencies
  "com.github.vagmcs" %% "optimus" % "2.1.0",
  "com.github.vagmcs" %% "optimus-solver-oj" % "2.1.0",
  "org.scalatest" %% "scalatest" % "3.0.0" % "test"
)
Now we have to create bounded variables, i.e. to define these ranges:
For X: 100<x<=200
For Y: 80<y<=170
In Optimus we do it like this:
val x = MPFloatVar("x", 100, 200)
val y = MPFloatVar("y", 80, 170)
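The whole code looks something like this:
import optimus.optimization._

// Linear problem, solved with the ojalgo solver
implicit val problem = LQProblem(SolverLib.ojalgo)

// Bounded decision variables
val x = MPFloatVar("x", 100, 200)
val y = MPFloatVar("y", 80, 170)

maximize(-2 * x + 5 * y) // expression to maximize
add(y >:= 200 - x)       // additional constraint
start()                  // run the computation

println("objective: " + objectiveValue)
println("x = " + x.value + " y = " + y.value)

release()                // free the resources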
Explanation:
LQProblem: declares that this is a linear problem; there are other problem types too, such as quadratic and mixed-integer problems.
maximize(): takes the expression that needs to be maximized; there are other functions as well, like minimize(), subjectTo(), etc.
add(): takes a constraint expression that must be taken into account while maximizing the expression.
start(): starts the computation.
release(): releases all the resources.
Running it prints the objective value along with the values of x and y. Hence, for these values of x and y, the equation is at its maximum.
Currently I am working on a use case in which I am looking forward to using MIP to solve the problem. The problem is the optimal order fulfillment problem, which is described here: Optimal Order Fulfillment.
If anyone has suggestions on solving this problem, ping me! I could also use some help.
I hope you enjoyed this and found it interesting!
If you have any queries, let me know here or on Twitter.
Happy Reading
One last thing: if you are interested in community-based learning, please join us @InternityFoundation
References :
1. Optimus
2. Optimal Order Fulfillment
Just FYI,
The haversine formula determines the great-circle distance between two points on a sphere given their longitudes and latitudes. Important in navigation, it is a special case of a more general formula in spherical trigonometry, the law of haversines, that relates the sides and angles of spherical triangles.
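Law of Haversines:
$$\operatorname{hav}(c) = \operatorname{hav}(a - b) + \sin(a)\,\sin(b)\,\operatorname{hav}(C)$$
where $a$, $b$ and $c$ are the sides of a spherical triangle on a unit sphere and $C$ is the angle opposite side $c$.
Haversine Formula:
For any two points on a sphere, the haversine of the central angle between them is given by
$$\operatorname{hav}\left(\frac{d}{r}\right) = \operatorname{hav}(\varphi_2 - \varphi_1) + \cos(\varphi_1)\,\cos(\varphi_2)\,\operatorname{hav}(\lambda_2 - \lambda_1)$$
where $\operatorname{hav}(\theta) = \sin^2(\theta/2)$, $d$ is the great-circle distance between the two points, $r$ is the radius of the sphere, $\varphi_1, \varphi_2$ are the latitudes and $\lambda_1, \lambda_2$ the longitudes of the two points, in radians.
So, without wasting any more time, let's get started with the code.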
case class Location(lat: Double, lon: Double)

trait DistanceCalculator {
  def calculateDistanceInKilometer(userLocation: Location, warehouseLocation: Location): Int
}

class DistanceCalculatorImpl extends DistanceCalculator {

  private val AVERAGE_RADIUS_OF_EARTH_KM = 6371

  override def calculateDistanceInKilometer(userLocation: Location, warehouseLocation: Location): Int = {
    // Differences in latitude and longitude, converted to radians
    val latDistance = Math.toRadians(userLocation.lat - warehouseLocation.lat)
    val lngDistance = Math.toRadians(userLocation.lon - warehouseLocation.lon)

    val sinLat = Math.sin(latDistance / 2)
    val sinLng = Math.sin(lngDistance / 2)

    // Haversine of the central angle between the two points
    val a = sinLat * sinLat +
      (Math.cos(Math.toRadians(userLocation.lat)) *
        Math.cos(Math.toRadians(warehouseLocation.lat)) *
        sinLng * sinLng)

    // Central angle in radians
    val c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a))

    // Arc length, truncated to whole kilometres
    (AVERAGE_RADIUS_OF_EARTH_KM * c).toInt
  }
}

new DistanceCalculatorImpl().calculateDistanceInKilometer(Location(10, 20), Location(40, 20))
As you can see, this is the Scala implementation of the Haversine formula, and it computes the distance in a pretty easy way.
You can see the compiled version of the code here on Scala Fiddle.
You can also verify your output here: National Hurricane Center.
They are approximately the same.
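As a quick sanity check on the sample call (just a back-of-the-envelope sketch): both points share the same longitude, 20, so the path runs straight along a meridian and the formula collapses to radius times the latitude difference in radians.

```latex
d = r \,\Delta\varphi
  = 6371 \times \frac{(40 - 10)\,\pi}{180}
  = 6371 \times \frac{\pi}{6}
  \approx 3335.8 \text{ km}
```

Since the method ends with toInt, the call above should return 3335.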
Hence, in this way we can get the distance between two locations.
If you have any queries you can contact me here or on Twitter: @shiv4nsh
If you liked it, share it among your peers!
Just one more thing: we are developing a community of developers and college graduates to help them, so that they do not face the same problems we did. So if you like the idea and want to be a part of this movement,
come and join us: InternityFoundation
I hope you join it.
Wish you the best of the weekend.
Till then! Happy hAKKAing
References:
1. Wikipedia : Haversine Formula
2. National Hurricane Center
But this time this simple thing was not working and I was getting frustrated. Then, while searching for a solution, I came across a very good library for Scala known as Cats, which seemed to solve my problem.
Just FYI, Cats is a library which provides abstractions for functional programming in the Scala programming language. The name is a playful shortening of the word category.
(Copied from GitHub)
So this is how I got introduced to Cats!
I solved my problem by writing a function using cats.Functor.
To use Cats you need to add the Cats dependency to your build.sbt:
name := "scala-cats"
version := "0.1"
organization := "com.internity"
scalaVersion := "2.12.3"
scalacOptions += "-Ypartial-unification"
libraryDependencies += "org.typelevel" %% "cats-core" % "1.0.0-RC1"
case class A(a: String)

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import cats.implicits._
import cats.Functor

val eitherOfFutures: Either[Future[String], Future[A]] = Right(Future.successful(A("Shiv")))
val futureOfEitherOfFuture = Future.successful(eitherOfFutures)

def eitherFmap[F[_], A, B](either: Either[F[A], F[B]])(implicit F: Functor[F]): F[Either[A, B]] =
  either.fold(f => F.map(f)(Left(_)), f => F.map(f)(Right(_)))

val futureOfEither = futureOfEitherOfFuture.flatMap(eitherFmap(_))
You can also see it on the sbt Console
So this is how we can convert Either[Future[A],Future[B]] into Future[Either[A,B]].
As M50d pointed out in a comment, there is a simpler way: use bisequence. Note that the syntax and instances it needs come from the cats.implicits._ import already used above; importing cats.instances._ on its own does not bring them into implicit scope.
import cats.implicits._
eitherOfFutures.bisequence
Recently, from the Reddit comments on this post, I found a new approach in plain Scala. I am adding it here; thanks to the user @jangchoe for this code snippet:
eitherOfFutures.fold(f => f.map(Left(_)), f => f.map(Right(_)))
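To try the plain Scala approach without any Cats dependency, here is a self-contained sketch. The object name, the swap helper and the demo method are made up for this example, and it blocks with Await only so that the result can be inspected; you would not do that in real code.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object EitherOfFutures {

  final case class A(a: String)

  // Either[Future[L], Future[R]] => Future[Either[L, R]] using only the standard library:
  // whichever side is present, map inside its Future and wrap the value back into that side
  def swap[L, R](either: Either[Future[L], Future[R]]): Future[Either[L, R]] =
    either.fold(fl => fl.map(Left(_)), fr => fr.map(Right(_)))

  val eitherOfFutures: Either[Future[String], Future[A]] =
    Right(Future.successful(A("Shiv")))

  // Blocks for the result; for demonstration only
  def demo(): Either[String, A] = Await.result(swap(eitherOfFutures), 1.second)

  def main(args: Array[String]): Unit = println(demo())
}
```

The explicit return type on swap is what lets the compiler settle on Future[Either[L, R]] for both branches of the fold.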
If you have any problems, look me up on Twitter and let me know!
We are also developing a community of developers to bridge the gap between corporates and colleges! If you want to be a part of this movement, sign up on InternityFoundation.
Till then, Happy hAKKAing
Playing with Play and Lagom!
Simply put, it is a microservices tool.
Lagom is a microservices framework that guides developers in moving from a monolith to a scalable, resilient microservice-based system. It is a platform that delivers not only the microservice framework but also a whole tool set for developing applications, along with creating, managing and monitoring your services.
According to Lightbend the focus should not be on how small the services are, but instead they should be just the right size, “Lagom” size services.
Most microservices frameworks focus on making it easy to build individual microservices. Lagom allows developers to run a whole system of microservices from a single command.
Before you begin creating your microservices, you need to make some selections:
Here I am going to write a blog on Hadoop!
“Bigdata is not about data! The value in Bigdata [is in] the analytics.”
- Harvard Prof. Gary King
So Hadoop came into the picture!
Hadoop was created by computer scientists Doug Cutting and Mike Cafarella in 2006 to support distribution for the Nutch search engine. It was inspired by Google's MapReduce.
The problem with an RDBMS is that it cannot process semi-structured and unstructured data (text, videos, audio, Facebook posts, clickstream data, etc.). It can only work with structured data (banking transactions, location information, etc.). The two also differ in how they process data.
RDBMS architecture with ER…
Styling your Scala code!
“I have never seen elegance go out of style” – SONYA TECLAI
Scala (The Beast) has brought efficiency and elegance to the programming world by combining the coziness of object-oriented programming with the magic of functional programming. This blog illuminates the work of Ólafur Páll Geirsson, who has done a great job of styling the beast. Below I will discuss my experience of using Scalafmt, its installation process and some of its cool code styling features. Stuff we would be using in this blog
One of the most important aspects of good code is its readability, which comes with good, standard formatting. Ever wondered how an entire project with around 1000 Scala files of poorly formatted code could be formatted without a headache? Well going shft + ctrl + alt…
Introduction to Gherkin! A new way of testing!
Hello everyone,
In this blog we will discuss the Gherkin language, which is used in BDD for writing test cases. We will take a look at the topics below.
Gherkin's grammar is defined using parsing expression grammars. It is a business-readable DSL created specifically for describing behaviour without explaining how that behaviour is implemented. Gherkin is a plain English text language.
Gherkin serves two purposes: documentation and automated tests. It is a whitespace-oriented language that uses indentation to define structure.
Gherkin supports 60 different spoken languages, so we can easily use our own language. The parser divides the input into features, scenarios and steps.
Here is a simple example of Gherkin:
When we run this feature, it gives us a step definition. In Gherkin, each line starts with a Gherkin keyword, followed by any text you like.
The main keywords are: