A few vaguely interesting numbers

CrowdProcess is a very particular distributed computing platform. It has browsers connecting and disconnecting all the time, and their number varies considerably during the day.

So how are job times affected by these variations? We decided to run a very basic experiment, which gives a (non-scientific) feel for this. If you want to know how we did it, read on. If you are more interested in the cool graphs, scroll down.

Here is what we did: 
We took a simple Run function, which uses a Monte Carlo simulation to estimate pi with 1,000,000,000 points, and returns only the time it took to calculate. In node.js, each run takes about 12 and a half seconds.
   

function Run() {
   var inQuarterCircle = 0,
       n = 1000000000,
       i = n,
       timer = Date.now();
   // Sample random points in the unit square and count those that
   // land inside the quarter circle of radius 1.
   while (i--) {
      if (Math.pow(Math.random(), 2) + Math.pow(Math.random(), 2) < 1) {
         inQuarterCircle++;
      }
   }
   var pi = 4 * inQuarterCircle / n;
   return (Date.now() - timer) / 1000;
}

Next, we made 4 JSON files and called them (to be very original) small, medium, large and huge. Each one got a different number of empty objects, corresponding to the number of tasks to be run (our function does not take an input in this case).

small.json:   2,080 tasks
medium.json: 10,000 tasks
large.json:  20,000 tasks
huge.json:   60,000 tasks

Excellent. Now it only remained to run them on the platform. During the different runs (across multiple days), the number of browsers varied considerably, as follows:
image

So it was important to control for the number of browsers, and each experiment was run multiple times.

Each job we sent came back with the time each task took to execute. The interesting thing was to see how differently the various browsers handled the same task.

The average time on the local run, in node, on a Toshiba L50 (Intel® Core™ i5-3230M Dual Core) was 12.771 seconds. 

On the browsers, the distribution looked more like this:

image

This is an example from a small.json run. Interestingly, a full 27.85% of browsers outperformed the local run (72.12% took longer). If you are wondering about the long tail, 3.3% crossed the 1-minute mark, and none passed the two-minute mark (because of a platform timeout at 2 minutes).
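Percentages like these come from comparing each returned task time against the local baseline. A sketch of that comparison (fractionFaster is a hypothetical helper of ours, and the sample times are made up for illustration):

```javascript
// Given the per-task times a job returns (in seconds), compute the
// fraction of them that beat a local reference time.
function fractionFaster(times, localTime) {
   var faster = times.filter(function (t) {
      return t < localTime;
   }).length;
   return faster / times.length;
}

// Hypothetical sample: 3 of the 4 tasks beat the 12.771 s local run.
var sample = [9.2, 11.5, 12.0, 48.3];
console.log(fractionFaster(sample, 12.771)); // 0.75
```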

Interesting… now on to the experiments themselves! If a single task takes 12.771 seconds, and small.json has 2,080 tasks, then the expected sequential run would take a bit over 7 hours. To the platform!

image

Not bad: the worst result was a 172x speedup, and the best was 288x, with time decreasing pleasantly linearly with the number of browsers.
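For reference, the speedup figures throughout this post are just the expected sequential time divided by the observed wall-clock time of the distributed run. A sketch, with a made-up wall-clock time:

```javascript
// Speedup = (tasks * average task time) / actual wall-clock time.
function speedup(taskCount, avgTaskSeconds, wallClockSeconds) {
   return (taskCount * avgTaskSeconds) / wallClockSeconds;
}

// Hypothetical example: 2,080 tasks at 12.771 s each, finishing
// in 120 s of wall-clock time, gives roughly a 221x speedup.
console.log(speedup(2080, 12.771, 120));
```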

Beyond, to the medium jobs! 10k tasks, expected time 35.7 hours. 

image

Again, beautifully linear, as expected, with speedups ranging between 240x and 517x. Not much of a challenge for a distributed computing platform, though…

So up again, to 20k tasks (expected time: 70.9 hours).

image

All linear, except for a massive outlier. The most likely explanation is that the platform was being shared by multiple developers running different jobs at the same time. Speedups sat at a respectable minimum of 238x and a maximum of 643x.

Finally, the run corresponding to the "huge" file, with 60k tasks ("huge" is a major overstatement; the platform has run jobs orders of magnitude larger, but it sounded like the natural thing after "large").

Expected time for the “huge” file was 212 hours (a bit over a week sequentially).
image

Max speedup was 755x, and the minimum 268x. One question springs to mind: what happens if we plot speedup against the number of browsers?

Well, this happens: 

image

Interestingly, job size has a clear impact on speedup, and not only on total computation time, even though the number of tasks often outweighs the number of nodes by more than an order of magnitude.

Which begs the question… what happens to speedup per browser as the number of browsers grows?

image

Speedup per browser went up to almost 0.5 with a large number of tasks on a small number of nodes, but clearly decreases as the number of nodes increases. It seems consistently higher on larger jobs.
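Speedup per browser is simply the job's speedup normalised by the number of connected browsers, giving a rough per-node efficiency figure. With hypothetical numbers:

```javascript
// Per-node efficiency: overall speedup divided by browser count.
function speedupPerBrowser(speedup, browserCount) {
   return speedup / browserCount;
}

// Hypothetical: a 755x speedup spread across 2,000 browsers
// works out to roughly 0.38 per browser.
console.log(speedupPerBrowser(755, 2000));
```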

So what’s next? Well, today we ran a job with 1 million tasks in 13.11 minutes, with a speedup of over 3000x. We will be publishing more on that next week, so follow the blog, or try out the platform yourself (which is probably an even better idea).