Hi there, boys and girls!
I am starting my PhD on... distributed networks, yaaay! Having fun with gossip algorithms [1], using Docker to create hundreds of node instances. But I want hundreds of thousands!
So, my dear supervisors, and basically everyone who knows the field, are urging me to "try out" Amazon AWS (or Microsoft Azure, Google Cloud, etc.) to have them host my tremendous number of nodes (each one is lightweight, though: my laptop can run 200 before bugs arise). It would be perfect: they have the horsepower, it's cheap, and they can spread my nodes all around the globe, further validating my experiment in a real-world case study.
But they are major actors in corporate surveillance, I don't want my lab to give them any more money! (I checked out the encryption policy on AWS: they provide me with the encryption keys, I don't get to create them myself! Hence, they can eavesdrop on what I'm doing.)
So, the million-dollar question is: is there any alternative? Raspberry Pi cluster proofs of concept are popping up all over the place, but I couldn't find anyone networking them all together to provide a massive worldwide Beowulf cluster. Maybe some of you know of such an initiative? I'd be really glad. I want: Docker, ~100 nodes at least, WAN connectivity, and a cluster as geographically scattered as possible.
I think there is much to be done about the infamous "cloud" computing. If only we all had a bunch of networked micro-computers ready to serve my experiments whenever they were not busy serving cached pictures for Diaspora* or something else.
PS: @aral, I just listened to your Boss Level Podcast. I loved the comparison between common goods, parks, and shopping malls! I'll be using this example too, unless you patented it, maybe?
[1] Gossip algorithms are a family of P2P protocols in which information is disseminated by nodes that each communicate with only a few of the other nodes (their "neighbours"). In the end, the information is super well propagated and the system is uber resilient. The downside is increased network traffic. Here's the founding article (Palo Alto 1987, please!), if you're interested. And a broader article proposing P2P as a solution for scaling up centralised systems (to make my point on the advantages of "cloud" computing).
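
For anyone who has never played with gossip, here is a toy sketch of the idea in Python. It is not the protocol from the 1987 article and not what my Docker nodes actually run, just a minimal in-memory "push" simulation under simplifying assumptions: synchronous rounds, no message loss, and every node able to pick peers uniformly at random. The function name `gossip_rounds` and its parameters are made up for illustration.

```python
import random


def gossip_rounds(n_nodes, fanout, seed=0):
    """Simulate push gossip until every node is informed.

    Each round, every informed node sends the rumour to `fanout`
    distinct peers chosen uniformly at random (never itself).
    Returns (rounds, messages_sent).
    """
    if n_nodes <= 1:
        return 0, 0
    if not 1 <= fanout <= n_nodes - 1:
        raise ValueError("fanout must be between 1 and n_nodes - 1")

    rng = random.Random(seed)   # seeded so runs are reproducible
    informed = {0}              # node 0 starts with the rumour
    rounds = 0
    messages = 0

    while len(informed) < n_nodes:
        newly_informed = set()
        for node in informed:
            # Sample from n_nodes - 1 slots, then shift to skip `node` itself.
            for k in rng.sample(range(n_nodes - 1), fanout):
                peer = k if k < node else k + 1
                messages += 1
                if peer not in informed:
                    newly_informed.add(peer)
        informed |= newly_informed
        rounds += 1

    return rounds, messages


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        r, m = gossip_rounds(n, fanout=3, seed=42)
        print(f"{n} nodes: {r} rounds, {m} messages")
```

Running it for growing `n_nodes` should show the two properties mentioned above: the number of rounds grows slowly with the number of nodes, while the message count is well above the `n_nodes - 1` minimum a centralised broadcast would need, because many messages land on nodes that already know the rumour.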