I need a worldwide computing cluster; how do I avoid AWS, Azure etc.?


#1

Hi there, boys and girls!

I am starting my PhD on… distributed networks, yaaay! I'm having fun with gossip algorithms¹, using Docker to spin up hundreds of node instances. But I want hundreds of thousands!

So, my dear supervisors, and basically everyone who knows the field, are urging me to “try out” Amazon AWS (or Microsoft Azure, Google Cloud, etc.) to have them host my tremendous number of nodes (they're lightweight, though: my laptop can run 200 before bugs arise). It would be perfect: they have the horsepower, it’s cheap, and they can spread my nodes all around the globe, further validating my experiment in a real case study.

But they are major players in corporate surveillance, and I don’t want my lab to give them any more money! (I checked out the encryption policy on AWS: they provide me with the encryption keys, and I don’t get to create them myself! Hence, they can eavesdrop on what I’m doing.)

So, the million-dollar question is: is there any alternative? Raspberry Pi cluster proofs-of-concept are popping up all over the place, but I couldn’t find anyone networking them all together to provide a massive worldwide Beowulf cluster. Maybe some of you know of such an initiative? I’d be really glad. I want: Docker, at least ~100 nodes, WAN connectivity, and as geographically scattered a cluster as possible.

I think there is a lot to be done with the infamous “cloud” computing. If only we all had a bunch of networked micro-computers ready to serve my experiments whenever they are not busy serving cached pictures for Diaspora* or something else.

PS: @aral, I just listened to your Boss Level Podcast. I loved the comparison between common goods, parks, and shopping malls! I’ll be using this example too, unless you’ve patented it? :wink:

¹ Gossip algorithms are a family of P2P protocols in which information is disseminated by nodes that communicate with only some of the other nodes (their “neighbours”). In the end, the information is super well propagated and the system is uber resilient. A downside is the increased network traffic. Here’s the founding article (Palo Alto 1987, please!), if you’re interested, and a broader article proposing P2P as a solution for scaling up centralised systems (to make my point on the advantages of “cloud” computing).
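For anyone new to gossip, here is a tiny simulation of the idea in the footnote above. It is a minimal sketch under my own assumptions, not the protocol from the 1987 article: the overlay (each node picks `degree` random peers, links made symmetric) and the names `build_overlay` and `push_gossip` are hypothetical. Each round, every informed node pushes the rumour to one random neighbour. It counts rounds (usually a small multiple of log2(n)) and messages sent, so you can see the traffic cost the footnote mentions.

```python
import math
import random


def build_overlay(n: int, degree: int, rng: random.Random) -> dict[int, list[int]]:
    """Hypothetical random overlay: each node picks `degree` random peers,
    and every link is made symmetric so neighbours can talk both ways."""
    links: dict[int, set[int]] = {node: set() for node in range(n)}
    for node in range(n):
        # Sample among the n-1 *other* nodes by skipping over `node` itself.
        for m in rng.sample(range(n - 1), degree):
            peer = m if m < node else m + 1
            links[node].add(peer)
            links[peer].add(node)
    return {node: sorted(peers) for node, peers in links.items()}


def push_gossip(neighbours, source, rng, max_rounds=1000):
    """Push gossip: each round, every informed node forwards the rumour to
    one neighbour chosen uniformly at random. Returns (rounds, informed, messages)."""
    informed = {source}
    rounds = messages = 0
    while len(informed) < len(neighbours) and rounds < max_rounds:
        # Every informed node sends exactly one message this round.
        targets = {rng.choice(neighbours[node]) for node in informed}
        # Pushes to already-informed nodes are wasted: that is the traffic overhead.
        messages += len(informed)
        informed |= targets
        rounds += 1
    return rounds, informed, messages


if __name__ == "__main__":
    rng = random.Random(42)
    n = 1000
    overlay = build_overlay(n, degree=4, rng=rng)
    rounds, informed, messages = push_gossip(overlay, source=0, rng=rng)
    print(f"{len(informed)}/{n} nodes informed after {rounds} rounds, "
          f"{messages} messages (log2(n) is about {math.log2(n):.1f})")
```

Real protocols often add pull or push-pull exchanges and a rule for when nodes stop spreading a rumour, which cuts down on the redundant messages this push-only sketch sends.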


#2

Look into Triton SmartOS, a lightweight open container hypervisor that delivers industrial-grade security. As far as geographically scattered clusters go, I know Triton users who would help out.


#3

Not sure I understand your problem 100%.

“I checked out the encryption policy on AWS: they provide me with the encryption keys, I don’t get to create them myself!”

Where do you need the encryption? If it is for ssh connectivity you can upload your generated public key(s) into AWS. You don’t have to use the provided ones.


#4

@jcz This is pretty much the first time I’ve looked into SmartOS, though I had already heard of Joyent. It sounds really nice, with bare-metal virtualization instead of the software virtualization Docker does (if I understood correctly). This is the kind of thing I want to look into after this first experiment, along with things like unikernels.

However, I have already set up my experiment with Docker, and I guess porting it to SmartOS is not going to be a one-liner, and I’m on a schedule. So I don’t think I will attempt a port unless all the other Docker solutions fail.

@tre Encryption was just one example of why I don’t like AWS so much (along with their human resources management, mostly). But I didn’t know I could upload my own keys, I’ll check this out.

Given the time I have and my need for a worldwide cluster, I think I will be obliged to apply to big computing clusters. Someday, creating clusters won’t involve such tremendous infrastructure costs, and we will be able to create affordable distributed ones, I hope!


#5

FYI, Triton fully supports Docker and uses the native Docker API.
Expanded container service


#6

Hi, I’ve worked on the same thing in the past and I’d love to compare notes. You can reach me almost 24/7 at faddat@gmail.com on google hangouts.


#7

Seriously? That’s funny, because even if you uploaded your own private key, since it is running in a VM they can mess with the random number generator, peek under the hood of DHE, or simply scan your virtual memory for a copy of the private key. So simply by using a virtual machine you are entirely at their mercy. If they also have physical premises in the USA, then the local legislation requires them to give the NSA full access while also requiring the NSA to gain maximum possible access.

I’m not aware of anything, unless you can run it on top of people’s private Linux machines. If it’s a cloud, then it is surveillable by design. If the cloud has physical hosting in the USA, your data goes into XKEYSCORE. If Triton runs on private computers rather than rented VMs, that might be a way to go.

Maybe you should reconsider the design of your application? I assume you are asking because you intend to share private information with each and every node in the network. If each individual node only gossips encrypted or non-critical data instead, then it is not a problem if some nodes in the backend aren’t fully trustworthy. The GNUnet network has a mix of rented servers and private home computers. It uses gossip protocols for non-critical purposes and assumes zero trust in other nodes. You may want to look into that?


#8

It is, and it works great.

Check out https://project-fifo.net for more fantastic work.


#9

Just stumbled across this:

Is that going in the direction of what you want?