From: np map on 13 Mar 2010 18:21

I'd like to write an open source clustering toolkit (for computation and general use) with automation of configuration/deployment, in Python. Its main purpose is to be used in academic environments. It would be something like running numpy/simpy code (and other custom Python code) on a set of machines in a distributed fashion (e.g. splitting tasks, doing certain bits on some machines, other sub-tasks on other machines, etc.).

The cluster could be used in at least two ways:
- submit code/files via a web interface, monitor the task via the web interface, and download the results from the master node (user <> web interface <> master)
- run code directly from another machine on the cluster (as if it were a subprocess or something like that)

Requirements (so far):
- support the Ubuntu Linux distribution in the initial iteration
- be easy to extend to other OSes and package managers
- try to be 3.x compatible where dual compatibility is possible (2.x and 3.x)
- support Python 2.5-2.6
- document required changes to the 2.x-only code to make it work on 3.x
- make it easy to submit code directly from Python scripts to the cluster (with the right credentials)
- support key-based authentication for job submission
- talk to at least one type of RDBMS to store various types of data
- be able to kill a task on nodes automatically if it executes for too long or requires too much memory (configurable)
- be modular (use automation & configuration, or just clustering)

Therefore, I'd like to know a few things:
- Is there a clustering toolkit already available for Python?
- What would the recommended architecture be?
- How should the "user" code interface with the clustering system's code?
- How should the results be stored (at the node and master level)?
- Should threading be supported in the tasks?
- How should the results be returned to the master node(s)? (polling, submitted by the nodes, etc.)
- What libraries should be used for this? (e.g. Fabric as a library, Pyro, etc.)
- Should Fabric be used in this clustering system for automation? If not, what else? Would a simple Python wrapper around the 'ssh' program be OK?
- Any other suggestions and pieces of advice?

Would the following architecture be OK?
- Master: splits tasks into sub-tasks, sends them to nodes (provided the node's load isn't greater than a certain percentage), gets results, stores and provides configuration to nodes, stores results, etc.
- Node: runs code, applies configuration, submits the results to the master, etc.

If this system actually gets Python-level code submission inside, how should it work?

The reason I posted this set of questions and ideas is that I'd like this to be as flexible and usable as possible. Thanks.
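For what it's worth, the master's "split tasks and send them only to nodes under a load threshold" idea can be sketched in a few lines of plain Python. Everything here is hypothetical (the function name, the node names, the 0.75 default threshold), and it's written in Python 3 for clarity even though the post targets 2.5-2.6:

```python
def assign_subtasks(subtasks, node_loads, max_load=0.75):
    """Assign sub-tasks round-robin to nodes whose load is under a threshold.

    subtasks   -- list of opaque task objects to distribute
    node_loads -- dict mapping node name -> current load (0.0 to 1.0)
    max_load   -- nodes at or above this load receive no work (hypothetical default)
    """
    # Only nodes below the threshold are eligible; sort for deterministic order.
    eligible = [n for n, load in sorted(node_loads.items()) if load < max_load]
    if not eligible:
        raise RuntimeError('no node is under the load threshold')
    assignment = {n: [] for n in eligible}
    for i, task in enumerate(subtasks):
        # Simple round-robin; a real master might weight by remaining capacity.
        assignment[eligible[i % len(eligible)]].append(task)
    return assignment
```

A real scheduler would also re-queue sub-tasks when a node dies, but round-robin over eligible nodes is enough to make the master/node split concrete.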
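The "kill a task if it executes for too long or requires too much memory" requirement maps fairly directly onto subprocess plus resource.setrlimit on Linux. This is a minimal sketch, not the toolkit's actual API; the function name and the default limits are made up, and it uses Python 3's communicate(timeout=...) rather than anything available in 2.5-2.6:

```python
import resource
import subprocess

def run_task(cmd, max_seconds=60, max_mem_bytes=512 * 1024 * 1024):
    """Run one node-side task, enforcing wall-clock and memory limits.

    Both limits are hypothetical defaults; a real node would read them
    from the cluster's (configurable) settings.
    """
    def limit_memory():
        # Runs in the child just before exec: cap its address space so
        # oversized allocations fail instead of exhausting the node.
        resource.setrlimit(resource.RLIMIT_AS, (max_mem_bytes, max_mem_bytes))

    proc = subprocess.Popen(cmd, preexec_fn=limit_memory,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        out, err = proc.communicate(timeout=max_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.communicate()  # reap the killed child
        return None, 'killed: exceeded %d s' % max_seconds
    return out, err
```

Running each task as a separate OS process (rather than a thread) is what makes the kill reliable, which also bears on the threading question above.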
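On the question of wrapping the 'ssh' program directly instead of using Fabric: a thin subprocess wrapper is workable and fits the key-based-authentication requirement. The sketch below is one possible shape, not a recommendation; host names, paths, and function names are placeholders:

```python
import subprocess

def build_ssh_cmd(host, command, user=None, key_path=None):
    """Build the argv for running a command on a remote node via OpenSSH."""
    target = '%s@%s' % (user, host) if user else host
    # BatchMode makes ssh fail rather than prompt for a password,
    # which forces key-based authentication as the post requires.
    cmd = ['ssh', '-o', 'BatchMode=yes']
    if key_path:
        cmd += ['-i', key_path]
    cmd += [target, command]
    return cmd

def ssh_run(host, command, user=None, key_path=None, timeout=30):
    """Run a command on a remote node, returning (exit code, stdout, stderr)."""
    result = subprocess.run(build_ssh_cmd(host, command, user, key_path),
                            capture_output=True, text=True, timeout=timeout)
    return result.returncode, result.stdout, result.stderr
```

The trade-off versus Fabric (or Pyro) is that a raw wrapper gives no connection reuse or remote-Python integration; it only ships shell commands, so Python-level code submission would still need its own transport on top.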