From: GaryScott on 2 Jul 2010 16:20

Comments Please:

"A group at Rice University is pursuing an alternate vision of coarray extensions for the Fortran language. Their perspective is that the Fortran 2008 standards committee's design choices were shaped more by the desire to introduce as few modifications to the language as possible than by the desire to assemble the best set of extensions to support parallel programming. They don't believe that the set of extensions agreed upon by the committee are the right ones. In their view, both Numrich and Reid's original design and the coarray extensions proposed for Fortran 2008 suffer from the following shortcomings:

- There is no support for processor subsets; for instance, coarrays must be allocated over all images.
- Coarrays must be declared as global variables; one cannot dynamically allocate a coarray into a locally scoped variable.
- The coarray extensions lack any notion of global pointers, which are essential for creating and manipulating any kind of linked data structure.
- Reliance on named critical sections for mutual exclusion hinders scalable parallelism by associating mutual exclusion with code regions rather than data objects.
- Fortran 2008's sync images statement doesn't provide a safe synchronization space. As a result, synchronization operations pending in a user's code when a library call is made can interfere with synchronization inside the library call.
- There are no mechanisms to avoid or tolerate latency when manipulating data on remote images.
- There is no support for collective communication.

To address these shortcomings, Rice University is developing a clean-slate redesign of the Coarray Fortran programming model. Rice's new design for Coarray Fortran, which they call Coarray Fortran 2.0, is an expressive set of coarray-based extensions to Fortran designed to provide a productive parallel programming model. Compared to the emerging Fortran 2008, Rice's new coarray-based language extensions include some additional features:

- process subsets known as teams, which support coarrays, collective communication, and relative indexing of process images for pairwise operations;
- topologies, which augment teams with a logical communication structure;
- dynamic allocation/deallocation of coarrays and other shared data;
- local variables within subroutines: declaration and allocation of coarrays within the scope of a procedure is critical for library-based code;
- team-based coarray allocation and deallocation;
- global pointers in support of dynamic data structures;
- enhanced support for synchronization for fine control over program execution;
- safe and scalable support for mutual exclusion, including locks and lock sets; and
- events, which provide a safe space for point-to-point synchronization."
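[For readers who have not used coarrays, here is a minimal Fortran 2008 sketch, not taken from the Rice material, of two of the points criticized above: allocation of an allocatable coarray is collective over all images, and mutual exclusion via CRITICAL is attached to a region of code rather than to a data object. The program name and array size are arbitrary.]

   program caf08_sketch
      implicit none
      real, allocatable :: a(:)[:]   ! an allocatable coarray
      integer :: me

      me = this_image()

      ! Fortran 2008: an allocatable coarray is allocated collectively
      ! across ALL images; there is no way to restrict it to a subset
      ! ("team") of images.
      allocate(a(100)[*])
      a = real(me)

      sync all                       ! barrier over every image

      ! Mutual exclusion is tied to this code region, not to a(1)[1]:
      critical
         a(1)[1] = a(1)[1] + real(me)
      end critical

      sync all
      deallocate(a)                  ! again collective over all images
   end program caf08_sketch

[Every image must execute the ALLOCATE and the matching DEALLOCATE, both of which synchronize all images; the Rice critique is that there is no way to perform them over a subset of images.]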
From: nmm1 on 12 Jul 2010 07:37

In article <d7fe8a45-0cc4-4231-8af0-6ffb690a2ac0(a)x27g2000yqb.googlegroups.com>, Ian Bush <ianbush.throwaway.account(a)googlemail.com> wrote:

>> > for instance, coarrays must
>> > be allocated over all images.
>>
>> This is intentional, and motivated by performance. Team-based coarrays
>> are either very hard to implement (requiring significant system
>> changes), or will perform no better than allocatable components of
>> coarrays which we already support. Perhaps performance is not a primary
>> requirement for the Rice proposal, but it was for J3. The feedback from
>> users is that coarray performance has to be competitive with MPI. If
>> not, many people will not use it in their codes. A lot of effort was
>> put into the Fortran 2008 spec to avoid features that forced reduced
>> performance.
>
> I'm sorry, I don't get this. Why are team based co-arrays so
> difficult? Their MPI equivalent, use of multiple MPI communicators,
> are the basis of many large scale applications today, so why the
> problem?

Er, no. Far fewer than you might think. Of the dozen MPI applications
I looked at, only two used them (and one was an MPI tester of mine).
On this matter, I should be very interested to know the MPI calls
that CRYSTAL (and DL_POLY_3 and GAMESS-UK) makes, so that I can
update my table and potentially modify my course. I have a script
for source scanning.

There are two problems that I know of:

1) Specification. That's soluble, but making the standardese watertight
is not easy. MPI put a lot of effort into that, and had a much simpler
task than Fortran does. As others have said, it's being tackled.

2) Implementation. It's NOT easy to implement such things either
reliably or efficiently - gang scheduling for all processes is one
thing, and multiple, potentially interacting gangs is another. In MPI,
it is quite common for the insertion of barriers on COMM_WORLD to
improve performance considerably.

Regards,
Nick Maclaren.
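[As an illustration of what "multiple communicators" means here, the sketch below, which is hypothetical code and not taken from any of the applications discussed, splits MPI_COMM_WORLD into two sub-communicators with MPI_Comm_split and performs a collective on the resulting subset; this is roughly the MPI analogue of the "teams" in the Rice proposal.]

   program comm_split_sketch
      use mpi
      implicit none
      integer :: ierr, world_rank, colour, subcomm, sub_rank
      real    :: x, total

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, world_rank, ierr)

      ! Split the world into two "teams" (sub-communicators) by parity
      ! of rank; colour selects the team, world_rank orders it.
      colour = mod(world_rank, 2)
      call MPI_Comm_split(MPI_COMM_WORLD, colour, world_rank, subcomm, ierr)
      call MPI_Comm_rank(subcomm, sub_rank, ierr)

      ! A collective restricted to one team: only the members of
      ! subcomm take part in this reduction.
      x = real(world_rank)
      call MPI_Allreduce(x, total, 1, MPI_REAL, MPI_SUM, subcomm, ierr)

      call MPI_Comm_free(subcomm, ierr)
      call MPI_Finalize(ierr)
   end program comm_split_sketch

[The point at issue in the thread is not that such code is hard to write, but that giving coarrays equivalent subset semantics is hard to specify precisely and to implement efficiently.]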
From: nmm1 on 12 Jul 2010 08:04

In article <1653902b-410c-4e6a-ada1-c950c87599c9(a)d16g2000yqb.googlegroups.com>, GaryScott <garylscott(a)sbcglobal.net> wrote:

>Comments Please:

They should learn from the experiences of the past. Hoare found out
just how hard it was to teach and use general parallelism, so he
developed BSP. All right, he did throw the baby out with the bathwater,
and BSP has never taken off, but all experience is that it is easy to
teach, use and implement - and very efficient if your problem fits its
model.

What they are proposing is FAR too complicated for use by almost all
programmers. The vast majority of MPI users use only a small subset of
MPI, which they can get their head around, and OpenMP in all its glory
has probably never been implemented reliably enough to use - almost
everyone uses a small subset. I could go on, and further, but shall
remain polite.

Perhaps I should point out that all of WG5 are people with a lot of
experience in using, implementing and supporting Fortran for scientific
and other purposes. Whether the Rice team has such experience is less
clear, given that it seems to be composed of computer scientists.

Regards,
Nick Maclaren.
From: Ian Bush on 12 Jul 2010 09:31

Hi Nick,

On 12 July, 12:37, n...(a)cam.ac.uk wrote:
> In article <d7fe8a45-0cc4-4231-8af0-6ffb690a2...(a)x27g2000yqb.googlegroups..com>,
> Ian Bush <ianbush.throwaway.acco...(a)googlemail.com> wrote:
>
> >> > for instance, coarrays must
> >> > be allocated over all images.
> >>
> >> This is intentional, and motivated by performance. Team-based coarrays
> >> are either very hard to implement (requiring significant system
> >> changes), or will perform no better than allocatable components of
> >> coarrays which we already support. Perhaps performance is not a primary
> >> requirement for the Rice proposal, but it was for J3. The feedback from
> >> users is that coarray performance has to be competitive with MPI. If
> >> not, many people will not use it in their codes. A lot of effort was
> >> put into the Fortran 2008 spec to avoid features that forced reduced
> >> performance.
> >
> > I'm sorry, I don't get this. Why are team based co-arrays so
> > difficult? Their MPI equivalent, use of multiple MPI communicators,
> > are the basis of many large scale applications today, so why the
> > problem?
>
> Er, no. Far fewer than you might think. Of the dozen MPI applications
> I looked at, only two used them (and one was an MPI tester of mine).
> On this matter, I should be very interested to know the MPI calls
> that CRYSTAL (and DL_POLY_3 and GAMESS-UK) makes, so that I can
> update my table and potentially modify my course. I have a script
> for source scanning.

Here's the list for DL_POLY_3, CRYSTAL and CASTEP. I don't work on
GAMESS-UK anymore and don't have the code easily to hand, but I can dig
it out if you are interested; the list will be similar to CRYSTAL.
VASP is another example. I don't have the code here, but I would be
surprised if it is markedly different from CASTEP.

DL_POLY_3:
MPI_ABORT, MPI_ALLGATHER, MPI_ALLREDUCE, MPI_ALLTOALL, MPI_ALLTOALLV,
MPI_BARRIER, MPI_BCAST, MPI_COMM_DUP, MPI_COMM_FREE, MPI_COMM_RANK,
MPI_COMM_SIZE, MPI_COMM_SPLIT, MPI_FILE_CLOSE, MPI_FILE_GET_VIEW,
MPI_FILE_OPEN, MPI_FILE_READ_AT, MPI_FILE_SET_VIEW, MPI_FILE_WRITE_AT,
MPI_FINALIZE, MPI_GATHERV, MPI_GET_COUNT, MPI_INIT, MPI_IRECV,
MPI_ISEND, MPI_RECV, MPI_SCATTER, MPI_SCATTERV, MPI_SEND,
MPI_TYPE_COMMIT, MPI_TYPE_CONTIGUOUS, mpi_type_create_f90_real,
MPI_TYPE_FREE, MPI_WAIT

CRYSTAL:
MPI_Abort, MPI_ALLREDUCE, mpi_alltoall, mpi_barrier, MPI_BCAST,
mpi_comm_free, MPI_COMM_RANK, MPI_COMM_SIZE, MPI_COMM_SPLIT,
MPI_Finalize, mpi_gatherv, MPI_INIT, mpi_irecv, MPI_RECV, MPI_REDUCE,
MPI_SEND, mpi_wait

CASTEP:
MPI_abort, MPI_allgather, MPI_allreduce, MPI_AllToAll, MPI_AllToAllV,
MPI_barrier, MPI_bcast, MPI_comm_free, MPI_comm_rank, MPI_comm_size,
MPI_comm_split, MPI_finalize, MPI_gather, MPI_gatherv, MPI_init,
MPI_recv, MPI_scatter, MPI_scatterv, MPI_send

Ian
From: nmm1 on 12 Jul 2010 10:46
In article <6956b18f-159d-469c-948c-eb62fc79b051(a)d16g2000yqb.googlegroups.com>, Ian Bush <ianbush.throwaway.account(a)googlemail.com> wrote:

>Here's the list for DL_POLY_3, CRYSTAL and CASTEP. I don't work on
>GAMESS-UK anymore and don't have the code easily to hand, but I can
>dig it out if you are interested; the list will be similar to CRYSTAL.
>VASP is another example. I don't have the code here, but I would be
>surprised if it is markedly different from CASTEP.

Thanks very much. Upon updating the table, I realise that I misspoke
earlier - there were in fact FOUR applications that used multiple
communicators - two used groups rather than MPI_Comm_split. I also seem
to have had bad data for CASTEP, so my remark about the frequency of
use in the major applications was a bit off-beam. Sorry about that ....

The list won't post, as it is too wide, but please Email me if you
want to see it.

Regards,
Nick Maclaren.