Prev: "HUH???" and/or "wiki broken, VERY broken?" was [Re: Plotting3D surfaces]
Next: How to send a virtual event from C++
From: cattaghia on 11 Apr 2010 01:18 I got myself in a big curiosity and only the Tcl masters could give me the answers. PROBLEM: The program needs to store in memory a variable-size collection of dicts, which can themselves have variable sizes, in spite of having always the same set of keys (the data in each key can be a string, or a nested dict, or a list, whatever). This collection is manipulated by access procedures, which by their turn must be able to search and gather a number of these dicts according to different criteria. For example, if these dicts have keys like "name" and "salary", among others, the access procedures must be able to search those dicts by the values found in any of these keys. Preferably, the solution would be based in pure and "vanilla" Tcl (i.e., no Tcllibs and alikes). QUESTION: According to your experience, what would be the best way to organize this collection, in terms of speed in access and in easiness to implement the access routines? Supposing the employees data previously mentioned, I imagined a number of possible combinations of data types and Tcl facilities: 1) Having a big dict in which the primary keys would be the employees names, then subdicts with the rest of data. I presume this would make access harder and slow if I need to retrieve a number of dicts by one of the inner dicts keys -- for example, by the "city" key. 2) Having a central dict in which each key would be an index and would have as value a subdict containing all of each employee's data. Then arrays named by search criteria and value -- like "City(London)", "City(Rio)", or "Gender(Female)" -- would hold secondary lists containing the indexes to subdicts in the central dict which match the given criteria. It seems complex but fast. 3) Have the dicts containing each employee's data stored in different dicts, nested according to different values for certain keys. For example, a dict in which the "City" value would be the primary key, other dict in which the "Gender" would be the primary key, and so on as needed. I presume this could allow fast searches, but seems quite inefficient and a waste of memory. So, what do you suggest?? Thanks and regards! Fabricio Rocha Brasilia, Brasil
From: Andrew Mangogna on 11 Apr 2010 16:35
cattaghia wrote: > I got myself in a big curiosity and only the Tcl masters could give me > the answers. > > PROBLEM: The program needs to store in memory a variable-size > collection of dicts, which can themselves have variable sizes, in > spite of having always the same set of keys (the data in each key can > be a string, or a nested dict, or a list, whatever). This collection > is manipulated by access procedures, which by their turn must be able > to search and gather a number of these dicts according to different > criteria. For example, if these dicts have keys like "name" and > "salary", among others, the access procedures must be able to search > those dicts by the values found in any of these keys. Preferably, the > solution would be based in pure and "vanilla" Tcl (i.e., no Tcllibs > and alikes). > > QUESTION: According to your experience, what would be the best way to > organize this collection, in terms of speed in access and in easiness > to implement the access routines? Supposing the employees data > previously mentioned, I imagined a number of possible combinations of > data types and Tcl facilities: > > 1) Having a big dict in which the primary keys would be the employees > names, then subdicts with the rest of data. I presume this would make > access harder and slow if I need to retrieve a number of dicts by one > of the inner dicts keys -- for example, by the "city" key. > > 2) Having a central dict in which each key would be an index and would > have as value a subdict containing all of each employee's data. Then > arrays named by search criteria and value -- like "City(London)", > "City(Rio)", or "Gender(Female)" -- would hold secondary lists > containing the indexes to subdicts in the central dict which match the > given criteria. It seems complex but fast. > > 3) Have the dicts containing each employee's data stored in different > dicts, nested according to different values for certain keys. For > example, a dict in which the "City" value would be the primary key, > other dict in which the "Gender" would be the primary key, and so on > as needed. I presume this could allow fast searches, but seems quite > inefficient and a waste of memory. > > So, what do you suggest?? > > Thanks and regards! > > Fabricio Rocha > Brasilia, Brasil The best way, IMHO, to organize any complicated data set is as a normalized relational schema. In this way you may use some relational management scheme in which case there are no access routines to write (or at least they are very shallow wrappers) and you will automatically support any ad hoc query that the organization of the data set allows. Alternatively, if the application has a fixed set of operations on the data set that are known in advance, those operations can be analyzed and a conventional data structuring can be derived to support those particular operations. It will depend very much on the precise nature of the schema and operations performed (which were not entirely clear to me from your question). If you are intent that the solution be "pure and vanilla Tcl" then the first alternative is probably not open to you. However, TclRAL and SQLite both provide a full complement of relational data structuring capabilities and the requisite relational algebra for the query operations on any data set. TclRAL provides this in the form of Tcl commands and SQLite does so by providing a means to execute SQL language statements. Both, however, require a binary extension to Tcl. Otherwise, it seems that you are favoring an implementation based on dictionaries, which when nested provide a convenient organization for things that are related in a one to many fashion, but can be less suitable when other operations are required. Andrew Mangogna |