From: Bill Mudry on 10 May 2010 01:03 I have seen many examples of just getting rid of duplicate records all in the same file. However, I need to know how to append one MySQL table into another table without adding duplicate records into the original file accepting the data. I ran the following script that successfully appended data from what I called the "tervuren_target" table into the file that drives the species level of my botanical wood tree, named "species_master". ................................................................................................................................................................................... INSERT INTO species_master (genera, name, species_name, authorities, species_source) SELECT genera, name, species_name, authorities, species_source FROM 'tervuren_target' WHERE tervuren_target.species_name != species_master.species_name; ................................................................................................................................................................................... The last line was an attempt to avoid carrying over duplicates from the Tervuren file into the species file but an error said that it did not recognize species_master.species_name. So, I tried putting single quotes as 'species_master.species_name'. The program now ran without errors and appended records for a total of 17,685 combined records. However, it did not take long to realize that what it had done is interpret 'species_master.species_name' as a string, not a column! Since none of Tervuren records would have that string in them, it let all the records to append including (sigh ...) 3,431 duplicate records! To clean this up, I realize that if I just use a simple statement with DISTINCT in it, duplicate records from the original (and precious) records could get erased instead of the duplicate record that was appended in from the Tervuren data. That would be a really bad disaster! I will see what ideas you all have but I am thinking it might be much safer to do the append operation over again with a better script that will stop duplicates from carrying over to the species_master file in the first case instead of trying to clean up the merged file. (... After all, that was what I was trying to do anyway). On structures, here are some facts you will wish to have: - I first made sure that the size, collation, engine and field names used (as shown above) were identical in both the tervuren_target table (the source of data to append) and the species_master table. - None of the data in the species_master file should ever be erased by this append of new data. - The species_master table has many more fields as the tervuren_target table but any field for the tervuren_target data being transferred has exact matching fields in the species_master table. They are under the same folder. - The comparison of which records is new is by using the species_name field as comparison keys in both. If they are identical in both tables for each new record being appended, that record should not be allowed in, thereby not allowing duplicate wood names in. (The meaning of duplicate in this situation is therefore NOT meant from within a table but across tables.) I have no idea if there is a way to use 'Distinct" across tables. How is my best way to merge in new data from the one file to expand the records in a master file without letting duplicate data in? Once the append without duplicates works properly, I then copy it over as table 'species' which is used as the main source of data in the species level of the botanical tree. (http://www.prowebcanada.com/taxa). Expanding the number of woods I can find and report was the original top level reason for starting this project. Conversationally: Once this works for me, it will better than doubled the number of existing wood species I can report. That will be most impressive progress for the wood knowledge base I am building. Once I have a general algorithm working, I have many other lists from which I may also be able extract valued new data --- far faster than I have been able to before. I got kind of close but not quite there. I look forward to your help. This should be relatively basic to most intermediate and advanced programmers :-) . I am still a junior programmer in PHP and MySQL but more things are starting to work for me ;-) . With thanks any and all help in advance, Bill Mudry Mississauga, Ontario
From: Ferenc Kovacs on 10 May 2010 01:37 On Mon, May 10, 2010 at 7:03 AM, Bill Mudry <billmudry(a)rogers.com> wrote: > I have seen many examples of just getting rid of duplicate records all in > the same file. > > However, I need to know how to append one MySQL table into another table > without > adding duplicate records into the original file accepting the data. > > I ran the following script that successfully appended data from what I > called the "tervuren_target" > table into the file that drives the species level of my botanical wood > tree, named > "species_master". > > .................................................................................................................................................................................. > INSERT INTO species_master (genera, name, species_name, authorities, > species_source) > > SELECT genera, name, species_name, authorities, species_source FROM > 'tervuren_target' > > WHERE tervuren_target.species_name != species_master.species_name; > > .................................................................................................................................................................................. > > The last line was an attempt to avoid carrying over duplicates from the > Tervuren file into the species file > but an error said that it did not recognize species_master.species_name. You didn't used the species_master table in you select query, hence the mysql couldn't understand it. > So, I tried putting single > quotes as 'species_master.species_name'. The program now ran without errors > and appended records > for a total of 17,685 combined records. > If you put it into quotes, the it will used as a string, not a reference to a field. Please correct me if I'm wrong: You have this tervuren_target and you have to copy every record which doesn't exists in the species_master INSERT INTO species_master (genera, name, species_name, authorities, species_source) SELECT DISTINCT genera, name, species_name, authorities, species_source FROM 'tervuren_target' LEFT JOIN species_master ON tervuren_target.species_name = species_master.species_name WHERE species_master.species_name IS NULL Basically: we select all of the records from tervuren_target, link with each record from tervuren_target to species_master through the species_name, we select only the records, where this link is not exists (species_master.species_name IS NULL, so we don't have records with this species_name), to be sure, I added a DISTINCT. Maybe you have to tweak the query a littbe bit, because mysql is a little bit tricky, it doesnt allow by default to insert into a row, which is used in the same statement as source, but with table alias you can workaround that, or you can use a third table. Tyrael
From: Niel Archer on 10 May 2010 02:03 Hi As this is not a PHP question and in particular not Windows' related PHP question, I'll keep my reply brief. If you had a unique index defined, duplicates wouldn't be allowed. Then you would only have to add IGNORE to the INSERT to prevent warnings being generated. http://dev.mysql.com/doc/refman/5.1/en/insert.html If that is not possible, then a join query to identify the non-duplicate rows. http://dev.mysql.com/doc/refman/5.1/en/join.html > I have seen many examples of just getting rid of duplicate records > all in the same file. > > However, I need to know how to append one MySQL table into another > table without > adding duplicate records into the original file accepting the data. > > I ran the following script that successfully appended data from what > I called the "tervuren_target" > table into the file that drives the species level of my botanical > wood tree, named > "species_master". > .................................................................................................................................................................................. > INSERT INTO species_master (genera, name, species_name, authorities, > species_source) > > SELECT genera, name, species_name, authorities, species_source FROM > 'tervuren_target' > > WHERE tervuren_target.species_name != species_master.species_name; > .................................................................................................................................................................................. > > The last line was an attempt to avoid carrying over duplicates from > the Tervuren file into the species file > but an error said that it did not recognize > species_master.species_name. So, I tried putting single > quotes as 'species_master.species_name'. The program now ran without > errors and appended records > for a total of 17,685 combined records. > > However, it did not take long to realize that what it had done is > interpret 'species_master.species_name' as > a string, not a column! Since none of Tervuren records would have > that string in them, it let all the records to append > including (sigh ...) 3,431 duplicate records! > > To clean this up, I realize that if I just use a simple statement > with DISTINCT in it, duplicate records from the original > (and precious) records could get erased instead of the duplicate > record that was appended in from the Tervuren data. > That would be a really bad disaster! I will see what ideas you all > have but I am thinking it might be much safer to do > the append operation over again with a better script that will stop > duplicates from carrying over to the species_master > file in the first case instead of trying to clean up the merged file. > (... After all, that was what I was trying to do anyway). > > On structures, here are some facts you will wish to have: > > - I first made sure that the size, collation, engine and > field names used (as shown above) were identical in both the > tervuren_target table (the source of data to append) and > the species_master table. > > - None of the data in the species_master file should ever be > erased by this append of new data. > > - The species_master table has many more fields as the > tervuren_target table but any field for the tervuren_target > data being transferred has exact matching fields in the > species_master table. They are under the same folder. > > - The comparison of which records is new is by using the > species_name field as comparison keys in both. If they are > identical in both tables for each new record being > appended, that record should not be allowed in, thereby not allowing > duplicate wood names in. (The meaning of duplicate in > this situation is therefore NOT meant from within a table but > across tables.) > > I have no idea if there is a way to use 'Distinct" across tables. How > is my best way to merge in new data from the one file > to expand the records in a master file without letting duplicate data in? > > Once the append without duplicates works properly, I then copy it > over as table 'species' which is used as the main source > of data in the species level of the botanical tree. > (http://www.prowebcanada.com/taxa). Expanding the number of woods I can > find and report was the original top level reason for starting this project. > > Conversationally: > Once this works for me, it will better than doubled the number of > existing wood species I can report. That will be most impressive > progress for the wood knowledge base I am building. Once I have a > general algorithm working, I have many other lists from which I may > also be able extract valued new data --- far faster than I have been > able to before. > > I got kind of close but not quite there. I look forward to your help. > This should be relatively basic to most intermediate and > advanced programmers :-) . I am still a junior programmer in PHP and > MySQL but more things are starting to work for me ;-) . > > > With thanks any and all help in advance, > > Bill Mudry > Mississauga, Ontario > -- Niel Archer
From: Ferenc Kovacs on 10 May 2010 03:04 On Mon, May 10, 2010 at 8:03 AM, Niel Archer <not(a)chance.now> wrote: > Hi > > As this is not a PHP question and in particular not Windows' related > PHP question, I'll keep my reply brief. > > If you had a unique index defined, duplicates wouldn't be allowed. Then > you would only have to add IGNORE to the INSERT to prevent warnings > being generated. > http://dev.mysql.com/doc/refman/5.1/en/insert.html > > I would suggest INSERT ... ON DUPLICATE UPDATE field = field instead of IGNORE(because IGNORE can suppress other errors), but you are right about that the unique constraint. Tyrael
From: Bill Mudry on 10 May 2010 15:55
On Mon, May 10, 2010 at 7:03 AM, Bill Mudry <<mailto:billmudry(a)rogers.com>billmudry(a)rogers.com> wrote: I have seen many examples of just getting rid of duplicate records all in the same file. However, I need to know how to append one MySQL table into another table without adding duplicate records into the original file accepting the data. I ran the following script that successfully appended data from what I called the "tervuren_target" table into the file that drives the species level of my botanical wood tree, named "species_master". ..................................................................................................................................................................................... INSERT INTO species_master (genera, name, species_name, authorities, species_source) SELECT genera, name, species_name, authorities, species_source FROM 'tervuren_target' WHERE tervuren_target.species_name != species_master.species_name; ..................................................................................................................................................................................... The last line was an attempt to avoid carrying over duplicates from the Tervuren file into the species file but an error said that it did not recognize species_master.species_name. You didn't used the species_master table in you select query, hence the mysql couldn't understand it.  So, I tried putting single quotes as 'species_master.species_name'. The program now ran without errors and appended records for a total of 17,685 combined records. If you put it into quotes, the it will used as a string, not a reference to a field. Please correct me if I'm wrong: You have this tervuren_target and you have to copy every record which doesn't exists in the species_master INSERT INTO species_master (genera, name, species_name, authorities, species_source) SELECT DISTINCT genera, name, species_name, authorities, species_source FROM 'tervuren_target' LEFT JOIN species_master ON tervuren_target.species_name = species_master.species_name WHERE species_master.species_name IS NULL Basically: we select all of the records from tervuren_target, link with each record from tervuren_target to species_master through the species_name, we select only the records, where this link is not exists (species_master.species_name IS NULL, so we don't have records with this species_name), to be sure, I added a DISTINCT. Maybe you have to tweak the query a littbe bit, because mysql is a little bit tricky, it doesnt allow by default to insert into a row, which is used in  the same statement as source, but with table alias you can workaround that, or you can use a third table. Tyrael |