Appending one MySQL file to another without duplicates. [Windows]

Prev: using php apc functions
Next: Will look into the suggestions tonight

From: Bill Mudry on 10 May 2010 01:03

I have seen many examples of just getting rid of duplicate records
all in the same file.

However, I need to know how to append one MySQL table into another
table without
adding duplicate records into the original file accepting the data.

I ran the following script that successfully appended data from what
I called the "tervuren_target"
table into the file that drives the species level of my botanical
wood tree, named
"species_master".
...................................................................................................................................................................................
INSERT INTO species_master (genera, name, species_name, authorities,
species_source)

SELECT genera, name, species_name, authorities, species_source FROM
'tervuren_target'

WHERE tervuren_target.species_name != species_master.species_name;
...................................................................................................................................................................................

The last line was an attempt to avoid carrying over duplicates from
the Tervuren file into the species file
but an error said that it did not recognize
species_master.species_name. So, I tried putting single
quotes as 'species_master.species_name'. The program now ran without
errors and appended records
for a total of 17,685 combined records.

However, it did not take long to realize that what it had done is
interpret 'species_master.species_name' as
a string, not a column! Since none of Tervuren records would have
that string in them, it let all the records to append
including (sigh ...) 3,431 duplicate records!

To clean this up, I realize that if I just use a simple statement
with DISTINCT in it, duplicate records from the original
(and precious) records could get erased instead of the duplicate
record that was appended in from the Tervuren data.
That would be a really bad disaster! I will see what ideas you all
have but I am thinking it might be much safer to do
the append operation over again with a better script that will stop
duplicates from carrying over to the species_master
file in the first case instead of trying to clean up the merged file.
(... After all, that was what I was trying to do anyway).

On structures, here are some facts you will wish to have:

- I first made sure that the size, collation, engine and
field names used (as shown above) were identical in both the
tervuren_target table (the source of data to append) and
the species_master table.

- None of the data in the species_master file should ever be
erased by this append of new data.

- The species_master table has many more fields as the
tervuren_target table but any field for the tervuren_target
data being transferred has exact matching fields in the
species_master table. They are under the same folder.

- The comparison of which records is new is by using the
species_name field as comparison keys in both. If they are
identical in both tables for each new record being
appended, that record should not be allowed in, thereby not allowing
duplicate wood names in. (The meaning of duplicate in
this situation is therefore NOT meant from within a table but
across tables.)

I have no idea if there is a way to use 'Distinct" across tables. How
is my best way to merge in new data from the one file
to expand the records in a master file without letting duplicate data in?

Once the append without duplicates works properly, I then copy it
over as table 'species' which is used as the main source
of data in the species level of the botanical tree.
(http://www.prowebcanada.com/taxa). Expanding the number of woods I can
find and report was the original top level reason for starting this project.

Conversationally:
Once this works for me, it will better than doubled the number of
existing wood species I can report. That will be most impressive
progress for the wood knowledge base I am building. Once I have a
general algorithm working, I have many other lists from which I may
also be able extract valued new data --- far faster than I have been
able to before.

I got kind of close but not quite there. I look forward to your help.
This should be relatively basic to most intermediate and
advanced programmers :-) . I am still a junior programmer in PHP and
MySQL but more things are starting to work for me ;-) .

With thanks any and all help in advance,

Bill Mudry
Mississauga, Ontario

From: Ferenc Kovacs on 10 May 2010 01:37

On Mon, May 10, 2010 at 7:03 AM, Bill Mudry <billmudry(a)rogers.com> wrote:

> I have seen many examples of just getting rid of duplicate records all in
> the same file.
>
> However, I need to know how to append one MySQL table into another table
> without
> adding duplicate records into the original file accepting the data.
>
> I ran the following script that successfully appended data from what I
> called the "tervuren_target"
> table into the file that drives the species level of my botanical wood
> tree, named
> "species_master".
>
> ..................................................................................................................................................................................
> INSERT INTO species_master (genera, name, species_name, authorities,
> species_source)
>
> SELECT genera, name, species_name, authorities, species_source FROM
> 'tervuren_target'
>
> WHERE tervuren_target.species_name != species_master.species_name;
>
> ..................................................................................................................................................................................
>
> The last line was an attempt to avoid carrying over duplicates from the
> Tervuren file into the species file
> but an error said that it did not recognize species_master.species_name.

You didn't used the species_master table in you select query, hence the
mysql couldn't understand it.

> So, I tried putting single
> quotes as 'species_master.species_name'. The program now ran without errors
> and appended records
> for a total of 17,685 combined records.
>

If you put it into quotes, the it will used as a string, not a reference to
a field.
Please correct me if I'm wrong:
You have this tervuren_target and you have to copy every record which
doesn't exists in the species_master

INSERT INTO species_master (genera, name, species_name, authorities,
species_source)

SELECT DISTINCT genera, name, species_name, authorities, species_source FROM
'tervuren_target'
LEFT JOIN species_master
ON
tervuren_target.species_name = species_master.species_name
WHERE species_master.species_name IS NULL

Basically: we select all of the records from tervuren_target, link with each
record from tervuren_target to species_master through the species_name, we
select only the records, where this link is not exists
(species_master.species_name IS NULL, so we don't have records with this
species_name), to be sure, I added a DISTINCT.
Maybe you have to tweak the query a littbe bit, because mysql is a little
bit tricky, it doesnt allow by default to insert into a row, which is used
in the same statement as source, but with table alias you can workaround
that, or you can use a third table.

Tyrael

From: Niel Archer on 10 May 2010 02:03

Hi

As this is not a PHP question and in particular not Windows' related
PHP question, I'll keep my reply brief.

If you had a unique index defined, duplicates wouldn't be allowed. Then
you would only have to add IGNORE to the INSERT to prevent warnings
being generated.
http://dev.mysql.com/doc/refman/5.1/en/insert.html

If that is not possible, then a join query to identify the non-duplicate
rows.
http://dev.mysql.com/doc/refman/5.1/en/join.html

> I have seen many examples of just getting rid of duplicate records
> all in the same file.
>
> However, I need to know how to append one MySQL table into another
> table without
> adding duplicate records into the original file accepting the data.
>
> I ran the following script that successfully appended data from what
> I called the "tervuren_target"
> table into the file that drives the species level of my botanical
> wood tree, named
> "species_master".
> ..................................................................................................................................................................................
> INSERT INTO species_master (genera, name, species_name, authorities,
> species_source)
>
> SELECT genera, name, species_name, authorities, species_source FROM
> 'tervuren_target'
>
> WHERE tervuren_target.species_name != species_master.species_name;
> ..................................................................................................................................................................................
>
> The last line was an attempt to avoid carrying over duplicates from
> the Tervuren file into the species file
> but an error said that it did not recognize
> species_master.species_name. So, I tried putting single
> quotes as 'species_master.species_name'. The program now ran without
> errors and appended records
> for a total of 17,685 combined records.
>
> However, it did not take long to realize that what it had done is
> interpret 'species_master.species_name' as
> a string, not a column! Since none of Tervuren records would have
> that string in them, it let all the records to append
> including (sigh ...) 3,431 duplicate records!
>
> To clean this up, I realize that if I just use a simple statement
> with DISTINCT in it, duplicate records from the original
> (and precious) records could get erased instead of the duplicate
> record that was appended in from the Tervuren data.
> That would be a really bad disaster! I will see what ideas you all
> have but I am thinking it might be much safer to do
> the append operation over again with a better script that will stop
> duplicates from carrying over to the species_master
> file in the first case instead of trying to clean up the merged file.
> (... After all, that was what I was trying to do anyway).
>
> On structures, here are some facts you will wish to have:
>
> - I first made sure that the size, collation, engine and
> field names used (as shown above) were identical in both the
> tervuren_target table (the source of data to append) and
> the species_master table.
>
> - None of the data in the species_master file should ever be
> erased by this append of new data.
>
> - The species_master table has many more fields as the
> tervuren_target table but any field for the tervuren_target
> data being transferred has exact matching fields in the
> species_master table. They are under the same folder.
>
> - The comparison of which records is new is by using the
> species_name field as comparison keys in both. If they are
> identical in both tables for each new record being
> appended, that record should not be allowed in, thereby not allowing
> duplicate wood names in. (The meaning of duplicate in
> this situation is therefore NOT meant from within a table but
> across tables.)
>
> I have no idea if there is a way to use 'Distinct" across tables. How
> is my best way to merge in new data from the one file
> to expand the records in a master file without letting duplicate data in?
>
> Once the append without duplicates works properly, I then copy it
> over as table 'species' which is used as the main source
> of data in the species level of the botanical tree.
> (http://www.prowebcanada.com/taxa). Expanding the number of woods I can
> find and report was the original top level reason for starting this project.
>
> Conversationally:
> Once this works for me, it will better than doubled the number of
> existing wood species I can report. That will be most impressive
> progress for the wood knowledge base I am building. Once I have a
> general algorithm working, I have many other lists from which I may
> also be able extract valued new data --- far faster than I have been
> able to before.
>
> I got kind of close but not quite there. I look forward to your help.
> This should be relatively basic to most intermediate and
> advanced programmers :-) . I am still a junior programmer in PHP and
> MySQL but more things are starting to work for me ;-) .
>
>
> With thanks any and all help in advance,
>
> Bill Mudry
> Mississauga, Ontario
>

--
Niel Archer

From: Ferenc Kovacs on 10 May 2010 03:04

On Mon, May 10, 2010 at 8:03 AM, Niel Archer <not(a)chance.now> wrote:

> Hi
>
> As this is not a PHP question and in particular not Windows' related
> PHP question, I'll keep my reply brief.
>
> If you had a unique index defined, duplicates wouldn't be allowed. Then
> you would only have to add IGNORE to the INSERT to prevent warnings
> being generated.
> http://dev.mysql.com/doc/refman/5.1/en/insert.html
>
>
I would suggest INSERT ... ON DUPLICATE UPDATE field = field instead of
IGNORE(because IGNORE can suppress other errors), but you are right about
that the unique constraint.

Tyrael

From: Bill Mudry on 10 May 2010 15:55

On Mon, May 10, 2010 at 7:03 AM, Bill Mudry
<<mailto:billmudry(a)rogers.com>billmudry(a)rogers.com> wrote:
I have seen many examples of just getting rid of
duplicate records all in the same file.

However, I need to know how to append one MySQL
table into another table without
adding duplicate records into the original file accepting the data.

I ran the following script that successfully
appended data from what I called the "tervuren_target"
table into the file that drives the species level
of my botanical wood tree, named
"species_master".
.....................................................................................................................................................................................
INSERT INTO species_master (genera, name,
species_name, authorities, species_source)

SELECT genera, name, species_name, authorities,
species_source FROM 'tervuren_target'

WHERE tervuren_target.species_name != species_master.species_name;
.....................................................................................................................................................................................

The last line was an attempt to avoid carrying
over duplicates from the Tervuren file into the species file
but an error said that it did not recognize species_master.species_name.

You didn't used the species_master table in you
select query, hence the mysql couldn't understand it.
Â
So, I tried putting single
quotes as 'species_master.species_name'. The
program now ran without errors and appended records
for a total of 17,685 combined records.

If you put it into quotes, the it will used as a
string, not a reference to a field.
Please correct me if I'm wrong:
You have this tervuren_target and you have to
copy every record which doesn't exists in the species_master

INSERT INTO species_master (genera, name,
species_name, authorities, species_source)

SELECT DISTINCT genera, name, species_name,
authorities, species_source FROM 'tervuren_target'
LEFT JOIN species_master
ON
tervuren_target.species_nameÂ = species_master.species_name
WHERE species_master.species_name IS NULL

Basically: we select all of the records from
tervuren_target, link with each record from
tervuren_target to species_master through the
species_name, we select only the records, where
this link is not exists
(species_master.species_name IS NULL, so we don't
have records with this species_name), to be sure, I added a DISTINCT.
Maybe you have to tweak the query a littbe bit,
because mysql is a little bit tricky, it doesnt
allow by default to insert into a row, which is
used in Â the same statement as source, but with
table alias you can workaround that, or you can use a third table.

Tyrael

| Next | Last
Pages: 1 2
Prev: using php apc functions
Next: Will look into the suggestions tonight