Duplicate Entries

Dawn85's profile image Dawn85 posted 2 years ago in General Permalink

Hello All, I am very new to HeidiSQL and am still fumbling my way around. I wrote the below Query but was wondering if there was a way to remove duplicate lines from the data based on id_workorder (WO#). Basically if there is a duplicate WO# remove the duplicate row (keeping the first occurance) so I only get unique id_workorder (WO#) numbers - see attached picture for desired result.

Thank you to anyone who can help!!

SELECT
t1.id_workorder AS `WO#`,
CAST(IFNULL(FROM_UNIXTIME(IF(t1.date_quoted=0, NULL,t1.date_quoted), '%Y-%m-%d'), 0)AS DATE) AS `Date Quoted`,
(SELECT `name` FROM crm_customer WHERE id_primary = t1.id_customer) AS `Customer Name`,
t1.labour_total,
t1.labour_pct_discount,
t1.labour_sub_discount,
t1.labour_special_pct_discount,
t1.labour_special_sub_discount,
t1.labour_sub_total,
t1.parts_total,
t1.parts_pct_discount,
t1.parts_sub_discount,
t1.parts_sub_total,
t1.charge_unit_total,
t1.charge_special_total,
t1.charge_freight_total,
t1.tax_total,
t1.grand_total,
t2.id_type
FROM
winnipeg.erp_quoting AS t1

JOIN
  erp_rework AS t2  
ON
  t1.id_workorder = t2.id_workorder

WHERE
  t1.date_quoted > UNIX_TIMESTAMP() - ((86400 * 30) * 30)

ORDER BY
  t1.date_quoted DESC`
1 attachment(s):
  • HeidiSQL
ansgar's profile image ansgar posted 2 years ago Permalink

I guess the table erp_rework table has the unwanted duplicates, right?

Assuming erp_rework has some more columns than id_workorder and id_type, you may be able to separate the unwanted rows by some value, for example by a creation date. If yes, I would look at its values and fire something like that:

DELETE FROM erp_rework WHERE created >= '2022-01-22'
Dawn85's profile image Dawn85 posted 2 years ago Permalink

I guess the table erp_rework table has the unwanted duplicates, right?

Assuming erp_rework has some more columns than id_workorder and id_type, you may be able to separate the unwanted rows by some value, for example by a creation date. If yes, I would look at its values and fire something like that:

DELETE FROM erp_rework WHERE created >= '2022-01-22'

I apologize but I am very much a novice in SQL. I believe it is the table erp_quoting that has the duplicates. Multiple quotes can be created for 1 workorder however only the most recent quote created is the one I want. If I do not convert the dates you can see there are differences in the date_quoted data or its the same if a person made multiple quotes very close together (see attached picture).

What I want is only the most recent one quoted. Is there a way to do this?

1 attachment(s):
  • HeidiSQL-2
ansgar's profile image ansgar posted 2 years ago Permalink

Ah, makes sense.

I have no idea how to do that in a single query. But what I sometimes do in such situations is to generate a batch of DELETE queries, copy these with a simicolon between them and run them separately.

So, at first you generate a batch of commands:

SELECT CONCAT('DELETE FROM erp_quoting WHERE id_workorder=', id_workorder, ' AND date_quoted!=', MAX(date_quoted))
FROM erp_quoting
GROUP BY id_workorder
ORDER BY date_quoted DESC

This will give you the rows with the duplicated ids and their highest date_quoted values (which you want to keep):

cmd
DELETE FROM erp_quoting WHERE id_workorder=115978 AND date_quoted!=123
DELETE FROM erp_quoting WHERE id_workorder=116293 AND date_quoted!=13

Now, rightclick the result (in HeidiSQL of course :), then click "Export grid rows", and copy the result as delimited text:

Description

Click OK and paste your clipboard into a new query tab in HeidiSQL. Then simply run the batch with F9.

This approach is quite hacky, and not suited if your tables contain millions of rows.

ansgar's profile image ansgar posted 2 years ago Permalink

@Dawn85, did that help you in some way?

Please login to leave a reply, or register at first.