Our system is facing performance issues selecting rows out of a 38 million rows table.
This table with 38 million rows stores information from clients/suppliers etc. These appear across many other tables, such as Invoices.
The main problem is that our database is far from normalized. The Clients_Suppliers table has a composite key made of 3 columns, the Code - varchar2(16), Category - char(2) and the last one is up_date, a date. Every change in one client's address is stored in that same table with a new date. So we can have records such as this:
code ca up_date
---------------- -- --------
1234567890123456 CL 01/01/09
1234567890123456 CL 01/01/10
1234567890123456 CL 01/01/11
1234567890123456 CL 01/01/12
6543210987654321 SU 01/01/10
6543210987654321 SU 08/03/11
Worst, in every table that uses a client's information, instead of the full composite key, only the code and category is stored. Invoices, for instance, has its own keys, including the emission date. So we can have something like this:
invoice_no serial_no emission code ca
---------- --------- -------- ---------------- --
1234567890 12345 05/02/12 1234567890123456 CL
My specific problem is that I have to generate a list of clients for which invoices where created in a given period. Since I have to get the most recent info from the clients, I have to use max(up_date).
So here's my query (in Oracle):
SELECT
CL.CODE,
CL.CATEGORY,
-- other address fields
FROM
CLIENTS_SUPPLIERS CL
INVOICES I
WHERE
CL.CODE = I.CODE AND
CL.CATEGORY = I.CATEGORY AND
CL.UP_DATE =
(SELECT
MAX(CL2.UP_DATE)
FROM
CLIENTS_SUPPLIERS CL2
WHERE
CL2.CODE = I.CODE AND
CL2.CATEGORY = I.CATEGORY AND
CL2.UP_DATE <= I.EMISSION
) AND
I.EMISSION BETWEEN DATE1 AND DATE2
It takes up to seven hours to select 178,000 rows. Invoices has 300,000 rows between DATE1 and DATE2.
It's a (very, very, very) bad design, and I've raised the fact that we should improve it, by normalizing the tables. That would involve creating a table for clients with a new int primary key for each pair of code/category and another one for Adresses (with the client primary key as a foreign key), then use the Adresses' primary key in each table that relates to clients.
But it would mean changing the whole system, so my suggestion has been shunned. I need to find a different way of improving performance (apparently using only SQL).
I've tried indexes, views, temporary tables but none have had any significant improvement on performance. I'm out of ideas, does anyone have a solution for this?
Thanks in advance!