Product data in the business-to-business marketspace is very different from that in the retail marketspace. One manufacturer may produce (say) some cutting insert. It's sold by multiple distributors, using different product-ids, with prices and other data varying by distributor, buyer, region, etc. Efficiently capturing these business relationships is very involved.
Getting all this right the first time is well-nigh impossible; I started with a simple set of data structures, and incrementally improved them as my understanding of the business issues deepened with hands-on experience with clients and buyers. The final result encompassed over ten different tables in Oracle, tied together with a maze of integrity constraints. And it was still not finished!
The next step was loading product data supplied by clients. I developed a set of procedures and systems for verifying and loading data. This was used to load some 50,000 different products into the tables I designed.
Searching these multitude of products was the next issue. I
designed the necessary indexing schemes and search strategies. For
example, a full-table scan of text fields might be more efficient than
an indexed scan because of disk access patterns. The resultant SQL
statements, permitting search by manufacturer, distributor, keyword
and product-id were very complex. The user was permitted to specify
any combination of the available search-criteria. Rather than writing
different queries for each combination, I designed the queries to be
nestable, so that the output of one becomes the source of the next
query. Then the queries were automatically composed at run-time,
depending on the search criteria specified. This can be seen in the
following query, with the nested EXISTS (SELECT ...)
constructs:
SELECT client_id, alias, text FROM pages_aliases tpa, pages, text_long
WHERE EXISTS (SELECT * FROM mgh.dists_mfrs_some, mgh.clients_prods tcp,
mgh.products tp, mgh.prods_txt tt, mgh.prods_parchild tpc WHERE dist_id
= '77586' AND mfr_id = client_id AND prod_id = tp.id AND attrib =
'descr' AND obj_id = tt.id AND LOWER(descr) LIKE '%clamp%' AND
LOWER(descr) LIKE '%welding%' AND (EXISTS (SELECT * FROM
mgh.dists_mfrs_all WHERE dist_id = '77586' AND client_id = mfr_id) OR
EXISTS (SELECT * from mgh.dists_series, mgh.prods_parchild WHERE
dist_id = '77586' AND cid = tp.id AND p_alias = page_alias) OR EXISTS
(SELECT * from mgh.dists_prods WHERE dist_id = '77586' AND
mgh.dists_prods.product_id = tp.id)) AND tp.id = cid AND p_alias =
tpa.alias) AND tpa.page_id = pages.id AND attrib = 'tit1' AND obj_id =
text_long.id ORDER BY client_id
The whole system was designed for efficiency, and could easily handle such complex queries. For example, the above query takes only about 2 seconds to search all the product data.