Product Data

January - April 1996.

Product data in the business-to-business marketspace is very different from that in the retail marketspace. One manufacturer may produce (say) some cutting insert. It's sold by multiple distributors, using different product-ids, with prices and other data varying by distributor, buyer, region, etc. Efficiently capturing these business relationships is very involved.

Getting all this right the first time is well-nigh impossible; I started with a simple set of data structures, and incrementally improved them as my understanding of the business issues deepened with hands-on experience with clients and buyers. The final result encompassed over ten different tables in Oracle, tied together with a maze of integrity constraints. And it was still not finished!

The next step was loading product data supplied by clients. I developed a set of procedures and systems for verifying and loading data. This was used to load some 50,000 different products into the tables I designed.

Searching these multitude of products was the next issue. I designed the necessary indexing schemes and search strategies. For example, a full-table scan of text fields might be more efficient than an indexed scan because of disk access patterns. The resultant SQL statements, permitting search by manufacturer, distributor, keyword and product-id were very complex. The user was permitted to specify any combination of the available search-criteria. Rather than writing different queries for each combination, I designed the queries to be nestable, so that the output of one becomes the source of the next query. Then the queries were automatically composed at run-time, depending on the search criteria specified. This can be seen in the following query, with the nested EXISTS (SELECT ...) constructs:

SELECT client_id, alias, text FROM pages_aliases tpa, pages, text_long WHERE EXISTS (SELECT * FROM mgh.dists_mfrs_some, mgh.clients_prods tcp, mgh.products tp, mgh.prods_txt tt, mgh.prods_parchild tpc WHERE dist_id = '77586' AND mfr_id = client_id AND prod_id = tp.id AND attrib = 'descr' AND obj_id = tt.id AND LOWER(descr) LIKE '%clamp%' AND LOWER(descr) LIKE '%welding%' AND (EXISTS (SELECT * FROM mgh.dists_mfrs_all WHERE dist_id = '77586' AND client_id = mfr_id) OR EXISTS (SELECT * from mgh.dists_series, mgh.prods_parchild WHERE dist_id = '77586' AND cid = tp.id AND p_alias = page_alias) OR EXISTS (SELECT * from mgh.dists_prods WHERE dist_id = '77586' AND mgh.dists_prods.product_id = tp.id)) AND tp.id = cid AND p_alias = tpa.alias) AND tpa.page_id = pages.id AND attrib = 'tit1' AND obj_id = text_long.id ORDER BY client_id

The whole system was designed for efficiency, and could easily handle such complex queries. For example, the above query takes only about 2 seconds to search all the product data.


Rujith de Silva
Created 1997-05-13; modified 2004-10-19.