Using XML documents to develop and integrate applications has many advantages, some of which have been touted many times. One that is probably underpromoted is XML's ability to represent a business event. Let's look at a couple of other parts of the software design process and how they would represent those business events. Specifically, both object-oriented programming and relational database design are parts of the development process that allow us to express things conceptually in the process of building useful software. I agree with the utility of both approaches for many problems. But I think that XML is an even better way to do this for particular classes of problems. One important example is representing the full truth of a particular business event that may be relevant to more than one application or participant.
What is a Business Event?
What I mean by this is a given event in the real world where other abstractions that we are programming come together. Such events generally take place at a specific time and at a specific place. There is usually some qualifier of the event itself: a quantity, a rating, or a price. There is at least one actor in the event, and often several. To ground this in grammatical terms, you can think of it as a subject, a subject and object, or a subject and many objects involved in the event.
What are example business events? They include classic ones such as retail purchases. A retail purchase will have objects such as Customer, Product, Store, Manufacturer and the Date and Time of Sale. There will be attributes of the event itself such as the price, the quantity, and the total sale. Another event is "Employee Hired". This will have the employee, their manager, the division in the company and the many other company applications, databases and functions that need to work on this information.
Modeling Events as Objects
So we have just described many of the necessary participants in the event as "objects". Most programming languages used today have some object-oriented heritage. So is object-oriented modeling the best way to do all of the design necessary to build the functionality around a business event, or perform integration of systems around a business event?
Let's look at the previous example of a retail purchase. We already said that there are several objects there: customers, products, stores, manufacturers and so on. To represent those objects, we think object-oriented design is a good thing. Lay out the properties of those objects, hide the data, and provide the appropriate methods to access the data and change the state of those objects. As we build the system to handle this business event, having classes available which give us operations on these objects will be useful. But should we use OOP to model the event itself? Well, if the event is simple enough (much simpler than a retail purchase), we may implement the event as a method on one of the actors and just make changes to all of the necessary associated objects inside the code for this method. This doesn't sound very good from a design standpoint though. All of the interesting stuff about what happens with an event is happening inside the nasty method code of one of the objects.
OK, so how about we just design a class for the event itself? Let's say we call it the RetailPurchase class. It may be quite simple as far as methods: perhaps just a "DoThePurchase" and a "VoidThePurchase" (if we want to get fancy maybe we implement a "ReturnThePurchase" method). But wait, we still have the problem of all the interesting stuff about how different actors or objects participate in the transaction being "hidden" inside the method code. There is a lot of interesting information in how different objects participate, what restrictions they may have and what side effects there may be on them (the Store has less inventory, the Customer has less money) that is all just "hidden" inside the code. The question is: is the whole OOP paradigm helpful for describing how different objects interoperate simultaneously in a complex transaction or real world event? Is all of the interaction that takes place just "implementation detail" to be stuffed into the innards of a (possibly single) method on some rather artificially created object that represents the event? I don't think so. So let's look at some other alternatives.
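To make the "hidden inside the method" problem concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the Customer and Store fields, the constructor arguments, and the sample values are invented for illustration, and the method name is just a Python spelling of "DoThePurchase".

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    balance: float


@dataclass
class Store:
    name: str
    inventory: dict  # product name -> units on hand


class RetailPurchase:
    """Hypothetical event class; all names here are illustrative only."""

    def __init__(self, customer, store, product, quantity, unit_price):
        self.customer = customer
        self.store = store
        self.product = product
        self.quantity = quantity
        self.unit_price = unit_price

    def do_the_purchase(self):
        # Every side effect on the participants lives in this method body.
        # Nothing in the class interface tells a reader that the store's
        # inventory and the customer's balance are about to change.
        total = self.quantity * self.unit_price
        self.store.inventory[self.product] -= self.quantity
        self.customer.balance -= total
        return total


# Invented sample data.
customer = Customer("Alice", balance=100.0)
store = Store("Main Street", inventory={"widget": 10})
total = RetailPurchase(customer, store, "widget", 2, 12.5).do_the_purchase()
```

A caller sees one method call. The fact that the Store now has less inventory and the Customer has less money can only be learned by reading the method body.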
Modeling Events in Relational Databases
It may seem hard to believe, but relational databases were once considered to be a conceptual layer on top of underlying database implementations. And SQL was considered an end-user query language! While we would probably scoff at these propositions today, at least for modeling what I am calling business events, relational database design actually has some merits.
Usually business events have a very clean mapping to a single relational database table. The table will typically have several columns with IDs of the objects that participate in the event. It will have at least one, but often several, columns with values and other descriptive information about the event itself. Based on the columns with IDs in them, it will logically join to several other tables with information about those participating objects. There will typically be foreign key constraints enforcing the semantics of those logical joins.
But the design process that I need to go through to reflect all of the information in the event is somewhat disjointed. I need to:
- design my tables for all of my constituent objects, including identifying all of the necessary columns and datatypes
- define primary key (uniqueness) constraints and put indexes on those columns and potentially others used for query
- design the table reflecting the event itself
- put foreign key constraints on the "event table" that enforce whatever restrictions are present
- (usually) design views that do the necessary joins with other tables to be able to present an easily queryable view of the important information about constituent objects involved in the event (such as customer or product names for example)
While certainly a cleaner mapping to what is happening with a business event than trying to make those events into objects, this is a somewhat fragmented process. The "truth" about the structure of a whole event is spread out amongst multiple tables, with multiple ways of relating those tables, and multiple sources of information (tables, SQL queries, constraints, views). Also, once we want to look at a particular instance of an event, the only way to do it is to run a rather complicated query against multiple tables.
The reason relational databases took off so well for representing "business event" information is that they allow a much more conceptual design process that maps pretty well to the structure of real world transactions and happenings. And, for those developers who have to build apps that manage the full scope of the business event (their apps are "in charge" of the transaction), they provide some automatic capabilities with the appropriate software. At runtime, relational databases provide great facilities to quickly update the information about the business event. Later, when we want to query on various aspects of the business event, relational databases offer great facilities to retrieve the state of an individual transaction, find transactions based on criteria of participant objects in the transaction, or check the current state of participant objects. If I am the designer of an application with sole and primary responsibility for managing the processing and storage of a particular business event, the relational database design approach to modeling the event is worthwhile. It maps more cleanly to the business event and its associated objects. Plus, coupled with a modern RDBMS, it gives me valuable capabilities at runtime that I would have to implement anyway. I care about each transaction and the full scope of it, each aspect of the event and each aspect of the participants in the event.
However, as soon as you make the assumption that there are business events that another application or party only has to know certain aspects of, does not need to know about every instance of, or is not "in charge of", relational databases are not necessarily the best way to agree on a conceptual design of the transaction, or even to communicate information about the transaction at runtime.
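The design steps listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table names, column names, view name, and sample rows are all invented, and a real schema would carry far more columns and constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
    -- Tables for the constituent objects (hypothetical, minimal columns).
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE product  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE store    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- The event table: participant IDs plus values describing the event.
    CREATE TABLE retail_purchase (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        product_id  INTEGER NOT NULL REFERENCES product(id),
        store_id    INTEGER NOT NULL REFERENCES store(id),
        sold_at     TEXT    NOT NULL,
        quantity    INTEGER NOT NULL,
        unit_price  REAL    NOT NULL
    );

    -- A view that joins the event back to its participants.
    CREATE VIEW purchase_detail AS
    SELECT p.id, c.name AS customer, pr.name AS product, s.name AS store,
           p.sold_at, p.quantity, p.unit_price,
           p.quantity * p.unit_price AS total
    FROM retail_purchase p
    JOIN customer c  ON c.id  = p.customer_id
    JOIN product  pr ON pr.id = p.product_id
    JOIN store    s  ON s.id  = p.store_id;

    -- Invented sample data.
    INSERT INTO customer VALUES (1, 'Alice');
    INSERT INTO product  VALUES (1, 'Widget');
    INSERT INTO store    VALUES (1, 'Main Street');
    INSERT INTO retail_purchase
        VALUES (1, 1, 1, 1, '2024-01-15T10:30:00', 2, 12.5);
""")

# Seeing one instance of the event still means going through the join.
row = conn.execute(
    "SELECT customer, product, store, quantity, total "
    "FROM purchase_detail WHERE id = 1"
).fetchone()
```

Even in this toy version, the definition of "a retail purchase" is spread over four tables, three foreign keys and a view, which is the fragmentation described above.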
Modeling Events in XML
XML has the same ability to reference other separately defined objects that was so helpful in relational databases. The schema for an XML document that represents an entire event can incorporate constituent subelements that represent the objects that participate in the event, with references to, and thus reuse of, the schemas for those subelements.
More importantly, the entire description of what represents a business event is much more self-contained. Developers and other applications are not following lines around a database schema to figure out where all of the information relevant to a business event is kept. It's all there in the XML document. The price is some redundancy of data storage that might be unacceptable for data stored in volume. But for communicating what happened in individual instances, it's a very small price to pay. To see the truth of an individual instance of a business event, I don't think we've seen a better format than XML. Generally what is sent over the wire between systems are individual business events.
So, combined with XML's excellent ability to cross system boundaries (with its text-oriented format and careful attention to generalized data types not tied to specific systems), it's a superb wire format (a topic of future posts). We'll see that, in volume, other stores, such as relational databases, may be better. After we've established some of the advantages of coarse-grained XML document-oriented programming, we will take a look at some of the tools that need to be built in various areas (some of them from Systinet, most from others) before document-oriented XML programming can really take off. Assistance in mapping document schemas to persistence formats is definitely one of them.
Most importantly, the data in the XML document is self-describing. Relatively readable tags encapsulating each value clarify what each element's data means semantically. References to the XML schema for the document (which we hope and assume is there) make it even clearer. The schema contains the data types for each element, along with comments and other annotations that disambiguate the meaning of each piece of the document. Plus, the redundancy usually present in the XML document, in the form of surrounding elements (e.g. friendly readable names in addition to IDs), not only makes the document self-contained but also assists a person looking at an instance or schema who is trying to determine the meaning of still unclear elements.
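As a rough illustration of what "it's all there in the XML document" can look like, here is a sketch of one retail purchase as a single self-contained instance, read with Python's standard xml.etree.ElementTree module. The element names, attributes, and values are all invented for this example and are not taken from any real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document: participants carry both IDs and readable
# names, so the event can be understood without consulting another system.
PURCHASE_XML = """\
<RetailPurchase id="1">
  <SoldAt>2024-01-15T10:30:00</SoldAt>
  <Customer id="1"><Name>Alice</Name></Customer>
  <Store id="1"><Name>Main Street</Name></Store>
  <Product id="1">
    <Name>Widget</Name>
    <Manufacturer id="1"><Name>Acme</Name></Manufacturer>
  </Product>
  <Quantity>2</Quantity>
  <UnitPrice currency="USD">12.50</UnitPrice>
  <Total currency="USD">25.00</Total>
</RetailPurchase>
"""

event = ET.fromstring(PURCHASE_XML)

# Everything about this one event comes out of this one document.
summary = {
    "customer": event.findtext("Customer/Name"),
    "product": event.findtext("Product/Name"),
    "manufacturer": event.findtext("Product/Manufacturer/Name"),
    "quantity": int(event.findtext("Quantity")),
    "total": event.findtext("Total"),
}
```

The repeated names are exactly the redundancy mentioned above: wasteful for storage in volume, cheap for a single message.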
For developers of applications that need to just be aware of certain aspects of what happened in a business event, the XML document format of expressing the information is ideal. They do not need to learn the appropriate ways to join tables to reconstruct the information, and they don't need to learn how to make a sequence of calls to various methods of disparate objects. In addition, all of this data is conveniently there for the taking in the one XML document instance. With good hierarchical document design, XML instance retrieval capabilities (e.g. XPath), and sophisticated XML instance viewers (such as InfoPath) this data should not be overwhelming for their understanding (the whole XML instance approach to representing complex objects is an interesting inversion to the normal OO mantra of data hiding).
Generally the tags and the schema corresponding to the instance make it clear where the data they care about is. And facilities such as XPath make it very easy to get to the fragments of the data they care about in declarative form without writing actual code. This ability to identify just the information that a particular application cares about makes it very easy for the XML document-style approach to web service interface design to be robust to evolution of the information that needs to be communicated. It also allows multiple applications to be informed of and work with the information about a business event, without requiring each participant to explicitly negotiate notification methods with each of the others. We'll talk more later about how XML document-centric design allows us to build systems with interfaces that are stable and extensible over time while facilitating easy involvement of many participants in a process.
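Here is a small sketch of that robustness, using the limited XPath subset supported by Python's standard xml.etree.ElementTree module. The two document versions, the element names, and the total_for_billing helper are all hypothetical, made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical first version of the event document.
V1 = """<RetailPurchase>
  <Customer id="1"><Name>Alice</Name></Customer>
  <Total currency="USD">25.00</Total>
</RetailPurchase>"""

# A later, equally hypothetical revision adds elements this consumer
# never asked for.
V2 = """<RetailPurchase>
  <Customer id="1"><Name>Alice</Name><LoyaltyTier>Gold</LoyaltyTier></Customer>
  <Promotion code="SPRING"/>
  <Total currency="USD">25.00</Total>
</RetailPurchase>"""


def total_for_billing(xml_text):
    """Return (customer name, total) using only the paths this consumer needs."""
    event = ET.fromstring(xml_text)
    return event.findtext("./Customer/Name"), event.findtext("./Total")


results = [total_for_billing(V1), total_for_billing(V2)]
```

The consumer names only the two paths it cares about, so the added LoyaltyTier and Promotion elements pass through without any change to its code. Removing or renaming an element it depends on would of course still break it.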
// posted by Adam @ 11:37 AM