Using SGML to Balance Mechanical and Organic Approaches

The ability to both standardize and customize is likely to become increasingly important as workgroups and organizations are pressured to quickly adapt to changing circumstances and are driven to pursue divergent information management strategies. By supporting both mechanical and organic approaches, SGML and its companion standards (especially HyTime) provide a framework for developing shared solutions that balance competing drivers and help ensure organizational health in the face of uncertainty. SGML allows organizations and workgroups to emphasize either engineering or organic approaches to realize immediate benefits but does not preclude evolving to a more balanced approach in the future.

The challenge to system designers is to provide a framework where local experimentation and variety do not become barriers to information exchange and where investment decisions can be made at a tactical level and still meet strategic requirements. SGML is flexible enough to meet local requirements, and SGML and its companion standards are important precisely because they allow local, tactical solutions to be integrated and shared within a common information management architecture.

Improving Mechanical Efficiency

The classic view of SGML has been as an interchange standard. This use of SGML promotes mechanical efficiency, minimizes data transformations, emphasizes resource optimization, integrates and augments a wide variety of information technologies, and protects information from technology changes. SGML can reduce overall information lifecycle costs by formally defining information structures that meet the needs of a wider variety of information producers and consumers, thus reducing the barriers and costs associated with sub-optimization.

Formalized structures can function as interface specifications between dissimilar systems. Formal validation processes allow individual sets of information to be tested for conformance. Taken to its extreme, a formalized structure can act as a common denominator throughout the information lifecycle, allowing the same information objects to be used for creation, intermediate processing, and archival. SGML can reduce the need for duplicate systems and costly data conversions and make it easier to retrieve, recycle, re-purpose, and reformat information.
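
As a rough illustration of conformance testing, the sketch below checks a small document against a table of permitted child elements. It is a toy under stated assumptions, not an SGML parser: the instance is written in XML syntax so that Python's standard library can read it, the element names (manual, chapter, title, para) are hypothetical, and a real DTD would also constrain the order and number of children.

```python
import xml.etree.ElementTree as ET

# Hypothetical content rules standing in for a DTD: each element name maps
# to the child element names it may contain (empty set = no child elements).
ALLOWED_CHILDREN = {
    "manual": {"title", "chapter"},
    "chapter": {"title", "para"},
    "title": set(),
    "para": set(),
}


def validate(element, path=""):
    """Return a list of conformance errors for element and its descendants."""
    here = f"{path}/{element.tag}"
    if element.tag not in ALLOWED_CHILDREN:
        return [f"{here}: undeclared element"]
    errors = []
    for child in element:
        if child.tag not in ALLOWED_CHILDREN[element.tag]:
            errors.append(f"{here}: <{child.tag}> not allowed here")
        else:
            errors.extend(validate(child, here))
    return errors


good = ET.fromstring(
    "<manual><title>Pump</title>"
    "<chapter><title>Setup</title><para>Bolt it down.</para></chapter></manual>"
)
bad = ET.fromstring("<manual><para>Stray text</para></manual>")

print(validate(good))  # conforming instance: empty error list
print(validate(bad))   # reports the misplaced <para>
```

Because the rules live in data rather than in any one tool, two dissimilar systems could in principle run the same check on the documents they exchange, which is the sense in which a formal structure acts as an interface specification.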

Rules-based formatting reduces cost. Lifecycle labor costs can be dramatically reduced as authoring and editing for appearance disappear. Consistency and quality are often improved, as are cycle times. Rules-based formatting dramatically improves flexibility and accounts for much of the reusability and repackaging described above.
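
The idea of rules-based formatting can be sketched in a few lines. In the example below, the same structured content is rendered twice by swapping a table of per-element rules. The rule tables (PRINT_RULES, ONLINE_RULES) and the element names are hypothetical, and the instance uses XML syntax so that Python's standard library can parse it. Production systems would use a real style-sheet language rather than anything this simple.

```python
import xml.etree.ElementTree as ET

# Hypothetical style sheets: each maps an element name to a formatting rule.
PRINT_RULES = {
    "title": lambda text: text.upper(),
    "para": lambda text: "    " + text,
}
ONLINE_RULES = {
    "title": lambda text: "# " + text,
    "para": lambda text: text,
}


def render(doc, rules):
    """Format every element that has a rule, in document order."""
    lines = []
    for element in doc.iter():
        rule = rules.get(element.tag)
        if rule is not None:
            lines.append(rule((element.text or "").strip()))
    return "\n".join(lines)


doc = ET.fromstring(
    "<chapter><title>Setup</title><para>Bolt the pump down.</para></chapter>"
)
print(render(doc, PRINT_RULES))
print(render(doc, ONLINE_RULES))
```

The author never touches appearance. Changing the look of every document of this type means editing one rule table, not each document.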

The dynamics of an engineering-oriented SGML project are to pick a document type, stake out as much of the information lifecycle as possible, and try to get everyone to agree on a single acceptable structure. On the systems side, information producers and consumers will use the resulting SGML DTD as an interface specification and build or adapt information technologies to understand, produce, and interact with conforming documents. Because of the magnitude of the associated investments, these interchange DTDs are meant to be stable artifacts that serve as fixed points of reference for years, or even decades.

Stakeholder Interests and Metadata Requirements

But SGML is an incredibly challenging information management standard. At a fundamental level, the challenges associated with implementing SGML are equivalent to the challenges associated with learning how to use computers and information technologies to improve organizational performance. This is because SGML is one of the better examples of revolutionary computer technologies that were not designed to conform to the paper paradigms of the past but to meet the more complex requirements of information-based economies.

SGML's flexibility allows organizations to do things that could not be done as easily any other way: to make big gains. That same flexibility also gives organizations the ability to do things that are not in their best interests: to make big mistakes. Many leading-edge SGML projects are perceived as so important to their corporations' success that the companies will not even discuss them, believing that the mere conceptualization of an SGML application can have strategic value.

Understanding metadata (data about data) is at the core of understanding these challenges. Information, by itself, is not terribly valuable anymore. There is simply too much of it. Metadata, by contrast, is increasing in importance because it provides the "hooks" needed by computers to determine how to process the data and the "handles" needed by humans to help identify which pieces of information are relevant to their interests.

What is metadata? The SGML tags within a document instance are metadata. They describe the role of each element within the context of the document's structure. Attributes are metadata, as they further describe important characteristics of the data within the SGML instance. Titles, authors, publication dates, and index numbers are metadata, as are annotations, bookmarks, and other navigational aids.
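
To make the distinction concrete, the following sketch walks a small instance and separates the metadata (element names and attributes) from the data (the text itself). The report, title, and author elements and the id and status attributes are invented for illustration, and the instance is written in XML syntax so that Python's standard library can read it.

```python
import xml.etree.ElementTree as ET


def metadata_and_data(element, path=""):
    """Yield (element path, attributes, text) for element and its descendants."""
    here = f"{path}/{element.tag}"
    yield here, dict(element.attrib), (element.text or "").strip()
    for child in element:
        yield from metadata_and_data(child, here)


doc = ET.fromstring(
    '<report id="R-17" status="draft">'
    "<title>Pump Maintenance</title>"
    "<author>J. Smith</author>"
    "</report>"
)
for path, attributes, text in metadata_and_data(doc):
    # path and attributes are metadata; text is the data they describe
    print(path, attributes, repr(text))
```

The first two values on each line are the "hooks" and "handles"; only the last is content in the traditional sense.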

TV Guide is one of the best examples of metadata and its increasing importance. With the exception of the horoscopes and advertisements, TV Guide is almost entirely metadata, and not so long ago Wired magazine reported that TV Guide makes more money than the four major networks combined.

When SGML is used to develop a vendor- and processing-neutral markup language, the resulting Document Type Definition (DTD) is a formalized framework for capturing and storing metadata. DTD development is often described as a contact sport because of the conflict that results when a varied group of information producers and consumers attempts to develop common definitions. This is because a DTD (or metadata framework) represents a negotiated balance between the divergent stakeholder interests that exist at different points in the information lifecycle. In most cases, those interests diverge precisely on the types of metadata that should be stored and managed.

It is not uncommon, for example, for authors and editors to desire a simple markup language that is easy to use. Information consumers, on the other hand, usually desire richer, more complex sets of metadata. Rather than being satisfied with a DTD that reflects only the generic structures of the document (e.g., chapter and title), they prefer tags that capture the meaning of the data (e.g., purpose, scope, rationale, part number, voltage, person, software package, company). Rich metadata allows documents to better function as databases and can have important benefits when using retrieval tools that support context-sensitive searches.
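
A small sketch suggests why consumers push for richer tags. The same two sentences are marked up once with generic structure only and once with hypothetical partnumber and voltage elements. Both instances and both search functions are invented for illustration, and XML syntax is used so that Python's standard library can parse them.

```python
import xml.etree.ElementTree as ET

generic = ET.fromstring(
    "<chapter><title>Motor</title>"
    "<para>Part 4471 runs at 24 volts.</para>"
    "<para>Order part 24 separately.</para></chapter>"
)
rich = ET.fromstring(
    "<chapter><title>Motor</title>"
    "<para>Part <partnumber>4471</partnumber> runs at "
    "<voltage>24</voltage> volts.</para>"
    "<para>Order part <partnumber>24</partnumber> separately.</para></chapter>"
)


def text_hits(doc, term):
    """Plain text search: every paragraph whose text mentions the term."""
    return [p for p in doc.iter("para") if term in "".join(p.itertext())]


def tagged_hits(doc, tag, value):
    """Context-sensitive search: elements of a given type with a given value."""
    return [e for e in doc.iter(tag) if (e.text or "").strip() == value]


print(len(text_hits(generic, "24")))            # 2: cannot tell volts from parts
print(len(tagged_hits(rich, "voltage", "24")))  # 1: only the voltage
```

The richer markup costs authors more effort, since every part number and voltage must be tagged, but it lets the consumer ask for "24" as a voltage rather than as a string. That trade is the stakeholder conflict described above.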

Disruption of the Status Quo and its Policy Implications

For most organizations, the transition from proprietary, page-based document production and management architectures to SGML will be one of the most destabilizing efforts ever attempted. The adoption of these standards means significant changes to tools, processes, responsibilities, and even the way people think about information. The transition to an SGML-based document management process generally shifts the cost burden upstream and shifts the realization of benefits downstream. Often, the magnitude of these changes becomes even greater if the organization seeks to maximize the returns on its SGML investments and align production processes with the new paradigms.

Taken together, divergent metadata requirements, shifts in cost-benefits profiles, and process changes are not technology issues, but represent policy choices that a given organization needs to make. In most cases, the most compelling reasons for adopting SGML are not to improve the efficiency of existing operations but to realize long-term policy goals. For many organizations, however, the policy implications of SGML are its most challenging aspects. Identifying and balancing competing policy objectives is often difficult and chaotic.

SGML effectively forces organizations to formalize an information policy, often for the first time. This is not easy. It also goes a long way towards explaining why it is so difficult to develop a business case for SGML. The potential benefits of using SGML are well known and fairly well documented, but many of the benefits are tied to the specific metadata that the organization chooses to implement—not the standard itself.

It is precisely because the SGML standard leaves so much to be decided by implementing companies that a large number of economic, organizational, and technical factors influence both the design choices and investment decisions required to implement SGML. Some of these potential benefits are concerned primarily with mechanical efficiency, while others deal with human interaction and performance. The choices that an organization or project team makes when balancing these competing measures of value have tremendous impact on how (and even whether) potential and intended benefits are fully realized.

For many organizations, real value cannot be easily measured in strictly financial terms. They require the richer, more expensive metadata that can be difficult to justify using only calculated costs and benefits. At the same time, these organic measures of value can be central to the SGML implementation effort and a major source of strategic value. For example, as the information density of business transactions continues to increase, organizations that deliver richer, more useful information products to their customers are likely to realize competitive advantages relative to competitors that focus their investment strategies on cost savings.

Managing Information Performance

In many respects, SGML represents the "derivatives of the information technology marketplace." SGML is a very complex technology whose implementation is dominated by complex policy and associated political issues. Like derivatives, SGML is a strategic technology that allows organizations to make big gains and big mistakes. And many managers don't really understand "all that computer stuff," having effectively abdicated their decision-making role to one or more "experts": in-house technology managers, developers, vendors' sales reps, consultants, or even the popular computing magazines. When confronted with decisions about using SGML, will managers stop to educate themselves on the policy issues, or will they just hand over virtually all learning and decision-making to someone else? Will their organizations make wise investments? How will they know?

Senior managers have begun to learn that they need to provide oversight and define corporate policies for the use of derivatives to reduce and manage the risk associated with these investments. They also need to provide oversight of and policy-making guidance for their information investments — particularly those involving SGML — to ensure that these investments are aligned with corporate performance objectives.


Copyright, The Sagebrush Group, 2000