Notes from 49th International STC Conference
Nashville, Tennessee, May 5-8, 2002
Introduction to XML for Technical Writers
Steve Manning is a senior consultant with The Rockley Group and is a
recognized expert in XML.
Session Description:
XML is one of the hot topics in Web technology. This session focused on the value that it
brings to technical writers and their users, and included code samples to
describe the concepts.
Note: The author published two papers in the Proceedings to this
conference. For an electronic copy or a hard copy (depending on availability), contact
Dan Voss.
- What is XML?
- It is a standard for markup languages that identify document structure.
- It is a recommendation of the World Wide Web Consortium (W3C).
- It is designed for Web-based publication/distribution.
- It is based on SGML.
- SGML = Standard Generalized Markup Language.
- ISO Standard 8879.
- SGML is a standard for defining markup languages that describe structure
(chapter, paragraph, list, figure, etc.), not format.
- It allows semantic markup.
- It is flexible, powerful, and transportable.
- SGML overcomes incompatible data formats and file formats as well as inconsistencies
in format and structure.
- What is semantic markup?
- Markup elements can be given meaningful names.
- Elements become metadata (data about data).
- You consider the document as data.
- What is structured markup?
- Elements identify parts of the document.
- It is not format-based.
- It is hierarchical.
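The ideas of semantic and structured markup can be sketched with a small example. The document below is hypothetical (the element names `manual`, `chapter`, `procedure`, and `step` are assumptions, not taken from the session), and Python's standard `xml.etree.ElementTree` module is used only as a convenient way to show the hierarchy and to retrieve content by meaning rather than by appearance.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names describe meaning, not appearance.
MANUAL = """
<manual>
  <chapter>
    <title>Installing the Printer</title>
    <procedure>
      <step>Unpack the printer.</step>
      <step>Connect the power cable.</step>
    </procedure>
  </chapter>
</manual>
"""


def outline(element, depth=0):
    """Return one indented line per element, showing the hierarchy."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(MANUAL)

# The nesting of elements is the document structure.
structure = outline(root)
print("\n".join(structure))

# Because the names are meaningful, content can be retrieved by what it is.
steps = [step.text for step in root.iter("step")]
print(steps)
```

Nothing in the markup says how a step should look; that decision is left to whatever presents the document.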
- Why not stick with SGML?
- Long learning curve.
- Expensive tools.
- Not as portable.
- Not designed for the Web.
- XML goals
- Web-based delivery.
- Open standard.
- Based on SGML.
- Formal and concise.
- Easy to create.
- Flexible.
- XML vs. SGML
- XML is a lighter version of SGML.
- XML is more suitable for Web-based delivery.
- But what about HTML?
- HTML describes format, not content.
- You cannot extend HTML by adding your own tags.
- XML lets you describe content, which enables retrieval and reuse based on
that content.
- The pieces of XML.
- Elements
- These are the basic building blocks. Tags carry the element names.
- Start and end tags enclose the content.
- Elements can be empty.
- Elements are nested, never overlapping.
- Unlike HTML, XML is case-sensitive.
- Attributes
- Name-value pairs associated with elements (for example, font size).
- Used mostly for data the user won't or doesn't need to see.
- Provide metadata.
- Entities
- Internal: work like a variable.
- External: work like an "Include" statement. For example, during Y2K, huge
Web sites like IBM had to update their copyright dates by changing the hard
code on 3,000,000 individual pages. With an XML entity for copyright, that could
have been done once.
- Referenced in text.
- Namespaces
- A unique way of identifying structural content models.
- Allow you to include pieces of other DTDs without worrying about naming conflicts.
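As a rough illustration of the pieces listed above, the sketch below puts an attribute, an empty element, an internal entity, and a namespace into one small document and reads them back with Python's standard `xml.etree.ElementTree`. The document, the `copyright` entity, and the `http://example.com/ns/inventory` namespace are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical page. The internal entity "copyright" is declared once in the
# DOCTYPE and referenced in the text, much like a variable.
DOC = """<!DOCTYPE page [
  <!ENTITY copyright "Copyright 2002 Example Corp.">
]>
<page xmlns:inv="http://example.com/ns/inventory">
  <heading level="1">Parts List</heading>
  <inv:part number="A-100">Paper tray</inv:part>
  <separator/>
  <footer>&copyright;</footer>
</page>
"""

root = ET.fromstring(DOC)

# Attribute: a name-value pair attached to an element.
heading_level = root.find("heading").get("level")

# Empty element: no text and no children.
separator = root.find("separator")
separator_is_empty = separator.text is None and len(separator) == 0

# Entity: the parser substitutes the declared replacement text.
footer_text = root.find("footer").text

# Namespace: the prefix maps to a URI, so "part" cannot clash with a
# "part" element borrowed from some other vocabulary.
ns = {"inv": "http://example.com/ns/inventory"}
part = root.find("inv:part", ns)

print(heading_level, separator_is_empty, footer_text, part.tag, part.get("number"))
```

Changing the entity declaration in one place changes every spot where `&copyright;` is referenced, which is the point of the copyright-date example in the notes.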
- DTDs = defining structure
- A DTD is a Document Type Definition: a formal definition of the elements,
attributes, and entities for a specific type of document.
- It defines all names, relationships, and frequency of elements.
- It is not absolutely required for XML.
- In XML writer jargon, text is called parsed character data (PCDATA).
- Well-formed vs. valid XML
- Well-formed XML is syntactically correct, with all elements closed/terminated.
Advantages include short development time and ease of distribution (no DTD), making
it good for prototyping. Disadvantages are that it can be validated only for syntax,
not structure, and that it does not provide the same power in retrieval as valid XML.
- Valid XML is linked to a DTD and is validated against the DTD for correctness.
Advantages are that it acts as a template for writers, promotes consistency, can be
validated for compliance to content models (e.g., all lists of procedures must be
introduced by a stem sentence), and enables powerful manipulation of information.
Disadvantages are that it is fixed, not flexible; it requires a long development time;
and it entails a long learning curve.
- Available DTDs: You can use existing DTDs. Examples are DocBook, for software
documentation; CML, Chemical Markup Language; MathML, Mathematical Markup Language; SMIL,
Synchronized Multimedia Integration Language (for example, cueing multiple language tracks
to one video track in Web-based video); SVG, Scalable Vector Graphics; and XHTML, the XML
version of HTML (per the speaker, XHTML will replace HTML 4.0, and there will never be an HTML 5.0).
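The well-formedness rules described above (every element closed, proper nesting, case-sensitive names) can be checked with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the sample strings are invented for illustration. It checks syntax only. Validating against a DTD needs a validating parser, which this standard-library module is not, so that step is not shown.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical samples, one per rule.
samples = {
    "nested": "<list><item>One</item></list>",
    "overlapping": "<b><i>text</b></i>",
    "unclosed": "<list><item>One</list>",
    "case mismatch": "<Item>One</item>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'not well-formed'}")
```

Only the first sample passes; the other three each break one of the syntax rules.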
- Schemas
- Proposed alternative to DTDs that provides much more power.
- Written in XML
- Can ensure that usages are consistent throughout a document (e.g., abbreviation
for tablespoons in all recipes on a Web site)
- Standard is still developing.
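Because a schema is itself written in XML, it can be read with the same tools as any other XML document. The sketch below assumes a hypothetical schema fragment that restricts a `unit` value to a fixed set of abbreviations, then applies that list to a made-up recipe file. The check is hand-rolled for illustration and is not a real schema validator; a schema-aware parser would do this work in practice.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment: the allowed unit abbreviations.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="unitType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="tbsp"/>
      <xs:enumeration value="tsp"/>
      <xs:enumeration value="cup"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
"""

# Hypothetical content with one inconsistent abbreviation.
RECIPES = """
<recipes>
  <ingredient unit="tbsp">olive oil</ingredient>
  <ingredient unit="Tbs.">sugar</ingredient>
  <ingredient unit="cup">flour</ingredient>
</recipes>
"""

# The schema is XML, so an ordinary XML parser can read it.
schema_root = ET.fromstring(SCHEMA)
allowed_units = {e.get("value") for e in schema_root.iter(XS + "enumeration")}

# Hand-rolled consistency check (an assumption, not schema validation).
recipes_root = ET.fromstring(RECIPES)
inconsistent = [
    (item.text, item.get("unit"))
    for item in recipes_root.iter("ingredient")
    if item.get("unit") not in allowed_units
]
print(inconsistent)
```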
- Presenting XML
- How do you deliver XML?
- XML is rarely delivered to end-users.
- It is mostly transformed to PDF, HTML, or other markup languages.
- XSL = Extensible Stylesheet Language
- Transforms XML data into other formats
- Can reorder/filter data for display
- Written in XML
- Based on matching element names in the XML document
- Functions similarly to a cascading style sheet.
- Three parts of XSL
- XSLT: transforms XML to other markup languages, like HTML
- XSL-FO (Formatting Objects): describes page-oriented formatting for print output such as PDF
- XPath: provides syntax for identifying elements by name and relationship
- XSL supports sorting (e.g., displaying books from A to M, sorted in ascending
alphabetical order by title)
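The filter-and-sort example above can be approximated in a few lines. Python's standard library has no XSLT processor, so this sketch swaps in plain Python with `xml.etree.ElementTree` to show the same idea: match elements by name, keep the books whose titles start with A through M, sort them by title, and emit HTML. The catalog content and the `books_to_html` function are hypothetical; a real stylesheet would express the same steps in XSLT.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source data.
CATALOG = """
<catalog>
  <book><title>Managing Content</title><author>Lee</author></book>
  <book><title>Zen of Markup</title><author>Ortiz</author></book>
  <book><title>Authoring in XML</title><author>Chen</author></book>
</catalog>
"""


def books_to_html(xml_text):
    """Transform the catalog into an HTML list of titles A-M, sorted."""
    root = ET.fromstring(xml_text)
    # Match elements by name, as an XSL template would.
    titles = [book.findtext("title") for book in root.findall("book")]
    # Filter to titles starting with A through M, then sort ascending.
    selected = sorted(t for t in titles if t and "A" <= t[0].upper() <= "M")
    items = "".join(f"<li>{escape(t)}</li>" for t in selected)
    return f"<ul>{items}</ul>"


html_list = books_to_html(CATALOG)
print(html_list)
```

The source XML is never changed; the same catalog could be run through a different transformation to produce another format.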
- XLL = Extensible Linking Language
- Extends the types/functionality of links
- Two types are XLinks and XPointers
- XLinks enable any element to become a link. They can be bi- or multi-directional,
and they use URLs.
- XPointers point to a particular point in a document (a specific element, a
numeric element, an element based on a relationship, or a range)
- Not yet fully supported by browsers
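To illustrate how any element can become a link, the sketch below marks up two unrelated elements with simple XLink attributes and collects the link targets with Python's standard `xml.etree.ElementTree`. The document, element names, and target paths are invented for the example; only the XLink namespace URI and the `xlink:type` and `xlink:href` attribute names come from the XLink specification. The code reads the links but does not follow them.

```python
import xml.etree.ElementTree as ET

XLINK = "{http://www.w3.org/1999/xlink}"

# Hypothetical article: a <term> and a <figure> both act as links.
DOC = """
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <para>See the <term xlink:type="simple" xlink:href="glossary.xml#dtd">DTD</term> entry.</para>
  <figure xlink:type="simple" xlink:href="images/workflow.svg"/>
</article>
"""

root = ET.fromstring(DOC)

# Any element carrying an xlink:href attribute is treated as a link.
links = [
    (element.tag, element.get(XLINK + "href"))
    for element in root.iter()
    if element.get(XLINK + "href") is not None
]
print(links)
```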
- Benefits of XML: Summary
- Structural completeness: all the required information in one place
- Structural consistency: predictability is an absolute requirement for reuse
- Separation of content and format: enables easier conversion to multiple formats
- Is XML for you? Yes, if you need to publish in multiple formats. (The slides list
three other uses, to be added when the slides arrive.)