If you want to develop a DLF converter for an application whose logging data model isn't adequately represented by one of the existing DLF schemas, you'll need to develop a new schema.
If you are familiar with SQL, a DLF schema is similar to a table schema description. A DLF file can be seen as a table, where each log record is represented by a table row. Each log record in the same DLF schema shares the same fields.
In this chapter, we will create a new schema for logging FTP sessions. That DLF schema could serve as the basis for an improved DLF converter for log files generated by Microsoft Internet Information Server™. Lire currently has a DLF converter for these log files, but the current ftp DLF schema is modelled after the xferlog log file, which only represents file transfers, whereas the log generated by Microsoft Internet Information Server™ contains more detailed information on the FTP session.
Here is an example of such a log file:
#Software: Microsoft Internet Information Server 4.0
#Version: 1.0
#Date: 2001-11-29 00:01:32
#Fields: time c-ip cs-method cs-uri-stem sc-status
00:01:32 10.0.0.1 [56]created spacedat/091001092951LGW_Data.zip 226
00:01:32 10.0.0.1 [56]created spacedat/html/bx01g01.gif 226
00:01:32 10.0.0.1 [56]created spacedat/html/catlogo.gif 226
00:01:32 10.0.0.1 [56]QUIT - 226
00:03:32 10.0.0.1 [58]USER badm 331
00:03:32 10.0.0.1 [58]PASS - 230
          As you can see, this log file contains information beyond the simple upload/download represented in the standard FTP schema. It includes a session identifier, the command executed, and the result code of the action. Our new schema should be able to represent these things.
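To make the extra information concrete, here is a minimal Python sketch that splits a few of the sample lines above into their parts. It is not part of Lire: the function name parse_line and the dictionary keys are hypothetical, and it assumes that fields never contain spaces and that cs-method always has the form [session]COMMAND, as in the excerpt.

```python
import re

# A few lines copied from the log excerpt above.
SAMPLE = """\
#Fields: time c-ip cs-method cs-uri-stem sc-status
00:01:32 10.0.0.1 [56]created spacedat/html/bx01g01.gif 226
00:01:32 10.0.0.1 [56]QUIT - 226
00:03:32 10.0.0.1 [58]USER badm 331
"""

# Assumption: cs-method is "[<session id>]<command>" with no embedded spaces.
METHOD_RE = re.compile(r"^\[(\d+)\](\S+)$")


def parse_line(line):
    """Split one data line into a dict; return None for directives and blanks."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    # Assumption: exactly five whitespace-separated fields, as in #Fields.
    time, client, method, target, status = line.split()
    match = METHOD_RE.match(method)
    if match is None:
        raise ValueError("unexpected cs-method value: %r" % method)
    return {
        "time": time,
        "client": client,
        "session": match.group(1),
        "command": match.group(2),
        # A lone "-" marks an empty field in the sample.
        "argument": None if target == "-" else target,
        "status": status,
    }


records = [r for r in map(parse_line, SAMPLE.splitlines()) if r is not None]
for record in records:
    print(record)
```

Each resulting dictionary carries the session identifier, the command, its argument, and the result code, which are the pieces the new schema needs to represent.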
To create a DLF schema, you have to create an XML file
            named after your schema identifier:
            ftpproto.xml. The schema name should be
            made of alphanumeric characters only. The schema identifier is
            case sensitive. Your schema identifier shouldn't contain
            hyphens (-) or underscore characters
            (_). (The hyphen is used for a special
            purpose.)
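The naming rule can be checked mechanically. The following Python sketch is only an illustration: the helper names are hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits.

```python
import re

# Assumption: only ASCII letters and digits count as alphanumeric.
SCHEMA_ID_RE = re.compile(r"[A-Za-z0-9]+")


def is_valid_schema_id(name):
    """True if name has no hyphen, underscore, or other special character."""
    return SCHEMA_ID_RE.fullmatch(name) is not None


def schema_filename(name):
    """Return the file name for a schema identifier, e.g. ftpproto.xml."""
    if not is_valid_schema_id(name):
        raise ValueError("invalid schema identifier: %r" % name)
    # The identifier is case sensitive, so it is used unchanged.
    return name + ".xml"


print(schema_filename("ftpproto"))
print(is_valid_schema_id("ftp-proto"), is_valid_schema_id("ftp_proto"))
```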
          
All DLF schemas start and end the same way:
<?xml version="1.0" encoding="ascii"?>
<!DOCTYPE lire:dlf-schema PUBLIC
  "-//LogReport.ORG//DTD Lire DLF Schema Markup Language V1.1//EN"
  "http://www.logreport.org/LDSML/1.1/ldsml.dtd">
<lire:dlf-schema xmlns:lire="http://www.logreport.org/LDSML/"
              superservice="ftpproto"
              timestamp="time"
              >
<!-- Other elements will go here -->
</lire:dlf-schema>
            
            The first lines contain the usual XML declaration and
            DOCTYPE declaration that you'll find in many XML documents.
            The real content starts at the
            lire:dlf-schema element. What matters for
            your schema are the values of the superservice and timestamp attributes. The
            first one contains your schema identifier. It is called
            “superservice” for historical reasons. The
            other one should contain the name of the field by which
            the records are ordered in time. (See the section called “The Field Types” for more information.)
          
The last line in the above excerpt would be the last
            thing in the file and closes the
            lire:dlf-schema element.
          
The next things that go into the schema file are the schema's title and description. Both are intended for developers to read and should convey the scope of the schema:
 <!-- Starting lire:dlf-schema element was omitted -->
  <lire:title>DLF Schema for FTP Protocol</lire:title>
  <lire:description>
    <para>This DLF schema should be used for FTP servers that have
          detailed information on the FTP connection in their log
          files.
    </para>
    <para>Each record represents a command done by the client during
     the FTP session.
    </para>
  </lire:description>
            
The content of the lire:description
            element consists of DocBook elements. If you don't know DocBook,
            you just need to know that paragraphs are delimited using
            para elements.
          
The only remaining things in the schema definitions are the field specifications. Here is the definition of the first one:
  <lire:field name="time" type="timestamp" label="Timestamp">
    <lire:description>
      <para>This field contains the timestamp at which the command was
              issued.
      </para>
    </lire:description>
  </lire:field>
            
As you can see, fields are defined using the
            lire:field element, which has three
            attributes:

name
This attribute contains the name of the field. The name should contain only alphanumeric characters and the underscore character.
type
This attribute contains the type of the field. The available types are described below.
label
This attribute contains the column label that is used by default in your reports for data coming from this field. The label should be short but descriptive.
The field's description is held in the
            lire:description element which contains
            DocBook markup. The field's description should be
            descriptive enough so that someone implementing a DLF
            converter for this schema knows what goes where.
          
The main types available for fields are:
timestamp
This type should be used for fields which contain a value indicating a particular point in time. All timestamp values are represented in the usual UNIX convention: number of seconds since January 1st 1970.
Each DLF schema must contain at least one
                      field of this kind, and its name should be given in the
                      lire:dlf-schema's timestamp
                      attribute.

hostname
This type should be used for fields which contain a hostname or IP address.
It is important to mark such fields, because it will eventually be possible to automatically resolve IP addresses to hostnames.
bool
Type for boolean values.
number
Type for numeric values.
You shouldn't use this type when the values are limited in number and are semantically an enumeration, such as result codes. You should use the string type for those.
You should only use the number type for values which you'll want to report in classes rather than as individual values.
bytes
This type should be used for numeric values which are quantities in bytes. The more specific typing is useful for display purposes.
duration
This type should be used for numeric values which are quantities of time. The more specific typing is useful for display purposes.
string
This is the type which can be used for all other purposes.
If you read the specifications, you'll find other types which are used. These additional types don't bring anything over the basic ones defined above and you shouldn't use them.
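As an illustration of the UNIX convention used by timestamp fields, the following Python sketch turns a date and a time of day, such as those in the sample log, into seconds since January 1st 1970. The function name to_epoch is hypothetical, and treating the logged time as UTC is an assumption; a real converter has to know the time zone its log files use.

```python
from datetime import datetime, timezone


def to_epoch(date_text, time_text, tz=timezone.utc):
    """Combine 'YYYY-MM-DD' and 'HH:MM:SS' into seconds since the epoch.

    Assumption: the log's clock is in the given time zone (UTC by default).
    """
    moment = datetime.strptime(date_text + " " + time_text,
                               "%Y-%m-%d %H:%M:%S")
    return int(moment.replace(tzinfo=tz).timestamp())


# Date taken from the #Date directive, time from the first record.
print(to_epoch("2001-11-29", "00:01:32"))
```

The integer returned is the kind of value a DLF converter would store in a field of type timestamp.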
In addition to the time field defined above, here are the remaining field definitions that make up our complete ftpproto schema:
  <lire:field name="sessid" type="string" label="Session">
    <lire:description>
     <para>This field should contain an identifier that can be used
     to relate the commands issued in the same FTP session. This
     identifier can be reused, but shouldn't be while the FTP session
     isn't closed.
     </para>
    </lire:description>
  </lire:field>
  <lire:field name="command" type="string" label="Command">
    <lire:description>
     <para>This field contains the FTP command executed. The FTP
      protocol command names (STOR, RETR, APPE, USER, etc.) should be used.
     </para>
    </lire:description>
  </lire:field>
  <lire:field name="result" type="string" label="Result">
    <lire:description>
     <para>This field should contain the FTP result code after executing
     the command.
     </para>
    </lire:description>
  </lire:field>
  <lire:field name="cmd_args" type="string" label="Argument">
    <lire:description>
     <para>This field should contain the parameters to the FTP command.
     </para>
    </lire:description>
  </lire:field>
  <lire:field name="size" type="bytes" label="Bytes Transferred">
    <lire:description>
     <para>When the command involves a transfer, as for the RETR or STOR
      commands, this field should contain the number of bytes transferred.
     </para>
    </lire:description>
  </lire:field>
  <lire:field name="elapsed" type="duration" label="Elapsed">
    <lire:description>
     <para>This field contains the number of seconds executing the
           command took. 
     </para>
    </lire:description>
  </lire:field>
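Before installing a schema, it can be useful to check it against the rules given in this chapter. The Python sketch below is not a Lire tool: check_schema is a hypothetical helper, the embedded schema is an abbreviated copy of the one above (XML declaration, DOCTYPE, and descriptions omitted), and only three rules are checked: the schema identifier, the field names, and the timestamp attribute.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URI taken from the lire:dlf-schema element shown earlier.
LIRE_NS = "http://www.logreport.org/LDSML/"

# Abbreviated copy of the ftpproto schema, for illustration only.
SCHEMA = """\
<lire:dlf-schema xmlns:lire="http://www.logreport.org/LDSML/"
                 superservice="ftpproto" timestamp="time">
  <lire:title>DLF Schema for FTP Protocol</lire:title>
  <lire:field name="time" type="timestamp" label="Timestamp"/>
  <lire:field name="sessid" type="string" label="Session"/>
  <lire:field name="cmd_args" type="string" label="Argument"/>
  <lire:field name="size" type="bytes" label="Bytes Transferred"/>
</lire:dlf-schema>
"""


def check_schema(xml_text):
    """Return a list of problems; an empty list means none were found."""
    root = ET.fromstring(xml_text)
    problems = []
    # Map each field name to its declared type.
    fields = {f.get("name"): f.get("type")
              for f in root.findall("{%s}field" % LIRE_NS)}
    # Schema identifier: alphanumeric only (ASCII assumed).
    if not re.fullmatch(r"[A-Za-z0-9]+", root.get("superservice", "")):
        problems.append("schema identifier must be alphanumeric")
    # Field names: alphanumeric characters and underscores.
    for name in fields:
        if not re.fullmatch(r"[A-Za-z0-9_]+", name or ""):
            problems.append("bad field name: %r" % name)
    # The timestamp attribute must name a field of type timestamp.
    ts = root.get("timestamp")
    if fields.get(ts) != "timestamp":
        problems.append("timestamp attribute %r does not name a "
                        "timestamp field" % ts)
    return problems


print(check_schema(SCHEMA))
```

This check does not replace validation against the LDSML DTD; it only covers the conventions described in the text.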
              
Making the new schema available to the Lire
            framework is pretty easy: just copy the file to one of the
            directories set in the lr_schemas_path
            configuration variable. By default, this variable contains
            the directories
            datadir/lire/schemas and HOME/.lire/schemas.
Since we want our schema to be available for other users as well, we will install it in the system directory:
# install -m 644 ftpproto.xml /usr/local/share/lire/schemas
            
            (In this case, Lire was installed under /usr/local.)