Validating FIX Messages
A Bit About FIX
In finance, the Financial Information eXchange (FIX) protocol is used to facilitate communication between two market participants. Most users encounter FIX indirectly through an Order Management System (OMS), which typically acts as an interface to a FIX engine.
At its core, the FIX protocol is a set of structured messages that facilitate various types of transactions, such as orders, executions, and status updates about financial instruments.
// an example FIX message pulled from Wikipedia
8=FIX.4.2|9=65|35=A|49=SERVER|56=CLIENT|34=177|52=20090107-18:15:16|98=0|108=30|10=062|
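To make that structure concrete, here is a minimal Clojure sketch (not taken from any library) that splits the example message into tag/value pairs and recomputes its BodyLength (tag 9) and CheckSum (tag 10). The function names are made up for illustration, and the sketch assumes that `|` stands in for the real SOH (0x01) delimiter and that the message is plain ASCII.

```clojure
(require '[clojure.string :as str])

;; The Wikipedia example, with "|" standing in for the SOH (0x01) delimiter.
(def example
  "8=FIX.4.2|9=65|35=A|49=SERVER|56=CLIENT|34=177|52=20090107-18:15:16|98=0|108=30|10=062|")

(defn parse-pairs
  "Splits a pipe-delimited message into [tag value] pairs."
  [message]
  (->> (str/split message #"\|")
       (map #(str/split % #"=" 2))
       (mapv (fn [[tag value]] [(Integer/parseInt tag) value]))))

(defn body-length
  "Counts the bytes after the BodyLength field, up to and including the
  delimiter that precedes CheckSum."
  [message]
  (->> (str/split message #"\|")
       (drop 2)                 ; skip BeginString (8) and BodyLength (9)
       butlast                  ; skip CheckSum (10)
       (map #(inc (count %)))   ; each field plus its one-byte delimiter
       (reduce +)))

(defn checksum
  "Sums every byte before the CheckSum field, modulo 256, as three digits."
  [message]
  (let [soh  "\u0001"
        wire (str/replace message "|" soh)
        end  (inc (str/last-index-of wire (str soh "10=")))]
    (format "%03d" (mod (reduce + (map int (subs wire 0 end))) 256))))

(take 3 (parse-pairs example)) ;; => ([8 "FIX.4.2"] [9 "65"] [35 "A"])
(body-length example)          ;; => 65
(checksum example)             ;; => "062"
```

Both computed values agree with the `9=65` and `10=062` fields carried by the message itself, which is the kind of consistency check a FIX engine performs before looking at anything else.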
The Object Oriented Approach
The FIX protocol has some notion of modularity and encapsulation, and messages themselves are composed of fields that resemble objects with attributes. This makes the protocol well suited to implementation in an object-oriented language like C++ or Java.
There are only a handful of open source FIX implementations available. In this post I'll be referring to QuickFIX/J, which is the Java flavor of QuickFIX.
Implementing the QuickFIX/J Application Interface
QuickFIX applications are centered around the quickfix.Application interface. The interface defines callback methods for things like session management and the processing of incoming and outgoing FIX messages. Instances of the application are passed to an "acceptor" (server) or "initiator" (client).
// pulled from the quickfix/j user manual
package quickfix;

public interface Application {
    void onCreate(SessionID sessionId);
    void onLogon(SessionID sessionId);
    void onLogout(SessionID sessionId);
    void toAdmin(Message message, SessionID sessionId);
    void toApp(Message message, SessionID sessionId)
        throws DoNotSend;
    void fromAdmin(Message message, SessionID sessionId)
        throws FieldNotFound, IncorrectDataFormat, IncorrectTagValue, RejectLogon;
    void fromApp(Message message, SessionID sessionId)
        throws FieldNotFound, IncorrectDataFormat, IncorrectTagValue, UnsupportedMessageType;
}
Handling Messages with QuickFIX/J's MessageCracker
The documentation recommends handling messages with the MessageCracker class, which peeks at the MsgType field of a generic incoming Message and casts it to its corresponding typed message. Then it's just a matter of specifying which types of messages the application cares about and what to do with them.
// Example application using quickfix.MessageCracker
public class MyApplication extends MessageCracker implements quickfix.Application {

    public void fromApp(Message message, SessionID sessionID)
            throws FieldNotFound, UnsupportedMessageType, IncorrectTagValue {
        crack(message, sessionID);
    }

    // Using annotation
    @Handler
    public void myEmailHandler(quickfix.fix50.Email email, SessionID sessionID) {
    }

    public void onMessage(quickfix.fix44.Email email, SessionID sessionID) {
    }
}
Learning from QuickFIX
Message classes for the QuickFIX libraries are generated from a common
specification available in XML format. The C++ version of QuickFIX uses XSLT
to parse the specification and build header files containing FIX message
classes (e.g. MarketDataRequest.h).
The anatomy of FIX message specifications
The lowest level of a FIX message is the data type. FIX data types resemble
the primitives found in most programming languages, but come with additional
constraints that can make them more complex.
Fields are the next level, and are either required or optional, and are of a
particular FIX data type.
<field number='1' name='Account' type='STRING' />
<field number='2' name='AdvId' type='STRING' />
<field number='3' name='AdvRefID' type='STRING' />
Groups are distinct collections of ordered or unordered fields.
<group name='NoHops' required='N'>
  <field name='HopCompID' required='N' />
  <field name='HopSendingTime' required='N' />
  <field name='HopRefID' required='N' />
</group>
Components, which are composed of fields and groups, essentially represent
properties common to many different FIX messages. Though present in the
specification files, components are not a concrete concept within the FIX
protocol in the same way that fields and groups are.
<component name='PtysSubGrp'>
  <group name='NoPartySubIDs' required='N'>
    <field name='PartySubID' required='N' />
    <field name='PartySubIDType' required='N' />
  </group>
</component>
The Header and Trailer of a message are also composed of fields and
components. All of these things come together to form a Message.
Transforming the FIX XML specification with Meander
Meander is a Clojure library that provides a number of macros that use logic
variables (symbols that start with a ?) to transform data in a plain fashion.
Often, when working with code that transforms data, the shapes of inputs
and outputs aren't readily apparent. Meander does a decent job of representing
transformations as data, so that they are easy to understand without
walking through functions. For that reason, I wanted to use it to render the FIX
specification into an intermediate representation.
Defining Primitive Data Types Manually
There are only a handful of data types, and the QuickFIX specification files don't provide a means for constructing them, so they must be defined manually. The FIXimate tool is a useful reference that provides specifics for each primitive. In my case, each primitive is defined as a Clojure spec:
...
(s/def ::integer #(mu/is-int? %))
(s/def ::length #(mu/is-pos-int? %))
(s/def ::tag-number #(mu/is-pos-int? %))
(s/def ::sequence-number #(mu/is-pos-int? %))
(s/def ::number-in-group #(mu/is-pos-int? %))
...
They are then added to a lookup table that maps each type name, as found in the specification files, to its spec. This table will be referenced while generating fields.
...
(def primitives
  {"AMT"     ::amount
   "BOOLEAN" ::boolean
   "CHAR"    ::character
   "COUNTRY" ::country
   ...
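As a rough, self-contained illustration of how such specs and the lookup table fit together, the sketch below defines two primitives and resolves them by type name. The predicates `int-string?` and `pos-int-string?` are hypothetical stand-ins for the `mu/is-int?` and `mu/is-pos-int?` helpers, which aren't shown in this post; it assumes values arrive as strings because they are read straight off a tagvalue message.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical stand-ins for the helpers in the author's util namespace.
(defn int-string?
  "True for an optionally signed string of digits."
  [x]
  (and (string? x) (some? (re-matches #"-?\d+" x))))

(defn pos-int-string?
  "True for a string of digits whose value is greater than zero."
  [x]
  (and (string? x) (some? (re-matches #"0*[1-9]\d*" x))))

(s/def ::integer int-string?)
(s/def ::length pos-int-string?)

;; Type names as they appear in the specification files, mapped to specs.
(def sample-primitives
  {"INT"    ::integer
   "LENGTH" ::length})

(defn valid-for-type?
  "Looks up the spec for a type name and checks a raw value against it.
  Unknown type names are treated as invalid."
  [type-name value]
  (if-let [spec (get sample-primitives type-name)]
    (s/valid? spec value)
    false))

(valid-for-type? "INT" "-42")    ;; => true
(valid-for-type? "LENGTH" "0")   ;; => false
(valid-for-type? "LENGTH" "65")  ;; => true
```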
Parsing and Filtering XML
Specification files are read and then converted to EDN via clojure.data.xml.
They are then filtered in various ways to extract data that will be handed to
Meander.
(ns meriweather.parse.xml
  (:require [clojure.java.io :as io]
            [clojure.data.xml :as xml]
            [meander.epsilon :as m]
            [meriweather.util :as mu]
            [meriweather.data-types :refer [primitives]]))

;; clojure.data.xml reads lazily, this function forces reading with `doall`
(defn read-xml [file]
  (let [xml (-> file io/file io/input-stream)]
    (-> xml xml/parse xml-seq doall)))

(defn filter-version [data]
  (filter #(= (:tag %) :fix) data))

(defn filter-fields [data]
  (let [by-tag    #(= (:tag %) :field)
        by-number #(contains? (:attrs %) :number)]
    (filter (apply every-pred [by-tag by-number]) data)))

;; components are defined and then referenced within the same specification,
;; filtering out the :required field ensures we only receive definitions.
(defn filter-components [data]
  (let [by-tag      #(= (:tag %) :component)
        by-required #(not (contains? (:attrs %) :required))]
    (filter (apply every-pred [by-tag by-required]) data)))

(defn filter-messages [data]
  (filter #(= (:tag %) :message) data))

;; takes the initial conversion of xml to edn created by `read-xml` and makes it
;; less xml-y by flattening the data somewhat.
(defn xml->edn [elem]
  (m/rewrite elem
    {:tag ?tag
     :attrs (m/map-of !k !v)
     :content (m/seqable !content ..1)}
    {:tag ?tag
     :children [(m/cata !content) ...]
     & ([!k !v] ...)}

    {:tag ?tag
     :attrs (m/map-of !k !v)
     :content (m/pred empty?)}
    {:tag ?tag
     & ([!k !v] ...)}))
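The definition-versus-reference distinction the filters rely on can be tried out without any XML at all. The sketch below uses hand-written maps in the `:tag`/`:attrs`/`:content` shape that clojure.data.xml produces; the element values and the helper names `tagged` and `has-attr` are illustrative, not part of the code above.

```clojure
;; Hand-written elements in the shape clojure.data.xml produces.
;; A definition carries :number (fields) or lacks :required (components);
;; a reference inside a message carries :required.
(def elements
  [{:tag :field     :attrs {:number "1" :name "Account" :type "STRING"} :content []}
   {:tag :field     :attrs {:name "Account" :required "N"}              :content []}
   {:tag :component :attrs {:name "Parties"}                            :content []}
   {:tag :component :attrs {:name "Parties" :required "Y"}              :content []}])

(defn tagged
  "Returns a predicate that matches elements with the given tag."
  [tag]
  #(= tag (:tag %)))

(defn has-attr
  "Returns a predicate that matches elements carrying attribute k."
  [k]
  #(contains? (:attrs %) k))

(def field-definition?
  (every-pred (tagged :field) (has-attr :number)))

(def component-definition?
  (every-pred (tagged :component) (complement (has-attr :required))))

(map (comp :name :attrs) (filter field-definition? elements))
;; => ("Account")

(count (filter component-definition? elements))
;; => 1
```

Building each predicate once with `every-pred` keeps the rule for "what counts as a definition" in one named place, which may be easier to reuse than rebuilding it inside every filter function.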
Transforming the Data
Once the data has been ingested and flattened a little, each part of the
specification has its elements transformed with Meander's rewrite macro.
The rewrite macro allows easy handling of the various forms an input could
take, which makes working with nested or self-referential data much easier.
(defn field [field]
  (m/rewrite field
    {:number ?number
     :name ?name
     :type ?type
     :tag :field
     :children (m/seqable !children ..1)}
    {(m/app Integer/parseInt ?number)
     {:name ~(mu/keywordize ?name)
      :spec ~(get primitives ?type)
      :values {& [(m/cata !children) ...]}}}

    {:number ?number
     :name ?name
     :type ?type
     :tag :field}
    {(m/app Integer/parseInt ?number)
     {:name ~(mu/keywordize ?name)
      :spec ~(get primitives ?type)}}

    {:enum ?enum
     :description ?desc
     :tag :value}
    {?enum ~(keyword ?desc)}))

(defn component [component]
  (m/rewrite component
    {:name ?name
     :tag :component
     :children (m/seqable !children ..1)}
    {:name ~(mu/keywordize ?name)
     :tag :component
     :children [(m/cata !children) ...]}

    {:name ?name
     :required ?required
     :tag ?tag
     :children (m/seqable !children ..1)}
    {:name ~(mu/keywordize ?name)
     :tag ?tag
     :required ~(mu/char->boolean ?required)
     :children [(m/cata !children) ...]}

    {:name ?name
     :tag ?tag
     :required ?required
     :children (m/pred empty?)}
    {:name ~(mu/keywordize ?name)
     :tag ?tag
     :required ~(mu/char->boolean ?required)}))

(defn message [message]
  (m/rewrite message
    {:name ?name
     :msgtype ?msgtype
     :msgcat ?msgcat
     :tag :message
     :children (m/seqable !children ..1)}
    {:name ?name
     :msgtype ?msgtype
     :msgcat ~(keyword ?msgcat)
     :tag :message
     :children [(m/cata !children) ...]}

    {:name ?name
     :required ?required
     :tag ?tag}
    {:name ?name
     :required ~(mu/char->boolean ?required)
     :tag ?tag}))
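The transformations above lean on two helpers, `mu/keywordize` and `mu/char->boolean`, whose definitions aren't included in this post. What follows is only a guess at how they might behave, written so that it reproduces names like `:adv-ref-id`; the real implementations may differ.

```clojure
(require '[clojure.string :as str])

;; Assumed behavior: turn a CamelCase FIX name into a kebab-case keyword.
;; A hyphen is inserted wherever a lowercase letter or digit is followed by
;; an uppercase letter, so a trailing acronym such as "ID" stays together.
(defn keywordize [s]
  (-> s
      (str/replace #"([a-z0-9])([A-Z])" "$1-$2")
      str/lower-case
      keyword))

;; Assumed behavior: the specification marks requiredness with "Y" or "N".
(defn char->boolean [s]
  (= "Y" s))

(keywordize "Account")        ;; => :account
(keywordize "AdvRefID")       ;; => :adv-ref-id
(keywordize "HopSendingTime") ;; => :hop-sending-time
(char->boolean "N")           ;; => false
```

One limitation of this particular rule is that a leading acronym is not split from the word that follows it, so a name like `IDSource` would come out as `:idsource`.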
Putting it Together
(def data-file (read-xml "FIX44.xml"))

(def fields
  (->> data-file
       filter-fields
       (map xml->edn)
       (map field)
       (into (sorted-map))))
What we end up with looks a lot like the original specification with a few differences. Most notably, fields are numerically indexed and have their primitive spec associated.
...
{1 {:name :account, :spec :meriweather.data-types/string},
 2 {:name :adv-id, :spec :meriweather.data-types/string},
 3 {:name :adv-ref-id, :spec :meriweather.data-types/string},
 ...
Now that we have our IR, we can use it to validate some FIX messages.
Validating A FIX Message
In this post, all of the FIX messages are tagvalue encoded. Tagvalue encoding is still in use today, except in use cases where extremely low latency must be sustained at very high throughput. Such is the case with data flowing in and out of the NASDAQ.
Since a tagvalue-encoded message is just a string, we can use a simple regex to parse it and quickly check that it's in the correct format. If it isn't, we don't want to bother trying to validate the field values.
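;; assumes (:require [clojure.spec.alpha :as s] [meriweather.util :as mu])
(def field-pattern
  #"(?<tag>\d+)(?<delim>=)(?<value>[^\u0001]+)(?<SOH>\u0001)")

(defrecord Field [tag delim value soh])

;; a tag is one to four digits ({1,4} rather than {0,4}, so that an empty
;; string is never accepted as a tag)
(s/def ::tag (s/and string? #(re-matches #"[0-9]{1,4}" %)))
(s/def ::delim #(= "=" %))
(s/def ::value any?)
(s/def ::soh #(= (mu/hex->str %) "1"))
(s/def ::field (s/keys :req-un [::tag ::delim ::value ::soh]))

;; each regex match becomes a Field record; `rest` drops the full match
(defn str->fields [message]
  (->> message
       (re-seq field-pattern)
       (map #(apply ->Field (rest %)))
       (into [])))

(defn valid-message-format? [message]
  (every? #(s/valid? ::field %) message))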
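One thing worth keeping in mind with this approach is that `re-seq` silently skips any text that doesn't match, and `every?` returns true for an empty collection. The sketch below is a simplified, standalone variant (plain capture groups, made-up function names) that also checks whether the matches account for the entire input. It is offered as one possible safeguard, not as part of the code above.

```clojure
(def soh "\u0001")

;; Same shape as the pattern above, with plain groups instead of named ones.
(def plain-field-pattern #"(\d+)=([^\u0001]+)\u0001")

(defn parse-fields
  "Returns a vector of {:tag :value} maps, one per matched field."
  [message]
  (mapv (fn [[_ tag value]] {:tag tag :value value})
        (re-seq plain-field-pattern message)))

(defn fully-consumed?
  "True only when the matched fields, joined back together, reproduce the
  whole message, meaning nothing was skipped between or after fields."
  [message]
  (let [matched (map first (re-seq plain-field-pattern message))]
    (boolean (and (seq matched)
                  (= message (apply str matched))))))

(def well-formed (str "8=FIX.4.2" soh "35=A" soh))
(def with-debris (str "8=FIX.4.2" soh "garbage" "35=A" soh))

(parse-fields well-formed)
;; => [{:tag "8", :value "FIX.4.2"} {:tag "35", :value "A"}]
(fully-consumed? well-formed)      ;; => true
(count (parse-fields with-debris)) ;; => 2
(fully-consumed? with-debris)      ;; => false
```

In the second message both fields are still extracted, so a per-field check alone would pass; comparing against the original string is what exposes the stray text.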
Assuming that the message is in the proper format, we'll use our intermediate representation to validate that each field in the message holds a value of the type its definition expects.
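;; the second argument is a Field record (or any map with :tag and :value);
;; destructuring it directly avoids relying on trailing-map keyword arguments
(defn valid-field? [definitions {:keys [tag value]}]
  (let [tag-number (Integer/parseInt tag)
        field      (get definitions tag-number)]
    (if (contains? field :values)
      ;; enumerated fields must match the spec and one of the listed values
      (and (s/valid? (:spec field) value)
           (contains? (:values field) value))
      (s/valid? (:spec field) value))))

(defn valid-message? [definitions message]
  (every? #(valid-field? definitions %) message))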
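For experimenting with this logic in isolation, here is a small standalone variation that reports why a field fails instead of returning a boolean. The definitions table, the two specs, and the `check-field` name are all invented for the example; only the shape of each entry (`:name`, `:spec`, optional `:values`) follows the intermediate representation described earlier.

```clojure
(require '[clojure.spec.alpha :as s])

;; Illustrative specs, not the ones from the author's data-types namespace.
(s/def ::text (s/and string? #(pos? (count %))))
(s/def ::single-char (s/and string? #(= 1 (count %))))

;; A two-entry table in the shape of the intermediate representation.
(def sample-definitions
  {54 {:name :side   :spec ::single-char :values {"1" :BUY "2" :SELL}}
   55 {:name :symbol :spec ::text}})

(defn check-field
  "Returns :ok or a keyword naming the first check that failed."
  [definitions {:keys [tag value]}]
  (let [{:keys [spec values] :as definition}
        (get definitions (Integer/parseInt tag))]
    (cond
      (nil? definition)                           :unknown-tag
      (not (s/valid? spec value))                 :wrong-type
      (and values (not (contains? values value))) :value-not-enumerated
      :else                                       :ok)))

(check-field sample-definitions {:tag "55" :value "IBM"}) ;; => :ok
(check-field sample-definitions {:tag "54" :value "9"})   ;; => :value-not-enumerated
(check-field sample-definitions {:tag "54" :value "12"})  ;; => :wrong-type
(check-field sample-definitions {:tag "9999" :value "x"}) ;; => :unknown-tag
```

Handling the unknown-tag case explicitly is a design choice made here; whether an undefined tag should count as invalid depends on how user-defined fields are meant to be treated.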
Example Message
A common FIX message is the NewOrderSingle message. As one could imagine, this message type is used to place a single new order, perhaps within a trading terminal. Here is an example of the NewOrderSingle message according to the FIX42 specification:
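;; SOH delimiters are written as \u0001 escapes
(def new-order-single-42
  (str "8=FIX.4.2\u00019=251\u000135=D\u000149=\u000156=ABROKER\u000134=2\u0001"
       "52=2003061501:14:49\u000111=12345\u00011=111111\u000163=0\u0001"
       "64=20030621\u000121=3\u0001110=1000\u0001111=50000\u000155=IBM\u0001"
       "48=459200101\u000122=1\u000154=1\u000160=2003061501:14:49\u0001"
       "38=5000\u000140=1\u000144=15.75\u000115=USD\u000159=0\u000110=127\u0001"))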
After loading the IR, validating that the message is properly formatted is relatively simple:
(def field-definitions
  (-> "FIX42.edn"
      io/resource
      slurp
      edn/read-string
      :field))

(def parsed-message (str->fields new-order-single-42))

(valid-message-format? parsed-message)
;; => true

(valid-message? field-definitions parsed-message)
;; => true
What's Next?
The above code doesn't enforce the order of fields within groups, which would have to be accounted for before it could be used in any real scenario. After that, my next step is to produce valid messages from that same intermediate representation. Ultimately, my goal is to have a FIX implementation written purely in Clojure. At the time of writing, however, my code leans heavily on QuickFIX/J and leaves much to be desired.
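As a starting point for the ordering problem, one possible check is sketched below: given the order in which a group declares its fields, confirm that the tags seen in one group instance appear in that same relative order. The function name is made up, and the tag numbers assume the standard assignments for the NoHops group members (HopCompID 628, HopSendingTime 629, HopRefID 630).

```clojure
(defn in-declared-order?
  "True when every tag is declared for the group and the tags appear in the
  same relative order as the declaration. Optional fields may be absent."
  [declared tags]
  (let [position (zipmap declared (range))
        indices  (map position tags)]
    (and (every? some? indices)
         (or (empty? indices)
             (apply < indices)))))

;; Declared order for one NoHops instance.
(def hop-order [628 629 630])

(in-declared-order? hop-order [628 629 630]) ;; => true
(in-declared-order? hop-order [628 630])     ;; => true
(in-declared-order? hop-order [629 628])     ;; => false
(in-declared-order? hop-order [628 999])     ;; => false
```

Because the comparison is strictly increasing, a tag repeated within a single instance is also rejected, which matches the idea that a repeated tag signals the start of the next instance rather than a second value.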