More Advanced Queries: “Don’t .qry for me Argentina”

In a previous blog, we learned about Apama Queries, including how to build a simple application using the Query Designer graphical interface. This GUI is a powerful tool, allowing users to build full, complex systems without resorting to coding. But the UI is actually building an underlying textual representation of the query. So what does that look like? Let’s find out.

Query definition

Recall that, like a regular EPL listener, a query searches for an event pattern that you specify. You define a query in a file with the extension “.qry”, and each .qry file contains the definition of only one query. Any package or using statements must appear before the query declaration.

The format for a query definition is as follows:

query name {
    [ parameters { parameters_block } ]
    inputs { inputs_block }
    find pattern_block
    { action_definition ... }
}

The find statement itself takes the more complex form:

find
    [every]
    [wait duration as identifier]
    event_type as identifier [find_operator event_type as identifier]...
    [wait duration as identifier]
    [where_clause] [within_clause] [without_clause]
    [select_clause]*
    [having_clause]*
    {
        action_definition
    }

That’s quite a lot to take in! Let’s have a look at a simple example first, namely the definition of the query constructed in the previous blog, extended slightly to include a parameters block:

query ImprobableWithdrawalLocations {
    parameters {
        float period;
    }
    inputs {
        Withdrawal() key cardNumber within period;
    }
    find
        Withdrawal as w1 -> Withdrawal as w2
        where w2.country != w1.country {
            // Assumes SuspiciousWithdrawalAlert has a single string field, msg
            send SuspiciousWithdrawalAlert("Suspicious withdrawal: " +
                w2.cardHolder.toString() + " " + w2.country.toString())
                to "apama.test";
        }
}

And now let’s take a closer look at each component:

Parameters section: This section is optional. Parameters allow us to create multiple instances of the same query, each with its own parameter values. In this case, we want to be able to vary the size of our window without maintaining multiple query definitions. The lifecycle of parameterized queries differs slightly from that of their non-parameterized cousins: no instances are created when the query is injected; instead, each instance must be created using the Scenario Service. Please refer to the “Using the Scenario Service to manage parameterized queries” page in the documentation for more information.
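To make the lifecycle described above concrete, here is a small Python model. It is not Apama code and not the Scenario Service API: the ParameterizedQuery class and its create_instance and delete_instance methods are hypothetical stand-ins. Nothing runs when the definition is created, and each create_instance call plays the role of asking the Scenario Service for a new instance with its own period value.

```python
from dataclasses import dataclass, field


@dataclass
class QueryInstance:
    """One running instance of a parameterized query (hypothetical model)."""
    query_name: str
    period: float  # value bound to the query's `period` parameter


@dataclass
class ParameterizedQuery:
    """An injected, parameterized query definition (hypothetical model)."""
    name: str
    # Injecting the definition creates no instances
    instances: list = field(default_factory=list)

    def create_instance(self, period: float) -> QueryInstance:
        # Stands in for a Scenario Service request to create an instance
        instance = QueryInstance(self.name, period)
        self.instances.append(instance)
        return instance

    def delete_instance(self, instance: QueryInstance) -> None:
        # Stands in for a Scenario Service request to delete an instance
        self.instances.remove(instance)


query = ParameterizedQuery("ImprobableWithdrawalLocations")
print(len(query.instances))  # 0: nothing runs at injection time
short = query.create_instance(period=60.0)     # one-minute window
hourly = query.create_instance(period=3600.0)  # one-hour window
print([i.period for i in query.instances])     # [60.0, 3600.0]
```

One definition, many independently configured instances: that is what saves us from copying the query once per window size.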

Inputs section: This is a required section in a query, and it must specify at least one event type. The inputs section defines which event types (Withdrawal events in the example above) the query operates on. These events are partitioned by a key – each key value is processed independently and in parallel. In this example, we use the cardNumber field of the Withdrawal event as the key, so the withdrawals for each cardNumber are handled separately, and a match only ever involves events with the same cardNumber. Partitioning the inputs like this allows Apama Queries to scale easily with the number of distinct cardNumbers (of which we’d naturally expect many!). The inputs section also defines a window of interest for the events: in this example, we use a within specifier, set by the period parameter, to limit the age of the events we’re interested in; we could instead use a retain specifier to limit the number of events.
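The partitioning and windowing described above can be sketched in Python as a conceptual model rather than Apama code. The KeyedWindow class and the Withdrawal fields are hypothetical: each key value gets its own deque, a within limit drops events older than the given number of seconds, and a retain limit keeps only the most recent N events. As simplifications, expiry is only checked when an event arrives for that key, and Apama’s exact boundary handling may differ.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Withdrawal:
    card_number: str
    country: str
    time: float  # arrival time in seconds


class KeyedWindow:
    """Per-key windows bounded by age (like `within`) and/or count (like `retain`)."""

    def __init__(self, key, within=None, retain=None):
        self.key = key          # function extracting the partition key
        self.within = within    # maximum event age in seconds, or None
        self.retain = retain    # maximum number of events per key, or None
        self.windows = defaultdict(deque)

    def add(self, event, now):
        """Add an event and return the current window for its key only."""
        window = self.windows[self.key(event)]
        window.append(event)
        if self.within is not None:
            # Expire events that have aged out of the time window
            while window and now - window[0].time > self.within:
                window.popleft()
        if self.retain is not None:
            # Keep only the most recent `retain` events
            while len(window) > self.retain:
                window.popleft()
        return list(window)


# Key on card number with a 60-second window, like `key cardNumber within period`
windows = KeyedWindow(key=lambda w: w.card_number, within=60.0)
windows.add(Withdrawal("1111", "UK", 0.0), now=0.0)
windows.add(Withdrawal("2222", "FR", 10.0), now=10.0)
latest = windows.add(Withdrawal("1111", "DE", 70.0), now=70.0)
print([w.country for w in latest])  # ['DE']: the UK withdrawal is 70 s old
```

Card 2222’s window is untouched by card 1111’s events, which is what lets each key be processed independently.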

Find statement: This is also a required section in a query. It specifies the event pattern of interest, including any where clauses, followed by a block that contains procedural code. In this example, we are looking for a Withdrawal that follows another Withdrawal on the same card (within the window) but takes place in a different country, which might be a bit suspicious given a suitably short time window. If that pattern is matched, the code in the block runs; in this case, it sends an event describing the suspicious activity. The find statement also supports within and without clauses. A within clause works much like the within specifier we have already seen, defining a temporal window for the match, while a without clause specifies events whose occurrence prevents a match. In the example below, an intruder is logged when a user opens the outer door and then the inner door without entering their security code in between:

find OuterDoorOpened as od -> InnerDoorOpened as id
    where od.user = id.user
    without SecurityCodeEntered as sce where od.user = sce.user {
        log "Intruder!: " + id.user;
    }  
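To show what the followed-by operator (->), the where clause and the without clause do together, here is a Python model of the door example. It is a conceptual sketch, not Apama code: the Event class and the intruders function are hypothetical, events are assumed to arrive in order, and windows and keys are ignored. For each InnerDoorOpened, it walks backwards through the same user’s earlier events and reports a match if it reaches an OuterDoorOpened before any SecurityCodeEntered.

```python
from dataclasses import dataclass


@dataclass
class Event:
    kind: str  # "OuterDoorOpened", "InnerDoorOpened" or "SecurityCodeEntered"
    user: str


def intruders(events):
    """Users matching OuterDoorOpened -> InnerDoorOpened (same user) with no
    SecurityCodeEntered for that user in between. `events` is in arrival order."""
    alerts = []
    for i, event in enumerate(events):
        if event.kind != "InnerDoorOpened":
            continue  # only the last event of the pattern can complete a match
        for earlier in reversed(events[:i]):
            if earlier.user != event.user:
                continue  # where od.user = id.user (and sce.user for without)
            if earlier.kind == "SecurityCodeEntered":
                break  # without clause: a code entry in between prevents a match
            if earlier.kind == "OuterDoorOpened":
                alerts.append(event.user)  # log "Intruder!: " + id.user
                break
    return alerts


events = [
    Event("OuterDoorOpened", "alice"),
    Event("SecurityCodeEntered", "alice"),
    Event("InnerDoorOpened", "alice"),      # code entered in between: no match
    Event("OuterDoorOpened", "mallory"),
    Event("SecurityCodeEntered", "alice"),  # another user's code does not count
    Event("InnerDoorOpened", "mallory"),    # no code entered: match
]
print(intruders(events))  # ['mallory']
```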

There are many more operators available for use in the find statement, but they are outside the scope of this blog. Please refer to the documentation for the full list, with examples.

Timers

Sometimes it is handy to put time intervals before or after our patterns, and for this we have the wait operator. The wait operator works much like it does in EPL, creating a timer that fires after a defined period. Timers in queries have some additional uses: in conjunction with the without clause, they let us define dynamic ranges. The example below matches when a Warning is followed by an Alert and no AllClear event arrives in the 10 seconds after the Alert:

find Warning as warn -> Alert as alert -> wait(10.0) as t 
   without AllClear as allClear between ( alert t )
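The timer version can be modeled in Python too, as a conceptual sketch rather than Apama code (unresolved_alerts is a hypothetical name). Because it processes a complete, time-ordered list of (time, kind) pairs offline, looking ahead to the Alert time plus 10 seconds stands in for the wait(10.0) timer firing; the pattern completes at that deadline only if no AllClear arrived between the Alert and the timer.

```python
def unresolved_alerts(events, wait=10.0):
    """Return the times at which Warning -> Alert -> wait(10.0) completes
    with no AllClear between the Alert and the timer.
    `events` is a time-ordered list of (time_in_seconds, kind) pairs."""
    matches = []
    seen_warning = False
    for i, (when, kind) in enumerate(events):
        if kind == "Warning":
            seen_warning = True  # any later Alert now satisfies Warning -> Alert
        elif kind == "Alert" and seen_warning:
            deadline = when + wait  # when the wait(10.0) timer would fire
            cleared = any(
                later_kind == "AllClear" and later_when <= deadline
                for later_when, later_kind in events[i + 1:]
            )  # without AllClear ... between ( alert t )
            if not cleared:
                matches.append(deadline)
    return matches


events = [
    (0.0, "Warning"),
    (1.0, "Alert"),     # cleared at 5.0: no match
    (5.0, "AllClear"),
    (20.0, "Alert"),    # nothing clears it by 30.0: match when the timer fires
    (40.0, "Alert"),    # cleared at 45.0: no match
    (45.0, "AllClear"),
]
print(unresolved_alerts(events))  # [30.0]
```

The range is “dynamic” in the sense that it starts at whenever the Alert happened to arrive, rather than at a fixed point in time.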

Time ranges can also be constructed using just the within clause, for example to check that certain events happen within a given time of each other. In the example below, we only care about the situation where the B event arrives within 10 seconds of the A event, and where the C and D events (in either order) then arrive within 10 minutes of each other (note that temporal literals such as 10 minutes are something regular listeners can’t do!):

find A as a -> B as b -> (C as c and D as d)
   within 10.0 between (a b)
   within 10 minutes between (c d)
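The two within clauses above are simply constraints on the timestamps of the matched events. As a conceptual sketch in Python (the function name is hypothetical, timestamps are in seconds, ordering is checked by timestamp rather than arrival order, and treating both limits as inclusive is an assumption), a candidate match could be checked like this:

```python
def within_constraints_hold(a, b, c, d):
    """Check a candidate match of A -> B -> (C and D) against
    `within 10.0 between (a b)` and `within 10 minutes between (c d)`.
    Arguments are event timestamps in seconds."""
    ordered = a <= b <= min(c, d)      # A, then B, then C and D in either order
    a_to_b_ok = b - a <= 10.0          # B within 10 seconds of A
    c_to_d_ok = abs(c - d) <= 10 * 60  # C and D within 10 minutes of each other
    return ordered and a_to_b_ok and c_to_d_ok


print(within_constraints_hold(0.0, 5.0, 100.0, 500.0))  # True
print(within_constraints_hold(0.0, 15.0, 100.0, 500.0))  # False: B too late
print(within_constraints_hold(0.0, 5.0, 650.0, 100.0))  # True: D may precede C
```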

Aggregates

We can also specify find patterns that aggregate data across the window; for instance, we may want to check whether an event has a field whose value is above or below a moving average of that field over the events in the window. For this, we introduce Aggregates (which users familiar with EPL’s Stream Queries will recognize), together with the every modifier and the select and having clauses:

using com.apama.aggregates.avg;
using com.apama.aggregates.last;
query FindSuspiciouslyLargeWithdrawals {
    inputs {
        ATMWithdrawal() key accountId retain 20;
    }
    find every ATMWithdrawal as w
        select last(w.transactionId) as tid
        having last(w.amount) > 1.6 * avg(w.amount) {
            send SuspiciousTransaction(tid) to "SuspiciousTxHandler";
        }
}

So what’s changed in this example? Firstly, we bring in a couple of built-in aggregates (avg and last) for use in our query. Next, we use the every modifier, which tells the query to aggregate over all of the events in the window. Then, we use the select clause with the last aggregate to grab some pertinent data from the last element in the window (the most recent event), while the having clause decides whether the block runs at all: here, it is true when the amount of the most recent withdrawal is more than 1.6 times the average amount across the window. The action block in this query then sends a SuspiciousTransaction event containing the transaction ID captured in the select clause. Any find pattern containing an every modifier must contain one or more select and/or having clauses; for example, we may want to select both the ID of the transaction and the location where it took place. Finally, we can mix these clauses with our old friend the where clause, as in this example, where we want to keep track of the average price (weighted by trade amount) of trades that don’t involve ourselves:

find every Trade as t
    where t.buyer != myId and t.seller != myId
    select wavg(t.price, t.amount) as avgprice
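Both aggregate examples can be modeled in Python as a rough sketch of the semantics (not Apama code; the class, function and field names are hypothetical). The first keeps the last 20 withdrawals per account, like retain 20, and applies the having test to the whole window, including the newest event; the second applies the where filter before computing a weighted average, in the spirit of wavg.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class ATMWithdrawal:
    account_id: str
    transaction_id: str
    amount: float


class SuspiciousWithdrawals:
    """`key accountId retain 20` plus
    `having last(w.amount) > 1.6 * avg(w.amount)`, selecting the last transactionId."""

    def __init__(self, retain=20, factor=1.6):
        self.factor = factor
        # One bounded window per account; deque(maxlen=...) drops the oldest
        self.windows = defaultdict(lambda: deque(maxlen=retain))

    def on_event(self, w):
        window = self.windows[w.account_id]
        window.append(w)
        amounts = [e.amount for e in window]
        average = sum(amounts) / len(amounts)    # avg over the whole window
        if amounts[-1] > self.factor * average:  # having clause
            return window[-1].transaction_id     # select last(w.transactionId)
        return None


def weighted_avg_price(trades, my_id):
    """`where t.buyer != myId and t.seller != myId` then
    `select wavg(t.price, t.amount)`; returns None if nothing qualifies."""
    others = [t for t in trades if t["buyer"] != my_id and t["seller"] != my_id]
    total_amount = sum(t["amount"] for t in others)
    if total_amount == 0:
        return None
    return sum(t["price"] * t["amount"] for t in others) / total_amount


detector = SuspiciousWithdrawals()
results = [detector.on_event(ATMWithdrawal("acc1", f"t{n}", amount))
           for n, amount in enumerate([100.0, 100.0, 100.0, 400.0], start=1)]
print(results)  # [None, None, None, 't4']: 400 > 1.6 * 175

trades = [
    {"buyer": "me", "seller": "x", "price": 10.0, "amount": 5.0},  # excluded
    {"buyer": "a", "seller": "b", "price": 10.0, "amount": 1.0},
    {"buyer": "c", "seller": "d", "price": 20.0, "amount": 3.0},
]
print(weighted_avg_price(trades, "me"))  # 17.5
```

The split mirrors the clause order: where filters individual events, the aggregates summarize what survives, and having tests the summary.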

Conclusion

Although Apama Queries can be entirely developed from the graphical editor in Designer, drilling down into the code itself reveals the true scale and power of the language. Queries can be parameterized to avoid code duplication, and their inputs can be partitioned by key and windowed both by time (within) and by number of events (retain). The find statement has a range of operators and clauses, allowing us to define queries that search for complex patterns including filtering, ordering, timing and even aggregations over the entire window. While it is recommended to work with Apama Queries visually from Designer, we hope that this blog has given some insight into what goes on under the hood, as well as giving more confident community members something to bite into.

As always, please post on our forums with any questions you may have. Thanks, and happy developing!

Callum