
Tuesday 26 November 2013

New book: Exploring data with RapidMiner

I'm the author of "Exploring Data with RapidMiner".

Of course, I would be very happy if you bought a copy :)

Some more details are here.

Monday 18 November 2013

Negative and positive lags

It's often necessary to compare an attribute's value in one example with its value in another example, either in the past or in the future.

The Lag Series operator is the one to use to bring previous values forward to the present. Looking into the future is somewhat harder (and of course you can argue that, in a data mining context, it's cheating to use future values when these are the very things we are trying to predict).
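To make the idea of a lag concrete, here is a minimal Python sketch of what a lag of k steps does to a single series. It is only an illustration of the concept, not the Lag Series operator's implementation: the function name `lag` is hypothetical, and using `None` for the missing leading values is an assumption standing in for RapidMiner's missing values.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def lag(values: Sequence[T], k: int) -> List[Optional[T]]:
    """Shift a series down by k positions (hypothetical helper, not RapidMiner's API).

    Position i receives values[i - k]; the first k positions have no
    earlier value, so they are filled with None (a missing value).
    """
    if k < 0:
        raise ValueError("k must be non-negative: a lag only looks backwards")
    return [values[i - k] if i >= k else None for i in range(len(values))]


series = [10, 20, 30, 40, 50]
print(lag(series, 1))  # [None, 10, 20, 30, 40]
```

Each example now carries the value from k steps earlier, which is exactly the "past brought forward to now" comparison described above.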

One approach is to step forward in time and use the Lag Series operator to bring forward the values that are then in the past. Then step back in time and treat the values brought forward as the new present, so that the original values become the future ones.
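One way to read this "forward, lag, then back" trick is sketched below in Python. It is our interpretation, not the author's RapidMiner process: after lagging by k, each row j pairs `values[j - k]` with `values[j]`, and relabelling that row as time j - k makes the lagged value the present and the original value the one k steps in the future. The names `lag` and `lead_via_lag` are hypothetical, and `None` stands in for a missing value.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def lag(values: Sequence[T], k: int) -> List[Optional[T]]:
    # Ordinary lag: position i holds values[i - k], or None if there is no earlier value.
    return [values[i - k] if i >= k else None for i in range(len(values))]


def lead_via_lag(values: Sequence[T], k: int) -> List[Tuple[T, Optional[T]]]:
    """Return (now, value k steps ahead) pairs using only a backwards lag."""
    if k < 0:
        raise ValueError("k must be non-negative")
    n = len(values)
    # "Go forward in time": lag by k, so row j carries the earlier value values[j - k].
    lagged = lag(values, k)
    # "Go back in time": relabel row j as time j - k. Its lagged value is the new "now"
    # and its original value is now k steps in the future.
    pairs: List[Tuple[T, Optional[T]]] = [(lagged[j], values[j]) for j in range(k, n)]
    # The last k times have no future value inside the data.
    pairs += [(values[t], None) for t in range(max(n - k, 0), n)]
    return pairs


print(lead_via_lag([10, 20, 30, 40, 50], 1))
# [(10, 20), (20, 30), (30, 40), (40, 50), (50, None)]
```

The cost of this route is the relabelling step: the lagged attribute has to be treated as the present, which is the "go back in time" part of the description.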

Another approach is to use the Generate Id operator combined with the Join operator. Generate Id has a little-known parameter called "offset" that allows numerical ids to be generated from a given starting value. Apply the operator twice with different offsets (with some accompanying gymnastics to turn the ids into regular attributes with different names) and then join the results on these ids. Each resulting example then holds past and future values alongside the present ones (although more gymnastics are needed to get the attribute names right).
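The offset-and-join idea can be sketched in Python with dictionaries standing in for example sets. This is an illustration under assumptions, not RapidMiner's behaviour: `generate_id` is a hypothetical stand-in that numbers examples consecutively from `offset` (the real operator's exact starting value may differ), and `dict.get` plays the role of a left join that leaves a missing value (`None`) where no id matches. Giving the past copy ids shifted by +k and the future copy ids shifted by -k means that, for the present example with id i, the past copy supplies the value from k steps earlier and the future copy the value from k steps later.

```python
from typing import Dict, List, Optional, Sequence


def generate_id(values: Sequence[float], offset: int) -> Dict[int, float]:
    # Hypothetical stand-in for Generate Id: ids run offset, offset + 1, ...
    return {offset + i: v for i, v in enumerate(values)}


def past_and_future(values: Sequence[float], k: int) -> List[Dict[str, Optional[float]]]:
    now = generate_id(values, 0)
    past = generate_id(values, k)     # value at position i gets id i + k
    future = generate_id(values, -k)  # value at position i gets id i - k
    rows = []
    for i in sorted(now):
        # Left join on the id; the attribute renaming is the "gymnastics" step.
        rows.append({
            "past": past.get(i),      # values[i - k], or None near the start
            "now": now[i],
            "future": future.get(i),  # values[i + k], or None near the end
        })
    return rows


for row in past_and_future([10, 20, 30, 40, 50], 1):
    print(row)
```

With k = 1 the first row is `{'past': None, 'now': 10, 'future': 20}` and the last is `{'past': 40, 'now': 50, 'future': None}`. Because both directions come from the same id shift, no reversal or relabelling of the present is needed.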

Here's an example process showing the Generate Id approach, with the Lag Series operator included for comparison (you will need to install the Series extension to use the Lag Series operator).

Maybe those nice RapidMiner R&D chaps will add a negative lag :)