Search this blog

Sunday 21 February 2016

Reading from a SQL Server database

Reading from a SQL Server database is easy using R.

Here's a process that shows how to do this from within a RapidMiner process.

Of course, you can use the built in "Read Database" operator to read from a database, but there are restrictions in the community version. By using R you can get partially round the restrictions but you should always be aware of your license agreement. Just because you can get round the license does not mean that the terms no longer apply. If you do something that would normally trigger the purchase of an additional license then you still need to. I'm not a lawyer thankfully but, you have been warned.

Having said that, there are situations where you have to try things to prove viability and get political buy-in before committing to a more serious plan where money is to be spent. Political buy-in, as everyone knows, can sometimes takes a very long time and even the most trivial objection can completely de-rail progress. Removing the ability to make a full prototype is just such a potential trivial objection.

Having said all of that, the method the process uses here will have some subtle differences in the way it interacts with the database when compared to the "Read Database" operator. This means it might not work for some reason as yet unknown. Simple advice, don't rely on it.

Enough words, on with the process.

The process has two parts, the first sets some macros that are used within the second. It's a little known fact that you can use macros in this way but it's extremely powerful and allows the code to work in lots of places. The macros themselves are shown in the following table.



Change these to match what you have in your environment. Note that I am using SQL Server authentication so this means you have to set up your environment like this. I am led to believe that built-in authentication is possible but I have not tried it.

The R code itself is shown here.



Additional points:

  1. Install the RJDBC package into your environment, rJava may also be required.
  2. Download the Microsoft JDBC drivers from here (note that care is always needed with downloads such as these because the vendors keep changing their Web sites).
  3. If you are running on Ubuntu, the process will still work but there are some changes to do as shown in the R code.
  4. I have not tried it on a Mac.
  5. Change the query to whatever you want. The example here queries the system table.
The end result is an example set. The query shown in the example yields this.


You will see that the attribute names have been created automatically and a basic mapping to types has been done. The following shows part of the statistics for this example set.


One mapping that would need additional downstream work is the create_date attribute. It looks like it has been transformed into a polynominal. Closer inspection would, no doubt, reveal other foibles.

The example set can then be used in the normal way 

In summary, you can see that it is very easy to access SQL Server using R. It is therefore easy to do it from within RapidMiner.

No comments:

Post a Comment