Complementing MongoDB with Real-time Solr Search

Overview

I’ve been a long-time user and evangelist of Solr, given its amazing ability to full-text index large amounts of structured and unstructured data. I’ve successfully used it on a number of projects to add both Google-like search and faceted (or filtered) search to our applications. I was quite pleased to find that there is a connector that allows that same type of searching against an application backed by MongoDB. In this blog post, we’ll explore how to configure MongoDB and Solr together and demonstrate their usage with the MongoDB application I wrote several months back, outlined in my blog post Mobile GeoLocation App in 30 minutes – Part 1: Node.js and MongoDB.

Mongo-Connector: Realtime access to MongoDB with Solr

I stumbled upon mongo-connector during my research. It was exactly the sort of thing I was looking for, because it hooks into MongoDB’s oplog (somewhat similar to a transaction log in Oracle) and updates Solr in real time based on any create, update or delete operations made to the system. The oplog is what MongoDB uses for master-slave replication, so MongoDB must be set up as a replica set (one primary and n secondaries; two in my case). Basically, I followed the instructions here to set up a development replica set. Once established, I started each mongod instance as follows so they would run in the background (--fork) and use minimal space due to my disk space limitations (--smallfiles).

% mongod --port 27017 --dbpath /srv/mongodb/rs0-0 --replSet rs0 --smallfiles --fork --logpath /srv/mongodb/rs0-0.log

% mongod --port 27018 --dbpath /srv/mongodb/rs0-1 --replSet rs0 --smallfiles --fork --logpath /srv/mongodb/rs0-1.log

% mongod --port 27019 --dbpath /srv/mongodb/rs0-2 --replSet rs0 --smallfiles --fork --logpath /srv/mongodb/rs0-2.log
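With all three mongod processes running, the replica set still needs to be initiated from the mongo shell. A minimal sketch, assuming the same ports as above (substitute your actual hostname for localhost):

% mongo --port 27017
> rs.initiate({
    _id: "rs0",
    members: [
      { _id: 0, host: "localhost:27017" },
      { _id: 1, host: "localhost:27018" },
      { _id: 2, host: "localhost:27019" }
    ]
  })
> rs.status()  // should eventually show one PRIMARY and two SECONDARY members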

Once you have MongoDB configured and running, you need to install mongo-connector separately. It relies on Python, so if you don’t already have it, install version 2.7 or 3.x first. I then installed mongo-connector as a package with a single command:

% pip install mongo-connector

After it is installed, you can run it as follows so that it too runs in the background using nohup (hold off on running this until after the next section):

% nohup sudo python mongo_connector.py -m localhost:27017 -t http://solr-pet.xxx.com:9650/solr-pet -d ./doc_managers/solr_doc_manager.py > mongo-connector.out 2>&1 &

A couple of things to note here: the -m option points to the host and port of the primary node in the MongoDB replica set. The -t option is the location of the Solr server and context; in my case, it was a remote instance of Solr. The -n option (not shown above) is the namespace of the Mongo database and collection I wish to have indexed by Solr; without it, the entire database is indexed. Finally, the -d option indicates which doc manager to use, which in my case is Solr. There is a doc manager for Elasticsearch as well, if you choose to use that instead.
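For reference, a full command including the namespace option might look like the following; the dogtag.dogtags namespace is an assumption based on the database and collection used in my earlier posts, so adjust it to your own:

% nohup sudo python mongo_connector.py -m localhost:27017 -t http://solr-pet.xxx.com:9650/solr-pet -n dogtag.dogtags -d ./doc_managers/solr_doc_manager.py > mongo-connector.out 2>&1 &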

With this in place, your MongoDB instance is configured to start pushing updates to Solr in real time. However, before starting the connector, let’s take a look at the next section to see what we need to do on the Solr side of things.

Configuring Solr to work with Mongo-Connector

Before we run the mongo-connector, there are a few things we need to do in Solr to get it to work properly. First, for the mongo-connector to post documents to Solr, the Solr REST service must be available for update operations. Second, you must configure schema.xml with the specific fields the connector requires as well as any fields being stored in Mongo. On the first point, we need to be sure that the following line exists in solrconfig.xml:

<requestHandler name="/update" class="solr.UpdateRequestHandler"/>

As of Solr 4.0, this request handler supports XML, JSON, CSV and javabin; it allows the mongo-connector to send data to the REST interface for incremental indexing. Regarding the schema, you must include a field for each entry you have (or plan to add) in your Mongo documents. Here’s an example of what my schema.xml looks like:

<schema name="solr-suggest-box" version="1.5">
        <types>
                <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
                <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
                <fieldType name="text_wslc" class="solr.TextField" positionIncrementGap="100">
                        <analyzer type="index">
                                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                                <filter class="solr.LowerCaseFilterFactory"/>
                        </analyzer>
                        <analyzer type="query">
                                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                                <filter class="solr.LowerCaseFilterFactory"/>
                        </analyzer>
                </fieldType>
                <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>
                <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
                <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
        </types>

        <fields>
                <field name="_id" type="string" indexed="true" stored="true" required="true" />
                <field name="name" type="text_wslc" indexed="true" stored="true" />
                <field name="description" type="text_wslc" indexed="true" stored="true" />
                <field name="date" type="tdate" indexed="true" stored="true" />
                <field name="nmdsc" type="text_wslc" indexed="true" stored="true" multiValued="true" />
                <field name="coordinate" type="location" indexed="true" stored="true"/>
                <field name="_version_" type="long" indexed="true" stored="true"/>
                <field name="_ts" type="long" indexed="true" stored="true"/>
                <field name="_ns" type="string" indexed="true" stored="true"/>
                <field name="ns" type="string" indexed="true" stored="true"/>
                <field name="coords" type="string" indexed="true" stored="true" multiValued="true" />
                <dynamicField name="*" type="string" indexed="true" stored="true"/>
        </fields>

        <uniqueKey>_id</uniqueKey>

        <defaultSearchField>nmdsc</defaultSearchField>

        <!-- we don't want too many results in this usecase -->
        <solrQueryParser defaultOperator="AND"/>

        <copyField source="name" dest="nmdsc"/>
        <copyField source="description" dest="nmdsc"/>
</schema>

I found that all the underscore fields (_id, _version_, _ts and _ns) were required to get this working correctly. To future-proof the configuration, I added a catch-all dynamicField so the schema can change without affecting the Solr configuration; flexible schemas are, after all, a tenet of MongoDB. Finally, I use copyField to funnel only the fields I wish to search against (name and description in my use case) into the nmdsc field, which serves as the default search field for the UI, which I will go into next.

After your config is in place and you start the Solr server, you can launch the mongo-connector successfully, and it will continuously update Solr in real time with any changes saved to Mongo. I used nohup to kick it off in the background as shown above.
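To sanity-check the pipeline, you can insert a document into Mongo and then query Solr’s select handler directly; the search term here is just an example, and the URL assumes the same Solr instance as above:

% curl "http://solr-pet.xxx.com:9650/solr-pet/select?q=nmdsc:retriever&wt=json&indent=true"

If the document count in the response grows as you add records to Mongo, the connector is doing its job.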

Using Solr in the DogTags Application

To tie this all together, we need to alter the UI of the original application to allow for Solr searching. See my original blog post for a refresher: Mobile GeoLocation App in 30 minutes – Part 2: Sencha Touch. Recall that this is a Sencha Touch MVC application and so all I needed to do was add a new store for the Solr REST/JSONP service that I will call for searching and update the UI to provide a control for the user to conduct a search. Let’s take a look at each of these:

Ext.define('MyApp.store.PetSearcher', {
    extend: 'Ext.data.Store',
    requires: [
        'MyApp.model.Pet'
    ],
    config: {
        autoLoad: true,
        model: 'MyApp.model.Pet',
        storeId: 'PetSearcher',
        proxy: {
            type: 'jsonp',
            url: 'http://solr-pet.xxx.com:9650/solr-pet/select/',
            callbackKey: 'json.wrf',
            limitParam: 'rows',
            extraParams: {
                wt: 'json',
                'json.nl': 'arrarr'
            },
            reader: {
                root: 'response.docs',
                type: 'json'
            }
        }
    }
});

Above is the new store I’m using to call Solr and map its results back to the original model that I used before. Note the differences from the original store that are specific to Solr, namely the URL and several of the proxy parameters (callbackKey, limitParam, wt, and json.nl). The collection of docs is buried a bit deeper in the Solr response, so I set the reader’s root to response.docs accordingly.
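For context, Solr’s JSON output nests the documents under response.docs, roughly like the abbreviated example below (field values are purely illustrative), which is why the reader’s root is set the way it is:

{
  "responseHeader": { "status": 0, "QTime": 1 },
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      { "_id": "5099803df3f42312", "name": "Rex", "description": "golden retriever" },
      { "_id": "5099803df3f42313", "name": "Fido", "description": "beagle" }
    ]
  }
}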

The next thing I need to do is add a control to my view so the user can interact with the search service. In my case I chose to use a search field docked at the top and have it update the list based on the search term. In my view, the code looks as follows:

Ext.define('MyApp.view.PetPanel', {
    extend: 'Ext.Panel',
    alias: 'widget.petListPanel',
    config: {
        layout: {
            type: 'fit'
        },
        items: [
            {
                xtype: 'toolbar',
                docked: 'top',
                title: 'Dog Tags'
            },
            {
                xtype: 'searchfield',
                docked: 'top',
                name: 'query',
                id: 'SearchQuery'
            },
            {
                xtype: 'list',
                store: 'PetTracker',
                id: 'PetList',
                itemId: 'petList',
                emptyText: "<div>No Dogs Found</div>",
                loadingText: "Loading Pets",
                itemTpl: [
                    '<div>{name} is a {description} and is located at {latitude} (latitude) and {longitude} (longitude)</div>'
                ]
            }
        ],
        listeners: [
            {
                fn: 'onPetsListItemTap',
                event: 'itemtap',
                delegate: '#PetList'
            },
            {
                fn: 'onSearch',
                event: 'change',
                delegate: '#SearchQuery'
            },
            {
                fn: 'onReset',
                event: 'clearicontap',
                delegate: '#SearchQuery'
            }
        ]
    },
    onPetsListItemTap: function (dataview, index, target, record, e, options) {
        this.fireEvent('petSelectCommand', this, record);
    },
    onSearch: function (dataview, newValue, oldValue, eOpts) {
        this.fireEvent('petSearch', this, newValue, oldValue, eOpts);
    },
    onReset: function() {
        this.fireEvent('reset', this);
    }
});

The searchfield item docked at the top adds the control, and the listeners config wires up the events I use to fire into my controller. The controller supports those events as follows:

    onPetSearch: function(view, value, oldvalue, opts) {
        if (value) {
            var store = Ext.getStore('PetSearcher');
            var list = this.getPetList();
            store.load({
                params: {q:value},
                callback: function() {
                    console.log("we searched");
                    list.setData(this._proxy._reader.rawData.response.docs);
                }
            });
            list.setStore(store);
        }
    },

    onReset: function (view) {
        var store = Ext.getStore('PetTracker');
        var list = view.down("#petList");
        store.getProxy().setUrl('http://nodetest-loutilities.rhcloud.com/dogtag/');
        store.load();
        list.setStore(store);
    },

Since the model is essentially the same between Mongo and Solr, all I have to do is swap the stores and reload them to get the results updated accordingly. In onPetSearch, the dynamic search term is passed as the q parameter when the PetSearcher store is loaded. When the search value is reset, onReset goes back to the original PetTracker store and reloads the full results. In both handlers, I set the list component to the corresponding store so that the list shows the results of whichever store it has been given.

Conclusion

In this short example, we established that we could provide real-time search with Solr against MongoDB and augment an existing application with a search control to use it. This has the potential of being a great complement to Mongo because it keeps us from having to add additional indexes to MongoDB for searching, which carries a performance cost, especially as the record set grows. Solr removes this burden from Mongo and maintains an incremental index that can be updated in real time for extremely fast queries. I see this approach being very powerful for modern applications.

RabbitMQ, Node.js and Groovy: Making Messaging Easy

Overview

If you’ve been following my blog, you’re probably well aware of my penchant for node.js and the few blog posts I’ve already posted on the subject. More recently, I wanted to explore RabbitMQ, a messaging platform similar to JMS, that can be easily leveraged with node.js as an alternative to saving messages with HTTP POST using a REST interface. In this blog post, I’m going to extend the DogTag application I blogged about in DogTag App in 30 minutes – Part 1: Node.js and MongoDB and show you how you can use RabbitMQ to push messages into MongoDB using Node.js on the server and a simple Groovy client as the publisher of those messages.

What is RabbitMQ?

RabbitMQ provides robust messaging that is easy to use and supported by a large number of developer platforms as well as operating systems. Its low barrier to entry makes it quite suitable for use with node.js; you will be amazed at how remarkably little code is required to establish a RabbitMQ exchange hosted within a node.js application. RabbitMQ is similar to JMS in that it supports many of the same paradigms you may have seen in this or other messaging platforms (e.g., Queues, Topics, Routing, Pub/Sub, RPC, etc.). Primary language support includes Python and Java, but many other languages are supported as well.

In this blog post, I’ll be using an NPM package aptly named node-amqp. AMQP stands for “Advanced Message Queuing Protocol”. AMQP is different from API-level approaches like JMS because it is a binary wire-level protocol that defines how data will be formatted for transmission on the network. This results in better interoperability between systems, based on a standard supported through the OASIS standards body. AMQP started off in beta but released a 1.0 version in 2011 and has since grown in popularity (see here for a comparison).

RabbitMQ and Node.js

In the example I’m about to show you, I’m providing a simple means to batch several messages that save new “dog tags” in the DogTags application I referenced earlier. I am going to publish the messages to a queue and have the server save them into the database. Let’s take a look at the JavaScript code necessary to do this in node.js:


    var amqp = require('amqp');
    var conn = amqp.createConnection({url: mq_url});
    conn.on('ready', function () {
        var exchange = conn.exchange('queue-dogtag', {'type': 'fanout', durable: false}, function () {
            var queue = conn.queue('', function () {
                console.log('Queue ' + exchange.name + ' is open');
                
                queue.bind(exchange.name, '');
                queue.subscribe(function (msg) {
                    console.log('Subscribed to msg: ' + JSON.stringify(msg));
                    api.saveMsg(msg);
                });
            });
            queue.on('queueBindOk', function () {
                console.log('Queue bound successfully.');
            });
        });
    });

The code starts by including the amqp package from NPM and establishing a connection with a provided URL (e.g., amqp://guest:guest@localhost:5672). Once the connection is ready, we create what’s called an exchange, defining its type, name, and whether or not it is durable. For further details on these and other settings, I will refer you to the docs here. You may be asking: why not just send directly to a queue? While this is possible, it isn’t the practice given RabbitMQ’s messaging model; namely, you should never send messages directly to a queue. Instead, the producer sends messages to an exchange, which knows exactly what to do with them. In my case, I chose “fanout” as my exchange type, which simply broadcasts all messages to every queue it is aware of; here that is just the single anonymous queue created inside the exchange callback. Other exchange types you may want to look into include Direct, Headers and Topic exchanges, which are described in detail here.

Once the exchange and the queue are defined, it is time to bind the queue to the exchange and subscribe to any messages published to it, which is what queue.bind and queue.subscribe do. It is then a simple matter of processing each message, using the api object I’ve established to save it to MongoDB, and recording an appropriate log message. As an added measure, I also think it is worthwhile to confirm that the queue is bound to the exchange prior to any processing and report the result to the log, which the queueBindOk listener takes care of.
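The api object referenced in the subscribe callback isn’t shown above. A minimal sketch of what its saveMsg function might look like, assuming the Mongoose DogTag model from the earlier DogTag posts:

var DogTag = require('./model/dogtag.js');

// Persist a message consumed from the queue; msg is the JSON object
// published by the client (name, description, longitude, latitude, ...).
exports.saveMsg = function (msg) {
    new DogTag({
        name: msg.name,
        description: msg.description,
        longitude: msg.longitude,
        latitude: msg.latitude
    }).save(function (err) {
        if (err) return console.error('Failed to save dog tag: ' + err);
        console.log('Dog tag saved from queue message.');
    });
};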

The Groovy Publisher

So now that we have the server-side component in place and ready to subscribe to messages, we need a publisher to send messages to the exchange. To do this, I chose to write a CLI tool in Groovy and leverage Spring AMQP to publish the messages. Groovy is a great language for CLI tools, and since it is maintained by the folks at SpringSource, it was a natural fit to use the Spring AMQP framework, which can generate AMQP messages without much fuss. Of course, as a prerequisite, you’ll have to grab the latest Groovy binaries so you can run the script. Let’s take a look at the code:

@Grab(group = 'org.springframework.amqp', module = 'spring-amqp', version = '1.1.1.RELEASE')
@Grab(group = 'org.springframework.amqp', module = 'spring-rabbit', version = '1.1.1.RELEASE')
@Grab(group = 'com.rabbitmq', module = 'amqp-client', version = '2.8.7')
@Grab(group = 'org.codehaus.jackson', module = 'jackson-mapper-asl', version = '1.9.9')

import org.springframework.amqp.rabbit.connection.*
import org.springframework.amqp.rabbit.core.RabbitTemplate
import org.springframework.amqp.support.converter.JsonMessageConverter

class RabbitMqUploader {
    static void main(args) {
        println("Starting...")
        def cli = new CliBuilder(usage: 'uploader.groovy [filename.csv]')

        def factory = new CachingConnectionFactory();
        factory.setUsername("username");
        factory.setPassword("password");
        factory.setVirtualHost("ve2b460e7e6b24b5da88c935ff63c4e86");
        factory.setHost("hostname")
        factory.setPort(10001)

        RabbitTemplate templ = new RabbitTemplate(factory);
        templ.setExchange("queue-dogtag");
        templ.setMessageConverter(new JsonMessageConverter());

        def options = cli.parse(args)
        if (!options) {
            // Means no parameters were provided
            cli.usage()
            return
        }

        def extraArguments = options.arguments()
        def filename = ''
        if (extraArguments) {
            if (extraArguments.size() > 0) {
                filename = extraArguments[0]
            }
            //load and split the file
            new File(filename).splitEachLine("\n") { row ->
                def field = row[0].split(',')
                def dt = new DogTag()
                dt.name = field[0]
                dt.description = field[1]
                dt.longitude = field[2]
                dt.latitude = field[3]
                dt.phone = field[4]
                templ.convertAndSend(dt)
                println('Sent >>>' + row)
            }
        }

        factory.destroy()
    }
}

So after our discussion in the previous section, you can follow the code here to see how simple it is to publish a message to the RabbitMQ exchange we set up. One thing I simply love about Groovy is that I don’t need to download any dependencies, build any jars, or define any classpaths. I simply use the @Grab annotation to pull the dependencies required for Spring AMQP, RabbitMQ and the JSON mapper down to the client so the script runs without any set-up. The CachingConnectionFactory is configured to point to my host (for which I recommend using a free aPaaS like OpenShift or Nodejitsu to host your Node.js app). I then define the RabbitMQ template based on that factory, set the exchange name, and establish a JSON converter since MongoDB likes JSON. Finally, I parse a CSV file provided as a command-line parameter and send each line as an AMQP message to our server.

To run this program, I simply run: groovy RabbitMqUploader.groovy ./dogtags.csv.
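The uploader assumes one dog per line with comma-separated fields in the order name, description, longitude, latitude, phone. A hypothetical dogtags.csv would therefore look something like this (values are made up):

Rex,Golden Retriever,-117.15,32.72,555-0123
Fido,Beagle,-117.20,32.75,555-0456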

Conclusion

With a few simple changes to my original application, it was easy to augment it to support RabbitMQ and define an exchange to consume messages. With Groovy, it was easy to write an autonomous, concise client to establish a connection to my node.js server and broadcast messages over the AMQP wire protocol for the server to process. I expect to use this approach in the future where perhaps a REST interface may not be best suited and using well established messaging paradigms make more sense. I encourage you to explore this interesting messaging protocol for your future needs as well.

QConSF 2012: Mobile Web Tooling

Pete LePage (@petele) gives a great overview of all the different tools you should consider when embarking on mobile web application development.

Sublime – text editor for Mac

Codekit
  • Minify, compiles
  • live browser reloads
  • JSHint, JSLint
  • Watches all files in directory for reloads

Great sites for good design patterns for mobile dev:

Great HCI guidelines from Android, iOS, Safari, etc. can be found on their respective websites

Start with boilerplates from SenchaTouch, bootstrap, and maybe even jqueryMobile

github.com/ftlabs/fastclick for improving performance of button clicks (usually delayed by 300ms for pinch zoom, slide, etc.)

To debug, jsconsole.com can be used for remote debugging on other devices and will show output on the site…coolio.

Hammer.js is cool for gestures; use it to listen for those events

brian.io/lawnchair is good for storing some transient data by abstracting away IndexedDB, WebSQL or LocalStorage

@media, cool for high density resolution or using CSS -webkit-image-set for both 1x and 2x image densities for hi-rez icons

CharlesProxy good to test network latency or Network Link Conditioner on Mac

Chrome Dev Tools have cool features to override geolocation, user agent, and device orientation

Android and Safari remote debugging possible but need USB cable to connect.

m.chromeexperiments.com to show off your skills.

new.mcrbug.com for filing any bugs

When deciding whether to go native or not, plot User Experience on the x-axis and Platforms on the y-axis: lower-right favors native, upper-left favors hybrid. Left of the line is better for hybrid because of hardware acceleration and browser improvements.

PhoneGap Perf Tips:
  1. Keep small footprint by limiting libs
  2. Use touch events, not click events
  3. Use hardware accelerated CSS transitions
  4. Avoid network access and cache locally.

zepto.js is a lightweight version of jquery that doesn’t worry about legacy browsers

Phonegap API explorer is a reference application to look at from the AppStore
Check out PhoneGap Build to target up to 6 platforms without having to load Xcode, Eclipse, etc. to build all the individual platforms by hand.

QConSF 2012: Real Time MVC with Geddy

Daniel Erickson from Yammer presents on mixing MVC with real-time apps.

MVC is here to stay because…

  • It provides structure to a project
  • Allows people to easily jump into a project, and multiple people at that
  • Easier to get started with

Realtime is great for:

  • Getting instant feedback from your users
  • Works well on mobile (Voxer)
  • WebSockets over long/short polling

How can MVC and Real Time be mixed?

Enter Geddy…if you know Rails, NoSQL/SQL ORM layers and templating languages, then Geddy is a natural progression. It has Postgres, Riak, and MongoDB adapters plus EJS and Jade templating languages, and supports real time out of the box.

Geddy gives you a rich ORM/MVC framework that you can readily work into your Node.js applications. I’ll have to take a look when I get back!

QConSF 2012: Graph Database Overview from Emil Eifrem

@emileifrem gives a great introduction to NoSQL databases, especially graph databases. He also gives a quick overview of use cases for graph databases and highlights some of the benefits of using Neo4j for them. Here are my notes from this session:

Trends in BigData & NoSQL:

  • Increasing data size
  • Increasing connected data
  • Semi-structured data
  • Architecture – a facade over multiple services

Categories of NoSQL

Key/Value store (heritage from Amazon Dynamo) –

  • Riak, Redis, and Voldemort are implementation examples
  • Strength is its simplicity, which is also its weakness

Columnar Stores

  • BigTable heritage – every row can potentially have its own columns
  • Examples are HBase, Cassandra, and HyperTable
  • Supports semi-structured data, but nested or connected data is a weakness

Document Database

  • Collections of documents, JSON that could potentially be nested
  • 90% uptick on NoSQL in MongoDB but also CouchDB
  • Strength is simplicity of database, but hard to do connected data

Graph Database

  • Nodes and Relationships
  • Examples are Neo4j, InfiniteGraph, OrientDB, etc.
  • Great for connectedness and complexity of data, but harder to scale to size

Graph databases can provide statistical data on how likely a node is related to other nodes.  Relationships are first class citizens in graph databases to give color to how nodes are related.  Nodes and relationships have simple key/value pair types.  Indexes are also available.  Types are being considered for post production control of data.

A speed comparison for the social-graph example in the presentation shows MySQL at 2000ms and Neo4j at 2ms. Going from 2k to 1 million people, checking whether they’re connected still takes 2ms. In SQL, the JOIN clauses explode due to the combinatorial issue. Neo4j can visit about 1-2 million nodes per second.

Graph queries:

Cypher gives you higher abstraction and ease of use, but slower performance. It uses graph patterns to define nodes and relationships, represented like:

START A=node:person(name="A")

MATCH (A) – [:LOVES] -> (B)

RETURN B as lover

It is a pattern-matching query language with a declarative grammar, aggregation, ordering, and limits, returning tabular results.

Neo4j is fully ACID compliant, as opposed to the eventual consistency of most other NoSQL systems.

QConSF 2012: MongoDB to Scale

Kenny Gorman presents on the gotchas with MongoDB and how to scale it appropriately. 

Scale = very big, busy, tricky system

Denormalize till it works to get to true scale; split vertically and then horizontally

Scaling today is focused on rapid development

MongoDB is an OPS nightmare

Built-in horizontal scaling and replication; 65% of deployments are in the cloud

MongoDB challenges:

  • Coarse lock scope; 2.2 has DB-level lock scope; 2.0 was global
  • Visibility
  • Schema – you still need to model your business data
  • When bad things happen…it goes down fast

Design for scale:

  • Denormalize into diff. DB’s
  • Tune your workloads
  • NoORM; dump it
  • Shard early (Think about your shard key in advance)
  • Replicate (How to get availability)
  • Load test pre-prod

Other thoughts:

  • Embed vs. not
  • Indexing the right amount (btree index 10% drop in performance for every index added; covered indexes are cool)
  • Atomic ops; use them! (set, pop, push)
  • Use profiler and explain(); you can mine the profiler collection for optimizing

Shard Keys:

  • Tradeoffs – uses range-based shard key
  • Local vs. Scattered
  • Figure this out at design time

Architecture:

  • Engage all processors
  • Replication is election based and has fault zones so you can plan; “shell game” – rebuild slaves due to fragmentation
  • Client connections, getLastError

I/O

  • Need fast response time from disk when passivating for large DBs

Perform work on slave and rotate that back in for the primary to do the “shell game” 

Add an arbiter to replica sets; it prevents the replica set from going down because a lone surviving member cannot vote itself primary

Random partitioning tips:

  • Monitor elections and who is the primary
  • Write scripts to kill sessions
  • automate or die
  • mongostat (like iostat)
  • historical performance and serverstatus to get histogram of perf. (or just use MMS)

Gotchas:

  • Logical schema corruption – use docs to manage it across teams especially types
  • Locks are per-DB, so split DBs if required; 50% lock percentage is considered slow
  • Not enough I/O perf
  • Engage all processors (use test harness and optimize for scaling)
  • Visibility (save data over time; monitoring; MMS, etc.)
  • Not understanding how MongoDB works; use it when it makes sense
  • Don’t believe the FUD

@kennygorman; kgorman@objectrocket.com 

rocketstat is a good replacement for mongostat

Mobile GeoLocation App in 30 minutes – Part 3: Deploying to OpenShift

Overview

In the previous two parts, you saw how to create a Node.js application and store JSON data in MongoDB (Part 1). You also saw how to present this data in a mobile Web application using Sencha Touch, complete with Google Maps and geospatial searching (Part 2). Now, we are going to take both of these pieces and deploy them on a Platform as a Service (PaaS). I looked at three different providers in this space: OpenShift, Heroku and Nodejitsu. The latter is Node.js specific, while the other two can also host other types of applications (Java EE, Ruby, PHP, Python, etc.). I really liked RedHat’s OpenShift model. It has several nice dev features that I thought worked well, which I will go over in this post. Nodejitsu worked well too but didn’t use Git (a distributed SCM tool) to manage changes. Heroku required a credit card at the point I wanted to introduce MongoDB, so I opted not to continue with it. It doesn’t hurt to experiment with multiple vendors since you can sign up for free accounts on the respective sites.

One of the main motivations for using a PaaS is that SSH’ing to a server and running your node process by hand every time you want to launch your app is neither maintainable nor production-worthy. Moreover, even if you do run your process that way, it will eventually shut down when you close your SSH session. There is an npm package, forever, that can keep this from happening, but I think running Node on a PaaS is ideal.
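For completeness, the forever approach looks roughly like this (the server.js file name is an assumption; use whatever your entry point is called):

% npm install -g forever
% forever start server.js
% forever list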

Using OpenShift to create a Node App

The first thing you’ll want to do is create a free OpenShift account here. Once you have an account, you can create the application directly from the website (very convenient) or you can install Ruby and the RHC gem (sudo gem install rhc), which lets you do the same thing from the command line. The website is pretty intuitive so I won’t cover it, and should you want to use the CLI, the set-up is sufficiently covered here. Note that there are a few prerequisites for the CLI, namely Ruby and msysgit. Be sure to have those installed as per the provided link before continuing.

For the purposes of deploying the app we created in Part 1, you would do the following after you’ve established your OpenShift domain and added your SSH public keys to commit code from Git:

First, we create “dogtag” as a nodejs app:

$ rhc app create -a dogtag -t nodejs-0.6

Second, we add the MongoDB cartridge:

$ rhc app cartridge add -a dogtag -c mongodb-2.0

Last, we add the RockMongo cartridge to administer MongoDB from the browser (context URI: /rockmongo from your domain)

$ rhc app cartridge add -a dogtag -c rockmongo-1.1

Working with your application in OpenShift

After you have created the app, the next step is to copy the DogTag Node.js code into the “dogtag” directory that OpenShift created. Since Git is used to manage changes, you can run the following commands to push the code to the server:

$ git add .

$ git commit -am 'Initial check-in'

$ git push

What will happen then is that the code changes are pushed to the server, the node.js instance is stopped, your code is deployed, and the server is restarted…very cool! It will pick up any NPM packages you need as long as you define them in the package.json file (in our case, we need express and mongoose). After that you can go to the URL OpenShift provides and test the various routes we defined in Part 1 (e.g., http://nodetest-loutilities.rhcloud.com/dogtag). You’ll notice that there is no data, so I created a JSON file that you can import into MongoDB for some initial data. To do the import, run the following command when SSH’d into your server’s node:

$ mongoimport --host $OPENSHIFT_NOSQL_DB_HOST --port $OPENSHIFT_NOSQL_DB_PORT --username $OPENSHIFT_NOSQL_DB_USERNAME --password $OPENSHIFT_NOSQL_DB_PASSWORD --db dogtag --collection dogtags --file dogs.json --jsonArray

Note the provided $OPENSHIFT_* environment variables, which save you from having to figure out the actual host, port, username and password of the node you are on; this is especially helpful in a cloud environment running on multiple nodes. Once you import, you can check the documents and add a geospatial index, which is required for the geospatial queries, as follows:

$ mongo
> use dogtag
> db.dogtags.find() //will return results
> db.dogtags.find().forEach( function (e) { e.coords = [ e.longitude, e.latitude ]; db.dogtags.save(e); }); //will add the required format for a coordinates field to index
> db.dogtags.ensureIndex( { coords : "2d" } ) //indexes the coords field for geospatial queries
> exit

Now, if you go to the URL again you will have JSON data returned in a JSONP callback. You can change the callback simply by adding &callback=[your callback name]. This works around the browser’s same-origin restrictions on cross-domain requests and is required when using the JSONP proxy in Sencha Touch.

Deploying the Mobile App

Since the Mobile App was created with Sencha Touch, I just used a “DIY” (Do it yourself) app from OpenShift to deploy my app there. The process is very similar:

$ rhc app create -a dogtags -t diy-0.1

This creates a very basic Web app. You can cd to the “diy” directory that is created and copy the code from the Mobile app there. It will prompt you to replace index.html, which is fine. There is one thing you need to change in the existing code: update the store’s proxy to point to the “dogtag” node.js app you created in the previous section, as well as the PetTracker controller, which also has these references. To do that, simply take the URL OpenShift provides and append “dogtag” to the end to get the list, as follows:

 proxy: {  
       type: 'jsonp',  
       url: 'http://nodetest-loutilities.rhcloud.com/dogtag',  
       reader: {  
           type: 'json',  
           idProperty: '_id',  
           rootProperty: 'records',  
           useSimpleAccessors: true  
       }  
}  

Lastly, use git to push the updates to the server. It will deploy the new code and restart the server. Launch your app from a Webkit browser or from your mobile device and you should be in business similar to what you see here.

Once the app is deployed to OpenShift, you can tail the logs by running the following:

$ rhc app tail -a <app-name>

You can also start and stop the server as follows:

$ rhc app [start | stop ] -a <app-name>

And to snapshot a particular revision and go back to it you run the following:

$ rhc app snapshot save -a <app-name>

$ rhc app restore -a <app-name> -f <path to snapshot file>

Conclusion

I hope you found this three part series useful. In addition to creating a Node.js app and front-ending it with a Mobile application, you saw how easy it is to deploy such an application to the cloud.

Mobile GeoLocation App in 30 minutes – Part 2: Sencha Touch

Overview

This is the second part of a three part series to show how we can use Sencha Touch (a mobile web JavaScript framework) to create the GUI to integrate with our Node.js REST API that we built in Part 1. As you may recall from the previous post, I’m tracking dogs and storing their coordinates in a MongoDB. Now we will create a mobile app that will list the dogs and when selected will show them on a Google Map relative to your current location. Let’s get started.

Connecting to Node.js using MVC

Connecting to Node, or really any REST/JSONP service, is fairly trivial. Since Sencha Touch supports MVC on the client side, I will illustrate how to create the application using that approach. This is very useful because it honors one of the main programming tenets: separation of concerns. In this way we can write concise JavaScript that breaks the application up into model, view and controller components. The first step is to create the model as follows:

Ext.define('MyApp.model.Pet', {
    extend: 'Ext.data.Model',
    config: {
        fields: [
            {
                name: 'name'
            },
            {
                name: 'description'
            },
            {
                name: 'longitude',
                type: 'float'
            },
            {
                name: 'latitude',
                type: 'float'
            },
            {
                name: 'coords'
            },
            {
                name: 'date',
                type: 'date'
            },
            {
                name: '_id'
            }
        ]
    }
});

Notice above that I’m defining a “Pet” model by extending the Ext.data.Model class and assigning the fields to the model. Additionally, I set types as needed so they aren’t considered strings. Next, we create the store that will populate the model:

Ext.define('MyApp.store.PetTracker', {
    extend: 'Ext.data.Store',
    requires: [
        'MyApp.model.Pet'
    ],
    config: {
        autoLoad: true,
        model: 'MyApp.model.Pet',
        storeId: 'PetTracker',
        proxy: {
            type: 'jsonp',
            url: 'http://localhost:8080/dogtag',
            reader: {
                type: 'json',
                idProperty: '_id',
                rootProperty: 'records',
                useSimpleAccessors: true
            }
        }
    }
});

The PetTracker store uses a JSONP proxy and binds the output of the service to the model we previously created. Notice that I simply point the proxy’s URL at the Node.js route that lists all dog tags and use a JSON reader to put the results into the model.

Binding the Model with the View and Controller

Now that I have the data, where do I put it in the GUI? Sencha Touch provides several mobile GUI widgets that look very much like native mobile apps. In my case, I decided to use the panel with a docked toolbar at the top. Let’s take a look:

Ext.define('MyApp.view.PetPanel', {
    extend:'Ext.Panel',
    alias: 'widget.petListPanel',
    config:{
        layout:{
            type:'fit'
        },
        items:[
            {
                xtype:'toolbar',
                docked:'top',
                title:'Pettracker'
            },
            {
                xtype:'list',
                store:'PetTracker',
                id:'PetList',
                itemId:'petList',
                emptyText: "<div>No Dogs Found</div>",
                loadingText: "Loading Pets",
                itemTpl:[
                    '<div>{name} is a {description} and is located at {latitude} (latitude) and {longitude} (longitude)</div>'
                ]
            }
        ],
        listeners:[
            {
                fn:'onPetsListItemTap',
                event:'itemtap',
                delegate:'#PetList'
            }
        ]
    },
    onPetsListItemTap:function (dataview, index, target, record, e, options) {
        this.fireEvent('petSelectCommand', this, record);
    }
});

In the example above, I create a panel with fit layout (the single child fills the panel) and put a toolbar and a list component in it. To get the data into the list, all I have to do is set the list’s store config to the name of the store I created earlier. The itemTpl uses template expressions to render the store data in each list row (look Ma, no FOR loops!). Finally, we need to create the Controller to respond to events; in the view code, onPetsListItemTap fires the ‘petSelectCommand’ event when a list item is tapped. Let’s look at the controller code for this.

Ext.define('MyApp.controller.PetTracker', {
    extend: 'Ext.app.Controller',
    markers: [],
    directionsDisplay: null,
    directionsService: null,
    config: {
        stores: ['PetTracker'],
        refs: {
            petListPanel: 'petListPanel',
            petList: '#PetList',
            petMap: 'petMap',
            radiusPicker: 'radiusPicker'
        },
        control: {
            petListPanel: {
                petSelectCommand: "onPetSelected"
            },
            petMap: {
                backButton: "onBackButton",
                mapRender: "onMapRender",
                nearButton: "onNear"
            },
            radiusPicker: {
                pickerChanged: "onPickerRadiusChange"
            }
        }
    },

    launch: function () {
        // Initialize Google Map Services
        this.directionsDisplay = new google.maps.DirectionsRenderer();
        this.directionsService = new google.maps.DirectionsService();

        var mapRendererOptions = {
            //draggable: true,  //Allows to drag route
            //hideRouteList: true,
            suppressMarkers: true
        };

        this.directionsDisplay.setOptions(mapRendererOptions);
    },

    // Transitions
    slideLeftTransition: { type: 'slide', direction: 'left' },
    slideRightTransition: { type: 'slide', direction: 'right' },

    onPetSelected: function (list, record) {
        var mapView = this.getPetMap();
        mapView.setRecord(record);
        Ext.Viewport.animateActiveItem(mapView, this.slideLeftTransition);


        this.renderMap(mapView, mapView.down("#petMap").getMap(), record.data);
    },

    onBackButton: function () {
        var store = Ext.getStore('PetTracker');
        store.getProxy().setUrl('http://nodetest-loutilities.rhcloud.com/dogtag/');
        store.load();
        Ext.Viewport.animateActiveItem(this.getPetListPanel(), this.slideRightTransition);
    },

    renderMap: function (extmap, map, record) {
        // erase old markers
        if (this.markers.length > 0) {
            Ext.each(this.markers, function (marker) {
                marker.setMap(null);
            });
        } 
        var position = new google.maps.LatLng(record.latitude, record.longitude);

        var dynaMarker = new google.maps.Marker({
            position: position,
            title: record.name + "'s Location",
            map: map,
            icon: 'resources/img/yellow_MarkerB.png'
        });

        this.markers.push(dynaMarker);
    }
});

Covering everything inside this controller is beyond the scope of this article, and besides, there are already really good articles on the subject such as the one here. Picking up from the event we fired in the view, the control config binds the onPetSelected() method to the petSelectCommand event. In that method’s implementation you can see where I do the slide transition to the map view panel and then render the map in that panel. Finally, there’s this small piece of code to bootstrap the application and launch it.

Ext.Loader.setConfig({
    enabled: true
});


Ext.application({
    models: [
        'Pet'
    ],
    stores: [
        'PetTracker'
    ],
    views: [
        'PetPanel',
        'MapPanel',
        'RadiusPicker'
    ],
    name: 'MyApp',
    controllers: [
        'PetTracker'
    ],
    launch: function() {
        var petList = {
            xtype: 'petListPanel'
        };
        var petMap = {
            xtype: 'petMap'
        };
        var radiusPicker = {
            xtype: 'radiusPicker'
        };
        Ext.Viewport.add([petList, petMap, radiusPicker]);
    }
});

Conclusion

Well there you have it: how to create a mobile web application integrated with Node.js and Google Maps. You can see this application in action by pointing your mobile device to this site, http://dogtags-loutilities.rhcloud.com/, or internally here. I hope you see the value in implementing client side MVC so that your code is concise and maintainable, especially if multiple developers were to work on it. You can view the complete code base here. Look for the final part of this series where I will show you how easy it is to deploy this application to the cloud using a PaaS like OpenShift, Heroku or Nodejitsu.

Mobile GeoLocation App in 30 minutes – Part 1: Node.js and MongoDB

Overview

Are you interested in learning how to use Node.js with a NoSQL database like MongoDB to create REST services? Would you like to know how to create a mobile app with Google Maps that can perform geospatial queries with ease? Do you want to see how you can deploy all this to a PaaS with the click of a few buttons and a rudimentary understanding of Git? The answers to these questions and more are covered in my three-part series, where I detail how to create a “Pettracker”-like application using Node.js, MongoDB, and Sencha Touch hosted on OpenShift. In this first part I will discuss how to build the middle tier and data store with Node.js and MongoDB.

What is Node.js and MongoDB?

Node.js is a JavaScript runtime (an API over a C library) for creating highly scalable, non-blocking I/O applications; it does not spawn a thread or process per request, so it incurs far less O/S overhead under load. This is accomplished with an event-driven programming model: an I/O-bound call is started but does not block, and execution continues to the next line of code. A callback concludes the operation with the response once the event completes. JavaScript is well suited to this because of its native support for closures. Node by itself isn’t that interesting, but, akin to Maven in the Java world, Node has npm so that you can add modules and extend its functionality to all sorts of use cases. In this article you’ll see how I leverage npm for working with MongoDB and for creating a Web server and router. Finally, Node together with Socket.IO can be used to create real-time Web applications (more on this in the future).
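As a trivial illustration of that callback style (not specific to this app), a file read kicks off the I/O and returns immediately, with the callback invoked once the data is ready:

var fs = require('fs');

fs.readFile('/tmp/example.txt', 'utf8', function (err, data) {
    // invoked later, when the read completes (or fails)
    if (err) return console.error(err);
    console.log(data);
});

console.log('This line prints before the file contents.');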

MongoDB is a highly scalable, high performance, open source NoSQL database that stores documents (rows in SQL parlance) in collections (tables in SQL) and is tuned for performance using in-memory caching, replica sets and sharding. It has a very nice query API and supports map/reduce functionality like that found in Hadoop/HBase. One of the peculiar things about NoSQL databases is that no schema is defined ahead of time, so other than installing MongoDB there isn’t much else you need to do. That doesn’t mean our data has no structure; NoSQL simply allows very flexible schemas that can change easily as your application’s needs change. Finally, MongoDB has built-in geospatial querying to support “near” and polygon searches. We’ll explore this in detail.

The DogTag Application

As you may be aware, Pettracker (aka Tagg) is a Qualcomm device/application that can be used to track lost pets. I work for Qualcomm but didn’t work on that project; however, I thought it would be interesting to see how I could replicate the application using Node.js and MongoDB, since it is a perfect use case for something requiring high throughput and geospatial searching. Node.js provides the middleware both to consume the JSON data from the emitters (i.e., the dog tag transponders) and to expose the REST interface so that the mobile device can locate the dogs or perform geospatial queries for what’s near the user.

The first step in all this is to create a Node.js application. I used the Express and Mongoose Node modules to create the website and to interface with MongoDB. You can add these modules to your node install using npm, or declare them in your package.json; I’ll talk more about this in Part 3. Also, you can download the code from my GitHub account here.
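For reference, a minimal package.json declaring those two dependencies might look like the following; the name, version ranges and start script are illustrative, not taken from the actual project:

{
  "name": "dogtag",
  "version": "0.0.1",
  "dependencies": {
    "express": "3.x",
    "mongoose": "3.x"
  },
  "scripts": {
    "start": "node server.js"
  }
}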

Let’s take a look at the model.js code:

var mongoose = require('mongoose')
  , Schema = mongoose.Schema;

var dtSchema = new Schema({
    name:String,
    description:String,
    date: {type: Date, default: Date.now},
    longitude: Number,
    latitude: Number,
    coords: [Number, Number]
});

module.exports = mongoose.model('DogTag', dtSchema);

Mongoose is a nice Node module that allows me to work with MongoDB data in JavaScript. In the code above you can see that I am defining the “DogTag” schema (a document in Mongo) and setting the fields that should be stored. Mongoose takes care of marshalling/unmarshalling the JSON data to and from MongoDB. Next, I need to define the API for the operations I wish to perform, shown as follows:

var DogTag = require('../model/dogtag.js');

exports.save = function(req, res) {
    var dogtag = new DogTag({name: req.params.name, description: req.params.descr,
        longitude: req.params.longitude, latitude: req.params.latitude});
    dogtag.save(function (err) {
        if (err) throw err;
        console.log('Dogtag saved.');

        res.send('Dogtag saved.');
    });
}

exports.list = function(req, res) {
    DogTag.find(function(err, dogtag) {
          res.setHeader('Content-Type', 'text/javascript;charset=UTF-8');
        res.send(req.query["callback"] + '({"records":' +  JSON.stringify(dogtag) + '});');
    });
}

// locates a dog by its name
exports.show = (function(req, res) {
    DogTag.findOne({name: req.params.name}, function(error, dogtag) {
        res.send([{Dog: dogtag}]);
    })
});

exports.near = function(req, res) {
    DogTag.find({coords : { $near : [req.params.lon, req.params.lat], $maxDistance : req.params.dist/68.91}}, 
      function (error, dogtag) {    
        res.setHeader('Content-Type', 'text/javascript;charset=UTF-8');
        res.send(req.query["callback"] +'({"records":' + JSON.stringify(dogtag) + '});');
    })
}

This code provides the CRUD operations to work with MongoDB. I use the schema I created previously to create dogtag objects, and their save method persists them into MongoDB. It’s that simple! If I need to find all the dogs, I use find(); to find one dog by name, I use findOne() in the show function. The last API, “near”, does a geospatial query using $near and $maxDistance against my “coords” field in the database (the distance in miles is divided by roughly 69, the approximate number of miles per degree, since $maxDistance expects degrees). One thing to note is that you have to tell MongoDB to index fields that contain lat/lon. For details on this, see here, but all you really do is run this command from a mongo prompt: db.[collection_name].ensureIndex({ [field_name]: "2d" }).

One more thing to note is that I need to return JSONP back to the client by providing a callback name. To do that, I need the client to pass me a query parameter (called “callback”) whose value becomes the name of the callback function in the JSONP that is returned. I also call the built-in JavaScript function JSON.stringify to convert the schema objects to JSON. You can see this put to use in the list and near functions above.
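To make that concrete, a request such as /dogtag?callback=handleDogs would produce a response along these lines (the document values are just for illustration):

handleDogs({"records":[{"_id":"5099803df3f42312","name":"Rex","description":"Golden Retriever","longitude":-117.15,"latitude":32.72,"coords":[-117.15,32.72]}]});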

The last piece is to build the node server. Yes, that’s right: you’re actually going to create the HTTP server that responds to the requests. Let’s take a look at how:

var express = require('express');
var mongoose = require('mongoose');
var app = express();

var ipaddr  = '127.0.0.1';
var port    =  8081;

mongoose.connect('mongodb://localhost/dogtag');

app.configure(function () {
    app.use(express.bodyParser());
    app.use(express.methodOverride());
    app.use(app.router);
});

// set up the REST API handler methods are defined in api.js
var api = require('./controller/api.js');
app.post('/dogtag', api.post);
app.get('/dogtag/near/:lon/:lat/:dist?', api.near);
app.get('/dogtag/:name/:descr/:latitude/:longitude?', api.save);
app.get('/dogtag/:name.:format?', api.show);
app.get('/dogtag', api.list);

//  And start the app on that interface (and port).
app.listen(port, ipaddr, function() {
   console.log('%s: Dogtag server started on %s:%d ...', Date(Date.now() ),
               ipaddr, port);
});

As described earlier, I need to tell node that I’m going to use express and mongoose in my application. Calling express() creates the app that serves HTTP, and mongoose.connect() connects to the database. The app.get and app.post calls configure URL routing to the controller APIs from the previous example; note how colons express parameters in the URL that are then bound to a given API. Finally, I tell the server to listen on a certain port and host and to spit out a start message if everything started successfully.

So that’s really all there is to creating the necessary middleware to support our DogTag application. It takes in JSON data from the emitters to store in MongoDB and then uses the Express framework to create a router that responds to requests for finding dogs, whether by name or through geospatial searching. In the next part I’ll explore how we can use Sencha Touch to create a mobile application that helps a user locate their dog. For a little preview of that application, you can go here to experiment with it.

Use your Maven build to auto deploy to WebLogic 10.3

Ever wonder how you can deploy to WebLogic Server using Maven? In WebLogic 10.3 you can use wldeploy (an Ant task) to do this from Maven using the antrun plugin. The XML is provided below; what I did is wrap it in a profile called “wlsDeploy”. On our CI build, I simply added -PwlsDeploy to the command-line set-up so that it would deploy to WebLogic.

In the code below, you simply need to have the WebLogic JAR available on the build server, update the path property variable, and replace the other variables with properties in your Maven build or with -D parameters on the command line, especially for things like passwords.

Once complete, your builds will auto deploy to WebLogic based on your build schedule, assuming you’re using a CI build server for continuous integration. Otherwise, you can run this locally as well.

<plugins>
    <plugin>
        <artifactId>maven-antrun-plugin</artifactId>
        <executions>
            <execution>
                <id>local-deploy</id>
                <phase>package</phase>
                <goals>
                    <goal>run</goal>
                </goals>
                <configuration>
                    <tasks>
                        <taskdef name="wldeploy" classname="weblogic.ant.taskdefs.management.WLDeploy">
                            <classpath>
                                <pathelement path="${path.to.weblogic.jar.local}"/>
                            </classpath>
                        </taskdef>
                        <wldeploy action="stop"
                                  name="${deploy.name}"
                                  failonerror="false"
                                  user="${wls.dev.username}"
                                  password="${wls.dev.password}"
                                  verbose="true"
                                  adminurl="http://${wls.dev.hostname}:${wls.dev.port}"
                                  targets="${wls.deploy.target}"/>
                        <wldeploy action="undeploy"
                                  name="${deploy.name}"
                                  failonerror="false"
                                  user="${wls.dev.username}"
                                  password="${wls.dev.password}"
                                  verbose="true"
                                  usenonexclusivelock="true"
                                  adminurl="http://${wls.dev.hostname}:${wls.dev.port}"
                                  targets="${wls.deploy.target}"/>
                        <wldeploy action="deploy"
                                  name="${deploy.name}"
                                  source="${deploy.source}"
                                  remote="true"
                                  upload="true"
                                  user="${wls.dev.username}"
                                  password="${wls.dev.password}"
                                  verbose="true"
                                  usenonexclusivelock="true"
                                  adminurl="http://${wls.dev.hostname}:${wls.dev.port}"
                                  targets="${wls.deploy.target}"/>
                    </tasks>
                </configuration>
            </execution>
        </executions>
    </plugin>
</plugins>
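With the profile wired into the POM, a local deployment is a single command; the property values shown are placeholders for your own environment:

mvn clean package -PwlsDeploy -Dwls.dev.username=weblogic -Dwls.dev.password=yourPassword

On the CI server, the same -PwlsDeploy flag added to the scheduled build command produces the automatic deployment described above.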