Use node.js LTS for Azure Functions

I did a proof-of-concept for an Azure Function in node.js that uses Redis Cache for session storage, as the function runtime is 100% managed and using a memory store doesn't make sense. Once I had this working I wanted to play with the local development support, but that didn't work as I was using node.js v. 13 and the Azure tooling only works with an LTS version of node.js (meaning 12.x at the time of this writing). To fix it I had to uninstall my current version of node.js and switch to v. 12.x as follows (I'm using Homebrew to manage dependencies):

brew uninstall node
brew install node@12

Then I had to update my PATH as the node binary is not in /usr/local/bin anymore but rather in /usr/local/opt/node@12/bin. Once that was done the Azure tooling for local application development worked like a charm.
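
In practice that meant adding a line like the following to my shell profile (the exact path can differ depending on your Homebrew setup):

export PATH="/usr/local/opt/node@12/bin:$PATH"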

The proof-of-concept is at https://github.com/lekkimworld/poc-azure-functions-with-session.

Populating the user object with passport.js and Salesforce OAuth

passport.js is a great option for doing authentication in node.js applications, with strategies for authenticating through just about anything on the planet including Salesforce. Using passport.js with Salesforce involves the generic OAuth2Strategy, but out of the box the user object placed in the session is not really usable, as I want actual information about the user to be there. The solution I came up with was overriding the userProfile method and adding a call to the Salesforce userinfo endpoint as shown below.

// configure authentication using oauth2 (assumes an existing Express app in "app" and
// that the "passport" and "passport-oauth2" modules are installed)
const passport = require("passport");
const OAuth2Strategy = require("passport-oauth2");
const InternalOAuthError = OAuth2Strategy.InternalOAuthError;

app.use(passport.initialize());
passport.serializeUser(function(user, done) {
    done(null, user.username);
});
passport.deserializeUser(function(login, done) {
    done(undefined, {
        "username": login
    });
});

OAuth2Strategy.prototype.userProfile = function(accessToken, done) {
    this._oauth2.get(`https://${process.env.SF_LOGIN_URL || "login.salesforce.com"}/services/oauth2/userinfo`, accessToken, function (err, body, res) {
        if (err) { return done(new InternalOAuthError('Failed to fetch user profile', err)); }
        try {
            let json = JSON.parse(body);
            let profile = {
                "provider": "Salesforce.com",
                "username": json.preferred_username,
                "name": json.name,
                "email": json.email,
                "firstname": json.given_name,
                "lastname": json.family_name,
                "payload": json
            };
            
            done(null, profile);
        } catch(e) {
            done(e);
        }
    });
}

passport.use(new OAuth2Strategy({
        authorizationURL: `https://${process.env.SF_LOGIN_URL || "login.salesforce.com"}/services/oauth2/authorize`,
        tokenURL: `https://${process.env.SF_LOGIN_URL || "login.salesforce.com"}/services/oauth2/token`,
        clientID: process.env.SF_CLIENT_ID,
        clientSecret: process.env.SF_CLIENT_SECRET,
        callbackURL: process.env.SF_CALLBACK_URL
    },
    function(accessToken, refreshToken, profile, cb) {
        cb(undefined, profile);
    }
));

The interesting piece is really the userProfile override, where I inject a call to /services/oauth2/userinfo to get information about the user and then add that as the user object.

Of course, after having done all this I found passport-salesforce, which is a strategy that does exactly the same thing – duh!!! Anyway, it was fun to code it up.
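
For completeness, here is a minimal sketch of how the strategy might be wired into Express routes. The route paths are my own choice and not from the original code, and session support (for example express-session plus passport.session()) is assumed if you want the serialize/deserialize functions above to come into play:

// start the OAuth dance with Salesforce
app.get("/login", passport.authenticate("oauth2"));

// callback URL configured on the connected app (must match SF_CALLBACK_URL)
app.get("/oauth/callback",
    passport.authenticate("oauth2", { "failureRedirect": "/login" }),
    (req, res) => {
        // req.user now holds the profile built in the userProfile override above
        res.redirect("/");
    });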

kafka-node with Heroku Kafka

The other day I was building a demo for a customer around Heroku and Heroku Kafka. My language of choice these days is node.js, so I needed a Kafka library for node.js and settled on kafka-node. Then I needed to figure out how to use the library with the environment variables provided by Heroku to access my Kafka cluster.

After having bound the Kafka add-on to the app I had 5 environment variables added to my app. The variables are well explained in the Heroku DevCenter, but figuring out how to use them with kafka-node had me spend some time. The root cause seemed to be that the name in the certificate presented by the Kafka brokers did not match the name in the certificate provided to me by Heroku. Since the library uses the node.js TLS module under the covers, the solution was to implement some of the verification logic myself using the checkServerIdentity method. In the method I compare the cryptographic fingerprint of the trusted root certificate provided by Heroku (KAFKA_TRUSTED_CERT) with that of the issuing certificate of the Kafka broker. I also had to remove the "kafka+ssl://" part from the Kafka broker URLs, which I do using a regular expression.

Below is the code. YMMV.

const kafka = require("kafka-node");
const x509 = require('x509');
const Client = kafka.KafkaClient;

const kafkaHosts = process.env.KAFKA_URL.replace(/kafka\+ssl:\/\//gi, "");
const kafkaCert = x509.parseCert(process.env.KAFKA_TRUSTED_CERT);

const options = {
  "key": process.env.KAFKA_CLIENT_CERT_KEY,
  "cert": process.env.KAFKA_CLIENT_CERT,
  "ca": [process.env.KAFKA_TRUSTED_CERT],
  "checkServerIdentity": (host, cert) => {
    if (kafkaCert.fingerPrint === cert.issuerCertificate.fingerprint) return undefined;
    return Error('Not authentic')
  }
}

module.exports = {
    "client": () => {
        return new Client({
            "kafkaHost": kafkaHosts,
            "sslOptions": options
        });
    },
    "topic": `${process.env.KAFKA_PREFIX}safe-habour`
}
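
To use the module, something like the following works. This is just a minimal sketch using the standard kafka-node Producer API, and the module path ("./kafka.js") is an assumption on my part:

const kafka = require("kafka-node");
const config = require("./kafka.js");

const client = config.client();
const producer = new kafka.Producer(client);

producer.on("ready", () => {
    // send a single message to the prefixed topic
    producer.send([{ "topic": config.topic, "messages": ["Hello from Heroku Kafka"] }], (err, result) => {
        if (err) return console.error(err);
        console.log(result);
    });
});
producer.on("error", (err) => console.error(err));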

Using SalesforceDX to perform Bulk API operations

As noted the other day (Using SalesforceDX to automate getting Apex class test coverage percentages) SalesforceDX is great for many things, one of them being automating operations that are time consuming or just take a lot of manual work each time. One of these things is Bulk API operations, which in and of themselves are not hard, but there is no UI for them besides the DataLoader and no command line interface to the DataLoader when not on Windows.

The customer I'm currently working for has a monster data load to perform and one of the things I've done is write a script to split the data into data sets – one set per country, 91 sets all in all. Each set consists of 3 files to support the data load: one file for Accounts and two additional files for custom objects that need to be loaded as well. That's a lot of clicking in the DataLoader, and it doesn't really scale when testing.

But I'm lucky, as SalesforceDX receives new functionality all the time and at some point some Bulk API data features had snuck past me, so I was pleasantly surprised to discover force:data:bulk:upsert and force:data:bulk:delete today. They were just what I needed. SalesforceDX to the rescue yet again…

So today I grabbed my IDE by the horns (#vscode in my case) and wrote some wrappers around the Bulk API capabilities of SalesforceDX. The fact that all SalesforceDX commands take an optional --json argument makes it easy to script and parse responses. Combining this with select-shell from npm I now have a nice CLI interface for doing Bulk data loads. The script looks at the available data sets and asks me what country to load data for and then what export timestamp to process (the data sets may exist in multiple versions). Then it goes and does its thing, UPSERTing all 3 files in turn and reporting status. So nice. The Bulk API is asynchronous so the script also handles polling for job status and only proceeds once the job has completed successfully. An example run is shown below.

$ ./upsert mheisterberg@example.com.appdev
SFDX - Org for mheisterberg@example.com.appdev is connected...

Select country code:
 ae
 au
 ca
 cn
 es
 fr
 hk
 hu
 co
 ▸ it
 jp
 kr
 my
 pt
 sg
 th
 tr
 tw
 us

Select timestamp:
 2018-04-16T07:25:37Z
 2018-04-16T08:31:28Z
 ▸ 2018-04-16T08:34:14Z

Will process following data
Country : it
Timestamp: 2018-04-16T08:34:14Z
UPSERT for Account data...
Issued UPSERT bulk request to object (Account) - id 7516E000002DckQQAS, jobId 7506E000002QQ3zQAG - state: Queued
SFDX - asking for bulk status for id 7516E000002DckQQAS, jobId 7506E000002QQ3zQAG
SFDX - received bulk status for id 7516E000002DckQQAS, jobId 7506E000002QQ3zQAG - state: Completed
Issued UPSERT bulk request to object (MarketRelation__c) - id 7516E000002DckkQAC, jobId 7506E000002QQ4JQAW - state: Queued
SFDX - asking for bulk status for id 7516E000002DckkQAC, jobId 7506E000002QQ4JQAW
SFDX - received bulk status for id 7516E000002DckkQAC, jobId 7506E000002QQ4JQAW - state: Completed
Issued UPSERT bulk request to object (Consent__c) - id 7516E000002DckuQAC, jobId 7506E000002QQ4OQAW - state: Queued
SFDX - asking for bulk status for id 7516E000002DckuQAC, jobId 7506E000002QQ4OQAW
SFDX - received bulk status for id 7516E000002DckuQAC, jobId 7506E000002QQ4OQAW - state: Completed
Finished upsert of data
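
A minimal sketch of how such a wrapper can shell out to SalesforceDX is shown below. The CSV file name and external ID field are hypothetical, and the exact shape of the --json responses may differ between CLI versions, so treat it as a starting point rather than the actual script:

const { execSync } = require("child_process");

// run an sfdx command with --json appended and return the result part of the response
// (assumes the sfdx CLI is installed and on the PATH)
const sfdx = (cmd) => JSON.parse(execSync(`sfdx ${cmd} --json`).toString()).result;

// org username/alias is passed as the first argument, like in the examples above
const username = process.argv[2];

// kick off the bulk upsert (CSV file and external ID field below are made up)
const batches = sfdx(`force:data:bulk:upsert -u ${username} -s Account -f ./data/account_it.csv -i ExternalId__c`);
const { jobId, id: batchId } = batches[0];
console.log(`Issued UPSERT bulk request - id ${batchId}, jobId ${jobId}`);

// poll for status until the batch completes (a real script would pause between polls)
let state = "Queued";
while (state !== "Completed" && state !== "Failed") {
    const status = sfdx(`force:data:bulk:status -u ${username} -i ${jobId} -b ${batchId}`);
    state = status.state;
    console.log(`SFDX - received bulk status for id ${batchId}, jobId ${jobId} - state: ${state}`);
}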

Once I'm done testing a particular data set I can use the --delete-accounts flag to my script to delete data using the Bulk API as well. Here I actually combined force:data:soql:query and force:data:bulk:delete to first retrieve the IDs of the records I need to delete and then kick off the required Bulk API delete requests. Again easy peasy. And repeatable…

$ ./upsert mheisterberg@example.com.appdev --delete-accounts
SFDX - Org for mheisterberg@example.com.appdev is connected...

Are you sure?
 ▸ No
 Yes

Received 32463 records
Issued DELETE bulk request to object (Account) - id 7516E000002DckLQAS, jobId 7506E000002QQ3uQAG - state: Queued
SFDX - asking for bulk status for id 7516E000002DckLQAS, jobId 7506E000002QQ3uQAG
SFDX - received bulk status for id 7516E000002DckLQAS, jobId 7506E000002QQ3uQAG - state: InProgress
SFDX - asking for bulk status for id 7516E000002DckLQAS, jobId 7506E000002QQ3uQAG
SFDX - received bulk status for id 7516E000002DckLQAS, jobId 7506E000002QQ3uQAG - state: Completed
Performed delete...

The only real issue I had here was that node.js caps captured child process output at around 200 kb by default, so I could not simply read the response from the SOQL query off stdout as it may be pretty big. Instead I pipe the output to a tmp file, read that back in and parse it as JSON. Not ideal but it gets the job done.
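
A sketch of that workaround might look like the following; the tmp file location and the SOQL query are just examples:

const { execSync } = require("child_process");
const fs = require("fs");
const os = require("os");
const path = require("path");

// redirect the (potentially large) JSON response to a tmp file instead of capturing stdout
const tmpfile = path.join(os.tmpdir(), "soql_result.json");
execSync(`sfdx force:data:soql:query -u ${process.argv[2]} -q "SELECT Id FROM Account" --json > ${tmpfile}`);

// read the file back in and parse it
const records = JSON.parse(fs.readFileSync(tmpfile).toString()).result.records;
console.log(`Received ${records.length} records`);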

The code of the script itself is for the customer's eyes only, but the source for the helpers is available as sfdx-bulk-helper on GitHub and sfdx-bulk-helper on npm.

YMMV!

Loving streams in node.js

Node.js is a great platform for writing scripts. Besides being Javascript and having access to npm, it lends itself very well to data processing as it's completely async unless you specifically tell it not to be. One of the best aspects of node.js as a data processing language, in my opinion, is the concept of streams and using streams to process data. Using streams can drastically lower memory consumption by processing data as it comes down the stream instead of keeping everything in memory at any one time. Think SAX instead of DOM parsing.

In node.js using streams is easy. Basically data flows from Readable streams to Writable streams – think producers of data and consumers of data. Buffering is handled automatically (at least in the built-in streams) and if a downstream consumer stops processing, the upstream producer will stop producing. Elegant and easy. Readable streams can be things like files or network sockets, and Writable streams things like files, network sockets… or stdout, which in node.js also implements the Writable stream API. Working with streams is like being a plumber, so piping (using the pipe method) is how you connect streams.

An example always helps – the below example reads from alphabet.txt and pipes the data to stdout.

const fs = require('fs')
const path = require('path')

fs.createReadStream(path.join(__dirname, 'alphabet.txt'))
  .pipe(process.stdout)
> a
> b
> c

A simple example, but it works with small and big files without much of a difference in memory consumption.

Sometimes processing is required and for this we use Transform streams (basically streams that can both read and write). Say we want to uppercase all characters. That's easy: pipe through a Transform stream and then on to the Writable stream (stdout):

const {Transform} = require('stream')
const fs = require('fs')
const path = require('path')

fs.createReadStream(path.join(__dirname, 'alphabet.txt'))
  .pipe(new Transform({
    transform(chunk, encoding, callback) {
      // chunk is a Buffer 
      let data = chunk.toString().toUpperCase() 
      callback(null, data) 
    }
  }))
  .pipe(process.stdout)
> A
> B
> C

It's easy to see how streams are very composable and adding processing steps is easy. The pipeline could even be determined at runtime. The above examples use strings, but streams can also work on objects if required, as sketched below.
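
The following is my own minimal sketch of object mode streams (not from the original examples), using Readable.from which is available in newer node.js versions:

const { Readable, Transform } = require('stream')

// a Readable producing objects instead of strings/Buffers
Readable.from([{ name: 'a' }, { name: 'b' }, { name: 'c' }])
  .pipe(new Transform({
    // objectMode lets the stream carry arbitrary objects
    objectMode: true,
    transform(obj, encoding, callback) {
      // turn each object into a line of text for stdout
      callback(null, `name is ${obj.name}\n`)
    }
  }))
  .pipe(process.stdout)
> name is a
> name is b
> name is c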

Streams are beautiful but can take some time to master. I highly recommend reading up on streams and start getting to know them. The “Node.js Streams: Everything you need to know” post is very nice and provides a good overview.

Happy coding!

JSONata looks very nice

While JSON is a very nice and concise data format, it lacks the structure and query capabilities that XML offers with XPath. Oftentimes querying JSON leads to line upon line of code to do proper error checking and retrieve the proper value and – if need be – a default value. Meet JSONata! JSONata is a query language plus so much more. I invite you to look at the slides from the recent IBM tech talk on the matter or visit the JSONata Exerciser to try it out.

JSONata is also available as an npm module.
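
A small sketch of what that looks like in node.js. The data and the expression are just made-up examples, and note that newer versions of the jsonata module return a promise from evaluate:

const jsonata = require("jsonata");

const data = {
    "order": {
        "lines": [
            { "product": "widget", "price": 5, "quantity": 2 },
            { "product": "gadget", "price": 10, "quantity": 1 }
        ]
    }
};

// total of price * quantity across all order lines
const expression = jsonata("$sum(order.lines.(price * quantity))");
const total = expression.evaluate(data); // newer versions return a promise here
console.log(total); // 20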

Writing command line scripts with node.js

Found this little tip this morning to make it easier to use command line scripts written in node.js. Instead of invoking your node.js file using "node myfile.js", on the Mac you can simply do the following:

  1. At the top of the file as the first line add: #!/usr/bin/env node
  2. Make the file executable using chmod +x myfile.js
  3. Invoke away

Now the script can be run by simply invoking ./myfile.js (or just myfile.js if the directory is on your PATH).
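
A minimal sketch of such a script (the file name and output are just placeholders):

#!/usr/bin/env node
// myfile.js, made executable with chmod +x myfile.js
console.log("Hello from a node.js command line script");
console.log("arguments:", process.argv.slice(2));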