Using SalesforceDX to perform Bulk API operations

As noted the other day (Using SalesforceDX to automate getting Apex class test coverage percentages) SalesforceDX is great for many things but one of the ways is automate some operations that are time consuming or just takes a lot of manual work each time. One of these things are Bulk API operations which in of by themselves are not hard but there is no UI for them besides the DataLoader and no console API when using the DataLoader when not on Windows.

The customer I’m working for currently has a monster data load to perform and one of the things I’ve done is writing  script to split the data into data sets. One set per country – 91 sets all in all. All sets consists of 3 files to support the data load. One file for Accounts and two additional files for custom objects that needs to be loaded as well. All in all it’s a lot of clicking in the DataLoader and it doesn’t really scale for testing. That’s a lot of clicking in DataLoader when testing.

But I’m lucky as SalesforceDX receives new functionality all the time and at some point some data Bulk API features had snuck by me so I was pleasantly surprised to discover force:data:bulk:upsert and force:data:bulk:delete today. They we just what I needed. SalesforceDX to the rescue yet again…

So today I grabbed by IDE by the horns (#vscode in my case) and wrote some wrappers around the Bulk API capabilities of SalesforceDX. The fact that all SalesforceDX commands takes an optional –json argument makes it easy to script and parse responses. This combined with select-shell from npm I now have a nice CLI interface to doing Bulk data loads. The script looks as the available data sets and asks me what country to load data for and then what export timestamp to process (the data sets may exists in multiple versions). Then it goes and does its thing UPSERTing all 3 times in turn and reports status. So nice. The Bulk API is asynchronous so  the script also handles polling for job status and only proceeds once the job has completed successfully.

$ ./upsert mheisterberg@example.com.appdev
SFDX - Org for mheisterberg@example.com.appdev is connected...

Select country code:
 ae
 au
 ca
 cn
 es
 fr
 hk
 hu
 co
 ▸ it
 jp
 kr
 my
 pt
 sg
 th
 tr
 tw
 us

Select timestamp:
 2018-04-16T07:25:37Z
 2018-04-16T08:31:28Z
 ▸ 2018-04-16T08:34:14Z

Will process following data
Country : it
Timestamp: 2018-04-16T08:34:14Z
UPSERT for Account data...
Issued UPSERT bulk request to object (Account) - id 7516E000002DckQQAS, jobId 7506E000002QQ3zQAG - state: Queued
SFDX - asking for bulk status for id 7516E000002DckQQAS, jobId 7506E000002QQ3zQAG
SFDX - received bulk status for id 7516E000002DckQQAS, jobId 7506E000002QQ3zQAG - state: Completed
Issued UPSERT bulk request to object (MarketRelation__c) - id 7516E000002DckkQAC, jobId 7506E000002QQ4JQAW - state: Queued
SFDX - asking for bulk status for id 7516E000002DckkQAC, jobId 7506E000002QQ4JQAW
SFDX - received bulk status for id 7516E000002DckkQAC, jobId 7506E000002QQ4JQAW - state: Completed
Issued UPSERT bulk request to object (Consent__c) - id 7516E000002DckuQAC, jobId 7506E000002QQ4OQAW - state: Queued
SFDX - asking for bulk status for id 7516E000002DckuQAC, jobId 7506E000002QQ4OQAW
SFDX - received bulk status for id 7516E000002DckuQAC, jobId 7506E000002QQ4OQAW - state: Completed
Finished upsert of data

Once I’m done testing a particular data set I can use the –delete-accounts flag to my script to delete data using the Bulk API as well. Here I actually combined force:data:soql:query and force:data:bulk:delete to first retrieve the ID’s of the records I need to delete and then kick off the required Bulk API delete requests. Again easy peasy. And repeatable…

$ ./upsert mheisterberg@example.com.appdev --delete-accounts
SFDX - Org for mheisterberg@example.com.appdev is connected...

Are you sure?
 ▸ No
 Yes

Received 32463 records
Issued DELETE bulk request to object (Account) - id 7516E000002DckLQAS, jobId 7506E000002QQ3uQAG - state: Queued
SFDX - asking for bulk status for id 7516E000002DckLQAS, jobId 7506E000002QQ3uQAG
SFDX - received bulk status for id 7516E000002DckLQAS, jobId 7506E000002QQ3uQAG - state: InProgress
SFDX - asking for bulk status for id 7516E000002DckLQAS, jobId 7506E000002QQ3uQAG
SFDX - received bulk status for id 7516E000002DckLQAS, jobId 7506E000002QQ3uQAG - state: Completed
Performed delete...

Only issue I had here really was that node.js has a maximum buffer size of 200kb to stdin so I could not simply read the stdin response from the SOQL query as it may be pretty big. Instead I pipe to a tmp-file and read that back in and parse as JSON. Not ideal but it gets the job done.

The code of the script itself is for the customers eyes only but the source for the helpers is available as sfdx-bulk-helper on Github and sfdx-bulk-helper on npm.

YMMV!

 

Loving streams in node.js

Node.js is a great platform for writing scripts. Besides being Javascript and besides having access to npm it lends it very well to data processing as it’s completely async unless you specifically tell it not to be. One of the best aspects in my opinion about node.js as a data processing language is the concepts of streams and using streams to process data. Using streams can drastically lower the memory consumption by processing data as it comes down the stream instead of keeping everything in memory at any one time. Think SAX instead of DOM parsing.

In node.js using streams is easy. Basically data flows from Readable streams to Writable streams. Think producers of data and consumers of data. Buffering is handled automatically (at least in the built in streams) and if a down stream consumer stops processing the upstream producer will stop producing. Elegant and easy. Readable streams can be stuff like files or network sockets and Writeable streams stuff like files or network sockets… Or stdout which in node.js also implement the Writable stream API. Working with streams is like being a plumber so piping (using the pipe method) is how you connect streams.

An example always helps – the below example reads from alphabet.txt and pipes the data to stdout.

const fs = require('fs')
const path = require('path')

fs.createReadStream(path.join(__dirname, 'alphabet.txt'))
  .pipe(process.stdout)
> a
> b
> c

Simple example but works with small and big files without too much of a difference in memory consumption.

Sometimes processing is required and for this we use Transform streams (these are basically streams that can read and write). Say that we want to uppercase all characters. It’s easy by piping through a Transform stream and then on to the Writable stream (stdout):

const {Transform} = require('stream')
const fs = require('fs')
const path = require('path')

fs.createReadStream(path.join(__dirname, 'alphabet.txt'))
  .pipe(new Transform({
    transform(chunk, encoding, callback) {
      // chunk is a Buffer 
      let data = chunk.toString().toUpperCase() 
      callback(null, data) 
    }
  }))
  .pipe(process.stdout)
> A
> B
> C

It’s easy to see how streams are very composeable and adding processing steps are easy. the pipe could even be determined at runtime. The above examples use strings but streams can also work on objects if required.

Streams are beautiful but can take some time to master. I highly recommend reading up on streams and start getting to know them. The “Node.js Streams: Everything you need to know” post is very nice and provides a good overview.

Happy coding!

 

JSONata looks very nice

While JSON is a very nice and concise data format it lacks the structure and query capabilities of XML and XPath. Often times querying JSON leads to line on line of code to do proper error checking and retrieve the proper value and – if need be – a default value. Meet JSONata! JSONata is a query language plus so much more. I invite you to look at the slides from the recent IBM tech talk on the matter or visit the JSONata Exerciser to try it out.

JSONata is also available as a NPM module.

Writing command line scripts with node.js

Found this little tip this morning to make it easier to use command line scripts written in node.js. Instead of having your node.js file(s) and invoking it using “node myfile.js” on the Mac you can simply do the following:

  1. At the top of the file as the first line add: #!/bin/usr/env node
  2. Make the file executable using chmod +x myfile.js
  3. Invoke away

Now the file is usable by simply using myfile.js.