Pages

Monday, November 13, 2017

Dreamforce 2017 Lessons Learned & Takeaways

Lucky me, I got to attend Dreamforce again. Here's a summary.

Misc
  1. "Machine learning" tools are getting point-and-click enough that feeding hundreds of pieces of historical student data (e.g. "days before schoolyear applied," "GPA," "time between inquiry & admission," "time of year applied," etc.) into them and letting a computer look for trends (e.g. "likely to submit a confirming deposit?") isn't all that out of reach. Drag-and-drop tools are coming into prominence that make dimensionality reduction & linear regression problems like this intuitive & easy to play with.
     
  2. There's something called "macros" I should look into more. At first glance, I'm not sure it's anything I can't do a lot faster w/ DemandTools, but it's worth a closer look.
     
Org configuration change management
  1. One of the concepts behind the mere 7-day lifespan of "DX scratch orgs" is to encourage you to put the result of all your coding time & effort into a source-code version control system.
     
  2. Two guiding philosophies behind development with "DX": 1) you can make all the changes you like by hand in a scratch org -- just be sure to use the DX command-line tool to pull those changes down to your computer so you can promptly get them into your version control system; and 2) you should never again make changes to configuration in production / normal-sandbox orgs by hand -- only by deploying whatever you did in a "DX scratch org" into those orgs via something like the DX command-line tool.
     
  3. On a related note, I still haven't heard a really admin-geared lecture about DX along the lines of "please don't mess around with things by hand in production/staging sandboxes, and please don't deploy anything via change set."
     
  4. Gearset looks like an absolute miracle for shops where those kinds of "no change sets, please" disciplined version-control practices aren't yet in place (e.g. shops transitioning from Salesforce as an "interesting side project" with a no-code solo admin). It's an enterprise-class(-seeming?) tool that simply lets you log into Salesforce and a cloud Git repository and say, "Hey, synchronize my entire org's metadata once a day -- kthxbye." It also seems to be able to facilitate rollbacks. In a small shop with open communication lines -- where, once you know what changed since things were working, a few emails will establish who changed it and whether they could kindly fix it -- this "daily snapshot" seems well worth $3600/year. Even after you have "version control" in place: when it's this easy, what's the harm in a "side repository" of "CYA snapshots?" (Such Git-stored snapshots could also, potentially, be mined by local scripts that transform them into things like SQL commands for dropping & rebuilding tables & SOQL queries for a local daily cache of Salesforce data.) And that's without all the other things Gearset does. (Please stay just $3600/year, Gearset!) A tour from a staffer makes the product look even cooler than the summary on their web site -- apparently they can give you one over the web. *drools* I want.
     
Lightning Experience
  1. Although there's a lot that's still missing from "Lightning Experience" (e.g. "bucketing" & "formulas" in native reporting tools), by the Winter '19 edition, there might be enough LE-only functionality in Salesforce's native reporting tools to justify a switch -- for example, something called "joined reports," field-to-field filtering, subfoldering, easier permissions management, and bookmarking of reports. (So far, we haven't found it worthwhile.) I'm hoping to get a lot of data integration projects done & stable over the next few months so I have time to play around with Lightning Experience in a sandbox.
     
  2. Apparently, having a really nice "Lightning Experience" home page for a user doesn't automatically translate into a really nice mobile version of the same page -- that has to be set up separately.
     
  3. There's a thing called the "Lightning Data Service" that might make Lightning Components require almost as little code as Visualforce for simple "put this SOQL query's results on a page" use cases -- a big change from last year, when I took a hands-on training. It also looks like it has something to do w/ getting Lightning Components on the same page to auto-update when one of them edits data that the others display.
     
SOQL query performance & data models
  1. OpportunityContactRole isn't anywhere close to becoming a first-class object (a table against which you can write triggers). *sigh*
     
  2. The "affiliations" package might be replaceable with the "multiple accounts for contacts" feature & a few triggers similar to ones from the "affiliations" package. That said, the "affiliations" package doesn't really seem to be hurting anything. But Pardot and other vendors are working to get on board with the native version of the feature, so it's worth keeping an eye on.
     
  3. Having a single "Generic Account" record for people whose real account you don't know slows down most SOQL queries. It improves performance to give everyone their own account named after them, as in the HEDA model.
     
  4. Salesforce objects/tables aren't really tables. They're sets of tables (e.g. the indexes are a table, the fields are a table, etc.). That's part of why Salesforce doesn't call them "tables," and it impacts SOQL query performance. We don't seem to have hit issues yet, but there's also something called "skinny tables" -- materialized views in Salesforce's back-end data model for storing your org's data -- that Salesforce can set up if you ask; they can speed up SOQL queries.
     
  5. Reminder: "=" is fast in a query that can leverage a highly selective index; "!=" is not -- "!=" always forces a full table scan. Query optimization 101, but easy to overlook when you get busy and aren't writing performance-sensitive queries daily. (See the example queries after this list.)
     
  6. There's a technology called "big objects," paired with "asynchronous SOQL," coming along, but it doesn't yet look useful for us. You've got to be really sure about data not changing, because right now, you can't even drop such a table once it's created.
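
To illustrate #5 above, here's a pair of example queries -- "Status__c" is a hypothetical custom field, standing in for whatever indexed, highly selective field your real schema has:

// Can leverage the index on Status__c -- fast, assuming few rows match:
List<Contact> admits = [SELECT Id FROM Contact WHERE Status__c = 'Admitted'];

// Negative filter -- the query optimizer gives up on the index and scans the whole table:
List<Contact> everyoneElse = [SELECT Id FROM Contact WHERE Status__c != 'Admitted'];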
     
Other programming-related notes
  1. Stay on top of the "metadata in Apex" project, but overall, it's on purpose that you can't change your schema & change your data via the same programming language / execution environment, so don't get too excited.
     
  2. Hopefully coming within a year: the ability to easily make Apex respect CRUD permissions & field-level security permissions (when you don't want your triggers to "play God"). (A sketch of today's tedious manual checking appears after this list.)
     
  3. If you're a web developer, you can write your Visualforce/Apex to include a Visualforce Boolean variable as a toggle that switches whether Visualforce pages rely on Salesforce-hosted "static resources" (e.g. CSS, JavaScript) or "localhost"-hosted ones. This can make testing a lot easier, faster, & less fragile than actually updating the contents of Salesforce-hosted "static resources" before you're even sure they're correct. Talk to Jon Schleicher for more info about how.
     
  4. Via the reporting API (warning: the JSON to parse "is a doozy"), you can wrap Reports from Salesforce's native reporting tool in Lightning Components and therefore make them easy to embed in other pages. (Natively, only dashboards can be drag-and-dropped into Lightning Components.)
     
  5. Google Docs are a form of "cloud-hosted info storage with an API" -- which means that you can include info from them in custom-programmed user interface elements like Lightning Components. That's a possibly overlooked way to integrate results from a cheap-and-easy aggregate-reporting tool (spreadsheets) into business users' main daily work environment (Salesforce).
     
  6. Personal goal: blast through data integration projects so I can free up time to work on Trailhead modules and get better at leveraging the technologies I picked up "tips & tricks" for. There's a lot out there that I'd like to do, but at 8 hours apiece, one really has to work to free up the time. But they're so well-written.
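
To illustrate #2 above, here's a minimal sketch of what the manual checking looks like today (the class, method, & field choices are all hypothetical):

public class FlsRespectingUpdater {
    // Only write to Contact.Email if the running user's field-level security allows it
    public static void setEmailIfAllowed(Contact c, String newEmail) {
        if (Schema.sObjectType.Contact.fields.Email.isUpdateable()) {
            c.Email = newEmail;
        }
    }
}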

Wednesday, October 18, 2017

Recursion-Reduction Tips For Apex Trigger Code

I wanted to share with you some lessons I learned while working on a trigger.


Because a trigger against a "just-saved" record can kick off another "save" operation on that record (typically the point of a trigger), the very same trigger can get re-fired during what's known as the same "execution context."

In Apex trigger programming, it's considered best practice to make sure that any subsequent re-firings of the same trigger against the same record don't waste valuable CPU cycles (because Salesforce caps total CPU time within an "execution context").

Therefore, when writing a trigger that "does ____ for a just-saved record," it's important to make sure that, at some point, the trigger saves the ID of that record into an "I already did ____ on all of these records" flag that's visible across all of these recursions (usually a class-level, "static"-flagged, "Set<Id>"-typed variable in the trigger handler).

And, of course, you need to program your trigger to pay attention to its own flags and avoid running expensive "consider doing ____" code against records already in that "already-did-____" set of IDs.
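
Here's a minimal sketch of that scaffolding, assuming a hypothetical Opportunity trigger handler (all names are made up):

public class OpportunityTriggerHandler {
    // Class-level static: survives across re-firings of this trigger
    // within the same execution context
    private static Set<Id> alreadyDidIds = new Set<Id>();

    public static void afterUpdate(List<Opportunity> newRecords) {
        List<Opportunity> toProcess = new List<Opportunity>();
        for (Opportunity o : newRecords) {
            if (alreadyDidIds.contains(o.Id)) {
                continue; // Skip records already handled during this execution context
            }
            alreadyDidIds.add(o.Id); // Flag it (but see below for when, exactly, to do this)
            if (shouldDoIt(o)) {
                toProcess.add(o);
            }
        }
        // ... do "____" to toProcess ...
    }

    private static Boolean shouldDoIt(Opportunity o) {
        return o.StageName == 'Closed Won'; // Hypothetical yes/no check
    }
}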


The interesting question is: When do you set the "I already did _____" flag?
 

  • In certain cases, one can trust that all field-values contributing to a trigger's yes/no decision of "should I do _____ to this just-saved record?" will be set the moment that the record is first saved.
     
    In those cases, the most efficient place in your trigger code to set the "I already did _____ on this record" flag is "as soon as the trigger has seen the record for the first time, no matter whether it ends up qualifying for 'doing ____' or not."
     
    That's how I usually write my triggers if I can, since it's the most efficient way to write the trigger.
     
     
  • However, in certain cases (often discovered when people test your newly-written trigger and tell you that it fails under normal usage circumstances through 3rd-party record-editing environments like external portals), the values contributing to the answer to "should I do ____ to this record?" change so that the answer goes from "no" to "yes" in the middle of the "execution context."
     
    For example, other "triggers" or equivalent pieces of code do some post-processing to the just-saved record, and it's only after those pieces of code re-save the record that the answer flips to "yes."
     
    In those cases, the most efficient place in your trigger code to set the "I already did ______ on this record" flag is "as soon as the trigger has determined that it needs to do ______ to the record."
     
    This, unfortunately, will make the trigger run "Should I do _____?" checks all the way through the execution context for records that remain "no" throughout. That's why it's less efficient.
     
    But sometimes, it's simply necessary in cases where the answer can flip from "no" to "yes" mid-execution-context.
     
     

If one is comfortable authoring / editing the triggers/processes/workflows that are responsible for such impactful mid-execution-context value-changes, sometimes it's possible to refactor them so that they're simply part of the same "trigger handler" code as the one you're in the middle of writing.
You could precisely control the order of code execution "after a record is saved" and author these actions in a way that ensures there will never be any "mid-execution-context surprise value-changes."
That might let you use the more efficient recursion-reduction pattern instead.

Sometimes, though, there's nothing you can do but choose the 2nd option.
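
In code terms, the difference between the two options is just where the "flag it" line sits. Here's the loop from the hypothetical handler sketched above, rewritten for the 2nd option:

for (Opportunity o : newRecords) {
    if (alreadyDidIds.contains(o.Id)) {
        continue;
    }
    // Keep re-checking un-flagged records all the way through the execution
    // context, since the answer can flip from "no" to "yes" mid-stream
    if (shouldDoIt(o)) {
        alreadyDidIds.add(o.Id); // Flag it only once we've decided to act
        toProcess.add(o);
    }
}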

Monday, October 16, 2017

Python for Salesforce Developers: "Just Push Play" Run-Specified-Tests Code Deployment

Warning: the attached Python script makes you put a password straight into the code and helps you make an end-run around all sorts of change-management best practices.

Basically, if you're not a little horrified to see this script publicly shared, you probably don't understand what it's capable of enough to be using it -- so please don't!

That said, for circumstances in which you were going to ignore a lot of change-management issues anyway, or just want to do a "check only" deploy of code to a Salesforce org, etc., this code basically lets you type a handful of classes / pages / etc. into a Python script, type in your username & password, say which tests you want to run and how you want to deploy it, and see the results really quickly.

The idea is to be almost as handy, to a developer, for small changes, as right-clicking on code files in Eclipse and "deploying" from there -- the problem with Eclipse being that it doesn't have a "Run Specified Tests" option.

At some point I might create something similar to this that lets you log into two Salesforce orgs, download a smattering of code from one to your local hard drive (instead of already having to have downloaded it with Eclipse), and proceed with the deploy from there. That really feels like playing with laziness/sloppiness fire, though (somehow seems to take out the "Would I really have deployed this just with Eclipse, anyway?" factor).

This code is probably best just for "checkOnly=TRUE" deploys. For real deploys, it's probably still best to Run All Tests, and using a "Change Set" is a lot better for anyone else who might have to stumble into your org a few weeks later and see what's been happening as far as code deploys. (And that's just a bare minimum of version control for simply-maintained orgs.)

A few notes on the code:

  • Change "#'''" to "'''" around blocks of code to quickly toggle them off (no point, for example, re-logging in if your Python IDE already has your session ID in memory, and you don't want to fire up a new "deploy" just to run the "check deployment status" code again).
  • "thingsToAdd" is where most of the "what you need to type" exists.
  • You'll also need to set "username" & "password" values (including your security token appended to your password) in "toOrgLoginEnvelope" -- if you save this script to your hard drive, I recommend a filename that includes something like "INCLUDES PASSWORD" so that, as you exit the IDE, you remember to strip the credentials back out.
  • "inputstr" will need the actual path to where you've used Eclipse to download the files you have "ready to deploy" to on your hard drive.
  • "outpbase" should be somewhere easy to find and delete later, like a sub-folder on your desktop.
  • "deployEnvelope" will need "checkOnly" set inside the XML itself, "testLevel" set just above it, and a (brackets-and-comma-delimited) list of tests to run towards the end of the "runtests" parameter inside the parentheses that follow the XML (if n/a, use "[]").
import os
import shutil
import base64
from xml.etree import ElementTree
import xml.dom.minidom
import requests
import re   

def getUniqueElementValueFromXmlString(xmlString, elementName):
    #Extracts an element value from an XML string.
    #For example, invoking getUniqueElementValueFromXmlString('<?xml version="1.0" encoding="UTF-8"?><foo>bar</foo>', 'foo') should return the value 'bar'.
    elementsByName = xml.dom.minidom.parseString(xmlString).getElementsByTagName(elementName)
    elementValue = None
    if len(elementsByName) > 0: elementValue = elementsByName[0].toxml().replace('<' + elementName + '>', '').replace('</' + elementName + '>', '')
    return elementValue

metadataAPIVer = '40.0'

#'''
toOrgLoginEnvelope = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"><soapenv:Body><login xmlns="urn:partner.soap.sforce.com"><username>{username}</username><password>{password}</password></login></soapenv:Body></soapenv:Envelope>""".format(
username='username', password='password')
tor = requests.post('https://login.salesforce.com/services/Soap/u/'+metadataAPIVer, toOrgLoginEnvelope, headers={'content-type':'text/xml','charset':'UTF-8','SOAPAction':'login'})
tosessid = getUniqueElementValueFromXmlString(tor.content, 'sessionId')
tohost = getUniqueElementValueFromXmlString(tor.content, 'serverUrl').replace('http://', '').replace('https://', '').split('/')[0].replace('-api', '')
toorgid = re.sub(r'^.*/([a-zA-Z0-9]{15})$', r'\1', getUniqueElementValueFromXmlString(tor.content, 'serverUrl'))
toapiver = re.sub(r'^.*/([0-9.]+)/[a-zA-Z0-9]{15}$', r'\1', getUniqueElementValueFromXmlString(tor.content, 'serverUrl'))
#'''

# EXAMPLE CODE:  thingsToAdd = {'classes':[''],'pages':['']}
thingsToAdd = {'classes': {'singCaps':'ApexClass','ext':'cls','toUpl':['OpportunityETLHandler','OpportunityETLTest']}, 'pages': {'singCaps':'ApexPage','ext':'page','toUpl':['OpportunityETLPage']}}
inputstr = 'C:\\EXAMPLEFOLDER\\EclipseWorkspace\\My Sandbox\\src\\'
outpbase = 'C:\\EXAMPLETEMPFOLDER\\temppkgtouploadfromeclipse\\'
outpfiles = outpbase + 'filesbeforezip\\'
outpzip = outpbase + 'uploadme'

#'''
# BEGIN:  Code to create package
if not os.path.exists(outpbase): os.makedirs(outpbase)
if not os.path.exists(outpfiles): os.makedirs(outpfiles)
pkgroot = ElementTree.Element('Package', attrib={'xmlns':'http://soap.sforce.com/2006/04/metadata'})
for folder in thingsToAdd.keys():
    innerDict = thingsToAdd[folder]
    typesElem = ElementTree.Element('types')
    if not os.path.exists(outpfiles+folder+'\\'): os.makedirs(outpfiles+folder+'\\')
    for item in innerDict['toUpl']:
        membersElem = ElementTree.Element('members')
        membersElem.text = item
        typesElem.append(membersElem)
        shutil.copy(inputstr+folder+'\\'+item+'.'+innerDict['ext'], outpfiles+folder+'\\'+item+'.'+innerDict['ext'])
        shutil.copy(inputstr+folder+'\\'+item+'.'+innerDict['ext']+'-meta.xml', outpfiles+folder+'\\'+item+'.'+innerDict['ext']+'-meta.xml')
    namesElem = ElementTree.Element('name')
    namesElem.text = innerDict['singCaps']
    typesElem.append(namesElem)
    pkgroot.append(typesElem)
verElem = ElementTree.Element('version')
verElem.text = metadataAPIVer
pkgroot.append(verElem)
with open(outpfiles+'package.xml', 'w', newline='') as fw:
    dom_string = xml.dom.minidom.parseString(ElementTree.tostring(pkgroot)).toprettyxml(encoding='UTF-8', indent='    ')
    dom_string = '\n'.join([s for s in dom_string.decode('UTF-8').splitlines() if s.strip()]) + '\n'
    fw.write(dom_string)
# END:  Code to create package
#'''

#'''
# BEGIN:  Code to create ZIP
shutil.make_archive(base_name=outpzip, format='zip', root_dir=outpfiles, base_dir='./')
zipString = None
with open(outpzip+'.zip', 'rb') as f: zipString = base64.b64encode(f.read()).decode('UTF-8')
# END:  Code to create ZIP
#'''

# NOTE:  To run several tests, pass repeated <met:runTests> elements -- see how the "runtests" parameter is built in the deploy envelope below.

#'''
# BEGIN:  Code to deploy ZIP
testLevel = 'RunSpecifiedTests' # Most common values will be 'RunSpecifiedTests' or 'RunLocalTests'
deployEnvelope = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:met="http://soap.sforce.com/2006/04/metadata">
      <soapenv:Header>
         <met:SessionHeader>
            <met:sessionId>{sessionid}</met:sessionId>
         </met:SessionHeader>
      </soapenv:Header>
      <soapenv:Body>
         <met:deploy>
             <met:zipFile>{zipfile}</met:zipFile>
             <met:deployOptions>
                 <met:checkOnly>true</met:checkOnly>
                 <met:rollbackOnError>true</met:rollbackOnError>
                 {runtests}
                 <met:singlePackage>true</met:singlePackage>
                 <met:testLevel>{testlev}</met:testLevel>
             </met:deployOptions>
         </met:deploy>
      </soapenv:Body>
      </soapenv:Envelope>""".format(sessionid=tosessid, zipfile=zipString, testlev=testLevel, runtests=''.join(['<met:runTests>'+x+'</met:runTests>' for x in ['OpportunityETLTest1','OpportunityETLTest2']]) if testLevel=='RunSpecifiedTests' else '')
deploytor = requests.post('https://'+tohost+'/services/Soap/m/'+toapiver+'/'+toorgid, deployEnvelope, headers={'content-type': 'text/xml', 'charset': 'UTF-8', 'SOAPAction': 'deploy'})
# END:  Code to deploy ZIP
#'''

#'''
# BEGIN:  Code to check deploy
checkDeployStatusEnvelope = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:met="http://soap.sforce.com/2006/04/metadata">
      <soapenv:Header>
         <met:SessionHeader>
            <met:sessionId>{sessionid}</met:sessionId>
         </met:SessionHeader>
      </soapenv:Header>
      <soapenv:Body>
         <met:checkDeployStatus>
             <met:asyncProcessId>{process_id}</met:asyncProcessId>
             <met:includeDetails>true</met:includeDetails>
         </met:checkDeployStatus>
      </soapenv:Body>
      </soapenv:Envelope>""".format(sessionid=tosessid, process_id=getUniqueElementValueFromXmlString(deploytor.content, 'id'))

checkdeploytor = requests.post('https://'+tohost+'/services/Soap/m/'+toapiver+'/'+toorgid, checkDeployStatusEnvelope, headers={'content-type': 'text/xml', 'charset': 'UTF-8', 'SOAPAction': 'checkDeployStatus'})

#print(checkdeploytor.content)
#print()
print('Id: ', getUniqueElementValueFromXmlString(checkdeploytor.content, 'id'))
print('Done: ', getUniqueElementValueFromXmlString(checkdeploytor.content, 'done'))
print('Success: ', getUniqueElementValueFromXmlString(checkdeploytor.content, 'success'))
print('Status: ', getUniqueElementValueFromXmlString(checkdeploytor.content, 'status'))
print('Problem: ', getUniqueElementValueFromXmlString(checkdeploytor.content, 'problem'))
print('NumberComponentsTotal: ', getUniqueElementValueFromXmlString(checkdeploytor.content, 'numberComponentsTotal'))
print('RunTestResult: ', getUniqueElementValueFromXmlString(checkdeploytor.content, 'runTestResult'))
print('NumberTestsTotal: ', getUniqueElementValueFromXmlString(checkdeploytor.content, 'numberTestsTotal'))
#print(getUniqueElementValueFromXmlString(checkdeploytor.content, 'details'))
# END:  Code to check deploy
#'''

Monday, October 9, 2017

DemandTools MassImpact equivalent of UPDATE...SET...WHERE SQL

CRMfusion makes powerful Salesforce-record-editing software called DemandTools.

Below is how you'd set up its "MassImpact" (single-table-editing) module to do a job equivalent to running an "UPDATE...SET...WHERE" DML SQL statement against a traditional database, for data cleansing within a single table.

For example, to turn all values of table Contact, field Home_Phone__c that are filled with nothing but 10 digits in a row, no punctuation, into a properly formatted US phone number, you might traditionally use the following Oracle-database-friendly DML SQL:

UPDATE Contact
SET Home_Phone__c = REGEXP_REPLACE(Home_Phone__c,'^(\d{3})(\d{3})(\d{4})$','(\1) \2-\3')
WHERE REGEXP_LIKE(Home_Phone__c,'^(\d{3})(\d{3})(\d{4})$')

In DemandTools, you would set up a MassImpact "scenario" as follows:

  • Step 1: Tell DemandTools that you want to operate on the "Contact" object/table, and say which fields you want to be able to see the values of while you screen go/no-go on potential updates in the 3rd step.

     
  • Step 2: Tell DemandTools that you want to potentially-update all records where "Home_Phone__c" isn't null (unfortunately, you can't use a regular expression in the "WHERE" with DemandTools – but step 3 has some dummy-proofing to get around too wide a selection),
    and say that you want to propose a parentheses-and-dashes-formatted replacement value for any of the returned values that consist of nothing but a string of 10 bare digits in Home_Phone__c.

     
  • Step 3: Ensure that you aren't pushing "UPDATE" calls for any records where there is no change to the value of Home_Phone__c, or where the new value of Home_Phone__c would be blank,
    and skim the records to make sure your logic is doing what you thought it would,
    and click "Update Records."


P.S. Just for geekiness, and to compare ease of use, here's some "Execute Anonymous" Salesforce Apex code along the same lines.
(Note: not tested at scale. Depending on your trigger/workflow/process builder load, it might not actually work, since it probably all runs in one "execution context" of post-DML CPU-usage "governor limits," whereas DemandTools runs each batch of 200 records to UPDATE in a truly separate "execution context.")

Map<Integer, List<Contact>> csToUpdate = new Map<Integer, List<Contact>>();
Integer csToUpdateCount = 0;
Integer currentBatch = 0;
List<Contact> cs = [SELECT Phone FROM Contact WHERE Phone <> null];
Pattern p = Pattern.compile('^(\\d{3})(\\d{3})(\\d{4})$');
for (Contact c : cs) {
    Matcher m = p.matcher(c.Phone);
    if (m.matches()) {
        csToUpdateCount++;
        currentBatch = (csToUpdateCount / 200) + 1; // Group records into batches of 200
        if (!csToUpdate.containsKey(currentBatch)) { csToUpdate.put(currentBatch, new List<Contact>()); }
        c.Phone = m.replaceFirst('($1) $2-$3');
        csToUpdate.get(currentBatch).add(c);
    }
}
if (csToUpdate.size() > 0) {
    for (List<Contact> csToU : csToUpdate.values()) {
        UPDATE csToU;
    }
}

Tuesday, September 26, 2017

Python To Facilitate Adding A New Field To Multiple Custom Report Types (involves XML)

Quick code dump: if you download all your "report types" with Eclipse, this can help you put together "fresh" copies of them for use with the Workbench Deploy web site (just zip them up). Very handy if you just created a custom field on an object that's involved in dozens of report types.

This is an example of why it's good to be able to work with XML, too, in Python!

Hopefully I can clean up (or delete & replace) this post at a later date.

import os
from xml.etree import ElementTree
import xml.dom.minidom

def stripNSFromET(etRoot):
    # Strip the XML-namespace prefix (the '{...}' portion) from the tag of the
    # root element and of every element nested beneath it, at any depth
    for el in etRoot.iter():
        if '}' in el.tag:
            el.tag = el.tag.split('}', 1)[1]
    return etRoot

def isTableOfInterest(tableTagText, fieldsDatabaseTableAPIName):
    if fieldsDatabaseTableAPIName == 'Contact':
        return tableTagText == 'Contact' or tableTagText.endswith('.Contacts') or tableTagText.endswith('OpportunityContactRoles')
    else:
        return tableTagText == fieldsDatabaseTableAPIName or tableTagText.endswith('.'+fieldsDatabaseTableAPIName)

outpstr = 'C:\\example\\temppackagereporttypes\\'
pstr = 'C:\\examplepath\\EclipseWorkspace\\orgfolder\\src\\reportTypes\\'
fstr = 'C:\\examplepath\\EclipseWorkspace\\orgfolder\\src\\reportTypes\\Contact_With_Campaign_Members.reportType'

metadataAPIVer = '40.0'
fieldsDBTableAPIName = 'Contact'
fieldAPINames = ['New_Field_1__c','New_Field_2__c']
fieldAPINames.sort()

# SOME NOTES OF INTEREST:
# Contact is not only referred to in the normal way, but also often as OpportunityContactRole coming off an "Opportunity."  Not sure if ALL such OCRs are actually able to take Contact fields.
# This could make it a bit difficult to decide into what "Section," exactly, to add a new database field when adding it to every ReportType:
    # A given "Section" tag within a ReportType's XML file is NOT limited to containing "Columns" elements with the same database "Table" value.  For example, Graduate_Admissions.reportType, with a base object of Contact, has a section named "Recruit data" with fields from both the "Contact.Program_Interests__r" and "Contact" tables in it.
    # A given database "Table" value can appear in multiple "Section" tags within a ReportType's XML file (e.g. "Contact" fields could be spread across multiple sections).
    # CampaignMember often ends up repeated -- once as a top-level, once below itself -- in a ReportType where it is included (with fields duplicated and everything).  Good thing I'm not yet adding any CampaignMember fields to reports.

# SOME CODE TO REMEMBER:
# The number of tables mentioned in a ReportType file:  len({e.text for e in root.findall('./sections/columns/table')})
# The actual tables mentioned in a ReportType file:  {e.text for e in root.findall('./sections/columns/table')}
# The number of "sections" in a ReportType file:  len(root.findall('./sections'))
# The labels of the actual "sections" in a ReportType file:  {e.find('masterLabel').text for e in root.findall('./sections')}
# Iterating over a dict:  for k, v in d.items():


"""
Algorithm per ReportType file, parsed into an XML-ElementTree node called "root":
- Figure out how many 1st-level nodes called "sections" there are.
- For each 1st-level "section" node, if it's "of interest," set it aside as a key to a dict called "d",
    and a list of all its relevant child 2nd-level "columns" nodes as the values.
- Loop through "d" and set aside any keys that have the most list-items as a value, of all keys in "d";
    set aside a list of any such keys as "topSections"
- If there was just 1 such "top section" key, set that 1st-level "section" node aside as the variable "se"
    (This is, visually to the end user, where we'll be adding the new field.)
- If there was more than 1, arbitrarily pick 1 and set that 1st-level "section" node aside as the variable "se"
- Once we've picked a 1st-level "section" node as "se," cache a dict "uniqueTableNames" 
    of the distinct words appearing among its grandchild (3rd-level) "table" tag values as "key"
    and the count-of-table-tag-per-distinct-word as "value."
- Presuming there were any (I guess this serves as a sort of dummy-check for the section),
    instantiate a new ElementTree "columns" node called "newColumn,"
    append "newColumn" to "se" (make it 2nd-level)
    and flesh out "newColumn" with details of the field we want to add.
    (When fleshing it out, we arbitrarily pick a value for the 3rd-level "table" tag if "uniqueTableNames" had several keys.)

"""

changedThese = []
pkgroot = None
for i in os.listdir(pstr):
    root = None
    if not i.startswith('wrt_'):
        with open(pstr+i, 'r', encoding='utf-8') as f: # Open a ReportType metadata XML file
            root = stripNSFromET(ElementTree.parse(f).getroot()) # Store a reference to the root of the XML file currently open in a variable
            d = {}
            allFieldsForTableOfInterestColsInAllSectionsOfInterest = []
            for sec in root.findall('./sections'):
                columnsOfInterest = [e for e in sec.findall('columns/table/..') if isTableOfInterest(e.find('table').text, fieldsDBTableAPIName)]
                if len(columnsOfInterest) > 0:
                    d[sec] = columnsOfInterest # Add to "d" any "section" and a list of applicable "columns" inside it
            if len(d) < 1:
                continue # This file is not of interest if nothing got added to "d" (if it's not a ReportType that includes any fields of the object of interest) -- move on to next file
            else:
                print('\r\n' + i + ', baseObject:  ' + root.find('./baseObject').text) # Display which file we are working with and what its "Base Object" is
                se = None
                if len(d) > 0: # Always true here (we "continue"d above when len(d) < 1) -- leaving the dummy-check in anyway
                    allFieldsForTableOfInterestColsInAllSectionsOfInterest = [item.find('field').text for sublist in d.values() for item in sublist]
                    if all(fldNm in allFieldsForTableOfInterestColsInAllSectionsOfInterest for fldNm in fieldAPINames):
                        continue # This file is not of interest if all fields are already in it
                    topSections = {k for k, v in d.items() if len(v) == max([len(arr) for arr in d.values()])}
                    if len(topSections) >= 1:
                        if len(topSections) == 1:
                            se = (next(iter(topSections))) # There was only 1 top-ranked section -- pick it
                        else:
                            topSecsWithLabelLikeTableNewFieldIsFrom = [s for s in topSections if s.find('masterLabel').text in [fieldsDBTableAPIName, fieldsDBTableAPIName+'s']]
                            if len(topSecsWithLabelLikeTableNewFieldIsFrom) > 0:
                                se = (next(iter(topSecsWithLabelLikeTableNewFieldIsFrom))) # I don't really care which it is, honestly
                            else:
                                se = (next(iter(topSections))) # I tried my best -- moving on.  Just picking a section.
                    else:
                        se = (next(iter(d.keys()))) # It was a 1-section file -- just pick the one section
                if se:
                    uniqueTableNames = {tbstr : [e.text for e in se.findall('columns/table')].count(tbstr) for tbstr in [e.text for e in se.findall('columns/table')]}
                    if len(uniqueTableNames) >= 1: # We can just tack our new column onto the only section that already has other Contact values
                        changedAnything = False
                        for fieldAPIName in fieldAPINames:
                            if fieldAPIName not in allFieldsForTableOfInterestColsInAllSectionsOfInterest:
                                changedAnything = True
                                newColumn = ElementTree.Element('columns')
                                se.append(newColumn)
                                newColumn.append(ElementTree.Element('checkedByDefault'))
                                newColumn.find('checkedByDefault').text = 'false'
                                newColumn.append(ElementTree.Element('field'))
                                newColumn.find('field').text = fieldAPIName
                                newColumn.append(ElementTree.Element('table'))
                                if len(uniqueTableNames) == 1:
                                    newColumn.find('table').text = next(iter(uniqueTableNames))
                                elif len(uniqueTableNames) > 1:
                                    newColumn.find('table').text = max(uniqueTableNames, key=uniqueTableNames.get)
                        if changedAnything:
                            se[:] = sorted(se, key=lambda x: x.tag) # Put masterLabel tag back at the end of the section.  We don't need to re-sort the fields because we might screw up where people were expecting to see them, if they weren't yet in alphabetical order.
                            root.set('xmlns','http://soap.sforce.com/2006/04/metadata')
                            if not os.path.exists(outpstr): os.makedirs(outpstr)
                            if not os.path.exists(outpstr+'reportTypes'): os.makedirs(outpstr+'reportTypes')
                            with open(outpstr+'reportTypes\\'+i, 'w', newline='') as fw:
                                dom_string = xml.dom.minidom.parseString(ElementTree.tostring(root)).toprettyxml(encoding='UTF-8', indent='    ')
                                dom_string = '\n'.join([s for s in dom_string.decode('UTF-8').splitlines() if s.strip()]) + '\n'
                                fw.write(dom_string)
                            changedThese.append(i[:-11])
if len(changedThese) > 0:
    pkgroot = ElementTree.Element('Package', attrib={'xmlns':'http://soap.sforce.com/2006/04/metadata'})
    typesElem = ElementTree.Element('types')
    for x in changedThese:
        membersElem = ElementTree.Element('members')
        membersElem.text = x
        typesElem.append(membersElem)
    namesElem = ElementTree.Element('name')
    namesElem.text = 'ReportType'
    typesElem.append(namesElem)
    pkgroot.append(typesElem)
    verElem = ElementTree.Element('version')
    verElem.text = metadataAPIVer
    pkgroot.append(verElem)
    with open(outpstr+'package.xml', 'w', newline='') as fw:
        dom_string = xml.dom.minidom.parseString(ElementTree.tostring(pkgroot)).toprettyxml(encoding='UTF-8', indent='    ')
        dom_string = '\n'.join([s for s in dom_string.decode('UTF-8').splitlines() if s.strip()]) + '\n'
        fw.write(dom_string)

print('all done')

Wednesday, September 13, 2017

Vendor-Provided Unit Test Bug "Fix" -- Just Add "SeeAllData=true"!

Wow, does one of our managed-package vendors have me riled up.

I detected a bug in their code that means an "@isTest"-flagged unit test can't INSERT new records into various tables (including "User" records with a certain type of "Profile").

In other words, I can't write unit tests that have anything to do with several of our core tables.

I reported the bug and pointed out that:

  • such INSERT statements work fine against "the real database," executed in the "Execute Anonymous Window" of the Developer Console, and that
  • they work fine with the "Run Test" button in "@isTest(SeeAllData=True)"-flagged unit tests (a big no-no), but that
  • as soon as you wrap such an INSERT statement in a test method/class annotated simply with "@isTest," it fails against various closed-source-code triggers in their managed package.
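
(For anyone following along at home, here's a minimal sketch of the annotation difference in question, with hypothetical class & method names:)

@isTest
private class ProperlyIsolatedTest {
    // Runs against an empty database -- the test must create every record it needs
    @isTest static void canInsertContact() {
        insert new Contact(LastName = 'Testerson');
    }
}

@isTest(SeeAllData=true)
private class DataDependentTest {
    // Sees the org's real records -- fragile, and great at hiding trigger bugs
    @isTest static void canInsertContact() {
        insert new Contact(LastName = 'Testerson');
    }
}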

Clearly, their triggers have some weird bugs that they never detected because they "cheated" when writing all their unit tests by adding "(SeeAllData=True)" to "@isTest" (that's the case in all the open-source work they custom-developed for us).
Their codebase doesn't seem to be able to hold up to an empty database the way Salesforce code is supposed to, so it seems they simply make sure they're never showing it off to the customer on an empty database and have all unit tests look at "actual data" in the database.
Lovely.

So what'd they do in response to my problem?

Added "(SeeAllData=True)" to my proof-of-bug code and told me everything was all better.

*glares*

No.

It. most. definitely. is. not.


P.S. In the 29 new lines of code you "added" to my proof-of-bug unit test as well, vendor, you exposed a 2nd broken closed-source trigger when I turned off your "(SeeAllData=True)." So thanks for that, I guess -- saved me opening a 2nd ticket? *cough* #NotThatYouShouldHaveFoundItYourselfOrAnything

Friday, July 14, 2017

Python & Oracle SQL Hash + Regexp Replace Equivalents -- random note to self

Oracle SQL, for pulling from ERP:

standard_hash(regexp_replace(COLUMNNAMEGOESHERE, '\s', ''), 'SHA1')

Python, for cleaning up old values already loaded from ERP into Salesforce:

import hashlib
[hashlib.sha1(str(val).encode('utf-8')).hexdigest().upper() for val in (PANDASDATAFRAMEVARIABLENAMEGOESHERE['COLUMNNAMEGOESHERE'].str.replace(r'\s', ''))]

Note to self, lest I forget. This strips the whitespace (including line breaks) out of text, then condenses it into a 40-character hash of itself. Used for turning a big chunk of text that I expect to be unique in a table into a shorter Salesforce "unique external ID" text field when the table containing the text doesn't have a standard "primary key."

Thursday, July 6, 2017

REST APIs and Salesforce - A Primer

A non-programmer colleague was getting advice to integrate an external database with Salesforce by building a "REST API" inside of Salesforce, and I chimed in because for a long time, 3 different ways of referring to "REST APIs" and "Salesforce" confused me.
 

Here's a breakdown of them that I hope will facilitate discussions with developers and help you learn the right kind of programming for your needs, if you're teaching yourself:
 

  1. When people mention "building REST APIs inside of Salesforce," they mean using the "Apex" language to program a specific Salesforce database ("org") to ACCEPT requests FROM THE OUTSIDE WORLD to engage in a computer-to-computer data-transferring conversation over the web.

    The data can potentially flow both ways (more in a second), but only via the outside world knocking on your Salesforce org's door and saying, "Hey, I've got another communications request!" in the exact manner that you've said you want your Salesforce org to be communicated with.

    That's what "writing an API" means -- it means using code to define how a computer is willing to have its door knocked on and be talked to.

    When you "build a REST API" inside of Salesforce, you're building a door in your fortress and posting a guard by it who's cautious about how she interacts with strangers on the other side.
    • You might program your Salesforce org to know what to do with data from the outside world, but only when delivered correctly (e.g. "If you have something for the fortress, bring it in a series of pink 12x12x12 boxes, only paint the first one green and the last one purple, and put your name and return address on all of them -- if we like them, we'll figure out what to do with the stuff inside the boxes from there, and after we're done, we'll post a receipt on the door telling you whether we handled them neatly or burned them").
    • Or you might program your Salesforce org to hand out data from inside the fortress to the outside world, if asked just right ("Tap out 'shave-and-a-haircut-two-bits,' then the morse code for your name and return address, and a description of the data you want, and we'll FedEx a list to you").
    Or you might set up communications protocols for each, to really facilitate 2-way conversation--yet note that every piece of that conversation is initiated by some "outsider" piece of software knocking on your fortress's door in a specific way, no matter which way the "data you care about" is flowing.
     
     
  2. There's ALSO a general-purpose API, built into every Salesforce database ("org") in existence, that uses the "REST" communications protocol. Salesforce defines and maintains how it works. It's a door that Salesforce the corporation cut into your fortress and posted a guard at.

    This "API" is what the authors of "ETL" tools like Apex Data Loader & dataloader.io & Jitterbit & Zapier study carefully before writing such tools.

    Technically, you can write your own tool taking advantage of this protocol, but most people don't bother, since other companies have done such great work.

    Also, if you do option #1, you can sometimes INSERT more data in fewer "API calls" than by using tools that leverage the general-purpose Salesforce "REST" data API.

    Both option #1 & option #2 give you the chance to write code inside your Salesforce database that post-processes incoming data according to your business's needs -- although in the case of approach #1, you might bundle that logic into the back end of the API for efficiency, whereas in the case of approach #2, you have to use "triggers" and such.
     
     
  3. Finally, you can write Apex code within your Salesforce org that goes knocking on the doors of OTHER fortresses.

    Doing so might involve writing Apex that uses the "REST" protocol and someone else's "API," but it's not at all the same thing as the 1st concept we discussed.

    When learning how to do this, you'll often hear terms like "writing asynchronous Apex" and "making HTTP requests with Apex."
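
To make concept #1 concrete, here's a minimal sketch of an Apex REST API -- the endpoint name, class name, and field list are all hypothetical:

// Exposes GET {yourOrgsDomain}/services/apexrest/demoContactLookup/<contactId>
@RestResource(urlMapping='/demoContactLookup/*')
global with sharing class DemoContactLookup {
    @HttpGet
    global static Contact doGet() {
        // Grab the record ID the caller tacked onto the end of the URL
        String contactId = RestContext.request.requestURI.substringAfterLast('/');
        return [SELECT Id, FirstName, LastName, Email FROM Contact WHERE Id = :contactId];
    }
}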

Thoughts:

Most admins' approaches to connecting an outside database land in the realm of approach #2, using someone else's "ETL" tool -- maybe with some "developer" help to write "triggers" to post-process incoming data according to your business needs. It's the easiest approach by far.

Some "apps" and "packages" and "plugins" might have done #1 as part of the software package they deliver to you (that is, they might cut a hole in your fortress and post a guard).

Personally, I've barely dabbled in "approach #2, roll-your-own-ETL-tool," and I've only been to classes and read books on approaches #1 & #3.

Watching from the outside, it seems to me that "approach #1" & "approach #2, roll-your-own-ETL-tool" achieve pretty similar business-need ends and require similar levels of developer skill.

  • It's just a question of whether you prefer to be constrained to programming in Salesforce's language, yet have more control over Salesforce database performance (approach #1),
  • Or whether you prefer to be a little less efficient, but have more control over the language in which you program ("approach #2, roll-your-own-ETL-tool").

(Although even in the case of approach #1, you'll probably be writing some sort of "ETL" software in your favorite language that does the actual knocking on your Salesforce database's ("org's") door -- it'll just probably be a lot simpler to write than whatever you would've written that relies solely upon Salesforce's standard, built-in API.)


Finally, a brief analogy for any Oracle developers following along:

REST-based data transmission : data flow in & out of Salesforce databases :: ODBC/JDBC-based data transmission : data flow in & out of traditional relational databases

Specifically, the built-in "approach #2" to using REST might be most comparable -- not sure how many database management systems let you write your own APIs into them (#1) or make callouts to internet services from them (#3).

Wednesday, June 28, 2017

Neat PDF About When To Consider Python

Neat poster here (PDF) by Roxanne Johnson indicating when to consider Python for data projects, when to stick with Excel, how to find self-help resources if you delve into Python, etc.

Large page and small print, so zoom in.

Wednesday, May 24, 2017

Job-Hunting Advice For Career Changers -- Talking About Transferable Skills

I'm mentoring a student worker as she searches for jobs after finishing her graduate degree at our university. She came to graduate school full-time to learn skills in a new field, which means she has recently learned a number of new programming languages in an academic setting, rather than on the job. Unsure how to keep recruiters on the phone and find some way to make her several years of work experience before and during graduate school "count" towards a job using these programming languages, she asked me if I had any advice. Here are some tips I gave her.

  • The written job description is gold to you.
    • The bottom half usually describes the technical skills you're afraid of seeming too new at. Don't get too wrapped up in it.
    • The top half usually describes the job itself. It's where a career-changer like you will find the most "source material" for talking about why you're qualified to do the job.
      • You wish they would "give you a chance" to "learn on the job," right?
        • Well, they won't. Not if you put it like that.
        • But if you point out how you've done things just like that, only different, but pretty much the same, and HERE'S HOW -- well, now you sound like someone ready to "jump into the deep end of the pool," don't you?
          Welcome to the world of "selling your transferable skills."
      • The "what this position does" "top half" of a job description is where you find out exactly what you need to say your prior experience is "just like," and where you'll figure out the difference between your prior experience and the future job (so that you can explain what other transferable skills you have that address the difference).
    • If you don't have a written job description because a recruiter is summarizing a job orally and focuses on the "bottom half," confidently direct them to tell you about the "top half" so you can direct the rest of the phone call from a position of strength.
      • Example: "Ah, yes, I have worked with those technologies in my graduate coursework, and I spent a year at ___ and three years at ___ doing related work with [related technology or transferable experience/skills] -- that might be a good fit. Can you tell me more about the job responsibilities and day-to-day work?"
  • Rehearse, rehearse, rehearse. Practice makes perfect.
    • Treat this like you would ask a child to treat a role in a "school play."
      1. Memorize the lines
      2. Look in the mirror and practice the lines -- master your projection of confidence and your ability to remember your lines while looking someone in the eyes / get over any sense of disgust at watching yourself.
      3. If you have time, look in the mirror again and practice making your lines your own -- practice being easy and relaxed and you while delivering them.
  • Of course, you have to write your own lines -- you're the playwright as well. But you already know what the play is going to be about -- it's going to be about the job in the advertisement and about you! So write that script!
    • Print a copy of the job description and a copy of your resume. Maybe grab some spare paper if there isn't a lot of room in the margins.
    • Look through the "job duties" and TRANSLATE them from the way they're written to "something your grandmother would understand when she asks what you do for a living."
    • Look through the "job duties" and find the IMPLIED job duties. (For example, if you have to "deliver" something, implied transferable skills & experiences might be "ahead of schedule" or "to very important people." Another example -- if it talks about "different" people you'll be working with ... implied is, "You'll be pulled in a million different directions and need to be able to handle it politely." If you've done that before ... yay, you have experience in something they need!)
    • Now, for all the plain-English and implied real job duties, figure out what you've already done, at school or at work, and write notes to yourself.
    • Finally, go through your notes and put together little stories about how great at the job you'd be, mentioning that you see they need the position to do ____, and how that sounds like the time in ____ when you _____ed.
      • Repeat until you're out of such stories -- then start this process over for the next job advertisement!
      • You might want to have a few different versions of any story that's too technical -- the "highly technical" (but still 60 seconds or less) version and the "explain it to a 6-year-old in 15 seconds or less" version. Always start by sharing the "explain it to a 6-year-old, 15 seconds or less" version of a story, only going into the "highly technical" version if pressed for more details.
      • When talking about how you have ___ experience on the job and ___ experience at school, don't leave it to the interviewer to infer that the two can be added together into a competence. This is YOUR JOB and is why you're rehearsing. If you'd like, you can start with phrases like, "In my experience" or "Given my experiences on the job and in my coursework," which have 3 advantages:
        1. You don't have to worry about seeming like you're being contradictory -- it's just your opinion. So it's polite / deferential.
        2. It's hard for someone to challenge the accuracy of. It's just your opinion. So it's confident (because it's hard to challenge).
        3. It brings attention to the fact that you have experience and that you can put all of your experience together into one big package that's useful to them.
  • This same process can also be used when deciding what to write in a cover letter -- only you have to be much, much more concise and not tell as many stories.
  • Finally, if you're worried about professional "word choice" when answering difficult questions, read Alison Green's 'Ask A Manager' advice blog religiously. Eventually, you'll pick up on her style, which I think is excellent.

Good luck!

Python CSV Example - Back-Filling Nearby Zip Code Distances

Today I used the following small Python script to add a bunch of distances to a new field called "Miles_From_Us__c" on 40,000 pre-existing Contact records.

The process was as follows:

  1. Export "SELECT ID, MAILINGPOSTALCODE, MILES_FROM_US__C WHERE MAILINGPOSTALCODE != NULL AND MAILINGPOSTALCODE != '' AND MILES_FROM_US__C = NULL" from our Salesforce org to a CSV file on my hard drive at "c:\tempexample\contactzipextract.csv" using the "Apex Data Loader" software.
  2. Run the Python script in this blog post instead of bashing my head against the wall with Excel.
  3. Import the CSV file on my hard drive at "c:\tempexample\contactzipupdate.csv" back into our Salesforce org as an "UPDATE" operation using the "Apex Data Loader" software.

 

import pandas

zips = ['11111','22222','33333','44444']

zipstodist = {'11111':0, '22222':1, '33333':9, '44444':21}

df = pandas.read_csv('c:\\tempexample\\contactzipextract.csv', dtype='object')

df = df[df['MAILINGPOSTALCODE'].str[:5].isin(zips)]

df['MILES_FROM_US__C'] = df['MAILINGPOSTALCODE'].str[:5].apply(lambda x : zipstodist[x])

df[['ID','MILES_FROM_US__C']].to_csv('c:\\tempexample\\contactzipupdate.csv', index=0)

Here's an explanation of what my script is doing, line by line, for people new to Python:

  1. Says that I'll be using the "Pandas" plug-in to Python.
  2. Says which U.S. postal codes ("zip codes") I care about as a Python "list" that I give a nickname of "zips."
    • I got this from the first column of a CSV file of "zip codes of interest" that a consultant sent one of my users.
  3. Defines the "distance from us," in miles, of each zip code of interest, as a Python "dict" that I give a nickname of "zipstodist."
    • I got this from the first and second columns of a CSV file of "zip codes of interest" that a consultant sent one of my users.
    • I could have cross-checked the two CSV files in Python, but I had already transformed the CSV data into this list style when writing some Salesforce Apex code for future maintenance, so copying/pasting it into Python was easier in this case.
  4. Reads my CSV file into Python as a Python "Pandas dataframe" that I give a nickname of "df."
  5. Filters out any rows whose first 5 characters of the value in the "MAILINGPOSTALCODE" column aren't among my zip codes of interest.
  6. Sets each row's "MILES_FROM_US__C" to the distance corresponding to the first 5 characters of that row's "MAILINGPOSTALCODE" value.
  7. Exports the "ID" & "MILES_FROM_US__C" columns of my filtered, transformed dataset out to a new CSV file on my hard drive.
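
And since the notes above mention Apex code written for future maintenance, here's a hypothetical sketch of what that Apex side might look like (made-up class name; same made-up zip codes as above):

public class ContactDistanceHelper {
    // Same zip-to-miles lookup as the Python "zipstodist" dict above
    private static final Map<String, Decimal> ZIP_TO_MILES = new Map<String, Decimal>{
        '11111' => 0, '22222' => 1, '33333' => 9, '44444' => 21
    };

    // E.g. called from a Contact trigger handler on insert/update
    public static void setMilesFromUs(List<Contact> contacts) {
        for (Contact c : contacts) {
            if (c.MailingPostalCode != null) {
                String zip5 = c.MailingPostalCode.left(5);
                if (ZIP_TO_MILES.containsKey(zip5)) {
                    c.Miles_From_Us__c = ZIP_TO_MILES.get(zip5);
                }
            }
        }
    }
}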

Tuesday, April 25, 2017

A Brief UPDATE Script: Oracle SQL vs. Salesforce Apex+SOQL

In a copy of Salesforce using EnrollmentRx, we are capturing the details of every submission-from-a-student on a table attached to "Contact" (PK-FK) called "Touch_Point__c."

When such a "Touch_Point__c" record is created, if it is the first created for a given "Contact," a trigger copies its "Lead_Source__c" value over to the corresponding "Contact" record's "LeadSource" field.

Midway through an advertising campaign, a decision was made to change the string used for a certain departmental landing page's "Lead_Source__c" value from "Normal Welcome Page" to "Landing Page."

We'd caught up on back-filling "Lead_Source__c" values on old "Touch_Point__c" table records.

However, we hadn't yet back-filled the corresponding "LeadSource" fields on "Contact" in the case where such "Touch_Point__c" records had been the first in existence for a given "Contact."
(We wanted to leave "Contact" records alone where none of the altered-after-the-fact "Touch_Point__c" were actually the first "Touch_Point__c" record for the "Contact.")

I wrote a little script that's the equivalent of a complex "UPDATE" statement and am sharing it here for colleagues from the Oracle SQL world.
(Please excuse any typos or inefficiencies -- the real data set was small, so I didn't care about performance, and I didn't actually run the Oracle.)


Here's some Oracle SQL that I believe would've done the job, if Salesforce were a normal Oracle database:

UPDATE Contact
SET LeadSource = (
    SELECT Lead_Source__c
    FROM Touch_Point__c
    INNER JOIN (
        SELECT Contact__c, MIN(CreatedDate) AS MIN_CR_DT
        FROM Touch_Point__c
        GROUP BY Contact__c
    ) qEarliestTP
        ON Touch_Point__c.Contact__c = qEarliestTP.Contact__c
        AND Touch_Point__c.CreatedDate = qEarliestTP.MIN_CR_DT
    WHERE Contact.Id = Touch_Point__c.Contact__c
        AND Lead_Source__c = 'Landing Page'
        AND Dept_Name__c = 'Math'
        AND utm_source__c IS NOT NULL
        AND EXTRACT(YEAR FROM CreatedDate) >= EXTRACT(YEAR FROM CURRENT_DATE)
)
WHERE Contact.Id IN (
    SELECT Touch_Point__c.Contact__c
    FROM Touch_Point__c
    INNER JOIN (
        SELECT Contact__c, MIN(CreatedDate) AS MIN_CR_DT
        FROM Touch_Point__c
        GROUP BY Contact__c
    ) qEarliestTP
        ON Touch_Point__c.Contact__c = qEarliestTP.Contact__c
        AND Touch_Point__c.CreatedDate = qEarliestTP.MIN_CR_DT
    WHERE Lead_Source__c = 'Landing Page'
        AND Dept_Name__c = 'Math'
        AND utm_source__c IS NOT NULL
        AND EXTRACT(YEAR FROM CreatedDate) >= EXTRACT(YEAR FROM CURRENT_DATE)
)
AND Contact.LeadSource = 'Normal Welcome Page'

Here's the Salesforce Apex code (with embedded SOQL) I wrote to do the job instead, since Salesforce doesn't give you a full-on SQL-type language.**

// Loop through every record in the "Touch_Point__c" table, setting it aside in a map,
// keyed by its foreign key to the "Contact," if it is the earliest-created "Touch_Point__c" for that Contact.
Map<Id, Touch_Point__c> cIDsToEarliestTP = new Map<Id, Touch_Point__c>();
List<Touch_Point__c> allTPs = [
    SELECT Id, Contact__c, CreatedDate
    FROM Touch_Point__c
    ORDER BY Contact__c, CreatedDate ASC
];
for (Touch_Point__c tp : allTPs) {
    // The "ORDER BY" in allTPs should make this logic short-circuit at the first half of the "IF,"
    // but the 2nd half will dummy-check in case the list is, for some reason, out of order.
    if (!cIDsToEarliestTP.containsKey(tp.Contact__c) || cIDsToEarliestTP.get(tp.Contact__c).CreatedDate > tp.CreatedDate) {
        cIDsToEarliestTP.put(tp.Contact__c, tp);
    }
}

// Loop through every "landing page visit"-typed record in the "Touch_Point__c" table.
// ONLY IF such a Touch_Point__c is also the earliest-created one for the "Contact" record it references,
// add a modified in-memory copy of that "Contact" to a list called "csToUpdate" --
// then call a DML operation on that in-memory list to persist it to the database.
List<Contact> csToUpdate = new List<Contact>();
List<Touch_Point__c> mathLandingTPs = [
    SELECT Id, Contact__c, Lead_Source__c, Contact__r.LeadSource, Dept_Name__c, utm_source__c, CreatedDate
    FROM Touch_Point__c
    WHERE Lead_Source__c = 'Landing Page'
    AND Contact__r.LeadSource = 'Normal Welcome Page'
    AND Dept_Name__c = 'Math'
    AND utm_source__c != null
    AND CreatedDate >= THIS_YEAR
];
for (Touch_Point__c tp : mathLandingTPs) {
    if (cIDsToEarliestTP.containsKey(tp.Contact__c) && cIDsToEarliestTP.get(tp.Contact__c).Id == tp.Id) {
        csToUpdate.add(new Contact(Id = tp.Contact__c, LeadSource = tp.Lead_Source__c));
    }
}
update csToUpdate;

Oracle programmers, I imagine your colleagues might yell at you if you used PL/SQL to hand-iterate over smaller SQL queries in Oracle rather than letting native SQL do the work for you. In Salesforce, that's simply the way it's done.

**Note that more complex code might be required -- e.g. you might have to run the same code several times with a row-count limit on it -- since Salesforce is pretty picky about the performance of triggers fired in response to a DML statement. (A normal Oracle database has its limits, too, of course, but they're likely far less strict if it's your own in-house database than with Salesforce.)

Tuesday, April 4, 2017

Python for Salesforce Administrators - Introduction to XML and JSON

XML and JSON are like each other, but not like CSV

We've talked about how useful Python can be for processing table-style data stored in "CSV" plain-text files.

The key properties of table-style data are that:

  1. The "table" always has a certain number of columns
  2. Every single row in the table has the exact same set of "columns" (the exact same "keys") as every other row and as the table at large
  3. Every single row in the table is capable of having a value in each of these columns ... a "blank" placeholder still needs to be indicated if it doesn't.
  4. Every single row in the table can only have one value in each of these columns. (If you have only one "First Name" column, no row can have two first names.)
  5. Each conceptual "item" in the data is represented by a "row"
  6. Each conceptual "item" (row) can have no more than 1 level of "key-value" properties (a "column header" being the "key" and a given cell beneath a column header, in a specific row, being the "value").
    Example: There's no such thing as having the notion of a "Name" that breaks down at a lower level into "First Name" and "Last Name."
    Sure, the database you're exporting a "CSV" file from might have an automatically computed / "formula" field called "Name" that is just a space-separated merge of "First Name" and "Last Name." But the exported "CSV" file itself -- a plain-text representation of your data -- will show "Name," "First Name," and "Last Name" as independent "columns" whose values are at an equal "level" to each other.

There are other styles of data that can be stored in plain-text files as well.

Alternative textual representations of data try to get around the two main limitations of table-style data by:

  1. Giving each conceptual "item" in the data key-value properties that are "nested" inside each other
  2. Letting each conceptual "item" in the data have "keys" that have nothing to do with the "keys" that other conceptual "items" in the data have

A plain-text file where punctuation indicates the start/end of each conceptual "item" in the data, and where the "keys" (and their values) inside of each conceptual "item" are also indicated by careful use of punctuation, can handle both of these requirements.

The two most common formats today are "XML" and "JSON." Plain-text exports of your current Salesforce configuration are often formatted in either of these styles.

  • In both formats, conceptual "items" can have more conceptual "items" nested inside of them.
  • In both formats, there is a way (in XML, two ways) of defining the "keys and their values" possessed by each conceptual "item."
  • In XML, the word for the thing representing a conceptual "item" is an "element."
  • In JSON, the word for the thing representing a conceptual "item" is an "object."
  • Despite the "element" vs. "object" linguistic difference, XML and JSON represent the same type of data (nested data where each conceptual "item" gets to define its own "keys" and specify values for them).

We'll have a lot of examples in this post.

  • To view my XML examples graphically, paste them here and click "Tree View".
  • To view my JSON examples graphically, paste them here and click "Tree View".

XML

The punctuation that XML uses to define the beginning and end of an "element" is a "tagset." It looks like this:

<Person></Person>

As you can see, it's the same word, surrounded by less-than and greater-than signs, with the one indicating the "end" of the element starting with a forward-slash.

Each piece wrapped in less-than and greater-than signs is considered a "tag" -- hence "tagset" for the notion of including them both (kind of like "parenthesis" versus "a set of parentheses").

The fact that the tagset exists in your text file means that it exists as a conceptual "item" in your data.

It doesn't matter whether or not there's anything typed between the tags (after the first greater-than, before the last less-than). This is now a conceptual item that "exists" in your data, simply because the tagset exists.

If it doesn't have anything between the tags, you can think of it a little like a row full of nothing but commas in a CSV file. It's there ... it's just blank.

(In fact, there's even a fancy shortcut for typing such tagsets: this single tag is equivalent to a tagset with nothing in the middle like the one above -- note the forward-slash before the greater-than sign:)

<Person/>

Note that already, though, our "blank" element has one big difference from a row in a CSV file: it has a name! Rows don't have names. We'll come back to this, but this is why XML has two ways of indicating an element's "keys and their values." By giving each element a name, XML allows the element itself to be used as a "key" definition for the larger context inside of which the element finds itself.

Here's an example:

<Shirt>
 <Color></Color>
 <Fabric></Fabric>
</Shirt>

There are 3 conceptual "items," or "elements," in this data, each of which has a name.

All 3 can stand alone as "elements" in the grammar of XML. Analogy:

You can write an English sentence that has multiple complete sentences inside of it -- simply separate them with semicolons.

However, the fact that the elements named "color" and "fabric" are nested between the tags of the element named "shirt" means that they are also indicating that this particular shirt has keys named "color" and "fabric" (both of whose values are currently blank).

The line breaks and tabs aren't necessary in XML (even for saying where "color" stops and "fabric" begins), but they help humans read XML.

Now might be a good time to show you the other way of indicating that a particular shirt has a "color" and "fabric," but that their values are blank:

<Shirt Color="" Fabric=""></Shirt>
Or, in shortcut notation, since there's now nothing inside the "Shirt" tagset:
<Shirt Color="" Fabric=""/>

Note that this isn't always treated EXACTLY the same as our nested-tag example when it comes to programming languages that read XML. Some software might argue that there's more of a "nothingness" in the nested-tags example (it truly doesn't have a color), and more of a "value without any letters in it"-ness in the inside-the-Shirt-opening-tag example. Only sometimes, though, and even then it's often you, the programmer, deciding to make that distinction.

The big difference, though, is that in this case, "color" and "fabric" are not standalone elements.
They are "attributes" of the element named "Shirt".

You can't put more standalone "elements" between the quotes after the "=" of an "attribute." You're done. Only a plain-text value can go there.

You can't do this:

<Shirt Color="" Fabric="<Washable></Washable>"></Shirt>

But you can do this:

<Shirt>
 <Color></Color>
 <Fabric>
  <Washable></Washable>
 </Fabric>
</Shirt>

You also can't give any element more than one "attribute" of the same name, whereas you can nest as many same-named "elements" inside of an element as you like.

You can't do this:

<Shirt Color="" Color="" Fabric=""></Shirt>

But you can do this:

<Shirt>
 <Color></Color>
 <Color></Color>
 <Fabric>
  <Washable></Washable>
 </Fabric>
</Shirt>

Those are the main differences between the two ways XML gives you to define key-value pairs on an element.

  1. Attributes are, by definition, "the end of the line" when it comes to the "key" definitions attached to an element (and can't conflict with each other's names)
  2. The names of elements nested inside an element also serve as "key" definitions for the outer element, but they're "fuzzier" than attributes.
    This is usually considered a good thing.
    This desire for "fuzziness" & "repeatability" & "nested-ness" is one of the two reasons people choose a "nested" data format instead of a table-style format to represent their data.

A word of warning: this is also valid code:

<Shirt Color="" Fabric="">
 <Color></Color>
 <Color></Color>
 <Fabric>
  <Washable></Washable>
 </Fabric>
</Shirt>

A human might look at the shirt above and think it has 3 colors and 2 fabrics. It's probably better to think of it the way the computer thinks of it -- that the shirt above has 1 color attribute, 1 fabric attribute, 2 full-on elements nested within it each named "color," and 1 full-on element nested within it named "fabric."


Now let's give our shirt some "values!"

First of all, it's essential to remember that, in a way, all of these examples' elements' "keys" already had values. The values for the "keys" were just blank, or they were other elements**.

**(Think about the examples where an element named "washable" was nested inside of an element named "fabric" which was nested inside of an element named "shirt." The value for the "shirt" element's "fabric" key wasn't exactly blank -- the value for that key was more like: "an element called 'washable.'")

But when I say "values" for keys like "color" or "fabric" or "washable," you're probably thinking about things like the word "blue" or the word "red" or the word "leather" or the word "yes" or the word "no." So let's talk about those.

In XML, any given "element" can have exactly 0 or 1 plain-text "value" (like "leather" or "blue") between the tags that show where its boundaries are.
The only other thing that can go "inside" the element besides its (optional) plain-text "value" is more elements.

Here's a really simple element with a plain-text value:

<Shape>
 Rectangle
</Shape>

It's not common for the "outer-most" elements in an XML-formatted piece of data to have values--especially because valid XML always has just 1 outermost element. (If you don't care, you can just make up a name like "RootElement" for the tagset that holds all the "elements" you actually think of as your data.)

But sometimes even "2nd-outer-most" elements don't have values. Particularly when they represent some sort of abstract real-world object with a lot of complexity that you want to capture, they have 0 values but a lot of elements nested inside them.

You could describe a fleet of cars like this:

<RootElement>
 <Car>
  First car's Vehicle Identification Number here
 </Car>
 <Car>
  Second car's Vehicle Identification Number here
 </Car>
</RootElement>

The above code implies that the conceptual "items" that you've given names of "car" truly are their Vehicle Identification Numbers. Yet they're really not, are they? They're heavy chunks of steel taking up space in the real world. There isn't really a plain-text value that captures what they are. Therefore, you won't really see a lot of XML like that. Although the word "Car" is, technically, a "key" to each "element's" "value," in this case, it doesn't quite make sense to give "Car" a "value."

Here's a more realistic way of writing the data, using nested elements to show that each car has 1 "key" of "VIN" and that the "value" for that "VIN" key is filled in on both cars:

<RootElement>
 <Car>
  <VIN>
   First car's Vehicle Identification Number here
  </VIN>
 </Car>
 <Car>
  <VIN>
  Second car's Vehicle Identification Number here
  </VIN>
 </Car>
</RootElement>

Here's another realistic way of expressing the same idea, only using attributes to show each car's key & values:

<RootElement>
 <Car VIN="First car's Vehicle Identification Number here">
 </Car>
 <Car VIN="Second car's Vehicle Identification Number here">
 </Car>
</RootElement>

Or, for short (using attributes):

<RootElement>
 <Car VIN="First car's Vehicle Identification Number here"/>
 <Car VIN="Second car's Vehicle Identification Number here"/>
</RootElement>

In this little example, we're actually just dealing with multiple conceptual "items," each of which has the exact same keys as the others and just 1 value per key, so remember that table-style (CSV) data could've easily represented the same data -- in this case, we've just got a 1-column CSV file:

"VIN"
"First car's Vehicle Identification Number here"
"Second car's Vehicle Identification Number here"

I digress -- but it's good to recognize what's going on in your data, and which types of plain-text files are capable of representing it.

Going back to our shirt example, let's say that our data set includes just 1 shirt, that it's "blue and red and green" and made out of "leather and cotton" and that the leather isn't washable but the cotton is.
Our XML representation of our data might look like this:

<Shirt>
 <Color>
  Blue
 </Color>
 <Color>
  Red
 </Color>
 <Color>
  Green
 </Color>
 <Fabric>
  Leather
  <Washable>
  No
  </Washable>
 </Fabric>
 <Fabric>
  Cotton
  <Washable>
  Yes
  </Washable>
 </Fabric>
</Shirt>

There isn't really a good way to represent that concept of what traits the shirt possesses in a single row of a CSV file, is there? This is where XML and JSON shine!

Read the XML above carefully. What you have is:

  • 1 element with a name of "shirt" that has 0 plain-text "values," but has 5 more elements nested inside of it
  • 3 elements with a name of "color" (nested inside the one named "shirt"), none of which have any elements nested inside of them, but each of which have 1 plain-text value
  • 2 elements with a name of "fabric" (nested inside the one named "shirt"), each of which have exactly 1 plain-text value, and each of which also have 1 more element nested inside of them
  • 2 elements with a name of "washable" (nested inside various ones named "fabric"), none of which have any elements nested inside of them, but each of which have 1 plain-text value

There also isn't really a good way to represent this shirt using "attributes" on the "shirt" tagset (because it has multiple colors and multiple fabrics, and because the fabrics have nested elements of their own). However, since each "fabric" only has exactly 1 "washable" key & value, you could use attributes for that as follows:

<Shirt>
 <Color>
  Blue
 </Color>
 <Color>
  Red
 </Color>
 <Color>
  Green
 </Color>
 <Fabric Washable="No">
  Leather
 </Fabric>
 <Fabric Washable="Yes">
  Cotton
 </Fabric>
</Shirt>

The choice is up to you, depending on which way you think it's easier to fetch/modify the values using code and which way you think it's easier for humans to read.

Another choice that's up to you is whether "Leather" is what the fabric truly is (the way "blue" is an adjective and therefore describes what the color truly is), or whether the notion of a fabric is too fuzzy in the real world to capture in a single word (like with our car) and should've been a key-value pair with a key like "name."
It's the same choice we had to make when deciding whether a car was its VIN or whether it had a VIN.
Outer-ward elements representing complex concepts usually just have key-value pairs (like with our car or our shirt examples).
For elements at "deeper" levels of nesting, you'll need to decide whether they "are" something (the optional 1 plain-text value they get) or whether they merely "have" things (nested elements & attributes).
That's a judgment call for you to make based on how your data is going to be used. All organization involves judgment calls trading flexibility against simplicity.
When it comes to writing software to process XML someone else already wrote, it's good to be able to recognize which judgment call they made (because the programming-language commands for extracting the two styles of writing key-value pairs are different).
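
We'll save real parsing lessons for future posts, but as a taste, here's a minimal Python sketch using the standard library's xml.etree.ElementTree module against our washable-fabrics shirt, showing that the two styles of key-value pair really are extracted with different commands:

import xml.etree.ElementTree as ET

# The two-fabric shirt from earlier, condensed onto fewer lines
xml_text = '''
<Shirt>
 <Color>Blue</Color>
 <Color>Red</Color>
 <Color>Green</Color>
 <Fabric Washable="No">Leather</Fabric>
 <Fabric Washable="Yes">Cotton</Fabric>
</Shirt>
'''

shirt = ET.fromstring(xml_text)

# Nested-element style: findall() fetches every <Color> element; .text is its plain-text value
for color in shirt.findall('Color'):
    print(color.text)

# Attribute style: each element carries a dictionary called .attrib
for fabric in shirt.findall('Fabric'):
    print(fabric.text, fabric.attrib['Washable'])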


JSON

The punctuation that JSON uses to define the beginning and end of an "object" is a set of "curly braces." It looks like this:

{}

  • Q: Hey waiddaminute -- that "object" doesn't have a name! I thought you said JSON "objects" and XML "elements" were pretty equivalent in terms of both being representations of conceptual "items" in the same style of organizing data!
  • A: Similar. But not the same. Good catch. JSON doesn't give objects a name. Nor do they get an optional "plain-text value" standing apart from anything nested inside of them. JSON objects look pretty different from XML elements.

Also, you can't just put JSON objects back-to-back the way you can put XML elements back-to-back; this isn't valid JSON:

{}
{}

Instead, you have to put them inside square-brackets and separate them with commas (remember not to put a comma after the last one, since nothing comes next--easy copy/paste mistake when you're putting each one on its own line). This is how you show 2 JSON objects at the same level as each other:

[
 {},
 {}
]

Note that the line breaks and tabs are, again, for human benefit only.

Also, you don't have to include them inside any sort of "RootElement" container. That right there is valid JSON.

But getting back to our "JSON objects don't have names" problem ... what's the equivalent of this XML in JSON?

<Person/>

There isn't an exact translation, but one representation could be:

{
 "type": "Person"
}

In other words, you're making up an "attribute" (a "key") for the JSON object, calling it "type," and giving it a "value" of "Person" (yup, JSON objects have attributes, and like XML element attributes, you can only use a given attribute-name once!). You could have called it anything -- "type" is nothing special.

Similarly, either this XML:

<Shirt>
 <Color></Color>
 <Fabric></Fabric>
</Shirt>

Or this XML:

<Shirt Color="" Fabric=""></Shirt>

Might become this JSON:

{
 "type" : "Shirt",
 "Color" : null,
 "Fabric" : null
}

(Though, getting back to that thing I mentioned earlier about whether an empty tagset is somehow "emptier" than an empty attribute set of quotes, you might argue that the values of this JSON object's "Color" & "Fabric" attributes/keys should be two quotes in a row, rather than the special keyword 'null'. More than I want to get into right now, but software that processes JSON would see the two differently -- null is emptier than the empty-quotes.)
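
If you're curious, here's a tiny Python sketch showing that the standard json module really does treat the two differently:

import json

pair = json.loads('{"Color": null, "Fabric": ""}')
print(pair['Color'])   # None -- JSON's null becomes Python's special "nothing" value
print(pair['Fabric'])  # prints an empty line -- a real text value with no letters in it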

The biggest difference from XML that the lack of names in JSON introduces is this notion of having to make up your own keyword for the name if you really think it needs a name.

Also, as far as how-to-type-JSON, "attribute" names & values on a JSON object are separated from each other with a colon, and there needs to be a comma between attribute name-and-value pairs. (Again, don't forget not to put a comma after the last attribute name-value pair!)

Furthermore, attribute names always go inside double-quotes in JSON, while the value can be something that doesn't have quotes around it (we haven't gotten there yet).

Sometimes your data might not need names. If you have a bunch of conceptual "items" back-to-back at the same level of nesting, and none of them need a "plain-text value" representing what they truly are, and all of them just have key-value pairs describing what they "have," JSON is a lot shorter to type. (Especially if you take out all the tabs & line breaks I'm putting in to make this blog readable. JSON authors love to take out line breaks & tabs -- if you run into such code, paste it here and click "Beautify" to read it more easily ... just make sure the "code" you're punching in isn't confidential company information!) Consider this example.

Here's some XML representing a fleet of cars, each of which has a different set of key-value traits we care about tracking (we'll use "attribute" style and the "tagset-with-nothing-inside" shorthand here), but all of which are cars.

<RootElement>
 <Car color="blue" trim="chrome" trunk="hatchback"/>
 <Car appeal="sporty" doors="2"/>
 <Car doors="4" color="red" make="Ford"/>
</RootElement>

Maybe we already know they're all cars based on the context of our data. A JSON equivalent, without forcing each one to have a silly attribute like "type" (with a value of "car"), could be:

[
 {
  "color" : "blue",
  "trim" : "chrome",
  "trunk" : "hatchback"
 },
 {
  "appeal" : "sporty",
  "doors" : "2"
 },
 {
  "doors" : "4",
  "color" : "red",
  "make" : "Ford"
 }
]
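
(If you handed that list to Python's standard json module, each car would simply come back as a dictionary containing only the keys that particular car defines -- a minimal sketch:)

import json

cars_json = '''
[
 {"color": "blue", "trim": "chrome", "trunk": "hatchback"},
 {"appeal": "sporty", "doors": "2"},
 {"doors": "4", "color": "red", "make": "Ford"}
]
'''

for car in json.loads(cars_json):
    # .get() hands back None when a given car simply doesn't have that key
    print(car.get('color'), car.get('doors'))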

So far, JSON doesn't look much more concise. But that note I made about "attribute values in JSON don't have to be in double-quotes" earlier is where its power really lies. Let's take a look at a shirt with two colors (blue, red) and 1 fabric (nylon). Here's the XML:

<Shirt>
 <Color>blue</Color>
 <Color>red</Color>
 <Fabric>nylon</Fabric>
</Shirt>

And here's some similar JSON (note that I didn't bother to force it to be called "shirt" and that I decided that saying "colors" would make more sense than "color"):

{
 "Colors" : ["blue","red"],
 "Fabric" : "nylon"
}

We're using the same square-brackets-and-commas notation to make a list out of "blue" and "red" that we used to put a bunch of JSON objects together into a data set full of cars. The entirety of the set of brackets is the value of this JSON object's attribute/key called "colors."

As you can see, XML and JSON get pretty different when it comes to writing down the fact that a conceptual "item" in your data set has multiple "keys" all with the same name, each with a different value.

  1. XML works like I just described it (multiple key-value pairs, where the key names are the same as each other). JSON doesn't allow that. That's something special to XML. If you like that about XML, there you go--use XML. :-)
  2. In JSON, you just give your object one "key" (maybe making the name of that "key" a plural noun) and you shove all those values into a list.

So, to recap the kinds of "value" you can give an attribute/"key" belonging to a JSON "object" (conceptual item):

  • We've seen that the "value" for a given attribute/"key" of a JSON object can be a piece of plain text (put it between double-quotes).
  • We've seen it be the keyword 'null' (no quotes around it).
  • We've seen it be a list (square-brackets, with commas separating multiple values).
    Note that you can certainly just have one value inside the square brackets, or they can even be empty if the list is empty. Just keep in mind that it's still a list of values to a computer, even if it's empty or a list of size 1.
  • We're about to see that it can be another JSON "object" (we're about to nest things, just like we did in XML!)

Let's go back to our data set that includes just 1 shirt, which is "blue and red and green" and made out of "leather and cotton," where the leather isn't washable but the cotton is.
A JSON representation of our data might look like this:

{
 "type" : "Shirt",
 "colors" : ["blue","red","green"],
 "fabrics" :
  [
   {
    "type" : "Leather",
    "washable" : "No"
   },
   {
    "type" : "Cotton",
    "washable" : "Yes"
   }
  ]
}

As a reminder, here was a short XML version of the same shirt:

<Shirt>
 <Color>
  Blue
 </Color>
 <Color>
  Red
 </Color>
 <Color>
  Green
 </Color>
 <Fabric Washable="No">
  Leather
 </Fabric>
 <Fabric Washable="Yes">
  Cotton
 </Fabric>
</Shirt>

What I notice the most is:

  • JSON is awkward when conceptual "items" in our data need to have attributes and a meaningful name of their own (like "shirt" and "leather" and "cotton")
  • XML is overly wordy by making conceptual "items" out of things that already were true "attributes" of what we humans really think of as our data's conceptual "items" (like "color" -- we're probably not really thinking of it as an "item" in our data set, because "color" is no more complicated than its value(s) -- it doesn't have any other traits about the color that we need to describe).
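
As one more small preview of the parsing we'll do in future posts, here's a minimal Python sketch that pulls values back out of the JSON version of our shirt (the variable name json_text is just a placeholder):

import json

json_text = '''
{
 "type" : "Shirt",
 "colors" : ["blue", "red", "green"],
 "fabrics" : [
  {"type" : "Leather", "washable" : "No"},
  {"type" : "Cotton", "washable" : "Yes"}
 ]
}
'''

shirt = json.loads(json_text)
print(shirt['colors'])  # a plain Python list: ['blue', 'red', 'green']
for fabric in shirt['fabrics']:
    # each nested "object" comes back as a Python dictionary
    print(fabric['type'], fabric['washable'])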

What's In It For You?

In the end, as a Salesforce administrator, what matters most is being able to recognize which kind of data the Salesforce servers have given you (or in which format the servers expect to receive data from you).

You don't exactly get to argue with Salesforce about which format they should have picked.

  • When you're picking apart data Salesforce sent you, you just need to know how to "parse" the file to extract the data of interest to you.
  • When you're composing a file to send Salesforce, you'll model it after an existing file they sent you (or technical documentation) to figure out exactly how they want you to arrange the details.

In future posts, we'll talk about writing Python code that can do both of these tasks.

  • Understanding the relationship between the textual representation of the data and what it means will be crucial to those exercises.
  • Understanding when you're looking at data that can be made "flat" and "consistent from one conceptual item in the data to the next" (that is, when you can interpret it like a table/CSV) and when you're looking at data that really doesn't have those characteristics is crucial as well. Remember how I "diverted" and pointed out that our fleet of cars with nothing but VINs could have just as easily been a 1-column CSV file? Ideally, you want to be able to make that kind of commentary, too, about entire XML/JSON files or about fragments of them.
    Remember to play with the graphical viewers (XML, JSON)!

Hopefully, this blog post will help you with both tasks by giving you a better understanding of what example data says and how it's shaped when you see it.

