Tuesday, September 26, 2017

Python To Facilitate Adding A New Field To Multiple Custom Report Types (involves XML)

Quick code dump: if you download all your "report types" with Eclipse, this can help you put together "fresh" copies of them for use with the Workbench Deploy web site (just zip them up). Very handy if you just created a custom field on an object that's involved in dozens of report types.

This is an example of why it's good to be able to work with XML in Python, too!

Hopefully I can clean up (or delete & replace) this post at a later date.

import os
from xml.etree import ElementTree
import xml.dom.minidom

def stripNSFromET(etRoot):
    # Strip the '{namespace}' prefix from every element's tag, at any depth
    for el in etRoot.iter():
        if '}' in el.tag:
            el.tag = el.tag.split('}', 1)[1]
    return etRoot

def isTableOfInterest(tableTagText, fieldsDatabaseTableAPIName):
    if fieldsDatabaseTableAPIName == 'Contact':
        return tableTagText == 'Contact' or tableTagText.endswith('.Contacts') or tableTagText.endswith('OpportunityContactRoles')
    else:
        return tableTagText == fieldsDatabaseTableAPIName or tableTagText.endswith('.'+fieldsDatabaseTableAPIName)

outpstr = 'C:\\example\\temppackagereporttypes\\'
pstr = 'C:\\examplepath\\EclipseWorkspace\\orgfolder\\src\\reportTypes\\'
fstr = 'C:\\examplepath\\EclipseWorkspace\\orgfolder\\src\\reportTypes\\Contact_With_Campaign_Members.reportType'

metadataAPIVer = '40.0'
fieldsDBTableAPIName = 'Contact'
fieldAPINames = ['New_Field_1__c','New_Field_2__c']
fieldAPINames.sort()

# SOME NOTES OF INTEREST:
# Contact is not only referred to in the normal way, but also often as OpportunityContactRole coming off an "Opportunity."  Not sure if ALL such OCRs are actually able to take Contact fields.
# This could make it a bit difficult to decide into what "Section," exactly, to add a new database field when adding it to every ReportType:
    # A given "Section" tag within a ReportType's XML file is NOT limited to containing "Columns" elements with the same database "Table" value.  For example, Graduate_Admissions.reportType, with a base object of Contact, has a section named "Recruit data" with fields from both the "Contact.Program_Interests__r" and "Contact" tables in it.
    # A given database "Table" value can appear in multiple "Section" tags within a ReportType's XML file (e.g. "Contact" fields could be spread across multiple sections).
    # CampaignMember often ends up repeated -- once as a top-level, once below itself -- in a ReportType where it is included (with fields duplicated and everything).  Good thing I'm not yet adding any CampaignMember fields to reports.

# SOME CODE TO REMEMBER:
# The number of tables mentioned in a ReportType file:  len({e.text for e in root.findall('./sections/columns/table')})
# The actual tables mentioned in a ReportType file:  {e.text for e in root.findall('./sections/columns/table')}
# The number of "sections" in a ReportType file:  len(root.findall('./sections'))
# The labels of the actual "sections" in a ReportType file:  {e.find('masterLabel').text for e in root.findall('./sections')}
# Iterating over a dict:  for k, v in d.items():


"""
Algorithm per ReportType file, parsed into an XML-ElementTree node called "root":
- Figure out how many 1st-level nodes called "sections" there are.
- For each 1st-level "section" node, if it's "of interest," set it aside as a key to a dict called "d",
    and a list of all its relevant child 2nd-level "columns" nodes as the values.
- Loop through "d" and set aside any keys that have the most list-items as a value, of all keys in "d";
    set aside a list of any such keys as "topSections"
- If there was just 1 such "top section" key, set that 1st-level "section" node aside as the variable "se"
    (This is, visually to the end user, where we'll be adding the new field.)
- If there was more than 1, arbitrarily pick 1 and set that 1st-level "section" node aside as the variable "se"
- Once we've picked a 1st-level "section" node as "se," cache a dict "uniqueTableNames" 
    of the distinct words appearing among its grandchild (3rd-level) "table" tag values as "key"
    and the count-of-table-tag-per-distinct-word as "value."
- Presuming there were any (I guess this serves as a sort of dummy-check for the section),
    instantiate a new ElementTree "columns" node called "newColumn,"
    append "newColumn" to "se" (make it 2nd-level)
    and flesh out "newColumn" with details of the field we want to add.
    (When fleshing it out, we arbitrarily pick a value for the 3rd-level "table" tag if "uniqueTableNames" had several keys.)

"""

changedThese = []
pkgroot = None
for i in os.listdir(pstr):
    root = None
    if not i.startswith('wrt_'):
        with open(pstr+i, 'r', encoding='utf-8') as f: # Open a ReportType metadata XML file
            root = stripNSFromET(ElementTree.parse(f).getroot()) # Store a reference to the root of the XML file currently open in a variable
            d = {}
            allFieldsForTableOfInterestColsInAllSectionsOfInterest = []
            for sec in root.findall('./sections'):
                columnsOfInterest = [e for e in sec.findall('columns/table/..') if isTableOfInterest(e.find('table').text, fieldsDBTableAPIName)]
                if len(columnsOfInterest) > 0:
                    d[sec] = columnsOfInterest # Add to "d" any "section" and a list of applicable "columns" inside it
            if len(d) < 1:
                continue # This file is not of interest if nothing got added to "d" (if it's not a ReportType that includes any fields of the object of interest) -- move on to next file
            else:
                print('\r\n' + i + ', baseObject:  ' + root.find('./baseObject').text) # Display which file we are working with and what its "Base Object" is
                se = None
                if len(d) > 0: # Why did I have this at ">1" when I found it 6 months later?  Should it be >0?
                    allFieldsForTableOfInterestColsInAllSectionsOfInterest = [item.find('field').text for sublist in d.values() for item in sublist]
                    if all(fldNm in allFieldsForTableOfInterestColsInAllSectionsOfInterest for fldNm in fieldAPINames):
                        continue # This file is not of interest if all fields are already in it
                    topSections = {k for k, v in d.items() if len(v) == max([len(arr) for arr in d.values()])}
                    if len(topSections) >= 1:
                        if len(topSections) == 1:
                            se = (next(iter(topSections))) # There was only 1 top-ranked section -- pick it
                        else:
                            topSecsWithLabelLikeTableNewFieldIsFrom = [s for s in topSections if s.find('masterLabel').text in [fieldsDBTableAPIName, fieldsDBTableAPIName+'s']]
                            if len(topSecsWithLabelLikeTableNewFieldIsFrom) > 0:
                                se = (next(iter(topSecsWithLabelLikeTableNewFieldIsFrom))) # I don't really care which it is, honestly
                            else:
                                se = (next(iter(topSections))) # I tried my best -- moving on.  Just picking a section.
                    else:
                        se = (next(iter(d.keys()))) # It was a 1-section file -- just pick the one section
                if se:
                    uniqueTableNames = {tbstr : [e.text for e in se.findall('columns/table')].count(tbstr) for tbstr in [e.text for e in se.findall('columns/table')]}
                    if len(uniqueTableNames) >= 1: # We can just tack our new column onto the only section that already has other Contact values
                        changedAnything = False
                        for fieldAPIName in fieldAPINames:
                            if fieldAPIName not in allFieldsForTableOfInterestColsInAllSectionsOfInterest:
                                changedAnything = True
                                newColumn = ElementTree.Element('columns')
                                se.append(newColumn)
                                newColumn.append(ElementTree.Element('checkedByDefault'))
                                newColumn.find('checkedByDefault').text = 'false'
                                newColumn.append(ElementTree.Element('field'))
                                newColumn.find('field').text = fieldAPIName
                                newColumn.append(ElementTree.Element('table'))
                                if len(uniqueTableNames) == 1:
                                    newColumn.find('table').text = next(iter(uniqueTableNames))
                                elif len(uniqueTableNames) > 1:
                                    newColumn.find('table').text = max(uniqueTableNames, key=uniqueTableNames.get)
                        if changedAnything:
                            se[:] = sorted(se, key=lambda x: x.tag) # Put masterLabel tag back at the end of the section.  We don't need to re-sort the fields because we might screw up where people were expecting to see them, if they weren't yet in alphabetical order.
                            root.set('xmlns','http://soap.sforce.com/2006/04/metadata')
                            if not os.path.exists(outpstr): os.makedirs(outpstr)
                            if not os.path.exists(outpstr+'reportTypes'): os.makedirs(outpstr+'reportTypes')
                            with open(outpstr+'reportTypes\\'+i, 'w', newline='', encoding='utf-8') as fw:
                                dom_string = xml.dom.minidom.parseString(ElementTree.tostring(root)).toprettyxml(encoding='UTF-8', indent='    ')
                                dom_string = '\n'.join([s for s in dom_string.decode('UTF-8').splitlines() if s.strip()]) + '\n'
                                fw.write(dom_string)
                            changedThese.append(i[:-11])
if len(changedThese) > 0:
    pkgroot = ElementTree.Element('Package', attrib={'xmlns':'http://soap.sforce.com/2006/04/metadata'})
    typesElem = ElementTree.Element('types')
    for x in changedThese:
        membersElem = ElementTree.Element('members')
        membersElem.text = x
        typesElem.append(membersElem)
    namesElem = ElementTree.Element('name')
    namesElem.text = 'ReportType'
    typesElem.append(namesElem)
    pkgroot.append(typesElem)
    verElem = ElementTree.Element('version')
    verElem.text = metadataAPIVer
    pkgroot.append(verElem)
    with open(outpstr+'package.xml', 'w', newline='', encoding='utf-8') as fw:
        dom_string = xml.dom.minidom.parseString(ElementTree.tostring(pkgroot)).toprettyxml(encoding='UTF-8', indent='    ')
        dom_string = '\n'.join([s for s in dom_string.decode('UTF-8').splitlines() if s.strip()]) + '\n'
        fw.write(dom_string)

print('all done')
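
Since the intro mentioned zipping the output for the Workbench "Deploy" page, here's a minimal sketch of doing that step in Python as well (it assumes the same output folder as above; the archive name is just a hypothetical example):

import shutil

# Zip up the generated package.xml + reportTypes folder so the resulting .zip
# can be uploaded to Workbench's "Deploy" page.  Workbench only cares about the contents.
shutil.make_archive('C:\\example\\temppackagereporttypes_deploy', 'zip',
                    root_dir='C:\\example\\temppackagereporttypes\\')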

Wednesday, September 13, 2017

Vendor-Provided Unit Test Bug "Fix" -- Just Add "SeeAllData=true"!

Wow, does one of our managed-package vendors have me riled up.

I detected a bug in their code that means an "@isTest"-flagged unit test can't INSERT a new record from various tables (including "User" records of a certain type of "Profile").

In other words, I can't write unit tests that have anything to do with several of our core tables.

I reported the bug and pointed out that:

  • such INSERT statements work fine against "the real database," executed in the "Execute Anonymous Window" of the Developer Console, and that
  • they work fine with the "Run Test" button in "@isTest(SeeAllData=True)"-flagged unit tests (a big no-no), but that
  • as soon as you wrap such an INSERT statement in a test method/class annotated simply with "@isTest," it fails against various closed-source-code triggers in their managed package.

Clearly, their triggers have some weird bugs that they never detected because they "cheated" when writing all their unit tests by adding "(SeeAllData=True)" to "@isTest" (that's the case in all the open-source work they custom-developed for us).
Their codebase doesn't seem to be able to hold up to an empty database the way Salesforce code is supposed to, so it seems they simply make sure they're never showing it off to the customer on an empty database and have all unit tests look at "actual data" in the database.
Lovely.

So what'd they do in response to my problem?

Added "(SeeAllData=True)" to my proof-of-bug code and told me everything was all better.

*glares*

No.

It. most. definitely. is. not.


P.S. In the 29 new lines of code you "added" to my proof-of-bug unit test as well, vendor, you exposed a 2nd broken closed-source trigger when I turned off your "(SeeAllData=True)." So thanks for that, I guess -- saved me opening a 2nd ticket? *cough* #NotThatYouShouldHaveFoundItYourselfOrAnything

Friday, July 14, 2017

Python & Oracle SQL Hash + Regexp Replace Equivalents -- random note to self

Oracle SQL, for pulling from ERP:

standard_hash(regexp_replace(COLUMNNAMEGOESHERE, '\s', ''), 'SHA1')

Python, for cleaning up old values already loaded from ERP into Salesforce:

[hashlib.sha1(str(val).encode('utf-8')).hexdigest().upper() for val in (PANDASDATAFRAMEVARIABLENAMEGOESHERE['COLUMNNAMEGOESHERE'].str.replace(r'\s', ''))]

Note to self, lest I forget. This strips the whitespace (including line breaks) out of text, then condenses it into a 40-character hash of itself. Used for turning a big chunk of text that I expect to be unique in a table into a shorter Salesforce "unique external ID" text field when the table containing the text doesn't have a standard "primary key."
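
For future reference, here's a minimal, self-contained sketch of the Python side end to end (the file path and column names are hypothetical placeholders, it assumes the ERP data has already been landed in a CSV, and note that newer versions of Pandas need regex=True spelled out on str.replace()):

import hashlib
import pandas

df = pandas.read_csv('c:\\tempexample\\erp_extract.csv', dtype='object') # hypothetical extract

# Strip ALL whitespace (spaces, tabs, line breaks), SHA-1 what's left, and upper-case
# the hex digest so it matches Oracle's STANDARD_HASH(..., 'SHA1') output.
df['TEXT_HASH'] = [
    hashlib.sha1(str(val).encode('utf-8')).hexdigest().upper()
    for val in df['BIG_TEXT_COLUMN'].str.replace(r'\s', '', regex=True)
]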

Thursday, July 6, 2017

REST APIs and Salesforce - A Primer

A non-programmer colleague was getting advice to integrate an external database with Salesforce by building a "REST API" inside of Salesforce, and I chimed in because, for a long time, the 3 different ways people pair "REST APIs" with "Salesforce" confused me.
 

Here's a breakdown of them that I hope will facilitate discussions with developers and help you learn the right kind of programming for your needs, if you're teaching yourself:
 

  1. When people mention "building REST APIs inside of Salesforce," they mean using the "Apex" language to program a specific Salesforce database ("org") to ACCEPT requests FROM THE OUTSIDE WORLD to engage in a computer-to-computer data-transferring conversation over the web.

    The data can potentially flow both ways (more in a second), but only via the outside world knocking on your Salesforce org's door and saying, "Hey, I've got another communications request!" in the exact manner that you've said you want your Salesforce org to be communicated with.

    That's what "writing an API" means -- it means using code to define how a computer is willing to have its door knocked on and be talked to.

    When you "build a REST API" inside of Salesforce, you're building a door in your fortress and posting a guard by it who's cautious about how she interacts with strangers on the other side.
    • You might program your Salesforce org to know what to do with data from the outside world, but only when delivered correctly (e.g. "If you have something for the fortress, bring it in a series of pink 12x12x12 boxes, only paint the first one green and the last one purple, and put your name and return address on all of them -- if we like them, we'll figure out what to do with the stuff inside the boxes from there, and after we're done, we'll post a receipt on the door telling you whether we handled them neatly or burned them").
    • Or you might program your Salesforce org to hand out data from inside the fortress to the outside world, if asked just right ("Tap out 'shave-and-a-haircut-two-bits,' then the morse code for your name and return address, and a description of the data you want, and we'll FedEx a list to you").
    • Or you might set up communications protocols for each, to really facilitate 2-way conversation -- yet note that every piece of that conversation is initiated by some "outsider" piece of software knocking on your fortress's door in a specific way, no matter which way the "data you care about" is flowing.
     
     
  2. There's ALSO a general-purpose API, built into every Salesforce database ("org") in existence, that uses the "REST" communications protocol. Salesforce defines and maintains how it works. It's a door that Salesforce the corporation cut into your fortress and posted a guard at.

    This "API" is what the authors of "ETL" tools like Apex Data Loader & dataloader.io & Jitterbit & Zapier study carefully before writing such tools.

    Technically, you can write your own tool taking advantage of this protocol (there's a small Python sketch after this list), but most people don't bother, since other companies have done such great work.

    Also, if you do option #1, you can sometimes INSERT more data in fewer "API calls" than by using tools that leverage the general-purpose Salesforce "REST" data API.

    Both option #1 & option #2 give you the chance to write code inside your Salesforce database that post-processes incoming data according to your business's needs -- although in the case of approach #1, you might bundle that logic into the back end of the API for efficiency, whereas in the case of approach #2, you have to use "triggers" and such.
     
     
  3. Finally, you can write Apex code within your Salesforce org that goes knocking on the doors of OTHER fortresses.

    Doing so might involve writing Apex that uses the "REST" protocol and someone else's "API," but it's not at all the same thing as the 1st concept we discussed.

    When learning how to do this, you'll often hear terms like "writing asynchronous Apex" and "making HTTP requests with Apex."
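
For concreteness, here's a minimal sketch of what "rolling your own tool" against approach #2 can look like from Python (the instance URL and access token are hypothetical placeholders, and it assumes you've already obtained an OAuth access token for your org):

import requests

instance_url = 'https://yourInstance.salesforce.com' # hypothetical
access_token = 'REPLACE_WITH_OAUTH_ACCESS_TOKEN'     # hypothetical

# Knock on the door of the built-in REST data API and ask for a few Contacts.
resp = requests.get(
    instance_url + '/services/data/v40.0/query/',
    headers={'Authorization': 'Bearer ' + access_token},
    params={'q': 'SELECT Id, LastName FROM Contact LIMIT 5'},
)
for record in resp.json().get('records', []):
    print(record['Id'], record['LastName'])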

Thoughts:

Most admins' approaches to connecting an outside database land in the realm of approach #2, using someone else's "ETL" tool -- maybe with some "developer" help to write "triggers" to post-process incoming data according to your business needs. It's the easiest approach by far.

Some "apps" and "packages" and "plugins" might have done #1 as part of the software package they deliver to you (that is, they might cut a hole in your fortress and post a guard).

Personally, I've barely dabbled in "approach #2, roll-your-own-ETL-tool," and I've only been to classes and read books on approaches #1 & #3.

Watching from the outside, it seems to me that "approach #1" & "approach #2, roll-your-own-ETL-tool" achieve pretty similar business-need ends and require similar levels of developer skill.

  • It's just a question of whether you prefer to be constrained to programming in Salesforce's language, yet have more control over Salesforce database performance (approach #1),
  • Or whether you prefer to be a little less efficient, but have more control over the language in which you program ("approach #2, roll-your-own-ETL-tool").

(Although even in the case of approach #1, you'll probably be writing some sort of "ETL" software in your favorite language that does the actual knocking on your Salesforce database's ("org's") door -- it'll just probably be a lot simpler to write than whatever you would've written that relies solely upon Salesforce's standard, built-in API.)


Finally, a brief analogy for any Oracle developers following along:

REST-based data transmission : data flow in & out of Salesforce databases :: ODBC/JDBC-based data transmission : data flow in & out of traditional relational databases

Specifically, the built-in "approach #2" to using REST might be most comparable -- not sure how many database management systems let you write your own APIs into them (#1) or make callouts to internet services from them (#3).

Wednesday, June 28, 2017

Neat PDF About When To Consider Python

Neat poster here (PDF) by Roxanne Johnson indicating when to consider Python for data projects, when to stick with Excel, how to find self-help resources if you delve into Python, etc.

Large page and small print, so zoom in.

Wednesday, May 24, 2017

Job-Hunting Advice For Career Changers -- Talking About Transferable Skills

I'm mentoring a student worker as she searches for jobs after finishing her graduate degree at our university. She came to graduate school full-time to learn skills in a new field, which means she has recently learned a number of new programming languages in an academic setting, rather than on the job. Unsure how to keep recruiters on the phone and find some way to make her several years of work experience before and during graduate school "count" towards a job using these programming languages, she asked me if I had any advice. Here are some tips I gave her.

  • The written job description is gold to you.
    • The bottom half usually describes the technical skills you're afraid of seeming too new at. Don't get too wrapped up in it.
    • The top half usually describes the job itself. It's where a career-changer like you will find the most "source material" for talking about why you're qualified to do the job.
      • You wish they would "give you a chance" to "learn on the job," right?
        • Well, they won't. Not if you put it like that.
        • But if you point out how you've done things just like that, only different, but pretty much the same, and HERE'S HOW -- well, now you sound like someone ready to "jump into the deep end of the pool," don't you?
          Welcome to the world of "selling your transferable skills."
      • The "what this position does" "top half" of a job description is where you find out exactly what you need to say your prior experience is "just like," and where you'll figure out the difference between your prior experience and the future job (so that you can explain what other transferable skills you have that address the difference).
    • If you don't have a written job description because a recruiter is summarizing a job orally and focuses on the "bottom half," confidently direct them to tell you about the "top half" so you can direct the rest of the phone call from a position of strength.
      • Example: "Ah, yes, I have worked with those technologies in my graduate coursework, and I spent a year at ___ and three years at ___ doing related work with [related technology or transferable experience/skills] -- that might be a good fit. Can you tell me more about the job responsibilities and day-to-day work?"
  • Rehearse, rehearse, rehearse. Practice makes perfect.
    • Treat this like you would ask a child to treat a role in a "school play."
      1. Memorize the lines
      2. Look in the mirror and practice the lines -- master your projection of confidence and your ability to remember your lines while looking someone in the eyes / get over any sense of disgust at watching yourself.
      3. If you have time, look in the mirror again and practice making your lines your own -- practice being easy and relaxed and you while delivering them.
  • Of course, you have to write your own lines -- you're the playwright as well. But you already know what the play is going to be about -- it's going to be about the job in the advertisement and about you! So write that script!
    • Print a copy of the job description and a copy of your resume. Maybe grab some spare paper if there isn't a lot of room in the margins.
    • Look through the "job duties" and TRANSLATE them from the way they're written to "something your grandmother would understand when she asks what you do for a living."
    • Look through the "job duties" and find the IMPLIED job duties. (For example, if you have to "deliver" something, implied transferable skills & experiences might be "ahead of schedule" or "to very important people." Another example -- if it talks about "different" people you'll be working with ... implied is, "You'll be pulled in a million different directions and need to be able to handle it politely." If you've done that before ... yay, you have experience in something they need!)
    • Now, for all the plain-English and implied real job duties, figure out what you've already done, at school or at work, and write notes to yourself.
    • Finally, go through your notes and put together little stories about how great at the job you'd be, mentioning that you see they need the position to do ____, and how that sounds like the time in ____ when you _____ed.
      • Repeat until you're out of such stories -- then start this process over for the next job advertisement!
      • You might want to have a few different versions of any story that's too technical -- the "highly technical" (but still 60 seconds or less) version and the "explain it to a 6-year-old in 15 seconds or less" version. Always start by sharing the "explain it to a 6-year-old, 15 seconds or less" version of a story, only going into the "highly technical" version if pressed for more details.
      • When talking about how you have ___ experience on the job and ___ experience at school, don't leave it to the interviewer to infer that the two can be added together into a competence. This is YOUR JOB and is why you're rehearsing. If you'd like, you can start with phrases like, "In my experience" or "Given my experiences on the job and in my coursework," which have 3 advantages:
        1. You don't have to worry about seeming like you're being contradictory -- it's just your opinion. So it's polite / deferential.
        2. It's hard for someone to challenge the accuracy of. It's just your opinion. So it's confident (because it's hard to challenge).
        3. It brings attention to the fact that you have experience and that you can put all of your experience together into one big package that's useful to them.
  • This same process can also be used when deciding what to write in a cover letter -- only you have to be much, much more concise and not tell as many stories.
  • Finally, if you're worried about professional "word choice" when answering difficult questions, read Alison Green's 'Ask A Manager' advice blog religiously. Eventually, you'll pick up on her style, which I think is excellent.

Good luck!

Python CSV Example - Back-Filling Nearby Zip Code Distances

Today I used the following small Python script to add a bunch of distances to a new field called "Miles_From_Us__c" on 40,000 pre-existing Contact records.

The process was as follows:

  1. Export "SELECT ID, MAILINGPOSTALCODE, MILES_FROM_US__C WHERE MAILINGPOSTALCODE != NULL AND MAILINGPOSTALCODE != '' AND MILES_FROM_US__C = NULL" from our Salesforce org to a CSV file on my hard drive at "c:\tempexample\contactzipextract.csv" using the "Apex Data Loader" software.
  2. Run the Python script in this blog post instead of bashing my head against the wall with Excel.
  3. Import the CSV file on my hard drive at "c:\tempexample\contactzipupdate.csv" back into our Salesforce org as an "UPDATE" operation using the "Apex Data Loader" software.

 

import pandas

zips = ['11111','22222','33333','44444']

zipstodist = {'11111':0, '22222':1, '33333':9, '44444':21}

df = pandas.read_csv('c:\\tempexample\\contactzipextract.csv', dtype='object')

df = df[df['MAILINGPOSTALCODE'].str[:5].isin(zips)]

df['MILES_FROM_US__C'] = df['MAILINGPOSTALCODE'].str[:5].apply(lambda x : zipstodist[x])

df[['ID','MILES_FROM_US__C']].to_csv('c:\\tempexample\\contactzipupdate.csv', index=0)

Here's an explanation of what my script is doing, line by line, for people new to Python:

  1. Says that I'll be using the "Pandas" plug-in to Python.
  2. Says which U.S. postal codes ("zip codes") I care about as a Python "list" that I give a nickname of "zips."
    • I got this from the first column of a CSV file of "zip codes of interest" that a consultant sent one of my users.
  3. Defines the "distance from us," in miles, of each zip code of interest, as a Python "dict" that I give a nickname of "zipstodist."
    • I got this from the first and second columns of a CSV file of "zip codes of interest" that a consultant sent one of my users.
    • I could have cross-checked the two CSV files in Python, but I had already transformed the CSV data into this list style when writing some Salesforce Apex code for future maintenance, so copying/pasting it into Python was easier in this case.
  4. Reads my CSV file into Python as a Python "Pandas dataframe" that I give a nickname of "df."
  5. Filters out any rows where the first 5 characters of the "MAILINGPOSTALCODE" value aren't among my zip codes of interest.
  6. Sets each row's "MILES_FROM_US__C" to the distance corresponding to the first 5 characters of that row's "MAILINGPOSTALCODE" value.
  7. Exports the "ID" & "MILES_FROM_US__C" columns of my filtered, transformed dataset out to a new CSV file on my hard drive.