
Monday, August 22, 2016

Python for Salesforce Administrators - Before You Begin Programming

Python is a programming language.

It's known for being extremely straightforward to read and write.

The "plain-old" version of the Python language comes with a good number of useful commands. On top of that, though, "modules" exist that, once downloaded into the proper folder on your computer, add even more commands to Python.

Python has gained a reputation as one of the best programming languages available for ordinary people who need to quickly process data. In my opinion, here's why:

  1. It's so easy to program
  2. So many useful data-processing "modules" are available
  3. Software is available that lets you run Python programs on your computer without "installing" anything (which you might not be allowed to do at work)

That said, there are a few confusing things I should warn you about.

Python 2 vs. Python 3

The first one is that there are TWO SLIGHTLY DIFFERENT versions of the language that are in use, side-by-side, right now. One is "Python 2." The other is "Python 3."

Between version 2 & 3, the inventors of Python and its "modules" changed the way you type certain commands to make them run.

This means that a program you write for "Python 2" may not run in software that can run "Python 3" programs, and vice-versa.

But more importantly, it means that some super-cool "module" you found online may only run in software that can run one version of Python or the other. (Because all those new commands that a "module" gives you are just mini-Python-programs behind the scenes, and they have the same compatibility issues as the software you write.)

Therefore, when you put software that can run Python on your computer, I recommend having 2 copies - one that can run "programs written for Python 2," and one that can run "programs written for Python 3."

If you have trouble running a program in one piece of software, try running the same program in the other piece of software. Or Googling how to change your code to make it more appropriate for the piece of software you're running it in.

Actually ... What I really recommend is starting with software to run "Python 3" and only adding something to run "Python 2" if you need to write a program that relies on a module that's incompatible with "Python 3."

The Language

As I've mentioned, Python is a programming language that you can use to write software.

Modules

"Modules" are downloadable Python mini-programs that can save you the trouble of writing hundreds of lines of code, letting you write just a few words of code instead. As far as you're concerned, they just make the set of available commands in Python bigger.

To refer to a "module's" command in a Python program you're writing, you have to write "import ModuleNameHere" somewhere above the line where you type that command.
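
For example, once the "pandas" module is imported, a single command can read an entire CSV file into memory. (This is just a tiny sketch - the file path is made up.)

import pandas
spreadsheet = pandas.read_csv('c:\\temp\\example.csv') # made-up file path
print(spreadsheet)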

In addition to that, the "module" itself (the mini-Python-program) needs to be downloaded to your computer somewhere that your "software that runs Python programs" can find it.

Typically, when you download "software that runs Python programs," it comes bundled with a few dozen of the most common modules pre-downloaded where they need to be.

For example, we may make use of "csv" and "pandas" - those often come in such bundles.

If for some reason a module you need doesn't, you'll need to download it into the proper folder on your computer yourself.

To make that easier, certain "pieces of software for running Python programs" come bundled with another program called "pip" and a command-line interface for interacting with it.

If your software has this, it's by far the easiest way to download modules to your computer.

Typically, you just type something like "pip install pandas" into that special command line, and it finds it on the internet and downloads it to exactly where it needs to go.

In my instructions, I'll presume you figured out how to "pip" or download-and-correctly-unzip any modules I refer to. (Always just try "import ModuleNameHere" in your code and see if your software complains about not being able to find it before going out of your way to download a module.) Feel free to write me in the comments area if you need more help.

"Interpreters" & "IDEs"

Finally, you'll want an "IDE." Again, it often comes bundled with "pieces of software for running Python programs." IDE stands for "Integrated Development Environment."

You'll typically save your software on your hard drive in plain-text files with the extension ".py".

You can edit these files in a simple text editor like "Notepad" and then type commands to "run" these files into the command-line interface of your "software that can run Python programs" (this is an interface to its "Python interpreter").

But even better is having a text-editor with a "play button" that runs whatever you're editing and displays any textual output in another corner of the screen. And that color-codes the code you're writing so you can tell what you're doing, and that tells you when you are missing something you should have written, etc. That's an IDE!

Click here to learn more about setting yourself up in Windows or, for this blog's exercises only, following along in your web browser.

"Print" statements

Writing text to "standard output" (which means "whatever part of your screen the software running your code thinks you might see some text in if it put it there") requires using a "print" command in the Python language.

This is the Python 2 vs. Python 3 split again: if you want the phrase "hello world" to show up in this "standard output" area, software that runs Python 3 requires you to write the command "print('hello world')", whereas software that runs Python 2 also accepts the older form, "print 'hello world'".

Figure out early on what your software for running Python programs prefers, and write your code accordingly. Note that Stack Overflow (a Q&A site about programming) code examples or this blog's code examples may have "print" statements written differently than you need to write them.
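
A quick illustration (this version runs as-is in software that understands Python 3):

# Python 3 treats "print" as a function, so parentheses are required:
print('hello world')
# Python 2 also accepted the older statement form below, which is a syntax error in Python 3:
# print 'hello world'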

My Windows-Centrism

I don't have access to a Mac, or a Linux machine, so I'll be writing this blog based on using Windows.

I have both "2.x.x.x" & "3.x.x.x" versions of WinPython on my machine, unzipped to folders I created under "My Documents" (rather than "installed" on my computer). I really like all the modules, "pip," and "IDE" features bundled with it, and the fact that I don't have to "install" it on my computer (i.e. have administrator rights).

Click here for an installation step-by-step guide for Windows & "Python 3."

Here is a list of a few other such bundles.


If you're on a Mac or a Linux machine, I apologize, as I've completely glossed over actually setting up your computer so you can run Python. Give it a try yourself, but I am here if you need me - just leave a message in the comments.


Table of Contents

Python For Salesforce Administrators - Introduction & Table of Contents

When you're a Salesforce administrator, you have to keep your database clean.

To that end, you probably export a lot of records from Salesforce to CSV. Then you manipulate that CSV file in Excel. Finally, you re-import the resulting CSV into Salesforce.

Excel is great - but it may not always be the fastest tool available.

For example, what if you have two spreadsheets worth of data that you need to match against each other based on values in certain columns and combine?

You would have to use extremely clunky VLOOKUP() or INDEX(MATCH()) operations in Excel.

And if you wanted to match based on data in multiple columns, first you would have to copy/paste those columns' data "concatenated" into a new column.

There is a better way!
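
As a preview, here's a rough sketch of that multi-column matching done with the "pandas" module (the file names and column names are made up):

import pandas
signups = pandas.read_csv('c:\\temp\\event_signups.csv') # made-up export with FirstName, LastName, EventDate columns
contacts = pandas.read_csv('c:\\temp\\salesforce_contacts.csv') # made-up export with FirstName, LastName, Id columns
# Match the two files on BOTH name columns at once - no helper "concatenation" column needed
combined = pandas.merge(signups, contacts, how='left', on=['FirstName', 'LastName'])
combined.to_csv('c:\\temp\\signups_with_contact_ids.csv', index=False)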

In this "Python For Salesforce Administrators" series of posts, I introduce Python as a tool for performing data-transformation operations that Salesforce administrators might find themselves needing to do.

(If this goes well, we might even move beyond CSV-style data into the "XML"-style data that represents the configuration of your Salesforce org.)

This post will be edited with links to posts in this series as they are published. There are 6 so far - check them out and stay tuned for more.

Table of Contents

Friday, July 29, 2016

Marketo Activities -> Salesforce Tasks with REST and Python

A year and a half ago, we had to turn off 2-way syncing in Marketo. It was messing up "Contact" record fields too badly.

Unfortunately, that also meant we lost the functionality of letting Marketo create a new "Task" for a Contact's "Activity History" every time we sent them an email. So we've got a year-and-a-half gap in such "Task" records in Salesforce to fill. (It's okay for it to be a one-off, because we're consolidating mass-mailing systems between departments and getting rid of Marketo.)

There's no easy "export a CSV file of every email we've ever sent every person in Marketo" feature in Marketo's point-and-click web-browser-based user interface. Tech support said we'd have to write code to query their REST API.

My basic algorithm is as follows:

  1. Use Python ("requests" library) code against the Marketo API to get a "dict" of every "Email Sent" activity in Marketo (and which LeadId it belongs to and the name of the email according to Marketo)
  2. Use Marketo's export functions against "All Leads" to get a CSV file containing a mapping of Marketo Lead IDs to Salesforce Contact IDs & save it to my hard drive.
  3. Use Salesforce's Data Loader to get a CSV file containing all existing "Task" records from the old "Marketo Sync" email tracking & save it to my hard drive.
  4. Use Python ("pandas" library) code to append Salesforce IDs from the CSV in step #2 to the "dict" in step #1, while subtracting emails Salesforce already knows about (match on "WhoId" & "Subject" in CSV from step #3). Write the result out to CSV.
  5. Use Salesforce's Data Loader to import the CSV from step #4 into the "Tasks" table of Salesforce

The Python script is:

import requests
baseurl = 'https://xxx.mktorest.com'
clientid = 'cid'
clientsecret = 'secret'
apitaskownerid = 'xxx' #(Marketo API sf user)
activsincedatetime = 'yyyy-mm-ddThh:mm:ss-GMTOffsetHH:00'
accesstoken=requests.get(baseurl + '/identity/oauth/token?grant_type=client_credentials' + '&client_id=' + clientid + '&client_secret=' + clientsecret).json()['access_token']
firstnextpagetoken = requests.get(baseurl + '/rest/v1/activities/pagingtoken.json' + '?sinceDatetime=' + activsincedatetime + '&access_token=' + accesstoken).json()['nextPageToken']

def getactivs(pagetok):
    # Walk Marketo's activity feed one page at a time, collecting every "Email Sent" activity
    ems = []
    activsbatchjson = requests.get(baseurl + '/rest/v1/activities.json' + '?nextPageToken=' + pagetok + '&activityTypeIds=INSERTNUMBERHERE' + '&access_token=' + accesstoken).json()
    if 'result' in activsbatchjson:
        ems.extend(activsbatchjson['result'])
    if activsbatchjson['moreResult'] != True:
        return ems
    else:
        # More pages remain - recurse with the next paging token and keep accumulating
        return ems + getactivs(activsbatchjson['nextPageToken'])

emailssent = getactivs(firstnextpagetoken)

import pandas
activdf = pandas.DataFrame(emailssent, columns=['leadId', 'primaryAttributeValueId', 'primaryAttributeValue', 'activityDate'])
leaddf = pandas.read_csv('c:\\temp\\downloadedmarketoleads.csv',usecols=['Id', 'Marketo SFDC ID'])
joindf = pandas.merge(activdf, leaddf, how='left', left_on='leadId', right_on='Id')
joindf.drop(['Id'], axis=1, inplace=True, errors='ignore')
joindf.rename(columns={'Marketo SFDC ID':'WhoId'}, inplace=True)
joindf['Status'] = 'Completed'
joindf['Priority'] = 'Normal'
joindf['OwnerId'] = apitaskownerid
joindf['IsReminderSet'] = False
joindf['IsRecurrence'] = False
joindf['IsHighPriority'] = False
joindf['IsClosed'] = True
joindf['IsArchived'] = True
joindf['Custom_field__c'] = 'Marketo Sync'
joindf['Subject'] = joindf['primaryAttributeValue'].map(lambda x: 'Was Sent Email: ' + x)
joindf['Description'] = 'Marketo email history backfill'

existingtasksdf = pandas.read_csv('c:\\temp\\downloadedsalesforcetasks.csv') # SELECT ActivityDate,CreatedDate,Description,LastModifiedDate,Subject,WhoId FROM Task WHERE Custom_field__c = 'Marketo Sync' AND Subject LIKE 'Was Sent Email: %' AND IsDeleted = FALSE
# Flag every Task Salesforce already has, left-join those flags onto the new rows,
# and keep only the rows with no match - those are the Tasks that still need inserting
existingtasksdf['matched'] = True
not_existing = pandas.merge(joindf, existingtasksdf, how='left', on=['WhoId','Subject'])
not_existing = not_existing[pandas.isnull(not_existing['matched'])]
not_existing.drop(['matched'], axis=1, inplace=True, errors='ignore')
not_existing.to_csv(path_or_buf='c:\\temp\\newtasksreadyforinsert.csv', index=False)

(Note - not a lot of thought given to security with respect to this script and its use of login credentials in a GET (although HTTPS) call. It's a one-off script, we're about to shut down the system, and I deleted the credentials from Marketo's configuration shortly after running the script. Your requirements may vary - you may need a more robust way of authenticating.)


P.S.

Updated code coming soon - I had to change the "getactivs" section of the code from a recursive solution (not sure why that came to mind first...it just did that day...) to an iterative one because it errored out when fetching half a million records 300 at a time.
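
In the meantime, here's a rough sketch of the shape the iterative version takes (not the final updated code - it reuses the "baseurl," "accesstoken," and placeholder values defined in the script above):

def getactivs(pagetok):
    # Loop instead of recursing, so half a million activities fetched a few hundred
    # at a time don't blow past Python's default recursion limit
    ems = []
    while True:
        activsbatchjson = requests.get(baseurl + '/rest/v1/activities.json' + '?nextPageToken=' + pagetok + '&activityTypeIds=INSERTNUMBERHERE' + '&access_token=' + accesstoken).json()
        if 'result' in activsbatchjson:
            ems.extend(activsbatchjson['result'])
        if activsbatchjson['moreResult'] != True:
            return ems
        pagetok = activsbatchjson['nextPageToken']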

I also ended up dumping the Marketo API half-million-record response's output to CSV and commenting out the post-processing part of the code, then commenting out the API-fetch and running post-processing against that CSV. The API-based fetch wasn't always getting a complete data set. (I suspect my authorization key might have been expiring. However, by the time I ran it with code to debug the problem, I was working at start-of-business, and - probably due to lower network traffic - the fetch finally ran without issue, so I'll never know. All I know is I got my data once, and once works for me.)

Finally, I ran into some snags with data Pandas couldn't read from CSV in the API output (some sort of weird em-dash in the data, I think), so the updated code will include even more low-performance kludges to work around that. (Using the CSV module and a loop to build a list-of-dicts and having Pandas read that worked.)


Wednesday, July 20, 2016

Big Job: Apex vs. Manual CSV File Manipulation vs. Python-Is-Cool

In our Salesforce org, we have an object called "Extended Contact Information" that's in a "detail" relationship hanging off of "Contact."

It lets us use record-types to cluster information about Contacts that is only of interest to certain internal departments (and lets us segment, department-by-department, their answers to questions about a Contact that they might feel differently about ... such as whether they're the best way to get ahold of a company).

We also have a checkbox field for each internal department that lives on Contact, and that needs to be checked "true" if a department is working with someone. (Whenever a department attaches an "Extended Contact Information" object to someone, we have a trigger in place that checks that department's corresponding box on "Contact," but it can also be checked by hand.)

Our computers mostly care about this checkbox, but humans care about the "Extended Contact Information" record, so today I did an audit and noticed 4,000 people from our "Continuing Ed" department were flagged in the checkbox as working with that department but didn't have "Extended Contact Information" records of that type attached.

I wrote the following Apex to execute anonymously, but due to a bazillion triggers downstream of "Extended Contact Information" record inserts, it hits CPU time execution limits somewhere between 100 at a time & 300 at a time.

List<Contact> cons = [SELECT Id, Name FROM Contact WHERE Continuing_Ed__c = TRUE AND Id NOT IN (SELECT Contact_Name__c FROM Extended_Contact_Information__c WHERE RecordTypeId='082I8294817IWfiIWX') LIMIT 100];

List<Extended_Contact_Information__c> ecisToInsert = new List<Extended_Contact_Information__c>();

for (Contact c : cons) {
    ecisToInsert.add(new Extended_Contact_Information__c(
        Contact_Name__c = c.Id,
        RecordTypeId='082I8294817IWfiIWX'
    ));
}

insert ecisToInsert;

Probably the fastest things to do are one of the following:

  1. run this code manually 40 times at "LIMIT 100"
  2. export the 4,000 Contact IDs as a CSV, change the name of the "Id" column to "Contact_Name__c," add a "RecordTypeId" field with "082I8294817IWfiIWX" as the value in every row, and re-import it to the "Extended Contact Information" table through a data-loading tool

But, of course, the XKCD Automation Theory part of my brain wants to write a Python script to imitate option #2 and "save me the trouble" of exporting, copying/pasting, & re-importing data. Especially since, in theory, I may need to do this again.

TBD what I'll actually go with. Python code will be added to this post if I let XKCD-programmer-brain take over.

As a reminder to myself, here's a basic Python "hello-world":

from simple_salesforce import Salesforce
sf = Salesforce(username='un', password='pw', security_token='tk')
print(sf.query_all("SELECT Id, Name FROM Contact WHERE IsDeleted=false LIMIT 2"))

Output:

OrderedDict([('totalSize', 2), ('done', True), ('records', [OrderedDict([('attributes', OrderedDict([('type', 'Contact'), ('url', '/services/data/v29.0/sobjects/Contact/xyzzyID1xyzzy')])), ('Id', 'xyzzyID1xyzzy'), ('Name', 'Person One')]), OrderedDict([('attributes', OrderedDict([('type', 'Contact'), ('url', '/services/data/v29.0/sobjects/Contact/abccbID2abccb')])), ('Id', 'abccbID2abccb'), ('Name', 'Person Two')])])])

And this:

from simple_salesforce import Salesforce
sf = Salesforce(username='un', password='pw', security_token='tk')
cons = sf.query_all("SELECT Id, Name FROM Contact WHERE IsDeleted=false LIMIT 2")
for con in cons['records']:
    print(con['Id'])

Does this:

xyzzyID1xyzzy
abccbID2abccb

Whoops. Looks like there aren't any batch/bulk insert options in "simple-salesforce," and I don't feel like learning a new package. Splitting the difference and grabbing my CSV file with Python, but inserting it into Salesforce with a traditional data-loading tool.

from simple_salesforce import Salesforce
sf = Salesforce(username='un', password='pw', security_token='tk')
cons = sf.query_all("SELECT Id, Name FROM Contact WHERE Continuing_Ed__c = TRUE AND Id NOT IN (SELECT Contact_Name__c FROM Extended_Contact_Information__c WHERE RecordTypeId='082I8294817IWfiIWX') LIMIT 2")

import csv
with open('c:\\temp\\sfoutput.csv', 'w', newline='') as csvfile:
    fieldnames = ['contact_name__c', 'recordtypeid']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for con in cons['records']:    
        writer.writerow({'contact_name__c': con['Id'], 'recordtypeid': '082I8294817IWfiIWX'})

print('done')

And then the CSV file looks like this:

contact_name__c,recordtypeid
xyzzyID1xyzzy,082I8294817IWfiIWX
abccbID2abccb,082I8294817IWfiIWX

Friday, June 3, 2016

Database Normalization And Salesforce - Ponderings

One of the biggest challenges in moving from a highly structured Oracle environment into Salesforce is how denormalized Salesforce's querying / reporting / data-viewing limitations encourage one's data model to be.

This particularly seems to be an issue in higher education. Like banking or healthcare, most of its data is rigid, with the typical business user's number one concern being easily expressed as, "Please don't mess up my data." Overall, higher education's business needs practically beg for a university's databases to be insanely normalized.

Except on the fringes, that is. On the fringes, a university's interactions with the outside world stop being about individual human beings who care about "their" data "not being messed up." Prime "fringe" examples include:

  1. Email-address-based marketing, which is free and great, but hard to pin down to an actual person because people share email addresses
  2. Continuing education and seminars, where it's not really a big deal to swap attendees - if it's corporate training, who at the sending company cares who comes as long as someone gets trained?

These "fringe of the university's business" use cases, plus staff's desire to see the two sets of data married, seem to tempt universities to look at databases beyond their 20-year-old ERPs. (Of course such "data marriages" also desire the tall order of doing so without polluting vital information like the admission-graduation pipeline, employment history and paychecks, donation history!)

  • Unfortunately, this marriage means you're back to squeezing "fuzzy" data into a normalized structure. Which, in theory, you could do in your ERP if it gave you an easy way to add tables (and user interfaces to those tables).
  • Then there's the route of struggling to squeeze highly-normalized "vital" data into a "flat" structure like Salesforce encourages. Unfortunately, this can mean you're always fighting inaccuracy, lag, duplication...

Integrating Salesforce and our ERP (plus a few new data stores) is a much longer road than I was expecting, and it all keeps coming down to issues with matching a data model to the real world.

Banner is an amazing set of highly normalized tables for "don't lose my data" core university functions.
(If you get a look at the EnrollmentRx plugin for Salesforce as a Banner user, you'll be amazed - it's basically SPAIDEN/SRARECR/SAAADMS/SOATEST built into a newer database and restricted to "1 Contact/SPAIDEN record per email address." Everyone re-invents the wheel for higher ed because, like banking, some things just don't change.)

I think Sungard/Ellucian really missed the boat here.

They could have raked in the big bucks if they'd "taken care of the fringes" and:

  1. Given Banner just a few tweaks for easily adding/removing tables (and corresponding user interfaces)
  2. Added a good way to upsert the database 1-record-at-a-time
    (e.g. web forms - as FormAssembly's Salesforce connector proves, you really just need to have a user interface that lets a form-handler-configuration-managing user build complex SQL queries sorted by LastModifiedDate descending - not everything has to be "common-matched" - similar idea for fast deduplication ... DemandTools, which is to PL/SQL what Cognos is to SQL, exists because a good CRUD API to the Salesforce database exists)
  3. Engineered a great "over"-layer for email tracking and sending - especially by admissions departments.

For example, what if there were some sort of API that let you quickly create/update/delete SPACMNT records from what's otherwise basically a Cognos report? Like, a little pencil-shaped "edit" button near the list of comments, but otherwise, nothing is editable - it's just the read-only report (doing nice things like hiding details of inactive records and only showing you the most-recent SRARECR/SAAADMS/SGASTDN data available, as determined by the report-author)? Or a little "edit" button next to "status" that lets you pick from a picklist, and then when you commit it, the "interactive report" author has programmed logic to update SRARECR/SAAADMS/SGASTDN accordingly? That's basically what one of our main departments wants. A "collapsed, current, relevant" view of dispersed back-end data, with a few "edit" buttons interspersed so they don't have to surf through the normalized tables themselves to make updates.

The other major reason we tried Salesforce was because it had plugins for the web/email era and Banner didn't. (And a final reason was to accommodate those "fuzzy"-work departments that were using spreadsheets, Access databases, Rolodexes, etc.)

I often wonder where we'd be if we'd been able to do all this development straight into our highly-normalized ERP instead of half-rebuilding Banner inside of Salesforce.

What would we have done if we could have web'd/emailed/custom-tabled/easy-writeback'ed our existing database + data model instead of replicating our tried-and-true data model inside of Salesforce? What could we have done if Ellucian had tacked on the user-friendly concepts that Salesforce made popular instead of leaving their customers in the 1990s?

Salesforce's good luck and Ellucian's loss, I suppose. But I often wish I could have my cake and eat it too - full normalization support and modern-database flexibility.

Since I can't, I still believe that the ideal for universities is to roll out Salesforce "from Rolodex to ERP," not "from ERP to Rolodex."

Thursday, May 26, 2016

Big Data

I recently gave a friend who wants to career-change into data science some advice for starting to learn about big data, and thought I'd share it here.

Firstly, decide whether you want to specialize in "data science" / "analysis" or whether you want to specialize in making the databases such people work with, well, work.

  • For the former, get good at statistics and program something at least once.
  • For the latter, get good at programming and expose yourself to a few statistics & algebra fundamentals if you didn't as a youth.
  • For either, learn what I suggest below about big data itself.

Okay, so you've decided you are willing to learn stats or programming - or you already know them and you want to talk your way into a job that will let you learn the "big data" details on the job. What do you need to know about "big data" to show that you have the fundamental knowledge to pick up the details after you start work?

Your homework, for learning about "big data," is not just to read, but to study all 21 articles in this series by Pinal Dave. And by that, I mean be able to answer the questions I suggest below about anything you learn from it (and from various Google tangents I hope you follow as you run into unfamiliar terms).

  • #1: Pay special attention to "the 3 v's" (velocity, variety, & volume - the 3 data problems that make people call data "big"). Always ask yourself:
    • "Which 3-v's problems does this approach to handling big data attempt to solve?"
    • "How?"
    • "How well is/isn't it doing so?
      • "In what situations is it strong/weak at solving those problems?"
      • "Why?"

Do not skip those questions. Seriously - study this like it's school. Grab a blank notebook, take notes on things you learn, and in those notes, answer these questions.

It is being able to answer these questions about the things you learn that will make you understand it so well "you could explain it to a 6-year-old." Which also means you can explain it to the non-technical department heads interviewing you for a job and explain how well-suited this deep understanding makes you to solve their particular business problems. See what I'm doing here?


If you want to go deeper, I suggest that you especially learn about the 5 main types of database involved in "big data" (again, do the taking-notes-with-the-above-questions thing as you learn about these).

  1. "Relational"
    (This is your classic FileMakerPro / Microsoft Access / Oracle database. The one you think of as a database. It's a bunch of Excel spreadsheets that cross-reference each other, and each record in a "spreadsheet" has a defined set of values you're allowed to fill in (think of the column headers.))
  2. "Key-Value"
  3. "Columnar"
    (A specialized form of key-value database. Learn how, plus why it's different enough to get its own name.)
  4. "Document"
    (A specialized form of key-value database. Learn how, plus why it's different enough to get its own name.)
  5. "Graph"
    (Some versions are a specialized form of key-value database. Learn how, plus why it's different enough to get its own name.)

About each of these 5 database types (deeply enough to do "compare & contrast" between them), learn:

  • "Which '3 Vs' problems + other problems does it try to address, how, and how well / for what types of data storage-retrieval needs?"
  • "What types of data storage & retrieval is it optimized for, speed-wise?"
  • "What types of data storage & retrieval is it optimized for, coding-wise? (What operations do & don't give programmers a nervous breakdown?)"
  • "How well can it be 'distributed' so that multiple computers can break up & simultaneously work on sub-pieces of a 'store' or 'retrieve' or 'retrieve-and-aggregate' (min, max, avg, etc.) request?"

Here's why:

If you want to start a successful catering business out of your home, you need to have some sense of how a kitchen's layout and tools impact what's easy to cook in it.

No point adding ice cream to the menu if you can't easily freeze things.

Same idea with "big data." It's important to be able to recognize the pros & cons of a given environment + set of tools for solving a given problem.

Again, if an interviewer says a company is having trouble with their [insert brand here] database and that they're trying to solve [insert problem here], how important to the company would you be if you're the person who can see that there's a fundamental mismatch between what they're trying to do and how they're organizing the data they need to do it with? (Or if there isn't a mismatch and they just need a good analyst/programmer on board, yay, you will be able to recognize that you could have interesting tasks ahead of you.)

For this deeper dive, I highly recommend a book called "Making Sense of NoSQL."


Finally, be aware that this is all just new ways of thinking about applying very old mathematics and logic to solve data storage/retrieval/analysis/visualization problems in light of the problems being "bigger" (see "3-V's").

Which is to say that you can't go wrong studying the old stuff.

  • For the analysts, it's particularly heavy in statistics and visualization from around the 1800's.
  • For the programmers, it's particularly heavy in "information retrieval" principles from throughout the 1900's.

Good luck!