epsman.epsProcessor module

Class for ePolyScat job post-processing, inc. notebook generation & pushing repo to web.

19/02/21 v1, moved from previous uber-class to keep functionality cleaner and reflect usage.

CURRENTLY: just set as per old base class, but should clean up with only used functions and/or inheritance.

class epsman.epsProcessor.epsProcessor(host=None, user=None, IP=None, password=None, mol=None, orb=None, batch=None, genFile=None, verbose=1)

Bases: epsJob

Class for ePolyScat job post-processing, inc. notebook generation & pushing repo to web.

Main function is local and remote path management (via pathlib), job configuration set-up (manual), and local & remote file IO (via Fabric/Invoke).

19/02/21 v1, moved from previous uber-class to keep functionality cleaner and reflect usage.

CURRENTLY: just set as per old base class, but should clean up with only used functions and/or inheritance. UPDATE: set to inherit from epsJob class for comms & util methods.

buildArch(localLoop=True, dryRun=True, hide=True)

Build archives/packages for job.

Wrapper for repo/pkgFiles.py, which runs on host machine.

Parameters:
  • localLoop (bool, default = True) – Run version of code with job loop on local machine - one Fabric call per job to pkg. This is useful for dryRun, but not for packaging large dirs on remote. If false, run all code on remote.

  • dryRun (bool, default = True) – Get fileLists to be archived, but don’t build. Most useful for localLoop = True case.

  • hide (bool, default = True) – If false, print all Fabric output to screen (localLoop = True case only). If true, only summary data is printed.

TODO:
  • Search logic for electronic structure files.

  • Call

buildSite()

Run existing scripts to build index.rst and build HTML (Sphinx).

NOTE: currently requires git push to be run manually.

buildUploads(Emin=3, repo='Zenodo', repoDryRun=True, verbose=False, dryRun=True, eStructCp=True, eSourceDir=None, nbSubDirs=False, schema='2016', writeDict=None)

Build notebook file list + details + archives.

Note this will process & package all jobs in nbProcDir.

Parameters:
  • Emin (int, optional, default = 3) – Minimum number of E points for job packaging.

  • repo (str, default = 'Zenodo') – Set repo for uploading. Currently only supports Zenodo.

  • repoDryRun (bool, default = True) – Set to False to initiate job with repo (passed to initRepo(dryRun = repoDryRun)). If True, details will be printed to screen only.

  • verbose (bool, default = False) – If True, print repo details.

  • dryRun (bool, default = True) – Set to True to get file info etc, but skip writing archives.

  • eStructCp (bool, default = True) – Copy electronic structure file(s) to job dirs. Currently only set for single file, should add search logic here.

  • eSourceDir (str or path object, optional, default = None) – If supplied, use this as the source path for electronic structure files instead of original job definition.

  • nbSubDirs (bool, default = False) – Search for notebooks in subdirs. Default is to search in root dir only, as set in self.hostDefn[self.host]['nbProcDir']. Note this is only used if reconstructing nbFileList.

  • writeDict (bool, default = None) – Set to overwrite nbDetails dictionary. If not set, user will be prompted to overwrite if dict exists.

TODO:
  • Fix inconsistent handling of subDirs. Currently set for getNotebookList(), but not remote glob functions.

  • Move repo stuff to separate function, this will be called after archives are built.

Repo will only need files as set, plus job details.

checkArchFiles(key=None, archName=None, verbose=False)

Check archive file contents on remote and compare with local contents list

Parameters:
  • key (int or str) – Item key in self.nbDetails

  • archName (str or Path object) – If supplied, use instead of self.nbDetails[key]['archName'].

Either key or archName must be supplied.

Returns:

  • localListRel, archFiles (list) – Local files with relative paths & archive file list.

  • fileComp (list) – Difference between lists.

  • result (Fabric object) – Full return from remote, .stdout includes archFiles and file details.
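The comparison step can be sketched locally with the standard library. This is a minimal illustration under stated assumptions, not the epsman implementation: `compare_file_lists` is a hypothetical helper, the archive is assumed to be a zip file, and it is read locally with `zipfile` rather than on the remote host via Fabric.

```python
import tempfile
import zipfile
from pathlib import Path


def compare_file_lists(local_files, arch_path, root):
    """Return (local_rel, arch_files, file_comp) for a local zip archive.

    Hypothetical helper: the names only mirror the documented return values.
    """
    root = Path(root)
    # Local files expressed relative to the archive root, POSIX-style.
    local_rel = sorted(Path(f).relative_to(root).as_posix() for f in local_files)
    with zipfile.ZipFile(arch_path) as zf:
        # Skip directory entries, which end with "/".
        arch_files = sorted(n for n in zf.namelist() if not n.endswith("/"))
    # Symmetric difference: anything present in only one of the two lists.
    file_comp = sorted(set(local_rel) ^ set(arch_files))
    return local_rel, arch_files, file_comp


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "job").mkdir()
    a = root / "job" / "a.out"
    b = root / "job" / "b.ipynb"
    a.write_text("ePS output")
    b.write_text("{}")
    arch = root / "job.zip"
    with zipfile.ZipFile(arch, "w") as zf:
        zf.write(a, a.relative_to(root).as_posix())  # b is deliberately left out
    local_rel, arch_files, file_comp = compare_file_lists([a, b], arch, root)
```

An empty `file_comp` would indicate that the archive and the local list agree; here the missing notebook shows up as the only difference.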

checkRepoFiles(key=None, searchString=None)

Check repo remote files

Supply either item key or search string.

TODO: add comparison with local file list if key supplied.

cpESFiles(dryRun=True, eSourceDir=None)

delRepoItem(key)

Delete item from repo (Zenodo) - for unpublished items only.

fileListCheck(key=None, verbose=True, errorCheck=True)

Pkg job filelist sorting & summary

Parse pkg file list (nbDetails[key]['pkgFileList']) for details:

  • Dirs

  • Files and types

  • Check against expected numbers

NOTE: this is currently done based on path formats, since os or path methods only work on local machine. This may not be robust.
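One way to parse remote path strings without touching the local filesystem is `pathlib.PurePosixPath`, which only manipulates the text of a path. The sketch below assumes a POSIX-style remote host; `summarise_file_list` and the example file names are hypothetical and not taken from epsman.

```python
from collections import Counter
from pathlib import PurePosixPath


def summarise_file_list(pkg_file_list):
    """Summarise remote path strings by directory and file extension."""
    paths = [PurePosixPath(p) for p in pkg_file_list]
    # Unique parent directories, as strings.
    dirs = sorted({str(p.parent) for p in paths})
    # Count files per extension; files without one are grouped under "".
    types = Counter(p.suffix for p in paths)
    return dirs, types


# Hypothetical file list, in the style of a remote listing.
file_list = [
    "/home/user/jobs/N2/orb1/N2_1.0-10.0eV.inp.out",
    "/home/user/jobs/N2/orb1/idy/N2_E1.0.idy",
    "/home/user/jobs/N2/orb1/idy/N2_E2.0.idy",
    "/home/user/jobs/N2/N2.molden",
]
dirs, types = summarise_file_list(file_list)
```

The counts in `types` could then be checked against the expected number of files per type for the job.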

getArchLogs()

Get archive logs from host (for remote run with nohup)

getNotebookJobList(subDirs=True, verbose=True)

Get job list from host - scan nbProcDir for ePS .out files.

Mainly for use by runNotebooks(), but can call separately to reconstruct job list.

Parameters:
  • subDirs (bool, optional, default = True) – Include subDirs in processing.

  • verbose (bool, optional, default = True) – Print jobList to screen.
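The effect of a subDirs-style flag can be illustrated on a local directory with `pathlib`. This is only a sketch: the method itself scans the remote host, and `find_files` is a hypothetical helper introduced here for illustration.

```python
import tempfile
from pathlib import Path


def find_files(proc_dir, pattern="*.out", sub_dirs=True):
    """List files matching pattern, optionally descending into subdirectories."""
    proc_dir = Path(proc_dir)
    # rglob() walks the whole tree (top level included); glob() stays at top level.
    matches = proc_dir.rglob(pattern) if sub_dirs else proc_dir.glob(pattern)
    return sorted(p for p in matches if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "job1.out").write_text("")
    (root / "sub" / "job2.out").write_text("")
    (root / "notes.txt").write_text("")
    top_only = [p.name for p in find_files(root, sub_dirs=False)]
    with_subdirs = [p.name for p in find_files(root, sub_dirs=True)]
```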

getNotebookList(subDirs=True, verbose=True)

Get notebook list from host - scan nbProcDir for ePS .ipynb files.

Use to generate/reconstruct notebook list. If list is already set, old and new lists will be displayed, and user prompted for overwrite.

Parameters:
  • subDirs (bool, optional, default = True) – Include subDirs in processing.

  • verbose (bool, optional, default = True) – Print jobList to screen.

getNotebooks()

Get remote notebook files.

initRepo(key, manualVerify=True, dryRun=True, verbose=True, update=False)

Basic API calls for repo uploading

Will implement repo record creation from item in nbDetails.

CURRENTLY SET FOR ZENODO:

  • https://developers.zenodo.org/?python

  • http://localhost:8888/notebooks/python/epsman/repo/Zenodo_API_tests_Dec2019.ipynb

TODO

nbDetailsSummary()

Print notebook stats.

nbWriteHeader(writeDict=None, hide=False, verbose=False)

Read job info and set header cell for ePSproc Notebooks for repo upload.

pkgOverride(keyList=None, pkgFlag=True, titleFlag=None)

Override packaging defaults and force packaging of job.

Parameters:
  • keyList (int, str, list) – Items to override. Keys in self.nbDetails.

  • pkgFlag (bool, default = True) – Set packaging variables to True, including override of ArchFileCheck.

  • titleFlag (bool, default = None) – Override default job title - set to True/False or None. If None, prompt for each job.

publishRepoItem(key, manualVerify=True)

Publish item/record on Zenodo.

publishUploads(manualVerify=True)

Finalise repo upload by publishing.

readNBdetailsJSON(overwrite=None)

Read previously written nbDetails dictionary from JSON file.

If overwrite = None, prompt for overwrite if details exist.

runNotebooks(subDirs=True, template='nb-tpl-JR-v4', scp='nb-sh-JR', multiEChunck=False)

Set up and run batch of ePSproc notebooks using Jupyter-runner.

  • Create job list for directory.

  • Set params list for jupyter-runner

  • Run on remote.

Parameters:
  • self (epsJob structure) –

    Contains path and job settings:

    • 'nbProcDir' used for post-processing.

      Defaults to 'systemDir' if not set.

    • 'templateDir' for Jupyter-runner template notebook.

      Defaults to 'scpdir' if not set.

  • subDirs (bool, optional, default = True) – Include subDirs in processing.

  • template (str, optional, default = 'nb-tpl-JR-v4') – Jupyter notebook template file for post-processing. File list set in self.scrDefn, assumed to be in self.hostDefn[self.host]['scpdir'] unless self.hostDefn[self.host]['nbTemplateDir'] is defined.

  • scp (str, optional, default = 'nb-sh-JR') – Script for running batch job on remote. TODO: should set scp and template relations in input dict.

  • multiEChunck (bool, optional, default = False) – Set to True for consolidated handling of E-chunked jobs. In this case, process batch of files in a single notebook. NOTE: this also requires a compatible template file. NOTE: no error checking here yet, should add checks rather than manual setting.

TODO:
  • Templates dir from module? Should be able to get with inspect... not sure if templates included in install?

searchRepo(key, searchString=None, verbose=False)

Search Zenodo for item

setESFiles(eSourceDir=None, verbose=False)

Set electronic structure file from job info.

Use alternative path eSourceDir if passed.

Check also if file exists.

setNotebookTemplate(template='nb-tpl-JR-v4')

Set post-processing job template

Mainly for use by runNotebooks(), but can call separately to reconstruct settings.

Parameters:

template (str, optional, default = 'nb-tpl-JR-v4') – Jupyter notebook template file for post-processing. File list set in self.scrDefn, assumed to be in self.hostDefn[self.host]['scpdir'] unless self.hostDefn[self.host]['nbTemplateDir'] is defined.

submitUploads(local=False)

Submit uploads to repo - for packaged jobs, upload files to initialized repo from local or remote machine.

tidyNotebooks(rename=True, overrideFlag=False, cp=True, dryRun=False, multiEChunck=False)

Tidy up autogenerated notebooks from Jupyter-runner (from epsman._epsProc.py()).

Assumes numerical ordering matches current self.jobList.

Parameters:
  • rename (bool, default = True) – Flag for file renaming to match ePS job file.

  • overrideFlag (bool, optional, default = False) – Set to true to confirm/override autoset notebook names.

  • cp (bool, default = True) – If true, make copy of notebook file in ePS job directory.

  • dryRun (bool, default = False) – Set to True for dry run, print file commands but don’t execute.

  • multiEChunck (bool, optional, default = False) – Set to True for consolidated handling of E-chunked jobs. In this case, process batch of files in a single notebook. NOTE: no error checking here yet, should add checks rather than manual setting.
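Because the renaming relies on numerical ordering matching the job list, a plain string sort is not enough (it would place index 10 before index 2). The sketch below shows one way to pair files by numeric index and preview the result as a dry run. The helper `plan_renames` and the `template_N.ipynb` naming are assumptions made for illustration, not the names Jupyter-runner or epsman actually use.

```python
import re
from pathlib import PurePosixPath


def plan_renames(notebooks, job_list):
    """Pair notebooks (sorted by numeric index) with jobs, in job-list order."""

    def index(name):
        # First run of digits in the file name, as an integer.
        return int(re.search(r"\d+", PurePosixPath(name).name).group())

    ordered = sorted(notebooks, key=index)
    if len(ordered) != len(job_list):
        raise ValueError("notebook and job counts differ")
    # Target name: job file name with its final extension replaced by .ipynb.
    return [
        (nb, PurePosixPath(job).with_suffix(".ipynb").name)
        for nb, job in zip(ordered, job_list)
    ]


# Hypothetical inputs.
notebooks = ["template_10.ipynb", "template_2.ipynb", "template_1.ipynb"]
job_list = ["jobA.out", "jobB.out", "jobC.out"]

plan = plan_renames(notebooks, job_list)
# Dry run: build the commands as strings instead of executing them.
commands = [f"mv {src} {dst}" for src, dst in plan]
```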

updateArch(fileIn, archName, dryRun=True)

Add file to existing archive.

NOTE: if file exists in archive it will be skipped, not updated, since python ZipFile does not support this.

NOTE: if file path root is different from archive root path (as set in call below) it will be added to the archive root, otherwise relative path will be preserved.

TODO: error checking, will fail if file is missing.

Parameters:
  • fileIn (str or Path) – File to add to archive.

  • archName (str or Path) – Archive to add file to.
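The skip-if-present and path-handling behaviour described in the notes can be sketched with the standard `zipfile` module. This is a minimal local illustration, not the epsman code: `add_to_archive` and its `arch_root` argument are hypothetical, and the real method operates via the host machine.

```python
import tempfile
import zipfile
from pathlib import Path


def add_to_archive(file_in, arch_name, arch_root, dry_run=True):
    """Append file_in to an existing zip; skip it if the name is already present."""
    file_in, arch_root = Path(file_in), Path(arch_root)
    try:
        # Preserve the relative path when the file lives under the archive root.
        arcname = file_in.relative_to(arch_root).as_posix()
    except ValueError:
        # Different root: place the file at the top level of the archive.
        arcname = file_in.name
    with zipfile.ZipFile(arch_name) as zf:
        exists = arcname in zf.namelist()
    if exists:
        return arcname, "skipped"
    if dry_run:
        return arcname, "dry run"
    with zipfile.ZipFile(arch_name, "a") as zf:  # "a" appends to the archive
        zf.write(file_in, arcname)
    return arcname, "added"


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    root, other = tmp / "jobs", tmp / "other"
    (root / "job").mkdir(parents=True)
    other.mkdir()
    inside = root / "job" / "b.txt"
    outside = other / "c.txt"
    inside.write_text("b")
    outside.write_text("c")
    arch = tmp / "job.zip"
    zipfile.ZipFile(arch, "w").close()  # start from an empty archive

    results = [
        add_to_archive(inside, arch, root, dry_run=True),
        add_to_archive(inside, arch, root, dry_run=False),
        add_to_archive(inside, arch, root, dry_run=False),
        add_to_archive(outside, arch, root, dry_run=False),
    ]
    with zipfile.ZipFile(arch) as zf:
        final_names = sorted(zf.namelist())
```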

updateUploads(dryRun=True, verbose=False, repo='Zenodo')

updateWebNotebookFiles()

Update web dir with new notebooks.

uploadRepoFiles(key)

Upload files to repo (from local machine)

For remote run see repo/remoteUpload.py

writeJobJSON()

Write job JSON files on remote from existing master file self.jsonProcFile.

writeNBdetailsJSON()

Write nbDetails dictionary to JSON file and push to host.
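Two general points are worth keeping in mind when round-tripping a dictionary like this through JSON: path objects are not serialisable by default, and JSON object keys are always strings, so integer keys come back as str. The sketch below illustrates both with the standard library. The dictionary contents are hypothetical, and how epsman itself handles these cases is not specified here.

```python
import json
import tempfile
from pathlib import Path, PurePosixPath

# Hypothetical nbDetails-style dictionary with an integer key and a path value.
nb_details = {0: {"file": PurePosixPath("/data/jobs/N2/N2_orb1.ipynb"), "pkg": True}}

with tempfile.TemporaryDirectory() as tmp:
    json_file = Path(tmp) / "nbDetails.json"
    # default=str converts path objects, which json cannot serialise natively.
    json_file.write_text(json.dumps(nb_details, default=str, indent=2))
    loaded = json.loads(json_file.read_text())

# Restore integer keys where the string key is purely numeric.
restored = {int(k) if k.isdigit() else k: v for k, v in loaded.items()}
```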