epsman.epsProcessor module
Class for ePolyScat job post-processing, inc. notebook generation & pushing repo to web.
- 19/02/21 v1, moved from previous uber-class to keep functionality cleaner and reflect usage.
CURRENTLY: just set as per old base class, but should clean up with only used functions and/or inheritance.
- class epsman.epsProcessor.epsProcessor(host=None, user=None, IP=None, password=None, mol=None, orb=None, batch=None, genFile=None, verbose=1)[source]
Bases: epsJob
Class for ePolyScat job post-processing, inc. notebook generation & pushing repo to web.
Main function is local and remote path management (via pathlib), job configuration set-up (manual), and local & remote file IO (via Fabric/Invoke).
- 19/02/21 v1, moved from previous uber-class to keep functionality cleaner and reflect usage.
CURRENTLY: just set as per old base class, but should clean up with only used functions and/or inheritance. UPDATE: set to inherit from epsJob class for comms & util methods.
- buildArch(localLoop=True, dryRun=True, hide=True)
Build archives/packages for job.
Wrapper for repo/pkgFiles.py, which runs on host machine.
- Parameters:
localLoop (bool, default = True) – Run version of code with job loop on local machine - one Fabric call per job to pkg. This is useful for dryRun, but not for packaging large dirs on remote. If false, run all code on remote.
dryRun (bool, default = True) – Get fileLists to be archived, but don’t build. Most useful for localLoop = True case.
hide (bool, default = True) – If False, print all Fabric output to screen (localLoop = True case only). If True, only summary data is printed.
TODO: - Search logic for electronic structure files. - Call
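The dryRun behaviour described above can be sketched locally with the standard library. This is a minimal illustration, not the code in repo/pkgFiles.py: the function name build_archive, the zip format and the file names are all assumptions made for the example.

```python
import tempfile
import zipfile
from pathlib import Path


def build_archive(job_dir, arch_name, dry_run=True):
    """Collect the files under job_dir; write a zip only if dry_run is False."""
    job_dir = Path(job_dir)
    # Store paths relative to the job directory so the archive is relocatable.
    file_list = sorted(p.relative_to(job_dir).as_posix()
                       for p in job_dir.rglob("*") if p.is_file())
    if not dry_run:
        with zipfile.ZipFile(arch_name, "w", zipfile.ZIP_DEFLATED) as arch:
            for rel in file_list:
                arch.write(job_dir / rel, arcname=rel)
    return file_list


# Demo on a throwaway directory (file names are placeholders).
with tempfile.TemporaryDirectory() as tmp:
    job = Path(tmp) / "job"
    job.mkdir()
    (job / "run.out").write_text("ePS output")
    arch = Path(tmp) / "job.zip"
    print(build_archive(job, arch))          # dry run: file list only
    print(arch.exists())                     # no archive written yet
    build_archive(job, arch, dry_run=False)  # real run: archive written
    print(arch.exists())
```

Returning the file list in both modes means a dry run can be inspected before committing to a potentially slow packaging step.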
- buildSite()
Run existing scripts to build index.rst and build HTML (Sphinx).
NOTE: currently requires git push to be run manually.
- buildUploads(Emin=3, repo='Zenodo', repoDryRun=True, verbose=False, dryRun=True, eStructCp=True, eSourceDir=None, nbSubDirs=False, schema='2016', writeDict=None)
Build notebook file list + details + archives.
Note this will process & package all jobs in nbProcDir.
- Parameters:
Emin (int, optional, default = 3) – Minimum number of E points for job packaging.
repo (str, default = 'Zenodo') – Set repo for uploading. Currently only supports Zenodo.
repoDryRun (bool, default = True) – Set to False to initiate job with repo (passed to initRepo(dryRun = repoDryRun)). If True, details will be printed to screen only.
verbose (bool, default = False) – If True, print repo details.
dryRun (bool, default = True) – Set to True to get file info etc, but skip writing archives.
eStructCp (bool, default = True) – Copy electronic structure file(s) to job dirs. Currently only set for single file, should add search logic here.
eSourceDir (str or path object, optional, default = None) – If supplied, use this as the source path for electronic structure files instead of original job definition.
nbSubDirs (bool, default = False) – Search for notebooks in subdirs. Default is to search in root dir only, as set in self.hostDefn[self.host][‘nbProcDir’]. Note this is only used if reconstructing nbFileList.
writeDict (bool, default = None) – Set to overwrite the nbDetails dictionary. If not set, user will be prompted to overwrite if dict exists.
TODO: - Fix inconsistent handling of subDirs. Currently set for getNotebookList(), but not remote glob functions. - Move repo stuff to separate function; this will be called after archives are built.
Repo will only need files as set, plus job details.
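As a rough illustration of the Emin threshold, the selection step might look like the sketch below. The function name select_jobs and the 'nE' key are hypothetical; the actual layout of self.nbDetails is not documented here.

```python
def select_jobs(nb_details, e_min=3):
    """Return keys of jobs with at least e_min energy points."""
    # 'nE' is an assumed key holding the number of E points for each job.
    return [key for key, item in nb_details.items() if item["nE"] >= e_min]


# Placeholder job details: only jobs 1 and 2 meet the threshold.
nb_details = {0: {"nE": 1}, 1: {"nE": 3}, 2: {"nE": 25}}
print(select_jobs(nb_details))
```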
- checkArchFiles(key=None, archName=None, verbose=False)
Check archive file contents on remote and compare with local contents list.
- Parameters:
key (int or str) – Item key in self.nbDetails
archName (str or Path object) – If supplied, use instead of self.nbDetails[key][‘archName’]
Either key or archName must be supplied.
- Returns:
localListRel, archFiles (list) – Local files with relative paths & archive file list.
fileComp (list) – Difference between lists.
result (Fabric object) – Full return from remote, .stdout includes archFiles and file details.
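The list comparison can be sketched with zipfile alone. This is a local analogue under stated assumptions: the real method queries the archive on the remote host via Fabric, and the names compare_with_archive, arch_files and file_comp below simply mirror the return values listed above.

```python
import io
import zipfile


def compare_with_archive(local_list_rel, arch):
    """Return (arch_files, file_comp): archive members and the mismatch between lists."""
    with zipfile.ZipFile(arch) as zf:
        # Skip directory entries, which end with "/".
        arch_files = [n for n in zf.namelist() if not n.endswith("/")]
    # Symmetric difference: files present in one list but not the other.
    file_comp = sorted(set(local_list_rel) ^ set(arch_files))
    return arch_files, file_comp


# Demo with an in-memory archive (file names are placeholders).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("job/run.out", "output")
    zf.writestr("job/run.inp", "input")

arch_files, file_comp = compare_with_archive(["job/run.out", "job/notes.txt"], buf)
print(file_comp)  # entries missing from either the local list or the archive
```

An empty file_comp indicates that the archive and the local list agree.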
- checkRepoFiles(key=None, searchString=None)
Check repo remote files
Supply either item key or search string.
TODO: add comparison with local file list if key supplied.
- cpESFiles(dryRun=True, eSourceDir=None)
- delRepoItem(key)
Delete item from repo (Zenodo) - for unpublished items only.
- fileListCheck(key=None, verbose=True, errorCheck=True)
Pkg job filelist sorting & summary
Parse pkg file list (nbDetails[key][‘pkgFileList’]) for details: dirs, files and types, and a check against expected numbers.
NOTE: this is currently done based on path formats, since os or path methods only work on local machine. This may not be robust.
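One way to parse remote paths without touching the local filesystem is pathlib's pure path classes, which operate on path strings only. The sketch below assumes the remote host uses POSIX-style paths; summarise_file_list and the expected-count format are hypothetical and not part of the class.

```python
from collections import Counter
from pathlib import PurePosixPath


def summarise_file_list(pkg_file_list, expected=None):
    """Count directories and file types using path strings only."""
    paths = [PurePosixPath(p) for p in pkg_file_list]
    dirs = sorted({str(p.parent) for p in paths})
    types = Counter(p.suffix or "(none)" for p in paths)
    # Report (expected, found) for any file type whose count differs.
    errors = {ext: (n, types.get(ext, 0))
              for ext, n in (expected or {}).items() if types.get(ext, 0) != n}
    return dirs, dict(types), errors


# Placeholder paths; the expected counts are for illustration only.
files = ["/jobs/N2/run_1.out", "/jobs/N2/run_1.inp", "/jobs/N2/idy/wf.dat"]
dirs, types, errors = summarise_file_list(files, expected={".out": 1, ".dat": 2})
print(dirs)
print(types)
print(errors)
```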
- getArchLogs()
Get archive logs from host (for remote run with nohup)
- getNotebookJobList(subDirs=True, verbose=True)
Get job list from host - scan nbProcDir for ePS .out files.
Mainly for use by runNotebooks(), but can call separately to reconstruct job list.
- Parameters:
subDirs (bool, optional, default = True) – Include subDirs in processing.
verbose (bool, optional, default = True) – Print jobList to screen.
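The effect of subDirs can be illustrated with a local pathlib scan. The actual method scans nbProcDir on the host, so this is only an analogue; find_job_files and the file names are assumptions for the example.

```python
import tempfile
from pathlib import Path


def find_job_files(proc_dir, sub_dirs=True, pattern="*.out"):
    """Return matching files relative to proc_dir, optionally including subdirs."""
    proc_dir = Path(proc_dir)
    # rglob descends into subdirectories; glob stays in the top level.
    matches = proc_dir.rglob(pattern) if sub_dirs else proc_dir.glob(pattern)
    return sorted(p.relative_to(proc_dir).as_posix() for p in matches)


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "job_a.out").write_text("")
    (root / "sub" / "job_b.out").write_text("")
    (root / "notes.txt").write_text("")
    print(find_job_files(root))                  # includes files in subdirs
    print(find_job_files(root, sub_dirs=False))  # top level only
```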
- getNotebookList(subDirs=True, verbose=True)
Get notebook list from host - scan nbProcDir for ePS .ipynb files.
Use to generate/reconstruct notebook list. If list is already set, old and new lists will be displayed, and user prompted for overwrite.
- Parameters:
subDirs (bool, optional, default = True) – Include subDirs in processing.
verbose (bool, optional, default = True) – Print jobList to screen.
- getNotebooks()
Get remote notebook files.
- initRepo(key, manualVerify=True, dryRun=True, verbose=True, update=False)
Basic API calls for repo uploading
Will implement repo record creation from item in nbDetails.
CURRENTLY SET FOR ZENODO: see https://developers.zenodo.org/?python and the development notebook http://localhost:8888/notebooks/python/epsman/repo/Zenodo_API_tests_Dec2019.ipynb
TODO
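For orientation, a record-creation request for the Zenodo deposit API might be assembled as below. This sketch only builds the request and does not send it, in the spirit of dryRun = True; build_deposition_request, the metadata values and the token placeholder are assumptions, and the fields actually set by initRepo() from nbDetails may differ.

```python
import json

ZENODO_API = "https://zenodo.org/api/deposit/depositions"


def build_deposition_request(title, description, creators, token="<ACCESS_TOKEN>"):
    """Assemble (but do not send) a request for a new Zenodo deposition."""
    metadata = {
        "title": title,
        "upload_type": "dataset",
        "description": description,
        "creators": [{"name": name} for name in creators],
    }
    return {
        "url": ZENODO_API,
        "params": {"access_token": token},
        "json": {"metadata": metadata},
    }


# Placeholder metadata, printed for inspection only.
request = build_deposition_request(
    "ePS results (example)", "Placeholder description.", ["Surname, Forename"]
)
print(json.dumps(request["json"], indent=2))
# With the `requests` package installed and a valid token, this could be sent as:
#   requests.post(request["url"], params=request["params"], json=request["json"])
```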
- nbDetailsSummary()
Print notebook stats.
- nbWriteHeader(writeDict=None, hide=False, verbose=False)
Read job info and set header cell for ePSproc Notebooks for repo upload.
- pkgOverride(keyList=None, pkgFlag=True, titleFlag=None)
Override packaging defaults and force packaging of job.
- Parameters:
keyList (int, str or list) – Items to override. Keys in self.nbDetails.
pkgFlag (bool, default = True) – Set packaging variables to True, including override of ArchFileCheck.
titleFlag (bool, default = None) – Override default job title: set to True/False or None. If None, prompt for each job.
- publishRepoItem(key, manualVerify=True)
Publish item/record on Zenodo.
- publishUploads(manualVerify=True)
Finalise repo upload by publishing.
- readNBdetailsJSON(overwrite=None)
Read previously written nbDetails dictionary from JSON file.
If overwrite = None, prompt for overwrite if details exist.
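A minimal sketch of the JSON round trip, including the overwrite prompt, is given below. The helper names write_details and read_details, the injectable ask argument and the example dictionary are assumptions; the class methods additionally push the file to and from the host.

```python
import json
import tempfile
from pathlib import Path, PurePosixPath


def write_details(nb_details, json_file):
    """Write the details dictionary to a JSON file."""
    # default=str converts Path objects (and other non-JSON values) to strings.
    Path(json_file).write_text(json.dumps(nb_details, indent=2, default=str))


def read_details(json_file, current=None, overwrite=None, ask=input):
    """Load details; if some are already set, overwrite=None asks the user first."""
    if current and overwrite is None:
        overwrite = ask("Overwrite existing details? (y/n): ").strip().lower() == "y"
    if current and not overwrite:
        return current
    return json.loads(Path(json_file).read_text())


# Demo with a placeholder dictionary; no prompt is shown as nothing is set yet.
with tempfile.TemporaryDirectory() as tmp:
    json_file = Path(tmp) / "nbDetails.json"
    details = {0: {"archName": PurePosixPath("/data/job_0.zip"), "pkg": True}}
    write_details(details, json_file)
    print(read_details(json_file))
```

Note that JSON object keys are always strings, so integer keys such as 0 are read back as "0" and Path values are read back as plain strings.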
- runNotebooks(subDirs=True, template='nb-tpl-JR-v4', scp='nb-sh-JR', multiEChunck=False)
Set up and run batch of ePSproc notebooks using Jupyter-runner.
Create job list for directory.
Set params list for Jupyter-runner.
Run on remote.
- Parameters:
self (epsJob structure) – Contains path and job settings: ‘nbProcDir’, used for post-processing (defaults to ‘systemDir’ if not set), and ‘templateDir’, for the Jupyter-runner template notebook (defaults to ‘scpdir’ if not set).
subDirs (bool, optional, default = True) – Include subDirs in processing.
template (str, optional, default = 'nb-tpl-JR-v4') – Jupyter notebook template file for post-processing. File list set in self.scrDefn, assumed to be in self.hostDefn[self.host][‘scpdir’] unless self.hostDefn[self.host][‘nbTemplateDir’] is defined.
scp (str, optional, default = 'nb-sh-JR') – Script for running batch job on remote. TODO: should set scp and template relations in input dict.
multiEChunck (bool, optional, default = False) – Set to True for consolidated handling of E-chunked jobs. In this case, process batch of files in a single notebook. NOTE: this also requires a compatible template file. NOTE: no error checking here yet, should add checks rather than manual setting.
TODO: - Templates dir from module? Should be able to get with inspect... not sure if templates included in install?
- searchRepo(key, searchString=None, verbose=False)
Search Zenodo for item
- setESFiles(eSourceDir=None, verbose=False)
Set electronic structure file from job info.
Use alternative path eSourceDir if passed.
Check also if file exists.
- setNotebookTemplate(template='nb-tpl-JR-v4')
Set post-processing job template
Mainly for use by runNotebooks(), but can call separately to reconstruct settings.
- Parameters:
template (str, optional, default = 'nb-tpl-JR-v4') – Jupyter notebook template file for post-processing. File list set in self.scrDefn, assumed to be in self.hostDefn[self.host][‘scpdir’] unless self.hostDefn[self.host][‘nbTemplateDir’] is defined.
- submitUploads(local=False)
Submit uploads to repo - for packaged jobs, upload files to initialized repo from local or remote machine.
- tidyNotebooks(rename=True, overrideFlag=False, cp=True, dryRun=False, multiEChunck=False)
Tidy up autogenerated notebooks from Jupyter-runner (from epsman._epsProc.py()). Assumes numerical ordering matches current self.jobList.
- Parameters:
rename (bool, default = True) – Flag for file renaming to match ePS job file.
overrideFlag (bool, optional, default = False) – Set to true to confirm/override autoset notebook names.
cp (bool, default = True) – If true, make copy of notebook file in ePS job directory.
dryRun (bool, default = False) – Set to True for dry run, print file commands but don’t execute.
multiEChunck (bool, optional, default = False) – Set to True for consolidated handling of E-chunked jobs. In this case, process batch of files in a single notebook. NOTE: no error checking here yet, should add checks rather than manual setting.
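The pairing of numerically ordered notebooks with the job list, plus the dryRun switch, can be sketched as follows. The notebook naming pattern (an integer somewhere in the file name), the function tidy_notebooks and the use of shutil for local moves are assumptions; the class method runs the file commands on the host.

```python
import re
import shutil
from pathlib import Path


def tidy_notebooks(nb_files, job_list, dry_run=True):
    """Pair notebooks with jobs by numerical order and rename to match."""
    # Sort on the first integer in each file name, so 10 follows 9, not 1.
    ordered = sorted(nb_files, key=lambda p: int(re.search(r"\d+", Path(p).name).group()))
    if len(ordered) != len(job_list):
        raise ValueError("notebook and job lists differ in length")
    moves = []
    for nb, job in zip(ordered, job_list):
        # New name: the job file name with its suffix replaced by .ipynb.
        target = Path(nb).with_name(Path(job).stem + ".ipynb")
        moves.append((str(nb), str(target)))
        if dry_run:
            print(f"mv {nb} {target}")  # print the command, do not execute
        else:
            shutil.move(str(nb), str(target))
    return moves


# Dry run with placeholder names: nothing on disk is touched.
tidy_notebooks(["nb_10.ipynb", "nb_2.ipynb"], ["job_a.out", "job_b.out"])
```

Sorting on the integer rather than the raw string matters once more than nine notebooks are generated, since plain string sorting would place nb_10 before nb_2.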
- updateArch(fileIn, archName, dryRun=True)
Add file to existing archive.
NOTE: if the file already exists in the archive it will be skipped, not updated, since Python's ZipFile does not support this.
NOTE: if the file's root path differs from the archive root path (as set in call below), it will be added to the archive root; otherwise the relative path will be preserved.
TODO: error checking, will fail if file is missing.
- Parameters:
fileIn (str or Path) – File to add to archive.
archName (str or Path) – Archive to add file to.
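The skip and path rules noted for this method can be sketched with zipfile in append mode. This is an illustration under assumptions, not the remote implementation: update_archive and its explicit arch_root argument are hypothetical, and the demo file names are placeholders.

```python
import tempfile
import zipfile
from pathlib import Path


def update_archive(file_in, arch_name, arch_root, dry_run=True):
    """Append file_in to an existing zip; return the member name, or None if skipped."""
    file_in, arch_root = Path(file_in), Path(arch_root)
    try:
        # Under the archive root: keep the relative path.
        arcname = file_in.relative_to(arch_root).as_posix()
    except ValueError:
        # Elsewhere: place the file at the top level of the archive.
        arcname = file_in.name
    with zipfile.ZipFile(arch_name, "a") as arch:
        if arcname in arch.namelist():
            return None  # already present: skip rather than write a duplicate entry
        if not dry_run:
            arch.write(file_in, arcname=arcname)
    return arcname


# Demo on a throwaway archive.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "job"
    root.mkdir()
    (root / "run.out").write_text("output")
    (root / "extra.log").write_text("log")
    arch = Path(tmp) / "job.zip"
    with zipfile.ZipFile(arch, "w") as zf:
        zf.write(root / "run.out", arcname="run.out")
    print(update_archive(root / "run.out", arch, root, dry_run=False))    # already archived, skipped
    print(update_archive(root / "extra.log", arch, root, dry_run=False))  # added to the archive
```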
- updateUploads(dryRun=True, verbose=False, repo='Zenodo')
- updateWebNotebookFiles()
Update web dir with new notebooks.
- uploadRepoFiles(key)
Upload files to repo (from local machine)
For remote run see repo/remoteUpload.py
- writeJobJSON()
Write job JSON files on remote from existing master file self.jsonProcFile.
- writeNBdetailsJSON()
Write nbDetails dictionary to JSON file and push to host.