Step I/O Design
Design
The Step architecture is designed such that
a Step’s intended sole responsibility is to perform the calculation
required. Any input/output operations are handled by the surrounding
Step architecture. This is to help facilitate the use of Step’s
from both a command-line environment, and from an interactive Python
environment, such as Jupyter notebooks or IPython.
For command-line usage, all inputs and outputs are designed to come from and save to files.
For interactive Python use, inputs and outputs are expected to be
Python objects, negating the need to save and reload data after every
Step call. This allows users to write Python scripts without having
to worry about doing I/O at every step, unless, of course, if the user
wants to do so.
The high-level overview of the input/output design is given in Writing a step. The following discusses the I/O API and best practices.
To facilitate this design, a basic Step is suggested to have the
following structure:
class MyStep(stpipe.Step):
spec = '' # Desired configuration parameters
def process(self, input):
return do_calculation(input)
When run from the command line:
strun MyStep input_data.asdf
the result will be saved in a file called:
input_data_mystep.asdf
Similarly, the same code can be used in a Python script or interactive environment as follows:
result = MyStep.call(input)
# result contains the resulting data
# which can then be used by further steps or
# other functions.
#
# when done, the data can be saved with the DataModel.save
# method
result.save('my_final_results.asdf')
Input Source
All input files, except for references files provided by CRDS,
are expected to be co-resident in the same directory. That directory
is determined by the directory in which the primary input file
resides. For programmatic use, this directory is available in the
Step.input_dir attribute.
Output
When Files are Created
Whether a Step produces an output file or not is ultimately
determined by the built-in parameter option save_results. If
True, output files will be created. save_results is set under a
number of conditions:
Explicitly through a parameter file or as a command-line option.
Implicitly when a step is called by
strun.
Output File Naming
File names are constructed based on three components: basename, suffix, and extension:
basename_suffix.extension
The extension will often be the same as the primary input file. This
will not be the case if the data format of the output needs to be
something different, such as a text table with .ecsv extension.
Similarly, the basename will usually be derived from the primary input file. However, there are some caveats discussed below.
Ultimately, the suffix is what Step use to identify their output.
A Step’s suffix is defined in a couple of different ways:
By the
Step.nameattribute. This is the default.By the
suffixparameter.Explicitly in the code. Often this is done in
Pipelineswhere a single pipeline creates numerous different output files.
Basename Determination
Most often, the output file basename is determined through any of the following, given from higher precedence to lower:
The
--output_filecommand-line option.The
output_fileparameter option.Primary input file name.
If the output is a
DataModel, from theDataModel.meta.filename.
In all cases, if the originating file name has a known suffix on it,
that suffix is removed and replaced by the Step’s own suffix.
In very rare cases, when there is no other source for the basename, a
basename of step_<step_name> is used. This can happen when a
Step is being programmatically used and only the save_results
parameter option is given.
Sub-Steps and Output
Normally, the value of a parameter option is completely local to
the Step: A Step, called from another Step or Pipeline, can
only access its own parameters. Hence, options such as
save_results do not affect a called Step.
The exceptions to this are the parameters output_file and
output_dir. If either of these parameters are queried by a Step,
but are not defined for that Step, values will be retrieved up
through the parent. The reason is to provide consistency in output
from Step and Pipeline. All file names will have the same
basename and will all appear in the same directory.
As expected, if either parameter is specified for the Step in
question, the local value will override the parent value.
Also, for output_file, there is another option,
search_output_file, that can also control this behavior. If set to
False, a Step will never query its parent for its value.
Output API: When More Control Is Needed
In summary, the standard output API, as described so far, is basically “set a
few parameters, and let the Step framework handle the rest”. However, there
are always the exceptions that require finer control, such as saving
intermediate files or multiple files of different formats. This section
discusses the method API and conventions to use in these situations.
Save That Model: Step.save_model
If a Step needs to save a DataModel before the step completes, use
of stpipe.Step.save_model() is the recommended over
directly calling DataModel.save.
Step.save_model uses the Step framework and hence will honor the
following:
If
Step.save_resultsisFalse, nothing will happen.Will ensure that
Step.output_diris used.Will use
Step.suffixif not otherwise specified.Will determine the output basename through the
Stepframework, if not otherwise specified.
The basic usage, in which nothing is overridden, is:
class MyStep(Step):
def process(self, input):
# ...
result = some_DataModel
self.save_model(result)
The most common use case, however, is for saving some intermediate results that would have a different suffix:
self.save_model(intermediate_result_datamodel, suffix='intermediate')
See stpipe.Step.save_model() for further information.
Make That Filename: Step.make_output_path
For the situations when a filename is needed to be constructed before
saving, either to know what the filename will be or for data that is
not a DataModel, use stpipe.Step.make_output_path. By default, calling
make_output_path without any arguments will return what the default
output file name will be:
output_path = self.make_output_path()
This method encapsulates the following Step framework functions:
Will ensure that
Step.output_diris used.Will use
Step.suffixif not otherwise specified.Will determine the output basename through the
Stepframework, if not otherwise specified.
A typical use case is when a Step needs to save data that is not a
DataModel. The current Step architecture does not know how to
handle these, so saving needs to be done explicitly. The pattern of
usage would be:
# A table need be saved and needs a different
# suffix than what the Step defines.
table = some_astropy_table_data
if self.save_results:
table_path = self.make_output_path(suffix='cat', ext='ecsv')
table.save(table_path, format='ascii.ecsv', overwrite=True)