Quick Start

We use a very simple task definition file and submit it to an LSF batch system.

Hint

LSF is currently the default batch system, so it is used unless you change it. Check out Batch Processing for more information.

Our task will be very simple: we want to create 100 files, each containing a random number. Later, we will compute the average of those numbers.
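
As a point of reference, the whole job can be sketched in plain Python without any workflow tooling. This is only an illustration: make_number_files and average_of_files are hypothetical helper names, and a temporary directory stands in for the results folder.

```python
import random
import tempfile
from pathlib import Path


def make_number_files(directory, count=100):
    """Write one random number per file and return the file paths."""
    paths = []
    for i in range(count):
        path = Path(directory) / f"output_file_{i}.txt"
        path.write_text(f"{random.random()}\n")
        paths.append(path)
    return paths


def average_of_files(paths):
    """Read one float per file and return the mean of all of them."""
    numbers = [float(path.read_text()) for path in paths]
    return sum(numbers) / len(numbers)


# A temporary directory stands in for the results folder (assumption).
with tempfile.TemporaryDirectory() as tmp_dir:
    files = make_number_files(tmp_dir)
    average = average_of_files(files)

print(f"{len(files)} files, average = {average:.3f}")
```

Everything here runs serially in a single process. The steps below split the same work into tasks that can run as separate batch jobs.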

  1. Open a code editor and create a new file simple-example.py with the following content:

    import b2luigi
    import random
    
    
    class MyNumberTask(b2luigi.Task):
        some_parameter = b2luigi.IntParameter()
    
        def output(self):
            return b2luigi.LocalTarget(f"results/output_file_{self.some_parameter}.txt")
    
        def run(self):
            random_number = random.random()
            with self.output().open("w") as f:
                f.write(f"{random_number}\n")
    
    
    if __name__ == "__main__":
        b2luigi.set_setting("result_dir", "results")
        b2luigi.process([MyNumberTask(some_parameter=i) for i in range(100)],
                        workers=200)
    

    Each building block in (b2)luigi is a b2luigi.Task. It defines (with its run function) what should be done. A task can have parameters, such as some_parameter in our case, defined in line 6. Each task needs to define what it will output in its output function.

    Note

    We have defined a result path in the script with

    b2luigi.set_setting("result_dir", "results")
    

    You can ignore that for now - we will come back to it later.

    In our run function, we generate a random number and write it to the output file, which is named after the parameter of the task and stored in a result folder.

    Hint

    For those of you who have already used luigi, most of this will seem familiar. In fact, b2luigi’s task is a superset of luigi’s, so you can reuse your old scripts; b2luigi does not care which one you use. We strongly advise using b2luigi’s task, though, as it offers additional functionality (see below).

    Please note that we could have imported b2luigi with

    import b2luigi as luigi
    

    to make the transition between b2luigi and luigi even simpler.

  2. Call the newly created file with python:

    python simple-example.py --batch
    

    Instead of passing --batch as an argument, you can also add it to the b2luigi.process(..., batch=True) call.

    Each task is scheduled as a batch job on your LSF queue. Thanks to luigi's dependency management, a batch job is only submitted once all of its dependencies are fulfilled, which saves CPU time on the batch system. Although the script requests 200 workers, only 100 tasks exist, so only 100 batch jobs are started. Only the scheduling mechanism runs on your local machine, and it needs just a small fraction of a single CPU.

    Hint

    If you have no LSF queue available or do not want to run on the batch system, you can simply remove the batch argument. This falls back to a normal luigi execution. Please see Batch Processing for more information on batch execution and a discussion of other batch systems.

  3. After the job is completed, you will see something like:

    The log files for each task are written to the logs folder.

    After a job is submitted, b2luigi will check if it is still running or not and handle failed or done tasks correctly.

  4. The output file names will in most cases depend on the parameters of the task, as you do not want tasks with different parameters to overwrite each other's files. This means, however, that you always need to include all parameters in the file name to keep them distinct. b2luigi can handle this cumbersome work automatically and, at no cost, also helps you organize your files. This is especially useful in larger projects where many people define and execute tasks.

    This code listing shows the same task, but this time written using the helper functions given by b2luigi.

    import b2luigi
    import random
    
    
    class MyNumberTask(b2luigi.Task):
        some_parameter = b2luigi.IntParameter()
    
        def output(self):
            yield self.add_to_output("output_file.txt")
    
        def run(self):
            random_number = random.random()
    
            with open(self.get_output_file_name("output_file.txt"), "w") as f:
                f.write(f"{random_number}\n")
    
    
    if __name__ == "__main__":
        b2luigi.set_setting("result_dir", "results")
        b2luigi.process([MyNumberTask(some_parameter=i) for i in range(100)],
                        workers=200)
    

    Before continuing, remove the output of the former calculation.

    rm -rf results
    

    If you now call

    python simple-example.py --batch
    

    you are basically doing the same as before, with some very nice benefits:

    • The parameter values are automatically added to the output file path (have a look into the results/ folder to see how it works and where the results are stored).
    • The outputs for different parameters are stored in different locations, so there is no need to fear overwriting results.
    • The format of the folder structure makes it easy to work with using bash commands as well as automated procedures.
    • Other files related to your job, e.g. the submission files, are also placed into this folder (this is why the very first example already defined it).
    • By default, the folder where your script is located is used.
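
A rough sketch of how such a parameter-dependent path can be built. The layout result_dir/name=value/file name is an assumption made for illustration, so compare it with what you actually find in the results/ folder. parameterized_path is a hypothetical helper, not a b2luigi function.

```python
from pathlib import PurePosixPath


def parameterized_path(result_dir, parameters, file_name):
    """Build result_dir/<name>=<value>/.../file_name (assumed layout)."""
    path = PurePosixPath(result_dir)
    for name, value in parameters.items():
        # One folder level per parameter keeps different values apart.
        path = path / f"{name}={value}"
    return path / file_name


path_3 = parameterized_path("results", {"some_parameter": 3}, "output_file.txt")
path_4 = parameterized_path("results", {"some_parameter": 4}, "output_file.txt")
print(path_3)  # results/some_parameter=3/output_file.txt
print(path_4)  # results/some_parameter=4/output_file.txt
```

Because the parameter value is part of the folder name, two tasks that differ only in some_parameter can use the same file name without colliding.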

    Hint

    In the example, the base path for the results is defined in the python file with

    b2luigi.set_setting("result_dir", "results")
    

    Instead, you can also add a settings.json with the following content in the folder where your script lives:

    {
    	"result_dir": "results"
    }
    

    The settings.json will be used by all tasks in this folder and in each sub-folder. You can use it to define project settings (like result folders) and specific settings for your local sub project. Read the documentation on b2luigi.get_setting() for more information on how to use it.
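
The idea that one settings.json serves a folder and all of its sub-folders can be sketched as a search that walks upward through the parent directories. This is a simplified model under that assumption, not b2luigi's actual implementation. lookup_setting is a hypothetical name, and the analysis/step1 sub-folder is made up for the example.

```python
import json
import tempfile
from pathlib import Path


def lookup_setting(start_dir, key, default=None):
    """Return key from the nearest settings.json at or above start_dir."""
    start = Path(start_dir).resolve()
    for directory in [start, *start.parents]:
        settings_file = directory / "settings.json"
        if settings_file.is_file():
            settings = json.loads(settings_file.read_text())
            if key in settings:
                return settings[key]
    return default


with tempfile.TemporaryDirectory() as project_dir:
    project = Path(project_dir)
    # Project-wide settings live in the top-level folder.
    (project / "settings.json").write_text('{"result_dir": "results"}')
    sub_project = project / "analysis" / "step1"
    sub_project.mkdir(parents=True)

    found = lookup_setting(sub_project, "result_dir")
    missing = lookup_setting(sub_project, "hypothetical_unset_key", default="n/a")

print(found, missing)
```

A settings.json placed in a sub-folder would be found first in this model, which is how settings specific to a sub-project could override project-wide ones.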

    Attention

    The result path (as well as any other path, e.g. the log folders) is always evaluated relative to your script file. This means results are always created in the folder where your script is, not in your current working directory. If you are unsure about the location, call

    python3 simple-example.py --show-output
    

    Batch Processing describes file systems in more detail; most of it also applies to non-batch calculations.
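
The rule that a relative result path is resolved against the script's folder rather than the current working directory can be illustrated in a few lines. resolve_result_dir is a hypothetical helper and /home/user/project is a made-up location.

```python
from pathlib import PurePosixPath


def resolve_result_dir(script_file, result_dir):
    """Resolve result_dir against the folder that contains the script."""
    return PurePosixPath(script_file).parent / result_dir


# The working directory plays no role in the result (hypothetical paths).
resolved = resolve_result_dir("/home/user/project/simple-example.py", "results")
print(resolved)  # /home/user/project/results
```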

  5. Let’s add some more tasks to our little example. We want to take the files we just created and combine them into an average. Edit your example file so that it has the following content:

    import b2luigi
    import random
    
    
    class MyNumberTask(b2luigi.Task):
        some_parameter = b2luigi.IntParameter()
    
        def output(self):
            yield self.add_to_output("output_file.txt")
    
        def run(self):
            random_number = random.random()
    
            with open(self.get_output_file_name("output_file.txt"), "w") as f:
                f.write(f"{random_number}\n")
    
    
    class MyAverageTask(b2luigi.Task):
        def requires(self):
            for i in range(100):
                yield self.clone(MyNumberTask, some_parameter=i)
    
        def output(self):
            yield self.add_to_output("average.txt")
    
        def run(self):
            # Build the mean
            summed_numbers = 0
            counter = 0
            for input_file in self.get_input_file_names("output_file.txt"):
                with open(input_file, "r") as f:
                    summed_numbers += float(f.read())
                    counter += 1
    
            average = summed_numbers / counter
    
            with open(self.get_output_file_name("average.txt"), "w") as f:
                f.write(f"{average}\n")
    
    
    if __name__ == "__main__":
        b2luigi.set_setting("result_dir", "results")
        b2luigi.process(MyAverageTask(), workers=200)
    

    Note how the dependencies are defined in line 19 with the requires function. Calling clone makes sure that any parameters of the current task (there are none in our case) are copied to the dependencies.
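
The effect of requires on scheduling can be pictured with a toy model: a task becomes runnable only once everything it requires is done. This is a simplified sketch, not luigi's scheduler. ready_tasks and the task names are hypothetical.

```python
def ready_tasks(requirements, done):
    """Return tasks that are not done and whose requirements are all done."""
    return sorted(
        task
        for task, required in requirements.items()
        if task not in done and all(dep in done for dep in required)
    )


number_tasks = [f"number_{i}" for i in range(100)]
# The number tasks require nothing; the average requires all of them.
requirements = {task: [] for task in number_tasks}
requirements["average"] = number_tasks

first_wave = ready_tasks(requirements, done=set())
second_wave = ready_tasks(requirements, done=set(number_tasks))
print(len(first_wave), second_wave)  # 100 ['average']
```

In this model, the 100 number tasks can all be submitted at once, while the average task has to wait until every one of them has finished.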

    Hint

    Again, expert luigi users will not see anything new here.

    With the helper functions b2luigi.Task.get_input_file_names() and b2luigi.Task.get_output_file_name(), generating output file names from parameters is transparent to you as a user. Super easy!

    When you run the script, you will see that luigi detects the outputs already produced before (the random numbers) and does not run those tasks again. It only writes a new file, results/average.txt, containing a number near 0.5.
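
The reason existing results are not recomputed is that, by default, luigi treats a task as complete when its outputs exist. A minimal sketch of that rule in plain Python follows. run_if_incomplete is a hypothetical helper, and a temporary directory stands in for the results folder.

```python
import tempfile
from pathlib import Path


def run_if_incomplete(output_path, run):
    """Call run() only if output_path does not exist; return True if it ran."""
    if Path(output_path).exists():
        return False
    run()
    return True


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "output_file.txt"

    def write_number():
        target.write_text("0.5\n")

    first_call = run_if_incomplete(target, write_number)   # output missing: runs
    second_call = run_if_incomplete(target, write_number)  # output exists: skipped

print(first_call, second_call)  # True False
```

This also explains why the old results had to be removed in step 4 before the rewritten task would run again.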

You are now ready to read some more documentation in API Documentation or have a look into the FAQ. Please also check out the different Run Modes.