Deployment on transient federates #567

@Jakio815

@kushalpaliwal01 and I are trying to automate the deployment script for transient federates.

Transient federates can join the federation at any time, and the join is handled as an event. The current examples create a reactor that spawns the federate by directly executing its binary with parameters, as shown below.

/** Persistent federate that is responsible for launching the transient federate */
reactor TransientExec(launch_time: time = 0, fed_instance_name: char* = "instance") {
  timer t(launch_time, 0)

  reaction(t) {=
    // Construct the command to launch the transient federate
    char mid_launch_cmd[512];
    sprintf(mid_launch_cmd,
        "%s/bin/federate__%s -i %s",
        LF_FED_PACKAGE_DIRECTORY,
        self->fed_instance_name,
        lf_get_federation_id()
    );
    ...
    int status = system(mid_launch_cmd);
    ...
  =}
}

However, the deployment script generated when specifying RTI hosts and federates is not compatible with this example. Let me explain why.

How the deployment script works

If we specify the hosts, lfc generates two scripts.

  1. The distribution script (e.g., bin/SimpleFederated_distribute.sh) sshes into each target host, copies the whole generated directory over with scp (e.g., fed-gen/SimpleFederated/src-gen/federate__fed1), and then compiles it there.
  2. The launch script (e.g., bin/SimpleFederated) then sshes into each target host and launches the binary.

(FYI, check here for specifying hosts.)

Back to transient federates: at some point we need to ssh into the target machine and execute the binary there. The TransientExec code above does not ssh, and it does not know which address to ssh into.

So let me introduce some ideas to make this work.

1. Pass the IP address as a parameter

Suppose we pass the IP address to the TransientExec reactor as a parameter, and suppose we add some code that sshes into that machine.

reactor TransientExec(target_host: char* = "000.000.000.000", ...) {
  ...
  reaction(t) {=
    ...
    // Also add some conditions to do ssh...
    sprintf(mid_launch_cmd,
        "%s/bin/federate__%s -i %s",
        LF_FED_PACKAGE_DIRECTORY,
        self->fed_instance_name,
        lf_get_federation_id()
    );
    ...
    int status = system(mid_launch_cmd);
    ...
  =}
}
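
To make the "do ssh" step concrete, here is a minimal sketch of how the command string could be built. The helper build_remote_launch_cmd and its remote_dir parameter are hypothetical (they are not part of reactor-c), and the sketch assumes the binary was already copied to and compiled under remote_dir on the target host by the distribution script.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of reactor-c: builds an ssh command that
 * launches a transient federate binary on a remote host.
 * Returns 0 on success, -1 if the command does not fit in the buffer.
 */
int build_remote_launch_cmd(char* buf, size_t size,
                            const char* target_host,    /* e.g. "user@host3" */
                            const char* remote_dir,     /* assumed install dir on the target */
                            const char* fed_name,       /* e.g. "down" */
                            const char* federation_id) {
  assert(buf != NULL && size > 0);
  int n = snprintf(buf, size,
      "ssh %s '%s/bin/federate__%s -i %s'",
      target_host, remote_dir, fed_name, federation_id);
  /* snprintf returns the length it wanted to write; reject truncation. */
  return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Inside the reaction, the resulting buffer would be passed to system() in place of mid_launch_cmd. The arguments are interpolated into a shell command without escaping, so this only makes sense if they come from lfc-generated values and not from untrusted input.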

Here is a simple example using this approach. Up is connected to a transient federate Down, and a TransientExec reactor spawns Down at some point. All federates, and also the RTI, run on different machines.

reactor Up {
  output out: int
}

reactor Down {
  input in: int
}

federated reactor at host0 {
  exec = new TransientExec(host3, down) at user@host1

  up = new Up() at user@host2;
  @transient
  down = new Down() at user@host3
  
  up.out -> down.in
}

The user has to get two things right on the line exec = new TransientExec(host3, down) at user@host1:

  1. The IP of down (host3)
  2. The name of the federate down.

So this is error-prone: the user has to manually keep the IP and the name in sync with the declaration of down.

Another problem is scalability when we need to spawn multiple transient federates. Are we going to have 100 TransientExec instances when we have 100 Down instances?

So these are the limitations of this approach. The user should still be able to model the transient federates, but deployment should be automated by lfc.

2. Specifying timer and offset

So now I propose that we don't need the TransientExec reactor at all.
The user still needs to be able to model when the transient federate starts, so we can give it an offset and a period, like a timer:

  @transient
  mid = new Middle() on (1 sec, 0) at user@xx.xx.xx.xx ;

This means that the transient federate is spawned according to (offset, period).

However, I realized that this limits the timing behaviors that can be expressed. The user may also want the transient federate to be spawned by an input port rather than a timer, which this approach cannot support.

3. Creating an API lf_start_transient_fed()

To let the user model when the transient federate starts, I think this should be exposed as an API such as lf_start_transient_fed().

That way it can be triggered by a timer or by an input, under any condition.

The problem is how to implement this. Since transient federates only support centralized coordination, I thought of having the RTI spawn the transient federate. (This assumes that the RTI can ssh into the hosts of all federates.)

For example,

reactor Up {
  output out: int
  timer t(1 s, 0)
  reaction(t) -> out {=
    lf_start_transient_fed(out);
  =}
}

reactor Down {
  input in: int
}

federated reactor at host0 {
  up = new Up() at user@host1;
  @transient
  down = new Down() at user@host2
  
  up.out -> down.in
}

When the Up reactor calls lf_start_transient_fed(), it sends a message to the RTI (e.g., MSG_TYPE_SPAWN_TRANS_FED) that includes the federate ID.

There are some design points.

  1. We use the output port to specify which transient federate to spawn. The connections are fixed at compile time, so the user can model which federate to spawn through the connection (up.out -> down.in). (I have not yet considered what happens when a port has multiple connections.)
  2. The spawning should be done by the RTI because the federate Up does not know the IP address of the transient federate Down.
  3. However, I realized that the RTI does not know the IP address of the transient federate Down either. The RTI learns a federate's IP address only when that federate connects to the RTI. Since the transient federate has not even started, the RTI has no way to know its IP address.
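
As a rough illustration of the spawn request mentioned above, the message could be as small as a type byte followed by the federate ID. Everything in this sketch is an assumption: the numeric value of MSG_TYPE_SPAWN_TRANS_FED, the 3-byte layout, the little-endian byte order, and the function names encode_spawn_request and decode_spawn_request are all hypothetical and do not exist in reactor-c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message type value; a real one would have to be chosen so
 * that it does not collide with the existing message types in reactor-c. */
#define MSG_TYPE_SPAWN_TRANS_FED 0xF0

/* Assumed layout: 1 byte message type + 2 bytes federate ID (little-endian). */
#define SPAWN_MSG_LENGTH 3

/* Federate side: writes the spawn request into buf.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
size_t encode_spawn_request(uint8_t* buf, size_t size, uint16_t federate_id) {
  if (buf == NULL || size < SPAWN_MSG_LENGTH) return 0;
  buf[0] = MSG_TYPE_SPAWN_TRANS_FED;
  buf[1] = (uint8_t)(federate_id & 0xFF);
  buf[2] = (uint8_t)(federate_id >> 8);
  return SPAWN_MSG_LENGTH;
}

/* RTI side: returns 0 and stores the federate ID on success,
 * -1 if the bytes are not a well-formed spawn request. */
int decode_spawn_request(const uint8_t* buf, size_t len, uint16_t* federate_id) {
  if (buf == NULL || federate_id == NULL || len < SPAWN_MSG_LENGTH) return -1;
  if (buf[0] != MSG_TYPE_SPAWN_TRANS_FED) return -1;
  *federate_id = (uint16_t)(buf[1] | (buf[2] << 8));
  return 0;
}
```

Note that the message carries only the federate ID and no host address, so it does not resolve point 3: after decoding, the RTI would still need some other source for the address of the host to ssh into.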

So, here I am stuck.

I'll update or close this when it's fixed.

lingua-franca: lf-lang/lingua-franca#2213
reactor-c: #358
Discussion: lf-lang/lingua-franca#2212
