docs: three types of TAG and their examples (#140)
I added docs and examples for the three types of TAG (hierarchical FL, distributed training, and parallel experiments) under their respective example folders. I also fixed some typos in the current docs and added the corresponding TAG docs to the main doc folder.
docs/01-introduction.md (1 addition, 1 deletion)
@@ -81,7 +81,7 @@ The non-ochestration mode is useful in one of the following situations:
  * when the geo-distributed clusters are not under the management of one organization
  * when participants of a FL job want to have a control over when to join or leave the job

- In non-ochestration mode, the fleddge system is only responsible for managing (i.e., (de)allocation) non-data comsuming workers (e.g., model aggregating workers).
+ In non-ochestration mode, the fleddge system is only responsible for managing (i.e., (de)allocation) non-data consuming workers (e.g., model aggregating workers).
  The system supports a hybrid mode where some are managed workers and others are non-managed workers.

  Note that the flame system is in active development and not all the functionalities are supported yet.
docs/04-examples.md (8 additions, 51 deletions)
@@ -1,6 +1,6 @@
  # Examples

- This section currently presents one example: FL training for MNIST. More examples will follow in the future.
+ This section currently presents one example: FL training for MNIST. More examples will follow in the future, and you can find instructions in the README file of each example's folder.

  ## MNIST
@@ -82,7 +82,7 @@ flamectl get jobs
  ```

  ### Step 7: start a job
- Before staring your job, you can always use `flamectl get` to check each step is set up corretly. For more info, check
+ Before starting your job, you can always use `flamectl get` to check that each step is set up correctly. For more info, check
  ```bash
  flamectl get --help
  ```
@@ -157,52 +157,9 @@ The log for a task is similar to `task-61bd2da4dcaed8024865247e.log` under `/var
  As an alternative, one can check the progress at MLflow UI in the fiab setup.
  Open a browser and go to http://mlflow.flame.test.

- ## Hierarchical MNIST
- Likewise, the hierarchical FL example follows the same fashion.
- The zip file should contain code of every role specified in the schema.
- ### Step 4:
- ```bash
- flamectl create dataset dataset_eu_germany.json
- flamectl create dataset dataset_eu_uk.json
- flamectl create dataset dataset_na_canada.json
- flamectl create dataset dataset_na_us.json
- ```
- Flame will assign a trainer to each dataset. As each dataset has a `realm` specified, the middle aggregator will be created based on the corresponding `groupby` tag. In this case, there will be one middle aggregator for Europe (eu) and one for North America (na).
- ### Step 5:
- Put all four dataset IDs into `job.json`, and change training hyperparameters as you like.
- ```json
- "fromSystem": [
-     "62439c3725fe244585396ad7",
-     "6243a10c25fe244585396af0",
-     "6243a13625fe244585396af2",
-     "6243a14525fe244585396af3"
- ]
- ```
- ### Step 6:
- ```bash
- flamectl create job job.json
- ```
- ### Step 7:
- ```bash
- flamectl start job ${Job ID}
- ```
+ For other examples, please visit their particular example directories:
+ - [Medical Image Multi-class Classification with PyTorch](../examples/medmnist/README.md)
+ - [Binary Income Classification with Tabular Dataset](../examples/adult/README.md)
+ - [Toy Example of Hierarchical FL](../examples/hier_mnist/README.md)
+ - [Toy Example of Parallel Experiments](../examples/parallel_experiment/README.md)
+ - [Toy Example of Distributed Training](../examples/distributed_training/README.md)
docs/05-flame-basics.md (99 additions, 12 deletions)
@@ -18,13 +18,13 @@ The key benefits of the abstraction are:
  Depending on the availability of different communication infrastructures and security policies,
  a workload can be easily changed from one communication technology to another.

- **High extensibility**: TAG makes it easy to support a variety of different topologies. Therefore, it can potentially support many different usecases easily.
+ **High extensibility**: TAG makes it easy to support a variety of different topologies. Therefore, it can potentially support many different use cases easily.

  <p align="center"><img src="images/role_channel.png" alt="role and channel" /></p>

  Now let us describe how TAG is enabled. TAG is comprised of two basic and yet simple building blocks: *role* and *channel*.
- A *role* represents a vertex in TAG and should be associated with some hevaviors.
+ A *role* represents a vertex in TAG and should be associated with some behaviors.
  To create association between role and its behavior, a (python) code must be attached to a role.
  Once the association is done, a role is fully *defined*.
@@ -45,7 +45,7 @@ A channel also has two attributes: *groupBy* and *funcTags*.
  **groupBy**: This attribute is used to group roles of the channel based on a tag.
  Therefore, the groupBy attribute allows to build a hierarchical topology (e.g., a single-rooted multi-level tree), for instance, based on geographical location tags (e.g., us, uk, fr, etc).
- Currently a string-based tag is supported. Future extensions may include more dynamic grouping based on dynamic metrics such as latency, data (dis)simiarlity, and so on.
+ Currently a string-based tag is supported. Future extensions may include more dynamic grouping based on dynamic metrics such as latency, data (dis)similarity, and so on.

  **funcTags**: This attribute (discussed later in detail) contains what actions a role would take on the channel.
  As mentioned earlier, a role is associated with executable code.
@@ -54,13 +54,13 @@ We will discuss how to use funcTags correctly in the later part.
  ### TAG Example 1: Two-Tier Topology
  In flame, a topology is expressed within a concept called *schema*.
- A schema is a resuable component as a template.
+ A schema is a reusable component that serves as a template.
  The following presents a simple two-tier cross-device topology.

  ```json
  {
      "name": "A sample schema",
-     "description": "a sample schema to demostrate a TAG layout",
+     "description": "a sample schema to demonstrate a TAG layout",
      "roles": [
          {
              "name": "trainer",
@@ -102,15 +102,15 @@ When datasets are selected (more details [here (not yet updated)]()), each datas
  Therefore, in the flame system, **the number of datasets will drive the number of data-consuming workers** (e.g., trainer in this case).
  Subsequently, the number of non data-consuming workers is derived from the entries in the *groupBy* feature (more on [later]()).

- Now let's look at channels. Channels are expressed as a list. A channel consits of four key attributes: *name*, *pair*, *groupBy* and *funcTags*.
+ Now let's look at channels. Channels are expressed as a list. A channel consists of four key attributes: *name*, *pair*, *groupBy* and *funcTags*.
  The *name* attribute is used to uniquely identify a channel.
  The *pair* attribute contains two roles that constitute the channel; each role takes one end of the channel.
  For correctness, roles in the pair must exist in the role list.

  The *groupBy* attribute specifies how to group or cluster workers of the two ends (or roles) in the channel. It's optional.
  If this attribute is not defined, workers belonging to the channel are grouped into a default group.

- With *pair* and *groupBy*, a channel only specifies what roles consititue a channel and how they are grouped.
+ With *pair* and *groupBy*, a channel only specifies what roles constitute a channel and how they are grouped.
  But it doesn't know what actions each role takes on the channel. The *funcTags* attribute allows *dynamic* binding of functions to a channel.
  The software code attached to a role must define a set of functions that it wants to expose to users
  so that the users can specify it in the schema. Therefore, it allows more complex operations on a channel.
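Putting the four attributes together, a single channel entry might look like the following sketch (the tag values are illustrative; the attribute names follow the schema format described above):

```json
{
    "name": "param-channel",
    "description": "Model update is sent from trainer to aggregator and vice-versa",
    "pair": ["trainer", "aggregator"],
    "groupBy": {
        "type": "tag",
        "value": ["default/us", "default/eu"]
    },
    "funcTags": {
        "trainer": ["fetch", "upload"],
        "aggregator": ["distribute", "aggregate"]
    }
}
```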
- ### How to move from 2-tier to Hierarchical Topology
- From 2-tier to hierarchical (e.g., 3-tier), you need to have one more role in between top aggreagator and trainer, so you add middle aggreagator into the topology (i.e., schema), which also require you to define new channels connecting between each two roles. In order for the hierarchical concept to work, the `groupBy` of upstream channel shouldn't be more specific than the downstream channel.
+ #### How to move from 2-tier to hierarchical topology
+ To move from 2-tier to hierarchical (e.g., 3-tier), you need one more role between the top aggregator and the trainer, so you add a middle aggregator into the topology (i.e., schema), which also requires you to define new channels connecting each pair of adjacent roles. In order for the hierarchical concept to work, the `groupBy` of the upstream channel shouldn't be more specific than that of the downstream channel.
  Likewise, when you want to expand to a 4-tier topology, you will need a new channel definition connecting two middle aggregators.

  However, it is still unclear how workers are grouped together at run time.
  A brief answer is as follows: in the flame system, before workers are created, they are configured with an attribute called *realm*.
  This attribute is a logical hierarchical value which is similar to a directory-like structure in a file system.
  It basically dictates where workers should be created and to which path the workers belong in the logical hierarchy.
  Given this hierarchical information, users can judiciously choose grouping labels.
- Further discussion is available [here (not yet updated)]().
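As an illustrative sketch (role and channel names are assumptions, not taken verbatim from the repo), the channel list for a 3-tier schema could look like this, with a trainer-to-middle channel grouped by region and a middle-to-top channel above it:

```json
"channels": [
    {
        "name": "param-channel",
        "pair": ["trainer", "middle-aggregator"],
        "groupBy": { "type": "tag", "value": ["default/eu", "default/na"] }
    },
    {
        "name": "global-channel",
        "pair": ["middle-aggregator", "top-aggregator"],
        "groupBy": { "type": "tag", "value": ["default"] }
    }
]
```

Note that the upstream channel's `groupBy` value (`default`) is less specific than the downstream one (`default/eu`, `default/na`), satisfying the rule stated above.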
+ ### TAG Example 3: Parallel Experiments
+ The flame system allows multiple identical TAGs to run in parallel based on the `groupBy` tag, such as allowing a 2-tier FL task to run in parallel for 3 geographical regions simultaneously (see image below).
+ ```json
+ {
+     "description": "a sample schema to demonstrate the parallel experiment setting",
+     "roles": [
+         {
+             "name": "trainer",
+             "description": "It consumes the data and trains local model",
+             "isDataConsumer": true
+         },
+         {
+             "name": "aggregator",
+             "description": "It aggregates the updates from trainers"
+         }
+     ],
+     "channels": [
+         {
+             "name": "param-channel",
+             "description": "Model update is sent from trainer to aggregator and vice-versa",
+             "pair": [
+                 "trainer",
+                 "aggregator"
+             ],
+             "groupBy": {
+                 "type": "tag",
+                 "value": [
+                     "default/us",
+                     "default/eu",
+                     "default/asia"
+                 ]
+             },
+             "funcTags": {
+                 "trainer": ["fetch", "upload"],
+                 "aggregator": ["distribute", "aggregate"]
+             }
+         }
+     ]
+ }
+ ```
+
+ This topology is the same as the 2-tier one except there are additional *value* entries in the *groupBy* tag.
+
+ ### TAG Example 4: Distributed Learning
+ The flame system allows distributed training besides federated learning. In TAG, this means creating a self-loop (see image below) to allow channel communication between trainers so that algorithms such as ring all-reduce can be used to train the model across multiple trainers.
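Such a self-loop can be sketched as a channel that pairs the trainer role with itself (the channel name and function tag below are illustrative assumptions, not taken from the repo):

```json
{
    "name": "ring-channel",
    "description": "Trainers exchange model updates directly with one another",
    "pair": ["trainer", "trainer"],
    "funcTags": {
        "trainer": ["ring_allreduce"]
    }
}
```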
docs/08-flame-sdk.md (63 additions, 2 deletions)
@@ -1,9 +1,70 @@
  # Flame SDK

  ## Selector
+ Users are able to implement new selectors in `lib/python/flame/selector/`, which should return a dictionary with keys corresponding to the active trainer IDs (i.e., agent IDs). After implementation, the new selector needs to be registered in both `lib/python/flame/selectors.py` and `lib/python/flame/config.py`.

  ### Currently Implemented Selectors
  1. Naive (i.e., select all)
+ ```json
+ "selector": {
+     "sort": "default",
+     "kwargs": {}
+ }
+ ```
  2. Random (i.e., select k out of n local trainers)
+ ```json
+ "selector": {
+     "sort": "random",
+     "kwargs": {
+         "k": 1
+     }
+ }
+ ```
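The random selector's behavior can be sketched in a few lines of plain Python. This is an illustrative standalone sketch of the contract described above (a dictionary keyed by the selected trainer IDs), not the actual flame selector interface:

```python
import random

def random_select(trainer_ids, k):
    """Pick k of the n active trainers and return them as a dict
    whose keys are the selected trainer (agent) IDs."""
    if k > len(trainer_ids):
        raise ValueError("k cannot exceed the number of active trainers")
    chosen = random.sample(trainer_ids, k)
    # Values are placeholders; only the keys (selected IDs) matter here.
    return {tid: None for tid in chosen}
```

With `k` equal to the number of trainers, this degenerates to the naive select-all behavior.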
+ ## Optimizer (i.e., aggregator of FL)
+ Users can implement a new server optimizer (the client optimizer is defined in the actual ML code) in `lib/python/flame/optimizer`, which can take in hyperparameters, if any, and should return the aggregated weights in either PyTorch or TensorFlow format. After implementation, the new optimizer needs to be registered in both `lib/python/flame/optimizer.py` and `lib/python/flame/config.py`.
+
+ ### Currently Implemented Optimizers
+ 1. FedAvg (i.e., weighted average in terms of dataset size)
+ ```json
+ "optimizer": {
+     "sort": "fedavg",
+     "kwargs": {}
+ }
+ ```
+ 2. FedAdaGrad (i.e., server uses AdaGrad optimizer)
+ ```json
+ "optimizer": {
+     "sort": "fedadagrad",
+     "kwargs": {
+         "beta_1": 0,
+         "eta": 0.1,
+         "tau": 0.01
+     }
+ }
+ ```
+ 3. FedAdam (i.e., server uses Adam optimizer)
+ ```json
+ "optimizer": {
+     "sort": "fedadam",
+     "kwargs": {
+         "beta_1": 0.9,
+         "beta_2": 0.99,
+         "eta": 0.01,
+         "tau": 0.001
+     }
+ }
+ ```
+ 4. FedYogi (i.e., server uses Yogi optimizer)
+ ```json
+ "optimizer": {
+     "sort": "fedyogi",
+     "kwargs": {
+         "beta_1": 0.9,
+         "beta_2": 0.99,
+         "eta": 0.01,
+         "tau": 0.001
+     }
+ }
+ ```
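FedAvg's dataset-size-weighted average can be sketched in plain Python. This is an illustrative sketch only (the function name and data layout are assumptions; the actual implementation lives in `lib/python/flame/optimizer`):

```python
def fedavg(updates):
    """Aggregate model updates weighted by dataset size.

    `updates` is a list of (weights, num_samples) pairs, where
    `weights` is a list of floats standing in for a flattened model.
    Returns the sample-size-weighted average of the weights.
    """
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    agg = [0.0] * dim
    for weights, n in updates:
        for i, w in enumerate(weights):
            # Each trainer contributes proportionally to its dataset size.
            agg[i] += w * (n / total)
    return agg
```

For example, averaging `([1.0, 2.0], 1)` and `([3.0, 4.0], 3)` weights the second trainer three times as heavily, giving `[2.5, 3.5]`.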
- Users are able to implement new selectors in `lib/python/flame/selector/` which should return a dictionary with keys corresponding to the active trainer IDs (i.e., agent IDs). After implementation, the new selector needs to be registered into both `lib/python/flame/selectors.py` and `lib/python/flame/config.py`.