DNAnexus applet for the `stepcount` Python package

Scripts, configuration files, and instructions for building a DNAnexus applet that runs the Python package stepcount (https://github.com/OxWearables/stepcount) on the DNAnexus platform.

✅ Prerequisites

- Python 3.8 or higher
- The DNAnexus dx toolkit: `pip install dxpy`
- git

The following shows how to use Anaconda to satisfy the above prerequisites (you can use any Python environment manager):

Download & install Miniconda (light-weight version of Anaconda).
(Windows only) Open the Anaconda Prompt (Start Menu).
Create a new environment named dxpy with Python, Pip, and Git:
```
conda create -n dxpy python=3.9 pip git
```
Activate the environment:
```
conda activate dxpy
```
You should now see (dxpy) at the beginning of your command prompt.
Install dxpy:
```
pip install dxpy
```

🟢 You are now ready! You've created an environment called dxpy containing the DNAnexus CLI.

🔁 Next time: Just open the Anaconda Prompt and run:
conda activate dxpy
If you see (dxpy) in your prompt, you’re good to go.
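
The prerequisites listed above can also be checked from Python. The following is a minimal sketch rather than part of this repository: the function name `missing_prerequisites` is hypothetical, and it only tests that the `dx` and `git` executables are on your PATH, not that they work.

```python
import shutil
import sys


def missing_prerequisites(min_python=(3, 8), tools=("dx", "git")):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    # Python 3.8 or higher is required by the prerequisites above.
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted} or higher is required")
    # shutil.which returns None when an executable is not found on PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("Missing:", problem)
```

If nothing is printed, the active environment satisfies the three prerequisites.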

🔐 DNAnexus Login & Basic Commands

Log in to DNAnexus:

dx login

Use your regular DNAnexus username/password.

Basic DNAnexus commands (prefixed with dx) mimic Unix commands:

| Command | Meaning |
| --- | --- |
| `dx ls` | List files/folders |
| `dx cd` | Change directories |
| `dx mkdir` | Create a new folder |
| `dx rm` | Delete a file |
| `dx mv` | Move or rename a file |

📖 For more: DNAnexus CLI Quickstart

🛠️ Building the Applet

Clone this repository:

git clone https://github.com/OxWearables/dnanexus-stepcount.git
cd dnanexus-stepcount/

Build the asset:
```
dx build_asset stepcount-asset
```
⏳ This takes 10–15 minutes and may show warnings, which you can ignore.
When complete, copy the asset ID (e.g., record-abc123). If you missed it:
```
dx describe stepcount-asset
```
Open the file stepcount/dxapp.json and find this section:
```
"assetDepends": [
  {
    "id": "record-..."
  }
]
```
Replace "record-..." with the actual asset ID. Save and close.
Finally, build the applet:
```
dx build stepcount
```
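
If you prefer not to edit dxapp.json by hand, the asset ID can be filled in with a short script. This is a sketch under assumptions: the helper names are hypothetical, the file is assumed to hold a single "assetDepends" entry as in the fragment above, and the section is searched for recursively because its exact position in the file is not shown here.

```python
import json
from pathlib import Path


def find_key(node, key):
    """Return the first value stored under `key` anywhere in nested JSON data."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def set_asset_id(spec, asset_id):
    """Set the "id" of the single "assetDepends" entry in a parsed dxapp.json."""
    depends = find_key(spec, "assetDepends")
    if not depends:
        raise KeyError('no "assetDepends" section found')
    if len(depends) != 1:
        raise ValueError("expected exactly one asset entry")  # assumption
    depends[0]["id"] = asset_id
    return spec


def update_dxapp(path, asset_id):
    """Rewrite the dxapp.json at `path` in place (re-indents the file)."""
    path = Path(path)
    spec = json.loads(path.read_text())
    set_asset_id(spec, asset_id)
    path.write_text(json.dumps(spec, indent=2) + "\n")
```

For example, `update_dxapp("stepcount/dxapp.json", "record-abc123")`, with the placeholder ID replaced by your own, should be run before `dx build stepcount`.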

▶️ Running the Applet

To begin, download a sample accelerometer file:

https://wearables-files.ndph.ox.ac.uk/files/data/samples/ax3/tiny-sample.cwa.gz

and upload it to your DNAnexus project: dx upload tiny-sample.cwa.gz

You can now run the applet on the uploaded sample file:

dx run stepcount -iinput_file=tiny-sample.cwa.gz

⏳ This takes 5–10 minutes.

This starts a new job on DNAnexus. The job ID shown in the output (e.g. job-AbCdE12345) can be used to track its progress in the DNAnexus web interface under the “Monitor” tab. Once the job finishes, an outputs/ folder will be created in your project. You can view its contents with dx tree outputs/ which should look like this:

outputs/
└── tiny-sample
    ├── tiny-sample-Bouts.csv.gz
    ├── tiny-sample-Daily.csv.gz
    ├── tiny-sample-DailyAdjusted.csv.gz
    ├── tiny-sample-Hourly.csv.gz
    ├── tiny-sample-HourlyAdjusted.csv.gz
    ├── tiny-sample-Info.json
    ├── tiny-sample-Minutely.csv.gz
    ├── tiny-sample-MinutelyAdjusted.csv.gz
    ├── tiny-sample-Steps.csv.gz
    ├── tiny-sample-Steps.png
    └── tiny-sample-StepTimes.csv.gz
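
To confirm that a run produced every file shown in the tree above, the listing can be compared against the expected names. The sketch below is illustrative only: the suffix list is copied from the tree above and may differ between stepcount versions, and `missing_outputs` is a hypothetical helper.

```python
# Output suffixes as they appear in the example tree above (assumption: other
# stepcount versions may produce a different set).
EXPECTED_SUFFIXES = [
    "Bouts.csv.gz",
    "Daily.csv.gz",
    "DailyAdjusted.csv.gz",
    "Hourly.csv.gz",
    "HourlyAdjusted.csv.gz",
    "Info.json",
    "Minutely.csv.gz",
    "MinutelyAdjusted.csv.gz",
    "Steps.csv.gz",
    "Steps.png",
    "StepTimes.csv.gz",
]


def missing_outputs(sample, filenames):
    """Return the expected output names for `sample` that are absent from `filenames`."""
    present = set(filenames)
    expected = [f"{sample}-{suffix}" for suffix in EXPECTED_SUFFIXES]
    return [name for name in expected if name not in present]


if __name__ == "__main__":
    listing = ["tiny-sample-Info.json", "tiny-sample-Steps.png"]
    print(missing_outputs("tiny-sample", listing))
```

An empty list means all eleven files are present for that sample.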

🧯 Troubleshooting

Error: ('destination project is in region aws:xx-xxxx-x but "regionalOptions" do not contain this region. Please, update your "regionalOptions" specification',)
- Solution: Open stepcount/dxapp.json and search for the "regionalOptions" field:
```
"regionalOptions": {
    "aws:eu-west-2": {...}
}
```
  Change "aws:eu-west-2" to your project region as indicated in your error message.
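
The same change can be scripted. The following sketch assumes "regionalOptions" is a top-level field of dxapp.json; `rename_region` is a hypothetical helper, and "aws:us-east-1" is only an example target region.

```python
import json


def rename_region(spec, old_region, new_region):
    """Move the "regionalOptions" entry from old_region to new_region."""
    options = spec["regionalOptions"]
    if old_region not in options:
        raise KeyError(f"{old_region} not present in regionalOptions")
    if new_region in options:
        raise ValueError(f"{new_region} is already configured")
    # Keep the settings unchanged; only the region key is replaced.
    options[new_region] = options.pop(old_region)
    return spec


if __name__ == "__main__":
    # Stand-in for the parsed contents of stepcount/dxapp.json.
    spec = {"regionalOptions": {"aws:eu-west-2": {}}}
    rename_region(spec, "aws:eu-west-2", "aws:us-east-1")
    print(json.dumps(spec))
```

Use the region named in your own error message as the new key, then rebuild the applet.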

🔁 Running on Multiple Files

The most straightforward way to process multiple files is to submit one dx run command per file. The example below shows how to automate this using standard Unix commands (it also works in the Windows Anaconda Prompt).

First, you'll need to generate a list of file paths you want to process. In this example, we're working with UK Biobank accelerometer data (about 100,000 files). We use the dx find data command to filter by field ID 90001 (UK Biobank ID for accelerometry), and then use awk to extract just the file paths:

dx find data --property field_id=90001 | awk '{print $6}' > my-files.txt

The resulting my-files.txt file should contain entries like:

/Bulk/Activity/Raw/54/5408734_90001_1_0.cwa
/Bulk/Activity/Raw/49/4945583_90001_1_0.cwa
/Bulk/Activity/Raw/20/2066665_90001_1_0.cwa
...

Finally, we use xargs to submit a job for each entry:

xargs -P10 -I {} sh -c 'dx run stepcount -iinput_file=":{}" -y --brief' < my-files.txt | tee my-jobs.txt

This will execute dx run stepcount ... for each entry in my-files.txt. It will also create a log file my-jobs.txt containing the list of submitted job IDs.
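
The xargs one-liner above can be mirrored in Python when you want more control over submission. This is a minimal sketch, assuming `dx` is on your PATH and you are logged in; the function names are hypothetical, and jobs are submitted one at a time rather than ten in parallel.

```python
import subprocess


def read_paths(lines):
    """Strip whitespace and drop empty lines from a list of file paths."""
    return [line.strip() for line in lines if line.strip()]


def build_run_command(file_path, applet="stepcount"):
    """Return the argument list for one submission, matching the xargs command above."""
    return ["dx", "run", applet, f"-iinput_file=:{file_path}", "-y", "--brief"]


def submit_all(paths, log_path="my-jobs.txt"):
    """Submit one job per path and append each job ID to log_path."""
    with open(log_path, "a") as log:
        for path in paths:
            result = subprocess.run(
                build_run_command(path),
                capture_output=True,
                text=True,
                check=True,  # stop at the first failed submission
            )
            # With --brief, the command output is the job ID.
            log.write(result.stdout.strip() + "\n")
            log.flush()


if __name__ == "__main__":
    print(build_run_command("/Bulk/Activity/Raw/54/5408734_90001_1_0.cwa"))
```

To submit for real, call `submit_all(read_paths(open("my-files.txt")))`. Because the log is flushed after every job, an interrupted run still leaves a usable my-jobs.txt.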

For additional batch processing strategies, see the tutorial by the UK Biobank team: https://github.com/UK-Biobank/UKB-RAP-Imaging-ML/blob/main/stepcount-applet/bulk_files_processing.ipynb

🛑 Terminating Multiple Jobs

If you need to terminate multiple job submissions, the my-jobs.txt file can be used as follows:

xargs -P10 -I {} sh -c 'dx terminate "{}"' < my-jobs.txt

📊 Collating Outputs from Multiple Runs

After running multiple jobs, you may want to merge their output files for further analysis. The stepcount package includes a secondary CLI tool, stepcount-collate-outputs, made for this purpose. To use it on DNAnexus, you'll need to create a separate applet (you can reuse the already created stepcount-asset asset, avoiding the time-consuming asset building process):

Open stepcount-collate-outputs/dxapp.json and find this section:
```
"assetDepends": [
  {
    "id": "record-..."
  }
]
```
Replace "record-..." with the ID of the stepcount-asset asset you built earlier.
Build the applet:
```
dx build stepcount-collate-outputs
```

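The applet takes a text file listing the IDs of the files to collate (my-outputs.txt in this example, created in the steps below) and is run as follows: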

dx run stepcount-collate-outputs -iinput_file=my-outputs.txt

First, create the my-outputs.txt file listing the IDs of the files you want to collate. We will use dx find data for this. Assuming the files are in the outputs/ folder, run:

dx find data --path outputs/ --brief > my-outputs.txt

The resulting my-outputs.txt file will look like this:

project-GXJBY38JZ32Vb0588YVYx3Gy:file-Gx4k9hjJVz2Gb3gkV0p3XfVk
project-GXJBY38JZ32Vb0588YVYx3Gy:file-Gx4k9hjJVz28pPjj9p7vJqkX
project-GXJBY38JZ32Vb0588YVYx3Gy:file-Gx4k9hjJVz2P260x2PjZK0Gy
...

Note that, unlike the my-files.txt file from the previous section which listed file paths, this one lists file IDs.
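
Before uploading the list, it can be worth checking that every line really is a file ID and not a path. The sketch below is illustrative: the pattern is inferred from the example IDs above rather than from a formal specification, and `split_ids` is a hypothetical helper.

```python
import re

# Pattern inferred from the example lines above: project-<id>:file-<id>.
ID_PATTERN = re.compile(r"^project-[A-Za-z0-9]+:file-[A-Za-z0-9]+$")


def split_ids(lines):
    """Return (valid, invalid) lists of non-empty, whitespace-stripped lines."""
    valid, invalid = [], []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        (valid if ID_PATTERN.match(entry) else invalid).append(entry)
    return valid, invalid


if __name__ == "__main__":
    sample = [
        "project-GXJBY38JZ32Vb0588YVYx3Gy:file-Gx4k9hjJVz2Gb3gkV0p3XfVk\n",
        "/Bulk/Activity/Raw/54/5408734_90001_1_0.cwa\n",
    ]
    valid, invalid = split_ids(sample)
    print(len(valid), "valid;", len(invalid), "invalid")
```

Any entries reported as invalid are most likely paths, which suggests the `--brief` flag was omitted from `dx find data`.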

Next, upload the list to DNAnexus:

dx upload my-outputs.txt

Finally, run the collate applet on the list:

dx run stepcount-collate-outputs -iinput_file=my-outputs.txt

💡 Tip: Speed Up File Collating by Selecting Only Needed Files

If you're dealing with hundreds of thousands of output files (e.g. UK Biobank), collating everything may be too slow.

The stepcount package creates several output types. For example, *-Info.json files contain overall statistics, *-Daily.csv.gz files contain daily summaries, and *-Hourly.csv.gz files contain hourly data.

You can speed things up by selecting only the files you need using the --name option in the dx find data command. For example, if you only want the *-Info.json files:

dx find data --path outputs/ --brief --name "*-Info.json" > only-info-outputs.txt

📌 Versioning for Reproducibility

To ensure reproducibility and follow best practices, we recommend explicitly pinning the version of the stepcount package in your asset.

Open the stepcount-asset/dxasset.json file.
Edit the "execDepends" section to include the desired version of stepcount. For example, to pin the version to 3.12.0, you would specify:
```
"execDepends": [
  {"name": "stepcount", "version": "3.12.0", "package_manager": "pip"},
  {...}
]
```
Save and close the file.
Rebuild the asset by running:
```
dx build_asset stepcount-asset
```

Find all available versions of stepcount here: 👉 https://github.com/OxWearables/stepcount/releases
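
The version pin can also be applied with a script. This is a sketch under assumptions: "execDepends" is taken to be a top-level list in dxasset.json with entries shaped like the example above, and `pin_version` is a hypothetical helper.

```python
import json


def pin_version(spec, package, version):
    """Set "version" on the pip entry for `package`, adding the entry if absent."""
    depends = spec.setdefault("execDepends", [])
    for dep in depends:
        if dep.get("name") == package and dep.get("package_manager") == "pip":
            dep["version"] = version
            return spec
    # No existing pip entry for this package: append a new one.
    depends.append({"name": package, "version": version, "package_manager": "pip"})
    return spec


if __name__ == "__main__":
    # Stand-in for the parsed contents of stepcount-asset/dxasset.json.
    spec = {"execDepends": [{"name": "stepcount", "package_manager": "pip"}]}
    pin_version(spec, "stepcount", "3.12.0")
    print(json.dumps(spec, indent=2))
```

After writing the updated JSON back to stepcount-asset/dxasset.json, rebuild the asset as described above so the pinned version takes effect.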
