Serialization fixed in build_deepwalk_corpus_iter() for non-str walk items #99

Open: wants to merge 3 commits into master
README.rst: 2 changes (2 additions, 0 deletions)
@@ -4,6 +4,8 @@ DeepWalk

DeepWalk uses short random walks to learn representations for vertices in graphs.

+This implementation extends the original `DeepWalk <https://github.com/phanein/deepwalk>`_ v1.0.3 to support numeric walk items in addition to ``str``. Numeric items are required by `HARP (AAAI 2018) <https://github.com/eXascaleInfolab/HARP>`_ and, more generally, allow efficient embedding of graphs whose nodes are identified by numeric ids.

Usage
-----

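The serialization change in `_write_walks_to_disk` (deepwalk/walks.py) is easiest to see on a single walk. The sketch below is a standalone illustration, not code from the PR: `serialize_walk` and `serialize_walk_original` are hypothetical helpers that mirror the new and the old write statements, and an `io.StringIO` buffer stands in for the walk file. It assumes Python 3; the PR adds `from __future__ import print_function, division` so that the `print(..., file=fout)` form also works under Python 2.

```python
import io


def serialize_walk(walk):
    """Hypothetical helper: one corpus line for a walk whose items may be str or numeric ids."""
    buf = io.StringIO()  # stands in for the open walk file `fout`
    # str(v) lets the join accept int ids as well as str ids; print() appends the newline.
    print(" ".join(str(v) for v in walk), file=buf)
    return buf.getvalue()


def serialize_walk_original(walk):
    """Hypothetical helper mirroring the pre-fix statement: str.join raises TypeError on a non-str item."""
    return u"{}\n".format(u" ".join(v for v in walk))


numeric_walk = [3, 17, 42, 17]  # hypothetical walk over numeric node ids
print(serialize_walk(numeric_walk), end="")  # prints: 3 17 42 17

try:
    serialize_walk_original(numeric_walk)
except TypeError as err:
    print("original serialization fails:", err)
```

Both helpers produce identical lines for walks of `str` ids, so the fix does not change the corpus format for existing string-labelled graphs.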
deepwalk/walks.py: 12 changes (9 additions, 3 deletions)
@@ -1,3 +1,4 @@
+from __future__ import print_function, division
 import logging
 from io import open
 from os import path
@@ -55,7 +56,7 @@ def _write_walks_to_disk(args):
   with open(f, 'w') as fout:
     for walk in graph.build_deepwalk_corpus_iter(G=G, num_paths=num_paths, path_length=path_length,
                                                  alpha=alpha, rand=rand):
-      fout.write(u"{}\n".format(u" ".join(v for v in walk)))
+      print(" ".join(str(v) for v in walk), file=fout)
   logger.debug("Generated new file {}, it took {} seconds".format(f, time() - t_0))
   return f

@@ -82,8 +83,13 @@ def write_walks_to_disk(G, filebase, num_paths, path_length, alpha=0, rand=rando
         files.append(file_)

   with ProcessPoolExecutor(max_workers=num_workers) as executor:
-    for file_ in executor.map(_write_walks_to_disk, args_list):
-      files.append(file_)
+    file_ = None
+    try:
+      for file_ in executor.map(_write_walks_to_disk, args_list):
+        files.append(file_)
+    except TypeError as err:
+      logger.error('ERROR: {}, file_: {}, args_list: {}'.format(err, file_, args_list))
+      raise

   return files

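The new `try`/`except` around `executor.map` relies on `Executor.map` re-raising a worker's exception when the iteration reaches the failed call. A minimal sketch of that pattern, under stated assumptions: `write_one` and `collect_files` are hypothetical stand-ins for `_write_walks_to_disk` and `write_walks_to_disk`, and `ThreadPoolExecutor` replaces `ProcessPoolExecutor` only so the example runs without pickling concerns; the standard library documents the same `map` exception behavior for both executors.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def write_one(args):
    """Hypothetical stand-in for _write_walks_to_disk: returns the name of the file it 'wrote'."""
    _, fname = args
    if not isinstance(fname, str):
        # Simulates a TypeError raised inside a worker for a malformed argument tuple.
        raise TypeError("file name must be str, got {!r}".format(fname))
    return fname


def collect_files(args_list, max_workers=2):
    """Hypothetical stand-in for write_walks_to_disk: gathers worker results in input order."""
    files = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        file_ = None  # stays None if the very first result fails
        try:
            # Executor.map yields results in input order and re-raises a worker's
            # exception when the iteration reaches that item.
            for file_ in executor.map(write_one, args_list):
                files.append(file_)
        except TypeError as err:
            # file_ still holds the last result collected before the failing one.
            logger.error("ERROR: %s, file_: %s, args_list: %s", err, file_, args_list)
            raise
    return files


print(collect_files([(1, "walks.0"), (1, "walks.1")]))  # ['walks.0', 'walks.1']
```

Calling `collect_files([(1, "walks.0"), (1, None)])` logs the error with `file_: walks.0` and then re-raises the `TypeError`. Because the loop variable is not reassigned when `map` raises, `file_` names the last file collected before the failure, not the failing one; `args_list` in the log is what identifies the bad input.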
run.sh: 58 changes (58 additions, 0 deletions)
@@ -0,0 +1,58 @@
#!/bin/bash
#
# \description Run DeepWalk on multiple networks
#
# \author Artem V L, https://exascale.info

DIMS=128
NETS="blogcatalog dblp homo wiki youtube"
WORKERS=8
RESTRACER=./exectime  # tracer prefixed to each run; alternatively: time
LOGDIR=embeds/logs
mkdir -p $LOGDIR

USAGE="$0 -h | [-d <dimensions>=${DIMS}] [-w <workers>=${WORKERS}]
-d,--dims - desired number of dimensions of the embedding model
-w,--workers - maximum number of workers (parallel threads). Note: DeepWalk training may fail on larger datasets with a small number of workers
-h,--help - show this usage description

Examples:
\$ $0 -d 128 -w 4
"

while [ $# -gt 0 ]; do
  case $1 in
    -h|--help)
      # Use defaults for the remaining parameters
      echo "$USAGE"  # quoted to preserve the newlines of the usage text
      exit 0
      ;;
    -d|--dims)
      if [ -z "$2" ] || [ "${2::1}" == "-" ]; then
        echo "ERROR, invalid argument value of $1: $2" >&2
        exit 1
      fi
      DIMS=$2
      echo "Set $1: $2"
      shift 2
      ;;
    -w|--workers)
      if [ -z "$2" ] || [ "${2::1}" == "-" ]; then
        echo "ERROR, invalid argument value of $1: $2" >&2
        exit 1
      fi
      WORKERS=$2
      echo "Set $1: $2"
      shift 2
      ;;
    *)
      printf 'Error: Invalid option specified: %s %s ...\n\n%s\n' "$1" "$2" "$USAGE" >&2
      exit 1
      ;;
  esac
done

for NET in $NETS; do
  $RESTRACER python3 -m deepwalk --format mat --input graphs/${NET}.mat --number-walks 80 --representation-size ${DIMS} --walk-length 40 --window-size 10 --workers $WORKERS --output embeds/embs_${NET}_${DIMS}-n80-l40-s10.w2v > "$LOGDIR/${NET}_${DIMS}-n80-l40-s10.log" 2> "$LOGDIR/${NET}_${DIMS}-n80-l40-s10.err" # &
done
# python3 -m deepwalk --format mat --input example_graphs/blogcatalog.mat --number-walks 80 --representation-size 128 --walk-length 40 --window-size 10 --workers 1 --output example_graphs/blogcatalog.w2v
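For readers who prefer to drive the same batch from Python, a minimal sketch, assuming the `graphs/` and `embeds/` layout used by run.sh. The helper names `run_tag`, `deepwalk_command` and `run_all` are hypothetical; the CLI flags are exactly those passed in the script, the `./exectime` wrapper is left out, and `run_all` only prints the commands unless `dry_run=False`.

```python
import os
import subprocess

NETS = ["blogcatalog", "dblp", "homo", "wiki", "youtube"]
LOGDIR = "embeds/logs"


def run_tag(net, dims, number_walks=80, walk_length=40, window_size=10):
    """File stem encoding the run parameters, as in run.sh (e.g. 'wiki_128-n80-l40-s10')."""
    return "{}_{}-n{}-l{}-s{}".format(net, dims, number_walks, walk_length, window_size)


def deepwalk_command(net, dims=128, workers=8, number_walks=80, walk_length=40, window_size=10):
    """Argument list for one iteration of the run.sh loop, without the ./exectime wrapper."""
    tag = run_tag(net, dims, number_walks, walk_length, window_size)
    return [
        "python3", "-m", "deepwalk",
        "--format", "mat",
        "--input", "graphs/{}.mat".format(net),
        "--number-walks", str(number_walks),
        "--representation-size", str(dims),
        "--walk-length", str(walk_length),
        "--window-size", str(window_size),
        "--workers", str(workers),
        "--output", "embeds/embs_{}.w2v".format(tag),
    ]


def run_all(dims=128, workers=8, dry_run=True):
    """Run (or, by default, only print) the DeepWalk command for every network in NETS."""
    for net in NETS:
        cmd = deepwalk_command(net, dims, workers)
        if dry_run:
            print(" ".join(cmd))
            continue
        os.makedirs(LOGDIR, exist_ok=True)
        stem = os.path.join(LOGDIR, run_tag(net, dims))
        # stdout/stderr go to per-network .log/.err files, as with the shell redirections.
        with open(stem + ".log", "w") as out, open(stem + ".err", "w") as err:
            result = subprocess.run(cmd, stdout=out, stderr=err)
        # Like run.sh (no `set -e`), a failed network does not stop the remaining runs.
        if result.returncode != 0:
            print("{}: exit status {}".format(net, result.returncode))
```

For example, `run_all(dims=64, workers=4)` prints the five commands, and `run_all(dims=64, workers=4, dry_run=False)` executes them sequentially, matching the script with `-d 64 -w 4`.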