This repository was archived by the owner on Nov 14, 2022. It is now read-only.
Setup Instructions
gregtheyoung edited this page Dec 12, 2014
·
9 revisions
The current code, found here on GitHub, has been tested on a Windows 2012 R2 Standard machine with a minimum of XXG RAM and XXG disk.
These instructions presume:
- Windows 2012 R2 Standard has already been installed, as is the case, e.g., with a new AWS (Amazon Web Services) EC2 (Elastic Compute Cloud) instance.
- Nothing else has been installed except for the standard OS install.
- The machine has a C drive on which the OS has been installed and other software will be installed, and a D drive which will be used for all InventorDisambiguator code and data processing. Adjust the instructions accordingly if a different drive letter is used.
- Downloaded files are put in d:\Downloads. Adjust the instructions accordingly if a different drive letter is used.
- The user of this document is familiar with installing software and with using the software once installed.
- For background, this setup has worked on an AWS EC2 instance launched from the image “Microsoft Windows Server 2012 R2 Base - ami-beca16d6”.
- Download and install MSVC++ 2010 redist. This is needed for Octave.
- http://www.microsoft.com/en-us/download/details.aspx?id=8328
- Download and install MSVC++ 2012 redist, x86. This is needed for PHP.
- http://www.microsoft.com/en-us/download/details.aspx?id=30679
- When asked, the file you want is VSU_4\vcredist_x86.exe
- [Note: deprecated - replaced with Julia below] Download and install Octave 3.6.4 precompiled for Windows Visual Studio from here:
- http://sourceforge.net/projects/octave/files/Octave%20Windows%20binaries/
- Run octave-3.6.4-vs2010-setup.exe using all default settings
- Download and install PHP 5.5 (VC11 x86 Thread Safe) from this page: http://windows.php.net/download/
- http://windows.php.net/downloads/releases/php-5.5.17-Win32-VC11-x86.zip
- Unzip to c:\PHP
- Install per the PHP manual instructions, these two pages specifically:
1. http://us3.php.net/manual/en/install.windows.manual.php
2. http://us2.php.net/manual/en/install.windows.commandline.php
3. The specific steps that were done during the last install were:
- Copy php.ini-production to php.ini
- In php.ini, uncomment line 721: extension_dir = "ext"
- Add c:\php to the system environment PATH variable
- Add .PHP to the system environment PATHEXT variable
- [Note - deprecated - see C++ setup below] Download and install Julia
- Download the Windows 64-bit installer from here: http://julialang.org/downloads/
- Run the downloaded file and specify C:\Julia as the install directory
- Download and install MinGW, g++, the Eigen library, and zlib1.dll
- Download and install mingw 64-bit: http://sourceforge.net/projects/mingw-w64/
1. Select x86_64 when the option dialog is presented.
2. After install, add C:\Program Files\mingw-w64\x86_64-4.9.2-posix-seh-rt_v3-rev0\mingw64\bin to the user environment PATH variable (presuming you installed version 4.9.2 - adjust as necessary).
- Download and install the Eigen library
1. http://bitbucket.org/eigen/eigen/get/3.2.2.zip
2. Unzip to c:\Eigen_3.2.2
- Download and install disambiguator code from https://github.com/CSSIP-AIR/InventorDisambiguator
- Put it into d:\InventorDisambiguator
1. Note that by default, if you use “Download Zip” from the GitHub site, the zip file will contain an extra top-level directory, “PatentsProcessor-master”, reflecting the “master” branch. You may need to adjust settings in your unzip tool (7zip) or move files so that the actual files begin directly under d:\PatentsProcessor.
- Compile the main.cpp file
1. g++ --std=c++11 -o disambig -Wall -DNDEBUG -Ic:\Eigen_3.2.2 main.cpp
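Before compiling and running, it can help to confirm that the installs above are actually visible from the command line. The following is a minimal sketch, assuming Python 3 is available on the machine (it is not part of the setup described here); the function names are hypothetical, and the Eigen check simply assumes that the unzipped library exposes an Eigen subdirectory under c:\Eigen_3.2.2, which is what the -I flag in the compile command relies on.

```python
import os
import shutil

# Tools and paths taken from the steps above; adjust if you installed elsewhere.
REQUIRED_TOOLS = ["php", "g++"]
EIGEN_DIR = r"c:\Eigen_3.2.2"


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def eigen_looks_installed(eigen_dir, isdir=os.path.isdir):
    """Eigen is header-only, so look for the 'Eigen' header directory."""
    return isdir(os.path.join(eigen_dir, "Eigen"))


if __name__ == "__main__":
    for tool in missing_tools(REQUIRED_TOOLS):
        print(f"not on PATH: {tool}")
    if not eigen_looks_installed(EIGEN_DIR):
        print(f"Eigen headers not found under {EIGEN_DIR}")
```

If the script prints nothing, php and g++ were both found on PATH and the Eigen headers are where the compile command expects them.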
- Go to d:\InventorDisambiguator
- Get a TSV (tab separated value) file from the PatentsProcessor and put it in the directory. It will be produced via run_consolidate.bat as called by start.py.
- It has the naming convention of disambiguator_[MM]_[dd].tsv where MM is the month it was produced and dd is the day of the month. For example: disambiguator_August_18.tsv
php Initialize_Input.php disambiguator_August_18.tsv
php Initialize_ID.php
php Matrixify_Attributes.php
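The three PHP steps above could also be driven from one script that stops at the first failure. This is only a sketch, assuming Python 3 is installed and php is on PATH; the function names are hypothetical, and the file name helper assumes the day is written without zero padding, which the single example (disambiguator_August_18.tsv) does not settle.

```python
import subprocess

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def input_filename(month, day):
    """Build disambiguator_[MM]_[dd].tsv from a month number (1-12) and a day."""
    # Assumption: no zero padding on the day.
    return f"disambiguator_{MONTHS[month - 1]}_{day}.tsv"


def pipeline_commands(tsv_name):
    """The three PHP steps, in the order listed above."""
    return [
        ["php", "Initialize_Input.php", tsv_name],
        ["php", "Initialize_ID.php"],
        ["php", "Matrixify_Attributes.php"],
    ]


def run_pipeline(tsv_name, workdir=r"d:\InventorDisambiguator", run=subprocess.run):
    """Run each step inside workdir, stopping at the first failure."""
    for command in pipeline_commands(tsv_name):
        # check=True raises CalledProcessError on a non-zero exit code.
        run(command, cwd=workdir, check=True)
```

For example, `run_pipeline(input_filename(8, 18))` would run the three scripts against disambiguator_August_18.tsv.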
- [Note - deprecated]
c:\software\octave-3.6.4\bin\octave.exe
source("load.m")
source("disambig.m")
disambig d:\InventorDisambiguator
- You should now have a _disambiguator_output.tsv file.
- That file will be used by the PatentsProcessor. See that project for instructions on how to use this file to integrate back into the patents database.
- Note: when running on an EC2 c3.large (3.75G RAM), I would get a zend_mm_heap corrupted message when running Initialize_ID.php. Changing to an r3.2xlarge (61G RAM) fixed the problem. Didn't try intermediate sizes.
- Matrixify_Attributes.php was designed to run in parallel via the PCNTL extension, which is not supported for PHP on Windows. I removed that for now, so it runs with one thread and thus uses only one core at a time.
- disambig.m also uses only one core. Octave is single-threaded, as is MATLAB (I believe). To make this part, which can run for a week, perform better, the algorithm would have to be changed to run in parallel. Another thought is to do blocking and spawn an instance for each block. For example, since the current algorithm will never collapse two records unless at least the first initial and last name are identical, we could split the rows into, say, 26 blocks, with each block containing rows whose last names share the same initial. There'd be much more code change than that (e.g. creating unique IDs for disambiguated inventors), but it could possibly run in maybe 1/10th the time.
- Using MATLAB rather than Octave in a test on all 2005 patents: MATLAB took 7 minutes; Octave was still going when killed after 48 hours.
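The blocking idea above can be sketched as follows. This is an illustration only, not code from the repository: it assumes Python 3, rows represented as dictionaries with a hypothetical last_name field, and a catch-all "#" block for names that do not start with A-Z. The global_id helper shows one possible way to keep inventor IDs unique once each block has been disambiguated separately.

```python
from collections import defaultdict


def block_key(last_name):
    """First letter of the last name, upper-cased; '#' for anything else."""
    first = last_name.strip()[:1].upper()
    return first if "A" <= first <= "Z" else "#"


def split_into_blocks(rows, name_field="last_name"):
    """Group rows by the initial of the last name (hypothetical field name)."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[block_key(row[name_field])].append(row)
    return dict(blocks)


def global_id(block, local_id):
    """Prefix a block-local inventor ID with its block so IDs never collide."""
    return f"{block}-{local_id}"


if __name__ == "__main__":
    sample = [
        {"first_name": "Ann", "last_name": "Smith"},
        {"first_name": "A.", "last_name": "smith"},
        {"first_name": "Greg", "last_name": "Young"},
    ]
    for key, members in sorted(split_into_blocks(sample).items()):
        print(key, len(members))
```

Each block could then be written to its own input file and handed to a separate disambiguator instance, one per core.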