I have spent far too much time trying to get Caffe running on a CentOS-based cluster that I use. I was hoping this would be a straightforward process. Suffice it to say it has not been. Not even close. None of the problems I encountered were particularly challenging to solve. The complication came from the fact that I ran into one hurdle after another. I should note, however, that installing Caffe on my personal machine, which runs Linux Mint 17.1, went smoothly.
I'm writing this post as a record of the problems I encountered and the solutions I used. Unfortunately, because I wasn't expecting to need to write this, I may be missing details. If you notice something missing please feel free to leave a comment and I will update this document. Similarly, if you know of better ways to solve any of these problems please feel free to share. I am unlikely to test the solutions myself unless I need to reinstall for some reason, so your comments would be largely for posterity. One thing to note is that some, if not many, of the problems I encountered may be rather specific to the cluster I'm using. My apologies if the following doesn't address the problem you are experiencing.
The first problem I ran into was with protobuf. The problem was related to the members of a union being declared const in src/google/protobuf/util/internal/datapiece.h. Specifically, it is the union containing the members i32_, i64_, u32_, u64_, double_, float_, bool_, and str_. This problem appears to be fairly common according to a quick Google search, and the fix is as simple as removing the const keyword from those members. However, the error itself can be a bit misleading as it doesn't point specifically to the offending lines.
The next problem I encountered was related to glog. Specifically, it requires a newer version of autotools than was installed on the system I'm using. To solve this I performed a user space install of autoconf and automake. Autoconf proved tricky to install for reasons I still don't understand. Thanks to the cluster admins I was finally able to do so using these instructions. Automake simply required downloading the tarball, configuring it to install into my home directory, and then running make followed by make install. Unfortunately glog was still not happy! The included makefile hardcoded aclocal-1.14, while the upgraded autotools gave me aclocal-1.15. Bah! Executing autoreconf -ivf, which regenerates the build files against the installed autotools, fixed that issue though. I am not completely certain this is a good solution, however, as I have not used autoreconf before.
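The automake install and the glog fix described above can be sketched roughly as follows. This is an illustration under assumptions rather than a record of the exact commands: the user_install helper and the directory names in the usage comments are placeholders that do not come from any of the packages involved.

```shell
# Sketch of a user-space autotools-style install.
# user_install is a hypothetical helper, not part of any package.
# $1: unpacked source directory, $2: install prefix (defaults to $HOME).
user_install() {
    (
        cd "$1" || exit 1
        ./configure --prefix="${2:-$HOME}" &&
        make &&
        make install
    )
}

# Example usage (directory names are placeholders):
#   user_install ~/src/automake-1.15
#   export PATH="$HOME/bin:$PATH"        # find the new autotools first
#   (cd ~/src/glog && autoreconf -ivf)   # regenerate glog's build files
#   user_install ~/src/glog
```

Running the steps in a subshell keeps the caller's working directory unchanged, and chaining with && stops at the first step that fails.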
The next hurdle was with gflags. Evidently it requires cmake, which was not already installed. Downloading and installing cmake resolved this problem. One thing to note when performing a user space install of cmake is that you need to pass the --prefix flag to the bootstrap script to indicate that you want cmake installed in your home directory. I also found that I had to compile gflags with -fPIC, so the complete cmake command I ended up using was CXXFLAGS="-fPIC" cmake -DCMAKE_INSTALL_PREFIX=~ .. in a build subdirectory of the repository.
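Put together, the gflags build might look something like the sketch below. The build_gflags helper name and its arguments are made up for illustration; only the cmake invocation itself and the --prefix flag for bootstrap come from the steps above.

```shell
# Sketch of an out-of-tree gflags build installed under a user prefix.
# build_gflags is a hypothetical helper name.
# $1: gflags checkout, $2: install prefix (defaults to $HOME).
build_gflags() {
    mkdir -p "$1/build" || return 1
    (
        cd "$1/build" || exit 1
        # -fPIC produces position-independent code, presumably needed
        # here so the library can be linked into Caffe's shared object.
        CXXFLAGS="-fPIC" cmake -DCMAKE_INSTALL_PREFIX="${2:-$HOME}" .. &&
        make &&
        make install
    )
}

# cmake itself can be installed into the home directory from its
# unpacked source tree with:
#   ./bootstrap --prefix="$HOME" && make && make install
```

Calling build_gflags ~/src/gflags (a placeholder path) would then configure, build, and install into the home directory.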
I also had to install OpenCV. I didn't run into any trouble here. But a word of warning for those that have never built it before -- it takes a very long time!
Next up was leveldb. In this case I just cloned the GitHub repository and ran make in it. From there Caffe needs to be told where to find the header files and shared objects that were built. I did so by appending the relevant directories to the INCLUDE_DIRS and LIBRARY_DIRS lists respectively in caffe/Makefile.config. The headers are in the include subdirectory of the repository, while the libraries are placed at the root of the repository.
From there I found I needed lmdb. This library is developed under OpenLDAP. At the time of this writing they offer a GitHub repository with just the lmdb code, so I cloned it and built the library. I then updated the INCLUDE_DIRS and LIBRARY_DIRS lists in caffe/Makefile.config to point to libraries/liblmdb within the repository.
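One way to make the Makefile.config changes for leveldb and lmdb is sketched below. It assumes that adding += lines at the end of the file is acceptable, since GNU make appends to a variable defined earlier; editing the existing INCLUDE_DIRS and LIBRARY_DIRS lines by hand works just as well. The add_caffe_paths helper and the checkout locations in the usage comments are hypothetical.

```shell
# Sketch: append an include directory and a library directory to a
# Caffe Makefile.config. add_caffe_paths is a hypothetical helper.
# $1: path to Makefile.config, $2: header directory, $3: library directory.
add_caffe_paths() {
    printf 'INCLUDE_DIRS += %s\nLIBRARY_DIRS += %s\n' "$2" "$3" >> "$1"
}

# Example usage (checkout locations are placeholders):
#   add_caffe_paths caffe/Makefile.config \
#       "$HOME/src/leveldb/include" "$HOME/src/leveldb"
#   add_caffe_paths caffe/Makefile.config \
#       "$HOME/src/lmdb/libraries/liblmdb" "$HOME/src/lmdb/libraries/liblmdb"
```

For leveldb the two directories differ (include versus the repository root), while for lmdb both point at libraries/liblmdb.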
Next I had to install the Google snappy library. In this case I had to get the tarball from the Google Code repository, not the GitHub repository. For reasons I don't know or really care about, it seems that build files are missing from the GitHub repository.
The last problem I ran into was related to Atlas. I had previously performed a user space install of it but did not add the resulting lib directory to my LD_LIBRARY_PATH and LIBRARY_PATH environment variables. Doing so finally allowed me to execute make all in the Caffe repository and have it complete without errors. It took a while though, in part because I was using a single thread since I never knew what it would stumble over next. As such I advise throwing more threads at it by instead using the command make all -jX, where X is the number of parallel jobs you want it to run.
One final note on the installation. Don't forget to add the appropriate atlas, leveldb, and liblmdb directories to LD_LIBRARY_PATH in your .bashrc.
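As a rough illustration, the .bashrc lines could look like the following. All three directories are hypothetical and need to be replaced with wherever Atlas, leveldb, and lmdb actually ended up on your system.

```shell
# Hypothetical locations; substitute the directories from your own builds.
ATLAS_LIB="$HOME/atlas/lib"
LEVELDB_LIB="$HOME/src/leveldb"
LMDB_LIB="$HOME/src/lmdb/libraries/liblmdb"

# Searched by the dynamic loader when a program starts.
export LD_LIBRARY_PATH="$ATLAS_LIB:$LEVELDB_LIB:$LMDB_LIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Searched by gcc at link time.
export LIBRARY_PATH="$ATLAS_LIB${LIBRARY_PATH:+:$LIBRARY_PATH}"
```

The ${VAR:+:$VAR} form only adds the separating colon when the variable already has a value, which avoids a stray empty entry in the search path.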
At this point I'm really hoping it was worth the effort to install Caffe. Comparatively the Theano and Pylearn2 installations were so much easier on this same system.
and this guide can help if anybody has problems with installing caffe on mac: http://playittodeath.ru/how-to-install-caffe-on-mac-os-x-yosemite-10-10-4/
Excellent! Thanks for posting. Do you know whether it will work with a MacBook Air? A colleague has been having some trouble on that front.