## R-CNN: *Regions with Convolutional Neural Network Features*

Created by Ross Girshick, Jeff Donahue, Trevor Darrell and Jitendra Malik at UC Berkeley EECS.

Acknowledgements: a huge thanks to Yangqing Jia for creating Caffe and the BVLC team, with a special shoutout to Evan Shelhamer, for maintaining Caffe and helping to merge the R-CNN fine-tuning code into Caffe.

### Introduction

R-CNN is a state-of-the-art visual object detection system that combines bottom-up region proposals with rich features computed by a convolutional neural network. At the time of its release, R-CNN improved the previous best detection performance on PASCAL VOC 2012 by 30% relative, going from 40.9% to 53.3% mean average precision. Unlike the previous best results, R-CNN achieves this performance without using contextual rescoring or an ensemble of feature types.

R-CNN was initially described in an [arXiv tech report](http://arxiv.org/abs/1311.2524) and will appear in a forthcoming CVPR 2014 paper.

### Citing R-CNN

If you find R-CNN useful in your research, please consider citing:

    @inproceedings{girshick14CVPR,
        Author = {Girshick, Ross and Donahue, Jeff and Darrell, Trevor and Malik, Jitendra},
        Title = {Rich feature hierarchies for accurate object detection and semantic segmentation},
        Booktitle = {Computer Vision and Pattern Recognition},
        Year = {2014}
    }

### License

R-CNN is released under the Simplified BSD License (refer to the
LICENSE file for details).

### PASCAL VOC detection results

Method         | VOC 2007 mAP | VOC 2010 mAP | VOC 2012 mAP
-------------- |:------------:|:------------:|:------------:
R-CNN          | 54.2%        | 50.2%        | 49.6%
R-CNN bbox reg | 58.5%        | 53.7%        | 53.3%

* VOC 2007 per-class results will be published soon at CVPR and on arXiv
* VOC 2010 per-class results are available on the [VOC 2010 leaderboard](http://host.robots.ox.ac.uk:8080/leaderboard/displaylb_dt.php?challengeid=6&compid=4)
* VOC 2012 per-class results are available on the [VOC 2012 leaderboard](http://host.robots.ox.ac.uk:8080/leaderboard/displaylb_dt.php?challengeid=11&compid=4)

### Installing R-CNN

0. **Prerequisites** 
  0. MATLAB (tested with 2012b on 64-bit Linux)
  0. Caffe's [prerequisites](http://caffe.berkeleyvision.org/installation.html#prequequisites)
0. **Install Caffe** (this is the most complicated part)
  0. Download this [tagged release of Caffe](https://github.com/BVLC/caffe/archive/rcnn-release.tar.gz)
  0. Follow the [Caffe installation instructions](http://caffe.berkeleyvision.org/installation.html)
  0. Let's call the place where you installed Caffe `$CAFFE_ROOT` (from that directory, you can run `export CAFFE_ROOT=$(pwd)`)
  0. **Important:** Make sure to compile the Caffe MATLAB wrapper, which is not built by default: `make matcaffe`
  0. **Important:** Make sure to run `cd $CAFFE_ROOT/data/ilsvrc12 && ./get_ilsvrc_aux.sh` to download the ImageNet image mean
0. **Install R-CNN**
  0. Let's assume you've placed the R-CNN source in a folder called `rcnn`
  0. Change into that directory: `cd rcnn`
  0. R-CNN expects to find Caffe in `external/caffe`, so create a symlink: `ln -sf $CAFFE_ROOT external/caffe`
  0. Start MATLAB (make sure you're in the `rcnn` folder): `matlab`
  0. You'll be prompted to download the [Selective Search](http://disi.unitn.it/~uijlings/MyHomepage/index.php#page=projects1) code, which we cannot redistribute. Afterwards, you should see the message `R-CNN startup done` followed by the MATLAB prompt `>>`.
  0. Run the build script: `>> rcnn_build()` (builds [liblinear](http://www.csie.ntu.edu.tw/~cjlin/liblinear/) and [Selective Search](http://www.science.uva.nl/research/publications/2013/UijlingsIJCV2013/)). Don't worry if you see compiler warnings while building liblinear; this is normal on my system.
  0. Check that Caffe and the MATLAB wrapper are set up correctly (this code should run without error): `>> key = caffe('get_init_key');` (the expected value of `key` is -2)
  0. Download the data package, which includes precomputed models (see below).

**Common issues:** You may need to set an `LD_LIBRARY_PATH` before you start MATLAB. If you see a message like "Invalid MEX-file '/path/to/rcnn/external/caffe/matlab/caffe/caffe.mexa64': libmkl_rt.so: cannot open shared object file: No such file or directory" then make sure that CUDA and MKL are in your `LD_LIBRARY_PATH`. On my system, I use:

    export LD_LIBRARY_PATH=/opt/intel/mkl/lib/intel64:/usr/local/cuda/lib64
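
Before starting MATLAB, it can help to confirm that every entry in `LD_LIBRARY_PATH` is a directory that actually exists. The snippet below is a minimal sketch and not part of R-CNN; the function name `check_ld_path` is made up for illustration. It only checks that the directories exist, not that they contain the right libraries.

```shell
#!/bin/sh
# Hypothetical helper (not part of R-CNN): print every entry of a
# colon-separated path list that is not an existing directory.
# Returns non-zero if at least one entry is missing.
check_ld_path() {
    status=0
    old_ifs=$IFS
    IFS=:
    for dir in $1; do
        # Skip empty entries (e.g. from a leading or doubled colon).
        if [ -n "$dir" ] && [ ! -d "$dir" ]; then
            echo "missing: $dir"
            status=1
        fi
    done
    IFS=$old_ifs
    return $status
}

# Usage: only start MATLAB if every entry exists.
#   check_ld_path "$LD_LIBRARY_PATH" && matlab
```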
  

### Downloading precomputed models (the data package)

The quickest way to get started is to download precomputed R-CNN detectors. Currently we have detectors trained on PASCAL VOC 2007 train+val and 2012 train. Unfortunately the download is large (1.5GB), so brew some coffee or take a walk while waiting.

From the `rcnn` folder, run the data fetch script: `./data/fetch_data.sh`. 

This will populate the `rcnn/data` folder with `caffe_nets`, `rcnn_models` and `selective_search_data`. See `rcnn/data/README.md` for details.


### Running an R-CNN detector on an image

Let's assume that you've downloaded the precomputed detectors. Now:

1. Change to where you installed R-CNN: `cd rcnn`. 
2. Start MATLAB: `matlab`.
  * **Important:** if you don't see the message `R-CNN startup done` when MATLAB starts, then you probably didn't start MATLAB in the `rcnn` directory.
3. Run the demo: `>> rcnn_demo`
4. Enjoy the detected bicycle and person

### Training your own R-CNN detector on PASCAL VOC

Let's use PASCAL VOC 2007 as an example. The basic pipeline is: 

    extract features to disk -> train SVMs -> test
    
You'll need about 200GB of disk space free for the feature cache (which is stored in `rcnn/feat_cache` by default; symlink `rcnn/feat_cache` elsewhere if needed). **It's best if the feature cache is on a fast, local disk.** Before running the pipeline, we first need to install the PASCAL VOC 2007 dataset.
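
If you're not sure a disk has room for the feature cache, a quick check before extracting features can save a failed run. The snippet below is a sketch, not part of R-CNN: the function name `has_free_gb` and the `/scratch` path in the usage comment are made up for illustration. It relies on the POSIX `df -Pk` output format, where the fourth column of the second line is the available space in 1024-byte blocks.

```shell
#!/bin/sh
# Hypothetical helper (not part of R-CNN): succeed only if the filesystem
# holding directory $1 has at least $2 GB available.
has_free_gb() {
    # `df -Pk` prints one header line, then one line per filesystem;
    # column 4 is the available space in 1024-byte blocks.
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    need_kb=$(( $2 * 1024 * 1024 ))
    [ "$avail_kb" -ge "$need_kb" ]
}

# Usage, from the rcnn folder, assuming a fast local disk mounted at /scratch:
#   has_free_gb /scratch 200 \
#     && mkdir -p /scratch/feat_cache \
#     && ln -s /scratch/feat_cache feat_cache
```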

#### Installing PASCAL VOC 2007

0. Download the training, validation, test data and VOCdevkit:

  <pre>
  wget http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
  wget http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/VOCtest_06-Nov-2007.tar
  wget http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/VOCdevkit_08-Jun-2007.tar
  </pre>

0. Extract all of these tars into one directory, which will be named `VOCdevkit`.

  <pre>
  tar xvf VOCtrainval_06-Nov-2007.tar
  tar xvf VOCtest_06-Nov-2007.tar
  tar xvf VOCdevkit_08-Jun-2007.tar
  </pre>

0. It should have this basic structure:

  <pre>
  VOCdevkit/                           % development kit
  VOCdevkit/VOCcode/                   % VOC utility code
  VOCdevkit/VOC2007                    % image sets, annotations, etc.
  ... and several other directories ...
  </pre>

0. I use a symlink to hook the R-CNN codebase to the PASCAL VOC dataset:

  <pre>
  ln -sf /your/path/to/voc2007/VOCdevkit /path/to/rcnn/datasets/VOCdevkit2007
  </pre>
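
To confirm that the symlink points at a correctly extracted devkit, you can check for the directories listed in the basic structure above. This is a minimal sketch, not part of R-CNN; the function name `check_vocdevkit` is made up for illustration, and it only checks for `VOCcode` and `VOC<year>`, not the dataset contents.

```shell
#!/bin/sh
# Hypothetical helper (not part of R-CNN): verify that directory $1 looks
# like an extracted VOCdevkit for the given year ($2, default 2007).
# `[ -d ... ]` follows symlinks, so this also works on the symlinked path.
check_vocdevkit() {
    root=$1
    year=${2:-2007}
    for sub in VOCcode "VOC$year"; do
        if [ ! -d "$root/$sub" ]; then
            echo "missing: $root/$sub"
            return 1
        fi
    done
    echo "ok: $root"
}

# Usage:
#   check_vocdevkit /path/to/rcnn/datasets/VOCdevkit2007
```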

#### Extracting features

<pre>
>> rcnn_exp_cache_features('train');   % chunk1
>> rcnn_exp_cache_features('val');     % chunk2
>> rcnn_exp_cache_features('test_1');  % chunk3
>> rcnn_exp_cache_features('test_2');  % chunk4
</pre>

**Pro tip:** on a machine with one hefty GPU (e.g., K20, K40, Titan) and a six-core processor, I start two MATLAB sessions, each with a three-worker matlabpool. I then run chunk1 and chunk2 in parallel on that machine. In this setup, completing chunk1 and chunk2 takes about 8-9 hours (depending on your CPU/GPU combo and disk) on a single machine. Obviously, if you have more machines you can hack this function to split the workload.
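
If you'd rather launch the two sessions from a shell than from two interactive MATLAB windows, something like the following may work. It is a sketch under assumptions, not the way I run it: the function name `cache_features_cmd` and the log file names are made up, and `matlabpool open N` assumes an older MATLAB such as 2012b. The helper only builds the command line, so you can inspect it before running anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of R-CNN): print the command line for one
# non-interactive MATLAB session that opens a pool of $2 workers and caches
# features for chunk $1 (e.g. train, val, test_1, test_2).
cache_features_cmd() {
    printf "matlab -nodisplay -nosplash -r \"matlabpool open %s; rcnn_exp_cache_features('%s'); exit\"\n" "$2" "$1"
}

# Usage, from the rcnn folder: run chunk1 and chunk2 side by side, then wait.
#   eval "$(cache_features_cmd train 3)" > train.log 2>&1 &
#   eval "$(cache_features_cmd val 3)"   > val.log   2>&1 &
#   wait
```

Note that if the MATLAB code errors, the trailing `exit` is never reached and the session stays open, so check the logs if a job seems stuck.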

#### Training R-CNN models and testing

Now to run the training and testing code, use the following experiments script:

<pre>
>> test_results = rcnn_exp_train_and_test()
</pre>

**Note:** The training and testing procedures save models and results under `rcnn/cachedir` by default. You can customize this by creating a local config file named `rcnn_config_local.m` and defining the experiment directory variable `EXP_DIR`. Look at `rcnn_config_local.example.m` for an example.


### Training an R-CNN detector on another dataset

It should be easy to train an R-CNN detector using another detection dataset as long as that dataset has *complete* bounding box annotations (i.e., all instances of all classes are labeled).

To support a new dataset, you define three functions: (1) one that returns a structure that describes the class labels and list of images; (2) one that returns a region of interest (roi) structure that describes the bounding box annotations; and (3) one that provides a test evaluation function.

You can follow the PASCAL VOC implementation as your guide:

* `imdb/imdb_from_voc.m` (list of images and classes)
* `imdb/roidb_from_voc.m` (region of interest database)
* `imdb/imdb_eval_voc.m` (evaluation)

### Fine-tuning a CNN for detection with Caffe

As an example, let's see how you would fine-tune a CNN for detection on PASCAL VOC 2012.

0. Create window files for VOC 2012 train and VOC 2012 val.
  0. Start MATLAB in the `rcnn` directory
  0. Get the imdb for VOC 2012 train: `>> imdb_train = imdb_from_voc('datasets/VOCdevkit2012', 'train', '2012');`
  0. Get the imdb for VOC 2012 val: `>> imdb_val = imdb_from_voc('datasets/VOCdevkit2012', 'val', '2012');`
  0. Create the window file for VOC 2012 train: `>> rcnn_make_window_file(imdb_train, 'external/caffe/examples/pascal-finetuning');`
  0. Create the window file for VOC 2012 val: `>> rcnn_make_window_file(imdb_val, 'external/caffe/examples/pascal-finetuning');`
  0. Exit MATLAB
0. Run fine-tuning with Caffe
  0. Copy the fine-tuning prototxt files: `cp finetuning/voc_2012_prototxt/pascal_finetune_* external/caffe/examples/pascal-finetuning/`
  0. Change directories to `external/caffe/examples/pascal-finetuning`
  0. Execute the fine-tuning code (make sure to replace `/path/to/rcnn` with the actual path to where R-CNN is installed):
  
  <pre>
  GLOG_logtostderr=1 ../../build/tools/finetune_net.bin \
  pascal_finetune_solver.prototxt \
  /path/to/rcnn/data/caffe_nets/ilsvrc_2012_train_iter_310k 2>&1 | tee log.txt
  </pre>
      
**Note:** In my experiments, I've let fine-tuning run for 70k iterations, although with hindsight it appears that improvement in mAP saturates at around 40k iterations.