Commit 54d2576f authored by Ross Girshick

RPN layers, Faster R-CNN training, misc improvements

parent b0758d0a
*.pyc
.ipynb_checkpoints
utils/*.c
utils/*.so
lib/build
Fast R-CNN
Faster R-CNN
Copyright (c) Microsoft Corporation
The MIT License (MIT)
Copyright (c) 2015 Microsoft Corporation
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
************************************************************************
THIRD-PARTY SOFTWARE NOTICES AND INFORMATION
This project, Faster R-CNN, incorporates material from the project(s)
listed below (collectively, "Third Party Code"). Microsoft is not the
original author of the Third Party Code. The original copyright notice
and license under which Microsoft received such Third Party Code are set
out below. This Third Party Code is licensed to you under their original
license terms set forth below. Microsoft reserves all other rights not
expressly granted, whether by implication, estoppel or otherwise.
1. Caffe, (https://github.com/BVLC/caffe/)
COPYRIGHT
All contributions by the University of California:
Copyright (c) 2014, 2015, The Regents of the University of California (Regents)
All rights reserved.
MIT License
All other contributions:
Copyright (c) 2014, 2015, the respective contributors
All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a
copy of this software and associated documentation files (the "Software"),
to deal in the Software without restriction, including without limitation
the rights to use, copy, modify, merge, publish, distribute, sublicense,
and/or sell copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following conditions:
Caffe uses a shared copyright model: each contributor holds copyright
over their contributions to Caffe. The project versioning records all
such contribution and copyright details. If a contributor wants to
further mark their specific copyright on a particular contribution,
they should indicate their copyright solely in the commit message of
the change when it is committed.
The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.
The BSD 2-Clause License
THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
************END OF THIRD-PARTY SOFTWARE NOTICES AND INFORMATION**********
# *Fast* R-CNN: Fast Region-based Convolutional Networks for object detection
### Disclaimer
Created by Ross Girshick at Microsoft Research, Redmond.
The official Faster R-CNN code (written in MATLAB) is available [here](https://github.com/ShaoqingRen/faster_rcnn).
If your goal is to reproduce the results in our NIPS 2015 paper, please use the [official code](https://github.com/ShaoqingRen/faster_rcnn).
### Introduction
This repository contains a Python *reimplementation* of the MATLAB code.
This Python implementation is built on a fork of [Fast R-CNN](https://github.com/rbgirshick/fast-rcnn).
There are slight differences between the two implementations.
In particular, this Python port
- is ~10% slower at test-time, because some operations execute on the CPU in Python layers (e.g., 220ms / image vs. 200ms / image for VGG16)
- gives similar, but not exactly the same, mAP as the MATLAB version
- is *not compatible* with models trained using the MATLAB code due to the minor implementation differences
**Fast R-CNN** is a fast framework for object detection with deep ConvNets. Fast R-CNN
- trains state-of-the-art models, like VGG16, 9x faster than traditional R-CNN and 3x faster than SPPnet,
- runs 200x faster than R-CNN and 10x faster than SPPnet at test-time,
- has a significantly higher mAP on PASCAL VOC than both R-CNN and SPPnet,
- and is written in Python and C++/Caffe.
# *Faster* R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Fast R-CNN was initially described in an [arXiv tech report](http://arxiv.org/abs/1504.08083).
By Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun (Microsoft Research)
This Python implementation contains contributions from Sean Bell (Cornell) written during an MSR internship.
Please see the official [README.md](https://github.com/ShaoqingRen/faster_rcnn/blob/master/README.md) for more details.
Faster R-CNN was initially described in an [arXiv tech report](http://arxiv.org/abs/1506.01497) and was subsequently published in NIPS 2015.
### License
Fast R-CNN is released under the MIT License (refer to the LICENSE file for details).
Faster R-CNN is released under the MIT License (refer to the LICENSE file for details).
### Citing Fast R-CNN
### Citing Faster R-CNN
If you find Fast R-CNN useful in your research, please consider citing:
If you find Faster R-CNN useful in your research, please consider citing:
@article{girshick15fastrcnn,
Author = {Ross Girshick},
Title = {Fast R-CNN},
Journal = {arXiv preprint arXiv:1504.08083},
@inproceedings{renNIPS15fasterrcnn,
Author = {Shaoqing Ren and Kaiming He and Ross Girshick and Jian Sun},
Title = {Faster {R-CNN}: Towards Real-Time Object Detection
with Region Proposal Networks},
Booktitle = {Advances in Neural Information Processing Systems ({NIPS})},
Year = {2015}
}
### Contents
1. [Requirements: software](#requirements-software)
2. [Requirements: hardware](#requirements-hardware)
......@@ -34,7 +44,6 @@ If you find Fast R-CNN useful in your research, please consider citing:
4. [Demo](#demo)
5. [Beyond the demo: training and testing](#beyond-the-demo-installation-for-training-and-testing-models)
6. [Usage](#usage)
7. [Extra downloads](#extra-downloads)
### Requirements: software
......@@ -53,33 +62,33 @@ If you find Fast R-CNN useful in your research, please consider citing:
### Requirements: hardware
1. For training smaller networks (CaffeNet, VGG_CNN_M_1024) a good GPU (e.g., Titan, K20, K40, ...) with at least 3G of memory suffices
1. For training smaller networks (ZF, VGG_CNN_M_1024) a good GPU (e.g., Titan, K20, K40, ...) with at least 3G of memory suffices
2. For training with VGG16, you'll need a K40 (~11G of memory)
### Installation (sufficient for the demo)
1. Clone the Fast R-CNN repository
1. Clone the Faster R-CNN repository
```Shell
# Make sure to clone with --recursive
git clone --recursive https://github.com/rbgirshick/fast-rcnn.git
git clone --recursive https://github.com/rbgirshick/py-faster-rcnn.git
```
2. We'll call the directory that you cloned Fast R-CNN into `FRCN_ROOT`
2. We'll call the directory that you cloned Faster R-CNN into `FRCN_ROOT`
*Ignore notes 1 and 2 if you followed step 1 above.*
**Note 1:** If you didn't clone Fast R-CNN with the `--recursive` flag, then you'll need to manually clone the `caffe-fast-rcnn` submodule:
**Note 1:** If you didn't clone Faster R-CNN with the `--recursive` flag, then you'll need to manually clone the `caffe-fast-rcnn` submodule:
```Shell
git submodule update --init --recursive
```
**Note 2:** The `caffe-fast-rcnn` submodule needs to be on the `fast-rcnn` branch (or equivalent detached state). This will happen automatically *if you follow these instructions*.
**Note 2:** The `caffe-fast-rcnn` submodule needs to be on the `faster-rcnn` branch (or equivalent detached state). This will happen automatically *if you followed the instructions in step 1*.
3. Build the Cython modules
```Shell
cd $FRCN_ROOT/lib
make
```
4. Build Caffe and pycaffe
```Shell
cd $FRCN_ROOT/caffe-fast-rcnn
......@@ -90,14 +99,15 @@ If you find Fast R-CNN useful in your research, please consider citing:
# and your Makefile.config in place, then simply do:
make -j8 && make pycaffe
```
5. Download pre-computed Fast R-CNN detectors
5. Download pre-computed Faster R-CNN detectors
```Shell
cd $FRCN_ROOT
./data/scripts/fetch_fast_rcnn_models.sh
./data/scripts/fetch_faster_rcnn_models.sh
```
This will populate the `$FRCN_ROOT/data` folder with `fast_rcnn_models`. See `data/README.md` for details.
This will populate the `$FRCN_ROOT/data` folder with `faster_rcnn_models`. See `data/README.md` for details.
These models were trained on VOC 2007 trainval.
### Demo
......@@ -110,37 +120,7 @@ To run the demo
cd $FRCN_ROOT
./tools/demo.py
```
The demo performs detection using a VGG16 network trained for detection on PASCAL VOC 2007. The object proposals are pre-computed in order to reduce installation requirements.
**Note:** If the demo crashes Caffe because your GPU doesn't have enough memory, try running the demo with a small network, e.g., `./tools/demo.py --net caffenet` or with `--net vgg_cnn_m_1024`. Or run in CPU mode `./tools/demo.py --cpu`. Type `./tools/demo.py -h` for usage.
**MATLAB**
There's also a *basic* MATLAB demo, though it's missing some minor bells and whistles compared to the Python version.
```Shell
cd $FRCN_ROOT/matlab
matlab # wait for matlab to start...
# At the matlab prompt, run the script:
>> fast_rcnn_demo
```
Fast R-CNN training is implemented in Python only, but test-time detection functionality also exists in MATLAB.
See `matlab/fast_rcnn_demo.m` and `matlab/fast_rcnn_im_detect.m` for details.
**Computing object proposals**
The demo uses pre-computed selective search proposals computed with [this code](https://github.com/rbgirshick/rcnn/blob/master/selective_search/selective_search_boxes.m).
If you'd like to compute proposals on your own images, there are many options.
Here are some pointers; if you run into trouble using these resources please direct questions to the respective authors.
1. Selective Search: [original matlab code](http://disi.unitn.it/~uijlings/MyHomepage/index.php#page=projects1), [python wrapper](https://github.com/sergeyk/selective_search_ijcv_with_python)
2. EdgeBoxes: [matlab code](https://github.com/pdollar/edges)
3. GOP and LPO: [python code](http://www.philkr.net/)
4. MCG: [matlab code](http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/mcg/)
5. RIGOR: [matlab code](http://cpl.cc.gatech.edu/projects/RIGOR/)
Apologies if I've left your method off this list. Feel free to contact me and ask for it to be included.
The demo performs detection using a VGG16 network trained for detection on PASCAL VOC 2007.
### Beyond the demo: installation for training and testing models
1. Download the training, validation, test data and VOCdevkit
......@@ -150,7 +130,7 @@ Apologies if I've left your method off this list. Feel free to contact me and as
wget http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/VOCtest_06-Nov-2007.tar
wget http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/VOCdevkit_08-Jun-2007.tar
```
2. Extract all of these tars into one directory named `VOCdevkit`
```Shell
......@@ -167,7 +147,7 @@ Apologies if I've left your method off this list. Feel free to contact me and as
$VOCdevkit/VOC2007 # image sets, annotations, etc.
# ... and several other directories ...
```
4. Create symlinks for the PASCAL VOC dataset
```Shell
......@@ -176,79 +156,31 @@ Apologies if I've left your method off this list. Feel free to contact me and as
```
Using symlinks is a good idea because you will likely want to share the same PASCAL dataset installation between multiple projects.
5. [Optional] follow similar steps to get PASCAL VOC 2010 and 2012
6. Follow the next sections to download pre-computed object proposals and pre-trained ImageNet models
### Download pre-computed Selective Search object proposals
Pre-computed selective search boxes can also be downloaded for VOC2007 and VOC2012.
```Shell
cd $FRCN_ROOT
./data/scripts/fetch_selective_search_data.sh
```
This will populate the `$FRCN_ROOT/data` folder with `selective_search_data`.
6. Follow the next sections to download pre-trained ImageNet models
### Download pre-trained ImageNet models
Pre-trained ImageNet models can be downloaded for the three networks described in the paper: CaffeNet (model **S**), VGG_CNN_M_1024 (model **M**), and VGG16 (model **L**).
Pre-trained ImageNet models can be downloaded for the two networks described in the paper: ZF and VGG16.
```Shell
cd $FRCN_ROOT
./data/scripts/fetch_imagenet_models.sh
```
These models are all available in the [Caffe Model Zoo](https://github.com/BVLC/caffe/wiki/Model-Zoo), but are provided here for your convenience.
VGG16 comes from the [Caffe Model Zoo](https://github.com/BVLC/caffe/wiki/Model-Zoo), but is provided here for your convenience.
ZF was trained at MSRA.
### Usage
**Train** a Fast R-CNN detector. For example, train a VGG16 network on VOC 2007 trainval:
To train and test a Faster R-CNN detector use `experiments/scripts/faster_rcnn_alt_opt.sh`.
Output is written underneath `$FRCN_ROOT/output`.
```Shell
./tools/train_net.py --gpu 0 --solver models/VGG16/solver.prototxt \
--weights data/imagenet_models/VGG16.v2.caffemodel
```
If you see this error
```
EnvironmentError: MATLAB command 'matlab' not found. Please add 'matlab' to your PATH.
```
then you need to make sure the `matlab` binary is in your `$PATH`. MATLAB is currently required for PASCAL VOC evaluation.
**Test** a Fast R-CNN detector. For example, test the VGG 16 network on VOC 2007 test:
```Shell
./tools/test_net.py --gpu 1 --def models/VGG16/test.prototxt \
--net output/default/voc_2007_trainval/vgg16_fast_rcnn_iter_40000.caffemodel
```
Test output is written underneath `$FRCN_ROOT/output`.
**Compress** a Fast R-CNN model using truncated SVD on the fully-connected layers:
```Shell
./tools/compress_net.py --def models/VGG16/test.prototxt \
--def-svd models/VGG16/compressed/test.prototxt \
--net output/default/voc_2007_trainval/vgg16_fast_rcnn_iter_40000.caffemodel
# Test the model you just compressed
./tools/test_net.py --gpu 0 --def models/VGG16/compressed/test.prototxt \
--net output/default/voc_2007_trainval/vgg16_fast_rcnn_iter_40000_svd_fc6_1024_fc7_256.caffemodel
cd $FRCN_ROOT
./experiments/scripts/faster_rcnn_alt_opt.sh [GPU_ID] [NET] [--set ...]
# GPU_ID is the GPU you want to train on
# NET in {ZF, VGG_CNN_M_1024, VGG16} is the network arch to use
# --set ... allows you to specify fast_rcnn.config options, e.g.
# --set EXP_DIR seed_rng1701 RNG_SEED 1701
```
### Experiment scripts
Scripts to reproduce the experiments in the paper (*up to stochastic variation*) are provided in `$FRCN_ROOT/experiments/scripts`. Log files for experiments are located in `experiments/logs`.
**Note:** Until recently (commit a566e39), the RNG seed for Caffe was not fixed during training. Now it's fixed, unless `train_net.py` is called with the `--rand` flag.
Results generated before this commit will have some stochastic variation.
### Extra downloads
- [Experiment logs](http://www.cs.berkeley.edu/~rbg/fast-rcnn-data/fast_rcnn_experiments.tgz)
- PASCAL VOC test set detections
- [voc_2007_test_results_fast_rcnn_caffenet_trained_on_2007_trainval.tgz](http://www.cs.berkeley.edu/~rbg/fast-rcnn-data/voc_2007_test_results_fast_rcnn_caffenet_trained_on_2007_trainval.tgz)
- [voc_2007_test_results_fast_rcnn_vgg16_trained_on_2007_trainval.tgz](http://www.cs.berkeley.edu/~rbg/fast-rcnn-data/voc_2007_test_results_fast_rcnn_vgg16_trained_on_2007_trainval.tgz)
- [voc_2007_test_results_fast_rcnn_vgg_cnn_m_1024_trained_on_2007_trainval.tgz](http://www.cs.berkeley.edu/~rbg/fast-rcnn-data/voc_2007_test_results_fast_rcnn_vgg_cnn_m_1024_trained_on_2007_trainval.tgz)
- [voc_2012_test_results_fast_rcnn_vgg16_trained_on_2007_trainvaltest_2012_trainval.tgz](http://www.cs.berkeley.edu/~rbg/fast-rcnn-data/voc_2012_test_results_fast_rcnn_vgg16_trained_on_2007_trainvaltest_2012_trainval.tgz)
- [voc_2012_test_results_fast_rcnn_vgg16_trained_on_2012_trainval.tgz](http://www.cs.berkeley.edu/~rbg/fast-rcnn-data/voc_2012_test_results_fast_rcnn_vgg16_trained_on_2012_trainval.tgz)
- [Fast R-CNN VGG16 model](http://www.cs.berkeley.edu/~rbg/fast-rcnn-data/voc12_submission.tgz) trained on VOC07 train,val,test union with VOC12 train,val
("alt opt" refers to the alternating optimization training algorithm described in the NIPS paper.)
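The `--set` option above takes alternating key/value strings. As a rough sketch of how such pairs could be folded into a nested config dictionary: the function name `apply_overrides`, the default values, and the dotted-key convention for nested sections are assumptions made for illustration, not the actual `fast_rcnn.config` code.

```python
import ast


def apply_overrides(cfg, pairs):
    """Fold alternating KEY VALUE strings into a nested config dict (in place)."""
    if len(pairs) % 2 != 0:
        raise ValueError("expected alternating KEY VALUE arguments")
    for key, raw in zip(pairs[0::2], pairs[1::2]):
        # Assumption: a dotted key such as "TRAIN.IMS_PER_BATCH" addresses a nested section.
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node[name]
        if leaf not in node:
            raise KeyError(key)
        try:
            # Numbers, booleans, lists, etc. are parsed as Python literals.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything else (e.g. "seed_rng1701") is kept as a plain string.
            value = raw
        node[leaf] = value
    return cfg


# Hypothetical defaults, for illustration only.
cfg = {"EXP_DIR": "default", "RNG_SEED": 3, "TRAIN": {"IMS_PER_BATCH": 2}}
apply_overrides(cfg, ["EXP_DIR", "seed_rng1701", "RNG_SEED", "1701"])
print(cfg["EXP_DIR"], cfg["RNG_SEED"])
```

Rejecting unknown keys is a design choice in this sketch: a misspelled option fails immediately instead of being silently ignored.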
Subproject commit bcd9b4eadc7d8fbc433aeefd564e82ec63aaf69c
Subproject commit 4115385deb3b907fcd428ac0ab53b694d741a3c4
This directory holds (*after you download them*):
- Pre-computed object proposals
- Caffe models pre-trained on ImageNet
- Fast R-CNN models
- Faster R-CNN models
- Symlinks to datasets
To download precomputed Selective Search proposals for PASCAL VOC 2007 and 2012, run:
```
./data/scripts/fetch_selective_search_data.sh
```
This script will populate `data/selective_search_data`.
To download Caffe models (CaffeNet, VGG_CNN_M_1024, VGG16) pre-trained on ImageNet, run:
To download Caffe models (ZF, VGG16) pre-trained on ImageNet, run:
```
./data/scripts/fetch_imagenet_models.sh
......@@ -20,13 +11,13 @@ To download Caffe models (CaffeNet, VGG_CNN_M_1024, VGG16) pre-trained on ImageN
This script will populate `data/imagenet_models`.
To download Fast R-CNN models trained on VOC 2007, run:
To download Faster R-CNN models trained on VOC 2007, run:
```
./data/scripts/fetch_fast_rcnn_models.sh
./data/scripts/fetch_faster_rcnn_models.sh
```
This script will populate `data/fast_rcnn_models`.
This script will populate `data/faster_rcnn_models`.
In order to train and test with PASCAL VOC, you will need to establish symlinks.
From the `data` directory (`cd data`):
......@@ -39,7 +30,7 @@ ln -s /your/path/to/VOC2007/VOCdevkit VOCdevkit2007
ln -s /your/path/to/VOC2012/VOCdevkit VOCdevkit2012
```
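A quick way to confirm that the symlinks resolve is to check for the `VOC<year>` directory behind each `VOCdevkit<year>` link. The following is a minimal sketch; the helper name `devkit_ready` is hypothetical, and the demonstration builds a throwaway directory tree (it assumes a platform where `os.symlink` is permitted).

```python
import os
import tempfile


def devkit_ready(data_dir, year):
    """Return True if data_dir/VOCdevkit<year>/VOC<year> resolves to a directory."""
    link = os.path.join(data_dir, "VOCdevkit{}".format(year))
    return os.path.isdir(os.path.join(link, "VOC{}".format(year)))


# Self-contained demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "shared", "VOCdevkit")
    os.makedirs(os.path.join(real, "VOC2007"))
    data_dir = os.path.join(tmp, "data")
    os.makedirs(data_dir)
    # Equivalent of: ln -s /your/path/to/VOC2007/VOCdevkit VOCdevkit2007
    os.symlink(real, os.path.join(data_dir, "VOCdevkit2007"))
    print(devkit_ready(data_dir, 2007))  # link exists and points at a devkit
    print(devkit_ready(data_dir, 2012))  # no link was created for 2012
```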
Since you'll likely be experimenting with multiple installs of Fast R-CNN in
Since you'll likely be experimenting with multiple installs of Fast/er R-CNN in
parallel, you'll probably want to keep all of this data in a shared place and
use symlinks. On my system I create the following symlinks inside `data`:
......@@ -51,6 +42,7 @@ ln -s /data/fast_rcnn_shared/cache
ln -s /data/fast_rcnn_shared/imagenet_models
# move the selective search data to a shared location and symlink to them
# (only applicable to Fast R-CNN training)
ln -s /data/fast_rcnn_shared/selective_search_data
ln -s /data/VOC2007/VOCdevkit VOCdevkit2007
......
......@@ -3,9 +3,9 @@
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )/../" && pwd )"
cd $DIR
FILE=fast_rcnn_models.tgz
URL=http://www.cs.berkeley.edu/~rbg/fast-rcnn-data/$FILE
CHECKSUM=5f7dde9f5376e18c8e065338cc5df3f7
FILE=faster_rcnn_models.tgz
URL=http://www.cs.berkeley.edu/~rbg/faster-rcnn-data/$FILE
CHECKSUM=ac116844f66aefe29587214272054668
if [ -f $FILE ]; then
echo "File already exists. Checking md5..."
......@@ -23,7 +23,7 @@ if [ -f $FILE ]; then
fi
fi
echo "Downloading Fast R-CNN demo models (0.96G)..."
echo "Downloading Faster R-CNN demo models (695M)..."
wget $URL -O $FILE
......
......@@ -4,8 +4,8 @@ DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )/../" && pwd )"
cd $DIR
FILE=imagenet_models.tgz
URL=http://www.cs.berkeley.edu/~rbg/fast-rcnn-data/$FILE
CHECKSUM=8b1d4b9da0593fc70ef403284f810adc
URL=http://www.cs.berkeley.edu/~rbg/faster-rcnn-data/$FILE
CHECKSUM=ed34ca912d6782edfb673a8c3a0bda6d
if [ -f $FILE ]; then
echo "File already exists. Checking md5..."
......
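The fetch scripts above compare an already-downloaded archive against a stored MD5 `CHECKSUM` before downloading again. The part of the scripts that does the comparison is elided in this diff, so the following is only a sketch of the same idea in Python; the function name `md5_matches` and the temporary test file are illustrative assumptions.

```python
import hashlib
import os
import tempfile


def md5_matches(path, expected, chunk_size=1 << 20):
    """Return True if the MD5 hex digest of the file at `path` equals `expected`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so that large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected.lower()


# Demonstration on a small temporary file rather than a real model archive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
print(md5_matches(tmp.name, "5d41402abc4b2a76b9719d911017c592"))
os.remove(tmp.name)
```

MD5 is adequate here for detecting a truncated or corrupted download; it is not a defense against deliberate tampering.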
Scripts to reproduce (most of) the experiments in the paper.
Scripts are under `experiments/scripts`.
Each script saves a log file under `experiments/logs`.
......
EXP_DIR: faster_rcnn_alt_opt
TEST:
HAS_RPN: True
EXP_DIR: faster_rcnn_end2end
TRAIN:
HAS_RPN: True
IMS_PER_BATCH: 1
BBOX_NORMALIZE_TARGETS_PRECOMPUTED: True
RPN_POSITIVE_OVERLAP: 0.7
RPN_BATCHSIZE: 256
PROPOSAL_METHOD: gt
TEST:
HAS_RPN: True
EXP_DIR: fc_only
TRAIN:
SNAPSHOT_INFIX: fc_only
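Each experiment file above lists only the options that differ from the defaults. One plausible way to apply such a file is a recursive merge over a defaults dictionary, sketched below. The function name `merge_config` and the default values are hypothetical; the override dictionary mirrors a subset of the `faster_rcnn_end2end` settings as they would look after YAML parsing.

```python
import copy


def merge_config(defaults, overrides):
    """Return a copy of `defaults` with `overrides` merged in recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if key not in merged:
            raise KeyError("unknown config key: {}".format(key))
        if isinstance(value, dict) and isinstance(merged[key], dict):
            # Nested sections (TRAIN, TEST) are merged key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical defaults; the values are chosen only for illustration.
defaults = {
    "EXP_DIR": "default",
    "TRAIN": {"HAS_RPN": False, "IMS_PER_BATCH": 2, "PROPOSAL_METHOD": "selective_search"},
    "TEST": {"HAS_RPN": False},
}

# Subset of the faster_rcnn_end2end settings, as a parsed dictionary.
end2end = {
    "EXP_DIR": "faster_rcnn_end2end",
    "TRAIN": {"HAS_RPN": True, "IMS_PER_BATCH": 1, "PROPOSAL_METHOD": "gt"},
    "TEST": {"HAS_RPN": True},
}

cfg = merge_config(defaults, end2end)
print(cfg["EXP_DIR"], cfg["TRAIN"])
```

Returning a merged copy leaves the defaults untouched, so several experiment files can be applied to the same baseline in one process.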