Commit 16d2caf

Merge pull request #574 from microsoft/staging
Small edits to readmes (#573)
2 parents 109f42d + 9ada614

2 files changed: +7 -7 lines changed

README.md (+2 -2)
@@ -1,7 +1,7 @@
  <img src="scenarios/media/logo_cvbp.png" align="right" alt="" width="300"/>

  ```diff
- + Update June 24: Added action recognition as new core scenario.
+ + Update June 24: Added action recognition as new core scenario.
  + Object tracking coming soon (in 2-4 weeks).
  ```

@@ -37,7 +37,7 @@ Our target audience for this repository includes data scientists and machine lea
  To get started, navigate to the [Setup Guide](SETUP.md), which lists
  instructions on how to set up the compute environment and dependencies needed to run the
  notebooks in this repo. Once your environment is set up, navigate to the
- [Scenarios](scenarios) folder and start exploring the notebooks.
+ [Scenarios](scenarios) folder and start exploring the notebooks. We recommend starting with the *image classification* notebooks, since they introduce concepts (e.g. pre-training on ImageNet) that are also used by the other scenarios.

  Alternatively, we support Binder
  [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/PatrickBue/computervision-recipes/master?filepath=scenarios%2Fclassification%2F01_training_introduction_BINDER.ipynb)

scenarios/tracking/README.md (+5 -5)
@@ -1,23 +1,23 @@
  # Multi-Object Tracking

  ```diff
- + June 2020: This work is ongoing.
+ + June 2020: All notebooks/code in this directory is work-in-progress and might not fully execute.
  ```

  This directory provides examples and best practices for building multi-object tracking systems. Our goal is to enable users to bring their own datasets and easily train a high-accuracy tracking model. While there are many open-source trackers available, we have implemented the [FairMOT tracker](https://github.com/ifzhang/FairMOT) specifically, as its algorithm has shown competitive tracking performance in recent MOT benchmarking challenges, at fast inference speed.

  ## Technology
- Multi-object tracking (MOT) is one of the most active research topics in computer vision, due to its wide applications in autonomous driving, traffic surveillance, etc. It builds on object detection technology in order to detect and track all objects in a dynamic scene over time. Inferring target trajectories correctly across successive image frames remains challenging: occlusion happens when objects overlap, and the number and appearance of objects can change. Compared to object detection algorithms, which aim to output rectangular bounding boxes around the objects, MOT algorithms additionally associate an ID number with each box to identify that specific object across the image frames.
+ Multi-object tracking (MOT) is one of the most active research topics in computer vision, due to its wide applications in autonomous driving, traffic surveillance, etc. It builds on object detection technology in order to detect and track all objects in a dynamic scene over time. Inferring target trajectories correctly across successive image frames remains challenging: occlusion happens when objects overlap, and the number and appearance of objects can change. Compared to object detection algorithms, which aim to output rectangular bounding boxes around the objects, MOT algorithms additionally associate an ID number with each box to identify that specific object across the image frames.

  As seen in the figure below ([Ciaparrone, 2019](https://arxiv.org/pdf/1907.12740.pdf)), a typical multi-object-tracking algorithm performs part or all of the following steps:
  * Detection: Given the input raw image frames (step 1), the detector identifies object(s) on each image frame as bounding box(es) (step 2).
- * Feature extraction/motion prediction: For every detected object, visual appearance and motion features are extracted (step 3). Sometimes, a motion predictor (e.g. Kalman Filter) is also added to predict the next position of each tracked target.
- * Affinity: The feature and motion predictions are used to calculate similarity/distance scores between pairs of detections and/or tracklets, or the probabilities of detections belonging to a given target or tracklet (step 4).
+ * Feature extraction/motion prediction: For every detected object, visual appearance and motion features are extracted (step 3). Sometimes, a motion predictor (e.g. Kalman Filter) is also added to predict the next position of each tracked target.
+ * Affinity: The feature and motion predictions are used to calculate similarity/distance scores between pairs of detections and/or tracklets, or the probabilities of detections belonging to a given target or tracklet (step 4).
  * Association: Based on these scores/probabilities, a specific numerical ID is assigned to each detected object as it is tracked across successive image frames (step 5).

  <p align="center">
  <img src="./media/figure_motmodules2.jpg" width="700" align="center"/>
- </p>
+ </p>


  ## State-of-the-art (SoTA)
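As a concrete illustration of the detection → affinity → association pipeline described in the tracking README above, here is a minimal tracking-by-detection sketch. It is a hypothetical toy example, not the FairMOT implementation: it assumes detections arrive per frame as `[x1, y1, x2, y2]` boxes, uses IoU as the affinity score (step 4), solves the assignment with SciPy's Hungarian solver (step 5), and omits the appearance features and Kalman-filter motion predictor mentioned in step 3. The `NaiveTracker` class and its `iou_threshold` parameter are invented for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)


class NaiveTracker:
    """Toy IoU-based tracker; real systems add appearance and motion models."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}   # track ID -> last seen box
        self.next_id = 0

    def update(self, detections):
        """Associate one frame's detections with existing tracks.

        Returns a list of (track_id, box) pairs for the current frame.
        """
        track_ids = list(self.tracks)
        matches = []
        if track_ids and detections:
            # Affinity matrix (step 4): negate IoU so the Hungarian
            # solver, which minimizes total cost, maximizes overlap.
            cost = np.array(
                [[-iou(self.tracks[t], d) for d in detections] for t in track_ids]
            )
            rows, cols = linear_sum_assignment(cost)
            matches = [(r, c) for r, c in zip(rows, cols)
                       if -cost[r, c] >= self.iou_threshold]

        results, assigned = [], set()
        for r, c in matches:
            tid = track_ids[r]                   # step 5: keep the same ID
            self.tracks[tid] = detections[c]
            results.append((tid, detections[c]))
            assigned.add(c)
        for c, det in enumerate(detections):     # unmatched boxes get new IDs
            if c not in assigned:
                self.tracks[self.next_id] = det
                results.append((self.next_id, det))
                self.next_id += 1
        return results


# Usage with made-up detector output for two consecutive frames:
tracker = NaiveTracker()
print(tracker.update([[10, 10, 50, 50], [100, 100, 160, 180]]))  # new IDs 0, 1
print(tracker.update([[12, 11, 52, 51], [102, 103, 161, 182]]))  # IDs persist
```

FairMOT itself takes a different design route: it extracts detection boxes and appearance (re-identification) features jointly in a single network, which is part of why it achieves fast inference while keeping competitive accuracy on MOT benchmarks.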
