What does a CNN see? : Visualising hidden layers of a Neural Network

Deep Learning has made remarkable progress over the past few years, with quick transitions from the discovery of new methods to their industrial implementation. While frameworks and libraries have made creating and working with deep architectures easy, practitioners often know little about the internal states of the process. This post is an attempt to find out what composes a neural network and what a convolutional neural network sees in an input.

The code is publicly available on my GitHub.

The architecture of the network we will work on is as follows:

Input
Convolution (5 x 5)
MaxPooling
Convolution (5 x 5)
MaxPooling
FullyConnected
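
As a rough sketch (not the exact training code), the stack above could be written in TensorFlow 1.x style as follows; the filter counts (32 and 64), 'same' padding and ReLU activations are assumptions, since the listing only fixes the layer types and kernel sizes:

    import tensorflow as tf

    # Placeholder for MNIST images, reshaped to 28 x 28 x 1
    x = tf.placeholder(tf.float32, [None, 28, 28, 1], name="input")

    # Convolution (5 x 5) + MaxPooling
    conv1 = tf.layers.conv2d(x, filters=32, kernel_size=5, padding="same",
                             activation=tf.nn.relu, name="conv1")
    pool1 = tf.layers.max_pooling2d(conv1, pool_size=2, strides=2, name="pool1")

    # Convolution (5 x 5) + MaxPooling
    conv2 = tf.layers.conv2d(pool1, filters=64, kernel_size=5, padding="same",
                             activation=tf.nn.relu, name="conv2")
    pool2 = tf.layers.max_pooling2d(conv2, pool_size=2, strides=2, name="pool2")

    # FullyConnected layer producing logits for the 10 digit classes
    flat = tf.layers.flatten(pool2)
    logits = tf.layers.dense(flat, units=10, name="fc")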

The model is trained on the popular MNIST dataset with the following parameters:

batch_size = 50
learning_rate = 0.001
epochs = 400
Optimiser = Adam
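
Training with those parameters might look roughly like this (a hedged sketch; the next_batch helper and the use of epochs as an iteration count are assumptions):

    labels = tf.placeholder(tf.int64, [None], name="labels")
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
    train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)   # Optimiser = Adam

    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    for step in range(400):                    # epochs = 400
        batch_x, batch_y = next_batch(50)      # batch_size = 50; hypothetical MNIST batch helper
        sess.run(train_op, feed_dict={x: batch_x, labels: batch_y})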

After training, we load the layer to visualise and feed a sample input through the input layer. The function then runs a session for that layer given the input and returns the outputs of all the filters that comprise it.
This is done with TensorFlow's session.run() function, which, when the layer tensor is passed in along with the input, returns the activation of every filter in that layer.
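
A minimal sketch of this step is given below; it reuses x, conv1 and sess from the sketches above, sample_input is assumed to be a single MNIST digit of shape (1, 28, 28, 1), and the 4-row grid is an arbitrary layout choice:

    import matplotlib.pyplot as plt

    # Run the first convolutional layer on the sample and collect its activations
    activations = sess.run(conv1, feed_dict={x: sample_input})   # (1, H, W, num_filters)

    # Plot one feature map per filter
    num_filters = activations.shape[-1]
    fig, axes = plt.subplots(4, num_filters // 4, figsize=(12, 6))
    for i, ax in enumerate(axes.flat):
        ax.imshow(activations[0, :, :, i], cmap="gray")
        ax.axis("off")
    plt.show()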

Sample Input:



The results for the sample input are the following visualisations, which are plotted using matplotlib.

Hidden Layer 1:

Hidden Layer 2:

The number of plots corresponds to the increasing number of filters as we go deeper into the network.
The visualisations also show that the filters seek finer details as the depth increases. This can be seen by comparing what Hidden Layer 1 sees with what Hidden Layer 2 sees, as each filter's response shows how the input stimulates it.

The height of your accomplishments equals the depth of your convictions.

Stats:
TensorFlow 1.8
Jupyter notebook
Ubuntu 17.10

What's that Noise? : Working with sound on Android

Sound is diverse. Its nature is to take so many forms that solid rules cannot draw lines between them. I have always been an admirer of sound.

I have spent a few weeks implementing sound acquisition and processing on Android and have come up with something to begin with.

Presenting:



Shh..Silence 

An application that monitors the sound level of an environment and plays a 'Shhhh...' when noise levels cross a limit.

You can download it here:

Get it on Google Play

For the end user, it is a harmless application, but from an engineering point of view, the internals are a window to a ton of possibilities.

The application monitors sound in the following manner : 
  1. Acquire Microphone 
  2. Configure 
    • Sample Rate
    • Mono/Stereo
    • Encoding format
    • Buffer size
  3. Calculate average over the buffer size
  4. Compare obtained value with threshold 
  5. Trigger when above threshold
The pipeline is simple when dealing only with the average. Here's a code snippet for step 3:

    private void readAudioBuffer() {
        try {
            // bufferSize and audio (the microphone recorder, e.g. AudioRecord) are
            // fields of the surrounding class, set up during the configuration step
            short[] buffer = new short[bufferSize];
            int bufferReadResult = 1;
            if (audio != null) {
                // Read raw 16-bit PCM audio samples from the microphone
                bufferReadResult = audio.read(buffer, 0, bufferSize);
                double sumLevel = 0;
                for (int i = 0; i < bufferReadResult; i++) {
                    sumLevel += buffer[i];
                }
                // Average level over the buffer, used later for the threshold check
                lastLevel = Math.abs((sumLevel / bufferReadResult));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
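
For steps 4 and 5, here is a platform-agnostic sketch in Python (purely illustrative, not part of the app) of the same average-and-threshold idea; the threshold value and the play_shhh() helper are hypothetical:

    THRESHOLD = 2000  # assumed value; would need tuning per device and microphone

    def process_buffer(buffer):
        """buffer: a list of 16-bit PCM samples read from the microphone."""
        if not buffer:
            return
        # Step 3: average the absolute sample values (a slight variation on the
        # Java snippet above, which averages first and then takes the absolute value)
        level = sum(abs(s) for s in buffer) / len(buffer)
        # Steps 4 and 5: compare with the threshold and trigger the 'Shhhh...'
        if level > THRESHOLD:
            play_shhh()  # hypothetical helper that plays the sound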
    

The most intriguing part is the buffer filled by audio.read(): it holds a sequence of numbers that describe the sound received by the microphone, while bufferReadResult tells us how many samples were actually read. From here, it is a matter of requirement what needs to be done next. By extracting audio features such as MFCCs (Mel-Frequency Cepstral Coefficients), the application can be stretched to domains like audio classification, speech recognition, user identification and keyword detection.
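
As a rough illustration of that idea (not part of the app itself), MFCC features could be extracted from a recorded clip in Python with librosa; the file name and parameter values are placeholders:

    import librosa

    # Load a recorded clip as mono audio at 16 kHz (file name is a placeholder)
    signal, sample_rate = librosa.load("recording.wav", sr=16000)

    # Compute 13 MFCCs per frame; the resulting matrix could feed a classifier
    mfccs = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=13)
    print(mfccs.shape)   # (13, number_of_frames)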

Implementing ML/DL on Android has become easier than ever using TensorFlow with a lightweight framework. The next step is to develop an application that uses TensorFlow to classify sounds.

The Optimist sees the potential in a seed

Kudos.

An Infinite Point of Possibilities : Intel's Open3D Library

Intel has recently launched its open-source library for 3D data processing, Open3D [research paper by Qian-Yi Zhou, Jaesik Park and Vladlen Koltun].


Open3D is an open-source library that supports rapid development of software that deals with 3D data. The Open3D frontend exposes a set of carefully selected data structures and algorithms in both C++ and Python. The backend is highly optimized and is set up for parallelization. Open3D was developed from a clean slate with a small and carefully considered set of dependencies. It can be set up on different platforms and compiled from source with minimal effort. The code is clean, consistently styled, and maintained via a clear code review mechanism. Open3D has been used in a number of published research projects and is actively deployed in the cloud.

Open3D enables developers to work with 3D models and point clouds.
It has the following features:

    • Basic 3D data structures
    • Basic 3D data processing algorithms
    • Scene reconstruction
    • Surface alignment
    • 3D visualization

With Open3D, RGBD images (images with three colour components and a depth component) can be converted into 3D models. Here's a Python code snippet to achieve just that:

    import sys
    # Make the locally built py3d module importable before importing it
    sys.path.append("../Open3D/build/lib/")
    import py3d
    import matplotlib.pyplot as plt

    print("Read Redwood dataset")
    # Load the colour and depth frames from the Redwood test data shipped with Open3D
    color_raw = py3d.read_image("/home/<username>/Open3D/build/lib/TestData/RGBD/color/00000.jpg")
    depth_raw = py3d.read_image("/home/<username>/Open3D/build/lib/TestData/RGBD/depth/00000.png")
    # Combine colour and depth into a single RGBD image
    rgbd_image = py3d.create_rgbd_image_from_color_and_depth(color_raw, depth_raw)
    print(rgbd_image)

    # Show the grayscale and depth channels side by side
    plt.subplot(1, 2, 1)
    plt.title('Redwood grayscale image')
    plt.imshow(rgbd_image.color)
    plt.subplot(1, 2, 2)
    plt.title('Redwood depth image')
    plt.imshow(rgbd_image.depth)
    plt.show()

    # Back-project the RGBD image into a point cloud using the default PrimeSense intrinsics
    pcd = py3d.create_point_cloud_from_rgbd_image(
        rgbd_image, py3d.PinholeCameraIntrinsic.prime_sense_default)
    # Flip the cloud so it is not upside down in the viewer
    pcd.transform([[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]])
    py3d.draw_geometries([pcd])

    print("Writing ply file")
    py3d.write_point_cloud("Redwood.ply", pcd)
    

The result obtained is as follows:


Open3D has been developed keeping in mind the computations required for solving 3-dimensional geometry and the need for parallelisation for faster turnaround times. It has an inbuilt visualiser which lets developers visually examine their work and manipulate it using pan and rotate controls, along with a dozen more options such as lighting, changing point size and toggling the mesh wireframe.
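
As a small follow-up (an assumption, not shown in the original snippet), the point cloud written above can be loaded back and inspected in that same visualiser, assuming py3d is importable as before:

    import py3d

    # Load the point cloud saved earlier and open it in the interactive viewer
    pcd = py3d.read_point_cloud("Redwood.ply")
    py3d.draw_geometries([pcd])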

The open-source community has always accelerated the development of advanced tools and libraries. Looking forward to the community scaling this one ahead too.

Limitations only exist if you let them
<This post is an attempt to integrate a 3D model in a webpage using WebVR; stay tuned for a post update>

Peace Out.