Enhanced selfie experience via MLKit

Photo by Steve Gale on Unsplash

At InCred, we are reinventing the lending business by investing more and more in cutting-edge technologies to solve real-world problems. Our aim is to provide a hassle-free lending process with a great customer experience. This has become even more relevant in these tough and challenging times (COVID-19), where the fewer the human interactions, the safer the world can be. To keep the process paperless, we introduced customer tasks that users can complete entirely on their own, without any human interaction: taking a selfie, signing the agreement, submitting documents, NACH registration, and so on. Taking a selfie is the most important step in our lending business, because an unclear picture can delay or even lead to the rejection of a loan application. This article describes how we use MLKit to improve the selfie experience for our users.

Face Detection via MLKit

When we started on our journey to improve the selfie experience, the aim was simple: our customers should be able to capture clear selfies for faster loan processing. So, we utilised the power of machine learning, specifically MLKit face detection (earlier available via Firebase), to solve just that. We went with the unbundled face model approach for MLKit, as the bundled one would have increased the app size by about 16 MB. The unbundled approach pulls in the face model lazily, when the app is installed and opened for the first time.

// Add this in your AndroidManifest.xml to automatically download the face model from the Play Store after the app is installed
<meta-data
    android:name="com.google.mlkit.vision.DEPENDENCIES"
    android:value="face" />

Setting up Camera 📸

The choice of camera library determines the implementation effort. Until CameraX became stable, we wanted a library with easy-to-use APIs compatible across Android OS versions, and we found CameraView. CameraView is well documented and powered by the Camera1 (<API 21) and Camera2 (≥API 21) engines, making it fast and reliable. It has all the features we need, the most important being frame processing support. It also integrates with LifecycleOwner, so CameraView handles lifecycle events on its own: asking for permission in onResume, and cleaning up frame processors and listeners and destroying the cameraView in onDestroy of the fragment/activity. By default, CameraView offloads frame processing to a background thread, so the frames can then be consumed by the face detector synchronously.

cameraView.apply {
    setLifecycleOwner(viewLifecycleOwner)
    facing = Facing.FRONT
    addCameraListener(cameraListener())
    addFrameProcessor {
        faceTrackerViewModel.processCameraFrame(it)
        cameraOverlay.setCameraInfo(
            it.size.height,
            it.size.width,
            Facing.FRONT
        )
    }
}

It is important that you set the width and height of cameraOverlay to the received frame’s width and height. When we first tested on different devices, the detected-face rectangle was drawn away from where the face actually was on screen. More details are present in the issue, along with how we fixed it.
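As a rough sketch of what that coordinate mapping involves (the class and method names below are hypothetical, not CameraView’s or MLKit’s API): the overlay scales the frame’s coordinates up to the view’s coordinates, and mirrors the X axis for the front camera, since the preview the user sees is mirrored.

```kotlin
// Hypothetical sketch of the frame-to-view coordinate mapping a camera
// overlay typically performs. Names are illustrative, not a real API.
class OverlayMapper(
    frameWidth: Int,
    frameHeight: Int,
    private val viewWidth: Int,
    viewHeight: Int,
    private val isFrontFacing: Boolean
) {
    // Scale factors from frame space to view space
    private val scaleX = viewWidth.toFloat() / frameWidth
    private val scaleY = viewHeight.toFloat() / frameHeight

    // Mirror X for the front camera so the box tracks the mirrored preview
    fun translateX(x: Float): Float =
        if (isFrontFacing) viewWidth - x * scaleX else x * scaleX

    fun translateY(y: Float): Float = y * scaleY
}
```

If the overlay is sized to the view but the scale factors are computed from the wrong frame dimensions, the bounding box drifts away from the face, which is exactly the symptom we saw.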

Reactive Frame Processing for Face Detection 📽️

The frame is then sent to the frame processor via faceTrackerViewModel to detect a face in it.

fun processCameraFrame(it: Frame) {
    val byteBuffer = ByteBuffer.wrap(it.getData())
    val frameMetadata =
        FrameMetadata(it.size.width, it.size.height, it.rotationToUser, Facing.FRONT)
    compositeDisposable.add(
        rxFaceDetectionProcessor.process(byteBuffer, frameMetadata)
            .subscribe()
    )
}

All this is done while keeping the reactive nature of the app intact via RxFaceDetectionProcessor. The ViewModel passes the frame to the processor, which processes each frame asynchronously off the UI thread.
RxFaceDetectionProcessor is a reactive layer written over FaceDetectionProcessor that emits face-detection results via FaceDetectionResultListener, which are then consumed by faceDetectionResultLiveData = LiveData<List<Face>>. This faceDetectionResultLiveData is observed by the view layer via the viewModel to display the rectangular bounding box over the face.

class RxFaceDetectionProcessor
@Inject
constructor(private val faceDetectionProcessor: FaceDetectionProcessor) :
    FlowableOnSubscribe<List<Face>>,
    FaceDetectionResultListener {
    private lateinit var emitter: FlowableEmitter<List<Face>>
    private lateinit var data: ByteBuffer
    private lateinit var frameMetadata: FrameMetadata
    private lateinit var faceDetectionResultLiveData: MutableLiveData<List<Face>>

    fun setFaceDetectionResultLiveData(faceDetectionResultLiveData: MutableLiveData<List<Face>>) {
        this.faceDetectionResultLiveData = faceDetectionResultLiveData
    }

    fun process(
        data: ByteBuffer,
        frameMetadata: FrameMetadata
    ): Flowable<List<Face>> {
        this.data = data
        this.frameMetadata = frameMetadata
        return Flowable.create(this, BackpressureStrategy.LATEST)
    }

    override fun subscribe(emitter: FlowableEmitter<List<Face>>) {
        this.emitter = emitter
        faceDetectionProcessor.process(data, frameMetadata, this)
    }

    override fun onSuccess(
        results: List<Face>
    ) {
        faceDetectionResultLiveData.value = results
    }

    override fun onFailure(e: Exception) {
        Timber.d(e)
        faceDetectionResultLiveData.value = emptyList()
    }

    fun stop() {
        faceDetectionProcessor.stop()
        if (::emitter.isInitialized)
            emitter.setDisposable(Disposables.disposed())
    }
}

FaceDetectionProcessor is the class where actual frame processing happens. This involves creating FaceDetector with FaceDetectorOptions to start processing each frame for our use case.

val options = FaceDetectorOptions.Builder()
    .apply {
        setClassificationMode(FaceDetectorOptions.CLASSIFICATION_MODE_NONE)
        setLandmarkMode(FaceDetectorOptions.LANDMARK_MODE_NONE)
        setPerformanceMode(FaceDetectorOptions.PERFORMANCE_MODE_FAST)
        enableTracking()
    }
    .build()
detector = FaceDetection.getClient(options)

We wanted face detection to be as fast as possible, without any extra processing for landmarks or classification. Hence, we opted for faster performance by limiting the features: no classification of faces (smiling, eyes open, etc.) and no landmark detection.

Why are frames still getting processed even after the face detector is closed?
When we started pushing frames for face detection, we ran into one more blocker. Even after the face detector was closed and the camera view destroyed, frames kept getting processed, hurting battery performance. Anything that hinders performance cannot hit production at any cost, so we sat down to fix it. We found that we were feeding faceDetector more frames than it could actually consume; since faceDetector took time to compute the result for a given frame, subsequent frames were processed even after faceDetector was closed, because they had already been sent to its buffer. Here is the processing logic after the fix.

class FaceDetectionProcessor
@Inject
constructor() {
    private val detector: FaceDetector
    private var latestImage: ByteBuffer? = null
    private var latestImageMetaData: FrameMetadata? = null
    private var processingImage: ByteBuffer? = null
    private var processingMetaData: FrameMetadata? = null

    init {
        // Face detector initialisation here
    }

    fun process(
        data: ByteBuffer,
        frameMetadata: FrameMetadata,
        detectionResultListener: FaceDetectionResultListener
    ) {
        latestImage = data
        latestImageMetaData = frameMetadata
        // Process the image only when the last frame processing has been completed
        if (processingImage == null && processingMetaData == null) {
            processLatestImage(detectionResultListener)
        }
    }

    private fun processLatestImage(detectionResultListener: FaceDetectionResultListener) {
        processingImage = latestImage
        processingMetaData = latestImageMetaData
        latestImage = null
        latestImageMetaData = null
        if (processingImage != null && processingMetaData != null) {
            processImage(
                requireNotNull(processingImage),
                requireNotNull(processingMetaData),
                detectionResultListener
            )
        }
    }

    private fun processImage(
        data: ByteBuffer,
        frameMetadata: FrameMetadata,
        detectionResultListener: FaceDetectionResultListener
    ) {
        detectInVisionImage(
            InputImage.fromByteBuffer(
                data,
                frameMetadata.width,
                frameMetadata.height,
                frameMetadata.rotation,
                InputImage.IMAGE_FORMAT_NV21
            ),
            detectionResultListener
        )
    }

    private fun detectInVisionImage(
        image: InputImage,
        detectionResultListener: FaceDetectionResultListener
    ) {
        detector.process(image)
            .addOnSuccessListener {
                detectionResultListener.onSuccess(it)
            }
            .addOnFailureListener {
                detectionResultListener.onFailure(it)
            }.addOnCompleteListener {
                // Process the next available frame for face detection
                processLatestImage(detectionResultListener)
            }
    }

    fun stop() {
        try {
            Timber.d("Face detector closed")
            detector.close()
        } catch (e: IOException) {
            Timber.e("Exception thrown while trying to close Face Detector: $e")
        }
    }
}

Amit Randhawa from InCred’s Android team showing the Selfie Experience

Quality Selfies via Auto Capture 🤳

We launched our newly developed selfie experience to our users, and it received very good feedback. At InCred, we always aim for good, then better, and then strive for the best. So, we enhanced the selfie experience further by introducing auto-capture of a high-quality snapshot once the face is detected. We provided an oval overlay on top of the camera feed and asked users to keep their face inside it; once the face was detected, we would auto-capture the selfie. To guide users properly, we showed real-time feedback (e.g. “You are too near to the camera”, “You look too far from the camera”) on top of the overlay, so that users could complete this task without any manual support.

Detecting the face inside the oval and the feedback

We added an oval overlay on top of the GraphicOverlay. Note that GraphicOverlay is what earlier helped draw the rectangular bounding-box graphic once a face was detected. We wanted to give users three kinds of feedback:

  • Face inside oval: To detect whether the detected face was actually inside the oval, we simply checked whether the face bounding box was within the oval’s dimensions. Here, this refers to the oval’s dimensions, while the parameters are the sides of the face bounding box.

      fun sidesInsideOval(top: Float, right: Float, bottom: Float, left: Float): Boolean =
          top >= this.top && bottom <= this.bottom && left >= this.left && right <= this.right
    
  • Face inside the oval but zoomed out: The user’s face could be inside the oval, but if it is too far from the camera, the captured selfie would not be clear. To verify this case, we compared the vertical extent of the face bounding box with that of the oval. If the ratio was at most half, we asked the user to move closer to the camera.

      fun isFaceZoomedOut(top: Float, bottom: Float): Boolean =
          (bottom - top) / (this.bottom - this.top) <= 0.5
    
  • Face zoomed in: The user could be holding the camera too near the face, which could produce poor-quality selfies. To verify this, we checked whether the coordinates of the face bounding box exceeded the oval’s dimensions. If the face was zoomed in, we asked the user to move away from the camera.

      fun isFaceZoomedIn(top: Float, right: Float, bottom: Float, left: Float): Boolean {
          var sidesInside = 0
          if (top >= this.top) sidesInside += 1
          if (bottom <= this.bottom) sidesInside += 1
          if (left >= this.left) sidesInside += 1
          if (right <= this.right) sidesInside += 1
          return sidesInside <= 1
      }
    

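The three checks above can combine into a single feedback decision per detected face. The sketch below is illustrative: OvalRect, Feedback, and feedbackFor are hypothetical names, not our production classes, but the check logic matches the snippets above.

```kotlin
// Hypothetical sketch combining the three oval checks into one feedback decision.
data class OvalRect(val top: Float, val right: Float, val bottom: Float, val left: Float) {
    fun sidesInsideOval(top: Float, right: Float, bottom: Float, left: Float): Boolean =
        top >= this.top && bottom <= this.bottom && left >= this.left && right <= this.right

    fun isFaceZoomedOut(top: Float, bottom: Float): Boolean =
        (bottom - top) / (this.bottom - this.top) <= 0.5f

    fun isFaceZoomedIn(top: Float, right: Float, bottom: Float, left: Float): Boolean {
        var sidesInside = 0
        if (top >= this.top) sidesInside += 1
        if (bottom <= this.bottom) sidesInside += 1
        if (left >= this.left) sidesInside += 1
        if (right <= this.right) sidesInside += 1
        return sidesInside <= 1
    }
}

enum class Feedback { AUTO_CAPTURE, MOVE_CLOSER, MOVE_AWAY, CENTER_FACE }

// Decide the on-screen feedback for one face bounding box.
fun feedbackFor(oval: OvalRect, top: Float, right: Float, bottom: Float, left: Float): Feedback =
    when {
        oval.sidesInsideOval(top, right, bottom, left) ->
            if (oval.isFaceZoomedOut(top, bottom)) Feedback.MOVE_CLOSER else Feedback.AUTO_CAPTURE
        oval.isFaceZoomedIn(top, right, bottom, left) -> Feedback.MOVE_AWAY
        else -> Feedback.CENTER_FACE
    }
```

Only the AUTO_CAPTURE state triggers the snapshot; every other state maps to a feedback message shown above the overlay.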
Once the corner cases had been tested, we ran our new selfie experience against the old one via Firebase A/B testing. Within a month, we found that the oval overlay with auto-capture was performing significantly better than the old experience. This gave us the confidence to launch it to all our users.

Amit Randhawa from InCred’s Android team showing the new Selfie Experience

The fallback approach

Android is heavily fragmented, and with so many different OEMs it is increasingly difficult to test a feature on all Android devices. This is even more problematic in a country like India, where a vast number of OEMs customize the behaviour of the Android OS to their own needs and requirements. If your feature does not work on some devices, you are effectively blocking the user journey in your app. To support those users, we provided a fallback approach: if a face was not detected within 10 seconds, we moved to the native camera experience. This way, we never blocked the user’s loan application journey, while giving ourselves time to fix the issues in the background.
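The fallback decision itself is simple state: record when detection started, whether any face has been seen, and compare elapsed time against the timeout. The sketch below is a hypothetical, platform-free version of that logic (the real implementation is wired into the camera screen); the class and method names are assumptions.

```kotlin
// Hypothetical sketch of the fallback decision: if no face has been detected
// within the timeout, switch to the native camera experience.
class FallbackPolicy(private val timeoutMs: Long = 10_000L) {
    private var startedAtMs: Long = 0L
    private var faceSeen = false

    // Call when the custom selfie screen starts pushing frames
    fun onDetectionStarted(nowMs: Long) {
        startedAtMs = nowMs
        faceSeen = false
    }

    // Call whenever the detector reports at least one face
    fun onFaceDetected() {
        faceSeen = true
    }

    // True when the app should fall back to the native camera
    fun shouldFallback(nowMs: Long): Boolean =
        !faceSeen && nowMs - startedAtMs >= timeoutMs
}
```

In practice this check would be driven by a timer or posted delayed on the main thread, and cancelled as soon as the first face arrives.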

Always provide a fallback approach for your feature. In the end, you never want to block your users and then receive bad reviews on the Play Store.

We truly believe that necessity is the mother of invention. The current tough times are pushing us to re-imagine and build every feature without requiring any human intervention. We are on a path to not only rebuild and revamp our tech stack, but also to solidify and improve the features that boost the user experience. If this excites you, we will be more than happy to talk.