Despite many years of digital transformation, there are still plenty of processes that require managing physical or printed documents, which are typically scanned or simply “digitized” by taking a photograph. Some examples are legal agreements, IDs, checks, and other financial documents.
In this article, we will focus on a particular use case in banking: signature verification in checks. However, the following solution could be applied to any document that requires signatures.
Let’s start by defining what we understand by verification or validation. Identity validation is the process of verifying that a person's identity matches the information they have provided. This can involve confirming their identity documents, such as a passport or driver's license, or using biometric data, such as fingerprints or facial recognition (e.g., Cogniflow’s Face Similarity model), to confirm their identity. Identity validation or verification aims to prevent fraud and ensure that only authorized individuals have access to sensitive information or services.
In the following sections, we will describe in more detail every step needed to perform a signature verification from an image of a check.
As usual in the world of AI and Machine Learning, raw data can be quite challenging. As detailed in a previous post, images of bank checks taken with cell phones are usually rotated, blurry, angled, and noisy. To pre-process the images, we used the same techniques developed in that previous work to detect and extract the signatures from the pictures.
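The detection model itself comes from that earlier post, so it is out of scope here. As a minimal illustration of the extraction step, once a detector returns bounding boxes for the signatures, cropping them is just a matter of slicing the image array (the box format below is an assumption, not the exact output of our pipeline):

```python
import numpy as np

def crop_signatures(check_image: np.ndarray, boxes) -> list:
    """Crop each detected signature region from a check image.
    `boxes` is assumed to be a list of (x1, y1, x2, y2) pixel coordinates
    produced by the detection step described in the previous post."""
    return [check_image[y1:y2, x1:x2] for (x1, y1, x2, y2) in boxes]
```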
Another critical aspect to consider when validating a signature is its inherently variable nature: no two signatures from the same drawer are 100% identical, as you can see in the image below. As will be discussed later, this is one of the many factors to weigh when comparing two images of this kind.
So, how do we deal with signatures in AI?
In AI, an embedding is a vector representation of data that captures its essential characteristics. It is a dense representation of the original data that can be used as input to a machine learning model. Embeddings are commonly used in natural language processing and computer vision tasks to transform data into a more manageable and informative format. In computer vision, an embedding can be used to represent an image as a vector in a high-dimensional space. The vector can then be used for tasks such as image classification, object detection, and similarity search.
In our case, we take every cropped signature image and generate its embedding. This is an excellent fit for our context: once every signature is represented as an embedding and stored in a vector space, comparing vectors is very fast thanks to their algebraic properties.
The fastest way to generate embeddings for images is to use a pre-trained convolutional neural network (e.g., VGG16, EfficientNetB3, MobileNetV2, and so on) with its top layer removed, so the output is a latent feature vector that represents the image. Alternatively, the network could be fine-tuned to achieve a more precise representation of the images in the domain.
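As a rough sketch of that idea (the post does not say which backbone, framework, or input size was used, so MobileNetV2 in Keras with 224×224 inputs is just an assumption), the embedding can be obtained by removing the classification head and applying global average pooling:

```python
import numpy as np
import tensorflow as tf

# Pre-trained backbone with the classification head removed; global average
# pooling collapses the feature maps into a single latent feature vector.
backbone = tf.keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, pooling="avg"
)

def embed_signature(image_path: str) -> np.ndarray:
    """Return an L2-normalized embedding for a cropped signature image."""
    img = tf.keras.utils.load_img(image_path, target_size=(224, 224))
    x = tf.keras.applications.mobilenet_v2.preprocess_input(
        tf.keras.utils.img_to_array(img)
    )
    vec = backbone.predict(np.expand_dims(x, axis=0), verbose=0)[0]
    # Normalizing up front makes cosine similarity a plain dot product later on.
    return vec / np.linalg.norm(vec)
```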
Now that we have a vector to represent each signature, it’s necessary to compare them and decide whether they are valid or not.
How do we do that?
Visual Search (VS) is a machine learning technique that involves searching a database of already known embedding vectors for an image based on similarity. A new, unknown image (i.e., already represented as a vector) is received as input, and the Visual Search returns the K most similar images in the database. The technique requires a good image representation (embedding), a distance metric, and a similarity threshold, which are used to assess whether two images are similar.
After finding the best way of representing the images, a suitable distance metric is necessary to assess whether two images are the same or very alike. The closer two vectors are in the vector space, the more similar the images are. There are many ways to measure distance in AI, such as cosine, Euclidean, or Manhattan distance, among others.
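For illustration, here is how those three distances look on a pair of embedding vectors (plain NumPy, nothing specific to our pipeline):

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    # 0 when the vectors point in the same direction, up to 2 when they are opposite.
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(a - b))

def manhattan_distance(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.sum(np.abs(a - b)))
```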
After defining the distance metric, we need to find the similarity threshold to conclude if the signature is the same or not. In the next section, we will describe how we found its value.
To sum up, once we have the embedding of a new signature, we search the database to get the top k most similar signatures. Finally, we compare their similarity scores against the threshold to verify the signer's identity.
The following diagram illustrates the high-level process for signature verification.
All images of checks in our dataset were processed in order to build the image index (vector space).
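A minimal sketch of that index, and of the top-k search with a threshold described above, could look as follows. The post does not name a vector search library, so FAISS (a common choice for flat indexes) is an assumption here, as are the k and threshold values:

```python
import faiss  # assumption: FAISS for the flat index; the post does not name a library
import numpy as np

def build_index(embeddings: np.ndarray) -> faiss.IndexFlatIP:
    """Flat index over L2-normalized embeddings; inner product == cosine similarity."""
    index = faiss.IndexFlatIP(int(embeddings.shape[1]))
    index.add(np.ascontiguousarray(embeddings, dtype="float32"))
    return index

def verify(index, drawers, query: np.ndarray, k: int = 5, threshold: float = 0.8):
    """Return (drawer, score) when the best of the top-k matches clears the
    similarity threshold, otherwise (None, score). k and threshold are placeholders."""
    scores, ids = index.search(query.reshape(1, -1).astype("float32"), k)
    best_score, best_id = float(scores[0][0]), int(ids[0][0])
    if best_score >= threshold:
        return drawers[best_id], best_score
    return None, best_score
```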
In order to find the best similarity threshold, the images were split 80/20: 80% for building the index (train set) and 20% for running the search (test set). Since the drawer of every bank check is known, a VS was run for each cropped signature in the train and test sets using a flat index and the cosine similarity metric. If a good match was found, it counted as a hit toward the accuracy metric. This process was repeated for different similarity thresholds, which allowed us to find the best one, with an accuracy of 91%.
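A sketch of that threshold search might look like the snippet below. It assumes the flat index and drawer list from the previous sketch, arrays for the 20% hold-out (test_embeddings, test_drawers), a ground-truth drawer of None for queries that should be rejected, and an illustrative grid of candidate thresholds; the exact evaluation protocol and the 91% figure come from our experiments, not from this code:

```python
import numpy as np

def accuracy_at_threshold(index, index_drawers, queries, true_drawers, threshold):
    """A query counts as a hit when it is accepted for the correct drawer,
    or correctly rejected when its true drawer is None (not enrolled / forged)."""
    scores, ids = index.search(np.ascontiguousarray(queries, dtype="float32"), 1)
    hits = 0
    for i, truth in enumerate(true_drawers):
        predicted = index_drawers[int(ids[i][0])] if scores[i][0] >= threshold else None
        hits += int(predicted == truth)
    return hits / len(true_drawers)

# Sweep candidate thresholds on the hold-out set and keep the best one.
candidates = np.arange(0.50, 0.96, 0.05)
best_threshold = max(
    candidates,
    key=lambda t: accuracy_at_threshold(
        index, train_drawers, test_embeddings, test_drawers, t
    ),
)
```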
After determining the optimal similarity threshold, we are ready to start verifying new signatures. Now, when a new, unidentified bank check is processed, the model returns, for each signature found, its corresponding drawer and a similarity score. If a signature turns out not to be valid but is afterward manually validated, it is stored for a future batch update of the index so that it can be verified automatically from then on.
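Because the index is flat, appending those manually validated signatures later is just a batch add (again a sketch, continuing with the assumed FAISS index from above):

```python
import numpy as np

def batch_update_index(index, drawers, new_embeddings, new_drawers):
    """Append manually validated signatures so future searches can match them.
    A flat index supports appending vectors directly; other index types may need rebuilding."""
    index.add(np.ascontiguousarray(new_embeddings, dtype="float32"))
    drawers.extend(new_drawers)
    return index, drawers
```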
This article described how the power of AI can be unleashed with the help of Cogniflow to add a crucial step to bank check processing: signature validation. The most relevant challenges were presented alongside how they were solved to make the solution complete and robust while, at the same time, achieving a considerable reduction in business expenses.
We presented how Visual Search, a machine learning technique, is an elegant and powerful solution for finding similar items such as signatures.
One of the remarkable aspects of the approach taken here is how well it handles scaling, the introduction of new signatories, and multi-signed bank checks. The way the image index is built allows it to scale efficiently (i.e., by adding more images) while remaining fast and accurate. Also, adding new, previously unknown signatures is straightforward once the right embedding generator and index are in place.
Once again, Cogniflow showed how AI can be applied to automate processes that have traditionally been done manually, or to help humans complete repetitive, error-prone tasks quickly and effectively.
Also, detecting possible fraud and forgeries is increasingly necessary given the growing number of data leaks, privacy violations, and cybersecurity threats, among other ills of the digital world. AI demonstrated once again how it can be used for good and make human life safer, easier, and more convenient.