In this article, we will present how MiFinanzas, one of the most forward-looking Latam fintechs, used Cogniflow to automate part of its manual daily work. The company's business consists in facilitating access to financing and financial management for small and medium-sized companies, offering tools to boost businesses and ventures in a simple and accessible way. MiFinanzas already has more than 12,600 clients and 29,000 financings granted, totaling around 132 million dollars. A well-established, consolidated fintech.
AI offers a wide range of tools for fintech companies to transform their workflows. From fraud detection to financial decision making, through credit risk assessment and default prediction, among others, AI techniques can help these firms make their processes leaner and more cost-effective.
MiFinanzas was no exception, with a clear improvement opportunity in the manual processing of bank check photos. These images were uploaded by users through the web platform or the mobile app. Checks contain a lot of information to be retrieved visually by hand: bank and branch ids, client current account number, currency, amount, whether it is signed or not, among others. So they came to us with one question in mind: how can Cogniflow help us process our bank check images more efficiently by invoking the power of AI? 🤖
Fortunately, Cogniflow had the right tools to tackle these challenges: Image Recognition and Object Detection models. The first one identifies the check's issuing bank, and the second one detects (and counts) signatures and reads the digits located at the check's footer.
So let’s see how they used Cogniflow to solve the challenges presented.
The first step to take, as is usual with AI problems, is to have a look at the available data. In this case, the data consisted of pictures of bank checks taken by the users with their smartphones. One of the most challenging aspects, as in almost any task to be solved with AI, was data quality (and, many times, quantity). Data quality is always problematic, but there is one scenario that could be considered the ultimate challenge: in-the-wild data 😱. In some cases, as we can see in Figure 1, some users took the "in-the-wild" part quite literally.
As shown in Figure 1, more than one bank check can appear in a picture, making it difficult to decide which one should be read; digits may be occluded (fingers over the footer); images may not be fully horizontal or may be taken at an angle; and the image background, which acts as a noise source, is often present. These issues can severely test the robustness of any algorithm. Luckily, most of the images were decently taken.
Once a dataset is available, it's time to roll up your sleeves and start labeling it 😓. Labeling is a work-intensive task that consumes time and effort, but free, open-source tools like Label-Studio are at hand to complete it easily. Label-Studio supports labeling for different AI fields, such as NLP, Audio Processing, and Computer Vision. In particular, its Object Detection labeling functionality was used to label both signatures and footer digits. Label-Studio is already integrated into Cogniflow, so there is no need for any extra registration or subscription.
Object Detection labeling consists in drawing bounding boxes around the objects to be detected, each with its respective label. Label-Studio makes it easy to set up the labels and their colors, as well as, of course, to label the data.
After the data is labeled, it can be exported in different formats, depending on the model chosen, and finally uploaded to Cogniflow as a zip file containing both the images and the bounding box labels and coordinates. And that's all! 🙌
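As a rough sketch of that packaging step, assuming a hypothetical local layout where `images/` holds the check photos and `annotations/` holds the files exported from Label-Studio, the zip could be built like this:

```python
from pathlib import Path
from zipfile import ZipFile

# Hypothetical local layout: images/ holds the check photos and
# annotations/ holds the bounding-box files exported from Label-Studio.
images_dir = Path("images")
annotations_dir = Path("annotations")

with ZipFile("signature_dataset.zip", "w") as archive:
    for image_path in images_dir.glob("*.jpg"):
        archive.write(image_path, arcname=f"images/{image_path.name}")
    for label_path in annotations_dir.glob("*"):
        archive.write(label_path, arcname=f"annotations/{label_path.name}")

print("signature_dataset.zip is ready to be uploaded to Cogniflow")
```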
For the Image Recognition problem (i.e. identifying the issuing bank), data labeling is really simple with Cogniflow: just name a folder after each category, fill it with the corresponding images, and upload everything as a zip file.
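Here is a minimal sketch of that folder-per-category layout (the class names are hypothetical), showing roughly what the preparation step looks like before uploading:

```python
import shutil
from pathlib import Path

# Hypothetical class names: one folder per issuing bank.
dataset_root = Path("bank_recognition_dataset")
for bank in ["bank_a", "bank_b", "bank_c"]:
    (dataset_root / bank).mkdir(parents=True, exist_ok=True)
    # ...copy each check photo into the folder of its issuing bank...

# Zip the whole folder tree; the resulting archive is what gets uploaded.
shutil.make_archive("bank_recognition_dataset", "zip", root_dir=dataset_root)
```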
Once the labeled dataset is uploaded, let Cogniflow do the hard work of training and finding the best model! 🚀 Cogniflow will notify you by e-mail when the job is done. Easy, isn't it?
But let’s first take a glimpse at the techniques used by Cogniflow to complete the tasks scheduled.
Computer Vision (CV) is one of the main sub-disciplines of AI. CV includes tasks such as image recognition (a.k.a. image classification), semantic segmentation, image regression, and object detection, among others. CV is in charge of making machines see (what a responsibility, huh?).
In order to solve its challenges, MiFinanzas was particularly interested in two specific CV tasks: Image Recognition (IR) and Object Detection (OD). The first one is useful when images need to be classified according to some pattern (e.g. is this a 😺 or a 🐶 picture?), while the second one is in charge of locating specific objects inside images (e.g. where is the 🐶 located in the image?).
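To get a feel for what each task returns, here is a minimal, generic sketch using open-source pre-trained models from torchvision. This is just an illustration of IR versus OD outputs, not how Cogniflow works under the hood:

```python
import torch
from torchvision import models

# A toy 3-channel image tensor standing in for a check photo.
image = torch.rand(3, 480, 640)

# Image Recognition: one label for the whole image ("which bank issued this check?").
classifier = models.resnet18(weights="IMAGENET1K_V1").eval()
with torch.no_grad():
    logits = classifier(image.unsqueeze(0))        # shape: [1, num_classes]
predicted_class = logits.argmax(dim=1).item()

# Object Detection: boxes + labels + scores ("where are the signatures or digits?").
detector = models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
with torch.no_grad():
    detections = detector([image])[0]              # dict with 'boxes', 'labels', 'scores'

print(predicted_class, detections["boxes"].shape, detections["scores"].shape)
```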
Although IR and OD are both part of the CV world, each comes with its own battery of algorithms and strategies to deal with the challenges presented. That's why Cogniflow offers many functionalities to accomplish these kinds of tasks using a No-Code AI paradigm, where technical skills are not required and we can build a fast and accurate solution without writing a single line of code! Amazing, isn't it?
MiFinanzas used Cogniflow to train several AI models to address the issues at hand. Which bank does this check belong to? Is this check signed or not? If it is signed, how many signatures does it have? How can the footer CMC7 digits be read? As mentioned above, these questions map to different Computer Vision tasks, namely Image Recognition and Object Detection.
It's the first model in the data pipeline, and it's in charge of detecting which of the eight banks in the dataset a check belongs to. The model was trained with Cogniflow, using raw images of bank checks as training data.
Note: To see more details about how to create an image recognition solution in Cogniflow, just click here, go to “How to create an AI-based solution for content moderation using Cogniflow?” and follow the steps.
Results: a 0.94 accuracy score was achieved. This means that 94 out of 100 bank checks were correctly classified (measured on images that were not part of the training set).
In Cogniflow it's also possible to quickly test a trained model by just uploading an image or taking a snap with the webcam. As shown in the image below, the model returns the label of the image (the name of the bank in this case). Check how the issuing bank matches the predicted label 🤗.
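If you prefer to test programmatically, the general idea looks like the sketch below. Note that the endpoint URL, authentication header, and payload shape here are placeholders, not Cogniflow's actual API; check the official API documentation of your trained model for the real details:

```python
import base64
import requests

# Placeholders: replace with the real endpoint, headers, and schema from the docs.
API_URL = "https://example.com/your-image-recognition-model/predict"
API_KEY = "YOUR_API_KEY"

with open("check_photo.jpg", "rb") as f:
    encoded_image = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"image": encoded_image},
    timeout=30,
)
print(response.json())  # expected to include the predicted bank label and a confidence
```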
The second model MiFinanzas created with Cogniflow was a signature detection model to detect and locate signatures in the bank checks. There was no need to worry too much about image alignment, because the Signature Detector is capable of finding signatures regardless of the check's position in the image. The model was trained with a dataset of raw, in-the-wild images.
The main goal of this detector is to find bank checks without any signature, which is a non-compliance case that must be reported. The model works well with unsigned bank checks. Cases where handwritten text next to signatures was wrongly detected as a signature remained fairly atypical, and by adjusting the confidence threshold (let's call it the detector sensitivity) it is possible to detect signatures that would otherwise be missed.
Along with each bounding box, the model returns a score between 0 and 1 indicating how confident it is about the detection (i.e. the higher the score, the more certain the detection).
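As a minimal sketch of how that sensitivity knob works (the detection format below is made up for illustration), filtering by a confidence threshold is just:

```python
def filter_detections(detections, threshold=0.5):
    """Keep only detections whose confidence clears the threshold.

    `detections` is assumed to be a list of dicts like
    {"box": [x1, y1, x2, y2], "score": 0.87}, a made-up shape for illustration.
    Lowering the threshold makes the detector more sensitive (more signatures
    found, more false positives); raising it does the opposite.
    """
    return [d for d in detections if d["score"] >= threshold]

sample = [
    {"box": [120, 340, 260, 410], "score": 0.91},   # clear signature
    {"box": [300, 345, 380, 400], "score": 0.32},   # handwritten text, low score
]
print(filter_detections(sample, threshold=0.5))  # only the 0.91 detection survives
```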
Successful signature detection example
Examples like the one mentioned above were rare. Most of them involved low-resolution images and/or unusual signature patterns, often similar to the handwritten text in the amount field, or narrow and vertical enough to overlap the upper field. This can be improved by training with more data, including more examples like the one above, so the model learns them better. In any case, the current model is already capable of drastically reducing the manual inspection of bank check images, making the whole process significantly more efficient.
Results for the first released model:
These results represent an error rate lower than 1%, which means having to manually process 18 checks instead of 2,619. This model can save around 99% of the human effort.
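For reference, the arithmetic behind those numbers is straightforward:

```python
total_checks = 2619          # checks in the evaluation batch
manual_after_model = 18      # checks the model could not resolve

error_rate = manual_after_model / total_checks
effort_saved = 1 - error_rate
print(f"error rate: {error_rate:.2%}, human effort saved: {effort_saved:.2%}")
# error rate: 0.69%, human effort saved: 99.31%
```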
The objective of this task was to automatically read the digits located at the footer, so that no human or CMC7 reader machine had to do it. But first, let's learn a little more about the footer digits.
Bank checks are rich in information. Each check contains features such as the bank logo, id, and branch; checkbook identifiers; the client's bank account id; the total amount (numerical and textual); two signature fields; and the footer digits (whose format depends on the country's banking regulations).
CMC7 is one of the fonts used for Magnetic Ink Character Recognition (MICR). Its purpose is to be printable and easy to read for both humans and machines. Each character has a barcode-like format, and the magnetic ink makes it possible for machines to read them easily.
CMC7 has been broadly adopted in Europe and South America (including Uruguay), while North America, Asia, and Commonwealth countries went for the E-13B font.
Each ten-digit block has its own purpose. From left to right, the first one encodes information about the currency and the bank check's identification numbers; the second block contains bank and branch numbers (and verifiers); the last ten digits are the client's bank account number, padded with zeros to the left if necessary. The first character in the footer is the internal symbol, while the last one is the terminator; both are part of the CMC7 font.
As mentioned above, some of the digits are verifiers. They are all located in the second block and check that the information encoded in the other blocks is consistent. They also proved useful for measuring the quality of the detection, by validating each of the three blocks.
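As a minimal sketch of how the 30 recognized digits map to the structure just described (the verifier rules themselves are regulation-specific, so the validation is left as a placeholder):

```python
def split_cmc7_footer(digits: str):
    """Split the 30 recognized footer digits into the three 10-digit blocks
    described above. Verifier-digit validation is left as a placeholder,
    since the exact check rules depend on local banking regulations."""
    assert len(digits) == 30, "expected 30 digits (3 blocks of 10)"
    blocks = {
        "currency_and_check_id": digits[0:10],
        "bank_and_branch": digits[10:20],      # this block also carries the verifiers
        "client_account": digits[20:30],       # zero-padded to the left
    }
    # TODO: validate the blocks against the verifier digits (regulation-specific).
    return blocks

print(split_cmc7_footer("012345678901234567890123456789"))
```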
Optical Character Recognition (OCR) was the first choice considered, but it was quickly discarded because of the low image quality frequently encountered (OCR works fine when high image resolution is available) and the particular font used for the footer digits. So an Object Detection approach was taken instead.
Training this model proved the importance of having the right amount of labeled data available. Image resolution is also key to the success of the detector, because it is strongly related to the model's ability to differentiate between digits with similar patterns, such as 3s and 8s. And don't forget object occlusion, multiple bank checks in one image, or noisy backgrounds, as mentioned before.
This first Digit Detector model was trained with 600 labeled images, which is not a big dataset considering there are ten different objects to detect (the ten digits).
After the first model was released, MiFinanzas' team generated a new batch of labeled bank checks for fine-tuning. Fine-tuning consists in taking the first trained model and letting it learn from the new data. This is a usual approach when iterating on a model, or when adapting a pre-trained model from a third party. The motivation for fine-tuning is to achieve better results thanks to a new, smarter model.
But simply pouring in more data is not always the right thing to do. There are smarter ways to do it, and that's what MiFinanzas did. This is in fact a common approach when fine-tuning a model: it's more fruitful (and efficient) to re-train it with data that is more challenging for the original model, so it can learn better. In this case, "challenging" means data the trained model struggles to handle well. By analogy with human learning, it's as if you wanted to learn more about a topic and chose new material at the same level as what you have already read: if you really want to learn better, you should read more in-depth, challenging texts. This is the main principle of Active Learning. So, by selecting fewer but better-fitting data points, it's possible to achieve superior results while also cutting costs and effort. Looks like a win-win situation, doesn't it?
So, MiFinanzas' team took the 240 images the first model performed worst on (the ones with the lowest confidence scores), labeled them, and fine-tuned the model until a new release was ready to go.
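A minimal sketch of that selection step could look like this, assuming a hypothetical `predict_confidence` function that wraps whatever inference endpoint or model is in use:

```python
# Score every unlabeled image with the current model and send the
# least-confident ones to labeling. `predict_confidence` is hypothetical;
# in practice it would wrap whatever inference endpoint or model you use.
def select_for_labeling(image_paths, predict_confidence, budget=240):
    scored = [(predict_confidence(path), path) for path in image_paths]
    scored.sort(key=lambda pair: pair[0])          # lowest confidence first
    return [path for _, path in scored[:budget]]   # the `budget` hardest images

# Dummy usage, just to illustrate the call:
hardest = select_for_labeling(["a.jpg", "b.jpg", "c.jpg"], lambda p: len(p) / 10.0, budget=2)
```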
This second model made about 6 errors per batch of 1,000 digits. Considering that each check has 30 digits, this means that in a batch of roughly 33 checks, around 6 digits are misread. In the worst-case scenario, each error lands on a different check (i.e. roughly 82% of checks read perfectly). But in many cases several errors show up in the same check (digit occlusion being a perfect example), meaning a higher rate of correctly read bank checks.
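The back-of-the-envelope arithmetic behind those figures:

```python
digit_error_rate = 6 / 1000   # errors per recognized digit
digits_per_check = 30

# Worst case: every digit error lands on a different check.
checks_per_1000_digits = 1000 / digits_per_check          # ~33 checks
worst_case_accuracy = 1 - 6 / checks_per_1000_digits      # ~0.82

# If errors were independent per digit, a check is fully correct only
# when all 30 digits are right:
independent_accuracy = (1 - digit_error_rate) ** digits_per_check  # ~0.83
print(worst_case_accuracy, independent_accuracy)
```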
Finally, after training all the previously described models, we have a full AI-powered data pipeline capable of recognizing the bank, detecting whether the check is signed or not, and finally retrieving the digits located at the footer.
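Putting it all together, a simplified, hypothetical sketch of that pipeline could look like the following. The three predict functions are stand-ins for the trained models (not a real Cogniflow SDK), and the dummy stubs are there just to show the shape of the output:

```python
# Hypothetical end-to-end pipeline: bank recognition, signature detection,
# and footer digit reading, chained over a single check image.
def process_check(image, recognize_bank, detect_signatures, detect_digits):
    signatures = detect_signatures(image)                 # list of {"box", "score"}
    digits = detect_digits(image)                         # list of {"box", "label", "score"}
    digits = sorted(digits, key=lambda d: d["box"][0])    # read the footer left to right
    return {
        "bank": recognize_bank(image),
        "signed": len(signatures) > 0,
        "signature_count": len(signatures),
        "footer_digits": "".join(str(d["label"]) for d in digits),
    }

# Example with dummy model stubs, just to show the output shape:
result = process_check(
    image="check_photo.jpg",
    recognize_bank=lambda img: "Bank A",
    detect_signatures=lambda img: [{"box": [120, 340, 260, 410], "score": 0.91}],
    detect_digits=lambda img: [
        {"box": [10 * i, 500, 10 * i + 8, 520], "label": i % 10, "score": 0.9}
        for i in range(30)
    ],
)
print(result)
```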
In this post, we covered a real use case from one of the most forward-looking Latam fintechs, MiFinanzas. The journey started with assessing the available data, went through data labeling and the training of the different models, and ended with the final models.
We talked about classical AI tasks such as Object Detection and Image Recognition and how they can help a company (or any organization) cope with efficiency challenges by automating repetitive but work-intensive duties.
We also showed how a second iteration, with just a few new data examples, could achieve even better results. This is relevant because it shows how it's possible to leverage previously trained models (and all the work done before) to generate a virtuous cycle of continuous improvement, resulting in more powerful AI-based tools.
One of the key aspects to keep in mind for future iterations is data quality. To improve it, it's essential to give users a bit of a nudge to upload pictures that are high-resolution, correctly focused, horizontal, not taken at an angle, with minimal background, and containing a single item (the bank check).
One of the most relevant lessons from this project is that a large amount of data is not always necessary to achieve good results. This is not "Big Data": just using hundreds of relatively small images for training can result in truly powerful models, be they Object Detectors or Image Recognizers. If you are a small or medium-sized company, or you simply don't have a large dataset yet, you can still achieve good results with the help of Cogniflow 🚀.