
iceDQ plans capital raise for growth in data testing


Connecticut-based Torana is planning a venture capital funding round in 2024. As its website homepage proclaims, Torana’s iceDQ platform does not do application testing – it does automated testing of data for data-centric projects, such as data migration, data warehousing and the management of big data.

Financial firms are a key customer base, and iceDQ users include Fidelity, E*TRADE and Wells Fargo. We spoke to Sandesh Gawande, founder and CTO, about how he started iceDQ, and where it’s headed from here, including plans for LLM-based data testing.

Q: How did your career path take you to iceDQ?

I spent more than 25 years tackling data engineering projects for various banks and insurance companies like Deutsche Bank, XL Capital, JP Morgan and Nomura.

My experience with numerous data projects exposed the glaring inefficiencies of traditional data testing methodologies. The ETL [extract, transform and load] developers would develop data pipelines and say “everything is working fine”, but I didn’t have a good way of testing them before deploying to production. This led me to ask a crucial question: how can we implement automated, comprehensive testing across data pipelines to ensure consistent data quality?

Witnessing firsthand the lack of automated data testing during my days as a data architect, I seized the opportunity to address this gap in 2008 with the launch of iceDQ.

Q: How do you sum up the iceDQ proposition?

iceDQ is a unique data reliability platform that provides integrated data testing, monitoring, and observability capabilities throughout the DataOps cycle. We believe bad data processes will result in bad data. Hence, companies must do both QA of data processes during development and QC of data in production.

Q: How does the iceDQ platform work to test data?

Application testing mostly works using screens. You perform an action on a screen and then replay it in your testing software to see what happens. Unfortunately, data processes don’t have screens; instead they have tasks that run in the background. This also means that conventional QA tools such as Selenium cannot be used in data-centric projects such as data migration, data warehousing and big data.

While designing iceDQ we had to go back to the basics: how do you identify the expected versus the actual in a data-centric project?

We realized that the output data generated is the actual value and the input data plus the transformation rule will provide the expected values. In some cases, the expected values are simply based on business rules provided by the users.

In iceDQ we gave testers the ability to create data reconciliation and validation rules. This enabled users to test billions of records without sampling. Since the expected and actual values could be in different databases or files, we provided cross-platform connectivity.
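
As a rough illustration of that expected-versus-actual idea, here is a minimal sketch (hypothetical throughout: the table names, the 10% fee rule and the in-memory SQLite connections are invented stand-ins, not iceDQ’s engine):

```python
import sqlite3

# Sketch of the expected-vs-actual idea: the transformation rule applied
# to the source (input) data yields the expected values, which are then
# compared with the actual values loaded into the target. Two in-memory
# SQLite databases stand in for two different platforms.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.execute("CREATE TABLE trades (id INTEGER, gross REAL)")
source.executemany("INSERT INTO trades VALUES (?, ?)",
                   [(1, 100.0), (2, 250.0), (3, 80.0)])

# The pipeline under test is supposed to apply a 10% fee: net = gross * 0.9.
target.execute("CREATE TABLE trades_net (id INTEGER, net REAL)")
target.executemany("INSERT INTO trades_net VALUES (?, ?)",
                   [(1, 90.0), (2, 225.0), (3, 75.0)])  # id 3 loaded wrongly

# Expected = input data + transformation rule; actual = loaded output.
expected = {i: gross * 0.9
            for i, gross in source.execute("SELECT id, gross FROM trades")}
actual = dict(target.execute("SELECT id, net FROM trades_net"))

failures = [i for i in expected
            if i not in actual or abs(expected[i] - actual[i]) > 1e-9]
print("failed ids:", failures)  # -> failed ids: [3]
```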

In a data reconciliation use case, if you received a million transactions, the number of loaded transactions must be a million. Not more, not less. Similarly, a data validation use case might be that a discount cannot exceed 100%.
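
In code, those two rule types might look like the following minimal sketch (the function names and values are hypothetical; iceDQ expresses such rules through its own rule engine):

```python
# Hypothetical sketches of the two rule types in the example above.

def reconcile_counts(received: int, loaded: int) -> bool:
    """Reconciliation: every received transaction must be loaded.
    Not more, not less."""
    return received == loaded

def validate_discounts(discounts: list[float]) -> list[float]:
    """Validation: a business rule says a discount cannot exceed 100%.
    Returns the offending values."""
    return [d for d in discounts if not 0 <= d <= 100]

assert reconcile_counts(1_000_000, 1_000_000)   # passes
print(validate_discounts([5.0, 99.9, 120.0]))   # -> [120.0]
```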

These testing rules can be embedded in a DevOps pipeline of large financial firms. You might have a developer who develops the ETL code, but we will not allow him or her to deploy the process until the tests are run in Jenkins and pass. And then some of these rules are embedded in the system that you are using to monitor and observe your processes for data reliability.
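
A hedged sketch of how such a deployment gate might look: a script whose non-zero exit code fails the Jenkins stage and blocks the release (the rule names and hard-coded counts are invented; iceDQ’s actual CI integration will differ):

```python
import sys

# Hypothetical deployment gate. Each rule evaluates to True on pass; a CI
# stage (Jenkins, for example) treats a non-zero exit code as failure, so
# the ETL change cannot be promoted until every rule passes. The counts
# are hard-coded stand-ins for real source/target queries.
SOURCE_COUNT, TARGET_COUNT = 1_000_000, 1_000_000
MAX_DISCOUNT = 99.9

rules = {
    "row_count_reconciliation": SOURCE_COUNT == TARGET_COUNT,
    "discount_within_bounds": MAX_DISCOUNT <= 100.0,
}

failed = [name for name, passed in rules.items() if not passed]
for name in failed:
    print(f"RULE FAILED: {name}")
sys.exit(1 if failed else 0)
```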

Q: And your customers’ compliance requirements are critical to all this?

Absolutely. Financial firms have many regulatory and compliance requirements, especially around data. There are GDPR requirements for privacy; then there are BCBS 239, FINRA and other rules for data accuracy… the list goes on and on. Fundamentally, financial institutions must provide proof of data testing for every data project that was developed, and proof of data consistency every time data transfer or processing occurs.

And all of these testing requirements become more complicated at firms that are merging data because of M&A or internal reorganization.

The key thing is that iceDQ is technology agnostic and business agnostic. And that is critical for data management in financial services. For example, you may be taking end-of-day stock prices from multiple sources. Without testing, how do you ensure that there are no price differences?
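
As a hypothetical sketch of that end-of-day price check (tickers, sources and tolerance are all invented for illustration):

```python
# Hypothetical check: end-of-day prices for the same ticker, pulled from
# multiple sources, must agree within a small tolerance before the data
# is trusted downstream.
eod_prices = {
    "ACME":   {"source_a": 101.25, "source_b": 101.25, "source_c": 101.40},
    "GLOBEX": {"source_a": 55.10,  "source_b": 55.10,  "source_c": 55.10},
}
TOLERANCE = 0.01  # maximum acceptable absolute difference

for ticker, quotes in eod_prices.items():
    spread = max(quotes.values()) - min(quotes.values())
    if spread > TOLERANCE:
        print(f"{ticker}: price mismatch across sources: {quotes}")
# -> ACME: price mismatch across sources: {'source_a': 101.25, ...}
```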

Financial firms may have data in different formats, in different proprietary databases or in a Spark cluster, for example. But the key thing is how you validate the data by auditing it with rules, independently of the data processing. The tests may be based on processing requirements, but many times they are also based on business rules that are “universal truths” for your institution.

Q: There have been some glaring examples of banks that have had major problems with data transitions. For example, in 2018, when TSB in the UK was migrating its customer data to the platform of its new owner, the Spanish bank Sabadell, there was a huge failure that took TSB’s online services down for weeks.

Yes, I’m not familiar with the intricacies of that example, but the root cause of these problems is often that there was no certification and validation of the data merger.

Why? Often because they don’t have the expertise: many of the key people come from application testing. They might have the tools to manage many millions of data records, but not the ability to create business rules.

Some of the reasons identified for the TSB migration failure were inaccurate data mapping, data transfer errors, data corruption, system integration issues, data validation errors, testing shortcomings, quality control lapses, a lack of contingency plans, duplication of data, and inadequate data validation and testing.

I’ve been involved in many data mergers, and you need multiple teams because the volume of the data requires it. Most of TSB’s data-related issues would have been solved with automated data testing at scale.

Data has been getting more important, but the testing of data has not kept pace. There is mobile testing, there is application testing and there is API testing, but how many companies have separate data testing? It’s a big issue. Very few C-level executives really think about data testing or the automation of data testing.

Q: So, is iceDQ a unique solution in that respect? Who do you compete with?

We are unique as a vendor solution, I believe. We compete with in-house data testing teams, where they exist, but most of those we come across are scripting manually. They are not automating. And then in terms of vendors, we often get confused with data management tools, and there are many of those. A lot of our time is spent educating customers about data testing.

Q: And of course, the reason you will be looking for VC investment is to finance growth? How will you do that?

Yes, we have 135 staff right now, with around 120 in India and 15 in the US. We want to grow and are working on developing further functionality for monitoring and observability within the existing platform. And we’re also putting a lot of work into how we can use LLMs to predict and write the rules for testing. The vision is that customers will be able to provide their data mapping documents and the platform will generate the rules and tests automatically; and then adjust the rules dynamically. This will be the next generation of the product. We’ll also be looking for more technology partnerships with other vendors and systems integrators.
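
To make the idea concrete, here is a very rough sketch of turning one row of a mapping document into a testing rule (everything here is hypothetical: the mapping format, the prompt and `call_llm` – a placeholder for whatever model API would actually be used – are invented for illustration, not iceDQ’s implementation):

```python
# Rough sketch of the vision above: turn one row of a data mapping
# document into a testing rule via an LLM. The mapping format, prompt
# and call_llm are all invented for illustration.
mapping_row = {
    "source": "stg.trades.gross_amount",
    "target": "dw.fact_trades.net_amount",
    "transformation": "net_amount = gross_amount * (1 - fee_rate)",
}

prompt = (
    "Write a data reconciliation rule verifying that the target column "
    f"{mapping_row['target']} equals the transformation "
    f"'{mapping_row['transformation']}' applied to {mapping_row['source']}, "
    "compared row by row on the trade key."
)

def call_llm(text: str) -> str:
    """Placeholder for a real model call; returns a canned rule here."""
    return ("FOR EACH trade_id: EXPECT dw.fact_trades.net_amount == "
            "stg.trades.gross_amount * (1 - fee_rate)")

print(call_llm(prompt))  # a generated rule a human would review, then run
```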