From Black Box to Transparency by Design for AI/ML


Daniel Levy, Corporate Counsel, Data Use at Autodesk

Volha Samasiuk, Director, Sr. Privacy & Data Use Counsel at Autodesk

 

Transparency around data and digital tools is fundamental for customer trust. This general principle is not controversial. But what does transparency mean exactly in the context of AI/ML? What laws and standards apply? How much transparency is appropriate for users to understand the impact of AI/ML systems in their work or personal lives? How can transparency be implemented effectively?

 

For example, when you buy a product, the algorithm may start serving you ads for similar or complementary products because other people who bought your product also bought those other products. A standard (but probably hard to read) privacy notice might be a commonly accepted level of transparency for this. But what should transparency look like for an AI/ML system that might recognize a face or an object within an image? There is a level of complexity (e.g., related to data acquisition, training, and the AI/ML methods used) that requires technical explanation and that most people are not yet comfortable with.

 

In this blog post, we look at transparency requirements from a privacy perspective and compare them to what transparency might mean for AI/ML as that technology takes the spotlight and new regulations try to catch up (yes, they are coming).

 


Image provided by Autodesk

 

In the data privacy realm…

 

Modern transparency requirements originate from the General Data Protection Regulation (GDPR), which came into force in 2018 and was enacted to protect personal data of individuals in the EU. Of course, data privacy regulation is no longer limited to GDPR. US States, for example, now have additional disclosure or notice requirements, and global companies need to comply with them all.

 

These requirements manifest primarily in privacy notices and follow a standard formula: the types of personal information a company collects, how the company uses that information, how and why the company discloses that information, and individuals’ rights with respect to their personal information. You are probably familiar with privacy notices (aka privacy policies), or at least with clicking through without reading them.

 

The well-known challenge is that a privacy policy must comprehensively note every potential type of data use across a company, because that is legally required, while also remaining understandable, because that is also a legal requirement. For example, GDPR says privacy notices must be “concise, transparent, intelligible, and easily accessible” and use “clear and plain language.”

 

The consequences of not maintaining this balance can be costly. And yet, privacy policies have always been, and still are, hard to read; one analysis of policies from 1996 to 2021 found “evidence for increasing policy bloat and decreasing readability, in particular after the introduction of recent privacy regulations.”

 

TL;DR – It is difficult, even (especially?) for lawyers, to write privacy policies that are both comprehensive and understandable.

 

Transparency gets more complicated with AI/ML

 

More transparency is, and likely will be, required because of the complexity of AI/ML systems, the scale of the data collected to train them, additional risks such as bias and safety, and the potential impact on individuals, communities, and society as a whole. Notably, transparency for AI/ML systems is not limited to disclosure about the data used by these systems (the input) but requires detailed information about how the AI/ML systems work and how they generate decisions, insights, content, and other output.

 

When personal data is used in algorithms, GDPR already states that a company must provide “meaningful information about the logic” of any automated decision-making using personal data. “Automated decision-making” does not cover all applications of AI/ML, only cases where decisions are made solely by automated means (i.e., without human oversight) and produce “legal effects” or “similarly significant” effects (please consult your legal team if you are trying to determine whether this applies to your project). US State laws have, or will have, similar rules.

 

New transparency requirements on the horizon?

 

The EU is set to adopt its comprehensive regulation of AI, which will likely come into force in the next year or two. The proposed EU AI Act would classify AI systems by risk, with the strictest compliance obligations attached to “high risk” AI systems, and would mandate various technical and organizational measures for companies developing and deploying AI systems.

 

The draft EU AI Act (as amended by the European Parliament, the final version may change during the Trilogue process) defines high risk AI systems as (i) AI systems used in a regulated product or as a safety component of a regulated product covered by EU laws; and (ii) AI systems that fall under specified critical areas or use cases (e.g., biometric and biometrics-based systems, employment and worker management, education, critical infrastructure, etc.) if they pose a significant risk of harm to people’s health, safety, fundamental rights, or the environment.

 

The draft has detailed transparency requirements for such high risk AI systems, including apparent technical requirements. For example, high risk AI systems “shall be designed and developed” to be sufficiently transparent and to enable providers and users to “reasonably understand the system’s functioning.” The draft also refers to the use of “all technical means available in accordance with the generally acknowledged state of the art” to ensure that the AI system’s output is interpretable by the provider and the user.

 

As for non-technical measures for high risk AI systems, the draft requires concise, correct, clear, and “to the extent possible” complete information that is also reasonably relevant, accessible, and comprehensible to users. The information needs to specify the “characteristics, capabilities, and limitations of performance,” including the intended purpose; the level of accuracy, robustness, and cybersecurity against which the high risk AI system has been tested and validated; any known or foreseeable risk to health and safety, fundamental rights, or the environment; and other details (see Article 13).

 

For systems that are not deemed high risk, the transparency requirements are not as thorough. But AI systems that interact with human users must still inform those users that they are interacting with an AI system “in a timely, clear and intelligible manner, unless this is obvious from the circumstances and the context of use,” which may trigger customer questions or trust issues.

 

The Parliament also introduced additional transparency requirements for providers of foundation models, including an obligation to prepare technical documentation and intelligible instructions to enable downstream providers to comply with their obligations under the EU AI Act. Providers of generative foundation models, like GPT, will be required to disclose that the content was generated by an AI system and publish summaries of the use of training data protected under copyright law.

 

Industry standards and best practices

 

Established and emerging industry principles and standards also reflect transparency requirements for AI systems. For example, the NIST AI Risk Management Framework (RMF), released on January 26, 2023, as a fully voluntary and flexible framework, and the companion RMF Playbook include the principles of transparency, explainability, and interpretability. In the RMF, transparency means providing the appropriate level of information about topics such as design decisions, training data, the structure of the model, and intended use cases. Transparency should take human-AI interaction into account, and developers and deployers of AI systems should consider different types of transparency tools to ensure the system is used as intended.

 

Explainability refers to being able to answer how a decision was made by an AI system, and interpretability refers to being able to give meaning or context to a user about why a decision was made. Together, explainability and interpretability allow operators and users of a system to understand its function and output, thereby enhancing trust in the system.
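
 

As a concrete (and deliberately simplified) illustration, the hypothetical sketch below shows one way a team might surface an explanation alongside a model’s output: a toy “loan approval” classifier returns its decision together with a plain-language note about which features carry the most weight in the model overall. The feature names, data, and wording are invented for this post, not drawn from any particular product, framework, or Autodesk system.

```python
# Hypothetical sketch: return a decision plus a human-readable note on
# which features most influence the model overall (an interpretability aid).
# Feature names and data are invented for illustration only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["income", "debt_ratio", "account_age", "late_payments"]

# Toy training data standing in for a real, governed dataset.
X, y = make_classification(n_samples=500, n_features=len(FEATURES), random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

def predict_with_explanation(sample):
    """Return the decision together with user-facing context about the model."""
    decision = model.predict([sample])[0]
    ranked = sorted(zip(FEATURES, model.feature_importances_),
                    key=lambda pair: pair[1], reverse=True)
    top = ", ".join(f"{name} ({weight:.2f})" for name, weight in ranked[:3])
    return {
        "decision": "approved" if decision == 1 else "declined",
        "explanation": f"Features with the most influence on this model: {top}",
    }

print(predict_with_explanation(X[0]))
```

Real systems would go further (e.g., per-decision explanations rather than model-wide importances), but even a small step like this changes the output from a bare answer into something a user can question.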

 

Companies like Microsoft, Google, and IBM are leading development of industry best practices on AI/ML transparency (e.g., see Microsoft’s Transparency Notes, Google’s Data Cards and Model Cards, and IBM’s AI FactSheets 360).
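
 

These formats differ in their details, but they share the idea of publishing structured facts about a model. As an illustration only (the field names below are our own, not Microsoft’s, Google’s, or IBM’s), a lightweight internal “model card” might capture that information as structured data that can be reviewed internally and published alongside a feature:

```python
# Illustrative only: a lightweight internal "model card" capturing the kinds
# of facts transparency frameworks call for. All names and values are invented.
import json

model_card = {
    "model_name": "object-tagger-v2",  # hypothetical model
    "intended_use": "Suggest tags for objects detected in design files",
    "out_of_scope_uses": ["Identifying individuals", "Safety-critical decisions"],
    "training_data": {
        "sources": ["Licensed stock imagery", "Opt-in customer uploads"],
        "personal_data": False,
    },
    "evaluation": {"accuracy": 0.91, "test_set": "Held-out internal benchmark"},
    "known_limitations": ["Lower accuracy on low-light images"],
    "human_oversight": "Suggestions require user confirmation before applying",
    "contact": "responsible-ai@example.com",  # placeholder address
}

# Published alongside the feature so users and auditors can review it.
print(json.dumps(model_card, indent=2))
```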

 

The next level of the transparency problem

 

As regulators try to catch up with recent developments, technologies continue to get more advanced and sophisticated. Before the emergence of OpenAI’s GPT-4 and other generative AI systems, AI/ML already had a black box problem from a transparency point of view. Now, as generative AI systems and foundation models that use very large amounts of data have become prominent, we are starting to see a new dimension to the black box problem: emergent behavior. Large models are not only using more data and getting better at specified tasks than smaller models, but they are also displaying unexpected “emergent” behaviors that seemingly have little to do with their original purpose, for example, large language models writing and executing code or playing games.

 

Engineers and researchers themselves are surprised and are still trying to understand how and why these additional emergent capabilities occur. Some researchers are calling for more transparency and explainability. Others note that making very complex AI/ML systems explainable and interpretable may make them less effective, and that the more accurate an explanation is, the harder it can be to understand (similar to the privacy policy problem, but leveled way up). But a black box is also not going to be acceptable in most cases.

 

Where should we go from here?

 

As complex AI/ML solutions are developed, transparency cannot be an afterthought left solely to formulaic written materials presented in privacy policies or other transparency notices. To be sure, some written explanation is and will be necessary. But the pace of development of new capabilities, both intended and unintended, demands a holistic approach to transparency. This is going to be critical not just from a minimum compliance perspective, but more importantly from a customer trust point of view.

 

Key takeaways:

 

  • Start thinking about transparency for your AI/ML project early in the design phase as you would about other technical or program requirements. What would users like to know about the system and how it generates decisions, insights, or recommendations? What tools could help them understand its function and output?
  • Consult with your legal team to identify existing and upcoming global regulations around transparency applicable to your particular use case. Consider using industry standards and practices where law doesn’t provide clear requirements.
  • Continue discussing what transparency should look like for your project with a broad group of stakeholders, including UX designers and ethicists. Some companies have already established ethics or Responsible AI review boards to address these novel issues.
  • Consider your audience, the impact, and the complexity of your AI/ML system when designing and developing technical transparency measures (e.g., built-in transparency tools; a minimal sketch follows this list) and/or non-technical measures (e.g., written or video explanations of how the AI system works and makes decisions, guidance on human oversight).
  • Ask for user feedback during the AI/ML system beta testing phase and adjust accordingly.
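
 

To make the “built-in transparency tools” point more concrete, here is a minimal, hypothetical sketch of one such measure: wrapping every AI-generated response with a disclosure and a pointer to fuller documentation. The generate() function, URL, and wording are placeholders we invented for illustration, not a description of any real product or API.

```python
# Hypothetical sketch of a built-in transparency measure: every AI-generated
# response carries a disclosure and a link to fuller documentation.
from datetime import datetime, timezone

DISCLOSURE = "This response was generated by an AI system and may contain errors."
MODEL_INFO_URL = "https://example.com/ai-transparency-note"  # placeholder URL

def generate(prompt: str) -> str:
    """Stand-in for a call to a generative model."""
    return f"(model output for: {prompt!r})"

def respond(prompt: str) -> dict:
    """Return model output together with user-facing transparency metadata."""
    return {
        "output": generate(prompt),
        "disclosure": DISCLOSURE,
        "more_info": MODEL_INFO_URL,
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }

print(respond("Summarize the project schedule"))
```

Building the disclosure into the response itself, rather than relying on a notice elsewhere, helps satisfy the “timely, clear and intelligible” expectation discussed above and gives users an obvious path to more detailed documentation.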

 

The Authors:

Daniel Levy, Corporate Counsel Data Use

Volha Samasiuk, Director, Senior Privacy and Data Use Counsel