Research Record
Detecting First-Party to Third-Party Data Leaks in Android Apps Using PIT
Using PIT to trace how user-provided data can move from trusted Android app interfaces into third-party SDKs and external endpoints.
Modern Android applications are rarely self-contained. Most rely on third-party SDKs for analytics, advertising, authentication, and user experience features. While this accelerates development, it also introduces a serious risk: sensitive user data may flow from the app to external parties without the developer fully understanding how or why.
This work explores that problem using a static analysis tool called Personal Information Tracker (PIT).
What PIT Does
PIT is built on top of FlowDroid and focuses specifically on GUI-based data leaks.
Instead of only tracking generic sources like device identifiers or system APIs, PIT looks at user-provided data from interface elements such as:
- EditText
- CheckBox
- RadioButton
- Spinner
It then performs taint analysis to determine whether this data flows into sinks such as:
- Network requests
- Files
- Databases
- Logs
What makes PIT particularly interesting is that it does not just say “data leaked.” It identifies which specific UI element the data came from, linking user interaction directly to the leak.
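The idea of carrying a source label along with the data can be illustrated with a toy forward taint propagation. This is only a sketch under stated assumptions: it is not PIT's or FlowDroid's actual algorithm, and the statement tuples, variable names, and widget ids (`R.id.email_input`, `R.id.promo_checkbox`) are hypothetical.

```python
# Toy forward taint propagation over straight-line code.
# NOT PIT's or FlowDroid's real algorithm; statement shapes and widget ids are hypothetical.

def find_leaks(statements):
    """Return (widget_id, sink_name) pairs for tainted values that reach a sink."""
    taint = {}   # variable name -> set of widget ids whose data it may carry
    leaks = []
    for stmt in statements:
        kind = stmt[0]
        if kind == "source":        # ("source", var, widget_id)
            _, var, widget_id = stmt
            taint[var] = {widget_id}
        elif kind == "assign":      # ("assign", dst, [src, ...])
            _, dst, srcs = stmt
            labels = set()
            for src in srcs:
                labels |= taint.get(src, set())
            taint[dst] = labels     # an empty set means dst carries no taint
        elif kind == "sink":        # ("sink", sink_name, var)
            _, sink_name, var = stmt
            for widget_id in sorted(taint.get(var, set())):
                leaks.append((widget_id, sink_name))
    return leaks


program = [
    ("source", "email", "R.id.email_input"),     # e.g. text read from an EditText
    ("source", "promo", "R.id.promo_checkbox"),  # e.g. state read from a CheckBox
    ("assign", "payload", ["email", "promo"]),   # payload is built from both inputs
    ("assign", "banner", []),                    # constant value, no taint
    ("sink", "network", "payload"),
    ("sink", "log", "banner"),
]

for widget_id, sink_name in find_leaks(program):
    print(f"{widget_id} -> {sink_name}")
```

Each reported pair names the originating widget rather than only the sink, which is the property described above. A real analysis additionally has to handle method calls, aliasing, callbacks, and the Android lifecycle, none of which this sketch attempts.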
Why This Matters
Most discussions of Android data leaks focus on system-level data such as location or device identifiers. But a large portion of sensitive information comes directly from users:
- Names
- Emails
- Travel details
- Preferences
When this data is entered into an app, users assume it stays within that app’s intended functionality. In reality, it can be routed through layers of third-party code before being transmitted elsewhere.
This creates a first-party to third-party leak:
- First party: the app the user trusts
- Third party: an external SDK or service receiving the data
The key issue is that developers often do not have full visibility into these flows.
Methodology
The analysis process follows a static pipeline:
- Decompile the APK
- Run PIT to identify tainted flows from UI sources
- Inspect the reported paths from source to sink
- Determine whether the sink belongs to first-party code or a third-party SDK
This step identifies cases where user input is not only processed locally but also transmitted externally through embedded libraries.
Because PIT is static, it does not require executing the app, which makes it practical to run across many applications. The trade-off is that static analysis can report flows that never occur at runtime, so each reported path needs manual review to rule out false positives.
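The last step of the pipeline, attributing a flow to first-party or third-party code, can be approximated by comparing the class that contains the sink call against the app's own package name. The sketch below is a hypothetical helper and not part of PIT; the package names are invented, and the approach assumes class names have not been obfuscated or relocated.

```python
# Hypothetical helper, not part of PIT: attribute a sink call site by package prefix.
# Assumes un-obfuscated class names; obfuscated or repackaged SDKs need manual review.

PLATFORM_PREFIXES = ("android", "androidx", "java", "javax", "kotlin")


def _under(class_name, prefix):
    """True if class_name is the prefix itself or lives in a sub-package of it."""
    return class_name == prefix or class_name.startswith(prefix + ".")


def classify_call_site(caller_class, app_package):
    """Label the class containing the sink call as first-party, platform, or third-party."""
    if _under(caller_class, app_package):
        return "first-party"
    if any(_under(caller_class, p) for p in PLATFORM_PREFIXES):
        return "platform"
    return "third-party"


app = "com.example.airline"  # hypothetical application package
for cls in (
    "com.example.airline.booking.StatusActivity",
    "com.vendor.analytics.Tracker",
    "android.util.Log",
):
    print(cls, "->", classify_call_site(cls, app))
```

Matching on a full package segment (prefix plus a dot) avoids mislabeling an unrelated package such as `com.example.airlinehelper` as first-party. Apps that keep their own code under several packages would need a list of first-party prefixes instead of a single one.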
Findings
Even with limited analysis time, multiple applications were found to exhibit clear data flows from user input fields to third-party endpoints.
A particularly striking example involves the American Airlines Android app. When a user books a flight and later checks their flight status, flight-related data is transmitted to a third-party service.
This is not an intentional leak in the traditional sense. It is a consequence of how modern apps are built.
The application integrates third-party SDKs that handle various functionalities. These SDKs often operate as opaque components. Developers include them for convenience, but may not fully understand:
- What data is being accessed
- When it is being transmitted
- Where it is being sent
As a result, sensitive information such as travel details can leave the application boundary without the developers being aware of it.
Key Insight
The core issue is not just poor security practices. It is the loss of control over data flows caused by dependence on third-party frameworks.
Developers are effectively composing applications out of black boxes. Each SDK introduces its own behavior, and when combined, the overall system becomes difficult to reason about.
From a static analysis perspective, tools like PIT expose this hidden structure. They make it possible to trace how user input propagates through layers of abstraction and eventually reaches external endpoints.
Limitations
Static analysis alone cannot guarantee that a leak occurs at runtime. Some flows may be conditional or unused. However, the presence of a valid path from source to sink is strong evidence that the leak is possible.
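A conditional flow of this kind can be pictured with a few lines of code. In the sketch below, a path from the input to the sink exists in the program text, so an analysis that does not evaluate branch conditions would typically report it, yet the sink is never reached in this configuration. The flag name, function names, and sample value are all hypothetical.

```python
# Illustration only: a source-to-sink path that exists in the code but does not execute.
# UPLOAD_ENABLED, submit, and the sample value are hypothetical.

UPLOAD_ENABLED = False  # e.g. a feature flag that is switched off in this build


def submit(form_value, send):
    """Process user input; forward it to `send` only when uploads are enabled."""
    if UPLOAD_ENABLED:
        send(form_value)       # the sink call: reachable on paper, guarded at runtime
    return len(form_value)     # local processing always happens


sent = []                      # stands in for data handed to an external endpoint
submit("SEA-JFK", sent.append)
print(sent)
```

Whether such a guard ever becomes true is exactly the kind of question that runtime observation can answer and a path report alone cannot.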
This reinforces the need for combining static findings with dynamic validation in future work.
Conclusion
First-party to third-party data leaks represent a structural problem in modern Android development. They arise not from a single bug, but from the interaction between application logic and embedded SDKs.
By focusing on GUI-derived data and tracing it through the application, PIT provides a clear view into how user information can escape its intended boundaries.
As mobile ecosystems continue to rely on third-party components, understanding and auditing these data flows will become increasingly important.