This study analyses financial data for charities in England and Wales, drawn from the Charity Commission's Public Register of Charities, specifically submissions from the five most recent annual return cycles (2020 to 2024). The data extract was downloaded on XX April XXXX.
Three data extracts from the public register were used: Charity Annual Return History, Charity Annual Return Part A, and Charity Annual Return Part B. From the Charity Annual Return History extract, 'organisation_number', 'registered_charity_number', and 'charity_name' were selected. From the Charity Annual Return Part A extract, 'organisation_number', 'ar_cycle_reference', 'total_gross_income', 'total_gross_expenditure', 'income_from_government_contracts', and 'income_from_government_grants' were selected. From the Charity Annual Return Part B extract, 'organisation_number', 'ar_cycle_reference', 'income_donations_and_legacies', 'income_other_trading_activities', 'income_charitable_activities', 'income_investments', 'income_other', 'income_endowments', 'reserves', 'funds_endowment', 'funds_unrestricted', 'funds_restricted', and 'funds_total' were selected.
Charity Annual Return Part A and Charity Annual Return Part B were inner joined on 'organisation_number' and 'ar_cycle_reference', and the merged dataset was then inner joined to Charity Annual Return History on 'organisation_number'. The combination of Part A and Part B data is methodologically significant beyond its role in data validation. Part A records income as classified by statutory funding relationships, with government contracts and grants treated as distinct instruments, while Part B records income as classified by the organisation according to its operational purposes under the Statement of Recommended Practice (SORP). Where these classifications differ for the same organisation, the gap captures an institutional hybridity that single-source studies cannot observe: specifically, the degree to which government-commissioned income has been absorbed into an organisation's core operational identity as charitable activities income rather than recorded as a distinct statutory funding relationship. The government axis therefore measures statutory dependence as funders classify it, while the market axis captures delivery income as organisations classify it. The structural gap between these two representations, visible only through the combination of Part A and Part B, is the empirical marker of what Billis (2010) calls entrenched hybridity.
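The two-step inner join described at the start of this paragraph can be illustrated with a minimal sketch in plain Python. The source does not state which software was used, so the helper `inner_join`, the toy rows, and the cycle labels such as "AR20" are all hypothetical. The sketch also assumes one row per organisation in the history extract; if that extract holds several rows per organisation, the second join would repeat rows.

```python
# Toy rows standing in for the three extracts; every value here is invented.
part_a = [
    {"organisation_number": 1, "ar_cycle_reference": "AR20", "total_gross_income": 750_000},
    {"organisation_number": 1, "ar_cycle_reference": "AR21", "total_gross_income": 800_000},
    {"organisation_number": 2, "ar_cycle_reference": "AR20", "total_gross_income": 600_000},
]
part_b = [
    {"organisation_number": 1, "ar_cycle_reference": "AR20", "income_charitable_activities": 500_000},
    {"organisation_number": 2, "ar_cycle_reference": "AR20", "income_charitable_activities": 400_000},
]
history = [
    {"organisation_number": 1, "registered_charity_number": 100001, "charity_name": "Example Trust"},
]


def inner_join(left, right, keys):
    """Return merged rows for every left/right pair that agrees on all keys."""
    # Index the right-hand rows by their key values for fast lookup.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in left:
        # Left rows with no match are dropped, which is what makes the join "inner".
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**row, **match})
    return joined


# Step 1: Part A with Part B on organisation and annual return cycle.
returns = inner_join(part_a, part_b, ["organisation_number", "ar_cycle_reference"])
# Step 2: the merged returns with the history extract on organisation only.
merged = inner_join(returns, history, ["organisation_number"])
```

In this toy data, the Part A row for organisation 1 in "AR21" has no Part B counterpart and is dropped in the first step, and organisation 2 has no history row and is dropped in the second.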
Any charity reporting 'total_gross_income' below £500,000 in any annual return cycle was excluded from the dataset, producing an initial sample of 21,479 charities. The £500,000 threshold was chosen for both regulatory and theoretical reasons. In regulatory terms, it aligns with Charity Commission requirements for full SORP compliance. In theoretical terms, it identifies organisations with sufficient strategic capacity to make deliberate funding portfolio decisions rather than relying on opportunistic resource acquisition. This threshold introduces a risk of selection bias; however, it is defensible insofar as organisations below this level typically lack the professional infrastructure, dedicated fundraising staff, and strategic planning processes required for portfolio management behaviour. The threshold also ensures institutional visibility, because charities with income above £500,000 are more likely to interact with similar donors, face comparable regulatory demands, and compete in overlapping resource markets, thereby occupying a shared institutional environment suitable for meaningful archetype comparison.
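The exclusion rule operates at the level of the charity rather than the individual return: a single cycle below the threshold removes every row for that organisation. A minimal sketch of that logic follows, using invented rows and hypothetical cycle labels, and assuming that every row carries a numeric 'total_gross_income'.

```python
THRESHOLD = 500_000  # pounds, as stated in the text

# Invented rows: one per charity per annual return cycle.
rows = [
    {"organisation_number": 1, "ar_cycle_reference": "AR20", "total_gross_income": 750_000},
    {"organisation_number": 1, "ar_cycle_reference": "AR21", "total_gross_income": 520_000},
    {"organisation_number": 2, "ar_cycle_reference": "AR20", "total_gross_income": 900_000},
    {"organisation_number": 2, "ar_cycle_reference": "AR21", "total_gross_income": 450_000},
]

# Charities with at least one cycle below the threshold.
below = {r["organisation_number"] for r in rows if r["total_gross_income"] < THRESHOLD}

# Drop every row belonging to those charities, not only the offending cycle.
sample = [r for r in rows if r["organisation_number"] not in below]
kept = sorted({r["organisation_number"] for r in sample})
```

Here organisation 2 is excluded entirely because of its second cycle, even though its first cycle is well above the threshold.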
To validate the consistency of income data between Part A and Part B, any charity with a variance between 'total_gross_income' in Part A and the sum of 'income_donations_and_legacies', 'income_other_trading_activities', 'income_charitable_activities', 'income_investments', and 'income_other' in Part B was excluded. Similarly, to validate the accuracy of fund composition filings within Part B, any charity with a variance between 'funds_total' and the sum of 'funds_restricted', 'funds_unrestricted', and 'funds_endowment' was excluded. This produced a refined dataset of 20,695 charities (3.6% exclusion rate, n=784).
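The two consistency checks can be sketched as a single predicate applied to each joined row. The sketch assumes that "a variance" means any non-zero difference, since the text states no tolerance; the function name `is_consistent` and the toy figures are hypothetical.

```python
# Part B components summed for the income check, exactly as listed in the text.
INCOME_PARTS = [
    "income_donations_and_legacies",
    "income_other_trading_activities",
    "income_charitable_activities",
    "income_investments",
    "income_other",
]
FUND_PARTS = ["funds_restricted", "funds_unrestricted", "funds_endowment"]


def is_consistent(row):
    """True when both reported totals equal the sum of their components."""
    income_ok = row["total_gross_income"] == sum(row[c] for c in INCOME_PARTS)
    funds_ok = row["funds_total"] == sum(row[c] for c in FUND_PARTS)
    return income_ok and funds_ok


# Invented rows: organisation 2 has a 10,000 gap between total and components.
rows = [
    {"organisation_number": 1, "total_gross_income": 600_000,
     "income_donations_and_legacies": 200_000, "income_other_trading_activities": 50_000,
     "income_charitable_activities": 300_000, "income_investments": 40_000,
     "income_other": 10_000, "funds_total": 900_000, "funds_restricted": 300_000,
     "funds_unrestricted": 500_000, "funds_endowment": 100_000},
    {"organisation_number": 2, "total_gross_income": 700_000,
     "income_donations_and_legacies": 250_000, "income_other_trading_activities": 100_000,
     "income_charitable_activities": 300_000, "income_investments": 30_000,
     "income_other": 10_000, "funds_total": 400_000, "funds_restricted": 150_000,
     "funds_unrestricted": 250_000, "funds_endowment": 0},
]

# A charity is excluded if any of its rows fails either check.
failed = {r["organisation_number"] for r in rows if not is_consistent(r)}
validated = [r for r in rows if r["organisation_number"] not in failed]

# The exclusion count reported in the text follows from the two sample sizes.
excluded_count = 21_479 - 20_695
```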
The refined dataset was then filtered to submissions for the annual return cycles between 2020 and 2024, retaining only charities with a submission in each of the five cycles. This reduced the final dataset to 5,503 charities.
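This last step amounts to building a balanced panel. One way to sketch it is to collect the cycles observed for each charity and keep only those charities whose set matches the required five. The cycle labels "AR20" to "AR24" are hypothetical stand-ins for whatever values 'ar_cycle_reference' takes in the extract, and the rows are invented.

```python
from collections import defaultdict

# Hypothetical labels for the 2020 to 2024 annual return cycles.
REQUIRED_CYCLES = {"AR20", "AR21", "AR22", "AR23", "AR24"}

# Invented submissions: charity 1 has every cycle plus an earlier one,
# charity 2 is missing the "AR22" cycle.
submissions = {
    1: ["AR19", "AR20", "AR21", "AR22", "AR23", "AR24"],
    2: ["AR20", "AR21", "AR23", "AR24"],
}
rows = [
    {"organisation_number": org, "ar_cycle_reference": cycle}
    for org, cycles in submissions.items()
    for cycle in cycles
]

# Record which of the required cycles each charity has submitted.
cycles_by_org = defaultdict(set)
for r in rows:
    if r["ar_cycle_reference"] in REQUIRED_CYCLES:
        cycles_by_org[r["organisation_number"]].add(r["ar_cycle_reference"])

# Keep charities with all five cycles, and only their in-window rows.
complete = {org for org, seen in cycles_by_org.items() if seen == REQUIRED_CYCLES}
panel = [
    r for r in rows
    if r["organisation_number"] in complete and r["ar_cycle_reference"] in REQUIRED_CYCLES
]
```

The resulting panel holds exactly five rows per retained charity, which is the structure a five-cycle comparison requires.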