Mapping Social Enterprises at Scale

February 2026

AI-powered extraction and classification of 62,088 CIC incorporation documents reveals what social enterprises do and who they serve

1. Background

Community Interest Companies (CICs) are a legal form designed for social enterprises that want to use their profits and assets for the public good. When a CIC is registered with Companies House, its founders must submit a CIC36 (or CIC37) incorporation form describing the company's intended activities, the communities it will benefit, how those communities will benefit, and how any surplus will be used. These four free-text fields represent a uniquely rich, and largely untapped, source of data on what social enterprises set out to do and who they set out to serve, written in their founders' own words at the point of formation.

62,088 CIC incorporation documents extracted
2005–2025 Coverage period
4 Free-text fields per document

Using AI-powered text extraction, we have processed CIC incorporation PDFs filed at Companies House since the CIC form was introduced in 2005. The chart below shows the proportion of CICs registered each year for which we have successfully extracted incorporation documents, benchmarked against the total CIC population recorded in the CSO Spine database. Coverage exceeds 80% for most years, with lower rates in the earliest years, when fewer PDFs are available digitally. Coverage dips to 51% in 2022 because a batch of PDF downloads from Companies House was incomplete; the extraction pipeline itself processed over 99% of available documents. Additional PDFs are being retrieved to close this gap.

Extraction coverage by year
Proportion of CICs registered each year with extracted incorporation documents, coloured by government in office. Denominator: CSO Spine.
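
The coverage rate described above is a per-year ratio: extracted documents divided by the CICs registered that year in the denominator dataset. A minimal sketch follows, assuming the spine is available as (company number, incorporation year) pairs; the function name and the toy data are hypothetical and are not taken from the actual pipeline.

```python
from collections import Counter

def coverage_by_year(spine, extracted_ids):
    """Return {year: share of that year's CICs with an extracted document}.

    spine: iterable of (company_number, incorporation_year) pairs (the denominator).
    extracted_ids: set of company numbers with a successfully extracted document.
    """
    registered = Counter()
    covered = Counter()
    for company_number, year in spine:
        registered[year] += 1
        if company_number in extracted_ids:
            covered[year] += 1
    return {year: covered[year] / registered[year] for year in sorted(registered)}

# Toy data, invented for illustration only.
spine = [("A1", 2021), ("A2", 2021), ("B1", 2022), ("B2", 2022), ("B3", 2022), ("B4", 2022)]
extracted = {"A1", "A2", "B1", "B2"}

rates = coverage_by_year(spine, extracted)
for year, rate in rates.items():
    print(year, f"{rate:.0%}")
```

Keeping the denominator separate from the extracted set makes it easy to distinguish a download gap (documents never retrieved) from an extraction failure (documents retrieved but not parsed).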

2. Classification Methodology

The analysis pipeline has four stages. First, we download the full corpus of CIC36/37 incorporation PDFs from Companies House via their bulk data API. Second, each PDF is processed through OpenAI's GPT-4o-mini model using structured extraction prompts that reliably separate the four free-text fields (activities, beneficiaries, community benefit, and surplus use) from the form layout and handle variations in formatting, handwriting, and scan quality. Third, we classify each extracted text against an extended version of the UK Charity Activity Taxonomy (UKCAT), a hierarchical classification system originally developed for the charity sector. We have augmented UKCAT with 14 new beneficiary categories designed specifically for CIC language (covering groups such as local communities, people with mental health needs, those in poverty, carers, and civil society organisations). Classification uses pre-compiled regular expression patterns, matching against both the beneficiary field alone ("specific" match) and a combined text of all four fields ("broad" match) to maximise coverage. Fourth, we perform statistical analysis of the classified data, examining activity sectors, beneficiary populations, and geographic and temporal patterns across two decades of CIC formation.

Pipeline overview (diagram):
  • Step 1: Collect CIC incorporation PDFs. Source: CIC36/37 forms from Companies House (2005–2025), via Companies House API bulk PDF download.
  • Step 2: AI-powered text extraction. Model: GPT-4o-mini structured extraction of 4 text fields (activities, beneficiaries, etc.).
  • Step 3: Classify using extended UKCAT. Method: 268 regex patterns from the UK Charity Activity Taxonomy plus 14 CIC-specific beneficiary categories (new). Classification coverage: activities 99.5%, beneficiaries 97.1%.
  • Step 4: Statistical analysis. Activity sectors, beneficiary populations, and geographic and temporal patterns across 20 years of CIC formation.
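
The difference between the "specific" and "broad" matches in the classification stage can be illustrated with a short sketch. The three patterns, the field keys, and the sample record below are invented for illustration; the actual taxonomy is reported to use 268 patterns, and its pattern definitions are not reproduced here.

```python
import re

# Hypothetical field keys for the four extracted free-text fields.
FIELDS = ("activities", "beneficiaries", "community_benefit", "surplus_use")

# Hypothetical patterns, compiled once up front as the text describes.
PATTERNS = {
    "Carers": re.compile(r"\bcarers?\b", re.IGNORECASE),
    "Young people": re.compile(r"\b(?:young people|youths?)\b", re.IGNORECASE),
    "People experiencing homelessness": re.compile(r"\bhomeless(?:ness)?\b", re.IGNORECASE),
}

def match_tags(text):
    """Return the set of tags whose pattern occurs anywhere in the text."""
    return {tag for tag, pattern in PATTERNS.items() if pattern.search(text)}

def classify(record):
    """Return (specific, broad) tag sets for one extracted record.

    specific: matches in the beneficiaries field only.
    broad: matches in the concatenation of all four fields.
    """
    specific = match_tags(record.get("beneficiaries", ""))
    combined = " ".join(record.get(field, "") for field in FIELDS)
    broad = match_tags(combined)
    return specific, broad

# Invented sample record.
record = {
    "activities": "Drop-in support and hot meals for homeless adults.",
    "beneficiaries": "Young people and their carers in the local area.",
    "community_benefit": "Reduced isolation and better access to services.",
    "surplus_use": "Reinvested in community programmes.",
}

specific, broad = classify(record)
print(sorted(specific))
print(sorted(broad))
```

Because the beneficiaries field is part of the combined text, every specific match is also a broad match, which is why broad coverage can only equal or exceed specific coverage.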

Classification Coverage

  • Activity sectors: 99.5%
  • Beneficiary groups (specific): 88.4%
  • Beneficiary groups (broad): 97.2%

3. Activity Sectors

CICs span a wide range of activity sectors, with health, education, and training consistently among the most common. The charts below show the five activity sectors whose share of new CIC registrations has risen the most (left panel) and declined the most (right panel) over the past two decades, while the bar chart summarises the overall distribution.

Activity sector trends over time
Activity sectors with the largest increase (left) and decrease (right) in share of CIC registrations between 2005 and 2025. Dashed lines mark changes of government.
Top 10 activity sectors
Top 10 activity sectors across all CIC registrations.
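
One way to rank sectors by rising or declining share is sketched below. It assumes the change is measured as the difference in share between a start year and an end year; the source does not state whether single years or averaged windows were used, so this is an illustrative assumption, and the function names and toy records are hypothetical.

```python
from collections import Counter, defaultdict

def share_by_year(records):
    """records: iterable of (year, tags). Returns {tag: {year: share of that year's CICs}}."""
    totals = Counter()
    tagged = defaultdict(Counter)
    for year, tags in records:
        totals[year] += 1
        for tag in set(tags):  # count each tag at most once per CIC
            tagged[tag][year] += 1
    return {
        tag: {year: counts[year] / totals[year] for year in totals}
        for tag, counts in tagged.items()
    }

def share_change(records, start, end):
    """Change in share (as a fraction) between two years that both appear in the data."""
    shares = share_by_year(records)
    return {tag: by_year[end] - by_year[start] for tag, by_year in shares.items()}

def top_movers(changes, n=5):
    """Return (largest increases, largest decreases), each a list of (tag, change)."""
    ranked = sorted(changes.items(), key=lambda item: item[1])
    return ranked[-n:][::-1], ranked[:n]

# Toy records, invented for illustration only.
records = [
    (2005, ["Health", "Arts"]),
    (2005, ["Arts"]),
    (2025, ["Health"]),
    (2025, ["Health", "Arts"]),
    (2025, ["Health"]),
    (2025, ["Health"]),
]

changes = share_change(records, 2005, 2025)
risers, decliners = top_movers(changes, n=1)
print(risers, decliners)
```

Shares rather than raw counts are compared because the number of CICs registered differs from year to year.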

Most Distinctive Activity Sector by Region

The tile map below highlights what makes each region different. For each region, we compare the share of CICs in a given activity sector to the national average. The sector shown is the one that is most over-represented locally, i.e. the sector where CICs in that region are most disproportionately concentrated compared to the UK as a whole. The two most common national categories are excluded to surface genuinely distinctive regional patterns.

Tile map: most distinctive activity sector by region
  • SC (Scotland): Horses
  • NI (Northern Ireland): Religion
  • NE (North East): Umbrella bodies
  • NW (North West): Playground
  • YH (Yorkshire and the Humber): ESOL
  • WA (Wales): Languages
  • WM (West Midlands): Victim support
  • EM (East Midlands): Adult day care
  • EE (East of England): Child abuse
  • SW (South West): Wildlife
  • SE (South East): International development
  • LO (London): Social Investment
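
The regional comparison can be sketched as a ratio of regional share to national share, with the most common national categories removed before picking the maximum. The source says only that regional shares are compared to the national average, so the use of a ratio (rather than a difference) is an assumption here, and the function name and toy data are hypothetical.

```python
from collections import Counter, defaultdict

def most_distinctive(records, exclude_top=2):
    """records: iterable of (region, tags). Returns {region: (tag, ratio)}.

    ratio = (share of the region's CICs with the tag) / (share of all CICs with the tag).
    The exclude_top most common national tags are left out of the comparison.
    """
    national = Counter()
    regional = defaultdict(Counter)
    region_totals = Counter()
    total = 0
    for region, tags in records:
        total += 1
        region_totals[region] += 1
        for tag in set(tags):  # count each tag at most once per CIC
            national[tag] += 1
            regional[region][tag] += 1

    excluded = {tag for tag, _ in national.most_common(exclude_top)}
    result = {}
    for region, counts in regional.items():
        ratios = {
            tag: (count / region_totals[region]) / (national[tag] / total)
            for tag, count in counts.items()
            if tag not in excluded
        }
        if ratios:
            result[region] = max(ratios.items(), key=lambda item: item[1])
    return result

# Toy records, invented for illustration only.
records = [
    ("SC", ["Health", "Education", "Horses"]),
    ("SC", ["Health", "Education"]),
    ("LO", ["Health", "Education", "Arts"]),
    ("LO", ["Health", "Education", "Arts"]),
    ("LO", ["Health", "Education"]),
]

result = most_distinctive(records)
print(result)
```

With real data, ratios built on very small counts can be unstable, so a minimum-count filter per region and tag may be worth considering before reading too much into a rare sector.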

4. Beneficiary Populations

Who do CICs set out to serve? The beneficiary analysis reveals that the general public and local communities are the most commonly named beneficiaries, followed by people with mental health needs and those experiencing poverty or disadvantage. The charts below show the five beneficiary groups whose share has risen the most (left panel) and declined the most (right panel) over the past two decades.

Beneficiary group trends over time
Beneficiary groups with the largest increase (left) and decrease (right) in share of CIC registrations between 2005 and 2025. Dashed lines mark changes of government.
Top 10 beneficiary groups
Top 10 beneficiary groups across all CIC registrations (specific match on beneficiaries field).

Most Distinctive Beneficiary Group by Region

Using the same approach, this map shows the beneficiary group that is most over-represented in each region relative to the national average. This highlights which populations are disproportionately served by CICs in different parts of the UK, after excluding the two most common national categories.

Tile map: most distinctive beneficiary group by region
  • SC (Scotland): Artists / creative practitioners
  • NI (Northern Ireland): People with learning disabilities
  • NE (North East): People with learning disabilities
  • NW (North West): People with substance misuse issues
  • YH (Yorkshire and the Humber): People with long-term health conditions
  • WA (Wales): Young children
  • WM (West Midlands): People experiencing homelessness
  • EM (East Midlands): People experiencing homelessness
  • EE (East of England): Young children
  • SW (South West): Young children
  • SE (South East): LGBTQ+
  • LO (London): Widows; widowers and orphans

5. Next Steps

This analysis demonstrates what is possible when CIC incorporation data is extracted and classified at scale. Several avenues for further research and collaboration are available:

  • Topic modelling: Using embedding-based topic models (BERTopic) to discover thematic clusters in CIC activities beyond the UKCAT taxonomy, revealing emerging sectors and niche specialisations that predefined categories miss.
  • Financial linkage: Joining CIC incorporation data to annual accounts filed at Companies House, enabling analysis of income, expenditure, and financial sustainability by activity sector and beneficiary group.
  • Survival analysis: Tracking which CICs remain active over time and identifying factors associated with longevity or dissolution, particularly by sector, region, and era of formation.
  • Comparison with charities: Comparing CIC activity and beneficiary profiles with those of registered charities to understand how the two legal forms serve different (or overlapping) populations and sectors.
  • Geographic deep dives: Detailed analysis of CIC formation patterns at local authority level, examining the relationship between CIC activity and local deprivation, funding landscape, and existing civil society infrastructure.
  • Longitudinal text analysis: Tracking how the language CIC founders use to describe their missions has evolved over 20 years, detecting emerging terms, shifting priorities, and responses to policy changes.

Appendices

A1. Sample Narratives

Ten randomly selected examples of the activities and beneficiaries fields extracted from CIC36/37 incorporation forms. These illustrate the range of language, length, and specificity found in the dataset.

Company Number | Year | Activities | Beneficiaries
10085174 | 2016 | General; running a Gymnastics Academy with a strong community focus. Provide gymnastics to the local community to improve overall fitness, health and well being. | We aim to use the numerous disciplines within gymnastics to increase physical activity to the community. In particular, the company's activities will be carried out for the benefit of the residents of Amber Valley and the surrounding area.
12348175 | 2019 | To provide for and promote an essential fellowship under God for those engaged in ministry in the unique environment of civil aviation. To provide a continuing exchange of experience and insights to enhance the fulfilment of our task. To develop our understanding of how civil aviation functions, its... | The community is those who are engaged in the activity of being a chaplain in an airport.
13173077 | 2021 | To provide technological equipment and support to anyone of any age or background without access to it To deliver devices at no cost to those who need them in West London | The company's activities will provide benefit to the people who are affected by inequalities in access to technology and the internet, otherwise known as the digital divide, in West London
SC495995 | 2015 | Enable and facilitate networking between the community organisations that form our membership Provide services to support our members to work more effectively to achieve their aims | Community organisations across Scotland that are members of SCCAN
16037577 | 2024 | Our organisation's activities include but not limited to the following: Support Services: The business will be providing support to vulnerable adults in the community. We will engage trained and competent personnel who will be visiting selected people and assist them with tasks they cannot do for th... | To vulnerable adults and those with special needs and other mental health issues. The individuals would receive support with their shopping, hospital appointments and other activities of daily living.
15752121 | 2024 | 3 Bringing individuals closer to Christ through community outreach. 3 Teaching the Word of God in its purest, unaltered form. 3 Supporting less affluent communities with essentials like food, education, and healthcare. 3 Providing vocational training and life skills workshops for sustainable liv... | Christ Restoration Ministries' activities will provide benefits to the community by fostering spiritual growth and communal well-being through a series of focused initiatives. The ministry's dedication to bringing individuals closer to Christ, coupled with teaching the Word of God in its unaltered f...
12414758 | 2020 | Cinema exhibition Cinema-related events Cinema-related education and engagement events | The company's activities will provide benefit to persons living in and around Penarth, Vale of Glamorgan, through the provision of a broad programme of cinema screenings and events at Penarth Pier Pavilion. The cinema operation that was running at the venue closed in 2017 to the disappointment of th...
08506040 | 2013 | The company is being set up to generate funds for charities and create employment, training and volunteering opportunities by sourcing, repairing and selling donated hand-held electronic devices. | Charities in the UK and create training, volunteering and employment opportunities for UK residents.
09905594 | 2015 | General: To set up a Forest Garden Project with a strong community focus and to create a teaching space to preserve traditional crafts and knowledge to support the building of a more sustainable community. Teaching traditional crafts Green woodwork, coppicing, timber framing, cob, felt making, baske... | The company's activities will provide benefit to people of all ages who want to creatively engage in the countryside and learn more about forest gardening and our woodland heritage. In particular, the company's activities will be carried on for the benefit of adults, children and young people from ...
14675274 | 2023 | General: running a community with a strong community focus and providing services to local residents. Reducing food waste from suppliers and supermarkets which will be provided for local residents. Food (including eggs, vegetables, fruit, cakes, pies, bread as well as toiletries, pet food and washin... | The company's activities will be carried on for the benefit of residents of Blackburn with Darwen the surrounding area.

A2. Activity Tag Frequencies

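All UKCAT activity tags matched in the dataset, ordered by frequency. The list also includes beneficiary tags (for example, General public / local community) alongside activity tags. Percentages are calculated over all 62,088 CIC incorporations.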

Tag | Count | % of CICs
General public / local community | 31,025 | 50.0%
People with mental health needs | 27,197 | 43.8%
Health | 26,356 | 42.4%
Education | 24,607 | 39.6%
Civil society organisations | 24,091 | 38.8%
Training | 21,848 | 35.2%
People in poverty / disadvantaged | 21,537 | 34.7%
Young people | 19,671 | 31.7%
Unemployment | 17,862 | 28.8%
Children | 17,753 | 28.6%
Employability training | 17,400 | 28.0%
Mental health | 17,292 | 27.9%
Families | 16,737 | 27.0%
Schools | 15,242 | 24.5%
Individual poverty | 13,501 | 21.7%
Arts | 11,380 | 18.3%
Loneliness | 10,710 | 17.2%
Accommodation | 10,193 | 16.4%
People with disabilities | 10,061 | 16.2%
Businesses / small enterprises | 9,941 | 16.0%
Volunteering | 9,321 | 15.0%
Sports | 8,957 | 14.4%
Health and wellbeing | 8,371 | 13.5%
Racial; ethnic or national communities | 8,357 | 13.5%
Social activities | 7,983 | 12.9%
Charity and VCS support | 7,784 | 12.5%
Students / learners | 7,773 | 12.5%
Food | 7,494 | 12.1%
Parents and guardians | 7,463 | 12.0%
Exercise and fitness | 7,107 | 11.4%
Mentoring | 6,854 | 11.0%
Artists / creative practitioners | 6,348 | 10.2%
Music | 6,291 | 10.1%
Women | 6,210 | 10.0%
Literature | 5,221 | 8.4%
Older people | 5,154 | 8.3%
Research | 5,046 | 8.1%
Unemployed / workless | 5,013 | 8.1%
Recreation | 4,824 | 7.8%
Associations | 4,195 | 6.8%
Visual arts | 4,168 | 6.7%
Crime and Justice | 4,036 | 6.5%
Carers | 3,916 | 6.3%
Fundraising | 3,791 | 6.1%
People with learning disabilities | 3,600 | 5.8%
Grant making | 3,587 | 5.8%
Further education | 3,508 | 5.7%
Advice and individual advocacy | 3,492 | 5.6%
Victims of abuse / domestic violence | 3,450 | 5.6%
Homelessness | 3,421 | 5.5%
Housing | 3,357 | 5.4%
People experiencing homelessness | 3,253 | 5.2%
Counselling and therapy | 3,129 | 5.0%
Green space | 3,112 | 5.0%
Heritage | 3,071 | 4.9%
Addiction and dependency | 3,066 | 4.9%
Rural and farming areas | 2,984 | 4.8%
People with substance misuse issues | 2,847 | 4.6%
Conservation and sustainability | 2,764 | 4.5%
Theatre | 2,700 | 4.3%
History | 2,685 | 4.3%
Ex-offenders / criminal justice | 2,663 | 4.3%
Dance | 2,657 | 4.3%
Higher education | 2,641 | 4.3%
Social enterprise | 2,634 | 4.2%
Abuse | 2,631 | 4.2%
Young children | 2,565 | 4.1%
Economic development | 2,525 | 4.1%
Festival | 2,524 | 4.1%
Charity shops | 2,513 | 4.0%
People with long-term health conditions | 2,435 | 3.9%
Policy campaigning and advocacy | 2,335 | 3.8%
Offender support and rehabilitation | 2,220 | 3.6%
Animals | 2,119 | 3.4%
Community development | 2,105 | 3.4%
Film | 2,011 | 3.2%
Social club | 1,940 | 3.1%
Asylum seekers and refugees | 1,935 | 3.1%
Domestic abuse | 1,873 | 3.0%
Social care | 1,762 | 2.8%
Gardening | 1,747 | 2.8%
Religion | 1,708 | 2.8%
Community centre | 1,656 | 2.7%
Out of school club | 1,631 | 2.6%
Science | 1,588 | 2.6%
Christianity | 1,558 | 2.5%
Girls | 1,545 | 2.5%
Equality and diversity | 1,532 | 2.5%
Migrants | 1,493 | 2.4%
Clothes | 1,486 | 2.4%
Television | 1,451 | 2.3%
Outdoor pursuits | 1,370 | 2.2%
Urban areas | 1,361 | 2.2%
Men | 1,341 | 2.2%
Youth Groups | 1,335 | 2.2%
Historical conservation and restoration | 1,334 | 2.1%
Recycling | 1,325 | 2.1%
Hospital | 1,315 | 2.1%
Climate Emergency | 1,308 | 2.1%
Emergency services | 1,296 | 2.1%
Community cafe | 1,255 | 2.0%
Nursery | 1,251 | 2.0%
Community association | 1,240 | 2.0%
Food banks | 1,230 | 2.0%
Secondary education | 1,198 | 1.9%
Wildlife | 1,192 | 1.9%
Museum | 1,175 | 1.9%
Performing art | 1,149 | 1.9%
Childcare | 1,104 | 1.8%
Residential care | 1,099 | 1.8%
LGBTQ+ | 1,072 | 1.7%
Citizenship | 990 | 1.6%
Health condition | 964 | 1.6%
Bereavement | 953 | 1.5%
Dementia | 897 | 1.4%
Umbrella bodies | 881 | 1.4%
Choirs | 874 | 1.4%
Children in care | 843 | 1.4%
Respite | 826 | 1.3%
Radio | 783 | 1.3%
Open spaces | 727 | 1.2%
Vocational training | 717 | 1.2%
Maternity | 709 | 1.1%
Armed forces | 652 | 1.1%
Church or place of worship | 649 | 1.0%
Healthcare provider | 630 | 1.0%
Energy | 629 | 1.0%
Society | 626 | 1.0%
Cancer | 601 | 1.0%
Primary education | 591 | 1.0%
Student support | 584 | 0.9%
Adult education | 583 | 0.9%
Media | 542 | 0.9%
Temporary or emergency housing | 536 | 0.9%
Nursing | 528 | 0.9%
Hobbies | 516 | 0.8%
Grants to organisations | 509 | 0.8%
Horses | 500 | 0.8%
Women's Institute | 472 | 0.8%
Religious; racial or cross-border harmony | 450 | 0.7%
IT and digital | 442 | 0.7%
Basic skills | 435 | 0.7%
Human rights | 426 | 0.7%
Sexual abuse | 422 | 0.7%
Hearing loss | 412 | 0.7%
Islam | 409 | 0.7%
Print media | 404 | 0.7%
Dogs | 403 | 0.6%
ESOL | 381 | 0.6%
Racial justice | 377 | 0.6%
Carer support | 374 | 0.6%
Community transport | 363 | 0.6%
Prevention and safety | 363 | 0.6%
Road safety | 341 | 0.5%
Languages | 339 | 0.5%
Visual impairment | 332 | 0.5%
Healthcare workers | 316 | 0.5%
Philosophy | 314 | 0.5%
Playing fields | 311 | 0.5%
Housing association | 310 | 0.5%
Medical research | 302 | 0.5%
Humanitarian relief | 292 | 0.5%
Surgery | 289 | 0.5%
Village hall | 278 | 0.4%
Conflict resolution | 276 | 0.4%
Religious ministry | 245 | 0.4%
Scouting | 241 | 0.4%
Domiciliary care | 240 | 0.4%
International development | 231 | 0.4%
Adult day care | 215 | 0.3%
Playgroup | 215 | 0.3%
League of Friends | 209 | 0.3%
Widows; widowers and orphans | 208 | 0.3%
Refuge or shelter | 202 | 0.3%
Palliative care | 197 | 0.3%
Hospice | 197 | 0.3%
Strokes | 195 | 0.3%
Victim support | 183 | 0.3%
Parent teacher | 177 | 0.3%
Child abuse | 176 | 0.3%
Playground | 174 | 0.3%
Trafficking and modern slavery | 173 | 0.3%
Search and rescue | 166 | 0.3%
Water | 161 | 0.3%
Religious activities | 151 | 0.2%
Veterans | 144 | 0.2%
Complementary therapies | 142 | 0.2%
Physiotherapy | 140 | 0.2%
Ambulance service | 131 | 0.2%
Army | 128 | 0.2%
Religious education | 126 | 0.2%
Democracy | 118 | 0.2%
Children's homes | 116 | 0.2%
Musical theatre | 112 | 0.2%
Miners | 103 | 0.2%
HIV / Aids | 99 | 0.2%
Archaeology | 97 | 0.2%
School fundraising | 94 | 0.2%
Multiple Sclerosis | 93 | 0.1%
Social Investment | 82 | 0.1%
Orchestra | 73 | 0.1%
Emergency service workers | 73 | 0.1%
Youth centre | 72 | 0.1%
Cats | 70 | 0.1%
Girlguiding | 70 | 0.1%
Hinduism | 69 | 0.1%
Sikhism | 56 | 0.1%
Cadets | 56 | 0.1%
Opera | 52 | 0.1%
Judaism | 50 | 0.1%
Sickle Cell | 49 | 0.1%
Healthcare provider support | 48 | 0.1%
Cerebral palsy | 45 | 0.1%
School support | 45 | 0.1%
Monuments; statues and memorials | 44 | 0.1%
Alternative medicine | 41 | 0.1%
Cemetery | 38 | 0.1%
YWCA / YMCA | 35 | 0.1%
Buddhism | 34 | 0.1%
Chaplaincy | 32 | 0.1%
Fibromyalgia | 31 | 0.0%
Navy | 31 | 0.0%
Natural history | 29 | 0.0%
RAF | 28 | 0.0%
Chronic Fatigue Syndrome | 22 | 0.0%
Residential care with nursing | 21 | 0.0%
Donkeys | 21 | 0.0%
Service clubs | 20 | 0.0%
Church of England | 19 | 0.0%
University of the Third Age | 19 | 0.0%
Saving of lives | 18 | 0.0%
Clergy | 18 | 0.0%
Roman Catholic | 17 | 0.0%
Rotary club | 14 | 0.0%
Grants to individuals | 13 | 0.0%
Student union | 12 | 0.0%
Motor Neurone Disease | 10 | 0.0%
Riding for the disabled | 10 | 0.0%
Planning and architecture | 8 | 0.0%
Friends of healthcare provider | 7 | 0.0%
Parochial Church Council | 4 | 0.0%
Society of Friends (Quakers) | 4 | 0.0%
Lions club | 4 | 0.0%
Benevolent Society | 4 | 0.0%
Jainism | 2 | 0.0%
Almshouse | 2 | 0.0%
Spiritualism | 2 | 0.0%
Fraternal societies | 1 | 0.0%
Jehovah's Witnesses | 1 | 0.0%

A3. Beneficiary Tag Frequencies

All UKCAT beneficiary tags matched in the dataset, ordered by frequency. Percentages are calculated over all 62,088 CIC incorporations.

Tag | Count | % of CICs
General public / local community | 20,479 | 33.0%
People with mental health needs | 14,881 | 24.0%
People in poverty / disadvantaged | 14,125 | 22.7%
Young people | 13,578 | 21.9%
Civil society organisations | 12,277 | 19.8%
Children | 11,919 | 19.2%
Families | 10,143 | 16.3%
People with disabilities | 7,324 | 11.8%
Racial; ethnic or national communities | 6,403 | 10.3%
Women | 4,405 | 7.1%
Businesses / small enterprises | 4,328 | 7.0%
Parents and guardians | 3,780 | 6.1%
Students / learners | 3,568 | 5.7%
Artists / creative practitioners | 3,388 | 5.5%
Older people | 3,057 | 4.9%
Unemployed / workless | 2,887 | 4.6%
People with learning disabilities | 2,690 | 4.3%
Carers | 2,350 | 3.8%
People experiencing homelessness | 2,151 | 3.5%
Ex-offenders / criminal justice | 1,805 | 2.9%
Victims of abuse / domestic violence | 1,738 | 2.8%
People with substance misuse issues | 1,578 | 2.5%
Asylum seekers and refugees | 1,416 | 2.3%
People with long-term health conditions | 1,365 | 2.2%
Young children | 1,266 | 2.0%
Migrants | 984 | 1.6%
Girls | 969 | 1.6%
Men | 948 | 1.5%
LGBTQ+ | 880 | 1.4%
Widows; widowers and orphans | 138 | 0.2%
Riding for the disabled | 5 | 0.0%