Mapping Social Enterprises at Scale
AI-powered extraction and classification of 62,088 CIC incorporation documents reveals what social enterprises do and who they serve
1. Background
Community Interest Companies (CICs) are a legal form designed for social enterprises that want to use their profits and assets for the public good. When a CIC is registered with Companies House, its founders must submit a CIC36 (or CIC37) incorporation form describing the company's intended activities, the communities it will benefit, how those communities will benefit, and how any surplus will be used. These four free-text fields represent a uniquely rich, and largely untapped, source of data on what social enterprises set out to do and who they set out to serve, written in their founders' own words at the point of formation.
Using AI-powered text extraction, we have processed CIC incorporation PDFs filed at Companies House since the CIC form was introduced in 2005. The chart below shows the proportion of CICs registered each year for which we have successfully extracted incorporation documents, benchmarked against the total CIC population recorded in the CSO Spine database. Coverage exceeds 80% for most years, with lower rates in the earliest years, for which fewer PDFs are available digitally. Coverage dips to 51% in 2022 because of an incomplete batch of PDF downloads from Companies House; the extraction pipeline itself processed over 99% of available documents. Additional PDFs are being retrieved to close this gap.
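The coverage figure described above is a per-year ratio: extracted documents divided by the CIC population registered in that year. A minimal sketch of that calculation follows. The function name, the input shape (one registration year per CIC), and the toy numbers are assumptions for illustration only, not the pipeline's actual code or data.

```python
from collections import Counter


def coverage_by_year(extracted_years, population_years):
    """Return {year: share of that year's CICs with an extracted document}.

    Both arguments are iterables holding one registration year per CIC
    (a hypothetical input shape chosen for this sketch).
    """
    extracted = Counter(extracted_years)
    population = Counter(population_years)
    return {
        year: extracted.get(year, 0) / total
        for year, total in sorted(population.items())
    }


# Toy data, invented for illustration (not taken from the dataset).
population_years = [2021] * 10 + [2022] * 100
extracted_years = [2021] * 9 + [2022] * 51

coverage = coverage_by_year(extracted_years, population_years)
for year, share in coverage.items():
    print(f"{year}: {share:.0%}")
```

Using the benchmark population as the denominator means that a year with registrations but no extracted documents still appears, with a coverage of zero, rather than silently dropping out of the chart.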
2. Classification Methodology
The analysis pipeline has four stages. First, we download the full corpus of CIC36/37 incorporation PDFs from Companies House via their bulk data API. Second, each PDF is processed through OpenAI's GPT-4o-mini model using structured extraction prompts. This step separates the four free-text fields (activities, beneficiaries, community benefit, and surplus use) from the form layout, handling variations in formatting, handwriting, and scan quality. Third, we classify each extracted text against an extended version of the UK Charity Activity Taxonomy (UKCAT), a hierarchical classification system originally developed for the charity sector. We have augmented UKCAT with 14 new beneficiary categories designed specifically for CIC language (covering groups such as local communities, people with mental health needs, those in poverty, carers, and civil society organisations). Classification uses pre-compiled regular expression patterns, matching against both the beneficiary field alone ("specific" match) and a combined text of all four fields ("broad" match) to maximise coverage. Fourth, we perform statistical analysis of the classified data, examining activity sectors, beneficiary populations, and geographic and temporal patterns across two decades of CIC formation.
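The "specific" and "broad" matching described in the third stage can be sketched as follows. The three patterns, the field names, and the sample record are hypothetical stand-ins written for this example; the actual UKCAT expressions are more extensive and are not reproduced here.

```python
import re

# Hypothetical patterns for illustration only, compiled once up front.
TAG_PATTERNS = {
    "Carers": re.compile(r"\bcarers?\b", re.IGNORECASE),
    "Older people": re.compile(r"\b(older (people|adults)|elderly)\b", re.IGNORECASE),
    "Young people": re.compile(r"\b(young (people|persons?)|youths?)\b", re.IGNORECASE),
}

# Assumed names for the four extracted free-text fields.
FIELDS = ("activities", "beneficiaries", "community_benefit", "surplus_use")


def classify(record):
    """Return the tags matched in the beneficiary field alone ("specific")
    and in the combined text of all four fields ("broad")."""
    beneficiary_text = record.get("beneficiaries", "")
    combined_text = " ".join(record.get(field, "") for field in FIELDS)
    specific = {t for t, p in TAG_PATTERNS.items() if p.search(beneficiary_text)}
    broad = {t for t, p in TAG_PATTERNS.items() if p.search(combined_text)}
    return {"specific": specific, "broad": broad}


# Invented record, not drawn from the dataset.
record = {
    "activities": "Weekly lunch clubs for elderly residents and respite for carers.",
    "beneficiaries": "Older people living in the town.",
    "community_benefit": "Reduced isolation.",
    "surplus_use": "Reinvested in the lunch clubs.",
}
result = classify(record)
print(sorted(result["specific"]))
print(sorted(result["broad"]))
```

Because the beneficiary field is part of the combined text, every specific match in this sketch is also a broad match. The broad set adds tags mentioned only in the other three fields, which is how it raises coverage.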
Classification Coverage
3. Activity Sectors
CICs span a wide range of activity sectors, with health, education, and training consistently among the most common. The charts below show the five activity sectors whose share of new CIC registrations has risen the most (left panel) and declined the most (right panel) over the past two decades, while the bar chart summarises the overall distribution.
Most Distinctive Activity Sector by Region
The tile map below highlights what makes each region different. For each region, we compare the share of CICs in a given activity sector to the national average. The sector shown is the one that is most over-represented locally, i.e., the sector where CICs in that region are most disproportionately concentrated compared to the UK as a whole. The two most common national categories are excluded to surface genuinely distinctive regional patterns.
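One way to compute the "most distinctive" sector is sketched below. It assumes over-representation is measured as the ratio of the regional share to the national share; the text does not say whether a ratio or a difference is used, so that choice, the function name, and the toy records are all assumptions.

```python
from collections import Counter, defaultdict


def most_distinctive(records, exclude_top=2):
    """records: iterable of (region, set_of_sectors) pairs, one per CIC.

    Returns {region: sector with the highest regional/national share ratio},
    ignoring the `exclude_top` most common sectors nationally.
    """
    records = list(records)
    national = Counter()
    regional = defaultdict(Counter)
    region_sizes = Counter()
    for region, sectors in records:
        region_sizes[region] += 1
        for sector in sectors:
            national[sector] += 1
            regional[region][sector] += 1

    total = len(records)
    excluded = {sector for sector, _ in national.most_common(exclude_top)}

    result = {}
    for region, counts in regional.items():
        ratios = {
            sector: (n / region_sizes[region]) / (national[sector] / total)
            for sector, n in counts.items()
            if sector not in excluded
        }
        if ratios:
            result[region] = max(ratios, key=ratios.get)
    return result


# Invented records for illustration (not taken from the dataset).
records = [
    ("North", {"Health", "Education", "Sports"}),
    ("North", {"Health", "Education", "Sports"}),
    ("North", {"Health", "Arts"}),
    ("South", {"Health", "Education", "Arts"}),
    ("South", {"Health", "Education", "Arts"}),
    ("South", {"Education", "Sports"}),
]
distinctive = most_distinctive(records)
print(distinctive)
```

In the toy data, Health and Education are the two most common sectors nationally and are excluded. Sports is then the most over-represented sector in North and Arts in South, even though neither is the largest sector in its region.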
4. Beneficiary Populations
Who do CICs set out to serve? The beneficiary analysis reveals that the general public and local communities are the most commonly named beneficiaries, followed by people with mental health needs and those experiencing poverty or disadvantage. The charts below show the five beneficiary groups whose share has risen the most (left panel) and declined the most (right panel) over the past two decades.
Most Distinctive Beneficiary Group by Region
Using the same approach, this map shows the beneficiary group that is most over-represented in each region relative to the national average. This highlights which populations are disproportionately served by CICs in different parts of the UK, after excluding the two most common national categories.
5. Next Steps
This analysis demonstrates what is possible when CIC incorporation data is extracted and classified at scale. Several avenues for further research and collaboration are available:
Appendices
A1. Sample Narratives
Ten randomly selected examples of the activities and beneficiaries fields extracted from CIC36/37 incorporation forms. These illustrate the range of language, length, and specificity found in the dataset.
| Company Number | Year | Activities | Beneficiaries |
|---|---|---|---|
| 10085174 | 2016 | General; running a Gymnastics Academy with a strong community focus. Provide gymnastics to the local community to improve overall fitness, health and well being. | We aim to use the numerous disciplines within gymnastics to increase physical activity to the community. In particular, the company's activities will be carried out for the benefit of the residents of Amber Valley and the surrounding area. |
| 12348175 | 2019 | To provide for and promote an essential fellowship under God for those engaged in ministry in the unique environment of civil aviation. To provide a continuing exchange of experience and insights to enhance the fulfilment of our task. To develop our understanding of how civil aviation functions, its... | The community is those who are engaged in the activity of being a chaplain in an airport. |
| 13173077 | 2021 | To provide technological equipment and support to anyone of any age or background without access to it To deliver devices at no cost to those who need them in West London | The company's activities will provide benefit to the people who are affected by inequalities in access to technology and the internet, otherwise known as the digital divide, in West London |
| SC495995 | 2015 | Enable and facilitate networking between the community organisations that form our membership Provide services to support our members to work more effectively to achieve their aims | Community organisations across Scotland that are members of SCCAN |
| 16037577 | 2024 | Our organisation's activities include but not limited to the following: Support Services: The business will be providing support to vulnerable adults in the community. We will engage trained and competent personnel who will be visiting selected people and assist them with tasks they cannot do for th... | To vulnerable adults and those with special needs and other mental health issues. The individuals would receive support with their shopping, hospital appointments and other activities of daily living. |
| 15752121 | 2024 | 3 Bringing individuals closer to Christ through community outreach. 3 Teaching the Word of God in its purest, unaltered form. 3 Supporting less affluent communities with essentials like food, education, and healthcare. 3 Providing vocational training and life skills workshops for sustainable liv... | Christ Restoration Ministries' activities will provide benefits to the community by fostering spiritual growth and communal well-being through a series of focused initiatives. The ministry's dedication to bringing individuals closer to Christ, coupled with teaching the Word of God in its unaltered f... |
| 12414758 | 2020 | Cinema exhibition Cinema-related events Cinema-related education and engagement events | The company's activities will provide benefit to persons living in and around Penarth, Vale of Glamorgan, through the provision of a broad programme of cinema screenings and events at Penarth Pier Pavilion. The cinema operation that was running at the venue closed in 2017 to the disappointment of th... |
| 08506040 | 2013 | The company is being set up to generate funds for charities and create employment, training and volunteering opportunities by sourcing, repairing and selling donated hand-held electronic devices. | Charities in the UK and create training, volunteering and employment opportunities for UK residents. |
| 09905594 | 2015 | General: To set up a Forest Garden Project with a strong community focus and to create a teaching space to preserve traditional crafts and knowledge to support the building of a more sustainable community. Teaching traditional crafts Green woodwork, coppicing, timber framing, cob, felt making, baske... | The company's activities will provide benefit to people of all ages who want to creatively engage in the countryside and learn more about forest gardening and our woodland heritage. In particular, the company's activities will be carried on for the benefit of adults, children and young people from ... |
| 14675274 | 2023 | General: running a community with a strong community focus and providing services to local residents. Reducing food waste from suppliers and supermarkets which will be provided for local residents. Food (including eggs, vegetables, fruit, cakes, pies, bread as well as toiletries, pet food and washin... | The company's activities will be carried on for the benefit of residents of Blackburn with Darwen the surrounding area. |
A2. Activity Tag Frequencies
All UKCAT activity tags matched in the dataset, ordered by frequency. Percentages are calculated over all 62,088 CIC incorporations.
| Tag | Count | % of CICs |
|---|---|---|
| General public / local community | 31,025 | 50.0% |
| People with mental health needs | 27,197 | 43.8% |
| Health | 26,356 | 42.4% |
| Education | 24,607 | 39.6% |
| Civil society organisations | 24,091 | 38.8% |
| Training | 21,848 | 35.2% |
| People in poverty / disadvantaged | 21,537 | 34.7% |
| Young people | 19,671 | 31.7% |
| Unemployment | 17,862 | 28.8% |
| Children | 17,753 | 28.6% |
| Employability training | 17,400 | 28.0% |
| Mental health | 17,292 | 27.9% |
| Families | 16,737 | 27.0% |
| Schools | 15,242 | 24.5% |
| Individual poverty | 13,501 | 21.7% |
| Arts | 11,380 | 18.3% |
| Loneliness | 10,710 | 17.2% |
| Accommodation | 10,193 | 16.4% |
| People with disabilities | 10,061 | 16.2% |
| Businesses / small enterprises | 9,941 | 16.0% |
| Volunteering | 9,321 | 15.0% |
| Sports | 8,957 | 14.4% |
| Health and wellbeing | 8,371 | 13.5% |
| Racial; ethnic or national communities | 8,357 | 13.5% |
| Social activities | 7,983 | 12.9% |
| Charity and VCS support | 7,784 | 12.5% |
| Students / learners | 7,773 | 12.5% |
| Food | 7,494 | 12.1% |
| Parents and guardians | 7,463 | 12.0% |
| Exercise and fitness | 7,107 | 11.4% |
| Mentoring | 6,854 | 11.0% |
| Artists / creative practitioners | 6,348 | 10.2% |
| Music | 6,291 | 10.1% |
| Women | 6,210 | 10.0% |
| Literature | 5,221 | 8.4% |
| Older people | 5,154 | 8.3% |
| Research | 5,046 | 8.1% |
| Unemployed / workless | 5,013 | 8.1% |
| Recreation | 4,824 | 7.8% |
| Associations | 4,195 | 6.8% |
| Visual arts | 4,168 | 6.7% |
| Crime and Justice | 4,036 | 6.5% |
| Carers | 3,916 | 6.3% |
| Fundraising | 3,791 | 6.1% |
| People with learning disabilities | 3,600 | 5.8% |
| Grant making | 3,587 | 5.8% |
| Further education | 3,508 | 5.7% |
| Advice and individual advocacy | 3,492 | 5.6% |
| Victims of abuse / domestic violence | 3,450 | 5.6% |
| Homelessness | 3,421 | 5.5% |
| Housing | 3,357 | 5.4% |
| People experiencing homelessness | 3,253 | 5.2% |
| Counselling and therapy | 3,129 | 5.0% |
| Green space | 3,112 | 5.0% |
| Heritage | 3,071 | 4.9% |
| Addiction and dependency | 3,066 | 4.9% |
| Rural and farming areas | 2,984 | 4.8% |
| People with substance misuse issues | 2,847 | 4.6% |
| Conservation and sustainability | 2,764 | 4.5% |
| Theatre | 2,700 | 4.3% |
| History | 2,685 | 4.3% |
| Ex-offenders / criminal justice | 2,663 | 4.3% |
| Dance | 2,657 | 4.3% |
| Higher education | 2,641 | 4.3% |
| Social enterprise | 2,634 | 4.2% |
| Abuse | 2,631 | 4.2% |
| Young children | 2,565 | 4.1% |
| Economic development | 2,525 | 4.1% |
| Festival | 2,524 | 4.1% |
| Charity shops | 2,513 | 4.0% |
| People with long-term health conditions | 2,435 | 3.9% |
| Policy campaigning and advocacy | 2,335 | 3.8% |
| Offender support and rehabilitation | 2,220 | 3.6% |
| Animals | 2,119 | 3.4% |
| Community development | 2,105 | 3.4% |
| Film | 2,011 | 3.2% |
| Social club | 1,940 | 3.1% |
| Asylum seekers and refugees | 1,935 | 3.1% |
| Domestic abuse | 1,873 | 3.0% |
| Social care | 1,762 | 2.8% |
| Gardening | 1,747 | 2.8% |
| Religion | 1,708 | 2.8% |
| Community centre | 1,656 | 2.7% |
| Out of school club | 1,631 | 2.6% |
| Science | 1,588 | 2.6% |
| Christianity | 1,558 | 2.5% |
| Girls | 1,545 | 2.5% |
| Equality and diversity | 1,532 | 2.5% |
| Migrants | 1,493 | 2.4% |
| Clothes | 1,486 | 2.4% |
| Television | 1,451 | 2.3% |
| Outdoor pursuits | 1,370 | 2.2% |
| Urban areas | 1,361 | 2.2% |
| Men | 1,341 | 2.2% |
| Youth Groups | 1,335 | 2.2% |
| Historical conservation and restoration | 1,334 | 2.1% |
| Recycling | 1,325 | 2.1% |
| Hospital | 1,315 | 2.1% |
| Climate Emergency | 1,308 | 2.1% |
| Emergency services | 1,296 | 2.1% |
| Community cafe | 1,255 | 2.0% |
| Nursery | 1,251 | 2.0% |
| Community association | 1,240 | 2.0% |
| Food banks | 1,230 | 2.0% |
| Secondary education | 1,198 | 1.9% |
| Wildlife | 1,192 | 1.9% |
| Museum | 1,175 | 1.9% |
| Performing art | 1,149 | 1.9% |
| Childcare | 1,104 | 1.8% |
| Residential care | 1,099 | 1.8% |
| LGBTQ+ | 1,072 | 1.7% |
| Citizenship | 990 | 1.6% |
| Health condition | 964 | 1.6% |
| Bereavement | 953 | 1.5% |
| Dementia | 897 | 1.4% |
| Umbrella bodies | 881 | 1.4% |
| Choirs | 874 | 1.4% |
| Children in care | 843 | 1.4% |
| Respite | 826 | 1.3% |
| Radio | 783 | 1.3% |
| Open spaces | 727 | 1.2% |
| Vocational training | 717 | 1.2% |
| Maternity | 709 | 1.1% |
| Armed forces | 652 | 1.1% |
| Church or place of worship | 649 | 1.0% |
| Healthcare provider | 630 | 1.0% |
| Energy | 629 | 1.0% |
| Society | 626 | 1.0% |
| Cancer | 601 | 1.0% |
| Primary education | 591 | 1.0% |
| Student support | 584 | 0.9% |
| Adult education | 583 | 0.9% |
| Media | 542 | 0.9% |
| Temporary or emergency housing | 536 | 0.9% |
| Nursing | 528 | 0.9% |
| Hobbies | 516 | 0.8% |
| Grants to organisations | 509 | 0.8% |
| Horses | 500 | 0.8% |
| Women's Institute | 472 | 0.8% |
| Religious; racial or cross-border harmony | 450 | 0.7% |
| IT and digital | 442 | 0.7% |
| Basic skills | 435 | 0.7% |
| Human rights | 426 | 0.7% |
| Sexual abuse | 422 | 0.7% |
| Hearing loss | 412 | 0.7% |
| Islam | 409 | 0.7% |
| Print media | 404 | 0.7% |
| Dogs | 403 | 0.6% |
| ESOL | 381 | 0.6% |
| Racial justice | 377 | 0.6% |
| Carer support | 374 | 0.6% |
| Community transport | 363 | 0.6% |
| Prevention and safety | 363 | 0.6% |
| Road safety | 341 | 0.5% |
| Languages | 339 | 0.5% |
| Visual impairment | 332 | 0.5% |
| Healthcare workers | 316 | 0.5% |
| Philosophy | 314 | 0.5% |
| Playing fields | 311 | 0.5% |
| Housing association | 310 | 0.5% |
| Medical research | 302 | 0.5% |
| Humanitarian relief | 292 | 0.5% |
| Surgery | 289 | 0.5% |
| Village hall | 278 | 0.4% |
| Conflict resolution | 276 | 0.4% |
| Religious ministry | 245 | 0.4% |
| Scouting | 241 | 0.4% |
| Domiciliary care | 240 | 0.4% |
| International development | 231 | 0.4% |
| Adult day care | 215 | 0.3% |
| Playgroup | 215 | 0.3% |
| League of Friends | 209 | 0.3% |
| Widows; widowers and orphans | 208 | 0.3% |
| Refuge or shelter | 202 | 0.3% |
| Palliative care | 197 | 0.3% |
| Hospice | 197 | 0.3% |
| Strokes | 195 | 0.3% |
| Victim support | 183 | 0.3% |
| Parent teacher | 177 | 0.3% |
| Child abuse | 176 | 0.3% |
| Playground | 174 | 0.3% |
| Trafficking and modern slavery | 173 | 0.3% |
| Search and rescue | 166 | 0.3% |
| Water | 161 | 0.3% |
| Religious activities | 151 | 0.2% |
| Veterans | 144 | 0.2% |
| Complementary therapies | 142 | 0.2% |
| Physiotherapy | 140 | 0.2% |
| Ambulance service | 131 | 0.2% |
| Army | 128 | 0.2% |
| Religious education | 126 | 0.2% |
| Democracy | 118 | 0.2% |
| Children's homes | 116 | 0.2% |
| Musical theatre | 112 | 0.2% |
| Miners | 103 | 0.2% |
| HIV / Aids | 99 | 0.2% |
| Archaeology | 97 | 0.2% |
| School fundraising | 94 | 0.2% |
| Multiple Sclerosis | 93 | 0.1% |
| Social Investment | 82 | 0.1% |
| Orchestra | 73 | 0.1% |
| Emergency service workers | 73 | 0.1% |
| Youth centre | 72 | 0.1% |
| Cats | 70 | 0.1% |
| Girlguiding | 70 | 0.1% |
| Hinduism | 69 | 0.1% |
| Sikhism | 56 | 0.1% |
| Cadets | 56 | 0.1% |
| Opera | 52 | 0.1% |
| Judaism | 50 | 0.1% |
| Sickle Cell | 49 | 0.1% |
| Healthcare provider support | 48 | 0.1% |
| Cerebral palsy | 45 | 0.1% |
| School support | 45 | 0.1% |
| Monuments; statues and memorials | 44 | 0.1% |
| Alternative medicine | 41 | 0.1% |
| Cemetery | 38 | 0.1% |
| YWCA / YMCA | 35 | 0.1% |
| Buddhism | 34 | 0.1% |
| Chaplaincy | 32 | 0.1% |
| Fibromyalgia | 31 | 0.0% |
| Navy | 31 | 0.0% |
| Natural history | 29 | 0.0% |
| RAF | 28 | 0.0% |
| Chronic Fatigue Syndrome | 22 | 0.0% |
| Residential care with nursing | 21 | 0.0% |
| Donkeys | 21 | 0.0% |
| Service clubs | 20 | 0.0% |
| Church of England | 19 | 0.0% |
| University of the Third Age | 19 | 0.0% |
| Saving of lives | 18 | 0.0% |
| Clergy | 18 | 0.0% |
| Roman Catholic | 17 | 0.0% |
| Rotary club | 14 | 0.0% |
| Grants to individuals | 13 | 0.0% |
| Student union | 12 | 0.0% |
| Motor Neurone Disease | 10 | 0.0% |
| Riding for the disabled | 10 | 0.0% |
| Planning and architecture | 8 | 0.0% |
| Friends of healthcare provider | 7 | 0.0% |
| Parochial Church Council | 4 | 0.0% |
| Society of Friends (Quakers) | 4 | 0.0% |
| Lions club | 4 | 0.0% |
| Benevolent Society | 4 | 0.0% |
| Jainism | 2 | 0.0% |
| Almshouse | 2 | 0.0% |
| Spiritualism | 2 | 0.0% |
| Fraternal societies | 1 | 0.0% |
| Jehovah's Witnesses | 1 | 0.0% |
A3. Beneficiary Tag Frequencies
All UKCAT beneficiary tags matched in the dataset, ordered by frequency. Percentages are calculated over all 62,088 CIC incorporations.
| Tag | Count | % of CICs |
|---|---|---|
| General public / local community | 20,479 | 33.0% |
| People with mental health needs | 14,881 | 24.0% |
| People in poverty / disadvantaged | 14,125 | 22.7% |
| Young people | 13,578 | 21.9% |
| Civil society organisations | 12,277 | 19.8% |
| Children | 11,919 | 19.2% |
| Families | 10,143 | 16.3% |
| People with disabilities | 7,324 | 11.8% |
| Racial; ethnic or national communities | 6,403 | 10.3% |
| Women | 4,405 | 7.1% |
| Businesses / small enterprises | 4,328 | 7.0% |
| Parents and guardians | 3,780 | 6.1% |
| Students / learners | 3,568 | 5.7% |
| Artists / creative practitioners | 3,388 | 5.5% |
| Older people | 3,057 | 4.9% |
| Unemployed / workless | 2,887 | 4.6% |
| People with learning disabilities | 2,690 | 4.3% |
| Carers | 2,350 | 3.8% |
| People experiencing homelessness | 2,151 | 3.5% |
| Ex-offenders / criminal justice | 1,805 | 2.9% |
| Victims of abuse / domestic violence | 1,738 | 2.8% |
| People with substance misuse issues | 1,578 | 2.5% |
| Asylum seekers and refugees | 1,416 | 2.3% |
| People with long-term health conditions | 1,365 | 2.2% |
| Young children | 1,266 | 2.0% |
| Migrants | 984 | 1.6% |
| Girls | 969 | 1.6% |
| Men | 948 | 1.5% |
| LGBTQ+ | 880 | 1.4% |
| Widows; widowers and orphans | 138 | 0.2% |
| Riding for the disabled | 5 | 0.0% |