How to parse and group hierarchical list items from an unindented string in Python?

There seem to be inconsistencies.

If we look at output 21:

[
“Types:\n1. Strategy: Games that require careful planning and decision-making to achieve victory.\n2. Party: Games designed to be played in social gatherings, often involving teamwork or competition.\n3. Family: Games suitable for players of all ages, typically with simple rules and shorter play times.”,

“Objectives:\n1. Strategy: To outmaneuver opponents and secure a winning position through tactical choices.\n2. Party: To engage in lively interactions, showcase skills, and enjoy shared experiences with others.\n3. Family: To provide entertainment, foster bonding, and create memorable moments for all participants.”,

]

Here Types:\n is taken as a top-level item, from this input:


Types:\n1. Strategy: Games that require careful planning and decision-making to achieve victory.\n2. Party: Games designed to be played in social gatherings, often involving teamwork or competition.\n3. Family: Games suitable for players of all ages, …


However, in output 2 you have:

[
‘1. Develop a Comprehensive Game Experience\na. Gather components from resources such as meeples, dice, cards, and tiles to enrich the gameplay experience.’,

‘1.1 Establish Game's Theme and Goals within the Stargate Universe\na. Define the specific theme the game will explore within the Stargate universe.\n- Describe how the game will interact with characters, planets, and technologies.\nb. Outline the specific goals the players aim to achieve within the Stargate setting.\n- Define the primary objectives (e.g., completing missions, building alliances).\nc. Utilize Stargate lore and canon to understand franchise-standard game themes and goals.’,
]

This completely disregards the first sentence of the input, which starts with Project Purpose:


Project Purpose:\nTo create a complex and immersive board game experience using techniques like deck-building, worker placement, area control, engine building, set collection, variable player powers, modular board, and asymmetric gameplay. The game uses resources like meeples, dice, cards, and tiles to enhance its gameplay elements.\nParent Objective/Sub-Objective:\n1. Develop a Comprehensive Game Experience\na. Gather components from resources such as meeples, dice, cards, and tiles to enrich the gameplay experience.\nSub-objectives at Depth Level


I appreciate the space and the two words, but is that what separates a valid match from an invalid one?

Actually, I just copied and pasted input 21 into regex101.

It looks like the regex .*?\\n does a lot of the heavy lifting for us.
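To make it concrete what that pattern picks up, here is a minimal sketch in Python. It assumes the input contains the two literal characters backslash and n rather than real newlines (which is how the samples in this thread are written), and it uses a shortened copy of the Types input from above.

```python
import re

# Assumption: the input holds the two characters backslash + "n",
# not an actual newline, so raw strings are used throughout.
sample = (
    r"Types:\n1. Strategy: Games that require careful planning and "
    r"decision-making to achieve victory.\n2. Party: Games designed to be "
    r"played in social gatherings, often involving teamwork or competition."
)

# Lazy match: everything up to and including the first literal "\n".
first = re.match(r".*?\\n", sample)
header = first.group(0)

# Every chunk that ends in a literal "\n". The final chunk has no
# trailing "\n", so this pattern alone does not return it.
chunks = re.findall(r".*?\\n", sample)

print(repr(header))
for chunk in chunks:
    print(repr(chunk))
```

Note that the last list item is not captured, because nothing after it terminates the match. So the pattern finds the leading label, but it cannot by itself decide whether that label is a valid top-level item.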

@christinanamodeo832

It seems to me that, given the variety of valid top-level and sub-level types, you could do with some more varied examples of inputs and outputs. Most of the examples you have given are limited to top-level items starting with numbers (1., 2., 3.). The ones that interest me are the markers that are valid at both levels, e.g. bullets (-, *, •).

Yeah… I don't agree with Output 21. Output 23 indicates that the result of Output 21 should have had the leading wording trimmed off.

My thought is to simplify/shorten the examples based on the criteria. It's a lot to work through in its current form.

Maybe I'm not getting it, but the difference I see is that 23 has no newlines, e.g.

The Witcher: Old World board ...
rather than
The Witcher:\n Old World board ...

I think the criteria still need ironing out, though.
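The difference guessed at above can be written down as a quick check. This is only a sketch of that guess, not the OP's confirmed rule: the helper name starts_with_header is hypothetical, and it assumes a label counts as a header only when its colon is followed directly by a literal backslash-n.

```python
import re

# Hypothetical rule (a guess from this thread, not the OP's criteria):
# a header is a label whose colon is followed directly by a literal "\n".
HEADER = re.compile(r"[^:]*:\\n")


def starts_with_header(text: str) -> bool:
    """Return True if text begins with 'label:' followed by a literal backslash-n."""
    return HEADER.match(text) is not None


with_newline = r"The Witcher:\n Old World board ..."
without_newline = "The Witcher: Old World board ..."
types_input = r"Types:\n1. Strategy: Games that require careful planning"

for text in (with_newline, without_newline, types_input):
    print(starts_with_header(text), repr(text))
```

Under this guess, the 21-style input and the version of 23 with a newline both count as having a header, while the version without a newline does not. That would explain the different outputs, but it still needs confirming against the stated criteria.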

Even if so, I fail to see how “Types:” falls into one of these categories (that the OP has declared as explicit):

(What comes after Types? Typet?)
