[Home](https://kellyjonbrazil.github.io/jc/)
<a id="jc.parsers.universal"></a>
# jc.parsers.universal
## Table of Contents
* [jc.parsers.universal](#jc.parsers.universal)
  * [simple_table_parse](#jc.parsers.universal.simple_table_parse)
  * [sparse_table_parse](#jc.parsers.universal.sparse_table_parse)
jc - JSON Convert universal parsers
<a id="jc.parsers.universal.simple_table_parse"></a>
### simple_table_parse
```python
def simple_table_parse(data: Iterable[str]) -> List[Dict]
```

Parse simple tables. There should be no blank cells. The last column
may contain data with spaces.

Example Table:

    col_1     col_2     col_3     col_4     col_5
    apple     orange    pear      banana    my favorite fruits
    carrot    squash    celery    spinach   my favorite veggies
    chicken   beef      pork      eggs      my favorite proteins

    [{'col_1': 'apple', 'col_2': 'orange', 'col_3': 'pear', 'col_4':
    'banana', 'col_5': 'my favorite fruits'}, {'col_1': 'carrot',
    'col_2': 'squash', 'col_3': 'celery', 'col_4': 'spinach', 'col_5':
    'my favorite veggies'}, {'col_1': 'chicken', 'col_2': 'beef',
    'col_3': 'pork', 'col_4': 'eggs', 'col_5': 'my favorite proteins'}]

Parameters:

    data:   (iter)   Text data to parse that has been split into lines
                     via .splitlines(). Item 0 must be the header row.
                     Any spaces in header names should be changed to
                     underscore '_'. You should also ensure headers are
                     lowercase by using .lower().

                     Also, ensure there are no blank rows in the data.

Returns:
List of Dictionaries
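
The input preparation and the "last column may contain spaces" rule can be illustrated with a small self-contained sketch. `simple_table_sketch` is a hypothetical stand-in written for this example, not jc's actual implementation; it assumes every row has a value in every column, as the description above requires.

```python
from typing import Dict, Iterable, List


def simple_table_sketch(data: Iterable[str]) -> List[Dict[str, str]]:
    """Illustrative stand-in for simple_table_parse (not jc's code)."""
    lines = list(data)
    if not lines:
        return []
    headers = lines[0].split()
    rows = []
    for line in lines[1:]:
        # Split at most len(headers) - 1 times so that any spaces in the
        # last column stay inside a single value.
        values = line.strip().split(maxsplit=len(headers) - 1)
        rows.append(dict(zip(headers, values)))
    return rows


raw_output = """\
COL_1     COL_2     COL_3     COL_4     COL_5
apple     orange    pear      banana    my favorite fruits

carrot    squash    celery    spinach   my favorite veggies
chicken   beef      pork      eggs      my favorite proteins
"""

# Preparation described in the Parameters section: drop blank rows and
# lowercase the header row before handing the lines to the parser.
lines = [line for line in raw_output.splitlines() if line.strip()]
lines[0] = lines[0].lower()

result = simple_table_sketch(lines)
print(result[0])
```

With jc installed, the same prepared `lines` list is the kind of input `simple_table_parse` expects.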
<a id="jc.parsers.universal.sparse_table_parse"></a>
### sparse_table_parse
```python
def sparse_table_parse(data: Iterable[str],
                       delim: str = '\u2063') -> List[Dict]
```

Parse tables with missing column data or with spaces in column data.
Blank cells are converted to None in the resulting dictionary. Data
elements must line up within column boundaries.

Example Table:

    col_1        col_2     col_3     col_4         col_5
    apple        orange              fuzzy peach   my favorite fruits
    green beans            celery    spinach       my favorite veggies
    chicken      beef                brown eggs    my favorite proteins

    [{'col_1': 'apple', 'col_2': 'orange', 'col_3': None, 'col_4':
    'fuzzy peach', 'col_5': 'my favorite fruits'}, {'col_1':
    'green beans', 'col_2': None, 'col_3': 'celery', 'col_4': 'spinach',
    'col_5': 'my favorite veggies'}, {'col_1': 'chicken', 'col_2':
    'beef', 'col_3': None, 'col_4': 'brown eggs', 'col_5':
    'my favorite proteins'}]

Parameters:

    data:   (iter)   An iterable of string lines (e.g. str.splitlines()).
                     Item 0 must be the header row. Any spaces in header
                     names should be changed to underscore '_'. You
                     should also ensure headers are lowercase by using
                     .lower(). Do not change the position of header
                     names as the positions are used to find the data.

                     Also, ensure there are no blank line items.

    delim:  (string) Delimiter to use. By default `\u2063`
                     (invisible separator) is used since it is unlikely
                     to ever be seen in terminal output. You can change
                     this for troubleshooting purposes or if there is a
                     delimiter conflict with your data.

Returns:
List of Dictionaries
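
The idea that header positions determine where each cell's data is found can be sketched as follows. This is a simplified illustration under stated assumptions, not jc's implementation: the library's parser is delimiter-based (see `delim` above), whereas the hypothetical `sparse_table_sketch` below substitutes plain string slicing at the header offsets and assumes every cell starts at or after the offset of its header.

```python
import re
from typing import Dict, Iterable, List, Optional


def sparse_table_sketch(data: Iterable[str]) -> List[Dict[str, Optional[str]]]:
    """Illustrative position-based parser (not jc's code)."""
    lines = list(data)
    if not lines:
        return []
    # Record each header name together with the offset where it starts.
    columns = [(m.group(), m.start()) for m in re.finditer(r'\S+', lines[0])]
    rows = []
    for line in lines[1:]:
        row: Dict[str, Optional[str]] = {}
        for i, (name, start) in enumerate(columns):
            # A cell spans from this header's offset to the next header's
            # offset; the last column runs to the end of the line.
            end = columns[i + 1][1] if i + 1 < len(columns) else None
            cell = line[start:end].strip()
            row[name] = cell or None  # blank cells become None
        rows.append(row)
    return rows


table = [
    'col_1        col_2     col_3     col_4         col_5',
    'apple        orange              fuzzy peach   my favorite fruits',
    'green beans            celery    spinach       my favorite veggies',
    'chicken      beef                brown eggs    my favorite proteins',
]

result = sparse_table_sketch(table)
print(result[1])
```

Because the slicing relies on header offsets, renaming a header in a way that shifts the names that follow it would misalign every later column, which is why the header positions must not change.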