[Home](https://kellyjonbrazil.github.io/jc/)
<a id="jc.parsers.universal"></a>

# jc.parsers.universal

## Table of Contents

* [jc.parsers.universal](#jc.parsers.universal)
  * [simple_table_parse](#jc.parsers.universal.simple_table_parse)
  * [sparse_table_parse](#jc.parsers.universal.sparse_table_parse)

jc - JSON Convert universal parsers

<a id="jc.parsers.universal.simple_table_parse"></a>

### simple_table_parse

```python
def simple_table_parse(data: Iterable[str]) -> List[Dict]
```

Parse simple tables. There should be no blank cells. Only the last
column may contain data with spaces.

Example Table:

    col_1     col_2     col_3     col_4     col_5
    apple     orange    pear      banana    my favorite fruits
    carrot    squash    celery    spinach   my favorite veggies
    chicken   beef      pork      eggs      my favorite proteins

    [{'col_1': 'apple', 'col_2': 'orange', 'col_3': 'pear', 'col_4':
    'banana', 'col_5': 'my favorite fruits'}, {'col_1': 'carrot',
    'col_2': 'squash', 'col_3': 'celery', 'col_4': 'spinach', 'col_5':
    'my favorite veggies'}, {'col_1': 'chicken', 'col_2': 'beef',
    'col_3': 'pork', 'col_4': 'eggs', 'col_5': 'my favorite proteins'}]

Parameters:

    data:   (iter)   Text data to parse that has been split into lines
                     via .splitlines(). Item 0 must be the header row.
                     Any spaces in header names should be changed to
                     underscore '_'. You should also ensure headers are
                     lowercase by using .lower().

                     Also, ensure there are no blank rows in the data.

Returns:

    List of Dictionaries

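As a rough illustration of the behavior described for `simple_table_parse`, the sketch below splits each row on whitespace at most `len(headers) - 1` times, which is why only the last column can keep its spaces. `simple_table_sketch` and the sample table are hypothetical and written only for this example; this is not jc's implementation. With jc installed, you would pass the same prepared lines to `jc.parsers.universal.simple_table_parse` instead.

```python
from typing import Dict, Iterable, List


def simple_table_sketch(data: Iterable[str]) -> List[Dict[str, str]]:
    """Illustrative stand-in for a simple table parser (not jc's code)."""
    lines = list(data)
    # Item 0 is the header row; its whitespace-separated names become keys.
    headers = lines[0].split()
    rows = []
    for line in lines[1:]:
        # Split at most len(headers) - 1 times so that everything left over
        # lands in the last column, spaces included.
        cells = line.strip().split(None, len(headers) - 1)
        rows.append(dict(zip(headers, cells)))
    return rows


text = (
    "Item Name   Qty   Notes\n"
    "apple       3     my favorite fruit\n"
    "carrot      12    my favorite veggie\n"
)

lines = text.splitlines()
# Prepare the header row as the Parameters section advises: lowercase it
# and replace the space inside the header name with an underscore.
lines[0] = lines[0].lower().replace('item name', 'item_name')

result = simple_table_sketch(lines)
print(result)
```

Note that every value comes back as a string; converting `qty` to an integer would be a separate step for the caller.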
<a id="jc.parsers.universal.sparse_table_parse"></a>

### sparse_table_parse

```python
def sparse_table_parse(data: Iterable[str],
                       delim: str = '\u2063') -> List[Dict]
```

Parse tables with missing column data or with spaces in column data.
Blank cells are converted to None in the resulting dictionary. Data
elements must line up within column boundaries.

Example Table:

    col_1        col_2     col_3     col_4         col_5
    apple        orange              fuzzy peach   my favorite fruits
    green beans            celery    spinach       my favorite veggies
    chicken      beef                brown eggs    my favorite proteins

    [{'col_1': 'apple', 'col_2': 'orange', 'col_3': None, 'col_4':
    'fuzzy peach', 'col_5': 'my favorite fruits'}, {'col_1':
    'green beans', 'col_2': None, 'col_3': 'celery', 'col_4': 'spinach',
    'col_5': 'my favorite veggies'}, {'col_1': 'chicken', 'col_2':
    'beef', 'col_3': None, 'col_4': 'brown eggs', 'col_5':
    'my favorite proteins'}]

Parameters:

    data:   (iter)   An iterable of string lines (e.g. str.splitlines()).
                     Item 0 must be the header row. Any spaces in header
                     names should be changed to underscore '_'. You
                     should also ensure headers are lowercase by using
                     .lower(). Do not change the position of header
                     names as the positions are used to find the data.

                     Also, ensure there are no blank lines in the data.

    delim:  (string) Delimiter to use. By default `\u2063`
                     (invisible separator) is used since it is unlikely
                     to ever be seen in terminal output. You can change
                     this for troubleshooting purposes or if there is a
                     delimiter conflict with your data.

Returns:

    List of Dictionaries
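To make the column-boundary requirement of `sparse_table_parse` concrete, here is a minimal sketch that treats each header's start position as the left edge of its column and slices every data row at those positions. This is a simplified slicing approach chosen for illustration. It is not jc's implementation, and it does not use the `delim` parameter. `sparse_table_sketch` and the `fmt` helper are hypothetical names.

```python
import re
from typing import Dict, Iterable, List, Optional


def sparse_table_sketch(data: Iterable[str]) -> List[Dict[str, Optional[str]]]:
    """Illustrative stand-in for a sparse table parser (not jc's code)."""
    lines = list(data)
    # Record each header name together with the column where it starts.
    spans = [(m.group(), m.start()) for m in re.finditer(r'\S+', lines[0])]
    rows = []
    for line in lines[1:]:
        row: Dict[str, Optional[str]] = {}
        for i, (name, start) in enumerate(spans):
            # A column runs from its header's start to the next header's
            # start; the last column runs to the end of the line.
            end = spans[i + 1][1] if i + 1 < len(spans) else None
            cell = line[start:end].strip()
            # Blank cells become None, as in the documented output.
            row[name] = cell or None
        rows.append(row)
    return rows


# Build the example table with fixed widths so the columns line up exactly.
fmt = '{:<13}{:<10}{:<10}{:<14}{}'
table = [
    fmt.format('col_1', 'col_2', 'col_3', 'col_4', 'col_5'),
    fmt.format('apple', 'orange', '', 'fuzzy peach', 'my favorite fruits'),
    fmt.format('green beans', '', 'celery', 'spinach', 'my favorite veggies'),
    fmt.format('chicken', 'beef', '', 'brown eggs', 'my favorite proteins'),
]

result = sparse_table_sketch(table)
print(result)
```

Because the header positions define the slices, shifting a header name left or right changes which characters land in each column. That is why the header positions must be left unchanged when normalizing header names.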