dsstools
dssTools
dssTools is a Python wrapper around NetworkX that eases network analysis for our use case in the Digital Social Sciences.
Category
Bases: LowercaseStrEnum
Enum for categorizing internal codes.
The following values are accepted:
- TEXT: Used for terms in text. This code represents a term found in a node's text.
- MANUAL: Used for codes created manually in dssCode (or other sources).
Code
Bases: NamedTuple
Helper tuple class for passing complex arguments as node attributes.
to_str()
Convert the code to a string.
Description
Class containing description drawing preferences.
set_text(text)
Sets the description setting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text to set as description | required |
Returns:
ElementAttribute
Bases: Supplier
Class for graph element values that are already set in the graph.
__init__(keyword)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| keyword | str | The key of the inherent attribute | required |
GenericMapping
Bases: ABC
Generic interface for mapping visual attributes to graph element values.
get(graph_element, graph)
abstractmethod
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph_element | NxElementView | The node or edge view to map | required |
| graph | Graph | The NetworkX graph containing the elements | required |
Returns: A dictionary keyed by graph element, with the mapped visual value as value.
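To illustrate the contract, here is a minimal sketch. It is not the dssTools implementation. `MappingSketch` and `FixedSketch` are hypothetical names, and a plain list stands in for a NetworkX node view. The only assumption is the one stated above: `get()` returns a dictionary keyed by graph element.

```python
from abc import ABC, abstractmethod
from collections.abc import Hashable, Iterable
from typing import Any


class MappingSketch(ABC):
    """Hypothetical stand-in for GenericMapping."""

    @abstractmethod
    def get(self, graph_element: Iterable[Hashable], graph: Any) -> dict:
        """Return {element: visual value} for every element in graph_element."""


class FixedSketch(MappingSketch):
    """Constant mapping, similar in spirit to fixed()."""

    def __init__(self, value: Any) -> None:
        self.value = value

    def get(self, graph_element: Iterable[Hashable], graph: Any) -> dict:
        # A constant mapping does not need to inspect the graph.
        return {element: self.value for element in graph_element}


# A plain list stands in for a NetworkX node view such as G.nodes.
colors = FixedSketch("red").get(["a", "b", "c"], graph=None)
print(colors)  # {'a': 'red', 'b': 'red', 'c': 'red'}
```

Because `get()` is abstract, the interface itself cannot be instantiated. Only concrete mappings can be passed to the setters.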
GraphElement
set_alphas(arg)
Sets alpha based on argument.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arg | GenericMapping \| float | The transparency, between 0 and 1, as a single value or a mapping | required |
Returns: self
set_colors(arg)
Sets the colors of the displayed nodes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arg | GenericMapping \| str | Colors to set. String values will be mapped onto all nodes. | required |
Returns: self
set_sizes(arg)
Sets the font size of node labels in points.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arg | GenericMapping \| int \| float | Font size for text labels | required |
Returns: self
set_transparency(arg)
Sets transparency based on argument.
This is the same as set_alphas(). Alpha is the value for the transparency of a color.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arg | GenericMapping \| float | The transparency, between 0 and 1, as a single value or a mapping | required |
Returns: self
ImageGenerator
Base class for setting up image generation.
change_graph(graph)
Sets the graph attribute.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | | A NetworkX graph object. | required |
Returns: self
deepcopy()
Creates a deep copy of the object.
This is the same as calling copy.deepcopy() on the object.
draw_description()
Draw description below the image according to the settings.
draw_edges()
Draw edges according to the settings.
draw_labels()
Draws labels based on values.
draw_legend()
Not yet implemented.
draw_nodes()
Draw nodes according to the settings.
set_axis(axis)
Sets an existing matplotlib axis object for the ImageGenerator object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| axis | | Matplotlib axis | required |
Returns: self
set_legend(legend=True)
Sets the legend setting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| legend | | Whether to show the legend. | True |
Returns: self
write_file(path)
Writes the file to disk at the given path.
This also closes the internal figure object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| Path | Path to write the file to. | required |
Returns: self
write_json(path)
Writes the graph data to a JSON file in the nx.node_link_data format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| Path | Saving location and name for the JSON file | required |
Returns:
| Type | Description |
|---|---|
| ImageGenerator | self |
Labels
Bases: GraphElement
set_alphas(arg)
Sets alpha based on argument.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arg | GenericMapping \| float | The text transparency, between 0 and 1, as a single value or a mapping | required |
Returns: self
set_colors(arg)
Sets the colors of the displayed labels.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arg | GenericMapping \| str | Colors to set. String values will be mapped onto all labels. | required |
Returns: self
set_font_families(arg)
Sets the font family for all labels if a single font is passed.
If an array of fonts is passed, multiple fonts are set. If a dictionary is passed, fonts are set individually per label, keyed by node.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arg | GenericMapping \| str | Font family | required |
set_labels(arg)
Sets the labels for nodes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arg | dict | Dictionary with node identifiers as keys and label strings as values | required |
set_sizes(arg)
Sets the font size of node labels in points.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arg | GenericMapping \| int \| float | Font size for text labels | required |
Returns: self
set_transparency(arg)
Sets transparency based on argument.
This is the same as set_alphas(). Alpha is the value for the transparency of a color.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arg | GenericMapping \| float | The text transparency, between 0 and 1, as a single value or a mapping | required |
Returns: self
Layouter
create_layout(graph, seed=None, pos=None, **kwargs)
Creates a position dictionary using the configured layout engine. The default layout is the spring layout.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | Graph | Graph object | required |
| seed | int | Seed for the layout | None |
| pos | | Pre-populated positions | None |
Returns:
| Type | Description |
|---|---|
| dict | Dictionary of nodes and their positions. |
read_from_file(filename, **kwargs)
Reads positions from the JSON file at filename.
The JSON is expected to have the following structure, where each key maps to an array of length 2 containing the coordinates. Coordinates should be in the range [-1, 1]:
{
  "domain1": [-0.1467271130230262, 0.25512246449304427],
  "domain2": [-0.3683594304205127, 0.34942480334119136]
}
This structure is generated by dsstools.Layouter().write_to_file().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filename | Union[Path] | Path to file to be read. | required |
| graph | Graph | | required |
Returns:
| Type | Description |
|---|---|
| dict | Dictionary of nodes and positions. |
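Here is a minimal sketch of parsing the expected file structure with the standard library. `read_positions` is a hypothetical helper. This sketch rejects out-of-range coordinates, but whether `Layouter.read_from_file()` does the same is not documented here, so treat that behavior as an assumption.

```python
import json


def read_positions(text: str) -> dict[str, tuple[float, float]]:
    """Parse a positions JSON string into {node: (x, y)} and check its shape."""
    raw = json.loads(text)
    positions = {}
    for node, coords in raw.items():
        if len(coords) != 2:
            raise ValueError(f"{node!r}: expected 2 coordinates, got {len(coords)}")
        x, y = (float(c) for c in coords)
        # The docs say coordinates *should* lie in [-1, 1]; this sketch enforces it.
        if not (-1 <= x <= 1 and -1 <= y <= 1):
            raise ValueError(f"{node!r}: coordinates outside [-1, 1]")
        positions[node] = (x, y)
    return positions


example = """{
    "domain1": [-0.1467271130230262, 0.25512246449304427],
    "domain2": [-0.3683594304205127, 0.34942480334119136]
}"""
positions = read_positions(example)
print(positions["domain1"])
```

JSON object keys are always strings, so integer node IDs come back as strings. PositionKeyCoder exists to handle that case.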
read_from_graph(graph, pos_name=('x', 'y'))
Read positions from node attributes in the graph.
This is relevant when importing from Pajek or GEXF files where the positions are already set with another tool. Imported values are normalized onto [-1,1] in all directions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | Graph | Graph object including the node attributes. | required |
| pos_name | tuple | Node attribute names to look for. These depend on the imported file format. | ('x', 'y') |
Returns: Dictionary of positions per node.
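The normalization onto [-1, 1] can be sketched as per-axis min-max scaling. This sketch assumes each axis is scaled independently, which does not preserve the aspect ratio. The actual dssTools behavior may differ. `normalize_positions` is hypothetical, and the input mimics `dict(G.nodes(data=True))`.

```python
def normalize_positions(raw: dict, pos_name=("x", "y")) -> dict:
    """Min-max scale each coordinate axis of the node attributes onto [-1, 1]."""
    # Find the observed range of every axis first.
    axes = []
    for key in pos_name:
        values = [attrs[key] for attrs in raw.values()]
        axes.append((min(values), max(values)))
    result = {}
    for node, attrs in raw.items():
        coords = []
        for key, (lo, hi) in zip(pos_name, axes):
            span = hi - lo
            # A degenerate axis (all values equal) is placed at the center.
            coords.append(0.0 if span == 0 else 2 * (attrs[key] - lo) / span - 1)
        result[node] = tuple(coords)
    return result


nodes = {"a": {"x": 0, "y": 10}, "b": {"x": 50, "y": 20}, "c": {"x": 100, "y": 30}}
normalized = normalize_positions(nodes)
print(normalized)  # {'a': (-1.0, -1.0), 'b': (0.0, 0.0), 'c': (1.0, 1.0)}
```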
read_or_create_layout(filepath, graph, seed, overwrite=False, **kwargs)
Reads positions from a file. If the file does not exist, creates the positions and writes them to the file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | Union[str, Path] | File to read positions from | required |
| graph | Graph | Graph object to update | required |
| overwrite | bool | Overwrite an existing file | False |
Returns:
| Type | Description |
|---|---|
| dict | Dictionary of positions per node. Returns an empty dict if creation failed. |
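The read-or-create behavior amounts to a file cache. The sketch below assumes positions are stored as plain JSON. `read_or_create` and the `create` callable, which stands in for the layout engine, are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def read_or_create(filepath, create: Callable[[], dict], overwrite: bool = False) -> dict:
    """Return positions cached in filepath, creating and writing them if needed."""
    path = Path(filepath)
    if path.exists() and not overwrite:
        return json.loads(path.read_text())
    try:
        positions = create()
    except Exception:
        # Mirrors the documented contract: an empty dict signals failed creation.
        return {}
    path.write_text(json.dumps(positions))
    return positions


def failing_layout() -> dict:
    raise RuntimeError("layout engine failed")


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "positions.json"
    first = read_or_create(cache, lambda: {"a": [0.0, 1.0]})
    # The file now exists, so this callable is never invoked.
    second = read_or_create(cache, lambda: {"a": [0.5, 0.5]})
    forced = read_or_create(cache, lambda: {"a": [0.5, 0.5]}, overwrite=True)
    failed = read_or_create(Path(tmp) / "missing.json", failing_layout)

print(first, second, forced, failed)
```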
Nodes
Bases: GraphElement
set_alphas(arg)
Sets alpha based on argument.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arg | GenericMapping \| float | The node transparency, between 0 and 1, as a single value or a mapping | required |
Returns: self
set_colors(arg)
Sets the colors of the displayed nodes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arg | GenericMapping \| str | Colors to set. String values will be mapped onto all nodes. | required |
Returns: self
set_contour_colors(arg)
Sets the contour color of the displayed nodes.
Contour means the outer border of a node.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arg | GenericMapping \| str | Colors to set. String values will be mapped onto all node contours. Additional options include "node" and "edge" to automatically select the corresponding color. | required |
Returns: self
set_contour_sizes(arg)
Sets the contour sizes of the displayed nodes.
Contour means the outer border of a node.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arg | GenericMapping \| float \| int | Sizes to set. Integer values will be mapped onto all node contours. String values are mapped onto the corresponding data arrays or closeness values per node. | required |
Returns: self
set_positions(pos)
Sets the node positions as a dict or list.
When using a file, use set_position_file() instead.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pos | dict \| list \| Path \| str | Array of positions. Dicts should be keyed by node ID. | required |
Returns: self
set_sizes(arg)
Sets the sizes of the displayed nodes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arg | GenericMapping \| float \| int | Sizes to set. Scalar values will be mapped onto all nodes. String values | required |
Returns: self
set_transparency(arg)
Sets transparency based on argument.
This is the same as set_alphas(). Alpha is the value for the transparency of a color.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arg | GenericMapping \| float | The node transparency, between 0 and 1, as a single value or a mapping | required |
Returns: self
NumpyEncoder
Bases: JSONEncoder
JSON encoder for NumPy arrays.
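An encoder of this kind typically overrides `json.JSONEncoder.default`. The sketch below is an assumption about the approach, not the dssTools source. `NumpyEncoderSketch` is a hypothetical name.

```python
import json

import numpy as np


class NumpyEncoderSketch(json.JSONEncoder):
    """Serialize NumPy arrays and scalars that the default encoder rejects."""

    def default(self, o):
        if isinstance(o, np.ndarray):
            return o.tolist()  # arrays become (nested) Python lists
        if isinstance(o, np.generic):
            return o.item()  # NumPy scalars such as np.int64 become Python scalars
        return super().default(o)  # raises TypeError for anything else


positions = {"a": np.array([0.25, -0.5]), "b": np.int64(3)}
encoded = json.dumps(positions, cls=NumpyEncoderSketch)
print(encoded)  # {"a": [0.25, -0.5], "b": 3}
```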
Percentile
PositionKeyCoder
Provides methods to consistently encode and decode position data for nx.Graph objects.
decode_typed_keys(dct)
object_hook for json.load() that recognises the prefixes set by the encoder.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dct | dict | A dictionary from a JSON file. | required |
Returns:
| Type | Description |
|---|---|
| dict | The decoded dictionary, i.e. the node position data |
encode_typed_keys(obj)
Recursively encodes the JSON structures we use for saving positions.
Use this to prepare position data for json.dumps(). This ensures that integers can be used as keys, i.e. as node IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| obj | any | The JSON content that needs to be encoded. | required |
Returns:
| Type | Description |
|---|---|
| dict \| list | A valid format for json.dump(). |
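The need for typed keys comes from JSON itself: object keys are always strings, so an integer node ID such as `1` comes back as `"1"`. Below is a minimal sketch of the encode/decode round trip. It assumes a hypothetical `__int__:` prefix, because the prefix dssTools actually uses is not documented here. `encode_keys` and `decode_keys` are hypothetical stand-ins for `encode_typed_keys()` and `decode_typed_keys()`.

```python
import json

INT_PREFIX = "__int__:"  # hypothetical prefix, not taken from dssTools


def encode_keys(obj):
    """Recursively turn integer dict keys into prefixed strings so JSON keeps their type."""
    if isinstance(obj, dict):
        return {
            (f"{INT_PREFIX}{k}" if isinstance(k, int) and not isinstance(k, bool) else k):
                encode_keys(v)
            for k, v in obj.items()
        }
    if isinstance(obj, (list, tuple)):
        return [encode_keys(v) for v in obj]
    return obj


def decode_keys(dct: dict) -> dict:
    """object_hook for json.loads() that restores prefixed integer keys."""
    return {
        (int(k[len(INT_PREFIX):]) if isinstance(k, str) and k.startswith(INT_PREFIX) else k): v
        for k, v in dct.items()
    }


positions = {1: [0.1, 0.2], "domain": [-0.3, 0.4]}
plain = json.loads(json.dumps(positions))  # the integer key silently becomes "1"
restored = json.loads(json.dumps(encode_keys(positions)), object_hook=decode_keys)
print(plain)     # {'1': [0.1, 0.2], 'domain': [-0.3, 0.4]}
print(restored)  # {1: [0.1, 0.2], 'domain': [-0.3, 0.4]}
```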
RawDictionary
Bases: Supplier
__init__(dictionary)
Assigns a dictionary to the graph elements.
This can be a subgraph or a normal graph. The keys must match at least one of the node IDs or edge tuples. For example, the dictionaries returned by NetworkX calculations on graphs are suitable.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dictionary | dict | Keys are graph elements, values are the values to be supplied | required |
StructuralAttribute
Bases: Supplier
Class for providing structural graph element values.
These are attributes based on the graph structure and need to be calculated.
__init__(keyword=None, *, reverse=False, alt_nx_calculation=None)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| keyword | Optional[str] | The keyword for the calculation | None |
| reverse | bool | If True, the calculation uses inverted edge values | False |
Supplier
Bases: ABC
Basic interface for supplying graph element values.
TextSearch
Bases: WDCGeneric
Class for searching keywords in the WDC API.
token
property
writable
Get the password token.
__init__(identifier=None, *, token=None, api='https://dss-wdc.wiso.uni-hamburg.de/api', insecure=False, timeout=60, params=None)
Base class for interacting with the WDC API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| identifier | Optional[str] | Identifier of the network data. | None |
| token | Optional[str] | Token for authorization. | None |
| api | str | API address to send requests to. Leave this as is. | 'https://dss-wdc.wiso.uni-hamburg.de/api' |
| insecure | bool | Hide the warning about missing HTTPS. | False |
| timeout | int | Timeout for requests to the server. Increase this if you request large networks. | 60 |
| params | Optional[dict] | Additional keyword arguments passed on to the API endpoint. See https://dss-wdc.wiso.uni-hamburg.de/#_complex_datatypes_for_the_api_requests for further assistance. | None |
search(domains, terms, exact=True)
Searches for the given terms across a graph or an iterable of identifiers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| domains | Graph \| list | Set of identifiers to search in. | required |
| terms | list[str] | Terms to search for. | required |
Returns: Updated graph or dict containing the responses, and a set of all failed responses.
WDCGeneric
token
property
writable
Get the password token.
__init__(identifier=None, *, token=None, api='https://dss-wdc.wiso.uni-hamburg.de/api', insecure=False, timeout=60, params=None)
Base class for interacting with the WDC API.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| identifier | Optional[str] | Identifier of the network data. | None |
| token | Optional[str] | Token for authorization. | None |
| api | str | API address to send requests to. Leave this as is. | 'https://dss-wdc.wiso.uni-hamburg.de/api' |
| insecure | bool | Hide the warning about missing HTTPS. | False |
| timeout | int | Timeout for requests to the server. Increase this if you request large networks. | 60 |
| params | Optional[dict] | Additional keyword arguments passed on to the API endpoint. See https://dss-wdc.wiso.uni-hamburg.de/#_complex_datatypes_for_the_api_requests for further assistance. | None |
calculate_betweenness_centrality(graph, name='_betweenness', **kwargs)
Updates the nodes in the graph with betweenness centrality.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | Graph | The graph to calculate on. | required |
| name | str | Name of the centrality type. | '_betweenness' |
| **kwargs | | All arguments passed on to nx.betweenness_centrality (see https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html#betweenness-centrality) | {} |
Returns: Graph including the betweenness centrality.
calculate_closeness_centrality(graph, name='_closeness', **kwargs)
Updates the nodes in the graph with closeness centrality.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | Graph | The graph to calculate on. | required |
| name | str | Name of the centrality type. | '_closeness' |
| **kwargs | | All arguments passed on to nx.closeness_centrality (see https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.closeness_centrality.html#closeness-centrality) | {} |
Returns: Graph including the closeness centrality.
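Both functions presumably store the NetworkX result as a node attribute under `name`. The sketch below works under that assumption and uses the real `nx.betweenness_centrality`, `nx.closeness_centrality` and `nx.set_node_attributes`. The function names are hypothetical and only mirror the documented signatures.

```python
import networkx as nx


def betweenness_to_nodes(graph: nx.Graph, name: str = "_betweenness", **kwargs) -> nx.Graph:
    """Store betweenness centrality under `name`; kwargs go to NetworkX."""
    nx.set_node_attributes(graph, nx.betweenness_centrality(graph, **kwargs), name)
    return graph


def closeness_to_nodes(graph: nx.Graph, name: str = "_closeness", **kwargs) -> nx.Graph:
    """Store closeness centrality under `name`; kwargs go to NetworkX."""
    nx.set_node_attributes(graph, nx.closeness_centrality(graph, **kwargs), name)
    return graph


G = nx.path_graph(3)  # 0 - 1 - 2: node 1 lies on the only path between 0 and 2
closeness_to_nodes(betweenness_to_nodes(G))
print(G.nodes[1])  # {'_betweenness': 1.0, '_closeness': 1.0}
```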
clean_graph_data_attributes(graph)
Replace empty strings in data attributes with np.nan.
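Here is a minimal sketch of the cleanup on node data. It assumes only node attributes are affected; the docs do not say whether edge attributes are cleaned too. `clean_empty_strings` is hypothetical, and `math.nan` is the same float NaN value as `np.nan`.

```python
import math


def clean_empty_strings(node_data) -> None:
    """Replace empty-string attribute values with NaN, in place."""
    for _node, attrs in node_data:
        for key, value in list(attrs.items()):
            if isinstance(value, str) and value == "":
                attrs[key] = math.nan  # same float NaN as np.nan


# (node, attribute dict) pairs, the shape that G.nodes(data=True) yields
data = [("a", {"rating": "", "name": "A"}), ("b", {"rating": 3})]
clean_empty_strings(data)
print(data)  # [('a', {'rating': nan, 'name': 'A'}), ('b', {'rating': 3})]
```

Since NaN is what pandas and NumPy treat as a missing value, numeric scales can skip these entries instead of failing on an empty string.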
ensure_file_format(path, user_saving_format, *, default_format)
Ensures that the provided path has a saving format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| Path | The path that needs to be validated | required |
| user_saving_format | str \| None | The saving format provided by the user | required |
| default_format | str | The format used as default if no format was provided at all | required |
Returns:
| Type | Description |
|---|---|
| tuple[Path, str] | The file path and the format (without leading dot) as a 2-tuple |
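One plausible precedence for the three inputs: an explicit user format wins, then the path's own suffix, then the default. The sketch below follows that order. The order itself and the replacement of an existing suffix are assumptions. `ensure_format` is a hypothetical stand-in for `ensure_file_format()`.

```python
from pathlib import Path


def ensure_format(path, user_saving_format, *, default_format):
    """Return (path with suffix, format without leading dot)."""
    path = Path(path)
    if user_saving_format:
        fmt = user_saving_format.lstrip(".")  # explicit choice wins
    elif path.suffix:
        fmt = path.suffix.lstrip(".")  # otherwise keep the path's suffix
    else:
        fmt = default_format.lstrip(".")  # finally fall back to the default
    return path.with_suffix(f".{fmt}"), fmt


print(ensure_format("out/graph", None, default_format="png"))
print(ensure_format("graph.svg", None, default_format="png"))
print(ensure_format("graph", ".pdf", default_format="png"))
```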
filtering(base, new_values, attr, predicate)
Filters the visual values of the graph elements based on the attr values.
If the predicate is True, the value of the new mapping is applied, keyed by node. If the predicate is False, the value of the base mapping is applied, keyed by node. If the base mapping contains a fallback, the nodes with the fallback value retain that value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| base | GenericMapping | The original mapping, whose values are assigned where the predicate is False | required |
| new_values | GenericMapping | The new mapping, whose values are assigned where the predicate is True | required |
| attr | str \| Supplier | Attribute that provides the values to be evaluated by the predicate | required |
| predicate | | The expression used to evaluate the attr values; must return a boolean | required |
Returns: Filter mapping that assigns new_values where the predicate applied to attr is True and the base values where it is False.
Examples:
# assumes G is a networkx graph and ig an ImageGenerator drawing G
G.add_node("a", rating=3, school_type="uni")
G.add_node("b", rating=7, school_type="college")
rating = sequential("rating", out_range=(12, 36))
new_mapping = fixed(1)
ig.nodes.set_sizes(filtering(rating, new_mapping, "school_type", lambda x: x == "uni"))
# resulting sizes: {"a": 1, "b": 36}
# node "a" gets the value 1 because its "school_type" is "uni"
# node "b" keeps its value from the rating mapping
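The selection rule above can be sketched with plain dictionaries standing in for the evaluated mappings. `filter_values` is hypothetical and only mirrors the documented semantics, including the fallback rule. With the data from the example it reproduces the same result.

```python
def filter_values(base, new_values, attr_values, predicate, fallback=None):
    """Pick new_values where predicate(attr) is True, else base; fallbacks are kept."""
    result = {}
    for element, base_value in base.items():
        if fallback is not None and base_value == fallback:
            result[element] = base_value  # nodes with the fallback value retain it
        elif predicate(attr_values.get(element)):
            result[element] = new_values[element]
        else:
            result[element] = base_value
    return result


base = {"a": 12, "b": 36}  # e.g. what sequential("rating", out_range=(12, 36)) yields
new = {"a": 1, "b": 1}  # e.g. what fixed(1) yields
school_type = {"a": "uni", "b": "college"}
sizes = filter_values(base, new, school_type, lambda x: x == "uni")
print(sizes)  # {'a': 1, 'b': 36}
```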
fixed(value)
Sets a fixed value that is constant across all items of the chosen graph element.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | | The constant value to assign | required |
Returns: FixedValue
Examples:
from_node(node_mapping, source, *, fallback=None)
Assign a value from a node to the edges.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| node_mapping | GenericMapping | The GenericMapping the color should correspond to. | required |
| source | Literal['incoming', 'outgoing', 'matching'] | "incoming" uses the value from the incoming node \| "outgoing" | required |
| fallback | | The fallback value assigned when incoming and outgoing nodes | None |
import_attributes_from_csv(graph, filepath, import_columns, index_label='', cleanup_functions=None)
Imports attributes from a CSV file, with some cleanup.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | DiGraph | Graph to which the data should be applied. | required |
| filepath | str | Path of the CSV file. | required |
| import_columns | list[str] | Columns to be imported, can be None. | required |
| index_label | | Column name used as index. Defaults to the first column. | '' |
| cleanup_functions | | Functions to be applied to the DataFrame. | None |
Returns:
| Type | Description |
|---|---|
| DiGraph | Graph with the applied data. |
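The matching logic (index column to node, unknown nodes skipped) can be sketched with the standard `csv` module. This rests on two assumptions. First, a dict of attribute dicts stands in for `G.nodes`. Second, unknown nodes are skipped as documented for `import_attributes_from_dataframe()`. The real function works on a pandas DataFrame, which infers column types, whereas `csv` yields strings. `attributes_from_csv` is hypothetical.

```python
import csv
import io


def attributes_from_csv(node_attrs, csv_file, import_columns=None, index_label=""):
    """Copy CSV columns onto existing nodes; rows for unknown nodes are skipped."""
    reader = csv.DictReader(csv_file)
    index = index_label or reader.fieldnames[0]  # default: first column
    columns = import_columns or [c for c in reader.fieldnames if c != index]
    for row in reader:
        node = row[index]
        if node not in node_attrs:
            continue  # non-existing nodes do not get the attribute
        for column in columns:
            node_attrs[node][column] = row[column]
    return node_attrs


data = io.StringIO("name,rating,school_type\na,3,uni\nz,5,college\n")
nodes = {"a": {}, "b": {}}
attributes_from_csv(nodes, data, import_columns=["rating"])
print(nodes)  # {'a': {'rating': '3'}, 'b': {}}
```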
import_attributes_from_dataframe(graph, df)
Imports attributes from a pandas DataFrame.
The index of the DataFrame should contain the names of the graph nodes. Non-existing nodes are ignored and do not get the attribute.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| graph | DiGraph | Graph to which the data should be applied. | required |
| df | DataFrame | DataFrame with the attributes, indexed by node name. | required |
Returns:
| Type | Description |
|---|---|
| DiGraph | Graph with the applied data. |
import_from_dsscode(slug, snapshot, token, domain='dss-graph.wiso.uni-hamburg.de', cache=True, remove_selfloops=True, contract_redirects=False, explicit_include=False)
Import Graph object from dssCode.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| slug | str | Name slug of the project (see the dssCode interface) | required |
| snapshot | str | Snapshot hash | required |
| domain | str | The domain for the API call | 'dss-graph.wiso.uni-hamburg.de' |
| cache | (bool, Path, str) | The cache directory. Defaults to a temporary directory. | True |
| remove_selfloops | bool | Remove edge self-loops. | True |
| contract_redirects | bool | Contract redirecting nodes into one. | False |
| explicit_include | bool | Include only explicitly marked nodes in the graph. | False |
Returns:
| Type | Description |
|---|---|
| DiGraph | Graph with the imported data. |
import_network(filepath, remove_selfloops=True)
Imports a network as a NetworkX directed graph and cleans up circular edges (self-loops).
percentile(base, new_values, attr, perc_range=None, method='linear')
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| base | GenericMapping | The mapping whose values are assigned to the node if the attr values are inside the perc_range | required |
| new_values | GenericMapping | The mapping whose values are assigned to the node if the attr values are outside the perc_range | required |
| attr | str \| Supplier | The attribute used to calculate the percentile; must contain numeric values | required |
| perc_range | tuple | Tuple of the min and max of the range for the percentile calculation; both must be between 0 and 100 | None |
| method | str | The method used for estimating the percentile, with the same options as the method parameter of numpy.percentile (see the NumPy documentation) | 'linear' |
Returns: Filter mapping, filtered by the percentile of attr, that assigns values based on new_values.
Examples:
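Here is a sketch of the percentile step. It uses linear interpolation between order statistics (the "linear" method, R type 7). It follows the parameter table above, assigning `base` inside `perc_range` and `new_values` outside it. `linear_percentile` and `split_by_percentile` are hypothetical helpers, not the dssTools implementation.

```python
import math


def linear_percentile(values, q):
    """Percentile with linear interpolation (the 'linear' method, R type 7)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def split_by_percentile(base, new_values, attr_values, perc_range):
    """Assign base inside perc_range and new_values outside (per the table above)."""
    low = linear_percentile(attr_values.values(), perc_range[0])
    high = linear_percentile(attr_values.values(), perc_range[1])
    return {
        node: base[node] if low <= value <= high else new_values[node]
        for node, value in attr_values.items()
    }


ratings = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
base = {n: "grey" for n in ratings}
highlight = {n: "red" for n in ratings}
colors = split_by_percentile(base, highlight, ratings, (0, 50))
print(linear_percentile(ratings.values(), 75))  # 4.0
print(colors)  # {'a': 'grey', 'b': 'grey', 'c': 'grey', 'd': 'red', 'e': 'red'}
```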
qualitative(attr, mapping=None, *, cmap=None)
Use an attribute or colormap as value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| attr | str \| Code | Name of the category | required |
| mapping | dict \| None | Dictionary with category values as keys and the desired values as values | None |
| cmap | | Name of a valid colormap (str) or a colormap object | None |
Returns: Nominal
Examples:
G.add_node("a", pet="dog")
G.add_node("b", pet="cat")
ig.nodes.set_colors(qualitative("pet", {"cat": "red", "dog": "green"}))
# color for node "a" is now "green" and "red" for node "b"
ig.nodes.set_colors(qualitative("pet", cmap="Pastel1"))
# color for node "a" is now the first color in the Pastel1 colormap and the
# second color for node "b"
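The two modes above can be sketched as follows. A plain list of color names stands in for a matplotlib colormap. Categories receive palette entries in order of first appearance, which is an assumption consistent with the example. `qualitative_values` is hypothetical.

```python
from itertools import cycle


def qualitative_values(attr_values, mapping=None, palette=None):
    """Map category values to visual values via a dict or by cycling a palette."""
    if mapping is not None:
        return {node: mapping.get(cat) for node, cat in attr_values.items()}
    # Assign palette entries to categories in order of first appearance.
    colors = cycle(palette)
    assigned = {}
    for cat in attr_values.values():
        if cat not in assigned:
            assigned[cat] = next(colors)
    return {node: assigned[cat] for node, cat in attr_values.items()}


pets = {"a": "dog", "b": "cat"}
by_mapping = qualitative_values(pets, {"cat": "red", "dog": "green"})
by_palette = qualitative_values(pets, palette=["lightpink", "lightblue"])
print(by_mapping)  # {'a': 'green', 'b': 'red'}
print(by_palette)  # {'a': 'lightpink', 'b': 'lightblue'}
```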
read_from_pickle(folder='', timestamp='')
Reads a cached graph from a directory.
Automatically selects the newest instance unless a timestamp is given.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| folder | (str, Path) | Path to the directory to search for pickles. If empty, defaults to the temp dir. | '' |
| timestamp | str | Timestamp to explicitly select. | '' |
Returns:
| Type | Description |
|---|---|
| DiGraph | Graph with the imported data. |
sequential(attr, scale='lin', out_range=None, in_range=None, fallback=None, cmap=None)
Scales graph element or structural graph attributes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| attr | str \| Supplier | Name of the graph element or structural graph attribute to be scaled | required |
| scale | str \| Callable | Scale on which the values should be assigned | 'lin' |
| out_range | | Tuple of the min and max values of the final scale | None |
| in_range | | Tuple of the min and max values of the set before normalization | None |
| fallback | | A color value or numeric value for None values | None |
Returns: Sequential
Examples:
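Here is a sketch of the default linear scale ('lin') for numeric values. It assumes that without `in_range` the observed minimum and maximum are used, and that `fallback` replaces None values as documented. Colormap handling is left out. `linear_scale` is hypothetical. With the ratings from the filtering example it reproduces the size of 36 for node "b".

```python
def linear_scale(attr_values, out_range, in_range=None, fallback=None):
    """Linearly map numeric attribute values from in_range onto out_range."""
    present = [v for v in attr_values.values() if v is not None]
    lo, hi = in_range if in_range is not None else (min(present), max(present))
    out_lo, out_hi = out_range
    result = {}
    for element, value in attr_values.items():
        if value is None:
            result[element] = fallback  # documented use of fallback: None values
        elif hi == lo:
            result[element] = out_lo  # degenerate input range
        else:
            result[element] = out_lo + (value - lo) / (hi - lo) * (out_hi - out_lo)
    return result


sizes = linear_scale({"a": 3, "b": 7, "c": None}, out_range=(12, 36), fallback=0)
print(sizes)  # {'a': 12.0, 'b': 36.0, 'c': 0}
```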