dssTools

dssTools is a Python wrapper around NetworkX that simplifies network analysis for use cases in the Digital Social Sciences.

Category

Bases: LowercaseStrEnum

Enum for categorizing internal codes.

The following values are accepted:

TEXT: Used for terms in text. This code is representative of a term found in a node text.

MANUAL: Used for manually created codes in dssCode (or other sources).

Code

Bases: NamedTuple

Helper tuple class for passing complex arguments as node attributes.

to_str()

Convert the code to string.

Description

Class containing description drawing preferences.

set_text(text)

Sets the description setting.

Parameters:

Name Type Description Default
text str

Text to set as description

required

Returns:

ElementAttribute

Bases: Supplier

Class for graph element values (already set in the graph).

__init__(keyword)

Parameters:

Name Type Description Default
keyword str

the key of the inherent attribute

required

GenericMapping

Bases: ABC

Generic interface for mapping visual attributes to graph element values.

get(graph_element, graph) abstractmethod

Parameters:

Name Type Description Default
graph_element NxElementView

NxElementView

required
graph Graph

nx.Graph

required

Returns: A dictionary with the graph elements as keys and the mapped visual values as values.

GraphElement

set_alphas(arg)

Sets alpha based on argument.

Parameters:

Name Type Description Default
arg GenericMapping | float

The text transparency, given as a mapping or as a float between 0 and 1.

required

Returns: self

set_colors(arg)

Sets the colors of the displayed nodes.

Parameters:

Name Type Description Default
arg GenericMapping | str

Colors to set. String values will be mapped onto all nodes.

required

Returns:

Type Description

self

set_sizes(arg)

Sets size of labels for nodes as pt sizes.

Parameters:

Name Type Description Default
arg GenericMapping | int | float

Font size for text labels

required

Returns: self

set_transparency(arg)

Sets transparency based on argument.

This is the same as set_alphas(). Alpha is the value for the transparency of a color.

Parameters:

Name Type Description Default
arg GenericMapping | float

The text transparency, given as a mapping or as a float between 0 and 1.

required

Returns: self

ImageGenerator

Base class for setting up image generation.

change_graph(graph)

Sets the graph attribute.

Parameters:

Name Type Description Default
graph

A NetworkX graph object.

required

Returns:

Type Description

self

deepcopy()

Create deep copy of the object.

This is the same as calling copy.deepcopy() on the object.

draw_description()

Draw description below the image according to the settings.

draw_edges()

Draw edges according to the settings.

draw_labels()

Draws labels based on values.

draw_legend()

Not yet implemented.

draw_nodes()

Draw nodes according to the settings.

set_axis(axis)

Sets an existing matplotlib axis object for the ImageGenerator object.

Parameters:

Name Type Description Default
axis

Matplotlib axis

required

Returns:

Type Description

self

set_legend(legend=True)

Sets the legend setting.

Parameters:

Name Type Description Default
legend

Whether to show the legend (default True).

True

Returns:

Type Description

self

write_file(path)

Write file to disk on the given path.

Will also close the internal figure object.

Parameters:

Name Type Description Default
path str | Path

Path to write the file to.

required

Returns:

Type Description

self

write_json(path)

Writes the graph data to a JSON file following the nx.node_link_data format.

Parameters:

Name Type Description Default
path str | Path

Location and name under which the JSON file is saved.

required

Returns:

Type Description
'ImageGenerator'

self

Labels

Bases: GraphElement

set_alphas(arg)

Sets alpha based on argument.

Parameters:

Name Type Description Default
arg GenericMapping | float

The text transparency, given as a mapping or as a float between 0 and 1.

required

Returns: self

set_colors(arg)

Sets the colors of the displayed nodes.

Parameters:

Name Type Description Default
arg GenericMapping | str

Colors to set. String values will be mapped onto all nodes.

required

Returns:

Type Description

self

set_font_families(arg)

Sets the font family for all labels if a single font is passed.

If an array of fonts is passed, multiple fonts are set. If a dictionary is passed, fonts are set individually for the labels based on the given node.

Parameters:

Name Type Description Default
arg GenericMapping | str

Font family

required

set_labels(arg)

Sets labels for nodes based on arguments.

Parameters:

Name Type Description Default
arg dict

Dictionary with the node identifier (integer) as key and the label (string) as value.

required

set_sizes(arg)

Sets size of labels for nodes as pt sizes.

Parameters:

Name Type Description Default
arg GenericMapping | int | float

Font size for text labels

required

Returns: self

set_transparency(arg)

Sets transparency based on argument.

This is the same as set_alphas(). Alpha is the value for the transparency of a color.

Parameters:

Name Type Description Default
arg GenericMapping | float

The text transparency, given as a mapping or as a float between 0 and 1.

required

Returns: self

Layouter

create_layout(graph, seed=None, pos=None, **kwargs)

Create position dictionary according to set layout engine. Default layout is Spring.

Parameters:

Name Type Description Default
graph Graph

Graph object

required
seed int

Set a default seed (default None)

None
pos

Pre-populated positions

None

Returns:

Type Description
None

Dictionary of node and positions.

read_from_file(filename, **kwargs)

Reads positions from the JSON file at the given file path.

The following structure for the JSON is expected, where each key contains an array of length 2 containing the coordinates. Coordinates should be in the range [-1,1]:

{
    "domain1": [-0.1467271130230262, 0.25512246449304427],
    "domain2": [-0.3683594304205127, 0.34942480334119136]
}

This structure is generated through dsstools.Layouter().write_to_file().
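The expected file structure can be checked with the standard library alone. The following is a minimal sketch that does not use dsstools itself; the helper name `load_positions` is hypothetical. It parses such a JSON file and verifies that every entry is a pair of coordinates within [-1, 1].

```python
import json
import tempfile
from pathlib import Path


def load_positions(path):
    """Load a {node: [x, y]} mapping and validate its shape (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        raw = json.load(handle)
    positions = {}
    for node, coords in raw.items():
        if len(coords) != 2:
            raise ValueError(f"{node!r} needs exactly two coordinates")
        x, y = (float(value) for value in coords)
        if not (-1.0 <= x <= 1.0 and -1.0 <= y <= 1.0):
            raise ValueError(f"{node!r} lies outside the range [-1, 1]")
        positions[node] = (x, y)
    return positions


# Write a small example file, then read it back.
example = {"domain1": [-0.1467, 0.2551], "domain2": [-0.3684, 0.3494]}
with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "positions.json"
    file_path.write_text(json.dumps(example), encoding="utf-8")
    loaded = load_positions(file_path)
```

Note that strict JSON does not allow a trailing comma after the last entry.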

Parameters:

Name Type Description Default
filename Union[Path]

Path to file to be read.

required
graph Graph
required

Returns:

Type Description
dict

Dictionary of nodes and positions.

read_from_graph(graph, pos_name=('x', 'y'))

Read positions from node attributes in the graph.

This is relevant when importing from Pajek or GEXF files where the positions are already set with another tool. Imported values are normalized onto [-1,1] in all directions.

Parameters:

Name Type Description Default
graph Graph

Graph object including the node attributes.

required
pos_name tuple

Node attribute names to look for. These depend on the imported file format.

('x', 'y')

Returns:

Type Description

Dictionary of positions per Node.
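The normalization onto [-1, 1] mentioned above can be illustrated with a min-max rescaling. This is a sketch under assumptions: the helper `normalize_positions` is hypothetical, and whether dsstools rescales each axis independently (as done here) or preserves the aspect ratio is not stated in this reference.

```python
def normalize_positions(raw):
    """Rescale {node: (x, y)} so each axis spans [-1, 1] (illustrative only)."""
    xs = [x for x, _ in raw.values()]
    ys = [y for _, y in raw.values()]

    def rescale(value, low, high):
        if high == low:  # all nodes share this coordinate
            return 0.0
        return 2.0 * (value - low) / (high - low) - 1.0

    return {
        node: (rescale(x, min(xs), max(xs)), rescale(y, min(ys), max(ys)))
        for node, (x, y) in raw.items()
    }


# Coordinates as they might come from a Pajek or GEXF import.
imported = {"a": (0.0, 10.0), "b": (50.0, 20.0), "c": (100.0, 30.0)}
normalized = normalize_positions(imported)
```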

read_or_create_layout(filepath, graph, seed, overwrite=False, **kwargs)

Read positions from file. If the file does not exist, create the positions and write them to the file.

Parameters:

Name Type Description Default
filename Union[str, Path]

Filename to read positions from

required
graph Graph

Graph object to update

required
overwrite bool

Overwrite existing file (default False)

False

Returns:

Type Description
dict

Dictionary of positions per Node. Will return an empty dict if creation failed.
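The read-or-create behaviour follows a common caching pattern. The sketch below shows that pattern with the standard library only; `read_or_create` is a hypothetical helper, and the random placement merely stands in for a real layout engine.

```python
import json
import random
import tempfile
from pathlib import Path


def read_or_create(path, nodes, seed, overwrite=False):
    """Return cached positions, or create and store them (hypothetical helper)."""
    path = Path(path)
    if path.exists() and not overwrite:
        return {node: tuple(xy) for node, xy in json.loads(path.read_text()).items()}
    rng = random.Random(seed)  # stand-in for a real layout engine
    positions = {n: (rng.uniform(-1, 1), rng.uniform(-1, 1)) for n in nodes}
    path.write_text(json.dumps(positions))
    return positions


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "layout.json"
    first = read_or_create(cache, ["a", "b"], seed=1)   # creates the file
    second = read_or_create(cache, ["a", "b"], seed=2)  # reads it back; seed is unused
    third = read_or_create(cache, ["a", "b"], seed=2, overwrite=True)  # recreates
```

Once the file exists, the seed no longer affects the result unless the file is overwritten, which is what keeps a layout stable across runs.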

Nodes

Bases: GraphElement

set_alphas(arg)

Sets alpha based on argument.

Parameters:

Name Type Description Default
arg GenericMapping | float

The text transparency, given as a mapping or as a float between 0 and 1.

required

Returns: self

set_colors(arg)

Sets the colors of the displayed nodes.

Parameters:

Name Type Description Default
arg GenericMapping | str

Colors to set. String values will be mapped onto all nodes.

required

Returns:

Type Description

self

set_contour_colors(arg)

Sets the contour color of the displayed nodes.

Contour means the outer border of a node.

Parameters:

Name Type Description Default
arg GenericMapping | str

Colors to set. String values will be mapped onto all node contours. The additional options "node" and "edge" automatically select the corresponding color.

required

Returns:

Type Description

self

set_contour_sizes(arg)

Sets the contour sizes of the displayed nodes.

Contour means the outer border of a node.

Parameters:

Name Type Description Default
arg GenericMapping | float | int

Sizes to set. Integer values will be mapped onto all node contours. String values will get mapped onto the corresponding data arrays or closeness values per node.

required

Returns:

Type Description

self

set_positions(pos)

Sets the node positions as a dict or list.

When using a file, use set_position_file() instead.

Parameters:

Name Type Description Default
pos dict | list | Path | str

dict | list: Array of positions. Dicts should be keyed by node ID.

required

Returns:

Type Description

self

set_sizes(arg)

Sets the sizes of the displayed nodes.

Parameters:

Name Type Description Default
arg GenericMapping | float | int

Sizes to set. Scalar values will be mapped onto all nodes. String values

required

Returns:

Type Description

self

set_transparency(arg)

Sets transparency based on argument.

This is the same as set_alphas(). Alpha is the value for the transparency of a color.

Parameters:

Name Type Description Default
arg GenericMapping | float

The text transparency, given as a mapping or as a float between 0 and 1.

required

Returns: self

NumpyEncoder

Bases: JSONEncoder

JSON encoder for NumPy arrays.
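Encoders of this kind usually override json.JSONEncoder.default. The following sketch is not the dsstools implementation: the class name `ArrayEncoder` is hypothetical, and it relies on duck typing via tolist() so that it runs without NumPy installed (numpy.ndarray provides tolist() as well).

```python
import json
from array import array


class ArrayEncoder(json.JSONEncoder):
    """Serialize array-like objects by converting them to lists (illustrative)."""

    def default(self, o):
        # numpy.ndarray and array.array both provide tolist().
        if hasattr(o, "tolist"):
            return o.tolist()
        return super().default(o)


# array.array stands in for a NumPy array here.
encoded = json.dumps({"pos": array("d", [0.5, -0.25])}, cls=ArrayEncoder)
```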

Percentile

Bases: Supplier

Class for filtering an existing supplier by the percentile.

__init__(supplier)

Parameters:

Name Type Description Default
supplier Supplier

Supplier whose values will be evaluated based on the percentile range, must contain numeric values

required

RawDictionary

Bases: Supplier

__init__(dictionary)

Assign a dictionary to the graph elements.

This can be a subgraph or a normal graph. The keys must match at least one of the node ids or edge tuples. Example: NetworkX calculations on graphs return suitable dictionaries.

Parameters:

Name Type Description Default
dictionary dict

Keys are graph elements; values are the values to be supplied.

required

StructuralAttribute

Bases: Supplier

Class for providing structural graph element values.

These are attributes based on the graph structure and need to be calculated.

__init__(keyword=None, *, reverse=False, alt_nx_calculation=None)

Parameters:

Name Type Description Default
keyword Optional[str]

the keyword for the calculation

None
reverse bool

If True, the calculation uses inverted edge values (default False).

False

Supplier

Bases: ABC

Basic interface for supplying graph element values.

TextSearch

Bases: WDCGeneric

Class for searching keywords via the WDC API.

token property writable

Get the password token.

__init__(identifier=None, *, token=None, api='https://dss-wdc.wiso.uni-hamburg.de/api', insecure=False, timeout=60, params=None)

Base class for interacting with the WDC API.

Parameters:

Name Type Description Default
identifier Optional[str]

Identifier of the network data.

None
token Optional[str]

Token for authorization.

None
api str

API address to send request to. Leave this as is.

'https://dss-wdc.wiso.uni-hamburg.de/api'
insecure bool

Hide warning regarding missing https.

False
timeout int

Set the timeout to the server. Increase this if you request large networks.

60
params Optional[dict]

These are additional keyword arguments passed onto the API endpoint. See https://dss-wdc.wiso.uni-hamburg.de/#_complex_datatypes_for_the_api_requests for further assistance.

None

search(domains, terms, exact=True)

Searches the given keywords across a Graph or iterator.

Parameters:

Name Type Description Default
domains Graph | list

Set of identifiers to search in.

required
terms list[str]

Terms to search for.

required

Returns:

Type Description

Updated graph or dict containing the responses, and the set of all failed responses.

WDCGeneric

token property writable

Get the password token.

__init__(identifier=None, *, token=None, api='https://dss-wdc.wiso.uni-hamburg.de/api', insecure=False, timeout=60, params=None)

Base class for interacting with the WDC API.

Parameters:

Name Type Description Default
identifier Optional[str]

Identifier of the network data.

None
token Optional[str]

Token for authorization.

None
api str

API address to send request to. Leave this as is.

'https://dss-wdc.wiso.uni-hamburg.de/api'
insecure bool

Hide warning regarding missing https.

False
timeout int

Set the timeout to the server. Increase this if you request large networks.

60
params Optional[dict]

These are additional keyword arguments passed onto the API endpoint. See https://dss-wdc.wiso.uni-hamburg.de/#_complex_datatypes_for_the_api_requests for further assistance.

None

calculate_betweenness_centrality(graph, name='_betweenness', **kwargs)

Updates the nodes in the graph with betweenness centrality.

Parameters:

Name Type Description Default
graph Graph

The graph to calculate on.

required
name str

Name of the centrality type.

'_betweenness'
**kwargs

All arguments passed onto nx.betweenness_centrality (see https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html#betweenness-centrality)

{}

Returns:

Type Description

Graph including the betweenness centrality.

calculate_closeness_centrality(graph, name='_closeness', **kwargs)

Updates the nodes in the graph with closeness centrality.

Parameters:

Name Type Description Default
graph Graph

The graph to calculate on.

required
name str

Name of the centrality type.

'_closeness'
**kwargs

All arguments passed onto nx.closeness_centrality (see https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.closeness_centrality.html#closeness-centrality)

{}

Returns:

Type Description

Graph including the closeness centrality.

clean_graph_data_attributes(graph)

Replace empty strings in data attributes with np.nan.
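The replacement can be sketched on a plain dictionary of node attributes. The helper `replace_empty_strings` is hypothetical; it uses math.nan as a stand-in, which is the same float NaN value that np.nan represents.

```python
import math


def replace_empty_strings(node_data):
    """Return a copy of {node: {attr: value}} with "" replaced by NaN."""
    return {
        node: {key: (math.nan if value == "" else value) for key, value in attrs.items()}
        for node, attrs in node_data.items()
    }


data = {"a": {"rating": 3, "city": ""}, "b": {"rating": "", "city": "Hamburg"}}
cleaned = replace_empty_strings(data)
```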

ensure_file_format(path, user_saving_format, *, default_format)

Ensures that the provided path has a saving format.

Parameters:

Name Type Description Default
path str | Path

the path that needs to be validated

required
user_saving_format str | None

the saving format provided by the user

required
default_format str

the format a programmer can set that will be used as default, if no format was provided at all

required

Returns:

Type Description
tuple[Path, str]

the filepath and format (without leading dot) as a 2-tuple
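A possible resolution order is sketched below. It is an assumption, not the documented behaviour: the helper `resolve_format` is hypothetical, and the precedence used here (explicit user format, then existing suffix, then default format) is not confirmed by this reference.

```python
from pathlib import Path


def resolve_format(path, user_format=None, *, default_format):
    """Return (path, format) with a guaranteed suffix (assumed precedence)."""
    path = Path(path)
    # Assumption: explicit user format > existing suffix > default format.
    fmt = (user_format or path.suffix or default_format).lstrip(".").lower()
    return path.with_suffix(f".{fmt}"), fmt


no_suffix = resolve_format("plot", default_format="svg")
with_suffix = resolve_format("plot.png", default_format="svg")
user_choice = resolve_format("plot.png", "pdf", default_format="svg")
```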

filtering(base, new_values, attr, predicate)

Filter the visual values of the graph element based on the attr values.

If the predicate is True the value of the new Mapping is applied, keyed by node. If the predicate is False the value of the base Mapping is applied, keyed by node. If the base mapping contains a fallback, the nodes with the fallback value will retain that value.

Parameters:

Name Type Description Default
base GenericMapping

the original Mapping whose values will be assigned if the predicate is False for the supplier value

required
new_values GenericMapping

the new Mapping whose values will be assigned if the predicate is True for the supplier value

required
attr str | Supplier

attribute that provides the values to be evaluated by the predicate

required
predicate

the expression to evaluate the attr values, must return a boolean

required

Returns:

Type Description

Filter Mapping filtered by predicate applied to attr by assigning new_values if predicate is True and base values if predicate is False.

Examples:

G.add_node("a", rating=3, school_type="uni")
G.add_node("b", rating=7, school_type="college")

rating = sequential("rating", out_range=(12, 36))

new_mapping = fixed(1)

ig.nodes.set_sizes(filtering(rating, new_mapping, "school_type", lambda x: x == "uni"))

# sizes = a: 1, b: 36
# node a has been given the value 1 because the "school_type" is "uni"
# node b remains unchanged
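The selection rule can be reproduced on plain dictionaries. This is a sketch of the semantics described above, not the dsstools implementation; `filter_values` and the dictionaries are hypothetical stand-ins for the mappings and the attribute supplier.

```python
def filter_values(base, new_values, attr_values, predicate):
    """Pick new_values[node] where predicate(attr) is true, else base[node]."""
    return {
        node: new_values[node] if predicate(attr_values.get(node)) else base[node]
        for node in base
    }


base = {"a": 12, "b": 36}              # e.g. sizes scaled from "rating"
new_values = {"a": 1, "b": 1}          # e.g. a fixed size of 1
school_type = {"a": "uni", "b": "college"}
sizes = filter_values(base, new_values, school_type, lambda x: x == "uni")
```

The predicate compares strings with ==, since identity comparison with `is` is not reliable for string values.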

fixed(value)

Set a fixed value, that is constant across all items in the chosen graph element.

Parameters:

Name Type Description Default
value

Value to set for all items.

required

Returns:

Type Description

FixedValue

Examples:

ig.nodes.set_sizes(fixed(75))
# the size of all nodes is now 75

ig.edges.set_colors("green")
# the color of all edges is now green

from_node(node_mapping, source, *, fallback=None)

Assign a value from a node to the edges.

Parameters:

Name Type Description Default
node_mapping GenericMapping

GenericMapping the color should correspond to.

required
source Literal['incoming', 'outgoing', 'matching']

"incoming" uses the value from the incoming node | "outgoing"

required
fallback

the fallback value assigned when incoming and outgoing nodes

None

import_attributes_from_csv(graph, filepath, import_columns, index_label='', cleanup_functions=None)

Import attributes from a CSV file with some cleanup.

Parameters:

Name Type Description Default
graph DiGraph

Graph to which the data should be applied.

required
filepath str

Path of the CSV file.

required
import_columns list[str]

Columns to be imported, can be None.

required
index_label

Column name used as index; defaults to the first column.

''
cleanup_functions

Functions to be applied on the DataFrame. (default None)

None

Returns:

Type Description
DiGraph

nx.DiGraph: Graph with the applied data.

import_attributes_from_dataframe(graph, df)

Import attributes from Pandas dataframe.

The index of the dataframe should be the name of the graph node. Non-existing nodes are ignored and will not get the attribute.

Parameters:

Name Type Description Default
graph DiGraph

Graph to which the data should be applied.

required
df DataFrame

DataFrame containing the attributes to import.

required
import_columns list[str]

Columns to be imported, can be None.

required

Returns:

Type Description
DiGraph

nx.DiGraph: Graph with the applied data.

import_from_dsscode(slug, snapshot, token, domain='dss-graph.wiso.uni-hamburg.de', cache=True, remove_selfloops=True, contract_redirects=False, explicit_include=False)

Import Graph object from dssCode.

Parameters:

Name Type Description Default
slug str

Name slug of the project (see dssCode-Interface)

required
snapshot str

Snapshot hash

required
domain str

The domain for the API call

'dss-graph.wiso.uni-hamburg.de'
cache (bool, Path, str)

Pass the cache directory. Defaults to temporary dir.

True
remove_selfloops bool

Remove edge selfloops.

True
contract_redirects bool

Contract redirecting nodes into one.

False
explicit_include bool

Include only explicitly marked nodes in the graph.

False

Returns:

Type Description
DiGraph

nx.DiGraph: Graph with the imported data.

import_network(filepath, remove_selfloops=True)

Import network as a NetworkX directed graph and clean up circular edges.

percentile(base, new_values, attr, perc_range=None, method='linear')

Parameters:

Name Type Description Default
base GenericMapping

the Mapping, whose values are assigned to the node if the attr values are inside the perc_range

required
new_values GenericMapping

the Mapping, whose values are assigned to the node if the attr values are outside the perc_range

required
attr str | Supplier

the attribute with which the percentile is calculated, must contain numeric values

required
perc_range tuple

tuple of min and max range for the percentile calculation, must be between 0 and 100

None
method str

Method to use for estimating the percentile (default "linear"). There are many different methods, some unique to NumPy; see the numpy.percentile documentation for an explanation. The options, sorted by their R type as summarized in the H&F paper, are:

  1. 'inverted_cdf'
  2. 'averaged_inverted_cdf'
  3. 'closest_observation'
  4. 'interpolated_inverted_cdf'
  5. 'hazen'
  6. 'weibull'
  7. 'linear' (default)
  8. 'median_unbiased'
  9. 'normal_unbiased'

The first three methods are discontinuous. NumPy further defines the following discontinuous variations of the default 'linear' (7.) option:

  • 'lower'
  • 'higher'
  • 'midpoint'
  • 'nearest'
'linear'

Returns:

Type Description

Filter Mapping filtered by percentile of attr assigning values based on new_values

Examples:

G.add_node("a", rating=3)
G.add_node("b", rating=7)

base_mapping = sequential("rating", fallback="orange", cmap="viridis")
new_mapping = fixed("blue")

ig.nodes.set_colors(percentile(base_mapping, new_mapping, "degree", perc_range=(0, 50)))
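The percentile filter can be traced on plain dictionaries. The sketch below is illustrative only: both helper names are hypothetical, the bounds of perc_range are assumed to be inclusive, and the percentile is computed by linear interpolation, which corresponds to NumPy's default "linear" method.

```python
def linear_percentile(values, q):
    """Percentile with linear interpolation (NumPy's default "linear" method)."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def percentile_filter(base, new_values, attr_values, perc_range):
    """Keep base inside the percentile range, use new_values outside it."""
    low = linear_percentile(attr_values.values(), perc_range[0])
    high = linear_percentile(attr_values.values(), perc_range[1])
    return {
        node: base[node] if low <= attr_values[node] <= high else new_values[node]
        for node in base
    }


degree = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
base = {node: "orange" for node in degree}
new_values = {node: "blue" for node in degree}
colors = percentile_filter(base, new_values, degree, (0, 50))
```

With these degrees, the range from the 0th to the 50th percentile covers the values 1 to 3, so nodes "a" to "c" keep the base color and "d" and "e" receive the new one.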

qualitative(attr, mapping=None, *, cmap=None)

Use an attribute or colormap as value.

Parameters:

Name Type Description Default
attr str | Code

str name of the category

required
mapping dict | None

dict of category values as the key and desired values as the values

None
cmap

str of a valid colormap or colormap object

None

Returns:

Type Description

Nominal

Examples:

G.add_node("a", pet="dog")
G.add_node("b", pet="cat")

ig.nodes.set_colors(qualitative("pet", {"cat": "red", "dog": "green"}))
# color for node "a" is now "green" and "red" for node "b"

ig.nodes.set_colors(qualitative("pet", cmap="Pastel1"))
# color for node "a" is now the first color in the Pastel1 colormap and the
# second color for node "b"
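Both variants amount to a lookup from category to value. The following sketch uses a hypothetical helper `qualitative_values` and a plain list of two example hex colors instead of a Matplotlib colormap; it assumes that categories receive palette entries in order of first appearance, which matches the example above but is not confirmed for dsstools in general.

```python
def qualitative_values(attr_values, mapping=None, palette=None):
    """Map categories to values via an explicit dict or a palette (illustrative)."""
    if mapping is None:
        # Assumption: categories get palette entries in order of first appearance.
        categories = list(dict.fromkeys(attr_values.values()))
        mapping = {cat: palette[i % len(palette)] for i, cat in enumerate(categories)}
    return {node: mapping[value] for node, value in attr_values.items()}


pets = {"a": "dog", "b": "cat"}
explicit = qualitative_values(pets, {"cat": "red", "dog": "green"})
from_palette = qualitative_values(pets, palette=["#fbb4ae", "#b3cde3"])
```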

read_from_pickle(folder='', timestamp='')

Read cached graph from directory.

Automatically selects the newest instance, unless a timestamp is given.

Parameters:

Name Type Description Default
dir (str, Path)

Path to directory to search for pickles. If empty, default to temp dir.

required
timestamp str

Timestamp to explicitly select.

''

Returns:

Type Description
DiGraph

nx.DiGraph: Graph with the imported data.

sequential(attr, scale='lin', out_range=None, in_range=None, fallback=None, cmap=None)

Scale on graph element or structural graph attributes.

Parameters:

Name Type Description Default
attr str | Supplier

str name of graph element or structural graph attribute to be scaled

required
scale str | Callable

scale on which the values should be assigned

'lin'
out_range

tuple of the min and max values of the final scale

None
in_range

tuple of the min and max values of the set before normalization

None
fallback

a color value or numeric value for None values

None

Returns:

Type Description

Sequential

Examples:

G.add_node("a", rating=3)
G.add_node("b", rating=7)

ig.nodes.set_sizes(sequential("degree", "log", out_range=(12, 36), fallback=5))

ig.nodes.set_sizes(sequential("rating", linear(), out_range=(12, 36), fallback=5))

ig.nodes.set_colors(sequential("rating", fallback="orange", cmap="viridis"))
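For the linear case, the interplay of in_range, out_range and fallback can be sketched as follows. The helper `linear_scale` is hypothetical and works on a plain dictionary; it assumes that in_range defaults to the observed minimum and maximum and that missing values (None) receive the fallback.

```python
def linear_scale(attr_values, out_range, in_range=None, fallback=None):
    """Linearly map numeric values from in_range onto out_range (illustrative)."""
    present = [v for v in attr_values.values() if v is not None]
    in_min, in_max = in_range if in_range is not None else (min(present), max(present))
    out_min, out_max = out_range
    span = in_max - in_min

    def scale(value):
        if value is None:
            return fallback  # nodes without a value receive the fallback
        share = 0.0 if span == 0 else (value - in_min) / span
        return out_min + share * (out_max - out_min)

    return {node: scale(value) for node, value in attr_values.items()}


rating = {"a": 3, "b": 7, "c": 5, "d": None}
sizes = linear_scale(rating, out_range=(12, 36), fallback=5)
```

Here the smallest rating maps to 12, the largest to 36, the midpoint to 24, and the node without a rating gets the fallback size 5.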